
LESSWRONG


piotrm — LessWrong


Head of AI at realmlabs.ai .

https://piotr.mardziel.com


9mo


Introspection via localization

A small question/concern: can this accuracy be attributed to "introspection," or to something we wouldn't call introspection? Depending on the injected concept, I could see it being far from introspection. I'm unsure which concepts were injected, but I find it plausible that some could produce the observed accuracy independently of the instructions given to the LLM. For example, an injected concept might always cause the LLM to output the index of the sentence it was injected into, regardless of the introspection task. Is there a way to control for such things?
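To make the worry concrete, here is a toy sketch of one possible control, under loudly stated assumptions: `leaky_model` and `introspective_model` are hypothetical stand-ins (not the models, concepts, or prompts from the post), and "CONTROL" is an assumed prompt that never mentions injected concepts. The idea is to measure localization accuracy both with the introspection instruction and with the control instruction. A concept that always emits its own sentence index scores high under both prompts, so only the gap between the two says anything about the instruction mattering.

```python
from typing import Callable, Sequence

# Hypothetical interface: model(prompt, injected_idx, n_sentences) -> answered index.
Model = Callable[[str, int, int], int]


def localization_accuracy(model: Model, prompt: str,
                          trials: Sequence[int], n_sentences: int) -> float:
    """Fraction of trials where the answered index equals the injected index."""
    hits = sum(model(prompt, idx, n_sentences) == idx for idx in trials)
    return hits / len(trials)


def instruction_effect(model: Model, trials: Sequence[int],
                       n_sentences: int) -> tuple[float, float, float]:
    """Accuracy with the introspection prompt, with the control prompt, and the gap."""
    intro = localization_accuracy(model, "INTROSPECT", trials, n_sentences)
    control = localization_accuracy(model, "CONTROL", trials, n_sentences)
    return intro, control, intro - control


def leaky_model(prompt: str, injected_idx: int, n_sentences: int) -> int:
    # Toy confound: the concept always makes the model emit the index of
    # the sentence it sits in, whatever the prompt asks for.
    return injected_idx


def introspective_model(prompt: str, injected_idx: int, n_sentences: int) -> int:
    # Toy "introspective" model: reports the injected index only when asked;
    # under the control prompt its answer never tracks the injection.
    if prompt == "INTROSPECT":
        return injected_idx
    return (injected_idx + 1) % n_sentences
```

Usage, with ten hypothetical trials over ten sentences: `instruction_effect(leaky_model, range(10), 10)` gives `(1.0, 1.0, 0.0)`, while `instruction_effect(introspective_model, range(10), 10)` gives `(1.0, 0.0, 1.0)`. Both look perfect on the introspection prompt alone; only the control separates them.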