A small question/concern: can this accuracy be attributed to "introspection", or to something that we wouldn't call introspection? Depending on the injected concept, I could see the mechanism being far from introspection. I'm unsure which concepts were injected, but I find it plausible that some could produce the observed accuracy independently of the instructions given to the LLM. For example, consider a concept that would \emph{always} cause the LLM to generate the index of the sentence in which it is located, regardless of the introspection task. Is there a way to control for such cases?