I think this can be a useful experiment to disabuse people of the idea that the LLM is accurately reporting its internal states via its text output. Clearly that's not what it does, and this experiment shows it nicely.
I'm not so sure that this is a good demonstration of non-consciousness, though. As with all arguments of this type, my first test is to ask, "is this something that humans also do?" And here, I think the answer is "Yes". Humans often confabulate their inner states when questioned about what they were thinking, and of course that doesn't disprove that we're conscious.