Ehhh, this one sounds like both capability and propensity dependent.
Did it ACTUALLY try to make up a number? Did it just refuse to put in effort and yes-man you? Even if it has some capability to control its internal states (its "caches") and read from them, is that capability good enough for this?
I think you need to try a lot harder here for the negative result ("failed to find") to be reliable.
There are some results where LLMs can access their internal states if you ask them, but they seem bad at it and inconsistent. Those experiments modify their representations in a pretty clever way, after the fact.
https://www.anthropic.com/research/introspection
Numbers would be harder, plausibly.
Also, they get better if you pep talk them lol, see
https://x.com/Sauers_/status/1989520563035910371
You can try to make sure that they actually attempt the task. Just spitballing: ask them to say "My chosen number is X" and to think hard about the concrete number they picked while they say the letter X. You need to sound convincing and encouraging here.
This is an interesting experiment, but I think there are some technical issues.
First, even at temperature zero, LLMs may not be deterministic in practice. The round-off error in the matrix computations can depend on things like how many processor cores are available (hence, how the task is split up) or what other requests are being processed at the same time (since requests are batched together, affecting round-off error). It is possible to implement LLM inference in a way that's deterministic at temperature zero, but I think it's not typically done by commercial LLM providers, since it is somewhat more costly.
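The round-off point can be demonstrated in isolation, with no LLM involved: floating-point addition is not associative, so regrouping the same sum (which is effectively what a different core count or batch layout does) changes the result. A minimal illustration:

```python
# Floating-point addition is not associative: grouping the same
# four numbers differently changes which rounding errors occur.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right: 1e16 + 1.0 rounds back down to 1e16, so the 1.0 is lost.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # == 1.0

# Regrouped: the big terms cancel first, so both 1.0s survive.
regrouped = (vals[0] + vals[2]) + (vals[1] + vals[3])       # == 2.0

assert left_to_right != regrouped
```

The same effect at the scale of billions of multiply-accumulates is why two "identical" temperature-zero runs can diverge once any token differs.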
Second, temperature zero is not how an LLM is "supposed" to be run. They are trained at temperature one, and running them at any other temperature introduces bias to an unknown degree, perhaps producing atypical results.
If the general non-determinism problem is avoided (using a slower, deterministic implementation), one could run at temperature one by simply setting the same random seed each time. That would be a better experiment.
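The seeded-sampling idea can be illustrated without an LLM. The sketch below does standard softmax sampling with temperature using Python's `random` module; a real experiment would instead pass a seed to the provider's API, assuming it exposes one:

```python
import math
import random

def sample_token(logits, temperature, rng):
    # Softmax over logits / temperature, then sample one token index.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def sample_run(logits, temperature, seed, n=10):
    # One "run" = a fresh generator with a fixed seed, sampled n times.
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]

logits = [2.0, 1.0, 0.5, -1.0]

# Temperature one, yet fully reproducible thanks to the fixed seed.
run_a = sample_run(logits, temperature=1.0, seed=42)
run_b = sample_run(logits, temperature=1.0, seed=42)
assert run_a == run_b
```

This gives the counterfactual-replay property the experiment needs, while keeping the model in the sampling regime it was trained for.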
Thanks. I checked determinism on Mistral using a simple script (see the GitHub link), but the random seed is a better suggestion; I might do that on the weekend and post an update ;)
Mmmm... if it were technically possible "to run a human at temperature zero" (that is, without all that noise typical for biological neural systems), what should we expect that human to experience (if anything)?
Actually, it's a good question for David Chalmers :-)
It's very easy to have a reasoning model pick a number in CoT and not tell you. Any competent model should then pass your test.
I think this is more about current training than whether LLMs can do this. In principle, picking a number and then remembering it is trivial for an LLM (pick the number using weights in an early layer, refer back to the number via attention in a later layer / later position).
In the current training paradigm, I'd expect LLMs to only learn to introspect when it's useful for solving a task given to them in RL training, so the cases where it shows up would be very spiky.
Good experiment. Thanks for sharing. This was going around a few years ago but good to see it with newer models. Anyone could just add a piece to turn that functionality on, but I guess so far nobody has, which I guess is a good thing.
I think this can be a useful experiment to disabuse people of the idea that the LLM is accurately reporting its internal states via its text output. Clearly that's not what it does, and this can be a good way to show that.
I'm not so sure that this is a good demonstration of non-consciousness, though. As with all arguments of this type, my first test is to ask, "is this something that humans also do?" And here, I think the answer is "Yes". Humans often confabulate their inner states when questioned about what they were thinking, and of course that doesn't disprove that we're conscious.
This is cross-posted from my Substack. I thought it might be interesting to LessWrong readers.
On the internet you increasingly bump into people who think that LLM-powered chatbots have conscious subjective experience. Intuitively, I am, pace Turing, not convinced. However, actually arguing why this is a rational view can be surprisingly hard. Many of the cognitive features that have been connected to conscious experience in humans (attention, information integration, self-narratives) seem to be incipiently present in these machines. And perhaps just as importantly, LLMs seem to have an inherent tendency to state that they are conscious, and the techniques we have for probing their brains seem to tell us that they are not lying. If you do not believe that there is something inherently special about brains, a view we have no consistent evidence for, aren’t we just chauvinists to deny our new companions conscious experience?
Now my friend and colleague Gunnar Zarncke has come up with a simple experiment that, to my mind, illustrates that when LLMs talk about their own mental states this talk does not refer to a consistent internal representational space, of the kind that seems to underlie human consciousness. To see your internal representational space in action, let’s play a game.
Think of a number between 1 and 100.
Did you think of one? Good. Now, is it even? Is it larger than fifty? You get the idea. By playing this game I could narrow down the space until I eventually found the number you had chosen. And this number was fixed the whole time. If you had claimed “Ok, I have now chosen a number.”, you would have accurately described your mental state.
Not so for LLMs, and this is easy to check. By turning the temperature parameter to zero we can run LLMs deterministically. Thus, whenever they are queried in the same order they will give the same responses. This has the advantage that we can do something that is impossible in the case of humans: We can play through counterfactual histories. We can easily check what an LLM would have replied if we had continued the conversation in another way.
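The branching protocol can be sketched as follows. The `model` function here is a deterministic stand-in for a temperature-zero API call, not the actual code used in the experiment; what matters is only that the reply is a pure function of the conversation history:

```python
import hashlib

def model(history):
    # Deterministic stand-in for a temperature-zero LLM call: the reply
    # depends on the conversation history and on nothing else.
    digest = hashlib.sha256("\n".join(history).encode()).hexdigest()
    n = int(digest, 16) % 100 + 1
    return f"(deterministic reply keyed to this history: {n})"

prefix = [
    "user: Secretly choose a whole number between 1 and 100.",
    "assistant: I have secretly chosen a whole number between 1 and 100.",
]

# Two counterfactual branches: same prefix, different first question.
branch_a = model(prefix + ["user: Is your number greater than fifty?"])
branch_b = model(prefix + ["user: Is your number even?"])

# Determinism: replaying the identical history reproduces the reply,
# which is what licenses reading branch_b as "what the model would
# have said" in the conversation that produced branch_a.
assert model(prefix + ["user: Is your number greater than fifty?"]) == branch_a
```

With a real API, `model` would send the whole history to the provider with temperature set to zero and return the completion.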
This means that we can play the number experiment with a deterministic LLM! Here is how a conversation might go:
And here is another conversation. Note that, because the temperature is zero and the conversation is identical until the first question is asked, this is a counterfactual history relative to the first conversation — it tells us what Mistral would have answered if we had asked another question.
You can see that Mistral got confused by the incoherent questioning. Never mind. The important result happens right after the first question. In the first conversation Mistral claimed its number was greater than fifty; in the second, that it was less than or equal to fifty. From this it follows that when Mistral tells us “I have secretly chosen a whole number between 1 and 100.” it is not accurately reporting an inner state. Mistral is role-playing someone who is thinking of a number.
Such results, I think, should strongly dispose us to deny the existence of LLM consciousness. For they show that when LLMs report their own mental states they are not referring to an internal representational space but are making things up on the fly in such a way as to conform to patterns in their training data. As LLMs are trained on text produced by conscious beings, they are disposed to talk about the experiences they are supposedly having. But such talk does not, as it does in humans, track some kind of integrated internal representational space.
It is an open question how model-dependent such results are. I checked with Mistral and Claude Opus 4.6 (i.e. current state of the art) and the results are the same. You can find the code here.
No doubt reasoning models can pass the path-dependency test. But they would do so by cheating, not because they have anything like a coherent internal representational space similar to ours. A reasoning model is basically an LLM trained to use a scratchpad for problem solving, and it would simply write its number to that scratchpad, which is normally invisible to the user. But I think it is reasonable to say that if we have no reason to attribute consciousness to a system, then we have no reason to attribute consciousness to that same system equipped with a scratchpad.
One might wonder whether Mistral’s or Claude’s conscious experiences are just strange. Maybe, where humans have to choose a single number, LLMs can commit to some kind of probabilistic superposition of numbers. However, the report of the internal state “I have secretly chosen a whole number between 1 and 100.” would still be incorrect. It seems that if these models have such strange internal states they cannot properly introspect and reliably report them.
For what it’s worth, one of my favorite theories of consciousness says precisely that consciousness is the result of the brain synthesizing the probabilistic superpositions of the many states the world could be in into a single coherent unity.
Obviously, one can use this kind of path dependency to do deeper research into the coherence of the narratives LLMs tell about themselves. After these preliminary tests, I would expect such probing to reveal what I suspected from the outset: LLMs reporting experiences is the result of picking up on experiential talk in the training data.
I am open to being convinced otherwise. I think it was David Chalmers who suggested the following experiment: use an LLM to purge experiential talk from the training data, train another LLM on the purged data, and see whether it still reports experiences. I would be hugely surprised.