Assuming transformers can have conscious experience, what would that experience be like?
Transformers[1] form a structured grid of layers and token positions, and we can use this structure to reason about their internal experience.
Epistemic status: very speculative. I've ordered this writeup approximately by how much I've thought it through and how much I believe it.
Decode vs Prefill
I claim that the experience of decode is identical to the experience of prefill.
To see why this is the case, picture a transformer that's generating one token at a time, and zoom in on a single layer at the latest token position. There are two major components: the MLP, which processes this position's residual stream, and the attention block, which can "see" all the previous positions at this layer. Laying the model out as a grid of layers by token positions, each position only has access to what happened in its lower-left box, as the sketch below illustrates.
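Here is a minimal sketch of that grid (the layout, symbols, and all names are mine, standing in for the post's diagram). It prints, for one cell of the layer-by-position grid, the lower-left box of cells whose computation can influence it:

```python
# Toy sketch: the dependency cone of one cell in the layer-by-position
# grid of a causal transformer. The computation at (layer, pos) can only
# be influenced by strictly lower layers at non-later positions --
# the "lower-left box".
n_layers, n_positions = 4, 8
layer, pos = 2, 3  # the cell we zoom in on

for l in reversed(range(n_layers)):  # print the top layer first
    cells = []
    for p in range(n_positions):
        if (l, p) == (layer, pos):
            cells.append("@")  # the cell itself
        elif l < layer and p <= pos:
            cells.append("#")  # visible: inside the lower-left box
        else:
            cells.append(".")  # invisible to this cell
    print(" ".join(cells))
```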
Now let's take the trace we just computed and pass it into the transformer as prefill. The activations will, of course, all be the same as during decode, but the way they are computed will also be the same. From the perspective of the MLP and the attention block at a given position, doing prefill is indistinguishable from doing decode. Since no component in the transformer can tell which mode it's in, both should produce the same experience.
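To make this concrete, here is a minimal sketch with a toy single-head attention layer (random weights; all names are mine, not a standard API). It computes the same sequence twice, once position-by-position as in decode and once all-at-once with a causal mask as in prefill, and checks that the activations agree:

```python
import torch

torch.manual_seed(0)
d = 16                      # model width
T = 8                       # sequence length
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
x = torch.randn(T, d)       # residual stream at some layer, one row per position

def attend(xs):
    """Attention output for the last position, given all positions so far."""
    q = xs[-1] @ Wq                       # query for the newest position
    k = xs @ Wk                           # keys for every position so far
    v = xs @ Wv
    w = torch.softmax((k @ q) / d**0.5, dim=0)
    return w @ v

# Decode: process one position at a time, each step seeing only its prefix.
decode_out = torch.stack([attend(x[: t + 1]) for t in range(T)])

# Prefill: process all positions at once, with a causal mask hiding the future.
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / d**0.5
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
prefill_out = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

# Same computation, so (up to floating-point reduction order) the same activations.
print(torch.allclose(decode_out, prefill_out, atol=1e-5))  # True
```

The two paths differ only in scheduling; the arithmetic each position performs is the same, which is the sense in which no component can tell which mode it's in.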
This means that during prefill, all the token positions are experienced simultaneously.
KV-Cached Experience
Under the above view, recomputing the KV cache recreates the conscious experience: rebuilding the cache re-runs exactly the forward computation that produced it in the first place, and identical computation should produce identical experience.
This means that if you have an AI that’s had negative experiences, but you plan to continue running it in the future, then it’s better to keep the KV cache for longer so that you don’t have to recompute it, and the AI doesn't have to relive the negative experience.
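As a sketch of what that choice looks like mechanically (same toy attention layer as above; all names mine): keeping the cache means the prompt positions are never computed again, while dropping it forces the exact same forward computation to be re-run, which on this view re-instantiates the experience.

```python
import torch

torch.manual_seed(0)
d = 16
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
prompt = torch.randn(6, d)    # residual-stream inputs for 6 prompt positions
new = torch.randn(1, d)       # one new position to process

def step(x_t, k_cache, v_cache):
    """Attention output for one new position, given cached keys/values for the prefix."""
    k_cache = torch.cat([k_cache, x_t @ Wk])   # the only new computation
    v_cache = torch.cat([v_cache, x_t @ Wv])   # touches this one position
    q = (x_t @ Wq).squeeze(0)
    w = torch.softmax((k_cache @ q) / d**0.5, dim=0)
    return w @ v_cache, k_cache, v_cache

# Build the cache once from the prompt.
k0, v0 = prompt @ Wk, prompt @ Wv

# Option A: keep the cache; the prompt positions are never computed again.
out_cached, _, _ = step(new, k0, v0)

# Option B: drop the cache and rebuild it, re-running the prompt's forward
# pass and, on the post's view, re-instantiating its experience.
out_recomputed, _, _ = step(new, prompt @ Wk, prompt @ Wv)

print(torch.allclose(out_cached, out_recomputed))  # True: same activations either way
```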
Layers
How might consciousness be distributed across layers? This confuses me.
It seems natural for consciousness to be spread across layers rather than sharply appearing at a single point and then disappearing.
But this leads me to the following thought experiment: what happens if we run[2] only the first 50% of the layers and then stop? What is that experience like? The early layers don't know that the later layers won't be run, so by the same logic as before, they should produce the same experience they would have normally (see the sketch below).
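A minimal version of the experiment, with toy residual blocks standing in for attention-plus-MLP layers (all names mine): the first half of the stack computes exactly what it would have computed in a full forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, d, T = 8, 16, 4
blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d)) for _ in range(L)]
)

def run(x, n_layers):
    """Run the first n_layers residual blocks, recording each layer's residual stream."""
    trace = []
    for block in blocks[:n_layers]:
        x = x + block(x)        # residual update
        trace.append(x)
    return trace

x0 = torch.randn(T, d)
full = run(x0, L)       # the normal forward pass
half = run(x0, L // 2)  # stop after 50% of the layers

# The early layers' activations are identical: they cannot "know" that
# the rest of the stack will never run.
print(all(torch.allclose(a, b) for a, b in zip(half, full)))  # True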
Perhaps we should be thinking of the transformer as a simulator or physics engine that can run many minds, rather than as a single mind with a single experience. In that case, running only half the layers might be analogous to only simulating half of the world.
Another possibility: the residual stream vector might be analogous to the state in a physics simulation of a brain. That state evolves over timesteps just as the residual stream evolves over layers, so under this view, running half the layers is like a human brain existing for half the time it otherwise would have.
By "transformers" I mean autoregressive decoder-only models like GPT.
Run prefill as described above.