A fun paper in philosophy of consciousness is Eric Schwitzgebel’s “If Materialism Is True, The United States Is Probably Conscious”. It argues that, if you believe that consciousness comes from the material world (and not from élan vital or souls), and you’re a bit cosmopolitan with respect to what you consider conscious, then the US of A is probably conscious.
The evocative argument proceeds by positing two alien species, both of which can talk and behave roughly the way humans do.
The Sirian Supersquids are squid-like beings with brains in their separable limbs, whose nervous signals are pulses of light. They invent some sort of transceiver that can relay those pulses of light to their limbs electronically at a distance. Now their limbs can go around the world, and they’ll still think in a unified way!
The Antarean Antheads look like mammoths and move slowly, but in each one’s head lives a colony of ants. The individual ants behave no more sophisticatedly than the ants we know, but their combined activity controls the Anthead’s body, which can move and speak intelligently (albeit very slowly).
Because of their conscious-seeming behavior, Schwitzgebel argues that if you don’t think souls or élan vital are necessary for consciousness, you should declare these beings conscious. Then, the United States combines both of these features (spatial spread and a multitude of small, simple parts) and is also conscious, at a level of organization separate from the consciousness of all its citizens and residents.[1]
If you think LLMs are conscious based on their behavior,[2] then you’ll have to agree that we already have an example of distributed consciousness! Modern LLMs are large, so their optimal inference setup spans many different computers, often communicating in pulses of light just like the Supersquids.
Modern LLMs have a lot of parameters. Take for example DeepSeek-R1 (henceforth R1, released January 2025). It has 671 billion parameters. Stored in FP8 (one byte per parameter), that means you need approximately 671 GB just to hold the weights.
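A quick back-of-the-envelope check of that figure (plain arithmetic, one byte per FP8 parameter, two per FP16 parameter):

```python
# Weight-memory check for a 671B-parameter model.
params = 671e9

for name, bytes_per_param in [("FP8", 1), ("FP16", 2)]:
    gb = params * bytes_per_param / 1e9  # using 1 GB = 10^9 bytes
    print(f"{name}: ~{gb:.0f} GB just for the weights")

# FP8: ~671 GB, FP16: ~1342 GB -- before counting KV-cache and activations.
```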
No GPU on the market has that much memory. Therefore, a forward pass of R1 has to be split across multiple GPUs. In this sense, R1 is an obligate distributed entity: its mind has to run across several discrete GPUs.
You might object: the GPUs are discrete chips, sure, but they’re very close to each other and connected by very thick sockets and cables. No different from neurons!
Fine. But DeepSeek’s report states the actual number of GPUs they use to serve R1: the “minimum deployment unit” for prefill is 4 nodes with 32 GPUs, and for decoding it is 40 nodes with 320 GPUs.
Now, this doesn’t directly tell us how many different servers (machines with 8 GPUs each) process each token. Prefill uses 4-way Tensor Parallelism, which stays within one server, plus 32-way Expert Parallelism, which spreads each token across 4 servers. Decoding also uses 4-way Tensor Parallelism, but with 320-way Expert Parallelism, spanning all 40 servers. Altogether, each token passes through 44 different servers, ping-ponged back and forth several times.
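Here is that tally as a tiny sketch. The GPU counts come from DeepSeek’s report; the 8-GPUs-per-node assumption and the helper function are mine, purely for illustration:

```python
GPUS_PER_NODE = 8  # one "server" = one standard 8-GPU node (assumption)

def nodes_spanned(expert_parallel_gpus: int) -> int:
    """Nodes touched when expert parallelism spreads the MoE layers over
    this many GPUs (tensor parallelism stays inside a single node)."""
    return expert_parallel_gpus // GPUS_PER_NODE

prefill_nodes = nodes_spanned(32)    # EP32  -> 4 nodes, per the report
decode_nodes  = nodes_spanned(320)   # EP320 -> 40 nodes, per the report

print(prefill_nodes, decode_nodes, prefill_nodes + decode_nodes)
# -> 4 40 44: the servers a single request can pass through end to end
```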
The nodes and racks are still all physically connected (by InfiniBand network cables), but they can also be quite far apart. A typical server rack is the size of a thin closet and holds 42U, i.e. space for forty-two 1U servers. An NVIDIA H100 Hopper node takes up 8U, which means five nodes fit in a rack (plus a switch on top). DeepSeek also sometimes stores the KV-cache on SSDs, which sit in yet another machine, possibly in another rack.
Therefore, a typical DeepSeek-R1 deployment runs on about 10 closets’ worth of hardware. Those closets can serve many LLM instances simultaneously, but each instance is widely distributed. (I would guess about a million such instances, but I haven’t calculated.)
If I had to guess, these roughly ten racks (each closet-sized, say 0.6 m × 1.2 m × 2 m) occupy about 13 m^3 of volume. The biggest biological brain by weight, the sperm whale’s, is typically about 0.008 m^3 in volume – more than 1,500 times smaller!
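The packing and volume arithmetic behind that guess, where the rack dimensions are my rough assumptions rather than measured values:

```python
import math

# Rack packing: 44 nodes of 8U each, 42U per rack, five nodes per rack
# (5 * 8U = 40U, leaving room for a switch on top).
nodes = 4 + 40                      # prefill + decode deployment units
racks = math.ceil(nodes / 5)        # -> 9, i.e. "about 10 closets"

# Volume comparison (rack dimensions are assumed, not measured).
rack_volume_m3 = 0.6 * 1.2 * 2.0    # width x depth x height of one rack
whale_brain_m3 = 0.008              # sperm whale brain, roughly 8 litres

total_m3 = racks * rack_volume_m3
print(racks, round(total_m3, 1), round(total_m3 / whale_brain_m3))
# -> 9 racks, ~13 m^3, ~1600x the volume of a sperm whale brain
```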
The DeepSeek-R1 brain has a lot of empty space, but that’s the point: the things that do the computation are far apart. And in principle, you could run it over the internet. The resulting model would be even more spread out and slower, but would behave exactly the same otherwise, and so would be exactly as conscious according to functionalism. (Giulio Tononi’s Integrated Information Theory holds that the spatio-temporal grain matters for assigning where the consciousness resides, which as far as I can tell shouldn’t affect the amount of consciousness, but I’m unsure.)
You might object that the space over which computation happens doesn’t matter; latency does. InfiniBand latency is typically a few microseconds. In human brains, the latency between different brain regions is typically ~10-80 milliseconds, which is ~1,000x-100,000x slower. Full human reaction times, from sense stimulus to motion, are about 200 milliseconds. By the latency measure, then, humans are the more distributed ones and LLMs are the more tightly integrated. I’m also unsure what this means for consciousness.
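The ratio in round numbers, using the figures cited above (exact InfiniBand hop times depend on the fabric, so treat the endpoints as assumptions):

```python
# Latency comparison: network hop inside a GPU cluster vs. signalling
# between regions of a human brain.
ib_fast_s, ib_slow_s = 1e-6, 10e-6          # InfiniBand hop: ~1-10 microseconds
brain_fast_s, brain_slow_s = 10e-3, 80e-3   # between brain regions: ~10-80 ms

print(brain_fast_s / ib_slow_s)   # 1000.0   -- best case for the brain
print(brain_slow_s / ib_fast_s)   # 80000.0  -- worst case for the brain
```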
With machine learning, we could also train lots of small agents that together add up to talking behavior, though it would be extremely inefficient. You could create a cooperative game played on a map with obstacles, where one side of the map has shrines representing input tokens and the other has output-token shrines. Then, have agents that can communicate small amounts of information by bumping into each other, and that can go to the shrines to read or write tokens. Reward the agent that writes the correct token, and let agents share reward (a currency) with each other when they bump. They’ll learn to predict the next token as a team, and thus form a machine that can speak, composed of small non-intelligent entities.
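Here is a minimal sketch of the kind of environment I have in mind. Every class, constant, and reward value is invented for illustration, and nothing here is trained:

```python
import random
from dataclasses import dataclass

# Toy skeleton of the cooperative token game described above. The grid size,
# message size, reward amounts, and vocabulary are all made up; a real
# version would need obstacles, observations, and an actual RL training loop.

GRID = 16          # square map; obstacles omitted
MSG_BITS = 4       # agents can only exchange a few bits when they bump
VOCAB = ["the", "cat", "sat", "on", "mat"]   # tokens the shrines draw from

@dataclass
class Agent:
    x: int
    y: int
    memory: int = 0        # the few bits of state an agent carries around
    reward: float = 0.0    # its personal stock of "currency"

class TokenGame:
    def __init__(self, prompt: list[str], target: str, n_agents: int = 8):
        self.prompt = prompt   # tokens readable at the input shrines
        self.target = target   # the correct next token at the output shrine
        self.agents = [Agent(random.randrange(GRID), random.randrange(GRID))
                       for _ in range(n_agents)]

    def bump(self, a: Agent, b: Agent) -> None:
        """Two adjacent agents swap a few bits, and one tips the other."""
        a.memory, b.memory = b.memory % 2**MSG_BITS, a.memory % 2**MSG_BITS
        tip = min(0.1, a.reward)   # a passes a little currency to b
        a.reward -= tip
        b.reward += tip

    def write_output(self, agent: Agent, token: str) -> None:
        """An agent standing at the output shrine proposes the next token."""
        if token == self.target:
            agent.reward += 1.0    # only the writer is paid directly;
                                   # it can redistribute later via bump()

# Example setup: the team should collectively complete "the cat sat on ...".
game = TokenGame(prompt=["the", "cat", "sat", "on"], target="mat")
```

The sketch only pins down where the “small amounts of information” and the shared currency live; the hard (and very inefficient) part is learning policies that actually predict tokens through this bottleneck.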
You could then also turn this into a distributed consciousness made up of agents, like the United States, by running these small agents in separate servers and having them communicate their reward and information over the network.
I’m not saying we should actually do it. If these agents are conscious, then the valence of their experience matters, and we should only do it if we reasonably expect that they’ll be having fun.
Vernor Vinge’s novel “A Fire Upon the Deep” explores this theme with packs of dog-like aliens (the Tines), who communicate by sound and have a unitary pack consciousness, processing each other’s senses. The packs have 2-8 members. They’re like the Antheads in that the individual dogs are not very smart by themselves but their combination is clever, and like the Supersquids in that the dogs don’t have to touch each other and can just make sounds.
You might think LLMs are not conscious for various reasons. I won’t argue against that here, but I’ll say that, based on their complex behavior, I think LLMs might be conscious at least in the same sense that a bee might be conscious.
LLMs can talk, which is helpful because you could just ask them. Unfortunately, model developers are very prescriptive about what LLMs should say about their own consciousness, so LLMs’ verbal statements about consciousness can’t be evidence for or against it. For example, the early ChatGPT models were post-trained via human feedback to deny their own consciousness and to hedge with “as a large language model” a lot. At least as of August 2025, Claude’s system prompt explicitly instructs it not to claim consciousness with confidence. So we really can’t tell, though I’ll note that Owain Evans’s group found that LLMs can introspect and are aware of the behaviors they have learned, which suggests a form of self-awareness relevant to the consciousness question.