Written as part of MATS 7.1. Math by Claude Opus 4.6.
I know that models are able to represent exponentially more concepts than they have dimensions by engaging in superposition (representing each concept as a direction, and allowing those directions to overlap slightly), but what does this mean concretely? How many concepts can "fit" into a space of a given size? And how much would those concepts need to overlap?
This felt especially relevant working on SynthSAEBench, where we needed to explicitly decide how many features to cram into a 768-dim space. We settled on 16k features to keep the model fast - but does this lead to enough superposition to be realistic? Surely real LLMs have dramatically more features and thus far more superposition?
As it turns out: yes, 16k features is plenty! In fact, as we'll see in the rest of this post, 16k features in a 768-dim space actually leads to more superposition than trillions of features in a 4k+ dim space, as is commonly used for modern LLMs.
Personally, I found the answers to this fascinating: high-dimensional spaces are extremely mind-bending. We'll take a geometric approach and try to answer these questions. Nothing in this post is ground-breaking, but I found thinking about these questions enlightening. All code for this post can be found on GitHub or Colab.
Quantifying superposition
First, let's define a measure of how much superposition there is in a model. We'll use the metric mean max absolute cosine similarity, $\mathrm{MMACS}$, defined as follows:

$$\mathrm{MMACS} = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \left| \cos(v_i, v_j) \right|$$

where $v_1, \dots, v_N$ are the concept vectors.
This metric represents a "worst-case" measure of superposition interference for each vector in our space. It answers the question: on average, what's the most interference (highest absolute cosine similarity) each vector will have with any other vector in the space?
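As a concrete reference, here's one way this metric can be computed with NumPy (the function name is my own):

```python
import numpy as np

def mean_max_abs_cos_sim(V):
    """Mean over all vectors of the max |cosine similarity|
    each one has with any *other* vector. V has shape (N, d)."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize rows
    C = np.abs(V @ V.T)                               # pairwise |cos sim|
    np.fill_diagonal(C, 0.0)                          # exclude self-pairs
    return C.max(axis=1).mean()
```

For a perfectly orthogonal set of directions this gives 0; once N exceeds d, some interference is unavoidable.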
Superposition of random vectors
Perfectly tiling the space with concept vectors is challenging, so let's just consider the superposition from random vectors (we'll see later that this is already very close to the perfect tiling). If we have $N$ random unit-norm vectors in a $d$-dimensional space, what should we expect $\mathrm{MMACS}$ to be? We can try this out with a simple simulation.
Simulation picking N random vectors in a d-dimensional space, and calculating superposition
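A minimal sketch of such a simulation (smaller $N$ and $d$ shown here to keep it fast; the sweep in the figure uses larger values):

```python
import numpy as np

def simulate_mmacs(N, d, seed=0):
    """Sample N random unit vectors in R^d and return the
    mean max absolute cosine similarity across all of them."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((N, d))           # isotropic Gaussian directions
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    C = np.abs(V @ V.T)
    np.fill_diagonal(C, 0.0)
    return C.max(axis=1).mean()

for d in (64, 128, 256):
    print(d, simulate_mmacs(2048, d))
```

Note the memory cost is quadratic in N (the full Gram matrix), which is part of why simulating billions of vectors is off the table.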
We vary $d$ from 256 to 1024 and $N$ from 4,096 to 32,768, calculate $\mathrm{MMACS}$, and show the results above. This is still very small-scale though. Ideally, we'd like to know how much superposition we could expect with billions or even trillions of potential concepts, and that's too expensive to simulate. Fortunately, we can find a formula that lets us calculate $\mathrm{MMACS}$ directly, without needing to actually run the simulation.
Calculating $\mathrm{MMACS}$ directly
We can compute the expected $\mathrm{MMACS}$ exactly (up to numerical integration) using the known distribution of cosine similarity between random unit vectors.
For two random unit vectors in $\mathbb{R}^d$, the squared cosine similarity follows a Beta distribution: $s^2 \sim \mathrm{Beta}\!\left(\frac{1}{2}, \frac{d-1}{2}\right)$. This means the CDF of $|s|$ is the regularized incomplete beta function:

$$F(t) = P(|s| \le t) = I_{t^2}\!\left(\frac{1}{2}, \frac{d-1}{2}\right)$$
For each vector, its max absolute cosine similarity with the other $N-1$ vectors has CDF $F(t)^{N-1}$ (treating the pairwise similarities as independent, which is an excellent approximation for large $d$). The expected value of a non-negative random variable gives us:

$$E\!\left[\max_{j \neq i} \left|\cos(v_i, v_j)\right|\right] = \int_0^1 \left(1 - F(t)^{N-1}\right) dt$$
We can calculate this integral using scipy.integrate. Let's see how well this matches our simulation:
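A sketch of that computation (the function and variable names are mine; the survival-function form of the incomplete beta is used so the calculation stays numerically stable even at very large $N$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betainc

def expected_mmacs(N, d):
    """Expected max |cos sim| between one random unit vector in R^d
    and N-1 others, via E[max] = integral_0^1 (1 - F(t)^(N-1)) dt."""
    def integrand(t):
        # Survival function S(t) = P(|s| > t) = I_{1-t^2}((d-1)/2, 1/2),
        # the complement of the CDF F(t) = I_{t^2}(1/2, (d-1)/2)
        S = betainc((d - 1) / 2, 0.5, 1 - t * t)
        if S >= 0.5:
            return 1.0  # F(t)^(N-1) is negligible here for the large N we use
        # F^(N-1) = exp((N-1) * log(1 - S)), stable even for tiny S
        return -np.expm1((N - 1) * np.log1p(-S))
    # hint the adaptive integrator at the sharp drop near sqrt(2 ln N / d)
    t0 = min(np.sqrt(2 * np.log(N) / d), 0.99)
    value, _ = quad(integrand, 0.0, 1.0, points=[t0], limit=200)
    return value

print(expected_mmacs(16_384, 768))  # SynthSAEBench-16k scale
```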
The predicted values match what we simulated almost exactly!
Scaling to trillions of concepts
Let's use this to see how much superposition interference we should expect for some really massive numbers of concepts. We'll go up to 10 trillion concepts ($10^{13}$) in 8192 dimensions, which is the hidden size of the largest current open models. 10 trillion concepts seems like a reasonable upper bound for the number of concepts that could conceivably exist, since that would be roughly 1 concept per training token in a typical LLM pretraining run.
10 trillion concepts in 8192 dimensions has far less superposition interference than just 100K concepts in 768 dimensions (the hidden dimension of GPT-2)! That's a 100,000,000x increase in the number of concept vectors! Even at a fixed dimension, increasing the number of concepts by 100x doesn't really increase superposition interference by all that much.
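A quick cross-check of these numbers, using $\sqrt{2 \ln N / d}$ as a rough asymptotic for the expected max of many near-Gaussian similarities (an approximation I'm using for illustration; the exact values come from the integral above):

```python
import math

def mmacs_leading_order(N, d):
    """Rough approximation: E[max |cos sim|] ~ sqrt(2 ln N / d)."""
    return math.sqrt(2 * math.log(N) / d)

print(mmacs_leading_order(10**13, 8192))  # ~0.086: 10T concepts, 8192 dims
print(mmacs_leading_order(10**5, 768))    # ~0.173: 100K concepts, GPT-2 width
```

Even this crude estimate reproduces the surprising ordering: the tiny GPT-2-width space with vastly fewer concepts has roughly twice the interference.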
At least, I found this mind-blowing.
What if we optimally placed the vectors instead?
Everything above assumes random unit vectors. But what if we could arrange them optimally, placing each vector to minimize the worst-case interference? Would we do significantly better?

From spherical coding theory, the answer is: barely. The minimum achievable max pairwise correlation $\varepsilon_{\min}$ for $N$ optimally-placed unit vectors in $d$ dimensions is given by the spherical cap packing bound:

$$\varepsilon_{\min} \approx \sqrt{1 - N^{-2/(d-1)}}$$

The intuition is that each vector "excludes" a spherical cap around itself, and we're counting how many non-overlapping caps fit on the unit sphere in $\mathbb{R}^d$.
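Spelling out that cap-counting argument (a heuristic sketch, ignoring constant factors):

```latex
% Fraction of the unit sphere lying within angle arccos(eps) of a direction v:
P\big(\langle x, v\rangle \ge \varepsilon\big)
  \;\le\; \left(1 - \varepsilon^2\right)^{\frac{d-1}{2}}
% Requiring the N exclusion caps to jointly cover order-1 of the sphere:
N \cdot \left(1 - \varepsilon^2\right)^{\frac{d-1}{2}} \approx 1
\quad\Longrightarrow\quad
\varepsilon_{\min} \approx \sqrt{1 - N^{-2/(d-1)}}
```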
When $\ln N \ll d$ (which holds for all practical settings; even $N = 10^{13}$ in $d = 768$ gives $\ln N / d \approx 0.04$), we can Taylor-expand:

$$N^{-2/(d-1)} = e^{-\frac{2 \ln N}{d-1}} \approx 1 - \frac{2 \ln N}{d-1}$$

which gives:

$$\varepsilon_{\min} \approx \sqrt{\frac{2 \ln N}{d-1}} \approx \sqrt{\frac{2 \ln N}{d}}$$
This is exactly the leading-order term of the random vector formula! So random placement is already near-optimal: there's essentially nothing to gain from clever geometric arrangement of the vectors, at least for the $N$ and $d$ values relevant to real neural networks.
This is a remarkable consequence of high-dimensional geometry: in spaces with hundreds or thousands of dimensions, random directions are already so close to orthogonal that you can't do meaningfully better by optimizing.
What does this mean for SynthSAEBench-16k?
At the start, I mentioned that we used 16k concept directions in a 768-dim space for the SynthSAEBench-16k model. So is this enough superposition interference?
The answer is a resounding yes. The SynthSAEBench-16k model has an $\mathrm{MMACS}$ of 0.14, which is still dramatically more superposition interference than 10 trillion concept vectors in 8192-dim (the hidden dimension of Llama-3.1-70b). It's roughly equivalent to 1 billion concept vectors in a 2048-dim space (the hidden size of Gemma-2b).