I've been thinking about a pretty straightforward idea: real understanding should hold up under compression.
Humans do this kind of thing instinctively. You can give someone a detailed explanation, then boil it down to bullet points, and finally to a quick hint. If they truly grasp the concept, they can still think through it, even when a lot of the wording is stripped away.
LLMs, on the other hand, behave quite differently. Once I realized this, I couldn't just brush it off.
To dig deeper, I created a test I call CDCT, which stands for Compression Decay Comprehension Test. The process is simple: take a piece of information, compress it step by step through progressively aggressive levels of reduction, and ask the model the same question at each stage. From there, you can see how well it maintains comprehension as the input gets more condensed.
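To make that concrete, here is a minimal sketch of what such a loop could look like. The function names, signatures, and compression ratios below are placeholders for illustration only; they are not the actual implementation.

```python
# A rough, hypothetical sketch of a CDCT-style evaluation loop.
# compress / ask_model / score are placeholders supplied by the caller;
# the real pipeline (and its compression operators) is not shown here.
from typing import Callable, Sequence

def cdct_curve(
    passage: str,
    question: str,
    reference: str,
    compress: Callable[[str, float], str],   # keeps roughly `ratio` of the original text
    ask_model: Callable[[str, str], str],    # (context, question) -> model answer
    score: Callable[[str, str], float],      # (answer, reference) -> correctness in [0, 1]
    ratios: Sequence[float] = (1.0, 0.5, 0.25, 0.1),
) -> list[tuple[float, float]]:
    """Ask the same question at each compression level; return (ratio, score) pairs."""
    curve = []
    for ratio in ratios:
        context = passage if ratio == 1.0 else compress(passage, ratio)
        answer = ask_model(context, question)
        curve.append((ratio, score(answer, reference)))
    return curve
```

The only thing that matters structurally is that the question and reference answer stay fixed while the context shrinks.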
Here’s the part that really caught me off guard:
Larger models don’t always lose understanding more gracefully. In fact, some of them start to lose comprehension faster than smaller models do.
This pattern kept cropping up. A model that does great with clean, fully detailed input can suddenly fall apart when even a little structure is taken away. Meanwhile, some smaller models manage to hold their ground much better as you compress the information.
At this point, it became clear that scale isn’t behaving the way people usually assume. Here’s a concrete example:
Figure: Comprehension Stability (CSI) vs. model scale. Values are the average CSI across tasks; the trend line dips slightly downward, meaning larger models do not retain conceptual stability any better than smaller ones.
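For anyone wondering what goes into a single stability number: as a rough stand-in (a simplified illustration, not the exact metric behind the figure), you can average the compressed-context scores from a CDCT curve and normalize by the full-context score.

```python
# Illustrative only: a simplified stability number, not the exact CSI in the figure.
def stability_index(curve: list[tuple[float, float]]) -> float:
    """Mean score under compression, normalized by the full-context score.
    1.0 means no decay; 0.0 means comprehension collapsed (or never existed)."""
    scores = dict(curve)
    baseline = scores.get(1.0, 0.0)
    compressed = [s for r, s in curve if r < 1.0]
    if baseline == 0.0 or not compressed:
        return 0.0
    return min(1.0, sum(compressed) / (len(compressed) * baseline))
```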
For anyone curious about digging into the results, I’ve set up a simple dashboard that highlights the key metrics and trends from these tests: https://cdct-web-ranking.onrender.com/
This trend shifts the way I think about what scale really offers:
Scale boosts surface-level performance, but not the stability of the core concepts.
If a model has truly built an internal representation, that representation shouldn’t crumble as soon as the phrasing gets thinner or more reduced. When it hasn’t built one, you see a sharp drop in comprehension at some point in the compression schedule. I refer to this as a comprehension cliff. CDCT lays these cliffs bare, and once you recognize them, a lot of model behavior starts making more sense.
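If you want to flag where a cliff sits on a curve, a crude heuristic is to look for the steepest one-step drop between consecutive compression levels. The threshold below is arbitrary and purely for illustration.

```python
# Crude cliff detector: the steepest one-step drop between consecutive
# compression levels. The 0.3 threshold is arbitrary, chosen for illustration.
def find_cliff(curve: list[tuple[float, float]], threshold: float = 0.3):
    """Return (ratio_before, ratio_after, drop) for the steepest drop
    exceeding `threshold`, or None if comprehension decays gradually."""
    ordered = sorted(curve, key=lambda p: p[0], reverse=True)  # least -> most compressed
    worst = None
    for (r_hi, s_hi), (r_lo, s_lo) in zip(ordered, ordered[1:]):
        drop = s_hi - s_lo
        if drop >= threshold and (worst is None or drop > worst[2]):
            worst = (r_hi, r_lo, drop)
    return worst
```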
This also brings up a bigger issue. If understanding demonstrates a stability profile, and that profile doesn’t correlate with scale as we expect, then judging models just on their clean-context accuracy misses something crucial.
A few paths I’m currently exploring:
Comprehension stability might show how well a model can generalize when the input is messy or incomplete.
Smaller models may actually be trainable to be more stable than much larger ones.
The ability to maintain conceptual integrity under compression could be a better indicator of “understanding” compared to many existing benchmarks.
I’m keeping the implementation private while I finish the full write-up, but the core phenomenon is stable and reproducible. You can find the latest draft of the paper here: https://zenodo.org/records/17528428. If anyone’s interested in trying CDCT on their own models, I’m open to sharing details privately.
I’m curious about what others think. Does this approach to measuring model understanding resonate with you? Or does it seem like just a measurement artifact that stands out only because it hasn’t been closely examined before?
Either way, I'm going to keep pursuing this. There’s definitely more to uncover.