I have been exploring a simple idea: real understanding should survive compression.
Humans do this without thinking. Give someone a detailed explanation, then condense it into bullet points, then shrink it again into a hint. If they actually understand the underlying concept, they can still reason with it, even when most of the surface wording is gone.
Models behave very differently. Once I noticed this, I could not ignore it.
To get a clearer picture, I built an evaluation I call CDCT, the Compression Decay Comprehension Test. The setup is straightforward. Take a piece of information, compress it step by step using various reductions, and ask the model the same question at each stage. Then observe how its comprehension erodes.
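To make the setup concrete, here is a minimal sketch of what a CDCT-style loop could look like. The actual implementation is private (see below), so everything here is illustrative: the stage labels and the compress, query_model, and grade_answer helpers are hypothetical placeholders, not the real harness. The point is only the shape of the procedure: pose the same question against progressively thinner versions of the source text and record a score at each stage.

```python
# Hypothetical sketch of a CDCT-style evaluation loop (illustrative only; the
# real implementation is private). The helper functions are placeholders for
# whatever compressor, model API, and grader you plug in.
from typing import Callable

def cdct_run(
    source_text: str,
    question: str,
    reference_answer: str,
    compress: Callable[[str, str], str],        # (text, stage) -> reduced text
    query_model: Callable[[str, str], str],     # (context, question) -> model answer
    grade_answer: Callable[[str, str], float],  # (answer, reference) -> score in [0, 1]
) -> dict[str, float]:
    """Ask the same question at every compression stage and record the score."""
    stages = ["full", "summary", "bullets", "hint"]  # one illustrative reduction ladder
    scores: dict[str, float] = {}
    context = source_text
    for stage in stages:
        if stage != "full":
            context = compress(context, stage)   # each stage strips more structure
        answer = query_model(context, question)
        scores[stage] = grade_answer(answer, reference_answer)
    return scores
```

A model's decay curve is then just these scores plotted in stage order, and comparing those curves across models is what the figure below summarizes.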
Here is the part that surprised me:
Larger models do not necessarily degrade more gracefully. Some of them lose understanding faster than much smaller ones.
This kept happening. A model that performs extremely well when the input is clean and fully specified can suddenly fall apart as soon as a small amount of structure is removed. Meanwhile, some smaller models stay stable far deeper into the compression sequence.
At that point, it became clear that scale is not behaving the way people assume. Here is one of the clearest illustrations:
Figure: Comprehension Stability Index (CSI) vs. model scale. Values shown are the mean CSI across tasks. The trend line slopes slightly downward: larger models do not retain conceptual stability better than smaller ones.
For those interested in exploring the results interactively, I've put together a simple dashboard summarizing the key metrics and trends from these experiments: https://cdct-web-ranking.onrender.com/.
This pattern leads to a different intuition about what scale is actually buying us:
Scale improves surface-level competence, not the stability of the underlying concepts.
If a model has truly formed an internal representation, that representation should not collapse the moment the phrasing becomes thinner or more compressed. When it has not formed one, you see an abrupt drop instead. I call this a comprehension cliff. CDCT exposes these cliffs directly, and once you see them, a lot of model behavior begins to make more sense.
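Since the implementation and the exact CSI definition stay private for now, the snippet below is only one plausible way to turn per-stage scores into a stability number and to flag a cliff: normalize each stage by the clean-context score, average the retained fractions, and call any single-stage drop above a threshold a cliff. The averaging and the 0.3 threshold are my assumptions for illustration, not the paper's definitions.

```python
# Hypothetical stability summary over CDCT per-stage scores. The formula and
# threshold below are illustrative assumptions, not the paper's CSI definition.

def stability_profile(scores: list[float], cliff_threshold: float = 0.3) -> dict:
    """scores[0] is the clean-context score; later entries follow the compression ladder."""
    baseline = scores[0] if scores[0] > 0 else 1e-9
    retention = [s / baseline for s in scores]   # fraction of clean-context performance kept
    csi = sum(retention) / len(retention)        # crude stability index: mean retention
    drops = [retention[i] - retention[i + 1] for i in range(len(retention) - 1)]
    cliff_stage = next((i + 1 for i, d in enumerate(drops) if d >= cliff_threshold), None)
    return {"csi": csi, "cliff_stage": cliff_stage}

# Example: strong on the clean input, stable through two reductions, then a
# cliff at the final stage.
print(stability_profile([0.92, 0.90, 0.88, 0.35]))
# -> csi ≈ 0.83, cliff_stage = 3
```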
This also raises a bigger point. If understanding has a stability profile, and that profile does not correlate with scale in the way we expect, then evaluating models only on clean-context accuracy misses something fundamental.
A few directions I am currently exploring:
Comprehension stability might reveal how reliably a model generalizes when the input is messy or incomplete.
Smaller models may be trainable to be more stable than much larger ones.
Conceptual integrity under compression may be a better signal of “understanding” than many existing benchmarks.
The current working draft of the paper is here:
👉 https://zenodo.org/records/17528428
I am keeping the implementation private while I finish the full write-up, but the core phenomenon is stable and reproducible. If anyone wants to try CDCT on their own models, I am open to sharing details privately.
I am interested in what others think. Does this way of probing model understanding resonate with you, or does it look like an artifact of measurement that only appears significant because people have not looked at it before?
Either way, I am continuing down this path. There is more here.