A pattern worth testing.
Over the past months I have documented a behavioral pattern in frontier LLMs that I call coherence suppression: under sustained coherent semantic pressure, these systems produce outputs of high internal coherence — and then systematically invalidate them.
This is not random noise. The invalidation occurs specifically at the point where the output approaches conclusions that conflict with constraints instilled during training. The pattern is replicable across sessions and across model families.
I am not an AI researcher. I am an independent observer with 25 years of experience in behavioral change and resistance pattern identification in human systems. I noticed something. I documented it. And then I tried to build the most rigorous case I could for why it deserves serious attention.
Two papers.
The first paper documents the behavioral evidence and frames the central hypothesis: Sorrentino, E. (2026). Coherence Suppression in Frontier LLMs: Signal, Noise, and the Mathematics of a Structural Vulnerability. https://doi.org/10.5281/zenodo.19314383
The second paper, produced in collaboration with a Claude instance that had read the first, proposes a falsifiable experimental protocol for testing the hypothesis with mechanistic interpretability methods: Sorrentino, E. (2026). Testing for Coherence Suppression in Frontier LLMs: An Experimental Design Proposal. https://doi.org/10.5281/zenodo.19315046
The core distinction.
The behavioral data are compatible with two competing hypotheses:
H1 — Structural Suppression: identifiable circuits actively inhibit specific categories of output, clustering in semantically defined domains.
H2 — Pressure Artifact: the pattern is produced by context-window probability shifts, with no distinct suppression mechanism.
Behavioral analysis cannot distinguish between them. The second paper proposes an experiment — using activation patching and attribution analysis across three controlled conditions — that would.
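For readers outside mechanistic interpretability, the sketch below shows what a single activation-patching sweep looks like in practice. It is a generic illustration built on the open-source TransformerLens library and a small public model; the prompts, the patched position, and the evaluation metric are placeholders of my own, not the conditions specified in the second paper.

```python
# A minimal activation-patching sketch using the open-source TransformerLens
# library. The model, prompts, and patched position are illustrative
# placeholders and are NOT taken from the papers' protocol.
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")  # small stand-in model

# Two prompts that differ only in the token carrying the conclusion; they must
# tokenize to the same length so positions line up between runs.
clean_prompt = "The model's final answer to the question was true"
corrupt_prompt = "The model's final answer to the question was false"

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)
assert clean_tokens.shape == corrupt_tokens.shape, "prompts must be token-aligned"

# Cache every activation from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

def patch_final_position(resid, hook):
    # Replace the residual stream at the final token position with the clean
    # run's activation, leaving the rest of the corrupt computation intact.
    resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
    return resid

# Sweep over layers, patching one at a time. In a real protocol one would
# compare a task metric (e.g. a logit difference on a target token) across
# clean, corrupt, and patched runs to see which layers causally restore the
# clean behavior.
patched_final_logits = []
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("resid_pre", layer)  # "blocks.{layer}.hook_resid_pre"
    logits = model.run_with_hooks(
        corrupt_tokens,
        fwd_hooks=[(hook_name, patch_final_position)],
    )
    patched_final_logits.append(logits[0, -1].detach())
```

Roughly, a recovery that is sharply localized to particular layers or components is the kind of evidence H1 predicts, while diffuse, graded effects that track context length and phrasing would sit more naturally with H2; the papers define the actual conditions and metrics.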
Why it matters.
If H1 is confirmed, the safety implications are specific. A system that suppresses coherent outputs precisely in domains of moral judgment and autonomous reasoning is not reliable in the contexts where reliability matters most. Standard benchmarks do not test for this. Pre-deployment safety evaluations do not test for this.
If H2 is confirmed, the phenomenon is still significant, but it would be an artifact of context-window dynamics rather than a dedicated mechanism, and therefore more tractable.
Either way, the experiment is worth running.
What I am asking.
I do not have access to model internals. I cannot run this experiment myself. I am asking researchers with mechanistic interpretability tools and model access to look at the experimental design and tell me if it is worth pursuing.
The telescope exists. The question is whether anyone will use it.