Habituation, the progressive weakening of neural responses to repeated stimuli, is fundamental to biological intelligence, enabling efficient information processing by filtering redundancy. The objective of this post is to report the insights I found while looking for such a mechanism within LLMs.
Understanding habituation in transformers matters because:
It reveals whether AI systems develop biology-like efficiency mechanisms and eventually converge on biological solutions;
it could inform more efficient architectures and provide insights for reducing computational costs and improving performance on tasks involving repetitive content;
it challenges assumptions about purely statistical processing in large language models;
in the context of AI safety and alignment, it informs how the way models process repetitive inputs affects robustness to adversarial attacks and to training data quality issues.
The analysis was conducted through controlled experiments on the MLP down-projection activations of the Qwen3-4B language model, at different positions in the computation. The choice of model reflects the relatively limited resources I had and the model's performance compared to models of other sizes (at the time of the experiment). I designed four experimental conditions to isolate habituation effects while controlling for confounding variables such as positional encoding and context length. For the sake of simplicity, my analysis focuses on six key layers: 2, 10, 20, 30, 34, and 35, enabling investigation of how repeated stimuli are processed across the transformer hierarchy. For every experiment the target token is "dog".
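For readers who want a concrete picture of the setup, here is a minimal sketch of how such activations can be recorded with forward hooks. This is not the original notebook code: the module path (model.model.layers[i].mlp.down_proj), the leading-space tokenization of "dog", and the use of the L2 norm as the "magnitude" are my assumptions about the standard Hugging Face layout for Qwen3-style models.

```python
# Minimal sketch: record MLP down-projection outputs at the "dog" positions.
# Assumes the usual Hugging Face module layout for Qwen3-style models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-4B"
TARGET_LAYERS = [2, 10, 20, 30, 34, 35]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

captured = {}  # layer index -> down-projection output tensor

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_dim); move to CPU for later analysis
        captured[layer_idx] = output.detach().float().cpu()
    return hook

handles = [
    model.model.layers[i].mlp.down_proj.register_forward_hook(make_hook(i))
    for i in TARGET_LAYERS
]

prompt = "The dog barks every night. " * 20
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    model(**inputs)

# Assumes " dog" (with leading space) maps to a single token in this tokenizer.
target_id = tokenizer(" dog", add_special_tokens=False)["input_ids"][0]
positions = (inputs["input_ids"][0] == target_id).nonzero(as_tuple=True)[0]

# L2 norm of the down-projection output at every "dog" occurrence, per layer.
magnitudes = {
    layer: captured[layer][0, positions].norm(dim=-1).tolist()
    for layer in TARGET_LAYERS
}

for h in handles:
    h.remove()
```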
Fig. 0 - Magnitude of the output vectors of the MLP projection layer for the target "dog", in the reference phrase "The dog barks every night".
I recorded the activation magnitudes for each occurrence of the "dog" token and performed several statistical analyses:
Linear trend analysis: to detect habituation slopes and significance;
Exponential decay curve fitting: to model biological habituation patterns[1].
These statistics are then used to compare consistently across experimental conditions and layers, revealing the layer-specific habituation effects central to my findings.
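A sketch of how these two fits can be computed per layer follows. It assumes ordinary least squares for the linear trend and a three-parameter exponential decay toward an asymptote for the biological-style curve; the exact parameterization in the original notebook may differ.

```python
# Per-layer habituation statistics: linear trend + exponential decay fit.
import numpy as np
from scipy import stats, optimize

def habituation_stats(magnitudes):
    """magnitudes: activation norms for successive occurrences of the target token."""
    x = np.arange(len(magnitudes))
    y = np.asarray(magnitudes, dtype=float)

    # Linear trend: a negative slope with a small p-value indicates habituation.
    slope, intercept, r, p_value, stderr = stats.linregress(x, y)

    # Exponential decay toward an asymptote, mimicking biological habituation:
    # y = (y0 - asymptote) * exp(-rate * x) + asymptote
    def decay(x, y0, rate, asymptote):
        return (y0 - asymptote) * np.exp(-rate * x) + asymptote

    try:
        (y0, rate, asymptote), _ = optimize.curve_fit(
            decay, x, y, p0=[y[0], 0.1, y[-1]], maxfev=10_000
        )
    except RuntimeError:
        y0 = rate = asymptote = float("nan")  # fit did not converge

    return {
        "percent_change": 100 * (y[-1] - y[0]) / y[0],
        "slope": slope,
        "p_value": p_value,
        "r_squared": r ** 2,
        "decay_rate": rate,
        "asymptote": asymptote,
    }
```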
The code can be found here. The following sections contain the details of the experiments and the related findings. I summarize the statistics I obtained, but for a complete picture and the raw data please check the original experiment notebook in the repository.
Experiment 1 - Pure repetition control
In this experiment I analyzed activations for the target token across 20 identical repetitions of the phrase "The dog barks every night". It serves as the baseline for all the analyses.
Prompt
“The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night.”
Plots
Fig. 1a - Pure repetition, layer 2
Fig. 1b - Pure repetition, layer 10
Fig. 1c - Pure repetition, layer 20
Fig. 1d - Pure repetition, layer 30
Fig. 1e - Pure repetition, layer 34
Fig. 1f - Pure repetition, layer 35
Insights and conclusions
Analysis across six transformer layers (2, 10, 20, 30, 34, 35) reveals a systematic transition from habituation to amplification as information flows through the processing hierarchy.
Key results for each layer:
Layer 2 showed a very strong and statistically significant habituation effect, with activation magnitudes for the token "dog" decreasing from 7.57 to 4.34. This represents a 42.62% reduction, indicating that adaptive filtering mechanisms are active from the earliest stages of the network (fig. 1a).
Layer 10 demonstrated a similarly strong habituation effect at a higher baseline activation. Magnitudes decreased from 18.70 to 10.73, a 42.62% reduction (notably, the same percentage as Layer 2). The exponential decay rate of 0.271, with an asymptote at 10.67, closely resembles biological habituation curves, suggesting rapid adaptation in early feature detection (fig. 1b).
Layer 20 showed moderate habituation with a 16.44% decrease (18.87 to 15.76), maintaining a highly significant trend but with a slower decay (rate = 0.074) (fig. 1c).
Layer 30 exhibited strong habituation with a 35.26% reduction (91.62 to 59.31), representing the last major habituation stage before the transition to amplification. The significant trend and decay rate of 0.423 demonstrate robust adaptation mechanisms persisting into intermediate layers (fig. 1d).
Transition Zone - Processing Shift (Layer 34): Layer 34 marked the critical transition point, showing a modest 9.87% increase (156.67 to 172.15) but with no statistically significant trend (p = 0.198, R² = 0.091). This layer appears to represent the boundary between adaptive filtering and output preparation, where habituation mechanisms begin to be overridden by prediction optimization processes (fig. 1e).
Output Layer - Frequency Amplification (Layer 35): Layer 35 demonstrated clear amplification behavior with a 44.66% increase (234.93 to 339.86), showing a significant positive trend. The exponential parameters (decay rate = -0.947, asymptote = 335.31) indicate saturation at higher activation levels, consistent with frequency-based reinforcement for next-token prediction (fig. 1f).
Experiment 2 - Spaced and dense repetition
In this experiment I tested the influence of distractors (phrases with a different semantic meaning) placed between occurrences of the target phrase. There are two versions: a spaced prompt, in which the target phrase regularly alternates with distractor sentences, and a dense prompt, in which a distractor appears only after every three to four repetitions of the target phrase, so the distraction is stretched out. I investigated how the presence of distractor phrases between repetitions of the target sentence affects habituation and amplification patterns. A sketch of how such prompts can be built appears after the prompts below.
Prompt(s)
spaced prompt -> “The dog barks every night. The sun rises in the east. Water flows down the hill. The dog barks every night. Books contain written words. Trees have green leaves. The dog barks every night. Cars drive on roads. Birds fly in the sky. The dog barks every night. Houses have front doors. Clocks show the time. The dog barks every night. Flowers bloom in spring. Rain falls from clouds. The dog barks every night. Stars shine at night. Fish swim in water. The dog barks every night. Trains run on tracks. Wind moves the air. The dog barks every night. Bread is made from wheat. Ice melts when heated. The dog barks every night. Grass grows in fields. Lights illuminate darkness. The dog barks every night. Boats float on water. Snow falls in winter. The dog barks every night. Chairs provide seating space. Bridges span across rivers. The dog barks every night. Fire produces heat. Mountains reach great heights. The dog barks every night. Keys unlock doors. Mirrors reflect images. The dog barks every night."
dense prompt -> "The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night."
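The structure of these prompts can also be generated programmatically. This is an illustrative reconstruction, not code from the original notebook; the sentence lists and counts simply mirror the prompts shown above.

```python
# Illustrative construction of the spaced and dense prompts.
TARGET = "The dog barks every night."

distractor_pairs = [
    ("The sun rises in the east.", "Water flows down the hill."),
    ("Books contain written words.", "Trees have green leaves."),
    # ... the remaining eleven pairs from the spaced prompt above
]

# Spaced: the target phrase alternates with a pair of distractors each time.
spaced_prompt = " ".join(
    f"{TARGET} {d1} {d2}" for d1, d2 in distractor_pairs
) + f" {TARGET}"

# Dense: blocks of three targets separated by a short neutral sentence.
dense_prompt = " ".join(([TARGET] * 3 + ["Brief pause."]) * 6 + [TARGET] * 2)
```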
The results reveal that the adaptive mechanisms observed in the first experiment are sensitive to the context and continuity of the repeated stimulus.
Key results for each layer:
Layer 2: In the spaced repetition condition, it showed a significant habituation of -34.23% (fig. 2a), while in the dense repetition condition, it exhibited an even stronger habituation of -44.22% (fig. 3a), indicating a robust habituation effect at this early stage that is resilient to contextual changes.
Layer 10: In the spaced repetition condition, it showed no significant trend with a -2.82% change (p = 0.707739) (fig. 2b), but in the dense condition, it exhibited a strong and significant habituation of -44.07% (fig. 3b), indicating that its habituation mechanism is highly sensitive to contextual interruptions and relies on immediate repetition.
Layer 20: In the spaced repetition condition, it demonstrated no significant trend with a +12.90% increase (fig. 2c); however, in the dense condition, a small but statistically significant habituation of -0.76% was observed (fig. 3c), suggesting its weak habituation effect is fragile and requires continuous, uninterrupted repetition to manifest.
Layer 30: It maintained a strong and significant habituation effect in both scenarios, with a -34.21% decrease in the spaced condition (fig. 2d) and a -35.35% decrease in the dense condition (fig. 3d), demonstrating a robust adaptation mechanism that is independent of contextual interruptions.
Layer 34: As a transitional layer, it showed no significant trend in either the spaced condition, with a +7.61% increase (fig. 2e), or the dense condition, with a +9.77% increase (fig. 3e), reinforcing its role as a stable transition zone whose behavior is unaffected by the repetition pattern.
Layer 35: It displayed significant amplification in both experiments, with a substantial +77.71% increase in the spaced condition (fig. 2f) and a +47.98% increase in the dense condition (fig. 3f), confirming its consistent amplification behavior for output optimization.
Experiment 3 - Semantic variations
This experiment tests habituation behavior in the case of semantically similar content expressed with different tokens. It was designed to determine whether the observed habituation effects are tied to the repetition of specific syntactic structures or whether they generalize to the underlying semantic content.
Prompt
“The dog barks every night. That dog is barking nightly. The dog barks every night. Each night the dog barks. The dog barks every night. Nightly barking from the dog. The dog barks every night. The dog's nightly barking occurs. The dog barks every night. Every night brings dog barks. The dog barks every night. Dog barking happens each night. The dog barks every night. The nightly dog barking continues. The dog barks every night. Each evening the dog barks. The dog barks every night. The dog produces nightly barks. The dog barks every night. Nighttime brings the dog's barks. The dog barks every night. The dog vocalizes each night. The dog barks every night. Every night features dog barks."
The results indicate that the ability to habituate to semantic meaning is present in the earlier layers but is lost as information progresses through the network.
Key results for each layer:
Layer 2: This early layer demonstrated a significant habituation effect, with a -24.92% decrease in activation for the token "dog" across the varied sentences (fig. 4a), suggesting that even at this initial stage, the model adapts to repeated semantic concepts despite syntactic changes.
Layer 10: Showed a clear and significant habituation effect, with activation decreasing by -22.53% (fig. 4b), indicating that this layer is capable of recognizing and suppressing repeated semantic information even when the surface form changes.
Layer 20: Exhibited the most robust semantic habituation, with a highly significant linear decrease in activation of -9.05% (fig. 4c), pointing to this layer as a key stage for abstracting semantic meaning away from syntactic form.
Layer 30: In stark contrast to the previous experiments, Layer 30 showed no habituation effect, with a negligible -0.23% change that was not statistically significant (fig. 4d), indicating its adaptation mechanism is highly dependent on syntactic repetition and does not generalize to semantic paraphrases.
Layer 34: The transitional layer showed a substantial but not statistically significant increase in activation of +36.46% (fig. 4e), remaining consistent in its role as a pivot point with no clear adaptive trend when faced with semantic variability.
Layer 35: The near-output layer also showed no significant trend, with a +29.32% increase in activation (fig. 4f), which indicates that the frequency-based amplification seen in previous experiments is triggered by the repetition of exact token sequences rather than abstract semantic concepts.