Do Transformers Habituate? Investigating Repetition Suppression in Language Models
Habituation, the progressive weakening of neural responses to repeated stimuli, is fundamental to biological intelligence: it enables efficient information processing by filtering out redundancy. The objective of this post is to report what I found while looking for such a mechanism within LLMs.

Understanding habituation in transformers matters because:

* it reveals whether AI systems develop biological-like efficiency mechanisms, and whether they eventually converge on biological solutions;
* it could inform more efficient architectures, providing insights to reduce computational costs and improve performance on tasks involving repetitive content;
* it challenges assumptions about purely statistical processing in large language models;
* in the context of AI safety and alignment, knowing how models process repetitive inputs informs how robust they are to adversarial attacks and to training-data quality issues.

The analysis was conducted through controlled experiments that examine the activations of the MLP down projection layer in the Qwen3-4B language model, at different positions in the computation. I chose this model because of my relatively limited resources and because of its performance compared to models of other sizes (at the time of the experiment). I designed four experimental conditions to isolate habituation effects while controlling for confounding variables such as positional encoding and context length.

For the sake of simplicity, my analysis focuses on six key layers: Layer 2, Layer 10, Layer 20, Layer 30, Layer 34, and Layer 35. This selection makes it possible to investigate how repeated stimuli are processed across the transformer hierarchy. In every experiment the target token is "dog".

Fig. 0 - Magnitude of the output vectors of the MLP down projection layer for the target "dog", in the reference phrase "The dog barks every night".

I recorded the activation magnitude for each occurrence of the "dog" token and performed several statistical analyses:

* Linear trend analysis: to dete