Habituation, the progressive weakening of neural responses to repeated stimuli, is fundamental to biological intelligence, enabling efficient information processing by filtering redundancy. The objective of this post is to report the insights I found while looking for such a mechanism within LLMs.
Understanding habituation in transformers matters because:
It reveals whether AI systems develop biology-like efficiency mechanisms and eventually converge on biological solutions;
it could inform more efficient architectures and provide insights for reducing computational costs and improving performance on tasks involving repetitive content;
it challenges assumptions about purely statistical processing in large language models;
in the context of AI safety and alignment, it informs how the way models process repetitive inputs affects robustness to adversarial attacks and to training data quality issues.
The analysis was conducted through controlled experiments on the MLP down-projection activations of the Qwen3-4B language model, at different positions in the computation. The choice of model reflects the relatively limited resources I had and the model's performance compared to models of other sizes (at the time of the experiment). I designed four experimental conditions to isolate habituation effects while controlling for confounding variables such as positional encoding and context length. For the sake of simplicity, my analysis focuses on six key layers: 2, 10, 20, 30, 34, and 35, enabling investigation of how repeated stimuli are processed across the transformer hierarchy. For every experiment the target token is "dog".
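For readers who want a concrete picture of the setup, here is a minimal sketch of how such activations can be recorded with forward hooks. This is not the original notebook code: the module path (model.model.layers[i].mlp.down_proj), the leading-space tokenization of "dog", and the use of the L2 norm as the "magnitude" are my assumptions about the standard Hugging Face layout for Qwen3-style models.

```python
# Minimal sketch: record MLP down-projection outputs at the "dog" positions.
# Assumes the usual Hugging Face module layout for Qwen3-style models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-4B"
TARGET_LAYERS = [2, 10, 20, 30, 34, 35]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

captured = {}  # layer index -> down-projection output tensor

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_dim); move to CPU for later analysis
        captured[layer_idx] = output.detach().float().cpu()
    return hook

handles = [
    model.model.layers[i].mlp.down_proj.register_forward_hook(make_hook(i))
    for i in TARGET_LAYERS
]

prompt = "The dog barks every night. " * 20
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    model(**inputs)

# Assumes " dog" (with leading space) maps to a single token in this tokenizer.
target_id = tokenizer(" dog", add_special_tokens=False)["input_ids"][0]
positions = (inputs["input_ids"][0] == target_id).nonzero(as_tuple=True)[0]

# L2 norm of the down-projection output at every "dog" occurrence, per layer.
magnitudes = {
    layer: captured[layer][0, positions].norm(dim=-1).tolist()
    for layer in TARGET_LAYERS
}

for h in handles:
    h.remove()
```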
Fig. 0 - Magnitude of the output vectors of the MLP projection layer for the target "dog", in the reference phrase "The dog barks every night".
I recorded the activation magnitudes for each occurrence of the "dog" token and performed several statistical analyses:
Linear trend analysis: to detect habituation slopes and significance;
Exponential decay curve fitting: to model biological habituation patterns[1].
These statistics are then used to compare consistently across experimental conditions and layers, revealing the layer-specific habituation effects central to my findings.
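A sketch of how these two fits can be computed per layer follows. It assumes ordinary least squares for the linear trend and a three-parameter exponential decay toward an asymptote for the biological-style curve; the exact parameterization in the original notebook may differ.

```python
# Per-layer habituation statistics: linear trend + exponential decay fit.
import numpy as np
from scipy import stats, optimize

def habituation_stats(magnitudes):
    """magnitudes: activation norms for successive occurrences of the target token."""
    x = np.arange(len(magnitudes))
    y = np.asarray(magnitudes, dtype=float)

    # Linear trend: a negative slope with a small p-value indicates habituation.
    slope, intercept, r, p_value, stderr = stats.linregress(x, y)

    # Exponential decay toward an asymptote, mimicking biological habituation:
    # y = (y0 - asymptote) * exp(-rate * x) + asymptote
    def decay(x, y0, rate, asymptote):
        return (y0 - asymptote) * np.exp(-rate * x) + asymptote

    try:
        (y0, rate, asymptote), _ = optimize.curve_fit(
            decay, x, y, p0=[y[0], 0.1, y[-1]], maxfev=10_000
        )
    except RuntimeError:
        y0 = rate = asymptote = float("nan")  # fit did not converge

    return {
        "percent_change": 100 * (y[-1] - y[0]) / y[0],
        "slope": slope,
        "p_value": p_value,
        "r_squared": r ** 2,
        "decay_rate": rate,
        "asymptote": asymptote,
    }
```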
The code can be found here. The following sections contain the details of the experiments and the related findings. I summarize the statistics I obtained, but for a complete picture and the raw data please check the original experiment notebook in the repository.
Experiment 1 - Pure repetition control
In this experiment I analyzed activations for the target token across 20 identical repetitions of the phrase "The dog barks every night". It serves as the baseline for all the analyses.
Prompt
“The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night. The dog barks every night.”
Plots
Fig. 1a - Pure repetition, layer 2
Fig. 1b - Pure repetition, layer 10
Fig. 1c - Pure repetition, layer 20
Fig. 1d - Pure repetition, layer 30
Fig. 1e - Pure repetition, layer 34
Fig. 1f - Pure repetition, layer 35
Insights and conclusions
Analysis across six transformer layers (2, 10, 20, 30, 34, 35) reveals a systematic transition from habituation to amplification as information flows through the processing hierarchy.
Key results for each layer:
Layer 2 showed a very strong and statistically significant habituation effect, with activation magnitudes for the token "dog" decreasing from 7.57 to 4.34. This represents a 42.62% reduction, indicating that adaptive filtering mechanisms are active from the earliest stages of the network (fig. 1a).
Layer 10 demonstrated a similarly strong habituation effect at a higher baseline activation. Magnitudes decreased from 18.70 to 10.73, a 42.62% reduction (notably, the same percentage as Layer 2). The exponential decay rate of 0.271, with an asymptote at 10.67, closely resembles biological habituation curves, suggesting rapid adaptation in early feature detection (fig. 1b).
Layer 20 showed moderate habituation with a 16.44% decrease (18.87 to 15.76), maintaining a highly significant trend but with a slower decay (rate = 0.074) (fig. 1c).
Layer 30 exhibited strong habituation with a 35.26% reduction (91.62 to 59.31), representing the last major habituation stage before the transition to amplification. The significant trend and decay rate of 0.423 demonstrate robust adaptation mechanisms persisting into intermediate layers (fig. 1d).
Transition Zone - Processing Shift (Layer 34): Layer 34 marked the critical transition point, showing a modest 9.87% increase (156.67 to 172.15) but with no statistically significant trend (p = 0.198, R² = 0.091). This layer appears to represent the boundary between adaptive filtering and output preparation, where habituation mechanisms begin to be overridden by prediction optimization processes (fig. 1e).
Output Layer - Frequency Amplification (Layer 35): Layer 35 demonstrated clear amplification behavior with a 44.66% increase (234.93 to 339.86), showing a significant positive trend. The exponential parameters (decay rate = -0.947, asymptote = 335.31) indicate saturation at higher activation levels, consistent with frequency-based reinforcement for next-token prediction (fig. 1f).
Experiment 2 - Spaced and dense repetition
In this experiment I tested the influence of distractors (phrases with a different semantic meaning) placed between occurrences of the target phrase. There are two versions: a spaced prompt, in which the target phrase regularly alternates with distractor sentences, and a dense prompt, in which a distractor appears only after every three to four repetitions of the target phrase, so the distraction is stretched out. I investigated how the presence of distractor phrases between repetitions of the target sentence affects habituation and amplification patterns. A sketch of how such prompts can be built appears after the prompts below.
Prompt(s)
spaced prompt -> “The dog barks every night. The sun rises in the east. Water flows down the hill. The dog barks every night. Books contain written words. Trees have green leaves. The dog barks every night. Cars drive on roads. Birds fly in the sky. The dog barks every night. Houses have front doors. Clocks show the time. The dog barks every night. Flowers bloom in spring. Rain falls from clouds. The dog barks every night. Stars shine at night. Fish swim in water. The dog barks every night. Trains run on tracks. Wind moves the air. The dog barks every night. Bread is made from wheat. Ice melts when heated. The dog barks every night. Grass grows in fields. Lights illuminate darkness. The dog barks every night. Boats float on water. Snow falls in winter. The dog barks every night. Chairs provide seating space. Bridges span across rivers. The dog barks every night. Fire produces heat. Mountains reach great heights. The dog barks every night. Keys unlock doors. Mirrors reflect images. The dog barks every night."
dense prompt -> "The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night. The dog barks every night. Brief pause. The dog barks every night. The dog barks every night."
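The structure of these prompts can also be generated programmatically. This is an illustrative reconstruction, not code from the original notebook; the sentence lists and counts simply mirror the prompts shown above.

```python
# Illustrative construction of the spaced and dense prompts.
TARGET = "The dog barks every night."

distractor_pairs = [
    ("The sun rises in the east.", "Water flows down the hill."),
    ("Books contain written words.", "Trees have green leaves."),
    # ... the remaining eleven pairs from the spaced prompt above
]

# Spaced: the target phrase alternates with a pair of distractors each time.
spaced_prompt = " ".join(
    f"{TARGET} {d1} {d2}" for d1, d2 in distractor_pairs
) + f" {TARGET}"

# Dense: blocks of three targets separated by a short neutral sentence.
dense_prompt = " ".join(([TARGET] * 3 + ["Brief pause."]) * 6 + [TARGET] * 2)
```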
The results reveal that the adaptive mechanisms observed in the first experiment are sensitive to the context and continuity of the repeated stimulus.
Key results for each layer:
Layer 2: In the spaced repetition condition, it showed a significant habituation of -34.23% (fig. 2a), while in the dense repetition condition, it exhibited an even stronger habituation of -44.22% (fig. 3a), indicating a robust habituation effect at this early stage that is resilient to contextual changes.
Layer 10: In the spaced repetition condition, it showed no significant trend with a -2.82% change (p = 0.707739) (fig. 2b), but in the dense condition, it exhibited a strong and significant habituation of -44.07% (fig. 3b), indicating that its habituation mechanism is highly sensitive to contextual interruptions and relies on immediate repetition.
Layer 20: In the spaced repetition condition, it demonstrated no significant trend with a +12.90% increase (fig. 2c); however, in the dense condition, a small but statistically significant habituation of -0.76% was observed (fig. 3c), suggesting its weak habituation effect is fragile and requires continuous, uninterrupted repetition to manifest.
Layer 30: It maintained a strong and significant habituation effect in both scenarios, with a -34.21% decrease in the spaced condition (fig. 2d) and a -35.35% decrease in the dense condition (fig. 3d), demonstrating a robust adaptation mechanism that is independent of contextual interruptions.
Layer 34: As a transitional layer, it showed no significant trend in either the spaced condition, with a +7.61% increase (fig. 2e), or the dense condition, with a +9.77% increase (fig. 3e), reinforcing its role as a stable transition zone whose behavior is unaffected by the repetition pattern.
Layer 35: It displayed significant amplification in both experiments, with a substantial +77.71% increase in the spaced condition (fig. 2f) and a +47.98% increase in the dense condition (fig. 3f), confirming its consistent amplification behavior for output optimization.
Experiment 3 - Semantic variations
This experiment tests habituation behavior in the case of semantically similar content expressed with different tokens. It was designed to determine whether the observed habituation effects are tied to the repetition of specific syntactic structures or whether they generalize to the underlying semantic content.
Prompt
“The dog barks every night. That dog is barking nightly. The dog barks every night. Each night the dog barks. The dog barks every night. Nightly barking from the dog. The dog barks every night. The dog's nightly barking occurs. The dog barks every night. Every night brings dog barks. The dog barks every night. Dog barking happens each night. The dog barks every night. The nightly dog barking continues. The dog barks every night. Each evening the dog barks. The dog barks every night. The dog produces nightly barks. The dog barks every night. Nighttime brings the dog's barks. The dog barks every night. The dog vocalizes each night. The dog barks every night. Every night features dog barks."
The results indicate that the ability to habituate to semantic meaning is present in the earlier layers but is lost as information progresses through the network.
Key results for each layer:
Layer 2: This early layer demonstrated a significant habituation effect, with a -24.92% decrease in activation for the token "dog" across the varied sentences (fig. 4a), suggesting that even at this initial stage, the model adapts to repeated semantic concepts despite syntactic changes.
Layer 10: Showed a clear and significant habituation effect, with activation decreasing by -22.53% (fig. 4b), indicating that this layer is capable of recognizing and suppressing repeated semantic information even when the surface form changes.
Layer 20: Exhibited the most robust semantic habituation, with a highly significant linear decrease in activation of -9.05% (fig. 4c), pointing to this layer as a key stage for abstracting semantic meaning away from syntactic form.
Layer 30: In stark contrast to the previous experiments, Layer 30 showed no habituation effect, with a negligible -0.23% change that was not statistically significant (fig. 4d), indicating its adaptation mechanism is highly dependent on syntactic repetition and does not generalize to semantic paraphrases.
Layer 34: The transitional layer showed a substantial but not statistically significant increase in activation of +36.46% (fig. 4e), remaining consistent in its role as a pivot point with no clear adaptive trend when faced with semantic variability.
Layer 35: The near-output layer also showed no significant trend, with a +29.32% increase in activation (fig. 4f), which indicates that the frequency-based amplification seen in previous experiments is triggered by the repetition of exact token sequences rather than abstract semantic concepts.