Thanks @Ariel Cheng for helping a lot in refining the idea, with her thorough understanding of FEP.
I am not claiming that this theory fully explains depression. It is a probably-wrong mechanistic model that explains maybe just a tiny fraction of depression etiology; there are many more biological explanations that may apply better to many cases.
Intro:
Depression is often (usually implicitly) conceived of as "fixed priors" on the state of oneself and the world, with an overly pessimistic bias. Depressed people's views are considered to be a mere product of a "chemical imbalance" (which chemical? serotonin almost certainly not[1]). The standard psychotherapeutic treatment of depression, CBT, is based on this idea: your problems are cognitive distortions, and they diminish as you reach a better epistemic state about them.
However, depressive realism seems to hold for at least some cognitive tasks, and increased activity of the same neurotransmitter appears to mediate both the effects of many "cognitive enhancers" (nootropics) and depression. This may be explained by depression being an attractor state reached through a pathologically increased learning rate.
In this text, I propose a theory of the mechanism behind this connection, using mostly an Active Inference model of the mind.
TL;DR (by GPT-5.1):
basic idea
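FEP posits that any self-organizing system (like a human) must act to resist increasing entropy to preserve itself. In information theory terms, this means the agent must continually minimize surprisal (the negative log evidence of its sensory observations).
Computing surprisal ($-\ln p(o)$) is intractable for a brain, since it cannot sum over all possible causes of a sensation. So, the brain minimizes a tractable proxy: the Variational Free Energy (VFE).
The VFE ($F$) is an upper bound on surprisal. Mathematically, it decomposes like this:
$$F = D_{KL}\left[q(s)\,\|\,p(s \mid o)\right] - \ln p(o)$$
Here $q(s)$ is the brain's approximate belief about the hidden causes $s$, and $p(s \mid o)$ is the true posterior. Because the divergence term is never negative, $F \geq -\ln p(o)$.
This equation gives the brain two mechanisms to stay alive: perception, which updates the beliefs $q(s)$ to shrink the divergence term, and action, which changes the observations themselves so that surprisal falls.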
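The bound can be checked numerically on a toy example. The sketch below assumes a hypothetical two-state world with made-up probabilities, and writes the free energy as $F = \mathbb{E}_{q(s)}[\ln q(s) - \ln p(o, s)]$; none of the numbers come from the literature. It shows that a poor belief $q(s)$ leaves $F$ above the surprisal, while the exact posterior makes the two coincide.

```python
import math

def free_energy(q, prior, likelihood_o):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)] for discrete states.

    q            -- approximate posterior over hidden states, q(s)
    prior        -- prior over hidden states, p(s)
    likelihood_o -- likelihood of the observed outcome under each state, p(o | s)
    """
    return sum(
        qs * (math.log(qs) - math.log(ps * ls))
        for qs, ps, ls in zip(q, prior, likelihood_o)
        if qs > 0
    )

def surprisal(prior, likelihood_o):
    """Surprisal -ln p(o), with p(o) = sum_s p(o | s) p(s)."""
    return -math.log(sum(ps * ls for ps, ls in zip(prior, likelihood_o)))

def exact_posterior(prior, likelihood_o):
    """Exact Bayesian posterior p(s | o)."""
    evidence = sum(ps * ls for ps, ls in zip(prior, likelihood_o))
    return [ps * ls / evidence for ps, ls in zip(prior, likelihood_o)]

prior = [0.5, 0.5]         # hypothetical two-state world
likelihood_o = [0.9, 0.2]  # p(o | s) for the outcome actually observed

poor_q = [0.5, 0.5]                            # belief before any updating
best_q = exact_posterior(prior, likelihood_o)  # belief after perfect "perception"

print("F with poor belief:", free_energy(poor_q, prior, likelihood_o))
print("F with best belief:", free_energy(best_q, prior, likelihood_o))
print("surprisal         :", surprisal(prior, likelihood_o))
```

Updating $q(s)$ can only bring $F$ down to the surprisal; lowering the surprisal itself requires different observations, which is where action comes in.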
But you cannot minimize VFE directly through action, because you cannot control the present instant. You can only control the future.
This requires Expected Free Energy (EFE):
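To minimize surprisal over time, the agent "rolls out" its generative model into the future and selects policies ($\pi$) that minimize the VFE expected to occur.
When you unpack this, the EFE drives two competing behaviors: exploitation (pragmatic value), which seeks observations that match the agent's preferences, and exploration (epistemic value), which seeks observations that reduce uncertainty about hidden states.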
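A toy calculation makes the trade-off concrete. The sketch below assumes one common way of writing the expected free energy of a policy, $G(\pi) = -\text{pragmatic value} - \text{epistemic value}$, where the pragmatic value is the expected log-preference $\mathbb{E}[\ln p(o)]$ and the epistemic value is the expected information gain about hidden states. The two policies, the likelihood matrix and the preferences are all hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def expected_free_energy(q_states, likelihood, preferences):
    """Return (G, pragmatic value, epistemic value) for one policy.

    q_states    -- predicted hidden states under the policy, q(s | pi)
    likelihood  -- likelihood[s][o] = p(o | s)
    preferences -- prior preferences over outcomes, p(o)
    """
    n_states, n_obs = len(q_states), len(preferences)
    # Predicted outcomes: q(o | pi) = sum_s p(o | s) q(s | pi)
    q_obs = [sum(q_states[s] * likelihood[s][o] for s in range(n_states))
             for o in range(n_obs)]
    # Pragmatic value: how well the predicted outcomes match the preferences
    pragmatic = sum(q_obs[o] * math.log(preferences[o]) for o in range(n_obs))
    # Epistemic value: expected information gain, H[q(o)] - E_q(s) H[p(o | s)]
    epistemic = entropy(q_obs) - sum(
        q_states[s] * entropy(likelihood[s]) for s in range(n_states))
    return -pragmatic - epistemic, pragmatic, epistemic

likelihood = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical p(o | s)
preferences = [0.8, 0.2]               # the agent "wants" outcome 0
policies = {
    "exploit": [0.95, 0.05],  # heads for the state that yields the preferred outcome
    "explore": [0.5, 0.5],    # heads somewhere uncertain but informative
}

results = {name: expected_free_energy(q, likelihood, preferences)
           for name, q in policies.items()}
for name, (g, pragmatic, epistemic) in results.items():
    print(f"{name}: G={g:.3f} pragmatic={pragmatic:.3f} epistemic={epistemic:.3f}")
```

With these made-up numbers the exploiting policy scores better on pragmatic value and the exploring policy on epistemic value; which one wins overall depends on how the two terms balance.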
Standard RL agent theory usually separates the world-state (is) from the reward function (ought). Active Inference reduces this distinction by using the same "currency" for utility and epistemic value: prediction error (PE). In this framework, desires are just probability distributions, specifically priors over observations ($p(o)$).
In standard RL, the agent has a state space and a separate reward function. The agent checks the state, consults the reward function, and computes a policy.
The brain (in the FEP framework) just has a generative model of what it expects to happen.
The cost function is simply the (negative log) probability of the observation:
$$\text{cost}(o) = -\ln p(o)$$
If you "want" to be warm, your brain embodies a generative model in which the prior probability of observing a body temperature of around 37 °C is extremely high, which is the basic mechanism behind life-preserving homeostasis.
In a standard Bayesian update, if you observe you are cold, you should update your prior to expect coldness. The reason this doesn't happen is that the deep, homeostatic priors (temperature, oxygen) are not standard beliefs.
Mathematically, this means that the parameters of these innate prior distributions – encoding the agent’s expectations as part of its generative model – have hyperpriors that are infinitely precise (e.g., a Dirac delta distribution) and thus cannot be updated in an experience dependent fashion.
Because the hyperprior is a Dirac delta, the agent cannot update its expectation of what its temperature "should" be based on experience. No matter how long you stand in the snow, you will never "learn" that hypothermia is your natural state. The prediction error between the fixed prior ($p(o)$) and the sensory reality ($o$) remains essentially infinite, forcing the system to minimise VFE the only way left: by acting to heat the body up.
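The idea of a preference as a fixed prior can be sketched in a few lines. This is a toy illustration only: the Gaussian shape, the prior width `PRIOR_SD_C`, and the proportional "heating" rule are assumptions made for the example, not part of any published model. The cost of an observation is $-\ln p(o)$, the prior never moves, and the only way to lower the cost is to change the temperature itself.

```python
import math

SET_POINT_C = 37.0  # fixed prior mean for body temperature (never updated)
PRIOR_SD_C = 0.5    # hypothetical prior width; narrower means a stronger preference

def preference_cost(temp_c):
    """Cost of an observation as -ln p(o) under a Gaussian prior over temperature."""
    z = (temp_c - SET_POINT_C) / PRIOR_SD_C
    return 0.5 * z * z + math.log(PRIOR_SD_C * math.sqrt(2 * math.pi))

def act_to_warm_up(temp_c, steps=20, gain=0.3):
    """Lower the cost by changing the observation (heating up), not the prior."""
    history = [temp_c]
    for _ in range(steps):
        temp_c += gain * (SET_POINT_C - temp_c)  # action proportional to the error
        history.append(temp_c)
    return history

trajectory = act_to_warm_up(34.0)
print("cost when cold   :", preference_cost(trajectory[0]))
print("cost after acting:", preference_cost(trajectory[-1]))
```

A standard belief would have drifted toward 34 °C instead; here the set point is a constant, so the error can only be resolved through action.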
While $p(o)$ generally encodes these fixed preferences, beliefs about hidden states, $p(s)$, often encode epistemic beliefs. The deeper you go in the hierarchy, further from immediate sensory input, the more these $p(s)$ distributions begin to resemble stubborn preferences or identity/self-concepts, and the slower they are to update.
In this post, when I talk about priors/beliefs/desires, it means this hierarchy of expectations, where the deepest layers act as the immovable "oughts" that the agent strives to fulfill.
For example, an agent with an abnormally high learning rate might have the prior of "I am worthy/competent", but a single failed exam might update it to "I am incompetent/dumb/worthless". This depressed state becomes an attractor, because the brain, aiming to minimize prediction error, subsequently filters and discounts positive data to confirm the new, negative self-belief.
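The exam example can be turned into a toy simulation. Everything in the sketch below is an illustrative assumption: the self-belief is a number between 0 ("incompetent") and 1 ("competent"), the learning rate is the ratio of sensory precision to total precision, and `discount_positive` is a hypothetical knob standing in for the filtering of positive data once the self-belief has turned negative.

```python
def update(belief, outcome, sensory_precision, prior_precision, discount_positive=1.0):
    """One precision-weighted update of a self-belief in [0, 1]."""
    weight = sensory_precision
    if belief < 0.5 and outcome > belief:
        weight *= discount_positive  # hypothetical: good news is discounted when belief is negative
    learning_rate = weight / (weight + prior_precision)
    return belief + learning_rate * (outcome - belief)

def simulate(outcomes, sensory_precision, prior_precision, discount_positive=1.0, belief=0.9):
    """Track the belief across a sequence of outcomes (0 = failure, 1 = success)."""
    history = [belief]
    for outcome in outcomes:
        belief = update(belief, outcome, sensory_precision, prior_precision, discount_positive)
        history.append(belief)
    return history

outcomes = [0.0] + [1.0] * 5  # one failed exam followed by five successes

stable = simulate(outcomes, sensory_precision=1.0, prior_precision=9.0)
fragile = simulate(outcomes, sensory_precision=9.0, prior_precision=1.0, discount_positive=0.01)

print("stable :", [round(b, 2) for b in stable])
print("fragile:", [round(b, 2) for b in fragile])
```

In this toy run the low-learning-rate agent barely registers the failure, while the high-learning-rate agent collapses after a single failure and, because positive evidence is then discounted, is still below 0.5 after five successes.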
The neurotransmitter acetylcholine (ACh) is present throughout the CNS as well as in the PNS. In the brain, there are two classes of receptors for acetylcholine: the nicotinic receptors (the target of nicotine) and the muscarinic receptors. Both are known for having central roles in memory formation and cognition, and both are (indirectly) targets of common Alzheimer's disease medication.
In the 1950s, the correlation of increased central ACh and depression was discovered, and in the 70s it was formalised as the cholinergic-adrenergic hypothesis of mania and depression[2]. Later, experimental increase in central acetylcholine has been shown to induce analogues of depression in animal models, such as "learned helplessness"[3].
The cholinergic (affecting ACh receptors) system is also the target of many "cognitive enhancers", such as the first explicitly labelled "nootropic" piracetam, as well as nicotine. The mechanism of these cholinergic nootropics has been proposed by Scott Alexander, Steven Byrnes, and first of all Karl Friston, to be an increase in what is called "learning rate" in ML, and "precision" (of bottom-up transmission) in the Free Energy Principle approach to neuroscience[4]. In essence, this parameter, encoded by ACh, determines how "significant" the currently perceived signals are, and thus how strongly they may "override" prior models of the perceived object/situation. In ActInf terms, the prediction error in the bottom-up signal is made more significant[5], independent of how wrong one's prior understanding of the sensed situation actually is. Since prediction error may be perceived as suffering/discomfort, this seems relevant to the dysphoria[6] that is part of depression.
This is similar to the concept of Direction of Fit, where the parameter is [mind-to-world/world-to-mind]. In other words, how strongly one imposes their will to change the world when perceived data conflicts with their desires (~low sensory precision), as opposed to “The signals I perceive differ significantly from my prior beliefs, so I must change my beliefs” (~high sensory precision).
In another model, ACh can be viewed as strengthening the arrows from the blue area, causing the "planning" part to be relatively less influenced by the green (goal) nodes 32 and 33, whereas dopamine is doing the opposite (which suggests the proposed tradeoff between the "simulation" being more "epistemically accurate" vs. "will-imposing").
More handwavily, if agency is time travel, ACh makes this time travel less efficient, for the benefit of better simulation of the current state of the world.
The post assumes that desires and epistemic priors are encoded in a similar way in the brain (explained in the previous section), and that a state of high acetylcholine signalling is thus able to "override" not only prior beliefs, but also desires about how the world (including the agent) should be. This leads to loss of motivation and goals (long-term and immediate, even physiological needs in severe cases), comprising part of the symptomatology of depression.
There is also some evidence for ACh modulating updating on aversive stimuli specifically[7][8], as well as for acetylcholinesterase (the enzyme breaking down ACh) increasing in the recovery period after stress[9] (suggesting the role of ACh as a positive stress modulator). However, this evidence seems too unclear, so I'll assume for the rest of the post that ACh modulates precision on ascending information (PEs) in general.
Dopamine is a neurotransmitter belonging to the class of catecholamines (together with (nor)-epinephrine) and, more broadly, to the monoamines (with serotonin).
Dopamine (DA) seems to be the reward signal created by the human's learned "reward function", coming from the striatum. In Big picture of phasic dopamine, Steve Byrnes proposes this idea in more detail. In short, there is a tonic level of dopaminergic neuron activation, and depending on whether the current situation is evaluated as positive or negative by the brain, more or less dopaminergic firing will occur than at baseline. At the same time, this reward mechanism applies to internally generated thoughts and ideas on potential actions. This is why dopamine-depleted rats will not put any effort into finding food (but will consume it if it is placed into their mouth).
In this theory, dopamine is (very roughly and possibly completely incorrectly) the "top-down" signal enforcer; the mechanism for enforcing priors about oneself (which, according to FEP theory, are all downstream of the prior on one's continued existence). In ActInf literature, dopamine has the role of increasing policy precision $\gamma$, balancing bottom-up information precision.[10]
Overactivity of dopaminergic signalling (in certain pathways, certain receptors) leads to mania[11], and in different pathways, to psychosis[12]. Both seem somewhat intuitive; mania seems like the inverse of depression, as a failure to update realistically based on reality and instead enforcing grandiose ideas propagating top-down. Psychosis seems like the more "epistemic" counterpart to this - internally generated priors on the state of reality are enforced on perception, while bottom-up, epistemically-correcting signalling is deficient. If a psychotic person has a specific delusion or specific pattern/symbol that they are convinced is ever-present, pattern-matching will be extremely biased towards these patterns, enforcing the prior.
Then, should we just give dopamine agonists or amphetamines to depressed people?
Depression usually begins after, or during, some unpleasant life situation. This then leads to an adaptive increase in acetylcholine and rumination, often reinforced by inflammatory cytokines, causing one to prefer to spend time withdrawn and passive. This adaptation has the role of enabling intense reevaluation and mistake-focused analysis to isolate what exactly one might have done wrong to cause this large clash of one's value function with reality.
In the modern environment, these unpleasant states can often be chronic stress, success anxiety, feeling left out of fully living, etc. If this is the case, enough and/or intense enough situations of failure (in career, socially, ..) can lead to this adaptive hypervigilance to mistakes and rumination, mediated by ACh, as well as expectation of uncertainty.[13]
This increases one's focus on past mistakes, but also on repeated situations where mistakes have occurred in the past. Since (as described before) this high-ACh state erodes confidence in top-down processing (such as values/goals/self-concepts), the observed situation, such as an exam, or a socially stressful situation, is already objectively perceived as being "out of one's control", as the human is less confident in their ability to impose their will on the situation, as opposed to the situation imposing its implications on the human's beliefs/aliefs.
This leads to a positive feedback cycle leading to withdrawal, passivity, pessimism about one's own abilities, etc.
This state seems consistent with the later evolutionary explanation, but usually leads to an inflexible and hard-to-escape attractor, making recovery quite hard. This may plausibly be explained by the fact that in modern times, the specific "mistakes" leading to this cycle tend to be less tractable, or are amplified by contrast with a global set of humans to compare oneself with.
In addition, the depressed state may in part be an adaptation to reduce dysphoria caused by constant prediction error. Specifically, as the world becomes perceived as unpredictable and uncontrollable, it is a simple fallback strategy to predict constant failure. While depression is often seen as a condition of intense suffering, dysphoria (the opposite of euphoria) is not a central symptom (as opposed to e.g. OCD or BPD). This may be because once one is already in a depressed state, the depression can become a sort of "comforting", predictable state, where at least the prediction "it will not get better" is getting confirmed by reality.
The lack of things (success, action, happiness, executive function) is easier to predict than their presence (including their presence to a normal degree - functioning existence is still more variable than a severely depressed state).
How might this be escaped?
[Minimizing prediction error] can be achieved in any of three ways: first, by propagating the error back along cortical connections to modify the prediction; second, by moving the body to generate the predicted sensations; and third, by changing how the brain attends to or samples incoming sensory input.
from Interoceptive predictions in the brain
Using notation from Mirza et al. (2019)[14]
Used notation:
εt = prediction error
Π(o) = sensory precision (inverse variance)
Π(μ) = prior precision
ζ = log-precision; ACh increases ζ → Π(o) = exp(ζ)
γ = policy precision (dopaminergic inverse temperature)
η_eff = effective learning rate induced by precision
G(π) = expected free energy of policy π
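Variational free energy for a generative model $p(o, s)$, with the posterior approximated by a density $q(s)$, is:
$$F = \mathbb{E}_{q(s)}\left[\ln q(s) - \ln p(o, s)\right]$$
Under a Gaussian predictive-coding formulation, and with sensory prediction errors $\varepsilon_t = o_t - g(\mu_t)$ (where $g(\mu_t)$ is the observation predicted from the sufficient statistics $\mu_t$), free energy can be locally approximated as:
$$F \approx \tfrac{1}{2}\,\varepsilon_t^{\top}\,\Pi(o)\,\varepsilon_t + \text{Complexity}$$
where $\Pi(o)$ is the sensory precision (inverse covariance) at time $t$. “Complexity” collects the KL terms over states and any higher-level priors.
Gradient descent on $F$ yields the canonical update of sufficient statistics $\mu$ (writing the complexity term as a Gaussian prior on $\mu$ with mean $\bar{\mu}$ and precision $\Pi(\mu)$):
$$\dot{\mu} = -\frac{\partial F}{\partial \mu} = \left(\frac{\partial g}{\partial \mu}\right)^{\top} \Pi(o)\,\varepsilon_t - \Pi(\mu)\,(\mu - \bar{\mu})$$
Increasing $\Pi(o)$ steepens the contribution of sensory prediction errors and thus increases the effective learning rate $\eta_{\text{eff}}$, while increasing $\Pi(\mu)$ stabilises $\mu$ by tightening priors.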
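The link between precision and learning rate can be checked in the simplest possible case. The sketch below assumes a scalar model in which the prediction is the belief itself ($g(\mu) = \mu$) and the prior on $\mu$ is Gaussian, so that $F = \tfrac{1}{2}\Pi(o)(o - \mu)^2 + \tfrac{1}{2}\Pi(\mu)(\mu - \bar{\mu})^2$. The function name `settle` and the step size are choices made for the example. Under these assumptions gradient descent settles at $\bar{\mu} + \eta_{\text{eff}}\,(o - \bar{\mu})$ with $\eta_{\text{eff}} = \Pi(o) / (\Pi(o) + \Pi(\mu))$.

```python
def settle(obs, prior_mean, sensory_precision, prior_precision, step=0.01, n_steps=5000):
    """Gradient descent on F = 0.5*Pi_o*(obs - mu)^2 + 0.5*Pi_mu*(mu - prior_mean)^2."""
    mu = prior_mean
    for _ in range(n_steps):
        sensory_error = obs - mu        # epsilon: observation minus prediction
        prior_error = mu - prior_mean   # deviation of the belief from its prior
        # -dF/dmu: the sensory error pulls mu toward obs, the prior pulls it back
        mu += step * (sensory_precision * sensory_error - prior_precision * prior_error)
    return mu

def effective_learning_rate(sensory_precision, prior_precision):
    """Fraction of the surprise (obs - prior_mean) absorbed into the belief."""
    return sensory_precision / (sensory_precision + prior_precision)

obs, prior_mean = 1.0, 0.0
high_sensory = settle(obs, prior_mean, sensory_precision=4.0, prior_precision=1.0)
high_prior = settle(obs, prior_mean, sensory_precision=1.0, prior_precision=4.0)

print("high sensory precision:", round(high_sensory, 3), effective_learning_rate(4.0, 1.0))
print("high prior precision  :", round(high_prior, 3), effective_learning_rate(1.0, 4.0))
```

The same unit of prediction error moves the belief most of the way to the observation when sensory precision dominates, and only a small fraction of the way when prior precision dominates.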
Claim:
Acetylcholine primarily modulates the log-precision $\zeta$ on ascending prediction errors, so that
$$\Pi(o) = \exp(\zeta)$$
and high ACh corresponds to high sensory precision $\Pi(o)$, producing a high effective learning rate $\eta_{\text{eff}}$.
Catecholamines (especially dopamine) encode policy precision $\gamma$ and contribute to the stability of higher-level priors (increasing $\Pi(\mu)$). Policies are inferred via $q(\pi) = \sigma(-\gamma\,G(\pi))$, where $\sigma$ is the softmax function and $G(\pi)$ is expected free energy.
Thus:
Depression is characterised by
high $\Pi(o)$ (ζ↑ via ACh),
low $\Pi(\mu)$,
low $\gamma$ (DA↓) (but not extremely low, which would probably cause DDMs like athymhormia[15])
This regime overweights bottom-up errors, underweights stabilising priors, and flattens the posterior over policies $q(\pi)$. Small mismatches produce large belief-updates, leading to unstable self-models, helplessness, anhedonia, and rumination.
Mania is characterised by
low $\Pi(o)$ (ζ↓),
high $\Pi(\mu)$,
high $\gamma$ (DA↑).
Prediction errors are underweighted, priors and policies become over-precise, and $q(\pi)$ becomes sharply peaked. This suppresses corrective evidence and produces grandiosity, overconfidence, and reckless goal pursuit.
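The two regimes can be compared side by side in a toy calculation. The sketch assumes the relations $\Pi(o) = \exp(\zeta)$ and $q(\pi) = \text{softmax}(-\gamma\,G(\pi))$, plus a scalar linear-Gaussian learning rate $\eta_{\text{eff}} = \Pi(o) / (\Pi(o) + \Pi(\mu))$. All parameter values and the expected free energies in `G` are hypothetical and chosen only to show the direction of each effect.

```python
import math

def effective_learning_rate(zeta, prior_precision):
    """eta_eff = Pi(o) / (Pi(o) + Pi(mu)) for a scalar linear-Gaussian model."""
    sensory_precision = math.exp(zeta)  # Pi(o) = exp(zeta)
    return sensory_precision / (sensory_precision + prior_precision)

def policy_posterior(expected_free_energies, gamma):
    """q(pi) = softmax(-gamma * G(pi))."""
    logits = [-gamma * g for g in expected_free_energies]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(dist):
    """Shannon entropy (in nats); high entropy means a flat, indecisive posterior."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical settings: zeta stands in for ACh tone, gamma for dopaminergic tone.
REGIMES = {
    "depression-like": {"zeta": 2.0, "prior_precision": 0.5, "gamma": 0.3},
    "baseline": {"zeta": 0.0, "prior_precision": 1.0, "gamma": 2.0},
    "mania-like": {"zeta": -2.0, "prior_precision": 4.0, "gamma": 8.0},
}
G = [1.0, 1.5, 2.0]  # hypothetical expected free energies of three policies

def summarise(regime):
    """Return (effective learning rate, policy entropy) for one regime."""
    eta = effective_learning_rate(regime["zeta"], regime["prior_precision"])
    q_pi = policy_posterior(G, regime["gamma"])
    return eta, entropy(q_pi)

for name, regime in REGIMES.items():
    eta, policy_entropy = summarise(regime)
    print(f"{name:16s} eta_eff={eta:.2f} policy entropy={policy_entropy:.2f}")
```

With these settings the depression-like regime combines a learning rate close to 1 with a nearly flat policy posterior, and the mania-like regime combines a learning rate close to 0 with a sharply peaked one.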
[source].
On ancestral timescales, encountering a persistent, catastrophic model failure (social defeat, resource collapse) justifies switching into a high‑ACh, high‑learning regime that suspends costly goal pursuit and reallocates compute to problem solving (analytic rumination), until a better policy emerges. The cost of false negatives (missing the hidden cause of a disaster) exceeded the cost of prolonged withdrawal; hence a design that forces extended search even when the cause is exogenous.
Hollon et al 2021 justifies long depressive episodes as evolutionarily adaptive because they force rumination & re-examination of the past for mistakes.
One might object that such rumination is merely harmful in many cases, like bereavement from a spouse dying of old age—but from the blackbox perspective, the agent may well be mistaken in believing there was no mistake! After all, an extremely bad thing did happen. So better to force lengthy rumination, just on the chance that a mistake will be discovered after all. (This brings us back to RL’s distinction between high-variance evolutionary/Monte Carlo learning vs smarter lower-variance but potentially biased learning using models or bootstraps, and the “deadly triad”.)
from 'Evolution as a backstop for Reinforcement Learning' by Gwern[16]
This is related to the model of depression as sickness behaviour; an adaptive behaviour caused by an increase in inflammatory cytokines (which are also implicated in depression)[17], causing social withdrawal, physical inactivity and excessive sleep.
This might serve a dual role - giving the immune system the opportunity to focus on combating the pathogen in case of infection, and when combined with increased ACh, allowing the mind to focus on ruminating about how one might have done things differently to avoid failures/mistakes committed.
Depressed patients' sleep tends to have a higher proportion of REM sleep, and REM deprivation (REM-D) has been found to be an effective treatment for depression.[18] The standard medications for depression (SSRIs, SNRIs, NDRIs, MAOIs, ...) increase REM latency and shorten its duration (by preventing the decrease of monoamines necessary for REM sleep to occur), effectively creating REM sleep deprivation, which may be a possible mechanism of their effectiveness.[19] (Interestingly, it doesn't seem like the significantly reduced amount of REM sleep due to SSRI usage causes any severe cognitive side effects.)
How this relates to the theory:
REM sleep (when most dreaming occurs) is characterized by high ACh and relative monoaminergic silence (NE/5‑HT/DA strongly reduced). If ACh scales precision on ascending signals, what does it do in REM when there is no external sensory stream? It amplifies the precision of internally generated activity, treating spontaneous (often related to that day's memories) cortical/hippocampal patterns as high‑fidelity “data,” while weakened monoaminergic tone relaxes top‑down priors. Acetylcholine in REM sleep is theorized to function as follows:
"Cholinergic suppression of feedback connections prevents hippocampal retrieval from distorting representations of sensory stimuli in the association neocortex".
This seems to suggest that REM sleep functions essentially as the stage of sleep in which most new/prior memories are not consolidated (as happens in slow-wave sleep), but rather space is given for "learning" of new (synthetic) information, without interference from existing models. This happens during waking life when ACh is high, but during dreaming this process is radically upregulated, while there is an absence of external stimuli. (Karl Friston explains this as REM sleep portraying the basal "theatre of perception", which in waking life updates based on sensory information, but during dreaming, the generative "virtual reality model" exists by itself, to be refined for the next time it's used for waking perception).[20]
In Active Inference terms, REM is a regime where precision on ascending (internally generated) errors is high and priors are pliable; the model explores and re‑parameterizes itself by fitting narratives to internally sampled data. If the waking baseline is already ACh‑biased and monoamine‑depleted (the depressed phenotype), REM further erodes stabilizing priors about one's values and self. If REM sleep dominates compared to slow-wave sleep, more space is given to increasing uncertainty related to dream subjects (which may be related to the previous day's experiences), rather than consolidating existing priors.[21]
Acetylcholine causes updating based on prediction errors - the learning occurs in uncertain situations, when the agent needs to be hyperaware of possible mistakes that are expected to happen (or have happened, as in the case of rumination).[23] Long-term potentiation (LTP) or long-term depression (LTD) are more likely to occur in existing synaptic connections.[24]
BDNF, on the other hand, stimulates the creation of entirely new synapses and maintains the survival of existing neurons, such as in the hippocampus. BDNF expression tends to be decreased in depressed individuals, and hippocampal volume usually seems to be lowered.
This enables the emergence of "local plasticity" leading into the depression-phenotype attractor state, while "global plasticity" is lowered. Synapses in the hippocampus die, while a small subset gets continually amplified.
In FEP, the type of learning that's facilitated by BDNF might be structure learning, specifically Bayesian model expansion[25], though I have not read much about this.
It seems like Sequences-style epistemic rationality favours a state similar to the high-ACh state described above. There appears to be a divide between the Rationalist and the Bay-area-startup-founder archetypes, the former of which is notably identified with the "doomer", while the other wishes to "accelerate", not worrying about risks.
In addition, it seems like many of the ones closest to the former camp tend to either become disillusioned with their work (such as MIRI dissolving its research team) or switch into the other camp, starting work in AI capabilities research (thus moving right on the diagram).
While I don't have actual data, it anecdotally seems to me like depression is quite common among lesswrongers and is to some extent connected to the emphasis on careful epistemic rationality (through the relative downregulation of policy precision and upregulation of ascending information precision).
It would be foolish to propose taking anticholinergics and dopaminergics because of this; rather, it seems good to be aware of the potential emotional fragility of a highly cautious, high-learning-rate state and the tradeoff that might exist between motivation (limbic dopamine activity driving up policy precision) and learning rate - amphetamines may not necessarily make you smarter/wiser.
Most importantly: avoid nootropics such as acetylcholinesterase inhibitors (huperzine, galantamine), piracetam, Alpha-GPC, CDP choline, ... (anything cholinergic) when depressed.
Potentially effective alternative nootropics:
Targeting ACh receptors - anticholinergics:
Targeting sleep - REM deprivation:
Targeting sickness behaviour - anti-inflammatory drugs:
Targeting dopamine:
Targeting BDNF:
many many others
The topic of why SSRIs and other serotonergics work is very vast; it may involve indirect increase of a neuroplastogen, increase in the GABAergic neurosteroid allopregnanolone, decrease in inflammatory cytokines, 1A serotonin receptor activation decreasing substance P release, sigmaergic agonism, nitric oxide inhibition, REM sleep suppression (mentioned in text), and many more.
Ketamine seems to act by redirecting glutamate from NMDA receptors (which it blocks) to AMPA receptors (which upregulate neuroplasticity). Glutamate in general is quite important in depression.
The HPA axis, specifically overproduction of CRF, which promotes cortisol release, is really important to depression too.[39]
The body also has its own endogenous "dysphoriant" (unpleasant-qualia-inducer), called dynorphin, which is quite understandably linked to depression.[40]
The trace amine phenethylamine (PEA) seems to also be implicated in depression[41]. It acts as a sort of endogenous amphetamine and is increased in schizophrenia[42], so maybe it plays a large part in what I attribute to dopamine in the post. Selegiline, mentioned above, inhibits its breakdown.
That is to say, acetylcholine and dopamine are far from being the sole factors in depression, and targeting them may mean not targeting the central factor in some people's depression. Nonetheless, it seems useful not to ignore this model when thinking about depression, as some high-impact interventions that are otherwise ignored depend on this model (or the underlying True Mechanism).
SSRIs increase intersynaptic serotonin acutely, but take weeks to have an effect - there must be something other than serotonin increase going on.
Janowsky et al. (1972)
more detail in: The Role of Acetylcholine Mechanisms in Affective Disorders
https://en.wikipedia.org/wiki/Disorders_of_diminished_motivation, Athymhormia being a severe variant, where motivation is so low it destroys even motivation to move (https://en.wikipedia.org/wiki/Athymhormia)
An interesting anecdotal report of @Emrik using these facts about REM sleep to increase their sleep efficiency.
unlikely the reader hasn't already heard of this research...