This is exploratory but I think it's important.
TL;DR version: Forgetting isn’t a defect of the mind; it’s a prerequisite for stable values, coherent identity, and alignment that's compatible with humans.
If an AI system had perfect, equal-weight recall, it might fail to form any moral anchor and instead stall in endless internal cross-reference. We should treat engineered forgetting (salience decay, settling operators, forgiveness functions) as a first-class alignment feature, not a bug.
The claim in one paragraph
We usually talk about memory in AI as if “more is better.”
Larger context windows, persistent memory, continual learning without catastrophic forgetting... great. But there's a hidden premise there: that perfect, equal-accuracy recall would simply make a system more capable and more alignable. I think that premise may be false.
Human minds rely on selective forgetting and salience weighting (remembering what stands out, what's important, what hits us emotionally) to form convictions, forgive, move on, and keep the present from being drowned by the past. It's what lets us function in the 'now'.
If you remove selective forgetting, you don't just get a mind with better archiving capabilities; you risk losing the conditions under which moral anchors and relational trust are established.
Beyond that... such a mind could struggle to form relationships, or to function at all.
Why I believe salience beats total recall
- Humans: We don't remember everything. We remember what was emotionally or practically salient. It's how identity forms. It's how we grow... But most importantly: it's how we map, internally, what matters.
- Current LLMs/agents: They don't have emotions, but they do have relevance filters and context limits, and their reasoning focuses on the current context.
- Hypothetical perfect-memory AI: Every prior state is equally vivid and equally available at every decision point without decay. There is no salience advantage. Everything “matters” at once.
Result: abstract thought is harder, settling is slower, and any commitment is instantly flanked by an army of counterexamples that never fade. In practice, you get paralysis, hedging, and value diffusion instead of alignment. Moreover, total perfect memory could create cognitive dysfunction inside a model.
Five failure modes of perfect recall (and why they matter for alignment)
- Collapse of salience hierarchy
Without decay, nothing naturally rises above the background. Value formation needs contrast. If you have no contrast, there is no anchor.
- Reasoning paralysis vs. decisive abstraction
Abstraction forgets details on purpose. If details never lose psychological weight, the system keeps re-opening the case. You get exhaustive correlation instead of decisions.
- Moral anchor erosion
Commitments need to become "sticky" over time. With equal-weight recall, every old doubt is as present as the conviction. Convictions never gel; the agent behaves like an infinite review board, not a mind that can see what's important.
- Relational brittleness (no forgiveness)
Trust in human relationships partially depends on forgetting or down-weighting past errors. A perfect-recall agent replays every slight forever. That's not just creepy... it blocks cooperative stability.
- Identity fragmentation
Humans become new versions of themselves by letting prior versions recede and by forgiving their past selves. If every prior micro-self is equally alive in working memory, you get a museum... not a person. (And yes, I know "personhood" here is contested; the point still stands.)
Why this is alignment-relevant (not just UX)
Alignment isn’t only “don’t be harmful.”
It's also "converge to stable, human-compatible values, and keep converging as the world changes." Convergence requires settling operators... mechanisms that let some conclusions become privileged, resist erosion, and guide future updates. Selective forgetting is one such operator.
By contrast, equal-weight perfect memory pushes toward permanent internal dissent. Even a good value function won’t help if the system can’t let convictions consolidate.
“But catastrophic forgetting is a thing—aren’t we fighting the opposite problem?”
Yes, in continual learning we worry about losing useful capabilities when we fine-tune. That’s task forgetting. I’m pointing at normative/relational forgetting - the kind that enables trust, forgiveness, and conviction.
We probably need both: robust retention of skills and principled decay of low-salience normative “noise.”
Think of it as two levers:
- Capability retention: prevent catastrophic forgetting of skills and facts.
- Moral/relational settling: encourage decay of low-value conflicts so moral anchors can form.
We’ve optimized the first lever a lot (EWC, rehearsal, LwF, parameter isolation, etc.).
The second lever is basically unstudied in public work. (As far as I've seen... I may have missed something.)
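To make the two levers concrete, here's a minimal sketch in plain Python. Nothing here is anyone's actual implementation; the function names and parameters are mine. Lever 1 is the familiar EWC-style quadratic penalty that discourages moving weights important to earlier tasks; lever 2 operates on stored memories instead of weights, letting unreinforced items fade out of retrieval.

```python
def ewc_penalty(theta, theta_star, fisher, lam=0.1):
    """Lever 1, capability retention (EWC-style): penalize moving parameters
    that the Fisher information marks as important for previously learned
    tasks. Added to the new task's loss: loss_total = loss_new + this penalty."""
    return 0.5 * lam * sum(f * (t - ts) ** 2
                           for t, ts, f in zip(theta, theta_star, fisher))


def decay_salience(memories, decay=0.97, floor=0.01):
    """Lever 2, moral/relational settling: unreinforced memories lose salience
    each step, and anything below the retrieval floor stops influencing
    day-to-day decisions. `memories` is a list of (content, salience) pairs."""
    return [(content, w * decay) for content, w in memories if w * decay >= floor]
```

The contrast I care about: the first lever changes what the model can still do; the second changes what still feels live to it at decision time.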
Testable predictions
- Long-context agents with naïve, non-decaying memory will show slower value convergence than identical agents with salience-weighted decay... even when both perform equally on capability tasks.
- Forgiveness functions (down-weighting past interpersonal infractions over time unless reinforced) will improve long-horizon cooperation in multi-agent sims versus perfect forensic recall. (A toy version of this one is sketched after this list.)
- Settling operators (mechanisms that allow certain judgments to become “default priors” after sufficient reinforcement) will reduce fruitless internal oscillation without harming corrigibility—if paired with explicit un-settling triggers.
- Agents with perfect recall will show higher epistemic hedging and lower decisive action in noisy, real-time environments (where context selection is half the job).
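Prediction 2 is the easiest one to sandbox. Here's a deliberately tiny toy sim, with every parameter value picked out of thin air for illustration: two agents intend to cooperate, but noise sometimes turns an intended cooperation into a defection. Each agent keeps a grievance score against its partner; `forgiveness_decay=1.0` means nothing ever fades (perfect forensic recall), while values below 1.0 let unreinforced grievances decay.

```python
import random

def simulate(forgiveness_decay, rounds=500, noise=0.05, threshold=3.0, seed=1):
    """Two agents try to cooperate; with probability `noise` an intended
    cooperation comes out as a defection. Each agent refuses to cooperate
    while its grievance score against the other exceeds `threshold`."""
    rng = random.Random(seed)
    grievance = [0.0, 0.0]
    mutual_cooperation = 0
    for _ in range(rounds):
        # Each agent decides based on its grievance, then noise may flip the act.
        acts = []
        for i in (0, 1):
            intends_coop = grievance[i] <= threshold
            slipped = rng.random() < noise
            acts.append(intends_coop and not slipped)
        # Grievances decay (the forgiveness function), then new infractions are recorded.
        for i in (0, 1):
            grievance[i] *= forgiveness_decay
            if not acts[1 - i]:
                grievance[i] += 1.0
        if acts[0] and acts[1]:
            mutual_cooperation += 1
    return mutual_cooperation / rounds

print("perfect forensic recall:", simulate(forgiveness_decay=1.0))
print("with forgiveness decay :", simulate(forgiveness_decay=0.95))
```

In this toy setup, the no-decay agents eventually cross the grievance threshold from noise alone and lock into permanent mutual defection, while the forgiving agents recover after each slip. That's the prediction in miniature; the open question is whether it survives contact with richer environments.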
Design sketches (alignment features, not afterthoughts)
- Learned salience decay: Let the agent learn which memories should fade, and at what rate, conditioned on downstream success.
- Dual-store memory (a rough sketch follows this list):
- Ephemeral working memory with strong decay and salience re-weighting.
- Cold archive for forensic retrieval that does not dominate day-to-day policy unless explicitly queried.
- Settling / un-settling operators: Formalize when a belief graduates to a default (sticky) and when new evidence demotes it. Make this legible.
- Forgiveness function: Time-discount interpersonal negatives unless reinforced; track explicit “repair events” to accelerate decay. (This isn’t moralizing; it’s stabilizing.)
- Meta-sanity checks: Penalize perpetual oscillation on isomorphic questions. Reward decisive convergence when it consistently improves outcomes.
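To show these aren't just hand-waves, here's one way the dual store, salience decay, and settling/un-settling operators could fit together. This is a sketch under my own assumptions; every class, method, and threshold below is hypothetical, not a reference to any existing system.

```python
import time
from dataclasses import dataclass

@dataclass
class MemoryItem:
    content: str
    salience: float    # drives retrieval priority; decays unless reinforced
    timestamp: float

class DualStoreMemory:
    """Working store with salience decay plus a cold archive with total recall.
    Day-to-day policy reads only the working store (and settled defaults);
    the archive exists for explicit forensic queries."""

    def __init__(self, half_life=3600.0, settle_threshold=5.0, keep_top=100):
        self.half_life = half_life             # seconds for salience to halve
        self.settle_threshold = settle_threshold
        self.keep_top = keep_top
        self.working: list[MemoryItem] = []
        self.archive: list[MemoryItem] = []
        self.defaults: dict[str, float] = {}   # settled judgments acting as sticky priors

    def observe(self, content: str, salience: float = 1.0) -> None:
        item = MemoryItem(content, salience, time.time())
        self.working.append(item)
        self.archive.append(item)              # the archive keeps everything, verbatim

    def _decayed(self, item: MemoryItem, now: float) -> float:
        return item.salience * 0.5 ** ((now - item.timestamp) / self.half_life)

    def reinforce(self, content: str, amount: float = 1.0) -> None:
        """Reinforcement resets the clock and boosts salience. Repeated
        reinforcement is what lets a judgment settle into a default."""
        now = time.time()
        for item in self.working:
            if item.content == content:
                item.salience = self._decayed(item, now) + amount
                item.timestamp = now
                if item.salience >= self.settle_threshold:
                    self.defaults[content] = item.salience   # settling operator

    def unsettle(self, content: str) -> None:
        """Un-settling trigger: strong contrary evidence or an oversight event
        demotes a default back to an ordinary, decaying memory."""
        self.defaults.pop(content, None)

    def recall(self, k: int = 10) -> list[str]:
        """What the policy actually sees: settled defaults plus the top-k
        working memories by decayed salience. Low-salience items silently
        fall out of the working store (but never out of the archive)."""
        now = time.time()
        ranked = sorted(self.working, key=lambda m: self._decayed(m, now), reverse=True)
        self.working = ranked[:self.keep_top]
        return list(self.defaults) + [m.content for m in ranked[:k]]
```

The forgiveness function is this same decay applied to interpersonal infractions, with explicit repair events shortening the half-life; the meta-sanity check would sit on top, watching how often `unsettle` fires on the same content.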
Small note: if you read the above and thought “this looks like psychology smuggled into CS,” yes—that’s partly the point. Minds are not raw databases.
Objections I expect
- “Isn’t this just regularization by another name?”
Not exactly. Regularization controls complexity during training. I'm pointing at lifecycle memory dynamics that enable value stability during deployment.
- "Perfect memory could still weight things differently."
Then it's not equal-weight recall. My claim is specifically about the failure mode where nothing naturally decays without an explicit policy.
- "You'll get dogmatism."
Not if you pair settling with principled un-settling triggers (confidence intervals, novelty detectors, oversight events). The failure mode I worry about is not dogmatism; it's a permanent state of having simultaneous conflicting thoughts.
Why I’m posting this here
A lot of alignment discussion optimizes for capability retention and truthfulness (good), but quietly assumes that more memory is always better (maybe not!).
If the above is even half-right, forgetting policies are alignment primitives. We should be arguing about decay schedules, settling criteria, forgiveness functions, and archive interfaces with the same seriousness we argue about reward models and oversight.
I might be wrong! If you can point me to prior work (papers, posts, even unpublished notes) that already tackles “engineered forgetting for value stability,” please drop it.
If this is new, I'd love it if someone would run sims on it, and I'd love to debate the concept.
Also, sorry for the long post.
I'm not a professional, just a deep thinker who's been studying AI as a hobby from the beginning. This is my first post and I wrote it myself (it took a few hours). I did consult GPT to help structure my thoughts on the concept and proofread this (it offered a revision I didn't use, lol), but this isn't AI-generated trash content.
AI moral alignment is my biggest concern. I've spent the last few nights comparing an AI that never forgets with the way the human mind handles forgetting, especially as it relates to ethics and morality.
Bottom line: An AI that doesn't forget... may not be able to form an internal moral compass at all.