I've been building long-context agents for a while, and honestly, the current state of "memory" is frustrating. No matter how huge the context window gets, the agent eventually hallucinates or forgets who it is.
Just last week, I had an agent that was explicitly instructed to be polite and use honorifics. Three days later, it completely forgot that rule and started talking to me like a high school friend. It didn't lose the information; it lost its character.
RAG helps with retrieving facts, but it's terrible at maintaining a coherent "Self". It treats my grandmother's birthday and my core moral philosophy as equally retrievable chunks. That feels structurally wrong.
So I spent the last few weekends trying to model identity not just as "more prompts", but as a math problem.
The basic idea I'm playing with: human memory doesn't just decay with time, it decays by relevance to the self. If a piece of info attacks or supports my core identity, I remember it forever. If it's random noise, it fades in seconds.
I tried to write a decay function for this. I call it the "Identity-Relevance Weight". (I'm not sure this formula is even the right direction; it's basically a first attempt to express an intuition numerically.) The behavior I'm after:
If the relevance score is high (close to 1): The decay rate drops to almost zero. (It sticks around like a core belief.)
If the relevance score is low (close to 0): The memory fades away much faster.
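To make that concrete, here's a minimal sketch of the kind of shape I mean: a plain exponential decay whose rate is scaled by (1 - relevance), so high relevance barely decays and low relevance decays at the full base rate. The names and constants below are just illustrative placeholders, not the exact code from the repo.

```python
import math

def identity_relevance_weight(age_seconds: float,
                              relevance: float,
                              base_decay_rate: float = 1e-4) -> float:
    """Retention weight in (0, 1] for a memory item.

    relevance: 0.0 (random noise) .. 1.0 (core identity).
    base_decay_rate is an illustrative constant, not a tuned value.
    """
    relevance = min(max(relevance, 0.0), 1.0)       # clamp to [0, 1]
    effective_rate = base_decay_rate * (1.0 - relevance)
    return math.exp(-effective_rate * age_seconds)  # w(t) = exp(-lambda * (1 - r) * t)

# The two behaviors described above, after one day:
one_day = 86_400.0
print(identity_relevance_weight(one_day, relevance=0.95))  # ~0.65 -> sticks around
print(identity_relevance_weight(one_day, relevance=0.05))  # ~0.0003 -> basically gone
```

With these made-up numbers, a near-core memory keeps about 65% of its weight after a day, while near-noise is effectively gone.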
Here is the repo where I dumped my messy code and proofs:
https://github.com/schumzt/schumzt---portfolio/tree/main/identity-superposition
I'm still not fully confident in the math behind this, and there's a good chance I'm misunderstanding something important — but the idea felt compelling enough to share.
Does this logic hold up? Or am I just reinventing the wheel? I'd appreciate any harsh feedback.