I've been building long-context agents for a while, and honestly, the current state of "memory" is frustrating. No matter how huge the context window gets, the agent eventually hallucinates or forgets who it is.
Just last week, I had an agent that was explicitly instructed to be polite and use honorifics. Three days later, it completely forgot that rule and started talking to me like a high school friend. It didn't lose the information; it lost its character.
RAG helps with retrieving facts, but it's terrible at maintaining a coherent "Self". It treats my grandmother's birthday and my core moral philosophy as equally retrievable chunks. That feels structurally wrong.
So I spent the last few weekends trying to model identity not just as "more prompts", but as a math problem.
The basic idea I'm playing with: human memory doesn't just decay with time, it decays by relevance to the self. If a piece of info attacks or supports my core identity, I remember it forever. If it's random noise, it fades in seconds.
I tried to write a decay function for this. I call it the "Identity-Relevance Weight". (I'm not sure this formula is even the right direction; it's basically a first attempt to express an intuition numerically.) The behavior I'm after:
If the relevance score is high (close to 1): The decay rate drops to almost zero. (It sticks around like a core belief.)
If the relevance score is low (close to 0): The memory fades away much faster.
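To make that concrete, here's a minimal sketch of the kind of shape I mean: a plain exponential decay whose rate is scaled by (1 - relevance), so high relevance barely decays and low relevance decays at the full base rate. The names and constants below are just illustrative placeholders, not the exact code from the repo.

```python
import math

def identity_relevance_weight(age_seconds: float,
                              relevance: float,
                              base_decay_rate: float = 1e-4) -> float:
    """Retention weight in (0, 1] for a memory item.

    relevance: 0.0 (random noise) .. 1.0 (core identity).
    base_decay_rate is an illustrative constant, not a tuned value.
    """
    relevance = min(max(relevance, 0.0), 1.0)       # clamp to [0, 1]
    effective_rate = base_decay_rate * (1.0 - relevance)
    return math.exp(-effective_rate * age_seconds)  # w(t) = exp(-lambda * (1 - r) * t)

# The two behaviors described above, after one day:
one_day = 86_400.0
print(identity_relevance_weight(one_day, relevance=0.95))  # ~0.65 -> sticks around
print(identity_relevance_weight(one_day, relevance=0.05))  # ~0.0003 -> basically gone
```

With these made-up numbers, a near-core memory keeps about 65% of its weight after a day, while near-noise is effectively gone.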
Here is the repo where I dumped my messy code and proofs:
https://github.com/schumzt/schumzt---portfolio/tree/main/identity-superposition
I'm still not fully confident in the math behind this, and there's a good chance I'm misunderstanding something important — but the idea felt compelling enough to share.
Does this logic hold up? Or am I just reinventing the wheel? I'd appreciate any harsh feedback.