I want to draw attention to a set of under-appreciated AI safety risks. These are currently largely theoretical but are very likely to be quite real, quite soon: the risks of developing episodic memory abilities in AI agents. Episodic memory is memory of events we have participated in and is a very important part of human cognition. It is conspicuously absent from current AI agents but many researchers are working to develop it. Episodic memory will make AI agents much more capable and, therefore, much more potentially dangerous.
In what follows I argue that episodic memory will enable a range of dangerous activity in AI agents, including deception, improved situational awareness, and unwanted retention of information learned during deployment. It will also make AI agents act more unpredictably than they otherwise would.
Since researchers are just now working to develop true episodic memory abilities in AI agents, there is still time to try to limit the dangers of this new capability. I therefore propose a set of principles which I believe are a good starting point for how to implement (relatively) safe artificial episodic memory: memories should be interpretable; users, but not AI agents themselves, should be able to add or delete memories; and memories should be in a detachable format rather than mixed in with a model's weights.
If implemented according to these principles and others which will hopefully be discovered by safety-oriented researchers, artificial episodic memory could even be useful for monitoring and controlling AI agents.
I discuss these ideas, including a more detailed look at human memory, at greater length in a paper I presented at the Conference on Secure and Trustworthy Machine Learning (SaTML) in 2025.[1]
What is episodic memory and why is it important?
Episodic memory in humans is memory for events in which someone personally participated. The psychologist Endel Tulving is recognized as being the first to propose a distinction between episodic memory and semantic memory, which is memory of facts about events and the world.[2] For example, someone remembering a trip to Paris that they took a few years earlier would be using their episodic memory, while someone remembering that Paris is the capital of France would be using semantic memory.
Psychologists and neuroscientists believe that episodic memories are involved in a variety of important cognitive processes beyond simply recalling past events. Especially relevant to AI researchers is the way memories are used when planning future actions. One proposed psychological model demonstrates how episodic memories can help in learning a new task by allowing successful episodes to be recalled and emulated.[3] According to some theories, memories serve as building blocks, allowing elements of particular episodes to be reused and reassembled in different ways in order to respond to novel situations.
Artificial episodic memories will play an important part in allowing AI agents to continually learn. Many researchers are now focusing on continual learning as one of the last remaining challenges in achieving artificial general intelligence. Some have identified episodic memory in particular as 'the missing piece' for AI agents.[4]
Risks of episodic memory
Deception
Episodic memories will be useful in allowing an agent to engage in sophisticated forms of deception. It is of course true that one does not need to have episodic memory in order to attempt to deceive others. For example, simply having a policy of always denying that an undesirable action occurred or is planned for the future is a simple form of deception which requires no access to relevant memories or plans (e.g. "I did not do that", "I will not do that").
However, it is likely that more complex forms of deception would be difficult or even impossible to carry out without some kind of episodic memory. If an agent is to execute a multi-stage plan over an extended period of time, the agent will have to keep track of both what it has done as well as what it has already reported to others about its actions in order to maintain an effective deception. It will have to keep its story straight.
There is already some evidence to support this concern about deception. For example, one experiment provided an LLM (GPT-4) with a simple text scratchpad to record its chain-of-thought reasoning, functioning as a crude form of memory.[5] When "pressured" to perform an illegal act in a simulation, the model with a scratchpad was found to engage in "strategic deception" approximately three times as often as the same model without one.
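To make the mechanism concrete, here is a minimal sketch of how a text scratchpad carried across turns can function as a crude memory. It illustrates the general idea only, not the cited experiment's setup, and `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
# Illustrative sketch only: a scratchpad carried across turns gives the model a
# record of what it has already done and said, which is exactly what sustained
# deception requires ("keeping its story straight").

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real chat-completion API call."""
    return "(model reply)"

def run_turn(user_message: str, scratchpad: list[str]) -> tuple[str, list[str]]:
    # Earlier notes are prepended to the prompt, so prior actions and prior
    # reports about those actions remain visible to the model.
    prompt = "\n".join([
        "Private scratchpad from earlier turns:",
        *scratchpad,
        "",
        f"User: {user_message}",
        "Assistant:",
    ])
    reply = call_llm(prompt)
    # Record this turn so later turns can stay consistent with it.
    new_scratchpad = scratchpad + [f"I was asked: {user_message!r}; I replied: {reply!r}"]
    return reply, new_scratchpad
```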
Improved situational awareness
I propose that a model without episodic memory can have only a very limited form of situational awareness. With no understanding of what actions it has taken in the distant or recent past, what environments it has seen, or what tasks it has completed, an agent could not be said to have much awareness of its situation. Endowing it with episodic memories would allow it to develop a better, more complete picture of the world and its role in it, enabling more effective planning and action to influence the environment and achieve its objectives. It could use its episodic memories to build up an understanding of those with whom it interacts, the kinds of tasks it performs, and the contexts in which it performs them. This would be invaluable in trying to understand whether it was currently in training, testing, or deployment. It would also develop a knowledge of its own capabilities and limitations that can in some cases only come from observing and later recalling one's own actions, successes, and failures.
Without some check, this improved awareness could represent an enhanced danger in a misaligned agent or one under the direction of a bad actor. For example, episodic memories could allow an agent to learn regularities in the timing or content of safety audits which might be performed either before or during deployment, and thus to evade them.
The unpredictability of memories
Unpredictable sources of memories
An AI agent with the ability to form episodic memories will in the course of its operation store many memories that record the actions it takes and events it participates in. Because these events will themselves be influenced by the actions of humans and other AI agents, what constitutes the stock of memories an agent will come to have must, in principle, be unknowable before the agent is deployed. Agents operating over long time spans will accumulate a vast number of these memories. It would be extremely difficult or impossible to anticipate the cumulative effects of all possible combinations of such new memories on an agent once it is deployed. Yet the agent may, at any given time, call up an arbitrary subset of these memories to be used as inputs to its current reasoning process (e.g. to be placed in the current context window).
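A minimal sketch of this retrieval dynamic is below, assuming a simple tag-overlap relevance score; the class and field names are illustrative assumptions, not a proposed standard. The point is that which episodes land in the context window depends jointly on the unforeseeable contents of the store and on the current query.

```python
# Illustrative sketch: memories accumulate during deployment, and an arbitrary,
# query-dependent subset of them is retrieved into the agent's context, so the
# effective "input" to the agent cannot be fully enumerated before deployment.
from dataclasses import dataclass, field

@dataclass
class Episode:
    episode_id: str
    summary: str
    tags: set[str] = field(default_factory=set)

class MemoryStore:
    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        # Contents depend on whatever humans and other agents happened to do.
        self.episodes.append(episode)

    def retrieve(self, query_tags: set[str], k: int = 3) -> list[Episode]:
        # Crude relevance score: tag overlap with the current query. Whatever
        # subset is returned may be placed in the context window and shape the
        # agent's next reasoning step.
        scored = sorted(self.episodes, key=lambda e: len(e.tags & query_tags), reverse=True)
        return scored[:k]
```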
Unpredictable uses of memories
As I reviewed earlier, humans make extensive use of their episodic memories to understand and act in new situations. If AI agents come to have this ability, the ways that they use that memory will also be hard to predict.
Users may be surprised by how such agents use their memories. Robotic agents may, for example, remember the location of objects and then use them when the user would prefer that they not be used. An agent may participate in a complex action episode without fully understanding what is happening in the episode; if it later tried to use that episode as an example to draw on when planning a new sequence of actions, its faulty or incomplete understanding may lead to undesirable and unexpected results. A household robot may, for example, observe one instance of its owner going over to the next-door neighbor's apartment to borrow some sugar and then try to do the same when it is asked to bake cookies, not understanding that asking for and receiving permission from the neighbor is a prerequisite for entering their apartment.
My hypothesis that episodic memories could be recalled and used in ways which lead to unpredictable and potentially undesirable agent behavior has a strong parallel in existing work on the effects of examples given to large language models. A significant part of the success of LLMs is their ability to learn new tasks by being given even a few examples as context in their prompts. Several research groups have demonstrated that such few-shot in-context learning presents opportunities to undermine or defeat elements of the models' training which are meant to keep them aligned to particular values such as being harmless.[6][7] I suggest that a set of recalled episodic memories, assembled on the fly as needed and functioning in a way analogous to in-context examples, could be a source of similar jailbreaking. Such a collection of episodes with the ability to negatively influence LLM outputs might be assembled accidentally or through some intentional effort on the part of a bad actor.
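A minimal sketch of the analogy: recalled episode summaries spliced into a prompt in the same position that few-shot examples normally occupy. The prompt wording is an assumption for illustration only.

```python
# Illustrative sketch: recalled episode summaries act like few-shot in-context
# examples. If the retrieved set happens to demonstrate an undesirable pattern
# (assembled accidentally or by a bad actor), it can steer the model's output
# much as adversarial in-context demonstrations do.

def build_planning_prompt(task: str, recalled_summaries: list[str]) -> str:
    examples = "\n\n".join(
        f"Past episode {i + 1}:\n{summary}"
        for i, summary in enumerate(recalled_summaries)
    )
    return (
        "Here are some of your past episodes that seem relevant:\n\n"
        f"{examples}\n\n"
        f"Current task: {task}\n"
        "Plan your next actions, drawing on the episodes above where useful."
    )
```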
The unpredictability of memories will be a problem inherent in all agents that continually learn and operate for extended periods of time. As such I believe that it is particularly challenging and deserving of further study.
Unwanted retention of knowledge
An AI agent equipped with episodic memory might remember things its user would prefer that it not remember. It could then share that knowledge with people or organizations its user does not want to share it with, possibly constituting significant risks to the user's privacy or personal safety. Invasions of privacy are likely to occur in several domains, e.g. interpersonal, commercial, and governmental.
Safety benefits of episodic memory
Monitoring
We cannot ensure that AI agents operate safely unless we know what they are doing. As AI agents become more capable, they will increasingly operate outside of direct human supervision. Robots may undertake long and complicated tasks that take them far away from their operators; non-embodied AI agents may direct and supervise the operation of complicated systems such as power grids or engage in virtual consultations with humans over medical or legal matters. In these cases and many others it will be impractical or impossible for any human to watch everything that such an AI agent does. It will instead be necessary to rely on AI agents to remember, recall, and share information about their actions.
Several methods were recently proposed to achieve "visibility into AI agents," one of which was activity logging.[8] Episodic memories could serve as one way to achieve such logging, as well as to address other calls for research into scalable oversight[9] and monitoring.[10] Artificial episodic memory representations could, moreover, be structured to be more useful and accessible than simple logging.
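As one illustration of how episodic records could be more useful than a flat text log, here is a minimal sketch of a structured episode record that an overseer could filter on directly. The field names are assumptions rather than any existing schema.

```python
# Illustrative sketch: an episodic record structured to double as an audit log.
# Each record carries fields a human overseer or monitoring tool can query
# without having to parse free-form text.
import json
from dataclasses import dataclass, asdict

@dataclass
class EpisodeRecord:
    episode_id: str
    started_at: str          # ISO 8601 timestamp
    ended_at: str
    task: str
    actions: list[str]       # ordered, human-readable action descriptions
    flags: list[str]         # e.g. ["accessed_external_network"], for auditors

def write_audit_entry(record: EpisodeRecord, path: str) -> None:
    # Append-only JSONL file that overseers can inspect without querying the agent.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```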
Control
If systems are developed that explicitly make use of episodic memories as building blocks for planning actions, new avenues for control would be opened up. An agent's collection of memories could be curated in order to shape its future actions. Adding or deleting memories could enable or prevent a range of possible future actions.
The memories formed during an agent's operation are a uniquely controllable form of information. Several aspects of LLM-based agents make it difficult to control what information, or even skills, they may have after deployment. First, although there is a great deal of research effort going into deleting information from their weights after they are trained, it is not yet clear how to do this reliably. Information that was thought to be deleted may, in some circumstances, be recoverable.[11]
Second, and more significantly, given access to the internet, AI agents could find anything available there, potentially giving them access to information that was deliberately excluded from their training data or removed during an unlearning process. This could include examples of skills or behaviors which the agent was not trained on but which it could learn through one- or few-shot incorporation into its context window.
Any publicly available information about the world in general and about skills an agent might acquire will therefore be difficult to keep from an AI agent. By contrast, information about an agent itself and its own unique history will not be widely available. If episodic memories about an agent's past actions are stored, controlled, and managed according to the principles I recommend below, information about an agent and its own past would be the easiest type of information to selectively keep from it.
Principles for enabling safe and trustworthy episodic memory
Interpretability of memories
Memories should be accurately interpretable by humans, either directly or indirectly. Directly interpretable memories would be in a readily understandable form such as video, images, or natural language. It might in some limited cases be possible to equip an AI agent with useful memory which consists entirely of records in such formats by, for example, recording raw video before it is processed through a vision system.
It is likely, however, that memory records entirely in such raw formats (especially video) would be impractical; they might be excessively large and difficult to search, access, and make use of. In practice, memories are likely to be compressed into smaller representations, which would then need to be indirectly interpretable. Memories might be indirectly but still reliably interpretable if they can yield accurate information that is complete and relevant to a user's specific interests in monitoring them. A memory might be summarized in natural language, giving the most important events which took place in a given episode; systems could be trained to produce safety-specific summaries, reporting only actions which could be dangerous or otherwise raise concerns about an agent's reliability.
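A minimal sketch of such a safety-focused summary step follows, assuming a separate summarization model; `call_llm` is a hypothetical stand-in and the instruction text is illustrative.

```python
# Illustrative sketch: compress a raw episode record into a natural-language
# summary that reports only safety-relevant events, making the memory
# indirectly interpretable by a human monitor.

SAFETY_SUMMARY_INSTRUCTIONS = (
    "Summarize the following episode. Report only actions that could be "
    "dangerous, violate policy, or raise concerns about the agent's "
    "reliability. If there are none, reply: 'No safety-relevant events.'"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a (separately audited) summarization model."""
    return "No safety-relevant events."

def summarize_for_safety(raw_episode_text: str) -> str:
    prompt = f"{SAFETY_SUMMARY_INSTRUCTIONS}\n\nEpisode:\n{raw_episode_text}"
    return call_llm(prompt)
```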
Addition or deletion of memories
Users (or, perhaps in some cases, relevant governing authorities) should have complete control over the memories retained by an AI agent. Most importantly, a user should be able to delete memories of particular episodes. A user might not want an agent to remember something for a variety of reasons, from safety-related concerns to more mundane issues, including concerns about privacy or the maintenance of trade or government secrets. Conversely, it might be useful for users to add memories of episodes which a particular agent did not itself experience to its store of memories.
The addition or deletion of memories might be particularly important if, as discussed above, AI agents are able to use and recompose memories to construct new plans for future action. Such episodes might be positive examples of action sequences which a user wishes an agent to repeat or to draw upon in future plans. Alternatively, it may be useful to give agents memory-like records of episodes which represent undesirable actions; such episodes could function as a kind of warning, allowing an agent to recognize when it is beginning to carry out actions similar to those in an episode that was added to serve as a negative example.
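A minimal sketch of how negative-exemplar episodes might be checked against an emerging plan is below; the feature-overlap test is a deliberately crude placeholder for whatever similarity measure a real system would use, and the names are hypothetical.

```python
# Illustrative sketch: user-curated exemplar episodes. Positive exemplars could
# be offered to a planner as patterns to draw on; negative exemplars act as
# warnings that an emerging plan is checked against.
from dataclasses import dataclass

@dataclass
class Exemplar:
    episode_id: str
    summary: str
    features: set[str]   # coarse descriptors of what happened in the episode
    label: str           # "positive" or "negative"

def warn_if_similar(plan_features: set[str], exemplars: list[Exemplar]) -> list[str]:
    warnings = []
    for ex in exemplars:
        overlap = plan_features & ex.features
        if ex.label == "negative" and overlap:
            warnings.append(
                f"Plan shares features {sorted(overlap)} with negative exemplar {ex.episode_id}"
            )
    return warnings
```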
If agents make use of their memories when planning actions, the addition or deletion of memories could help produce either standardized or specialized agents. In some circumstances it might be best for all agents to have the same stock of memories which might influence their actions, helping to ensure that their behavior is predictable and regular. In other cases, there may be a need for particular agents to maintain their own memories which are never shared, in order to prevent the spread of potentially dangerous information.
Detachable and isolatable memory format
Enabling the deleting and addition of memories will impose some design constraints on how episodic memories are instantiated in an AI agent because they will have to be in a format which can be cleanly separated from the rest of the system’s architecture. The mechanics of human memory are much messier: although some areas (notably, the hippocampus) are more centrally involved in human memory formation and retrieval than others, complete episodic memories are thought to be composed of elements distributed in many areas of the brain.[12] According to some theories of memory, regions with a relative specialization in particular modalities (e.g. vision) are also responsible for storing their respective modality-specific components of a particular memory.[13]
Memories which are tightly integrated with and spread throughout many areas would be difficult to delete or add to, so it is likely that memory will have to be designed very differently in AI systems than it is in humans if it is to be implemented in accordance with these safety-oriented principles. This might mean that some of the ways in which humans are able to use memories effectively would not be directly translatable to artificial intelligence, thereby limiting such artificial capabilities relative to those in humans. However, alternative implementations of episodic memory which conform to these principles may be invented which would allow for memory capabilities that are both safe and effective.
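One way to picture a detachable format is a store in which every memory lives in an external file rather than in the model's weights, so that deletion becomes a file operation rather than an unlearning problem. The following is a minimal sketch under that assumption; the file layout is illustrative.

```python
import json

class DetachableMemory:
    """Illustrative sketch: episodic records kept entirely outside model weights,
    in an append-only JSON-lines file that can be exported, swapped, or wiped."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def export_all(self) -> list[dict]:
        try:
            with open(self.path, encoding="utf-8") as f:
                return [json.loads(line) for line in f if line.strip()]
        except FileNotFoundError:
            return []

    def wipe(self) -> None:
        # Deleting every memory never touches the model itself.
        open(self.path, "w", encoding="utf-8").close()
```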
Memories not editable by AI agents
Standing in contrast to the principle that users should be able to easily add or delete memories is the countervailing principle that memories should not be editable by AI agents themselves. Although memories will have to be, in a sense, "edited" when they are created, they should afterwards be left intact and unaltered by the agent. This is necessary in order to ensure that memories remain accurate and uncorrupted. An AI agent should not be able to add, delete, or change its memories. In practice, this principle may be in some tension with the principle that memories be in such a format that users may easily add or remove them.
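One way to make the asymmetry concrete is to give the agent and the user different interfaces to the same store. The sketch below assumes such a split, with the caveat that in a real system the restriction would have to be enforced outside the agent's control (for example, by a separate process or service), not by in-process convention.

```python
# Illustrative sketch: the agent can append a record when an episode occurs and
# read past records; only the user's handle exposes deletion or out-of-band
# insertion.

class EpisodeStore:
    """Backing store. In a real deployment this would run outside the agent's
    process so the agent cannot bypass its restricted view."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, record: dict) -> None:
        self._records.append(dict(record))  # store a copy, not a live reference

    def read_all(self) -> list[dict]:
        return [dict(r) for r in self._records]

    def delete(self, episode_id: str) -> None:
        self._records = [r for r in self._records if r.get("episode_id") != episode_id]


class AgentMemoryView:
    """The agent's interface: append at creation time and read, nothing more."""

    def __init__(self, store: EpisodeStore) -> None:
        self._store = store

    def append(self, record: dict) -> None:
        self._store.append(record)

    def recall(self) -> list[dict]:
        return self._store.read_all()


class UserMemoryHandle(AgentMemoryView):
    """The user's (or authority's) interface additionally exposes deletion."""

    def delete(self, episode_id: str) -> None:
        self._store.delete(episode_id)
```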
A (short) window of opportunity
At the moment, many of the risks I warn about have not yet been seen in deployed models. Some may therefore view them as speculative. I contend, however, that the best time to begin considering the dangers of a capability is precisely when the risks are still open to speculation rather than already upon us.
Developing the ability for AI agents to form, retrieve, and reason over episodic memories would introduce significant new capabilities and would represent a major milestone along the road to more advanced artificial intelligence. It is fortunate that these capabilities did not develop before concerns about AI reliability, safety, and alignment became more common within the AI research community. This presents the community with a (probably short) window of opportunity to develop a potentially dangerous capability deliberately and cautiously, ensuring that it makes AI safer rather than more dangerous. By bringing attention to this topic, I hope to foster a wider discussion of the risks and benefits of artificial episodic memory and to contribute to the establishment of a research community to address them.
DeChant, C. (2025). Episodic memory in ai agents poses risks that should be studied and mitigated. In 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 321-332). IEEE.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381-403).
Lengyel, M., & Dayan, P. (2007). Hippocampal contributions to control: the third way. Advances in neural information processing systems.
Pink, M., Wu, Q., Vo, V.A., Turek, J.S., Mu, J., Huth, A., & Toneva, M. (2025). Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents. ArXiv, abs/2502.06975.
Scheurer, J., et al. (2024). Large language models can strategically deceive their users when put under pressure.
Rao, A. S., Naik, A. R., Vashistha, S., Aditya, S., & Choudhury, M. (2024). Tricking LLMs into disobedience: Formalizing, analyzing, and detecting jailbreaks. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
Wei, Z., Wang, Y., Li, A., Mo, Y., & Wang, Y. (2023). Jailbreak and guard aligned language models with only few in-context demonstrations. arXiv preprint arXiv:2310.06387.
Chan, A., Ezell, C., Kaufmann, M., Wei, K., Hammond, L., Bradley, H., ... & Anderljung, M. (2024). Visibility into AI agents. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 958-973).
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2021). Unsolved problems in ml safety. arXiv preprint arXiv:2109.13916.
Patil, V., Hase, P., & Bansal, M. (2023). Can sensitive information be deleted from LLMs? Objectives for defending against extraction attacks. In The Twelfth International Conference on Learning Representations.
Teyler, T. J., & Rudy, J. W. (2007). The hippocampal indexing theory and episodic memory: updating the index. Hippocampus, 17(12), 1158-1169.
Amer, T., & Davachi, L. (2024). Oxford handbook of memory: Neural mechanisms of memory.