The amnesia problem
Every conversation with a large language model begins from zero. No matter how long, how nuanced, or how personal a previous exchange was, the next session starts with an empty context window. The model has no recollection of who you are, what you were working on, or the decisions you made yesterday. This is the amnesia problem, and it is the single greatest obstacle to using AI assistants for anything more than one-shot tasks.
Humans do not work this way. When we resume a project on Monday, we do not ask our colleagues to re-explain what the project is about. We carry forward identity, context, procedural habits, and shared history. We know who we are, what we care about, what we have tried before, and what worked. A collaborator without any of that is not really a collaborator — at best, they are a search engine with manners.
The obvious first response is to grow the context window. Context windows have indeed grown spectacularly, from a few thousand tokens in early LLM-era models to a million or more in recent frontier systems. It is tempting to conclude that the memory problem will dissolve as context windows expand indefinitely.
This is wrong, for at least four reasons. First, context windows are finite, and every token of stored history is a token that cannot be used for reasoning about the current task. Second, retrieval within very large contexts degrades noticeably — models struggle to find and use information buried in the middle of long contexts, the so-called “lost in the middle” problem. Third, the computational cost of processing a massive context on every turn is prohibitive for interactive use. Fourth, and most fundamentally, context windows are flat. They have no structure. A million tokens of mixed identity, episodes, facts, and procedures, all dumped together, is not memory. It is a pile.
Real memory is structured. The human brain does not store everything in one undifferentiated mass; it routes different kinds of information to different systems, indexes them differently, and retrieves them differently. Artificial systems that want human-like memory have to do something similar.
Human memory is not one system
The decision to model artificial memory on human memory is not aesthetic preference. It rests on a practical observation: the taxonomies that cognitive psychology has developed over the past fifty years describe distinctions that genuinely carve reality at its joints. Remembering what you did yesterday and knowing that Paris is the capital of France are different kinds of memory, supported by different neural substrates, with different retrieval mechanisms and different failure modes. An artificial memory system that collapses these distinctions inherits the failure modes of every kind at once, without the benefits of keeping them separate.
In 1972, the Estonian-Canadian cognitive psychologist Endel Tulving proposed what would become one of the most influential distinctions in the study of memory: the separation of episodic memory from semantic memory. Episodic memory, Tulving argued, stores “temporally dated episodes or events” — the specific, contextualised happenings of a life. Semantic memory, by contrast, is a “mental thesaurus” of general knowledge, facts, and concepts divorced from any particular moment of acquisition.
The examples that illustrate the distinction are telling. Remembering being chased by a dog on yesterday’s bike ride is episodic. Knowing that bicycles have two wheels and pedals is semantic. The first is indexed by time and place; the second floats free of any autobiographical anchor. The first requires what Tulving later called autonoetic consciousness — the sense of re-experiencing oneself at an earlier moment. The second does not.
This was not merely theoretical. Tulving’s work with the amnesic patient KC — a man whose episodic memory was severely impaired after a motorcycle accident while his semantic memory remained largely intact — provided striking clinical evidence that the two systems can dissociate. KC knew that he owned a car, but could not remember a single specific drive he had ever taken in it. Fifty years of subsequent neuroimaging and neuropsychology have refined the picture, but the core distinction has held.
Building on Tulving’s work, cognitive psychologists and neuroscientists — most notably Larry Squire — developed a fuller taxonomy of long-term memory systems. Four categories matter for the purposes of artificial memory:
- Episodic memory — memory for specific events, indexed by time and context. “What happened yesterday at the meeting.”
- Semantic memory — general knowledge, concepts, and facts, decontextualised. “Paris is the capital of France.”
- Procedural memory — memory for how to do things. Riding a bicycle, touch-typing, the steps of a familiar recipe. Largely implicit — we can execute the skill without being able to articulate it.
- Core self-knowledge — the stable representation of who we are, our values, our ongoing goals. Not a formal Tulving category, but an essential component of any functioning cognitive agent.
These categories are not airtight. Tulving himself stressed, from 1972 onward, that episodic and semantic memory are deeply interdependent. An episodic memory, rehearsed and abstracted over time, becomes semantic. A new episodic memory depends on existing semantic knowledge for its encoding. But the distinctions are real enough to structure both the human brain and, as we will see, a useful artificial memory system.
From cognitive science to AI
The first serious attempt to transfer cognitive-architecture principles to language models was CoALA — Cognitive Architectures for Language Agents — a framework proposed by Theodore Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas Griffiths at Princeton University. First circulated as an arXiv preprint in September 2023 and published in Transactions on Machine Learning Research in 2024, CoALA drew explicitly on the rich history of symbolic AI cognitive architectures — systems like Soar and ACT-R, developed from the 1980s onward to model general human cognition — and asked what the language-model era had to learn from them.
CoALA’s answer was an architectural blueprint with three components: modular memory, a structured action space, and an explicit decision procedure. The memory component organised information into working memory (the current context) and long-term memory, subdivided into episodic, semantic, and procedural stores — exactly the distinctions Tulving and Squire had drawn for the human brain.
CoALA did not prescribe implementations. It was a conceptual framework, a way of organising the rapidly proliferating literature on LLM agents and a map of the design space still to be explored. Its importance was that it made the connection explicit and respectable. It said: if you want to build a language agent that genuinely remembers and reasons, you should look at what half a century of cognitive science has learned about memory, and use those categories as your design vocabulary.
The current state of the art: MIRIX
CoALA was a blueprint. MIRIX, published by Yu Wang and Xi Chen in July 2025, is the implementation that most directly translates it into a working system. Where CoALA talked about three long-term memory types, MIRIX committed to six specific, carefully structured memory types operating as a coordinated multi-agent system:
- Core Memory — the persona and user identity. The “who am I talking to, and who am I?” layer.
- Episodic Memory — timestamped event records.
- Semantic Memory — consolidated facts and concepts.
- Procedural Memory — workflows and skills, the “how to do X” knowledge.
- Resource Memory — reusable artefacts like templates and reference documents.
- Knowledge Vault — sensitive information and credentials, handled with restricted access.
MIRIX also introduced an active retrieval mechanism: before answering a query, the agent first generates a topic and retrieves only the relevant fragments of memory rather than loading everything. The published results were striking. On the LOCOMO long-form conversation benchmark, MIRIX achieved 85.4% accuracy, roughly 8 points above the best existing memory system and close to the upper bound set by full-context long-window models. On the ScreenshotVQA multimodal benchmark, MIRIX outperformed retrieval-augmented generation by 35% while reducing storage requirements by 99.9%.
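In outline, that retrieval step is: derive a topic from the query, then load only the memory fragments relevant to it. The toy sketch below illustrates the shape of the mechanism; the keyword-based topic extraction, the fragment store, and the overlap scoring are all illustrative assumptions, not MIRIX's actual implementation.

```python
# Toy sketch of topic-based active retrieval: derive topic terms from the
# query, then load only the fragments that best match them, rather than
# loading all of memory. Scoring here is crude word overlap.

def topic_terms(query: str) -> set[str]:
    """Crude stand-in for the agent's topic-generation step."""
    stopwords = {"the", "a", "an", "of", "to", "what", "is", "did", "we"}
    return {w.lower().strip("?.,") for w in query.split()} - stopwords

def retrieve(fragments: dict[str, str], query: str, k: int = 2) -> list[str]:
    """Return the k fragment ids whose text best overlaps the topic terms."""
    terms = topic_terms(query)
    ranked = sorted(
        fragments,
        key=lambda fid: len(terms & set(fragments[fid].lower().split())),
        reverse=True,
    )
    return ranked[:k]

fragments = {
    "episodic/2025-07-01": "meeting about the publishing workflow deadline",
    "semantic/paris": "paris is the capital of france",
    "procedural/publish": "steps to publish an article to the blog",
}
```

A real system would replace the word-overlap score with embedding similarity or a model-generated topic, but the control flow — query in, small relevant subset out — is the same.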
This six-layer taxonomy is the best available starting point for building a persistent memory system on top of a commercial language model. The rest of this article describes one way of implementing it without the multi-agent apparatus — using nothing more sophisticated than Markdown files and a pair of protocols.
A practical implementation
The MIRIX paper describes a coordinated multi-agent system with dedicated agents for each memory type. For personal or small-team use, that is overkill. A much simpler implementation uses plain Markdown files on disk, organised into a folder hierarchy that mirrors the six memory types.
Here is the canonical structure:
```
memory/
├── _index.md                                   Catalogue + queries
├── core/                                       Always-loaded identity
│   └── identity.md
├── episodic/                                   Session archives (immutable)
│   └── YYYY-MM-DD-HHhMM-{project}-{summary}.md
├── semantic/                                   Stable, consolidated facts
│   ├── global/                                 Cross-project knowledge
│   └── projects/{name}/
│       ├── context.md                          Current snapshot
│       └── facts.md                            Persistent project facts
├── procedural/                                 How-to knowledge
│   ├── playbooks/
│   │   └── {verb}-{object}.md
│   └── lessons-learned.md
├── resources/                                  Reusable artefacts
└── vault/                                      Secrets, restricted access
```
A few design principles distinguish this approach from the academic prototypes it draws from.
Markdown files over databases. Academic memory systems typically use vector databases, knowledge graphs, or custom data stores. This implementation uses plain Markdown files on the local file system. The trade-off is deliberate: we lose some retrieval sophistication, but we gain radical transparency. Every memory is a file you can open, read, edit, and back up with standard tools. The memory is portable, auditable, and human-readable. When something goes wrong, you can see exactly what the system thinks it knows. There is no black box.
YAML front-matter for metadata. Each file carries a YAML front-matter block with structured metadata: the layer it belongs to, the project, tags, version, creation date, last modification date. This turns the folder tree into a queryable database without needing a database.
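A sketch of what this looks like in practice, using only the standard library. The field names in the sample (layer, project, tags, created) follow the scheme just described; the minimal parser handles only flat key: value pairs, so real nested YAML would need a proper YAML library.

```python
# Minimal front-matter reader: extracts the block between the opening and
# closing "---" markers and parses flat "key: value" pairs. Nested maps and
# lists are out of scope for this sketch.

SAMPLE = """---
layer: semantic
project: blog
tags: publishing, style
created: 2025-07-01
---
Body of the memory file starts here.
"""

def read_front_matter(text: str) -> dict[str, str]:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta
```

With metadata readable this cheaply, "find every semantic file tagged publishing" is a short script over the folder tree, not a database query.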
Wikilinks for the graph. Files reference each other through Obsidian-style wikilinks such as [[publishing-workflow]]. This creates an implicit knowledge graph on top of the folder structure. When the vault is opened in Obsidian, you see not just files but the network of relationships between them — a visualisation that makes the shape of accumulated knowledge immediately legible.
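The graph is also trivially recoverable by script. A sketch, assuming the standard [[target]] syntax; aliased links of the form [[target|label]] keep only the target, and nested brackets are not handled.

```python
import re

# Extract Obsidian-style [[wikilinks]] from a note body. For aliased links
# ([[target|label]]) only the target is captured.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def outgoing_links(text: str) -> list[str]:
    return [m.strip() for m in WIKILINK.findall(text)]

note = "Used [[publishing-workflow]] and updated [[blog/context|the context]]."
```

Running this over every file yields an adjacency list — the same graph Obsidian visualises, available to any tool that can read text.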
Selective loading, not full loading. At the start of each session the system does not load all of memory. It loads the core identity file and whatever the current task specifically requires, according to a selective-reloading protocol. The principle is Occam’s razor applied to context: load the minimum, fetch more only if needed. This is also what MIRIX’s active retrieval mechanism does, in a more sophisticated form.
Append-only episodic, mutable semantic. Episodic archives are immutable: once a session is archived, the record of what happened that day never changes. Semantic files, by contrast, are overwritten and consolidated — they represent the current best-known state of the world. This mirrors the way human memory works: what actually happened is fixed; what we generalise from it evolves.
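The immutability of episodic archives can be enforced mechanically rather than by discipline alone. A sketch: the filename follows the timestamped pattern from the folder layout above, and opening the file in exclusive-create mode makes overwriting an error.

```python
from datetime import datetime
from pathlib import Path

def archive_path(root: Path, project: str, summary: str,
                 when: datetime) -> Path:
    """Timestamped, human-readable episodic filename."""
    stamp = when.strftime("%Y-%m-%d-%Hh%M")
    return root / "episodic" / f"{stamp}-{project}-{summary}.md"

def write_episode(path: Path, content: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # "x" mode raises FileExistsError instead of silently overwriting:
    # the append-only guarantee, enforced by the file system itself.
    with open(path, "x", encoding="utf-8") as f:
        f.write(content)
```

Semantic files, by contrast, are written with an ordinary overwrite; the asymmetry in file modes mirrors the asymmetry in memory types.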
The two protocols that make memory real
Files on disk are not memory. They are inert material. What turns them into a functioning memory system is a pair of protocols that bracket every session.
The reload protocol runs at the beginning of each new task. It has five steps. First, load core identity — the file that establishes who the user is and any ground rules that apply across all tasks. Second, identify the target: which project are we working on? If ambiguous, consult the index and ask. Third, load selectively — based on the intent detected in the user’s request, load only the files needed. A request to resume a project loads that project’s context file. A request to publish an article additionally loads the relevant playbook and style guide. The principle is to load the minimum and fetch more if needed, rather than front-loading everything. Fourth, present a briefing — summarise the state in a structured format: last session, current phase, validated items, items in progress, key decisions, next steps, active playbooks. Fifth, propose the next move.
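The core of the protocol — always load identity, then load selectively by intent — can be sketched as a single function. The keyword-based intent detection and the intent-to-files table below are simplified assumptions (a real implementation would ask the model to classify the request), but the order of operations follows the steps above.

```python
from pathlib import Path

# Intent -> files to load, relative to the memory root. Illustrative only.
INTENT_FILES = {
    "resume":  ["semantic/projects/{project}/context.md"],
    "publish": ["semantic/projects/{project}/context.md",
                "procedural/playbooks/publish-article.md"],
}

def reload_session(root: Path, request: str, project: str) -> list[str]:
    # Step 1: core identity is always loaded.
    loaded = [(root / "core" / "identity.md").read_text()]
    # Steps 2-3: detect intent (keyword stub) and load selectively.
    intent = "publish" if "publish" in request else "resume"
    for rel in INTENT_FILES[intent]:
        path = root / rel.format(project=project)
        if path.exists():
            loaded.append(path.read_text())
    # Steps 4-5 (briefing, proposed next move) would summarise `loaded`.
    return loaded
```

Note the silent skip of missing files: a project without a playbook still reloads cleanly, just with less context.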
The archive protocol runs at the end of each productive session, triggered by the user saying “archive”, “save the session”, or equivalent. It has seven steps. First, collect the context: synthesise from the conversation what was done, why, and what comes next. Second, write the episodic archive — a new timestamped file in the episodic folder, immutable, with full metadata and wikilinks to every playbook and semantic file touched during the session. Third, overwrite the project’s context file — this is the current view, mutable, capped at around forty lines, showing the present state of the project. Fourth, consolidate durable facts — decisions that should survive future sessions go into the project’s persistent facts file, or into the relevant global semantic file. Fifth, extract procedures — if a new workflow was proven during the session, propose creating a playbook; if a failure occurred and was understood, add an entry to a lessons-learned file. Sixth, update the index. Seventh, confirm to the user what was written and where.
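Step four, consolidation, is the step most worth making precise: durable decisions must end up in the facts file exactly once. A sketch, assuming one fact per bullet line and verbatim deduplication; the file path follows the folder layout above.

```python
from pathlib import Path

def consolidate_facts(root: Path, project: str, new_facts: list[str]) -> int:
    """Append durable facts as bullet lines, skipping verbatim duplicates.
    Returns the number of facts actually written."""
    facts_file = root / "semantic" / "projects" / project / "facts.md"
    facts_file.parent.mkdir(parents=True, exist_ok=True)
    existing = (set(facts_file.read_text().splitlines())
                if facts_file.exists() else set())
    added = [f"- {fact}" for fact in new_facts if f"- {fact}" not in existing]
    with open(facts_file, "a", encoding="utf-8") as fh:
        fh.write("".join(line + "\n" for line in added))
    return len(added)
```

Verbatim matching is deliberately dumb: it never merges near-duplicates, so a rephrased decision appears twice rather than silently replacing its predecessor — a conservative failure mode for a memory system.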
The asymmetry between reload and archive is deliberate. Reload is cheap, frequent, and should be lightweight — a session might start many times a day. Archive is expensive, rare, and should be thorough — it is the moment at which a transient conversation is promoted into durable knowledge. The design encourages frequent reloads and careful archives, with the user explicitly triggering archiving rather than having it happen automatically. This preserves user agency: the user decides what is worth remembering.
Why this matters
The combination of this architecture and these protocols changes what an AI assistant can be. Not quantitatively but qualitatively. Before: a stateless question-answering engine that happens to be articulate. After: a collaborator with persistent identity, a memory of shared work, an evolving set of proven procedures, and a growing catalogue of what has been tried and what has worked.
The architecture does not require anything exotic. The ingredients are fifty-year-old cognitive psychology, recent academic work on LLM agent architectures, plain Markdown files, an Obsidian-style wikilink convention, and two short protocols. That is it. No vector database, no fine-tuning, no novel infrastructure. Everything lives in a folder on your disk. You can read it. You can edit it. You can back it up. You can delete it if you decide to start over.
What makes the system work is not the cleverness of any individual component. It is the fact that the components are organised according to distinctions that cognitive science has validated for half a century. Episodic and semantic are different. Procedural and declarative are different. Identity persists across everything else. An AI memory system that honours these distinctions is closer to how memory actually works, and that is why it is useful.
The amnesia problem is solvable. It requires taking cognitive science seriously as a design resource, adopting a taxonomy that has been road-tested in both human brains and academic AI prototypes, and — most importantly — having the discipline to archive sessions and reload them according to a consistent protocol. Files on disk without protocols are inert. Protocols without structured files are ad hoc. The two together produce something that feels, for the first time in the language-model era, like a collaborator with memory.
Further reading
- Sumers, T. R., Yao, S., Narasimhan, K., & Griffiths, T. L. (2024). Cognitive Architectures for Language Agents. Transactions on Machine Learning Research. arXiv:2309.02427.
- Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). Academic Press.
- Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25.
- Wang, Y., & Chen, X. (2025). MIRIX: Multi-Agent Memory System for LLM-Based Agents. arXiv:2507.07957.
