Episodic memory is memory of specific events — what happened, when, and where. Semantic memory is memory of general facts independent of when they were learned. Procedural memory is memory of how to do something. The three-way split comes from cognitive science (Tulving, 1972) and turns out to be a surprisingly useful architectural lens for designing AI agent memory.
This post is for engineers building agents that need to remember things across sessions. Most teams discover the three memory types by accident — they build an event log, then realize they need a fact store, then realize the agent keeps re-deriving the same workflows. Naming the layers up front saves a refactor. The cognitive framing is the entry point; the destination is implementation. For the broader memory landscape — within-session context, provider memory, application memory — see the AI memory systems guide.
Tip
Most production agents end up with three memory primitives — an event log, a fact store, and a learned-routine cache — even when teams don't realize that's what they've built. The cognitive labels (episodic, semantic, procedural) are useful precisely because they force you to name the layer before you build it.
Key takeaways
- The three-way split — episodic memory, semantic memory, procedural memory — comes from Tulving's 1972 work in cognitive psychology. It is widely used as a teaching frame even where the boundaries blur in practice.
- In agent design, episodic maps to the event log, semantic maps to the structured fact store, and procedural maps to learned routines and reusable action sequences.
- Most production agents, even simple ones, need all three. Collapsing them into one big store is the most common failure mode.
- The cognitive labels are an architectural lens, not a claim about model internals. Use them to organize what you build, not to describe what the model is doing.
- Procedural memory is the most overlooked layer. Agents without it relearn the same workflows turn after turn — a quiet tax on every repeated user task.
- The right retrieval flow usually goes semantic-first, procedural-second, episodic-on-demand. Searching the event log on every turn is expensive and noisy.
- Frameworks expose the layers differently. Letta separates them cleanly; mem0 implements multi-level memory that spans episodic and semantic; many DIY stacks have all three but no labels for them.
The three memory types in cognitive science
Cognitive psychology distinguishes between several long-term memory systems. Three of them concern us here.
Episodic memory is memory of specific events that happened to you. "I had coffee with Maria last Tuesday at the cafe near the office" is episodic. The memory is anchored in a particular time and place. You can replay it. If pressed, you can give it a date.
Semantic memory is memory of general facts you hold without remembering when you learned them. "Maria is the head of design" is semantic. So is "Paris is the capital of France." You did not file these facts with a timestamp; you just know them. Semantic memories often consolidate from many episodic encounters — you met Maria several times before "Maria is head of design" became a freestanding fact.
Procedural memory is memory of how to do something. Riding a bike. Touch-typing. Composing a status update in your team's preferred format. Procedural memories are usually not verbalizable — you can do the thing without being able to fully describe how. They are also robust: skills survive long after the conscious memory of learning them has faded.
Two honest caveats before we map this to agents. First, the three categories blur in practice — a memory can be partly episodic and partly semantic, and consolidation moves things between them. Second, modern cognitive science uses many more distinctions than these three. The Tulving split is a useful teaching frame; treat it as a starting taxonomy, not a complete one.
Mapping to AI agents
The cognitive frame becomes interesting when you notice that agent memory implementations naturally cluster around the same three shapes. Here is the mapping.
| Memory type | Cognitive definition | Agent-implementation analog | Typical storage | Typical retrieval | When this layer is load-bearing | Common framework primitive |
|---|---|---|---|---|---|---|
| Episodic | Specific events tied to time and place | Event log of past interactions | Vector store + metadata, append-only table, message archive | Similarity + metadata filter ("conversations from last week about X") | Replay debugging, "what did we discuss," cross-session reference | Letta recall storage, mem0 per-session add, conversation tables |
| Semantic | General facts independent of when learned | Structured fact store about user, project, or world | Key-value, structured rows, knowledge graph, deduplicated extracted memories | Direct lookup or vector retrieval over fact embeddings | Personalization, avoiding repeat questions, cross-session continuity | Letta human and persona blocks, mem0 user-level memory, profile tables |
| Procedural | Learned routines and skills | Learned tool-use patterns and preferred action sequences | Few-shot store, routine library, learned prompt templates | Pattern match on situation type | Repeated workflows, "do the usual" shortcuts, faster handling of recurring tasks | Tool-use exemplars, prompt template caches, learned-routine stores |
The table makes the architectural claim concrete: each cognitive memory type has a different storage shape, a different retrieval pattern, and a different failure mode when missing. That is the case for treating them as separate primitives rather than one big memory bucket.
Episodic memory in agents
The episodic layer is the event log. Every interaction the agent has — every turn, every tool call, every session — gets recorded with timestamps and identifiers. The store is usually append-only.
What it gives you: the ability to answer questions like "what did the user ask about last Tuesday," "show me the conversation where they mentioned the migration," and "what was the third thing we tried last session." It also gives you replay-debugging when something goes wrong — you can reconstruct exactly what state the agent was in.
What it does not give you: synthesized facts. The episodic store holds raw events. Asking "what does this user prefer for status updates" against an episodic-only store means scanning many past events and inferring the preference each time. That is expensive and noisy. You want the inferred preference promoted to the semantic layer.
Storage is typically a vector index over embedded turns or summarized sessions, with metadata for time, user, session, and tags. Some stacks use a Postgres table with full-text search — this works fine when episodic recall is mostly query-by-keyword rather than query-by-similarity. Letta's recall and archival storage both serve this layer; mem0's per-session add calls write events that can later be searched.
The retrieval pattern that works is similarity + metadata filter. Pure similarity over a long event log returns near-duplicates from many sessions; you almost always want to scope by user and time. The recall flow at the application layer often looks like: "fetch the last N turns of this session always, then on demand search the episodic store for similar past events from this user."
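The filter-then-rank pattern can be sketched in a few lines. This is a toy in-memory version: all names are illustrative, and shared-word count stands in for the vector similarity a real index would provide.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Event:
    user_id: str
    session_id: str
    text: str
    timestamp: float
    tags: list = field(default_factory=list)

class EpisodicStore:
    def __init__(self):
        self.events = []  # append-only log

    def append(self, event: Event):
        self.events.append(event)

    def search(self, user_id, query, since=0.0, k=3):
        # Metadata filter first (user + time window), then rank by a toy
        # similarity score: shared-word count in place of vector similarity.
        q = set(query.lower().split())
        candidates = [e for e in self.events
                      if e.user_id == user_id and e.timestamp >= since]
        scored = [(len(q & set(e.text.lower().split())), e) for e in candidates]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]

store = EpisodicStore()
store.append(Event("u1", "s1", "discussed the database migration plan", time.time()))
store.append(Event("u2", "s9", "migration talk from another user", time.time()))
hits = store.search("u1", "what did we discuss about the migration")
```

Because the metadata filter runs before ranking, the other user's superficially similar event never surfaces, which is exactly the near-duplicate problem the scoping solves.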
Semantic memory in agents
The semantic layer is the structured fact store. It holds the facts the agent should know about the user, the project, or the world, without needing to re-derive them from the event log.
Examples: "User prefers metric units." "User's company has 47 employees." "User's primary language is Spanish." "User is technical — skip the explanations of basic concepts." These are facts that are true across many sessions and should always (or nearly always) reach the prompt.
Storage shapes vary. The simplest is a row of structured fields per user — a profile table. More flexible is a key-value or document store of named facts. Most flexible is a deduplicated, embedded fact store where each fact is a short embedded snippet — this is the mem0 implementation guide pattern. Knowledge graphs work too, especially when relationships between facts matter.
Letta's human and persona blocks are semantic memory by construction. Both blocks are always in the agent's context window — they are not retrieved on demand because they are facts the agent should never have to look up. That is a key property of the semantic layer: the most stable, most-used facts can earn always-in-context status, while less stable facts can be retrieved on demand.
The hard problem in semantic memory is write policy: when do you add a fact, when do you update an existing fact, and when do you remove a stale one. mem0 solves this with an LLM-driven extraction-and-deduplication pass on every interaction. Letta solves it by giving the agent tools to write to its own memory blocks. The DIY pattern is usually a periodic batch job that re-extracts facts from the recent episodic log. None of these are settled best practice — pick one and instrument it.
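A minimal sketch of one write policy, upsert-by-key with a freshness timestamp, makes the add/update/re-confirm distinction concrete. The extraction step that produces candidate facts is out of scope here; in practice it is an LLM call (mem0-style) or a batch job.

```python
import time

class SemanticStore:
    def __init__(self):
        self.facts = {}  # key -> {"value": ..., "last_confirmed": ...}

    def upsert(self, key, value):
        existing = self.facts.get(key)
        if existing and existing["value"] == value:
            # Same fact re-extracted: refresh freshness, no duplicate entry.
            existing["last_confirmed"] = time.time()
        else:
            # New or changed fact: overwrite rather than accumulate.
            self.facts[key] = {"value": value, "last_confirmed": time.time()}

facts = SemanticStore()
facts.upsert("units", "metric")
facts.upsert("units", "metric")    # duplicate: timestamp refresh only
facts.upsert("units", "imperial")  # changed: stale value replaced
```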
Procedural memory in agents
The procedural layer is the most overlooked. It is the store of learned routines: the sequences of tool calls, prompt scaffolds, and action patterns the agent reaches for when a familiar situation comes up.
Concrete examples. "When the user says 'do the usual standup,' run this four-step sequence: pull yesterday's commits, summarize them by project, add open PRs, format as bullets." "When summarizing for this user, lead with metrics not narrative." "When the user asks for a draft email, use this voice and never include emojis." These are routines that emerged from past interactions and should be fast-path next time.
Most agents fake procedural memory by stuffing examples into the system prompt. That works for global procedures (every user gets the same examples) but breaks down for per-user routines (the system prompt would balloon). A real procedural-memory layer learns the pattern from interaction and retrieves it when a similar situation comes up.
Storage is typically a few-shot store keyed by situation type, or a routine library where each routine has a trigger pattern and a sequence of steps. Retrieval is pattern-matching: "what kind of request is this, and do I have a learned routine for it." The retrieved routine then enters the prompt as a scaffold rather than as raw context — "here is how you handled this last time."
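A routine library with trigger matching can be sketched as follows. Word overlap stands in for embedding similarity, and the threshold plays the role of the confidence cutoff; every name here is illustrative.

```python
class RoutineLibrary:
    def __init__(self, threshold=2):
        self.routines = []
        self.threshold = threshold  # minimum match score to fast-path

    def save(self, trigger, steps, provenance):
        self.routines.append(
            {"trigger": trigger, "steps": steps, "provenance": provenance})

    def match(self, request):
        # Toy pattern match: score each trigger by shared words with the
        # request; a real stack would embed triggers and compare vectors.
        words = set(request.lower().split())
        best, best_score = None, 0
        for r in self.routines:
            score = len(words & set(r["trigger"].lower().split()))
            if score > best_score:
                best, best_score = r, score
        # Only return a routine on a high-confidence match; otherwise the
        # agent falls back to fresh planning.
        return best if best_score >= self.threshold else None

lib = RoutineLibrary()
lib.save("do the usual standup",
         ["pull yesterday's commits", "summarize by project",
          "add open PRs", "format as bullets"],
         provenance="learned from session s42")
routine = lib.match("can you do the standup please")
```

On a match, the retrieved `steps` enter the prompt as a scaffold ("here is how you handled this last time") rather than as raw context.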
The reason procedural memory matters is that it is where the agent gets faster at a particular user's repeated work. Without it, every "do the usual" request triggers fresh planning. With it, the agent recognizes the pattern and executes. Frameworks like LangGraph and the OpenAI Agents SDK do not ship a procedural-memory primitive out of the box, but you can layer one on by storing successful action sequences keyed by trigger and retrieving them in the planner.
Why most agents need a hybrid
Single-layer agents fail in predictable ways.
An agent with only episodic memory drowns in noise. It can recall every conversation but cannot answer simple personalization questions without scanning the whole log. It re-extracts the same facts every turn. The user feels like the agent has a great memory for what was said and a terrible memory for what they actually prefer.
An agent with only semantic memory loses the ability to reference specific events. "What did we talk about Tuesday" returns nothing because the agent does not store events, only facts. The agent feels stateless within a project even though it knows facts about the user.
An agent with no procedural memory keeps relearning the same workflows. Every "do the standup" request triggers fresh planning instead of recognizing the pattern. The agent gets no faster at the work it does most often. Users notice this as a kind of plateau — the agent is competent but not improving.
The right design uses all three with clear boundaries. Episodic for the event log. Semantic for the facts. Procedural for the routines. Each has its own write policy, retrieval pattern, and place in the prompt. For the broader argument about how these fit into the agent stack, see the Agentic Prompt Stack and the Context Engineering Maturity Model.
Architectural patterns
A concrete recipe for a hybrid memory architecture.
Storage. Three logical stores, even if they share infrastructure. Episodic: vector index over embedded turn summaries with metadata (user_id, session_id, timestamp, tags). Semantic: structured profile (rows or documents) plus an optional embedded fact store for less stable facts. Procedural: keyed routine library where each routine has a trigger description (embedded for similarity match), a sequence of tool calls or prompt scaffolds, and provenance ("learned from session X").
Write policy. Append to episodic on every interaction. Update semantic via an extraction pass, either inline (small agents, mem0-style) or batch (larger systems, periodic re-extraction). Add to procedural when an action sequence succeeds and the trigger is recognizably reusable — this is usually a deliberate "save this routine" call rather than automatic.
Recall flow. On a new user turn:
- Always include the most stable semantic facts in context (the always-in-context layer — Letta walkthrough for the canonical example).
- Search procedural for routines that match the current situation. If a high-confidence match exists, the routine becomes a scaffold for the agent's response.
- Search semantic for less stable facts relevant to the current request (vector retrieval over the embedded fact store).
- Optionally search episodic for specific past events the agent might need to reference. Skip this if the request is purely about facts or routines.
- Assemble the context: stable facts → matched routine → relevant facts → relevant past events → user turn.
- Call the model.
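The recall flow above can be sketched as a context assembler. The retrieval steps are stubbed with plain values so the only thing the example shows is the assembly order; the section labels are illustrative, not a prescribed prompt format.

```python
def assemble_context(stable_facts, routine, relevant_facts, past_events, user_turn):
    parts = []
    # 1. Always-in-context semantic facts come first.
    parts.append("FACTS (always in context): " + "; ".join(stable_facts))
    # 2. Procedural: a matched routine shapes HOW the agent responds.
    if routine is not None:
        parts.append("ROUTINE: " + " -> ".join(routine))
    # 3. Less stable semantic facts, retrieved for this request.
    if relevant_facts:
        parts.append("RELEVANT FACTS: " + "; ".join(relevant_facts))
    # 4. Episodic events only when the request needs specific history.
    if past_events:
        parts.append("PAST EVENTS: " + "; ".join(past_events))
    parts.append("USER: " + user_turn)
    return "\n".join(parts)

prompt = assemble_context(
    stable_facts=["name: Maria", "language: Spanish"],
    routine=["pull commits", "summarize by project", "format as bullets"],
    relevant_facts=["project X deadline is Friday"],
    past_events=[],  # episodic search skipped for this request
    user_turn="do the usual standup",
)
```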
The order matters. Searching episodic first floods the context with raw history; searching semantic first gives you the facts that should always be present. Procedural sits between because it shapes how the agent will respond, not what it will respond about.
Always-in-context vs retrieved. This is the central design decision in semantic memory. Stable, frequently-used facts (user's name, role, language preference) earn always-in-context status — the prompt always carries them. Less stable or less frequently-used facts (specific project details, historical preferences) get retrieved on demand. The split is a budget problem: the always-in-context layer is precious. See the SurePrompts Quality Rubric for how to evaluate whether the prompt is well-shaped before you spend more compute on memory retrieval.
When the cognitive framing breaks down
Be honest about the limits of the analogy. LLMs do not have neuroscientific separation between episodic, semantic, and procedural systems. Their parameters mix everything from training. Any "memory" outside the parameters is whatever you choose to build. The model itself does not know which of your stores it is reading from.
The cognitive labels also do not perfectly map. Some memories are partly episodic and partly semantic. Procedural memory in humans is largely non-verbal, which is why we do not lose typing skill when we forget the experience of learning to type — the analogy to LLMs, which only operate in language, is loose. And cognitive science has many more distinctions than these three (working memory, perceptual memory, motor memory) that do not map cleanly to agent design at all.
The benefit of the framing is design clarity, not biological accuracy. When you separate episodic from semantic in your storage, you stop conflating two different retrieval problems. When you name procedural memory, you start noticing the times your agent re-derives the same routine. The labels earn their keep when they force design decisions; they stop earning their keep when you treat them as a literal description of the model. Use working memory and memory recall the same way — as borrowed structure, not as claims about what the model is doing inside.
For the relationship between this in-application memory layer and the model's context window, or its long-term memory features at the provider level, see the AI memory systems guide.
Common failure modes
A few patterns to watch for when implementing the layers.
Collapsing episodic and semantic into one big embedding store. Symptom: recall stops being precise. Asking "what does the user prefer" returns a mix of raw past turns and extracted facts, with the facts buried under more numerous events. Fix: separate the stores. Episodic stays append-only and noisy; semantic stays deduplicated and high-signal. Retrieval against each is independent.
Implementing only episodic and pretending it covers semantic. Symptom: agent asks the same questions every session. "What's your preferred unit system?" "What's your team size?" The information is in the event log, but the agent has no extracted-fact layer to make it cheap to retrieve. Fix: add the semantic layer with an extraction pass over recent episodic events.
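The fix can be sketched as a batch pass over recent episodic events. A regex stands in for the LLM extraction call a real system would make; the pattern and fact key are illustrative.

```python
import re

def extract_facts(episodic_events):
    # Promote answers buried in raw events to named semantic facts.
    facts = {}
    for text in episodic_events:
        m = re.search(r"preferred unit system is (\w+)", text.lower())
        if m:
            facts["units"] = m.group(1)
    return facts

recent_events = [
    "user: hi again",
    "user: my preferred unit system is metric, by the way",
    "agent: noted!",
]
facts = extract_facts(recent_events)
```

Once extracted, the fact is a cheap direct lookup on every future session instead of a scan of the event log.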
Forgetting procedural entirely. Symptom: agent never gets faster at the user's repeated workflows. Every "do the standup" or "send the weekly update" triggers fresh planning. Fix: add a routine library, even a small one. Start with five hand-curated routines. Add a "save this routine" tool the agent can call when an action sequence succeeds.
Always-in-context bloat in the semantic layer. Symptom: the prompt grows turn by turn as more facts get promoted to always-in-context. Eventually you are paying for a 30k-token system prompt on every call. Fix: tier the semantic layer. Top tier (always in context) stays small — the five to ten most-used facts. Lower tier is retrieved on demand. See RCAF for how to keep the always-in-context part shaped well.
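One simple way to implement the tiering, assuming each fact carries a usage counter, is to rank by use and cap the top tier:

```python
def tier_facts(facts, top_n=5):
    # facts: {key: {"value": ..., "uses": ...}}; rank by usage and cap the
    # always-in-context tier, everything else stays retrievable on demand.
    ranked = sorted(facts.items(), key=lambda kv: kv[1]["uses"], reverse=True)
    always_in_context = dict(ranked[:top_n])
    on_demand = dict(ranked[top_n:])
    return always_in_context, on_demand

facts = {f"fact_{i}": {"value": i, "uses": i} for i in range(7)}
always, on_demand = tier_facts(facts, top_n=5)
```

The ranking signal is a design choice; usage count is the simplest, but recency or a manual "pin" flag work under the same structure.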
Stale semantic facts. Symptom: agent acts on facts that were true six months ago but are not now. Fix: every semantic fact carries a last_confirmed timestamp and an optional source episode. Periodic re-extraction updates the timestamp; facts past a staleness threshold get re-confirmed by the agent or expired.
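The staleness check is a one-liner once facts carry `last_confirmed`. The 90-day threshold below is an illustrative number, not a recommendation.

```python
import time

STALE_AFTER = 90 * 24 * 3600  # 90 days, an illustrative threshold

def stale_facts(facts, now=None):
    # Return keys of facts whose last confirmation is past the threshold;
    # these get re-confirmed by the agent or expired.
    now = time.time() if now is None else now
    return [k for k, f in facts.items()
            if now - f["last_confirmed"] > STALE_AFTER]

now = time.time()
facts = {
    "team_size": {"value": 47, "last_confirmed": now - 120 * 24 * 3600},
    "units":     {"value": "metric", "last_confirmed": now},
}
needs_reconfirm = stale_facts(facts)
```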
Procedural drift. Symptom: a learned routine starts producing wrong outputs because the underlying tool or context changed. Fix: routines carry provenance and can be invalidated. The simplest version is a counter — every time the routine produces a "this doesn't look right" signal from the user, the routine is downgraded and eventually retired.
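The counter version of routine invalidation fits in a small class. The strike limit and provenance string are illustrative.

```python
class LearnedRoutine:
    def __init__(self, name, steps, provenance, max_strikes=3):
        self.name, self.steps = name, steps
        self.provenance = provenance   # e.g. which session it was learned from
        self.strikes = 0
        self.max_strikes = max_strikes
        self.active = True

    def report_bad_output(self):
        # Each "this doesn't look right" signal from the user is a strike;
        # enough strikes retire the routine so it must be re-learned.
        self.strikes += 1
        if self.strikes >= self.max_strikes:
            self.active = False

standup = LearnedRoutine("standup", ["pull commits", "summarize by project"],
                         provenance="learned from session s42")
for _ in range(3):
    standup.report_bad_output()
```

A retired routine simply stops matching in the library, so the agent falls back to fresh planning until a new successful sequence is saved.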
What to read next
If you want a wider survey of how Letta, mem0, LangGraph, and the OpenAI Agents SDK handle memory side by side, see agent memory architectures compared.
If you want to go deep on one framework, the Letta walkthrough shows how the human and persona blocks operationalize semantic memory while recall and archival storage cover the episodic layer.
If you want to see the multi-level memory pattern that spans episodic and semantic, the mem0 implementation guide walks through the extraction-and-deduplication approach.
For the broader context — how the in-application memory layer fits with within-session context and provider-managed memory — see the AI memory systems guide. For agent architecture more generally, see the Agentic Prompt Stack and the Context Engineering Maturity Model. For framework-specific prompting patterns that interact with memory, see the LangGraph guide and the OpenAI Agents SDK guide.