Episodic memory is memory of specific events — what happened, when, and where. Semantic memory is memory of general facts independent of when they were learned. Procedural memory is memory of how to do something. The three-way split comes from cognitive science (Tulving, 1972) and turns out to be a surprisingly useful architectural lens for designing AI agent memory.
This post is for engineers building agents that need to remember things across sessions. Most teams discover the three memory types by accident — they build an event log, then realize they need a fact store, then realize the agent keeps re-deriving the same workflows. Naming the layers up front saves a refactor. The cognitive framing is the entry point; the destination is implementation. For the broader memory landscape — within-session context, provider memory, application memory — see the AI memory systems guide.
Tip
Most production agents end up with three memory primitives — an event log, a fact store, and a learned-routine cache — even when teams don't realize that's what they've built. The cognitive labels (episodic, semantic, procedural) are useful precisely because they force you to name the layer before you build it.
Key takeaways
- The three-way split — episodic memory, semantic memory, procedural memory — comes from Tulving's 1972 work in cognitive psychology. It is widely used as a teaching frame even where the boundaries blur in practice.
- In agent design, episodic maps to the event log, semantic maps to the structured fact store, and procedural maps to learned routines and reusable action sequences.
- Most production agents, even simple ones, need all three. Collapsing them into one big store is the most common failure mode.
- The cognitive labels are an architectural lens, not a claim about model internals. Use them to organize what you build, not to describe what the model is doing.
- Procedural memory is the most overlooked layer. Agents without it relearn the same workflows turn after turn — a quiet tax on every repeated user task.
- The right retrieval flow usually goes semantic-first, procedural-second, episodic-on-demand. Searching the event log on every turn is expensive and noisy.
- Frameworks expose the layers differently. Letta separates them cleanly; mem0 implements multi-level memory that spans episodic and semantic; many DIY stacks have all three but no labels for them.
The three memory types in cognitive science
Cognitive psychology distinguishes between several long-term memory systems. Three of them concern us here.
Episodic memory is memory of specific events that happened to you. "I had coffee with Maria last Tuesday at the cafe near the office" is episodic. The memory is anchored in a particular time and place. You can replay it. If pressed, you can give it a date.
Semantic memory is memory of general facts you hold without remembering when you learned them. "Maria is the head of design" is semantic. So is "Paris is the capital of France." You did not file these facts with a timestamp; you just know them. Semantic memories often consolidate from many episodic encounters — you met Maria several times before "Maria is head of design" became a freestanding fact.
Procedural memory is memory of how to do something. Riding a bike. Touch-typing. Composing a status update in your team's preferred format. Procedural memories are usually not verbalizable — you can do the thing without being able to fully describe how. They are also robust: skills survive long after the conscious memory of learning them has faded.
Two honest caveats before we map this to agents. First, the three categories blur in practice — a memory can be partly episodic and partly semantic, and consolidation moves things between them. Second, modern cognitive science uses many more distinctions than these three. The Tulving split is a useful teaching frame; treat it as a starting taxonomy, not a complete one.
Mapping to AI agents
The cognitive frame becomes interesting when you notice that agent memory implementations naturally cluster around the same three shapes. Here is the mapping.
| Memory type | Cognitive definition | Agent-implementation analog | Typical storage | Typical retrieval | When this layer is load-bearing | Common framework primitive |
|---|---|---|---|---|---|---|
| Episodic | Specific events tied to time and place | Event log of past interactions | Vector store + metadata, append-only table, message archive | Similarity + metadata filter ("conversations from last week about X") | Replay debugging, "what did we discuss," cross-session reference | Letta recall storage, mem0 per-session add, conversation tables |
| Semantic | General facts independent of when learned | Structured fact store about user, project, or world | Key-value, structured rows, knowledge graph, deduplicated extracted memories | Direct lookup or vector retrieval over fact embeddings | Personalization, avoiding repeat questions, cross-session continuity | Letta human and persona blocks, mem0 user-level memory, profile tables |
| Procedural | Learned routines and skills | Learned tool-use patterns and preferred action sequences | Few-shot store, routine library, learned prompt templates | Pattern match on situation type | Repeated workflows, "do the usual" shortcuts, faster handling of recurring tasks | Tool-use exemplars, prompt template caches, learned-routine stores |
The table makes the architectural claim concrete: each cognitive memory type has a different storage shape, a different retrieval pattern, and a different failure mode when missing. That is the case for treating them as separate primitives rather than one big memory bucket.
Episodic memory in agents
The episodic layer is the event log. Every interaction the agent has — every turn, every tool call, every session — gets recorded with timestamps and identifiers. The store is usually append-only.
What it gives you: the ability to answer questions like "what did the user ask about last Tuesday," "show me the conversation where they mentioned the migration," and "what was the third thing we tried last session." It also gives you replay-debugging when something goes wrong — you can reconstruct exactly what state the agent was in.
What it does not give you: synthesized facts. The episodic store holds raw events. Asking "what does this user prefer for status updates" against an episodic-only store means scanning many past events and inferring the preference each time. That is expensive and noisy. You want the inferred preference promoted to the semantic layer.
Storage is typically a vector index over embedded turns or summarized sessions, with metadata for time, user, session, and tags. Some stacks use a Postgres table with full-text search — this works fine when episodic recall is mostly query-by-keyword rather than query-by-similarity. Letta's recall and archival storage both serve this layer; mem0's per-session add calls write events that can later be searched.
The retrieval pattern that works is similarity + metadata filter. Pure similarity over a long event log returns near-duplicates from many sessions; you almost always want to scope by user and time. The recall flow at the application layer often looks like: "fetch the last N turns of this session always, then on demand search the episodic store for similar past events from this user."
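The filter-then-rank pattern can be sketched in a few lines. This is a toy in-memory version: all names are illustrative, and shared-word count stands in for the vector similarity a real index would provide.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Event:
    user_id: str
    session_id: str
    text: str
    timestamp: float
    tags: list = field(default_factory=list)

class EpisodicStore:
    def __init__(self):
        self.events = []  # append-only log

    def append(self, event: Event):
        self.events.append(event)

    def search(self, user_id, query, since=0.0, k=3):
        # Metadata filter first (user + time window), then rank by a toy
        # similarity score: shared-word count in place of vector similarity.
        q = set(query.lower().split())
        candidates = [e for e in self.events
                      if e.user_id == user_id and e.timestamp >= since]
        scored = [(len(q & set(e.text.lower().split())), e) for e in candidates]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]

store = EpisodicStore()
store.append(Event("u1", "s1", "discussed the database migration plan", time.time()))
store.append(Event("u2", "s9", "migration talk from another user", time.time()))
hits = store.search("u1", "what did we discuss about the migration")
```

Because the metadata filter runs before ranking, the other user's superficially similar event never surfaces, which is exactly the near-duplicate problem the scoping solves.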
Semantic memory in agents
The semantic layer is the structured fact store. It holds the facts the agent should know about the user, the project, or the world, without needing to re-derive them from the event log.
Examples: "User prefers metric units." "User's company has 47 employees." "User's primary language is Spanish." "User is technical — skip the explanations of basic concepts." These are facts that are true across many sessions and should always (or nearly always) reach the prompt.
Storage shapes vary. The simplest is a row of structured fields per user — a profile table. More flexible is a key-value or document store of named facts. Most flexible is a deduplicated, embedded fact store where each fact is a short embedded snippet — this is the mem0 implementation guide pattern. Knowledge graphs work too, especially when relationships between facts matter.
Letta's human and persona blocks are semantic memory by construction. Both blocks are always in the agent's context window — they are not retrieved on demand because they are facts the agent should never have to look up. That is a key property of the semantic layer: the most stable, most-used facts can earn always-in-context status, while less stable facts can be retrieved on demand.
The hard problem in semantic memory is write policy: when do you add a fact, when do you update an existing fact, and when do you remove a stale one. mem0 solves this with an LLM-driven extraction-and-deduplication pass on every interaction. Letta solves it by giving the agent tools to write to its own memory blocks. The DIY pattern is usually a periodic batch job that re-extracts facts from the recent episodic log. None of these are settled best practice — pick one and instrument it.
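A minimal sketch of one write policy, upsert-by-key with a freshness timestamp, makes the add/update/re-confirm distinction concrete. The extraction step that produces candidate facts is out of scope here; in practice it is an LLM call (mem0-style) or a batch job.

```python
import time

class SemanticStore:
    def __init__(self):
        self.facts = {}  # key -> {"value": ..., "last_confirmed": ...}

    def upsert(self, key, value):
        existing = self.facts.get(key)
        if existing and existing["value"] == value:
            # Same fact re-extracted: refresh freshness, no duplicate entry.
            existing["last_confirmed"] = time.time()
        else:
            # New or changed fact: overwrite rather than accumulate.
            self.facts[key] = {"value": value, "last_confirmed": time.time()}

facts = SemanticStore()
facts.upsert("units", "metric")
facts.upsert("units", "metric")    # duplicate: timestamp refresh only
facts.upsert("units", "imperial")  # changed: stale value replaced
```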
Procedural memory in agents
The procedural layer is the most overlooked. It is the store of learned routines: the sequences of tool calls, prompt scaffolds, and action patterns the agent reaches for when a familiar situation comes up.
Concrete examples. "When the user says 'do the usual standup,' run this four-step sequence: pull yesterday's commits, summarize them by project, add open PRs, format as bullets." "When summarizing for this user, lead with metrics not narrative." "When the user asks for a draft email, use this voice and never include emojis." These are routines that emerged from past interactions and should be fast-path next time.
Most agents fake procedural memory by stuffing examples into the system prompt. That works for global procedures (every user gets the same examples) but breaks down for per-user routines (the system prompt would balloon). A real procedural-memory layer learns the pattern from interaction and retrieves it when a similar situation comes up.
Storage is typically a few-shot store keyed by situation type, or a routine library where each routine has a trigger pattern and a sequence of steps. Retrieval is pattern-matching: "what kind of request is this, and do I have a learned routine for it." The retrieved routine then enters the prompt as a scaffold rather than as raw context — "here is how you handled this last time."
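A routine library with trigger matching can be sketched as follows. Word overlap stands in for embedding similarity, and the threshold plays the role of the confidence cutoff; every name here is illustrative.

```python
class RoutineLibrary:
    def __init__(self, threshold=2):
        self.routines = []
        self.threshold = threshold  # minimum match score to fast-path

    def save(self, trigger, steps, provenance):
        self.routines.append(
            {"trigger": trigger, "steps": steps, "provenance": provenance})

    def match(self, request):
        # Toy pattern match: score each trigger by shared words with the
        # request; a real stack would embed triggers and compare vectors.
        words = set(request.lower().split())
        best, best_score = None, 0
        for r in self.routines:
            score = len(words & set(r["trigger"].lower().split()))
            if score > best_score:
                best, best_score = r, score
        # Only return a routine on a high-confidence match; otherwise the
        # agent falls back to fresh planning.
        return best if best_score >= self.threshold else None

lib = RoutineLibrary()
lib.save("do the usual standup",
         ["pull yesterday's commits", "summarize by project",
          "add open PRs", "format as bullets"],
         provenance="learned from session s42")
routine = lib.match("can you do the standup please")
```

On a match, the retrieved `steps` enter the prompt as a scaffold ("here is how you handled this last time") rather than as raw context.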
The reason procedural memory matters is that it is where the agent gets faster at a particular user's repeated work. Without it, every "do the usual" request triggers fresh planning. With it, the agent recognizes the pattern and executes. Frameworks like LangGraph and the OpenAI Agents SDK do not ship a procedural-memory primitive out of the box, but you can layer one on by storing successful action sequences keyed by trigger and retrieving them in the planner.
Why most agents need a hybrid
Single-layer agents fail in predictable ways.
An agent with only episodic memory drowns in noise. It can recall every conversation but cannot answer simple personalization questions without scanning the whole log. It re-extracts the same facts every turn. The user feels like the agent has a great memory for what was said and a terrible memory for what they actually prefer.
An agent with only semantic memory loses the ability to reference specific events. "What did we talk about Tuesday" returns nothing because the agent does not store events, only facts. The agent feels stateless within a project even though it knows facts about the user.
An agent with no procedural memory keeps relearning the same workflows. Every "do the standup" request triggers fresh planning instead of recognizing the pattern. The agent gets no faster at the work it does most often. Users notice this as a kind of plateau — the agent is competent but not improving.
The right design uses all three with clear boundaries. Episodic for the event log. Semantic for the facts. Procedural for the routines. Each has its own write policy, retrieval pattern, and place in the prompt. For the broader argument about how these fit into the agent stack, see the Agentic Prompt Stack and the Context Engineering Maturity Model.
Architectural patterns
A concrete recipe for a hybrid memory architecture.
Storage. Three logical stores, even if they share infrastructure. Episodic: vector index over embedded turn summaries with metadata (user_id, session_id, timestamp, tags). Semantic: structured profile (rows or documents) plus an optional embedded fact store for less stable facts. Procedural: keyed routine library where each routine has a trigger description (embedded for similarity match), a sequence of tool calls or prompt scaffolds, and provenance ("learned from session X").
Write policy. Append to episodic on every interaction. Update semantic via an extraction pass, either inline (small agents, mem0-style) or batch (larger systems, periodic re-extraction). Add to procedural when an action sequence succeeds and the trigger is recognizably reusable — this is usually a deliberate "save this routine" call rather than automatic.
Recall flow. On a new user turn:
- Always include the most stable semantic facts in context (the always-in-context layer — Letta walkthrough for the canonical example).
- Search procedural for routines that match the current situation. If a high-confidence match exists, the routine becomes a scaffold for the agent's response.
- Search semantic for less stable facts relevant to the current request (vector retrieval over the embedded fact store).
- Optionally search episodic for specific past events the agent might need to reference. Skip this if the request is purely about facts or routines.
- Assemble the context: stable facts → matched routine → relevant facts → relevant past events → user turn.
- Call the model.
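The recall flow above can be sketched as a context assembler. The retrieval steps are stubbed with plain values so the only thing the example shows is the assembly order; the section labels are illustrative, not a prescribed prompt format.

```python
def assemble_context(stable_facts, routine, relevant_facts, past_events, user_turn):
    parts = []
    # 1. Always-in-context semantic facts come first.
    parts.append("FACTS (always in context): " + "; ".join(stable_facts))
    # 2. Procedural: a matched routine shapes HOW the agent responds.
    if routine is not None:
        parts.append("ROUTINE: " + " -> ".join(routine))
    # 3. Less stable semantic facts, retrieved for this request.
    if relevant_facts:
        parts.append("RELEVANT FACTS: " + "; ".join(relevant_facts))
    # 4. Episodic events only when the request needs specific history.
    if past_events:
        parts.append("PAST EVENTS: " + "; ".join(past_events))
    parts.append("USER: " + user_turn)
    return "\n".join(parts)

prompt = assemble_context(
    stable_facts=["name: Maria", "language: Spanish"],
    routine=["pull commits", "summarize by project", "format as bullets"],
    relevant_facts=["project X deadline is Friday"],
    past_events=[],  # episodic search skipped for this request
    user_turn="do the usual standup",
)
```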
The order matters. Searching episodic first floods the context with raw history; searching semantic first gives you the facts that should always be present. Procedural sits between because it shapes how the agent will respond, not what it will respond about.
Always-in-context vs retrieved. This is the central design decision in semantic memory. Stable, frequently-used facts (user's name, role, language preference) earn always-in-context status — the prompt always carries them. Less stable or less frequently-used facts (specific project details, historical preferences) get retrieved on demand. The split is a budget problem: the always-in-context layer is precious. See the SurePrompts Quality Rubric for how to evaluate whether the prompt is well-shaped before you spend more compute on memory retrieval.
When the cognitive framing breaks down
Be honest about the limits of the analogy. LLMs do not have neuroscientific separation between episodic, semantic, and procedural systems. Their parameters mix everything from training. Any "memory" outside the parameters is whatever you choose to build. The model itself does not know which of your stores it is reading from.
The cognitive labels also do not perfectly map. Some memories are partly episodic and partly semantic. Procedural memory in humans is largely non-verbal, which is why we do not lose typing skill when we forget the experience of learning to type — the analogy to LLMs, which only operate in language, is loose. And cognitive science has many more distinctions than these three (working memory, perceptual memory, motor memory) that do not map cleanly to agent design at all.
The benefit of the framing is design clarity, not biological accuracy. When you separate episodic from semantic in your storage, you stop conflating two different retrieval problems. When you name procedural memory, you start noticing the times your agent re-derives the same routine. The labels earn their keep when they force design decisions; they stop earning their keep when you treat them as a literal description of the model. Use working memory and memory recall the same way — as borrowed structure, not as claims about what the model is doing inside.
For the relationship between this in-application memory layer and the model's context window, or its long-term memory features at the provider level, see the AI memory systems guide.
Common failure modes
A few patterns to watch for when implementing the layers.
Collapsing episodic and semantic into one big embedding store. Symptom: recall stops being precise. Asking "what does the user prefer" returns a mix of raw past turns and extracted facts, with the facts buried under more numerous events. Fix: separate the stores. Episodic stays append-only and noisy; semantic stays deduplicated and high-signal. Retrieval against each is independent.
Implementing only episodic and pretending it covers semantic. Symptom: agent asks the same questions every session. "What's your preferred unit system?" "What's your team size?" The information is in the event log, but the agent has no extracted-fact layer to make it cheap to retrieve. Fix: add the semantic layer with an extraction pass over recent episodic events.
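The fix can be sketched as a batch pass over recent episodic events. A regex stands in for the LLM extraction call a real system would make; the pattern and fact key are illustrative.

```python
import re

def extract_facts(episodic_events):
    # Promote answers buried in raw events to named semantic facts.
    facts = {}
    for text in episodic_events:
        m = re.search(r"preferred unit system is (\w+)", text.lower())
        if m:
            facts["units"] = m.group(1)
    return facts

recent_events = [
    "user: hi again",
    "user: my preferred unit system is metric, by the way",
    "agent: noted!",
]
facts = extract_facts(recent_events)
```

Once extracted, the fact is a cheap direct lookup on every future session instead of a scan of the event log.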
Forgetting procedural entirely. Symptom: agent never gets faster at the user's repeated workflows. Every "do the standup" or "send the weekly update" triggers fresh planning. Fix: add a routine library, even a small one. Start with five hand-curated routines. Add a "save this routine" tool the agent can call when an action sequence succeeds.
Always-in-context bloat in the semantic layer. Symptom: the prompt grows turn by turn as more facts get promoted to always-in-context. Eventually you are paying for a 30k-token system prompt on every call. Fix: tier the semantic layer. Top tier (always in context) stays small — the five to ten most-used facts. Lower tier is retrieved on demand. See RCAF for how to keep the always-in-context part shaped well.
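One simple way to implement the tiering, assuming each fact carries a usage counter, is to rank by use and cap the top tier:

```python
def tier_facts(facts, top_n=5):
    # facts: {key: {"value": ..., "uses": ...}}; rank by usage and cap the
    # always-in-context tier, everything else stays retrievable on demand.
    ranked = sorted(facts.items(), key=lambda kv: kv[1]["uses"], reverse=True)
    always_in_context = dict(ranked[:top_n])
    on_demand = dict(ranked[top_n:])
    return always_in_context, on_demand

facts = {f"fact_{i}": {"value": i, "uses": i} for i in range(7)}
always, on_demand = tier_facts(facts, top_n=5)
```

The ranking signal is a design choice; usage count is the simplest, but recency or a manual "pin" flag work under the same structure.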
Stale semantic facts. Symptom: agent acts on facts that were true six months ago but are not now. Fix: every semantic fact carries a last_confirmed timestamp and an optional source episode. Periodic re-extraction updates the timestamp; facts past a staleness threshold get re-confirmed by the agent or expired.
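The staleness check is a one-liner once facts carry `last_confirmed`. The 90-day threshold below is an illustrative number, not a recommendation.

```python
import time

STALE_AFTER = 90 * 24 * 3600  # 90 days, an illustrative threshold

def stale_facts(facts, now=None):
    # Return keys of facts whose last confirmation is past the threshold;
    # these get re-confirmed by the agent or expired.
    now = time.time() if now is None else now
    return [k for k, f in facts.items()
            if now - f["last_confirmed"] > STALE_AFTER]

now = time.time()
facts = {
    "team_size": {"value": 47, "last_confirmed": now - 120 * 24 * 3600},
    "units":     {"value": "metric", "last_confirmed": now},
}
needs_reconfirm = stale_facts(facts)
```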
Procedural drift. Symptom: a learned routine starts producing wrong outputs because the underlying tool or context changed. Fix: routines carry provenance and can be invalidated. The simplest version is a counter — every time the routine produces a "this doesn't look right" signal from the user, the routine is downgraded and eventually retired.
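The counter version of routine invalidation fits in a small class. The strike limit and provenance string are illustrative.

```python
class LearnedRoutine:
    def __init__(self, name, steps, provenance, max_strikes=3):
        self.name, self.steps = name, steps
        self.provenance = provenance   # e.g. which session it was learned from
        self.strikes = 0
        self.max_strikes = max_strikes
        self.active = True

    def report_bad_output(self):
        # Each "this doesn't look right" signal from the user is a strike;
        # enough strikes retire the routine so it must be re-learned.
        self.strikes += 1
        if self.strikes >= self.max_strikes:
            self.active = False

standup = LearnedRoutine("standup", ["pull commits", "summarize by project"],
                         provenance="learned from session s42")
for _ in range(3):
    standup.report_bad_output()
```

A retired routine simply stops matching in the library, so the agent falls back to fresh planning until a new successful sequence is saved.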
What to read next
If you want a wider survey of how Letta, mem0, LangGraph, and the OpenAI Agents SDK handle memory side by side, see agent memory architectures compared.
If you want to go deep on one framework, the Letta walkthrough shows how the human and persona blocks operationalize semantic memory while recall and archival storage cover the episodic layer.
If you want to see the multi-level memory pattern that spans episodic and semantic, the mem0 implementation guide walks through the extraction-and-deduplication approach.
For the broader context — how the in-application memory layer fits with within-session context and provider-managed memory — see the AI memory systems guide. For agent architecture more generally, see the Agentic Prompt Stack and the Context Engineering Maturity Model. For framework-specific prompting patterns that interact with memory, see the LangGraph guide and the OpenAI Agents SDK guide.