AI Memory Systems Guide (2026): Within-Session, Provider, and Application

How memory works in AI systems — within-session context, provider-managed memory like ChatGPT memory and Claude Projects, and application-managed custom memory.

SurePrompts Team
April 20, 2026
11 min read

TL;DR

AI memory comes in three shapes: within-session context, provider-managed memory (ChatGPT memory, Claude Projects), and application-managed custom memory. Each has different retention, invalidation, and privacy semantics.

"Memory" in AI systems is overloaded. It can mean the current conversation, a provider feature that persists facts across chats, or a custom retrieval layer on top of a model. These are three different systems with different semantics — and conflating them is how teams end up with leaky prompts or models that don't remember anything they should. This guide, under the context engineering pillar, disentangles the three shapes of AI memory.

The Three Shapes of AI Memory

Practical LLM memory falls into one of three shapes.

Within-session context is the messages in the current conversation. The model "remembers" them only because they're resent on every turn. End the session and the memory is gone unless something else wrote it down.

Provider-managed memory is a vendor feature operating on your behalf. ChatGPT's memory feature learns facts across conversations; Claude Projects gives you a space with shared files and instructions persisting across chats in that project. The provider runs storage and retrieval.

Application-managed memory is a retrieval layer you build. You decide what to save, where to store it (vector database, Postgres row, key-value store), and how it reaches the prompt. The model has no memory; your application has the memory and injects it into context.

The critical distinction: within-session and provider-managed memory live in places you don't fully control. Application-managed memory lives where you put it. That difference drives most of the privacy and invalidation choices below.

Within-Session Context

The within-session shape is what most prompts implicitly assume. The "memory" is the message history plus any system prompt, resent on each turn.

What resets it. A new session, a new tab in some surfaces, clicking "new chat." Some products also silently compact or truncate history once it crosses a length threshold — the model "forgets" earlier turns even though the UI still shows them.

What persists it. Scrolling back to reference earlier messages keeps them in the prompt; some products allow pinning. Limits come from the model's context window and the product's budgeting rules — see context window management strategies for how products decide what to keep.

Within-session fits when the task lives inside one conversation: a drafting pass, a debug session, a one-off analysis. Wrong shape when you expect the model to know tomorrow what you told it today.
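A minimal sketch makes the "resent on every turn" point concrete. `call_model` below is a hypothetical stand-in for any chat-completion API; the point is that the model sees only what is in `history` on each call.

```python
def call_model(messages):
    # Placeholder: a real implementation would call a provider API here.
    return f"(reply to: {messages[-1]['content']})"

history = [{"role": "system", "content": "You are a helpful assistant."}]

def turn(user_message):
    history.append({"role": "user", "content": user_message})
    reply = call_model(history)  # the FULL history goes out on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

turn("My name is Ada.")
turn("What is my name?")  # answerable only because turn 1 is still in `history`

# End the process and `history` is gone. Nothing persists unless
# something else wrote it down.
```

If the product silently truncates `history` once it crosses a length budget, the first turn drops out and the model "forgets" the name, even though the UI may still display it.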

Provider-Managed Memory

Provider-managed memory is a feature you turn on and let the vendor operate. Two representative shapes:

ChatGPT memory learns facts across conversations — user preferences, recurring projects, stable context the user wants the assistant to keep in mind. It can be viewed, edited, and turned off.

Claude Projects scopes a set of conversations around shared files, system instructions, and project-level context. Chats inside the project see that context; chats outside don't. It's memory by scope rather than accumulated facts — a project carries a shared brief across related chats.

Pros. Zero infrastructure. The UI handles retention and deletion. Integration with the model is tight.

Cons. Opaque semantics. You can't inspect retrieval decisions. Controls are the vendor's. And memory only works in that vendor's product — a fact ChatGPT "remembers" isn't available to Claude or your own app.

Provider-managed memory fits personal, within-vendor continuity. It's the wrong shape when memory needs to be portable, auditable, or programmatically controlled by your application.

Application-Managed Memory

Application-managed memory is what teams build when the task needs retention the provider doesn't offer. You define the store, the write policy, the retrieval policy, and how retrieved material enters the prompt.

Common shapes:

  • Vector store of notes. Embedded snippets keyed to a user; retrieval pulls top-k relevant notes into each prompt.
  • Structured profile. A row per user holding known fields (preferences, entitlements, open tasks), read on demand.
  • Episodic log. An append-only log of past interactions, summarized or filtered before injection.
  • Long-running task state. For agents, the scratchpad across sessions — what's been tried, pending, or failed.

You read the store, assemble it into the prompt — often alongside real-time retrieval — then let the model reason. Patterns like hierarchical context loading and dynamic context assembly are how application-managed memory gets shaped into a prompt.

Necessary whenever you need portable memory, programmatic control over what's remembered and forgotten, cross-session agent state, or auditability. The cost is real engineering: keeping stores in sync, keeping retrieval relevant, and keeping injection inside the context budget.
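The shapes above can be sketched in a few lines. This is a hypothetical application-managed layer combining a structured profile with a note store; retrieval here is naive word overlap, where a real system would use embeddings and a vector store, but the read-assemble-inject shape is the same.

```python
import re
from dataclasses import dataclass, field

def _tokens(s: str) -> set:
    return set(re.findall(r"\w+", s.lower()))

@dataclass
class MemoryStore:
    profile: dict = field(default_factory=dict)  # stable structured fields
    notes: list = field(default_factory=list)    # episodic snippets

    def remember(self, note: str):
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 3):
        # Naive relevance: count shared words. Swap in cosine similarity
        # over embeddings for anything real.
        q = _tokens(query)
        scored = sorted(self.notes,
                        key=lambda n: len(q & _tokens(n)),
                        reverse=True)
        return scored[:k]

def build_prompt(store: MemoryStore, user_message: str) -> str:
    notes = "\n".join(f"- {n}" for n in store.retrieve(user_message))
    return (f"PROFILE:\n{store.profile}\n\n"
            f"NOTES:\n{notes}\n\n"
            f"USER:\n{user_message}")

store = MemoryStore(profile={"name": "Ada", "plan": "pro"})
store.remember("User prefers answers in bullet points")
store.remember("User's last ticket was about billing")
print(build_prompt(store, "I have a billing question"))
```

The model never touches the store; the application reads it, assembles the prompt, and the model reasons over whatever was injected.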

Retention and Invalidation

Memory without invalidation becomes wrong memory.

Within-session. Invalidation is implicit — the next turn sees the current history, including corrections. Outside the session, nothing persisted, so there's nothing to invalidate.

Provider-managed. Updates happen through the provider's controls. ChatGPT memory supports adding, editing, and removing entries via the UI; Claude Projects lets you edit the project's shared context. Override semantics are the provider's call.

Application-managed. You own the invalidation policy — last-write-wins on structured fields, timestamp-weighted retrieval on notes, explicit "forget" endpoints, TTLs on volatile state. If you need "user asked us to delete their preferences" to be genuinely honored and auditable, application-managed is the only shape that makes that trivial.
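Two of those policies are small enough to sketch directly. The class and method names below are illustrative: last-write-wins on structured fields, plus a TTL and an explicit forget endpoint on volatile notes.

```python
import time

class ProfileStore:
    """Structured fields with last-write-wins semantics."""
    def __init__(self):
        self._fields = {}  # field -> (value, written_at)

    def write(self, name, value):
        self._fields[name] = (value, time.time())  # later write replaces earlier

    def read(self, name):
        value, _ = self._fields[name]
        return value

class NoteStore:
    """Volatile notes with a TTL and an explicit forget endpoint."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._notes = []  # (note, written_at)

    def add(self, note):
        self._notes.append((note, time.time()))

    def live_notes(self, now=None):
        now = time.time() if now is None else now
        # Expired notes simply stop being retrievable.
        return [n for n, t in self._notes if now - t < self.ttl]

    def forget(self, predicate):
        # Explicit deletion: the entry is gone, not just hidden.
        self._notes = [(n, t) for n, t in self._notes if not predicate(n)]
```

The point is that every path by which a wrong entry leaves the store is your code, so "delete my preferences" can be honored, logged, and audited.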

Failure mode: long-running systems without invalidation accumulate stale notes. Retrieval surfaces something true a year ago into a prompt about today, and the model contradicts reality. Every memory system needs a story for how wrong entries get out.

Privacy and Data Boundaries

Each shape has a different data boundary, and mixing them is how leaks happen.

Within-session. The conversation goes to the provider under your account's terms; nothing persists beyond the provider's logs and whatever its training-data policy allows.

Provider-managed. Memory lives on the provider's infrastructure under their terms. Consumer, enterprise, and API tiers typically have different terms; if you don't know which applies, you can't responsibly use provider memory for sensitive data.

Application-managed. Memory lives where you put it — your database, your region, your controls. That's the point, and what makes compliance stories workable.

Three practical rules: don't put provider-managed memory in front of data whose handling you don't own end-to-end; don't leak application-managed memory into a provider context that logs or trains on it unless your terms allow; treat memory boundaries as authorization — a fact about user A must not be retrievable into a prompt serving user B.
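The third rule, memory boundaries as authorization, can be enforced structurally rather than by filtering. A hypothetical sketch: facts are keyed by user, and the requesting user's id is the only lookup key, so no query path can cross the boundary.

```python
class ScopedMemory:
    def __init__(self):
        self._by_user = {}  # user_id -> list of remembered facts

    def write(self, user_id, fact):
        self._by_user.setdefault(user_id, []).append(fact)

    def retrieve(self, requesting_user_id):
        # The requester's id is the ONLY key used for lookup. There is no
        # global search that could surface another user's facts.
        return list(self._by_user.get(requesting_user_id, []))

mem = ScopedMemory()
mem.write("user_a", "prefers dark mode")
assert mem.retrieve("user_b") == []  # user A's facts never serve user B
```

Contrast this with a single shared vector index queried by semantic similarity: there, isolation depends on a metadata filter being applied on every query, and forgetting that filter once is the leak.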

Memory for Agents

Agents push memory past anything a chat product needs. A useful agent operates across sessions, remembers what it tried, knows what's pending, and doesn't repeat failed work. Three specific needs show up:

  • Episodic memory. A log of past runs — attempts, successes, user corrections. Grounds "based on what we did last time" reasoning.
  • Task state. Work the agent was in the middle of executing when a session ended, picked back up on the next run.
  • World knowledge. Stable facts about the user, environment, and domain that don't change between sessions.

All three are almost always application-managed. Provider memory wasn't designed for agents, and within-session memory dies with the session. For any agent that lives longer than a single turn, memory is its own subsystem.
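The three needs can live in one application-managed store. A hypothetical sketch, with all names illustrative: an episodic log of runs, mutable task state, and stable world knowledge, serialized into a context block the next session reads back.

```python
import json
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    episodes: list = field(default_factory=list)    # episodic: past runs
    task_state: dict = field(default_factory=dict)  # pending / failed work
    world: dict = field(default_factory=dict)       # stable facts

    def record_run(self, action, outcome):
        self.episodes.append({"action": action, "outcome": outcome})

    def to_context(self, last_n=5):
        # What gets injected into the next run's prompt. Summarize or
        # filter episodes before injection in anything real.
        return json.dumps({
            "recent_runs": self.episodes[-last_n:],
            "pending": self.task_state,
            "known": self.world,
        }, indent=2)

mem = AgentMemory(world={"repo": "acme/api"})
mem.record_run("ran test suite", "3 failures in auth module")
mem.task_state["next"] = "fix auth failures"
# A later session reads mem.to_context() back into its prompt -- the agent
# "remembers" only because this store outlived the session.
```

Persist the store between sessions (a database row, a file, a key-value entry) and the agent picks up where it left off; lose it and every run starts from zero.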

Choosing the Right Shape

Match shape to task:

| Dimension | Within-session | Provider-managed | Application-managed |
| --- | --- | --- | --- |
| Lifetime | This conversation | Across the vendor's conversations | Whatever your policy says |
| Storage location | In-flight with the request | Provider's infrastructure | Yours |
| Invalidation | Overwrite by later turns | Vendor UI controls | Your code |
| Portability across vendors | None | None | Full |
| Privacy control | Terms of the request | Vendor's terms | Yours |
| Audit trail | Conversation log | Provider-dependent | As detailed as you build |
| Engineering effort | None | Near-zero | Significant |
| Best for | One-off tasks | Personal continuity within a vendor | Multi-session apps, agents, compliance-heavy workloads |

Default reasoning: if the task fits in one conversation, don't add a memory system. If the user wants continuity inside a vendor's product, provider-managed is probably enough. If you're building a product, an agent, or anything cross-vendor or audited, you're in application-managed territory.

Example Memory-Aware Prompt Pattern

Hypothetical — a support assistant prompt drawing on application-managed memory.

```text
[SYSTEM]
You are a customer support assistant for ACME. You have access to:
1. The current conversation (most recent 10 turns).
2. A PROFILE block with stable facts about this user (fetched from our DB).
3. A RECENT_ISSUES block summarizing the user's last 3 support tickets.
4. A KB_SNIPPETS block of up to 5 knowledge-base excerpts retrieved by
   relevance to the latest user message.

Rules:
- Treat PROFILE and RECENT_ISSUES as authoritative about the user.
- Treat KB_SNIPPETS as authoritative about product behavior.
- If PROFILE and KB_SNIPPETS conflict, prefer KB_SNIPPETS and flag the
  conflict to the user.
- Never claim to remember anything not in these blocks.

[CONTEXT]
PROFILE:
{profile_block}

RECENT_ISSUES:
{recent_issues_block}

KB_SNIPPETS:
{kb_snippets_block}

[USER]
{user_message}
```

The shape: the application is the memory. The prompt names what the model may treat as remembered, where conflicts go, and forbids invented continuity beyond the blocks. Every remembered fact is an explicit block — nothing is left to a hidden "memory" feature. For the broader patterns, see context engineering best practices.
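Filling that template is plain string assembly on the application side. A hypothetical sketch, with the data sources stubbed as arguments; the block names mirror the prompt above.

```python
# Illustrative template: an abbreviated version of the support-assistant
# prompt, with the same placeholder block names.
PROMPT_TEMPLATE = """[SYSTEM]
You are a customer support assistant for ACME.
Never claim to remember anything not in these blocks.

[CONTEXT]
PROFILE:
{profile_block}

RECENT_ISSUES:
{recent_issues_block}

KB_SNIPPETS:
{kb_snippets_block}

[USER]
{user_message}"""

def assemble(profile, issues, snippets, user_message):
    # The application enforces the budgets the prompt promises:
    # last 3 tickets, at most 5 KB snippets.
    return PROMPT_TEMPLATE.format(
        profile_block=profile,
        recent_issues_block="\n".join(f"- {i}" for i in issues[-3:]),
        kb_snippets_block="\n".join(f"- {s}" for s in snippets[:5]),
        user_message=user_message,
    )

prompt = assemble("name: Ada, plan: pro",
                  ["ticket 101: login loop", "ticket 102: billing error"],
                  ["Refunds are processed within 5 business days."],
                  "Where is my refund?")
```

Note that the truncation rules ("last 3", "up to 5") live in application code, not in the model: the prompt only describes what the application already guaranteed.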

Common Anti-Patterns

  • Mixing shapes without naming it. Using provider memory for some facts and application memory for others, with no rule about which wins, creates ghost-remembered facts that surface unpredictably.
  • Over-relying on provider memory. Treating ChatGPT memory or Claude Projects as your product's memory means your product breaks the day the user switches vendors — and you can't audit what the model saw.
  • Building custom when provider suffices. A solo user who wants their assistant to remember their cat's name doesn't need a vector store. Matching scope matters both ways.
  • No invalidation story. Append-only memory becomes wrong memory. How a corrected fact overrides an old one is a design question, not a future problem.
  • Leaky boundaries. Memory about user A retrievable into a prompt serving user B. An authorization bug that shows up in the memory layer.
  • Treating within-session as persistent. True in a session, false outside it. Any feature that depends on memory needs an explicit mechanism.

FAQ

Do LLMs have memory?

Not on their own. A base LLM is stateless — every request is independent. What looks like memory is one of three shapes: the conversation being resent, a provider feature storing facts on your behalf, or an application retrieving and injecting facts into the prompt.

Can I rely on ChatGPT memory or Claude Projects for production features?

For personal continuity inside those products, yes. As the memory layer of a product you're building, no — you can't inspect retrieval decisions, enforce your own invalidation rules, or carry memory across vendors. Build an application-managed layer when the feature matters.

Where does RAG fit in this?

Retrieval-augmented generation is one way to implement application-managed memory. RAG usually means vector retrieval over a document corpus; memory-focused RAG retrieves past interactions, user profile data, and task state. Same mechanism, different source material. See context engineering for the broader picture.

How much should I worry about memory privacy?

A lot. Memory is where personal data accumulates over time, often outside the user's direct awareness. For any feature touching user data, map which shape holds what, the retention, how deletion works, and the provider's terms. If you can't draw that map, don't ship the feature.

Should agents use provider memory?

Generally no. Provider memory is designed for personal continuity inside a chat product. Agents need episodic memory, task state, and domain-specific structure that provider memory doesn't expose.

Wrap-Up

Memory in AI systems is not one thing. Within-session is the conversation, resent. Provider-managed is a vendor feature. Application-managed is a retrieval layer you own. Each has a different lifetime, invalidation model, privacy shape, and engineering cost. Treating them as interchangeable is the mistake — the right shape for a solo user's preferences is the wrong shape for a production agent. Name the shape, own its invalidation, draw its data boundary, and only then worry about the prompt.

For the broader frame, the context engineering pillar. For composing memory into prompts, hierarchical context loading and dynamic context assembly patterns. For principles around the prompt those memory blocks land in, context engineering best practices. For the underlying discipline, context engineering.
