Working Memory

Working memory is the short-term, active memory that holds the current task context. The term comes from cognitive psychology, where working memory is the limited-capacity workspace for ongoing reasoning. In LLM agents, working memory maps directly onto the model's context window: what the model can "see" right now and reason over. It is distinct from long-term memory (persistent storage that must be recalled into working memory before use) and from conversation memory (session-scoped, and often partially overlapping with working memory). The working-memory limit is the hard ceiling on what the agent can hold in mind at once, regardless of how much long-term memory it has.

Example

A research agent with a 1M-token context window has a large working memory: it can hold a long brief, retrieved sources, and the running plan all at once. The same agent on a 32K-token model has a small working memory and must page material in and out via recall calls. The architecture choice between pushing everything into context and recalling on demand is a working-memory budget decision.
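That budget decision can be sketched in code. The snippet below is a minimal, hypothetical illustration (the function names and the crude 4-characters-per-token estimate are assumptions, not a real library API): given a token budget, it packs the brief plus as many sources as fit into working memory and defers the rest for later recall.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Real systems
    # would use the model's actual tokenizer.
    return max(1, len(text) // 4)

def fit_to_budget(brief: str, sources: list[str], budget_tokens: int):
    """Pack the brief plus as many sources as fit within the
    working-memory budget; anything left over must be paged in
    later via a recall call."""
    context = [brief]
    used = estimate_tokens(brief)
    deferred = []
    for src in sources:
        cost = estimate_tokens(src)
        if used + cost <= budget_tokens:
            context.append(src)
            used += cost
        else:
            deferred.append(src)
    return context, deferred
```

With a large budget, `deferred` stays empty and everything is pushed into context; with a small one, most sources land in `deferred` and the agent must rely on recall on demand.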
