LangGraph is the LangChain team's Python framework for building stateful, multi-actor LLM applications as directed graphs. You declare a typed state object, register nodes that read it and return updates, wire them with edges — including conditional edges that route on LLM output — compile the graph, and invoke or stream it. Persistence, human-in-the-loop pauses, and replay come from a pluggable checkpointer.
For prompt engineering, the shift is concrete: instead of one long prompt that tries to plan, execute, and synthesize, you write many small prompts that each do one job and share an explicit state contract. The hard parts move from "make the prompt clever" to "design state and the routing decisions on top of it."
Tip
Adopt LangGraph when control flow is the actual problem — branching on intermediate output, coordinating specialists, or pausing for a human. Until then, a single prompt with retrieval and structured output is usually the right tool.
Key takeaways:
- LangGraph is a graph runtime, not a prompt template. The graph defines control flow; nodes do the LLM work.
- State is the contract every node prompt reads from and writes to. Get the schema right before writing any prompts.
- Node prompts should name the node's job, reference state fields by name, and return a state delta — not free-form prose.
- Conditional edges work best when the routing model emits one of a small enum of values, validated before dispatch.
- interrupt plus a persistent checkpointer is the standard pattern for human-in-the-loop that survives restarts.
- The default failure modes are state bloat, router hallucination, over-reading nodes, and infinite loops. Each has a known fix.
- LangGraph integrates with LangChain but does not require it. Any model client and any tool layer works inside a node.
What LangGraph Is
LangGraph is a Python-first framework for declaring agent graphs — directed graphs where nodes are functions (often LLM calls) and edges are control flow. The core abstraction is StateGraph: you instantiate it with a state schema, call add_node to register node functions, call add_edge and add_conditional_edges to wire them, then compile the graph into a runnable. The runnable exposes invoke, stream, and async equivalents — the same execution surface most LangChain users already know.
Two things distinguish it from a plain chain. First, state is explicit and typed: every node receives the current state, returns a partial update, and the runtime merges those updates according to per-field reducers you can override. Second, control flow is part of the program, not implicit in the order of chain.pipe(...) calls — conditional edges branch based on the output of a node (often an LLM that returns a label), and you can build cycles, retries, and dispatch fan-outs without monkey-patching a sequential chain.
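A minimal sketch of that surface, with an illustrative one-node graph; the node body is a placeholder for a real LLM call:

from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    query: str
    answer: str

def answer_node(state: State) -> dict:
    # An LLM call would go here; a node returns only the fields it updates.
    return {"answer": f"(answer to: {state['query']})"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")  # entry point
builder.add_edge("answer", END)    # terminal edge
graph = builder.compile()

print(graph.invoke({"query": "What is LangGraph?"}))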
LangGraph also ships persistence as a first-class concern. A checkpointer (in-memory, SQLite, or Postgres at the API level — pick what your deployment supports) writes state at every super-step, so a graph can be paused, resumed, replayed, or branched. This is what makes agent orchestration over multi-turn conversations and human-in-the-loop workflows tractable in production.
It is Python-first but framework-neutral: you can use any model client (OpenAI, Anthropic, your own gateway), any tool wrapper, and any retriever. LangChain runnables drop in cleanly inside nodes, but you can write a node that calls an HTTP endpoint with httpx and never touch LangChain at all.
LangGraph vs LangChain
The two are often confused because they ship from the same team and share primitives. They solve different problems.
| Dimension | LangChain | LangGraph |
|---|---|---|
| Primary primitive | Runnable / chain | StateGraph + nodes + edges |
| State model | Implicit (passed through pipe) | Explicit typed state with reducers |
| Control flow | Sequential composition | Branching, loops, conditional dispatch |
| When to use | Single prompt with retrieval, linear pipeline | Branching workflows, multi-actor coordination, HITL |
| Learning curve | Low — chain.pipe(...) | Higher — state schema, edges, checkpointers |
LangChain answers "how do I call a model with retrieval and structured output." LangGraph answers "how do I run a planner, dispatch to specialist workers, route based on a router LLM, pause for human approval, and resume tomorrow." Most production LangGraph apps use LangChain runnables inside nodes — the two compose, they do not compete. For a broader view of where this fits, see the Agentic Prompt Stack and the multi-agent prompting guide.
Designing State — the Load-Bearing Decision
State is the contract every node prompt reads from and writes to. Get it wrong and every node prompt is fighting the schema. Get it right and the prompts become small, focused, and almost boring.
Two starting points. MessagesState is built in: a list of messages with an add_messages reducer that appends rather than overwrites. It fits chat-shaped apps, simple ReAct-style tool loops, and anything where the conversation history is the state. The other path is a custom TypedDict (or Pydantic model) where you declare exactly the fields your nodes need.
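The chat-shaped path in a sketch; MessagesState can be subclassed to add fields while keeping the message-append reducer:

from langgraph.graph import MessagesState

class ChatState(MessagesState):
    # messages: Annotated[list, add_messages] is inherited from MessagesState
    summary: str  # illustrative extra field, e.g. a rolling conversation summary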
For most non-chat workflows, custom typed state is the right default. A research agent might carry { query, plan, sources, draft, critique, final }. A support triage graph might carry { ticket, classification, customer_tier, suggested_response, requires_human }. Each field is named after the artifact it holds, not the node that produces it — this matters because multiple nodes can read the same field, and naming by producer leaves stale, misleading names behind as the graph evolves.
Three discipline points worth enforcing from day one:
Treat the schema as an interface. Renaming draft to body means updating every prompt that references it. Version state schemas explicitly when shipping changes so checkpointer-restored runs do not blow up trying to deserialize into a schema that no longer exists.
Keep state lean. Long observations, scraped pages, and full retrieval results do not belong inline. Put them in object storage (or a cache) and keep a pointer in state. Every node pays the cost of reading state into its prompt — bloat compounds across nodes. This is context engineering at the framework level.
Use reducers deliberately. The default for most fields is overwrite; for accumulating fields (messages, tool-call history, list of sources) it is append. Decide per-field, not by accident.
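A sketch of per-field reducer declarations on the research-agent state from above; Annotated attaches the reducer to the field it wraps:

import operator
from typing import Annotated, TypedDict

class ResearchState(TypedDict):
    query: str                                    # default reducer: overwrite
    plan: str
    sources: Annotated[list[str], operator.add]   # append: returned lists are concatenated
    draft: str
    critique: str
    final: str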
The prompt-engineering implication: every node prompt should reference state fields by name. "Read state.query and produce state.plan" is a clearer instruction to both the model and the next engineer than "do the planning step." It also makes the contract auditable — you can grep the codebase for which nodes touch which fields.
Node Prompts — Patterns That Work
A node prompt has a much narrower job than a monolithic agent prompt. It should: name the node's role explicitly, reference the state fields it reads, describe the state fields it writes, and constrain the output shape. Three patterns recur.
The role-bound worker. A single-purpose node that takes a few state fields and produces one artifact. The prompt is short, names the role, and demands a structured output.
You are the RESEARCHER node in a multi-step research graph.
INPUT (from state):
- query: the user's research question (state.query)
- prior_sources: URLs already gathered (state.sources, may be empty)
JOB:
Identify 3-5 high-quality sources that address the query and are
not already in prior_sources. For each, return:
- url
- title
- one-sentence relevance note
OUTPUT (must match schema):
{ "sources": [ { "url": ..., "title": ..., "note": ... } ] }
Do not draft, summarize, or critique. Other nodes handle those.
The "do not" line is load-bearing. Without it the researcher drifts into drafting, the writer node has nothing to do, and the graph runs at half its specialization.
The router. A node whose only job is to emit a label that drives a conditional edge. Treat the output as an enum and validate it.
You are the ROUTER node. Given state.classification, decide which
specialist handles this next.
ALLOWED OUTPUTS (return exactly one):
- "billing"
- "technical"
- "account"
- "escalate_human"
Return only the label. No explanation, no JSON, no quotes.
In code, validate that the model returned an allowed value. If it did not, route to a recovery node — never to a default that silently swallows the misroute. This is one of the highest-leverage tests in any LangGraph app.
The synthesizer. A node that reads several state fields produced by upstream workers and merges them into the final artifact. Its prompt names every input field explicitly, defines precedence rules ("when state.critique conflicts with state.draft, defer to the critique"), and produces the output the graph commits as state.final.
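A sketch of the shape, reusing the research-state field names:

You are the SYNTHESIZER node in a multi-step research graph.
INPUT (from state):
- draft: the writer's draft (state.draft)
- critique: the reviewer's notes (state.critique)
- sources: gathered sources (state.sources)
JOB:
Merge the draft and the critique into the final artifact. When the
critique conflicts with the draft, defer to the critique. Cite only
URLs that appear in sources.
OUTPUT: the final text only. It will be committed as state.final.
Do not re-research or re-critique. Other nodes handle those.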
The connecting principle: each node's prompt is shorter than a monolithic agent prompt, but the system of prompts is more legible because state is the explicit interface between them. For prompt-construction discipline at the unit level, the RCAF structure — Role, Context, Action, Format — applies cleanly to node prompts and is worth using as a template.
Routing — Conditional Edges
Conditional edges are where graphs earn their keep. add_conditional_edges(source_node, routing_function, path_map) lets you branch based on whatever the routing function returns — typically a label produced by a router LLM, sometimes a deterministic check on state.
Three patterns.
Supervisor LLM. A dedicated router node calls a small model with a short prompt and returns one of a fixed set of labels. The conditional edge maps each label to a downstream node. This is the workhorse for multi-agent system graphs where one supervisor dispatches to specialist workers.
def route_after_supervisor(state: State) -> str:
label = state["next"]
if label not in {"billing", "technical", "account", "escalate_human"}:
return "recovery"
return label
graph.add_conditional_edges(
"supervisor",
route_after_supervisor,
{
"billing": "billing_agent",
"technical": "technical_agent",
"account": "account_agent",
"escalate_human": "human_handoff",
"recovery": "recovery_node",
},
)
Deterministic guard. A pure-Python function that reads state and routes — no LLM call. Use this for budget checks ("if state.steps > 10, route to terminate"), cycle breakers, and entitlement gates. Cheaper, faster, and not prone to hallucination.
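A sketch, where steps is a hypothetical counter field that worker nodes increment:

def budget_guard(state: State) -> str:
    # Deterministic routing: pure Python, no LLM call on this edge.
    if state["steps"] > 10:
        return "terminate"
    return "continue"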
Self-routing worker. A worker node returns both an artifact and a "what next" label as part of its output. Pulls double duty but couples the worker's prompt to the graph's topology — a worker that knows which other nodes exist is harder to refactor. Use sparingly.
The validation pattern is the same in all three: never let an unexpected label slip through. Map known labels explicitly; route everything else to a recovery or terminal node. Routers fail silently otherwise — the graph executes a fallback path you never intended and the bug surfaces three steps later in a synthesizer that gets the wrong inputs.
Human-in-the-Loop with interrupt
interrupt is LangGraph's pause primitive. A node calls it with a payload, the runtime checkpoints the full state, control returns to the caller, and the graph waits. When the human responds, you resume the run by passing the human's input back (typically via a Command resume) and execution continues from the interrupted node.
The mental model: interrupt turns the graph into a coroutine that the human is now part of. Two payload-design rules matter.
Send the reviewer only what they need to decide. A draft to approve, the source it came from, the proposed next action — not the entire state. Reviewers (or the UI rendering for them) should not have to parse implementation details. Label fields explicitly. Keep the schema stable across deploys so the reviewer-side UI does not break every release.
Be explicit about what resuming will do. "If you approve, the graph will publish the draft to the CMS. If you edit, your edits replace state.draft and the graph re-runs the publisher node." Surprises here are how HITL erodes trust.
from langgraph.types import interrupt, Command

def review_node(state):
    # interrupt() checkpoints the run and pauses until a human responds;
    # the human's payload becomes its return value on resume.
    decision = interrupt({
        "type": "approve_draft",
        "draft": state["draft"],
        "source_summary": state["source_summary"],
    })
    return {"draft": decision.get("edited_draft", state["draft"]),
            "approved": decision.get("approved", False)}

# Resume with the human's input, e.g.:
# graph.invoke(Command(resume={"approved": True}), config={"configurable": {"thread_id": tid}})
Pair interrupt with a persistent checkpointer. An in-memory checkpointer loses everything when the worker process dies; a SQLite or Postgres checkpointer survives restarts and lets a human respond hours or days later. This is the difference between a demo HITL and a production one.
Persistence — Checkpointers
A checkpointer writes state at every super-step. The same primitive powers four capabilities: pause/resume, replay (re-run from any historical state), branching (fork a run to explore an alternative), and audit (every state transition is on disk).
At the API level you have three common options:
- In-memory — fast, zero setup, loses everything when the process exits. Fine for tests and notebooks. Not for anything else.
- SQLite — file-backed, single-process, cheap to operate. Fits single-worker deployments and local development.
- Postgres — multi-process, suitable for horizontally scaled workers. The default for production.
The choice affects what you can promise. With SQLite or Postgres, you can tell users "your conversation persists across sessions" and "an interrupt can wait overnight." With in-memory, you cannot.
Configure the checkpointer when you compile the graph; pass a thread_id (or equivalent) when you invoke or stream so the runtime knows which conversation's state to load. One thread per user-conversation is the standard mapping; a separate checkpoint_id lets you address specific historical states for replay or branching.
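A sketch of the wiring, using the in-memory saver for brevity (swap in a SQLite or Postgres saver for anything real):

from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "user-123"}}  # one thread per user conversation
graph.invoke({"query": "..."}, config)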
Common Failure Modes
Four show up repeatedly across LangGraph projects. Each has a known fix.
State bloat. Long observations and full message histories accumulate in state. Every node reads the entire state into its prompt, so token cost and latency compound across the graph. Symptom: runs that started fast get slow over many turns; you start hitting context limits on synthesizer nodes. Fix: trim aggressively, summarize message history into a rolling summary field, store large artifacts externally with a pointer in state, and use field-level reducers to bound list growth.
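A sketch of the pointer pattern; fetch_page and blob_store are hypothetical stand-ins for your HTTP client and object store:

def scrape_node(state: State) -> dict:
    html = fetch_page(state["url"])  # hypothetical fetch helper
    key = blob_store.put(html)       # large artifact goes to object storage
    return {"page_ref": key}         # state carries only the pointer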
Router hallucination. A conditional-edge router emits something outside the allowed enum — "maybe_billing", "billing.", a JSON object, an apologetic explanation. The graph routes to no edge and dead-ends, or worse, falls through to a default that silently does the wrong thing. Symptom: graphs that work in tests fail in production on edge-case inputs. Fix: validate router output against an explicit allowed set; route unknown values to a recovery node that re-asks with a tighter prompt; lower the router model's temperature; consider structured output constraints if your model client supports them.
Over-reading nodes. A worker prompt receives the entire state and the model decides to be helpful — answering things outside its scope, second-guessing prior nodes, or rewriting other nodes' artifacts. Symptom: multi-agent graphs where downstream nodes "fix" upstream work and the synthesis becomes incoherent. Fix: pass each node only the fields it needs (use a small input-builder function in the node), name the node's job in the prompt, explicitly forbid rewriting other nodes' fields, and split nodes that are doing two things.
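A sketch of the input-builder discipline:

def researcher_inputs(state: State) -> dict:
    # Hand the node only the fields it reads; nothing else enters its prompt.
    return {"query": state["query"], "prior_sources": state["sources"]}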
Infinite loops. A node routes back to itself, or two nodes ping-pong indefinitely. Symptom: runs that never complete; cost spikes; no error, just silence. Fix: pass a recursion_limit in the run config so the runtime terminates runaway runs; add a cycle counter in state and a deterministic guard edge that routes to a terminal failure node when the budget is exhausted; log every node entry so you can see the loop in traces.
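A sketch of the runtime budget; recursion_limit caps super-steps per run, and the runtime raises GraphRecursionError past it (handle_runaway is a hypothetical failure handler):

from langgraph.errors import GraphRecursionError

try:
    graph.invoke({"query": "..."}, config={"recursion_limit": 25})
except GraphRecursionError:
    # The runtime stopped a runaway loop instead of spending budget forever.
    handle_runaway()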
A fifth, less common but worth naming: schema drift across checkpoint versions. You change state shape, deploy, and a worker tries to resume a checkpoint written under the old schema. Fix: version state schemas explicitly and write small migrations, or invalidate old checkpoints on deploy if your app can tolerate it.
These map cleanly onto the diagnostic patterns in the SurePrompts Quality Rubric and the staged thinking in the Context Engineering Maturity Model — graph-based agents are where context engineering stops being optional.
When LangGraph Is the Right Tool — and When It Is Overkill
LangGraph is the right tool when your workflow has at least one of:
- Branching control flow that depends on intermediate LLM output (router patterns).
- Multiple specialist roles that must coordinate but should not share full context.
- Human-in-the-loop pauses that may last longer than a request lifecycle.
- State that must persist across sessions, with replay or branching.
- An agent tool loop complex enough that "ReAct in a while loop" has stopped being legible.
It is overkill when:
- A single prompt with retrieval and structured output answers the question. Most B2B AI features still live here.
- The flow is a fixed sequence of three or four steps with no branching. A LangChain pipeline (or even plain Python) is simpler and easier to reason about.
- You only need short-term memory within one conversation and the platform's built-in memory is enough.
- You are early enough that the actual product question is "do users want this," not "how do we scale the orchestration."
The honest tradeoff: LangGraph adds typed state, graph wiring, and checkpointer setup as ongoing concerns. The payoff is that you can ship branching, multi-actor, persistent agent behavior with a debuggable runtime instead of a tangle of conditionals and globals. Adopt it when control flow is the actual problem.
For applied prompting at the node level, the AI agents prompting guide covers tool-use prompting, reasoning-model selection, and other patterns that work cleanly inside LangGraph nodes.
What to Read Next
- Agentic Prompt Stack — the full stack view: how prompts, tools, memory, and orchestration fit together.
- Multi-agent prompting guide — orchestrator-worker topology, hand-off design, shared vs isolated context. The conceptual layer above LangGraph.
- CrewAI prompting guide — sibling framework, role-and-task abstraction, when its higher-level conventions fit better than a graph.
- OpenAI Agents SDK prompting guide — sibling framework, hand-off and guardrails-first design, OpenAI-native tooling.
- Mastra prompting guide — sibling framework, TypeScript-first, when your stack is Node and you want graph-shaped agents without leaving the JS ecosystem.
- AI agents prompting guide — node-level prompting patterns that drop straight into LangGraph workers.
- Context Engineering Maturity Model — staged framework for thinking about how state, retrieval, and prompts evolve as your agent stack grows.