LangGraph is the LangChain team's Python framework for building stateful, multi-actor LLM applications as directed graphs. You declare a typed state object, register nodes that read it and return updates, wire them with edges — including conditional edges that route on LLM output — compile the graph, and invoke or stream it. Persistence, human-in-the-loop pauses, and replay come from a pluggable checkpointer.
For prompt engineering, the shift is concrete: instead of one long prompt that tries to plan, execute, and synthesize, you write many small prompts that each do one job and share an explicit state contract. The hard parts move from "make the prompt clever" to "design state and the routing decisions on top of it."
Tip
Adopt LangGraph when control flow is the actual problem — branching on intermediate output, coordinating specialists, or pausing for a human. Until then, a single prompt with retrieval and structured output is usually the right tool.
Key takeaways:
- LangGraph is a graph runtime, not a prompt template. The graph defines control flow; nodes do the LLM work.
- State is the contract every node prompt reads from and writes to. Get the schema right before writing any prompts.
- Node prompts should name the node's job, reference state fields by name, and return a state delta — not free-form prose.
- Conditional edges work best when the routing model emits one of a small enum of values, validated before dispatch.
- interrupt plus a persistent checkpointer is the standard pattern for human-in-the-loop that survives restarts.
- The default failure modes are state bloat, router hallucination, over-reading nodes, and infinite loops. Each has a known fix.
- LangGraph integrates with LangChain but does not require it. Any model client and any tool layer works inside a node.
What LangGraph Is
LangGraph is a Python-first framework for declaring agent graphs — directed graphs where nodes are functions (often LLM calls) and edges are control flow. The core abstraction is StateGraph: you instantiate it with a state schema, call add_node to register node functions, call add_edge and add_conditional_edges to wire them, then compile the graph into a runnable. The runnable exposes invoke, stream, and async equivalents — the same execution surface most LangChain users already know.
Two things distinguish it from a plain chain. First, state is explicit and typed: every node receives the current state, returns a partial update, and the runtime merges those updates according to per-field reducers you can override. Second, control flow is part of the program, not implicit in the order of chain.pipe(...) calls — conditional edges branch based on the output of a node (often an LLM that returns a label), and you can build cycles, retries, and dispatch fan-outs without monkey-patching a sequential chain.
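A minimal sketch of that surface, with an illustrative one-node graph; the node body is a placeholder for a real LLM call:

from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    query: str
    answer: str

def answer_node(state: State) -> dict:
    # An LLM call would go here; a node returns only the fields it updates.
    return {"answer": f"(answer to: {state['query']})"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")  # entry point
builder.add_edge("answer", END)    # terminal edge
graph = builder.compile()

print(graph.invoke({"query": "What is LangGraph?"}))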
LangGraph also ships persistence as a first-class concern. A checkpointer (in-memory, SQLite, or Postgres at the API level — pick what your deployment supports) writes state at every super-step, so a graph can be paused, resumed, replayed, or branched. This is what makes agent orchestration over multi-turn conversations and human-in-the-loop workflows tractable in production.
It is Python-first but framework-neutral: you can use any model client (OpenAI, Anthropic, your own gateway), any tool wrapper, and any retriever. LangChain runnables drop in cleanly inside nodes, but you can write a node that calls an HTTP endpoint with httpx and never touch LangChain at all.
LangGraph vs LangChain
The two are often confused because they ship from the same team and share primitives. They solve different problems.
| Dimension | LangChain | LangGraph |
|---|---|---|
| Primary primitive | Runnable / chain | StateGraph + nodes + edges |
| State model | Implicit (passed through pipe) | Explicit typed state with reducers |
| Control flow | Sequential composition | Branching, loops, conditional dispatch |
| When to use | Single prompt with retrieval, linear pipeline | Branching workflows, multi-actor coordination, HITL |
| Learning curve | Low — chain.pipe(...) | Higher — state schema, edges, checkpointers |
LangChain answers "how do I call a model with retrieval and structured output." LangGraph answers "how do I run a planner, dispatch to specialist workers, route based on a router LLM, pause for human approval, and resume tomorrow." Most production LangGraph apps use LangChain runnables inside nodes — the two compose, they do not compete. For a broader view of where this fits, see the Agentic Prompt Stack and the multi-agent prompting guide.
Designing State — the Load-Bearing Decision
State is the contract every node prompt reads from and writes to. Get it wrong and every node prompt is fighting the schema. Get it right and the prompts become small, focused, and almost boring.
Two starting points. MessagesState is built in: a list of messages with an add_messages reducer that appends rather than overwrites. It fits chat-shaped apps, simple ReAct-style tool loops, and anything where the conversation history is the state. The other path is a custom TypedDict (or Pydantic model) where you declare exactly the fields your nodes need.
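The chat-shaped path in a sketch; MessagesState can be subclassed to add fields while keeping the message-append reducer:

from langgraph.graph import MessagesState

class ChatState(MessagesState):
    # messages: Annotated[list, add_messages] is inherited from MessagesState
    summary: str  # illustrative extra field, e.g. a rolling conversation summary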
For most non-chat workflows, custom typed state is the right default. A research agent might carry { query, plan, sources, draft, critique, final }. A support triage graph might carry { ticket, classification, customer_tier, suggested_response, requires_human }. Each field is named after the artifact it holds, not the node that produces it — this matters because multiple nodes can read the same field, and naming by producer leaves stale, misleading names behind as the graph evolves.
Three discipline points worth enforcing from day one:
Treat the schema as an interface. Renaming draft to body means updating every prompt that references it. Version state schemas explicitly when shipping changes so checkpointer-restored runs do not blow up trying to deserialize into a schema that no longer exists.
Keep state lean. Long observations, scraped pages, and full retrieval results do not belong inline. Put them in object storage (or a cache) and keep a pointer in state. Every node pays the cost of reading state into its prompt — bloat compounds across nodes. This is context engineering at the framework level.
Use reducers deliberately. The default for most fields is overwrite; for accumulating fields (messages, tool-call history, list of sources) it is append. Decide per-field, not by accident.
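A sketch of per-field reducer declarations on the research-agent state from above; Annotated attaches the reducer to the field it wraps:

import operator
from typing import Annotated, TypedDict

class ResearchState(TypedDict):
    query: str                                    # default reducer: overwrite
    plan: str
    sources: Annotated[list[str], operator.add]   # append: returned lists are concatenated
    draft: str
    critique: str
    final: str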
The prompt-engineering implication: every node prompt should reference state fields by name. "Read state.query and produce state.plan" is a clearer instruction to both the model and the next engineer than "do the planning step." It also makes the contract auditable — you can grep the codebase for which nodes touch which fields.
Node Prompts — Patterns That Work
A node prompt has a much narrower job than a monolithic agent prompt. It should: name the node's role explicitly, reference the state fields it reads, describe the state fields it writes, and constrain the output shape. Three patterns recur.
The role-bound worker. A single-purpose node that takes a few state fields and produces one artifact. The prompt is short, names the role, and demands a structured output.
You are the RESEARCHER node in a multi-step research graph.
INPUT (from state):
- query: the user's research question (state.query)
- prior_sources: URLs already gathered (state.sources, may be empty)
JOB:
Identify 3-5 high-quality sources that address the query and are
not already in prior_sources. For each, return:
- url
- title
- one-sentence relevance note
OUTPUT (must match schema):
{ "sources": [ { "url": ..., "title": ..., "note": ... } ] }
Do not draft, summarize, or critique. Other nodes handle those.
The "do not" line is load-bearing. Without it the researcher drifts into drafting, the writer node has nothing to do, and the graph runs at half its specialization.
The router. A node whose only job is to emit a label that drives a conditional edge. Treat the output as an enum and validate it.
You are the ROUTER node. Given state.classification, decide which
specialist handles this next.
ALLOWED OUTPUTS (return exactly one):
- "billing"
- "technical"
- "account"
- "escalate_human"
Return only the label. No explanation, no JSON, no quotes.
In code, validate that the model returned an allowed value. If it did not, route to a recovery node — never to a default that silently swallows the misroute. This is one of the highest-leverage tests in any LangGraph app.
The synthesizer. A node that reads several state fields produced by upstream workers and merges them into the final artifact. Its prompt names every input field explicitly, defines precedence rules ("when state.critique conflicts with state.draft, defer to the critique"), and produces the output the graph commits as state.final.
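A sketch of the shape, reusing the research-state field names:

You are the SYNTHESIZER node in a multi-step research graph.
INPUT (from state):
- draft: the writer's draft (state.draft)
- critique: the reviewer's notes (state.critique)
- sources: gathered sources (state.sources)
JOB:
Merge the draft and the critique into the final artifact. When the
critique conflicts with the draft, defer to the critique. Cite only
URLs that appear in sources.
OUTPUT: the final text only. It will be committed as state.final.
Do not re-research or re-critique. Other nodes handle those.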
The connecting principle: each node's prompt is shorter than a monolithic agent prompt, but the system of prompts is more legible because state is the explicit interface between them. For prompt-construction discipline at the unit level, the RCAF structure — Role, Context, Action, Format — applies cleanly to node prompts and is worth using as a template.
Routing — Conditional Edges
Conditional edges are where graphs earn their keep. add_conditional_edges(source_node, routing_function, path_map) lets you branch based on whatever the routing function returns — typically a label produced by a router LLM, sometimes a deterministic check on state.
Three patterns.
Supervisor LLM. A dedicated router node calls a small model with a short prompt and returns one of a fixed set of labels. The conditional edge maps each label to a downstream node. This is the workhorse for multi-agent system graphs where one supervisor dispatches to specialist workers.
def route_after_supervisor(state: State) -> str:
label = state["next"]
if label not in {"billing", "technical", "account", "escalate_human"}:
return "recovery"
return label
graph.add_conditional_edges(
"supervisor",
route_after_supervisor,
{
"billing": "billing_agent",
"technical": "technical_agent",
"account": "account_agent",
"escalate_human": "human_handoff",
"recovery": "recovery_node",
},
)
Deterministic guard. A pure-Python function that reads state and routes — no LLM call. Use this for budget checks ("if state.steps > 10, route to terminate"), cycle breakers, and entitlement gates. Cheaper, faster, and not prone to hallucination.
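A sketch, where steps is a hypothetical counter field that worker nodes increment:

def budget_guard(state: State) -> str:
    # Deterministic routing: pure Python, no LLM call on this edge.
    if state["steps"] > 10:
        return "terminate"
    return "continue"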
Self-routing worker. A worker node returns both an artifact and a "what next" label as part of its output. Pulls double duty but couples the worker's prompt to the graph's topology — a worker that knows which other nodes exist is harder to refactor. Use sparingly.
The validation pattern is the same in all three: never let an unexpected label slip through. Map known labels explicitly; route everything else to a recovery or terminal node. Routers fail silently otherwise — the graph executes a fallback path you never intended and the bug surfaces three steps later in a synthesizer that gets the wrong inputs.
Human-in-the-Loop with interrupt
interrupt is LangGraph's pause primitive. A node calls it with a payload, the runtime checkpoints the full state, control returns to the caller, and the graph waits. When the human responds, you resume the run by passing the human's input back (typically via a Command resume) and execution continues from the interrupted node.
The mental model: interrupt turns the graph into a coroutine that the human is now part of. Two payload-design rules matter.
Send the reviewer only what they need to decide. A draft to approve, the source it came from, the proposed next action — not the entire state. Reviewers (or the UI rendering for them) should not have to parse implementation details. Label fields explicitly. Keep the schema stable across deploys so the reviewer-side UI does not break every release.
Be explicit about what resuming will do. "If you approve, the graph will publish the draft to the CMS. If you edit, your edits replace state.draft and the graph re-runs the publisher node." Surprises here are how HITL erodes trust.
from langgraph.types import interrupt, Command

def review_node(state):
    # interrupt() checkpoints the run and pauses until a human responds;
    # the human's payload becomes its return value on resume.
    decision = interrupt({
        "type": "approve_draft",
        "draft": state["draft"],
        "source_summary": state["source_summary"],
    })
    return {"draft": decision.get("edited_draft", state["draft"]),
            "approved": decision.get("approved", False)}

# Resume with the human's input, e.g.:
# graph.invoke(Command(resume={"approved": True}), config={"configurable": {"thread_id": tid}})
Pair interrupt with a persistent checkpointer. An in-memory checkpointer loses everything when the worker process dies; a SQLite or Postgres checkpointer survives restarts and lets a human respond hours or days later. This is the difference between a demo HITL and a production one.
Persistence — Checkpointers
A checkpointer writes state at every super-step. The same primitive powers four capabilities: pause/resume, replay (re-run from any historical state), branching (fork a run to explore an alternative), and audit (every state transition is on disk).
At the API level you have three common options:
- In-memory — fast, zero setup, loses everything when the process exits. Fine for tests and notebooks. Not for anything else.
- SQLite — file-backed, single-process, cheap to operate. Fits single-worker deployments and local development.
- Postgres — multi-process, suitable for horizontally scaled workers. The default for production.
The choice affects what you can promise. With SQLite or Postgres, you can tell users "your conversation persists across sessions" and "an interrupt can wait overnight." With in-memory, you cannot.
Configure the checkpointer when you compile the graph; pass a thread_id (or equivalent) when you invoke or stream so the runtime knows which conversation's state to load. One thread per user-conversation is the standard mapping; a separate checkpoint_id lets you address specific historical states for replay or branching.
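A sketch of the wiring, using the in-memory saver for brevity (swap in a SQLite or Postgres saver for anything real):

from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "user-123"}}  # one thread per user conversation
graph.invoke({"query": "..."}, config)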
Common Failure Modes
Four show up repeatedly across LangGraph projects. Each has a known fix.
State bloat. Long observations and full message histories accumulate in state. Every node reads the entire state into its prompt, so token cost and latency compound across the graph. Symptom: runs that started fast get slow over many turns; you start hitting context limits on synthesizer nodes. Fix: trim aggressively, summarize message history into a rolling summary field, store large artifacts externally with a pointer in state, and use field-level reducers to bound list growth.
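A sketch of the pointer pattern; fetch_page and blob_store are hypothetical stand-ins for your HTTP client and object store:

def scrape_node(state: State) -> dict:
    html = fetch_page(state["url"])  # hypothetical fetch helper
    key = blob_store.put(html)       # large artifact goes to object storage
    return {"page_ref": key}         # state carries only the pointer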
Router hallucination. A conditional-edge router emits something outside the allowed enum — "maybe_billing", "billing.", a JSON object, an apologetic explanation. The graph routes to no edge and dead-ends, or worse, falls through to a default that silently does the wrong thing. Symptom: graphs that work in tests fail in production on edge-case inputs. Fix: validate router output against an explicit allowed set; route unknown values to a recovery node that re-asks with a tighter prompt; lower the router model's temperature; consider structured output constraints if your model client supports them.
Over-reading nodes. A worker prompt receives the entire state and the model decides to be helpful — answering things outside its scope, second-guessing prior nodes, or rewriting other nodes' artifacts. Symptom: multi-agent graphs where downstream nodes "fix" upstream work and the synthesis becomes incoherent. Fix: pass each node only the fields it needs (use a small input-builder function in the node), name the node's job in the prompt, explicitly forbid rewriting other nodes' fields, and split nodes that are doing two things.
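A sketch of the input-builder discipline:

def researcher_inputs(state: State) -> dict:
    # Hand the node only the fields it reads; nothing else enters its prompt.
    return {"query": state["query"], "prior_sources": state["sources"]}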
Infinite loops. A node routes back to itself, or two nodes ping-pong indefinitely. Symptom: runs that never complete; cost spikes; no error, just silence. Fix: pass a recursion_limit in the run config so the runtime terminates runaway runs; add a cycle counter in state and a deterministic guard edge that routes to a terminal failure node when the budget is exhausted; log every node entry so you can see the loop in traces.
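A sketch of the runtime budget; recursion_limit caps super-steps per run, and the runtime raises GraphRecursionError past it (handle_runaway is a hypothetical failure handler):

from langgraph.errors import GraphRecursionError

try:
    graph.invoke({"query": "..."}, config={"recursion_limit": 25})
except GraphRecursionError:
    # The runtime stopped a runaway loop instead of spending budget forever.
    handle_runaway()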
A fifth, less common but worth naming: schema drift across checkpoint versions. You change state shape, deploy, and a worker tries to resume a checkpoint written under the old schema. Fix: version state schemas explicitly and write small migrations, or invalidate old checkpoints on deploy if your app can tolerate it.
These map cleanly onto the diagnostic patterns in the SurePrompts Quality Rubric and the staged thinking in the Context Engineering Maturity Model — graph-based agents are where context engineering stops being optional.
When LangGraph Is the Right Tool — and When It Is Overkill
LangGraph is the right tool when your workflow has at least one of:
- Branching control flow that depends on intermediate LLM output (router patterns).
- Multiple specialist roles that must coordinate but should not share full context.
- Human-in-the-loop pauses that may last longer than a request lifecycle.
- State that must persist across sessions, with replay or branching.
- An agent tool loop complex enough that "ReAct in a while loop" has stopped being legible.
It is overkill when:
- A single prompt with retrieval and structured output answers the question. Most B2B AI features still live here.
- The flow is a fixed sequence of three or four steps with no branching. A LangChain pipeline (or even plain Python) is simpler and easier to reason about.
- You only need short-term memory within one conversation and the platform's built-in memory is enough.
- You are early enough that the actual product question is "do users want this," not "how do we scale the orchestration."
The honest tradeoff: LangGraph adds typed state, graph wiring, and checkpointer setup as ongoing concerns. The payoff is that you can ship branching, multi-actor, persistent agent behavior with a debuggable runtime instead of a tangle of conditionals and globals. Adopt it when control flow is the actual problem.
For applied prompting at the node level, the AI agents prompting guide covers tool-use prompting, reasoning-model selection, and other patterns that work cleanly inside LangGraph nodes.
What to Read Next
- Agentic Prompt Stack — the full stack view: how prompts, tools, memory, and orchestration fit together.
- Multi-agent prompting guide — orchestrator-worker topology, hand-off design, shared vs isolated context. The conceptual layer above LangGraph.
- CrewAI prompting guide — sibling framework, role-and-task abstraction, when its higher-level conventions fit better than a graph.
- OpenAI Agents SDK prompting guide — sibling framework, hand-off and guardrails-first design, OpenAI-native tooling.
- Mastra prompting guide — sibling framework, TypeScript-first, when your stack is Node and you want graph-shaped agents without leaving the JS ecosystem.
- AI agents prompting guide — node-level prompting patterns that drop straight into LangGraph workers.
- Context Engineering Maturity Model — staged framework for thinking about how state, retrieval, and prompts evolve as your agent stack grows.