Model Routing
Model routing is the practice of dispatching different requests to different language models based on task classification, cost target, or expected reasoning depth. It treats model choice as a per-request decision rather than a one-time pick for the whole application.
A production system might route a quick classification call to Haiku 4.5, a complex coding task to Claude Opus 4.7, and a math proof to o3, paying flagship rates only for the work that needs them. Model routing is distinct from prompt routing, which dispatches between prompt templates inside a single model.
How it works
1. A router classifier inspects each incoming request.
2. It assigns the request a category based on task type, complexity, or cost target.
3. The request is dispatched to the model whose strengths match that category.
4. Routing decisions are typically logged for analysis and re-tuning.
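The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the keyword rules stand in for a real router classifier, and the model names, category labels, and the `route` and `classify` helpers are all hypothetical placeholders rather than real API identifiers.

```python
from dataclasses import dataclass

# Hypothetical category -> model table. The model names are placeholders,
# not real API identifiers.
MODEL_FOR_CATEGORY = {
    "simple": "fast-cheap-model",
    "coding": "flagship-coding-model",
    "math": "reasoning-model",
}

# Hypothetical keyword rules standing in for a trained router classifier.
KEYWORDS = {
    "coding": ("refactor", "stack trace", "function", "bug"),
    "math": ("prove", "theorem", "integral"),
}


@dataclass
class RoutingDecision:
    request: str
    category: str
    model: str


# Step 4: decisions are kept so they can be analysed and the rules re-tuned.
routing_log: list[RoutingDecision] = []


def classify(request: str) -> str:
    """Steps 1-2: inspect the request and assign it a category."""
    text = request.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "simple"  # anything unmatched goes to the cheapest tier


def route(request: str) -> RoutingDecision:
    """Step 3: pick the model for the category, then log the decision."""
    category = classify(request)
    decision = RoutingDecision(request, category, MODEL_FOR_CATEGORY[category])
    routing_log.append(decision)
    return decision
```

In practice the classifier is often a small model or a learned scorer rather than keyword matching; the table-plus-log structure stays the same either way.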
Example
A customer-support pipeline might route "order status" questions to a cheap, fast model and "refund dispute" questions to a smarter, slower one — because the cost of getting the second category wrong dwarfs the per-token savings of using the cheap model uniformly.
Not to be confused with
- Prompt routing: selects between prompt templates inside one model. Model routing selects between different models entirely. Some systems combine both: route to a model, then route to a prompt template.
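A combined setup (route to a model first, then to a prompt template within that model) might look like the sketch below. It reuses the customer-support categories from the example above; the model names, template text, and the `build_call` helper are hypothetical and chosen only for illustration.

```python
# Model routing table: category -> model (hypothetical placeholder names).
MODEL_FOR_CATEGORY = {
    "order_status": "fast-cheap-model",
    "refund_dispute": "slow-capable-model",
}
DEFAULT_MODEL = "fast-cheap-model"

# Prompt routing table: templates are selected inside a single model.
TEMPLATES_BY_MODEL = {
    "fast-cheap-model": {
        "order_status": "Give a one-sentence status update. Question: {question}",
        "default": "Answer briefly. Question: {question}",
    },
    "slow-capable-model": {
        "refund_dispute": "Review the policy step by step, then decide. Question: {question}",
        "default": "Answer carefully. Question: {question}",
    },
}


def pick_model(category: str) -> str:
    """Model routing: choose between different models."""
    return MODEL_FOR_CATEGORY.get(category, DEFAULT_MODEL)


def pick_template(model: str, category: str) -> str:
    """Prompt routing: choose a template within the already-chosen model."""
    templates = TEMPLATES_BY_MODEL[model]
    return templates.get(category, templates["default"])


def build_call(category: str, question: str) -> tuple[str, str]:
    """Apply both routing stages and return (model, filled-in prompt)."""
    model = pick_model(category)
    prompt = pick_template(model, category).format(question=question)
    return model, prompt
```

Keeping the two tables separate means the model assignment can be re-tuned for cost without touching the prompt templates, and vice versa.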