
Model Routing

Model routing is the practice of dispatching different requests to different language models based on task classification, cost target, or expected reasoning depth. It treats model choice as a per-request decision rather than a one-time pick for the whole application.

A production system might route a quick classification call to Haiku 4.5, a complex coding task to Claude Opus 4.7, and a math proof to o3, paying flagship rates only for the work that needs them. Model routing is distinct from prompt routing, which dispatches between prompt templates inside a single model.

How it works

  1. A router classifier inspects each incoming request.
  2. It assigns the request a category based on task type, complexity, or cost target.
  3. The request is dispatched to the model whose strengths match that category.
  4. Routing decisions are typically logged for analysis and re-tuning.
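
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the keyword-based `classify` function stands in for a real router classifier, and the model names in `ROUTES` are hypothetical placeholders rather than real model identifiers.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

# Hypothetical category -> model mapping; the model names are placeholders.
ROUTES = {
    "simple": "small-fast-model",
    "coding": "large-coding-model",
    "math": "reasoning-model",
}
DEFAULT_CATEGORY = "simple"


@dataclass
class Decision:
    category: str
    model: str


def classify(request: str) -> str:
    """Steps 1-2: toy keyword classifier (an assumption, not a real router)."""
    text = request.lower()
    if any(word in text for word in ("prove", "theorem", "integral")):
        return "math"
    if any(word in text for word in ("function", "bug", "refactor", "code")):
        return "coding"
    return DEFAULT_CATEGORY


def route(request: str) -> Decision:
    """Steps 3-4: pick the model for the category and log the decision."""
    category = classify(request)
    model = ROUTES[category]
    log.info("routed category=%s model=%s", category, model)
    return Decision(category, model)
```

Calling `route("Fix the bug in this function")` returns a `Decision` whose `model` is the placeholder coding model. In practice the classifier is often itself a small model or a trained classifier, and the logged decisions feed the re-tuning mentioned in step 4.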

Example

A customer-support pipeline might route "order status" questions to a cheap, fast model and "refund dispute" questions to a more capable, slower one, because the cost of getting the second category wrong dwarfs the per-token savings of using the cheap model uniformly.
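
The trade-off in this example can be made concrete as an expected-cost comparison: the per-call price plus the probability of a wrong answer times what a wrong answer costs. Every number below (prices, error rates, error costs) is a made-up assumption chosen only to show the shape of the calculation.

```python
# All figures are hypothetical, for illustration only.
CHEAP = {
    "cost_per_call": 0.001,
    "error_rate": {"order_status": 0.01, "refund_dispute": 0.15},
}
STRONG = {
    "cost_per_call": 0.02,
    "error_rate": {"order_status": 0.005, "refund_dispute": 0.03},
}
# Assumed business cost of one wrong answer, per category.
ERROR_COST = {"order_status": 0.50, "refund_dispute": 40.0}


def expected_cost(model: dict, category: str) -> float:
    """Per-call price plus the expected cost of a wrong answer."""
    return model["cost_per_call"] + model["error_rate"][category] * ERROR_COST[category]


def pick_model(category: str) -> str:
    """Choose whichever model has the lower expected cost for this category."""
    cheap = expected_cost(CHEAP, category)
    strong = expected_cost(STRONG, category)
    return "cheap" if cheap <= strong else "strong"
```

Under these assumed numbers, `pick_model("order_status")` selects the cheap model and `pick_model("refund_dispute")` selects the strong one, because the expected error cost for refund disputes outweighs the higher per-call price.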

Not to be confused with

Prompt routing
Prompt routing selects between prompt templates inside one model. Model routing selects between different models entirely. Some systems combine both: route to a model, then route to a prompt template.
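
A combined system might look like the sketch below: the first stage picks a model by category (model routing), and the second stage picks one of that model's prompt templates (prompt routing). The model names, template text, and the word-count rule in `pick_template` are all hypothetical, chosen only to keep the example small.

```python
from typing import Tuple

# Stage 1 table: hypothetical category -> model mapping.
MODEL_BY_CATEGORY = {
    "order_status": "small-fast-model",
    "refund_dispute": "large-careful-model",
}

# Stage 2 table: hypothetical prompt templates available within each model.
TEMPLATES = {
    "small-fast-model": {
        "short": "Answer in one sentence. Question: {question}",
        "detailed": "Answer step by step. Question: {question}",
    },
    "large-careful-model": {
        "short": "Summarize the policy that applies. Question: {question}",
        "detailed": "Review the case and cite the policy. Question: {question}",
    },
}


def pick_template(model: str, question: str) -> str:
    """Prompt routing: choose a template within one model (toy length rule)."""
    key = "detailed" if len(question.split()) > 12 else "short"
    return TEMPLATES[model][key]


def build_call(category: str, question: str) -> Tuple[str, str]:
    """Model routing first, then prompt routing inside the chosen model."""
    model = MODEL_BY_CATEGORY[category]
    prompt = pick_template(model, question).format(question=question)
    return model, prompt
```

For instance, `build_call("order_status", "Where is my order?")` returns the placeholder fast model together with its short template filled in with the question.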
