Semantic Router
A semantic router is an embedding-based routing layer that classifies an incoming query into one of several downstream prompts, agents, tools, or models by computing similarity between the query embedding and a set of labeled reference utterances. Each route is defined by a small list of example queries; at runtime the router embeds the incoming query, finds the nearest route by cosine similarity, and dispatches to that route's handler. Compared to an LLM-based router, where a model reads the query and chooses a destination, a semantic router is fast (one embedding call instead of a full LLM call), deterministic, cheap enough to run on every request, and easy to debug, because the nearest-neighbor match is inspectable. The tradeoff is that it handles only the routing decisions it has example utterances for; genuinely novel intents need an LLM fallback.
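The mechanism can be sketched in a few lines. This is a minimal, self-contained illustration: the hashed bag-of-words `embed` function is a stand-in assumption so the example runs without an embedding model (a real router would call a sentence-embedding model or a hosted embeddings API), and the route names and utterances are invented.

```python
import math

DIM = 64  # toy embedding dimensionality

def embed(text: str) -> list[float]:
    # Stand-in embedding: hashed bag-of-words. Replace with a real
    # sentence-embedding model in production; the routing logic is unchanged.
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Each route is a name plus a small list of example utterances
# (kept tiny here; a real route list would be larger).
ROUTES = {
    "billing": [
        "why was I charged twice",
        "update my credit card",
        "refund my last invoice",
    ],
    "tech_support": [
        "the app crashes on startup",
        "I can't log in to my account",
        "error 500 when saving",
    ],
}

# Reference utterances are embedded once at startup, not per request.
ROUTE_EMBEDDINGS = {
    name: [embed(u) for u in utterances] for name, utterances in ROUTES.items()
}

def route(query: str) -> tuple[str, float]:
    """Return (best_route, best_similarity): the route whose nearest
    reference utterance has the highest cosine similarity to the query."""
    q = embed(query)
    best_name, best_score = "", -1.0
    for name, embeddings in ROUTE_EMBEDDINGS.items():
        score = max(cosine(q, e) for e in embeddings)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The returned similarity score is what makes the router debuggable: logging `(query, best_route, best_score)` shows exactly which reference utterance won and by how much.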
Example
A customer-service assistant needs to split requests across three downstream pipelines: billing, technical support, and general product questions. Instead of adding a router LLM call, the team defines each route with fifteen example queries and uses a semantic router. At runtime, embedding-based routing adds roughly 25ms of latency and costs a fraction of a cent per request; the LLM router it replaced added 400ms and was the single most expensive step in the pipeline. Routing accuracy holds at 94% on the eval set, with queries below a confidence threshold, the remaining 6%, falling back to an LLM classifier.
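The fallback described above amounts to a threshold check around the router. A minimal sketch, assuming a `route_fn` that returns a `(route_name, similarity)` pair and an `llm_classify` function that classifies via an LLM call; the 0.75 cutoff is an illustrative value that would in practice be tuned on the eval set.

```python
THRESHOLD = 0.75  # illustrative cutoff; tune on an eval set

def dispatch(query: str, route_fn, llm_classify) -> tuple[str, str]:
    """Route via nearest-neighbor similarity when confident, otherwise
    fall back to an LLM classifier. Returns (route_name, path_taken)
    so the fallback rate can be monitored."""
    name, score = route_fn(query)
    if score >= THRESHOLD:
        return name, "semantic"
    return llm_classify(query), "llm_fallback"
```

Returning which path was taken makes it easy to track the semantic-vs-fallback split in production and decide when a route needs more example utterances.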