Function-Calling Accuracy

Function-calling accuracy is the rate at which a model, given a function-calling interface, selects the correct tool, passes valid arguments, and respects the tool's schema constraints. It is often a strong predictor of agent reliability in production.

Example

Two models can score similarly on general reasoning benchmarks yet differ dramatically in function-calling accuracy: one returns clean, schema-valid tool calls roughly 95% of the time, while the other invents argument names or skips required fields, breaking the agent loop.
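
The checks in the definition (right tool, valid arguments, schema constraints) can be scored mechanically. The following is a minimal sketch, not a standard benchmark implementation: the `TOOLS` registry, its simplified schema layout, and the helper names `call_is_correct` and `function_calling_accuracy` are all hypothetical, and real systems would more likely validate against full JSON Schema definitions. It also checks only argument names and types, not whether the argument values are semantically right.

```python
from typing import Any

# Hypothetical tool registry (an assumption for illustration):
# each tool lists its allowed arguments with expected types, plus required names.
TOOLS: dict[str, dict[str, Any]] = {
    "get_weather": {
        "properties": {"city": str, "unit": str},
        "required": {"city"},
    },
    "send_email": {
        "properties": {"to": str, "subject": str, "body": str},
        "required": {"to", "body"},
    },
}


def call_is_correct(call: dict[str, Any], expected_tool: str) -> bool:
    """Return True if the call picks the expected tool with schema-valid arguments."""
    if call.get("name") != expected_tool:
        return False  # wrong tool selected
    schema = TOOLS.get(expected_tool)
    if schema is None:
        return False  # expected tool is not in the registry
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False  # arguments missing or not an object
    if not schema["required"].issubset(args):
        return False  # a required field was skipped
    for key, value in args.items():
        expected_type = schema["properties"].get(key)
        if expected_type is None:
            return False  # invented argument name
        if not isinstance(value, expected_type):
            return False  # argument has the wrong type
    return True


def function_calling_accuracy(cases: list[tuple[dict[str, Any], str]]) -> float:
    """Fraction of (call, expected_tool) pairs that pass every check."""
    if not cases:
        return 0.0
    return sum(call_is_correct(call, tool) for call, tool in cases) / len(cases)


cases = [
    # Correct tool, required argument present.
    ({"name": "get_weather", "arguments": {"city": "Oslo"}}, "get_weather"),
    # Invented argument name, required "city" missing.
    ({"name": "get_weather", "arguments": {"location": "Oslo"}}, "get_weather"),
    # Required "body" field skipped.
    ({"name": "send_email", "arguments": {"to": "a@example.com"}}, "send_email"),
    # Valid call, but for the wrong tool.
    ({"name": "get_weather", "arguments": {"city": "Oslo"}}, "send_email"),
]
print(function_calling_accuracy(cases))  # 0.25
```

Scoring each call as all-or-nothing mirrors how an agent loop behaves: a call with one invented argument name fails just as completely as a call to the wrong tool.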