Perplexity

Perplexity is a standard metric for evaluating how well a language model predicts a sequence of text. Mathematically, it is the exponential of the average negative log-likelihood per token; intuitively, it measures how "surprised" the model is by the text it encounters. Lower perplexity indicates better prediction quality, meaning the model assigns higher probabilities to the actual next tokens.
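
In symbols, for a sequence of N tokens, PPL = exp(−(1/N) Σᵢ log p(tokenᵢ | context)). The sketch below computes this directly from a list of per-token log-probabilities; the probability values are made up for illustration and are not the output of any real model.

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

# Hypothetical log-probabilities a model might assign to four observed tokens.
token_log_probs = [math.log(p) for p in (0.25, 0.10, 0.50, 0.05)]
print(perplexity(token_log_probs))  # ~6.32: roughly "choosing among" 6 options per token
```

Equivalently, perplexity is the inverse geometric mean of the probabilities the model assigned to the actual tokens, which is where the "choosing among k equally likely options" reading comes from.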

Example

A model is evaluated on a held-out news article. Model A achieves a perplexity of 15, meaning that on average it is effectively "choosing among" 15 equally likely next words at each step. Model B scores 25, indicating more uncertainty. Model A is the better language model for this domain because it predicts the text more confidently and accurately.
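
To see why a perplexity of 15 reads as "choosing among 15 equally likely words," note that a model assigning every token a probability of exactly 1/15 scores a perplexity of exactly 15. A minimal check, using illustrative numbers rather than real model outputs:

```python
import math

# Uniform choice among 15 options at every step -> perplexity of exactly 15.
log_probs = [math.log(1 / 15)] * 100  # 100 tokens, each assigned probability 1/15
avg_nll = -sum(log_probs) / len(log_probs)
print(math.exp(avg_nll))  # 15.0 (up to floating-point rounding)
```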
