Sampling

Sampling is the process of selecting the next token from the probability distribution a language model produces at each generation step. Common strategies include greedy decoding (always pick the highest-probability token), temperature scaling (sharpen or flatten the distribution before drawing), top-k filtering (draw only from the k most probable tokens), and top-p nucleus sampling (draw only from the smallest set of tokens whose cumulative probability reaches p). Together they control the balance between deterministic and creative output: more restrictive settings make output more predictable and repeatable, while looser settings make it more varied.
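As a rough illustration of two of these strategies, here is a minimal Python sketch of greedy decoding and top-k filtering over a plain dictionary of token probabilities. The function names and the `toy` distribution are hypothetical and chosen only for this example; real inference libraries operate on logit tensors and expose these controls as generation parameters.

```python
import random

def greedy(probs):
    """Greedy decoding: return the token with the highest probability."""
    return max(probs, key=probs.get)

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize so they sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def sample(probs, rng=random):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical toy distribution, for illustration only.
toy = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

print(greedy(toy))                   # the same token on every call
print(sample(top_k_filter(toy, 2)))  # one of the two most probable tokens
```

Greedy decoding needs no random draw at all, which is why it is deterministic. Top-k only narrows the candidate set, so the final choice still comes from a random draw.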

Example

Suppose the model computes these probabilities for the next word: "brilliant" (30%), "great" (25%), "wonderful" (20%), "good" (15%), "nice" (10%). Greedy decoding always picks "brilliant." With temperature 0.8 and top-p 0.9, the candidate set is the top four words, and one of them is drawn in proportion to its renormalized probability, producing more natural and varied text across generations.
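The arithmetic behind this example can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the helper names are hypothetical, and it assumes temperature is applied before the top-p cutoff, which is a common ordering but not the only possible one.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Raise each probability to the power 1/T and renormalize.

    This is equivalent to dividing the logits by T before the softmax.
    """
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}

def top_p_filter(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches top_p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Probabilities from the example above.
probs = {"brilliant": 0.30, "great": 0.25, "wonderful": 0.20, "good": 0.15, "nice": 0.10}

# Assumed order: temperature first, then the top-p cutoff.
candidates = top_p_filter(apply_temperature(probs, 0.8), 0.9)
print(sorted(candidates, key=candidates.get, reverse=True))

# Draw the next word in proportion to the renormalized probabilities.
tokens = list(candidates)
print(random.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0])
```

A temperature of 0.8 sharpens the distribution slightly, to roughly 0.33, 0.26, 0.20, 0.14, and 0.08. The top four words then cover about 92% of the probability mass, which clears the 0.9 threshold, so "nice" is excluded from the draw.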
