Sampling

Sampling is the process of selecting the next token from the probability distribution a language model produces at each generation step. Different sampling strategies — including greedy decoding (always pick the highest-probability token), temperature scaling, top-p (nucleus) sampling, and top-k filtering — control the balance between deterministic and creative outputs. The choice of sampling method significantly affects output quality and diversity.

Example

After the model computes probabilities for the next word — "brilliant" (30%), "great" (25%), "wonderful" (20%), "good" (15%), "nice" (10%) — greedy sampling always picks "brilliant." With temperature 0.8 and top-p 0.9, "nice" is filtered out (the top four words already reach the 90% cumulative cutoff), and any of the remaining four can be selected in proportion to its temperature-adjusted probability, producing more natural and varied text across generations.
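The example above can be sketched in code. This is a minimal illustration, not any particular library's API: it applies temperature scaling, then optional top-k and top-p filtering, and samples from the surviving candidates (a temperature of zero falls back to greedy decoding).

```python
import math
import random

def sample_next_token(probs, temperature=1.0, top_k=None, top_p=None):
    """Pick the next token from a {token: probability} dict.

    temperature <= 0 means greedy decoding; top_k and top_p filter
    the candidate set before sampling.
    """
    if temperature <= 1e-6:
        # Greedy decoding: always take the most likely token.
        return max(probs, key=probs.get)

    # Temperature scaling: divide log-probabilities by the
    # temperature, exponentiate, and renormalize. T < 1 sharpens
    # the distribution; T > 1 flattens it.
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    items = sorted(((t, p / total) for t, p in scaled.items()),
                   key=lambda kv: kv[1], reverse=True)

    # Top-k filtering: keep only the k most likely tokens.
    if top_k is not None:
        items = items[:top_k]

    # Top-p (nucleus) filtering: keep the smallest prefix of tokens
    # whose cumulative probability reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in items:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        items = kept

    # Renormalize the surviving candidates and draw one token.
    tokens, weights = zip(*items)
    total = sum(weights)
    return random.choices(tokens, weights=[w / total for w in weights])[0]

probs = {"brilliant": 0.30, "great": 0.25, "wonderful": 0.20,
         "good": 0.15, "nice": 0.10}

sample_next_token(probs, temperature=0.0)              # greedy: "brilliant"
sample_next_token(probs, temperature=0.8, top_p=0.9)   # one of the top four
```

With top-p 0.9 on these probabilities, the cumulative mass reaches 90% at "good" (0.30 + 0.25 + 0.20 + 0.15), so "nice" is never selected — matching the example above.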
