Attention Mechanism

An attention mechanism is a neural network component that lets a model dynamically weigh the importance of different parts of the input when generating each part of the output. Rather than compressing the input into a single fixed-length summary vector, attention lets the model "focus" on the most relevant tokens at each generation step. It is the core innovation behind transformers and modern LLMs.
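The standard formulation is scaled dot-product attention: each query is compared against all keys, the scores are normalized with a softmax into weights, and the output is the weighted sum of the values. A minimal NumPy sketch (the shapes and random inputs here are illustrative, not from the text above):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out, w = attention(Q, K, V)
print(out.shape)        # (3, 4): one output vector per query token
print(w.sum(axis=-1))   # each row of weights sums to 1.0
```

The weight matrix `w` is what "focus" means concretely: row *i* shows how much each input token contributes to the output at position *i*.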

Example

When translating "The bank by the river was steep" to French, the attention mechanism assigns high weight to "river" and "steep" when choosing how to translate "bank" — correctly selecting "berge" (riverbank) instead of "banque" (financial bank), because the surrounding context receives appropriate attention.
