Prefix-Tuning

Prefix-tuning is a parameter-efficient fine-tuning method that prepends a small set of continuous, trainable vectors (the "prefix") to the attention keys and values at every transformer layer while the underlying model weights stay frozen. Only the prefix parameters are updated during training, giving task-specific adaptation at a tiny fraction of the compute and storage cost of full fine-tuning. It is related to prompt-tuning, which trains soft prompts only at the input-embedding layer; because prefix-tuning operates at every layer, it is typically more expressive, at the price of more trainable parameters. Both methods let a single frozen base model serve many tasks by swapping in lightweight task-specific prefixes.
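The mechanism can be sketched in a few lines: a single attention head whose weight matrices are frozen, with trainable prefix vectors concatenated onto the keys and values. This is a minimal NumPy illustration, not any library's implementation; the dimensions and variable names are chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prefix(x, W_q, W_k, W_v, prefix_k, prefix_v):
    """Single-head attention where trainable prefix vectors are
    prepended to the (frozen) keys and values."""
    q = x @ W_q                                      # (seq, d)
    k = np.concatenate([prefix_k, x @ W_k], axis=0)  # (p + seq, d)
    v = np.concatenate([prefix_v, x @ W_v], axis=0)  # (p + seq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (seq, p + seq)
    return softmax(scores) @ v                       # (seq, d)

rng = np.random.default_rng(0)
d, seq, p = 16, 8, 4    # hidden size, sequence length, prefix length

x = rng.normal(size=(seq, d))

# Frozen base-model weights: never updated during prefix-tuning.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Trainable prefix: the ONLY parameters the optimizer would touch.
prefix_k = rng.normal(size=(p, d))
prefix_v = rng.normal(size=(p, d))

out = attention_with_prefix(x, W_q, W_k, W_v, prefix_k, prefix_v)
print(out.shape)    # output keeps the input's shape: (8, 16)
```

Note that the prefix changes only what each query attends over: the output shape is unchanged, so the frozen layers downstream need no modification.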

Example

A company fine-tunes a frozen 13B base model for three internal tasks — contract summarization, ticket classification, and release-note drafting — by training a separate 1M-parameter prefix per task. Each task prefix is a few megabytes on disk, versus gigabytes for a full fine-tune. At inference, the router loads the right prefix for each request and all three tasks share the same base-model weights in memory.
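The storage claim above is easy to check with back-of-the-envelope arithmetic. The 1M and 13B parameter counts come from the example; the 2-bytes-per-parameter (fp16) width is an assumption for illustration.

```python
# Rough storage math for the scenario above.
# fp16 = 2 bytes per parameter (assumed width, not stated in the example).
bytes_per_param = 2

prefix_mb = 1_000_000 * bytes_per_param / 1e6        # per-task prefix, in MB
base_gb = 13_000_000_000 * bytes_per_param / 1e9     # shared frozen base, in GB

print(f"prefix: {prefix_mb:.0f} MB, base model: {base_gb:.0f} GB")
# prefix: 2 MB, base model: 26 GB
```

Three full fine-tunes would triple the ~26 GB footprint; three prefixes add only a few megabytes on top of one shared copy of the base weights.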
