Question 1

What is Prefix-Tuning?

Accepted Answer

Prefix-tuning is a parameter-efficient fine-tuning method in which a small set of continuous, trainable vectors, the "prefix", is prepended to the activations at every transformer layer (in practice, to each attention layer's keys and values) while the underlying model weights stay frozen.
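For one attention head, the idea can be written as ordinary scaled dot-product attention whose keys and values are extended by trainable prefix rows. The notation was chosen for this sketch and does not come from the page. It follows the common "unified view" formulation of parameter-efficient tuning. H is the n × d matrix of a layer's hidden states for n real tokens. W_q, W_k and W_v are the frozen d × d_k projections. P_k and P_v are the trainable l × d_k prefix matrices for a prefix of length l. [A; B] stacks the rows of A above the rows of B.

```latex
\operatorname{head}(H) = \operatorname{softmax}\!\left(
  \frac{(H W_q)\,[P_k;\, H W_k]^{\top}}{\sqrt{d_k}}
\right) [P_v;\, H W_v]
```

The score matrix is n × (l + n) and the output is n × d_k. The rest of the network therefore still sees a sequence of n tokens, and gradients reach only P_k and P_v. In a decoder, the causal mask still applies among the real tokens, while every token can attend to all prefix positions.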

Question 2

How does Prefix-Tuning work?

Accepted Answer

A small matrix of trainable parameters, the prefix, is created for each transformer layer. It is typically tens to a couple of hundred "virtual tokens" long. During training the base model weights stay frozen, and gradients flow only into the prefix matrices. The attention mechanism treats these matrices as additional keys and values at every layer, so every real token can attend to them. Each task gets its own prefix, stored as a lightweight checkpoint (megabytes, not gigabytes), while a single copy of the base model serves all tasks in memory. At inference, a request router selects the prefix for the request's task, prepends its key and value activations at every layer, and runs a normal forward pass through the frozen base model.
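The core mechanism can be sketched as a single attention head in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. It uses one head, no causal mask, no batching, and nested lists instead of tensors. The helper names (`matmul`, `softmax_rows`, `attention_with_prefix`) are hypothetical. The only change from ordinary attention is that the trainable `prefix_k` and `prefix_v` rows are placed in front of the keys and values computed by the frozen projections. Queries come only from the real tokens, so the output keeps one row per token.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_with_prefix(h: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                          prefix_k: Matrix, prefix_v: Matrix) -> Matrix:
    """Single-head attention whose keys/values are extended by a prefix.

    h:                 n x d hidden states of the real tokens
    w_q, w_k, w_v:     d x d_k frozen projections (read, never modified)
    prefix_k/prefix_v: l x d_k trainable prefix (the only tuned parameters)
    Returns n x d_k: one output row per real token.
    """
    q = matmul(h, w_q)                  # queries come only from real tokens
    k = prefix_k + matmul(h, w_k)       # row-wise concat -> (l + n) x d_k
    v = prefix_v + matmul(h, w_v)
    d_k = len(w_q[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


# Usage: 3 real tokens, d = d_k = 2, identity projections for illustration.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
prefix_k = [[0.5, -0.5]]   # l = 1 virtual token (hypothetical values)
prefix_v = [[2.0, 2.0]]
out = attention_with_prefix(h, eye, eye, eye, prefix_k, prefix_v)
print(len(out), len(out[0]))  # prints: 3 2
```

During training, an optimizer would update only `prefix_k` and `prefix_v`. The projection matrices are passed in and never modified. This mirrors how the frozen base model can be shared across tasks.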

Question 3

Can you give an example of Prefix-Tuning?

Accepted Answer

A company fine-tunes a frozen 13B base model for three internal tasks (contract summarization, ticket classification, and release-note drafting) by training a separate prefix of a few million parameters for each task. Each task prefix takes megabytes on disk, versus tens of gigabytes for a full fine-tuned copy of the model. At inference, a request router loads the right prefix for each request, and all three tasks share the same base-model weights in memory.
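A back-of-the-envelope count shows how prefix size scales with model depth, model width, and prefix length. This sketch makes several assumptions. It assumes the prefix is stored directly as one key vector and one value vector of width d_model per virtual token per layer. It assumes fp16 storage (2 bytes per parameter). It also assumes a hypothetical LLaMA-13B-like shape of 40 layers with d_model = 5120, since the example does not name its base model. Real implementations may store prefixes differently.

```python
def prefix_param_count(n_layers: int, prefix_len: int, d_model: int) -> int:
    """Trainable parameters in a per-layer key/value prefix.

    Each layer stores prefix_len key vectors and prefix_len value vectors,
    each of width d_model (all attention heads concatenated).
    """
    return 2 * n_layers * prefix_len * d_model


def size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Checkpoint size in decimal megabytes (default: fp16, 2 bytes/param)."""
    return n_params * bytes_per_param / 1e6


# Hypothetical 13B-class shape (assumption, LLaMA-13B-like).
N_LAYERS, D_MODEL = 40, 5120
for prefix_len in (10, 100, 200):
    n = prefix_param_count(N_LAYERS, prefix_len, D_MODEL)
    print(f"prefix_len={prefix_len:>3}: {n:,} params, {size_mb(n):.1f} MB")

# For comparison: a full fp16 copy of a 13B-parameter model.
print(f"full model: {size_mb(13_000_000_000) / 1000:.0f} GB")
```

Under these assumptions, the loop reports about 8.2 MB for a 10-token prefix, 81.9 MB for 100 tokens and 163.8 MB for 200 tokens. The full fp16 model is roughly 26 GB. Even a long prefix is therefore about two orders of magnitude smaller than a full fine-tuned copy.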
