KV-Cache
A KV-cache (key-value cache) stores the attention key and value tensors computed for previously processed tokens so the model does not need to recalculate them when generating each new token. Without a KV-cache, the model would recompute attention for the entire input sequence at every generation step, so the cost of each step would grow quadratically with sequence length rather than linearly. This cache is what makes autoregressive text generation fast enough for real-time conversations.
Example
When ChatGPT generates a 500-word response, it produces one token at a time. After processing your 100-token prompt, the KV-cache stores the keys and values for those 100 tokens. For each subsequent token, the model computes attention only for the new token against the cached entries, instead of reprocessing all previous tokens from scratch.
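The per-token mechanics can be sketched in plain Python. This is a toy single-head dot-product attention loop, not a real transformer implementation; the KVCache class and its step method are illustrative names invented for this example, not any library's API.

```python
import math

class KVCache:
    """Toy single-head attention with a KV-cache (illustrative sketch)."""

    def __init__(self):
        self.keys = []    # cached key vectors, one per processed token
        self.values = []  # cached value vectors, one per processed token

    def step(self, q, k, v):
        """Process one new token: cache its key/value, then attend the
        new query against ALL cached keys. Old tokens are never re-encoded."""
        self.keys.append(k)
        self.values.append(v)
        d = len(q)
        # Scaled dot-product scores of the new query against every cached key
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
                  for key in self.keys]
        # Numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention output: weighted sum of all cached value vectors
        return [sum(w * val[i] for w, val in zip(weights, self.values))
                for i in range(d)]
```

Each call to step does work proportional to the number of cached tokens, rather than re-running attention for the whole sequence; the cache simply grows by one key/value pair per generated token.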