KV Cache Calculator

Calculate KV cache memory for LLM inference.

Model Configuration


Total KV Cache Memory

2.00 GB

512.00 KB per token

🎯Head Dimension
128
πŸ’ΎMemory Saved (GQA)
0.0%
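
The summary figures are consistent with the usual estimate of KV cache size: two tensors (keys and values) per layer, each holding one vector of length head dimension per KV head per token. The sketch below is a minimal illustration of that estimate, not the calculator's actual code. The configuration values (32 layers, 32 KV heads, 2 bytes per value, a 4,096-token sequence, batch size 1) are assumptions chosen because they reproduce the numbers shown with a head dimension of 128; the page itself does not state them. Sizes use binary units (1 KB = 1024 bytes), which is also an assumption.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache added by one token of one sequence."""
    # Factor of 2: one key vector and one value vector per layer and KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_bytes(seq_len, batch_size, **config):
    """Total KV cache bytes for batch_size sequences of seq_len tokens."""
    return kv_cache_bytes_per_token(**config) * seq_len * batch_size


# Assumed configuration (hypothetical, not stated on the page).
config = dict(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_value=2)

per_token = kv_cache_bytes_per_token(**config)
total = kv_cache_bytes(seq_len=4096, batch_size=1, **config)

print(f"{per_token / 1024:.2f} KB per token")  # 512.00 KB per token
print(f"{total / 1024**3:.2f} GB total")       # 2.00 GB total
```

Other configurations can give the same per-token figure (for example, more layers with fewer KV heads), so the values above should be read as one plausible input set rather than the one actually used.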

Cache Details

Cache per Token: 512.00 KB
Growth per Token: 512.00 KB
Query/KV Head Ratio: 1:1
Est. Max Batch (24GB GPU): 4

Memory by Sequence Length

512 tokens: 256 MB
1,024 tokens: 512 MB
2,048 tokens: 1.00 GB
4,096 tokens: 2.00 GB
8,192 tokens: 4.00 GB
16,384 tokens: 8.00 GB
32,768 tokens: 16.00 GB
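
The cache grows linearly with sequence length, so each row above is the per-token size multiplied by the token count. A minimal sketch of that scaling follows, assuming a per-token size of 512 KB, binary units (1 KB = 1024 bytes), and a single sequence; the helper names `format_size` and `cache_by_sequence_length` are hypothetical and only serve the illustration.

```python
BYTES_PER_TOKEN = 512 * 1024  # assumed per-token KV cache size (512 KB)


def format_size(num_bytes):
    """Render a byte count in the largest binary unit that fits."""
    for unit, size in (("GB", 1024**3), ("MB", 1024**2), ("KB", 1024)):
        if num_bytes >= size:
            return f"{num_bytes / size:.2f} {unit}"
    return f"{num_bytes} B"


def cache_by_sequence_length(seq_lens, bytes_per_token=BYTES_PER_TOKEN):
    """Map each sequence length to its KV cache size in bytes."""
    return {n: n * bytes_per_token for n in seq_lens}


table = cache_by_sequence_length([512, 1024, 2048, 4096, 8192, 16384, 32768])
for n, size in table.items():
    print(f"{n:>6,} tokens: {format_size(size)}")
```

Doubling the sequence length doubles the cache, which is why long contexts rather than model weights often become the limiting factor for batch size.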

Performance Metrics

Bandwidth Required: 2000.0 GB/s
MHA Equivalent Cache: 2.00 GB

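Tip: Using fewer KV heads than query heads (GQA, or MQA with a single KV head) shrinks the KV cache in proportion to the head ratio, usually with little loss in quality. LLaMA-2 70B, for example, uses 8 KV heads with 64 query heads; the smaller LLaMA-2 models use standard multi-head attention.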
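
Because cache size is proportional to the number of KV heads, the saving relative to a multi-head attention (MHA) cache of the same model can be estimated from the head counts alone. The sketch below shows that arithmetic; `gqa_memory_saved` is a hypothetical helper, and the formula is an assumption about how a "Memory Saved (GQA)" figure would be derived, not a description of the calculator's internals.

```python
def gqa_memory_saved(num_query_heads, num_kv_heads):
    """Fraction of KV cache saved versus MHA, where KV heads == query heads."""
    if num_kv_heads <= 0 or num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a positive multiple of KV heads")
    return 1.0 - num_kv_heads / num_query_heads


print(f"{gqa_memory_saved(32, 32):.1%}")  # 0.0%  (MHA, 1:1 ratio)
print(f"{gqa_memory_saved(32, 8):.1%}")   # 75.0% (GQA, 4:1 ratio)
```

With a 1:1 query/KV head ratio the saving is zero, and the MHA equivalent cache equals the actual cache.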