KV Cache Calculator
Calculate KV cache memory for LLM inference.
Model Configuration
Total KV Cache Memory
2.00 GB
512.00 KB per token
Head Dimension
128
Memory Saved (GQA)
0.0%
Cache Details
Cache per Token: 512.00 KB
Growth per Token: 512.00 KB
Query/KV Head Ratio: 1:1
Est. Max Batch (24GB GPU): 4
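
The per-token figure above is consistent with the usual KV cache sizing rule: two tensors (key and value) per layer, each holding one head-dimension vector per KV head. The sketch below is a minimal illustration, not the page's actual code. The layer count (32), KV head count (32), and 2-byte (fp16) element size are assumptions chosen because they reproduce the 512.00 KB and 2.00 GB values shown; only the head dimension of 128 and the 1:1 head ratio come from the page. The function name is hypothetical.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache added for each token in a sequence.

    The leading factor of 2 accounts for storing both a key and a
    value vector per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


# Assumed configuration (not shown on the page): 32 layers, 32 KV heads, fp16.
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
print(per_token / 1024)  # 512.0 (KB per token, binary units)

# Total for one sequence of 4,096 tokens (sequence length and batch size assumed).
seq_len, batch_size = 4096, 1
total = per_token * seq_len * batch_size
print(total / 1024**3)  # 2.0 (GB)
```

Because the cache grows linearly with tokens, the "Growth per Token" value equals the "Cache per Token" value.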
Memory by Sequence Length
512 tokens: 256 MB
1,024 tokens: 512 MB
2,048 tokens: 1.00 GB
4,096 tokens: 2.00 GB
8,192 tokens: 4.00 GB
16,384 tokens: 8.00 GB
32,768 tokens: 16.00 GB
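
The rows above scale linearly with sequence length from the 512 KB per-token figure. As a rough sketch, the table can be regenerated as follows. A batch size of 1 and binary units (1 KB = 1024 bytes) are assumptions that match the listed values, and the helper names are hypothetical.

```python
PER_TOKEN_BYTES = 512 * 1024  # the "per token" value reported on the page


def format_size(num_bytes):
    """Format a byte count in binary units, roughly matching the page's style."""
    if num_bytes >= 1024**3:
        return f"{num_bytes / 1024**3:.2f} GB"
    if num_bytes >= 1024**2:
        return f"{num_bytes / 1024**2:.0f} MB"
    return f"{num_bytes / 1024:.2f} KB"


def cache_by_sequence_length(seq_lens, per_token_bytes=PER_TOKEN_BYTES, batch_size=1):
    """Map each sequence length to its total KV cache size in bytes."""
    return {n: n * per_token_bytes * batch_size for n in seq_lens}


table = cache_by_sequence_length([512, 1024, 2048, 4096, 8192, 16384, 32768])
for n, size in table.items():
    print(f"{n:,} tokens: {format_size(size)}")
```

Doubling the sequence length doubles the cache, so a 32,768-token context at this configuration needs 16 GB for the cache alone, before model weights.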
Performance Metrics
Bandwidth Required: 2000.0 GB/s
MHA Equivalent Cache: 2.00 GB
Tip: Using fewer KV heads than query heads (GQA or MQA) can substantially reduce KV cache memory, typically with little loss in quality. For example, LLaMA-2 70B uses 8 KV heads with 64 query heads.
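
To make the tip concrete, the saving from sharing KV heads can be estimated from the head counts alone, since cache size is proportional to the number of KV heads. The sketch below assumes "Memory Saved (GQA)" is measured against a multi-head attention (MHA) baseline with one KV head per query head. That is consistent with the 0.0% shown for a 1:1 ratio, but it is an assumption about how the page computes the value. The function name is hypothetical.

```python
def gqa_memory_saved(num_query_heads, num_kv_heads):
    """Fraction of KV cache memory saved relative to an MHA baseline.

    MHA keeps one KV head per query head; GQA shares each KV head across
    a group of query heads; MQA is the special case of a single KV head.
    """
    if num_kv_heads <= 0 or num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a positive multiple of KV heads")
    return 1.0 - num_kv_heads / num_query_heads


print(f"{gqa_memory_saved(32, 32):.1%}")  # 0.0%  (MHA, 1:1 ratio)
print(f"{gqa_memory_saved(32, 8):.1%}")   # 75.0% (GQA, 4:1 ratio)
print(f"{gqa_memory_saved(64, 8):.1%}")   # 87.5% (GQA, 8:1 ratio)
```

At a 1:1 ratio the GQA cache and the "MHA Equivalent Cache" are the same size, which is why both read 2.00 GB above.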