KV Cache Calculator

Calculate KV cache memory for LLM inference.

Model Configuration

Model Size (Billions)

Number of Layers

Hidden Size

Query Heads

KV Heads

Sequence Length

Batch Size

Precision

Total KV Cache Memory

2.00 GB

512.00 KB per token
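A minimal sketch of the arithmetic that could produce these two figures. The page does not show its input values, so the settings below (32 layers, 32 KV heads, head dimension 128, 4,096 tokens, batch size 1, 2 bytes per value for 16-bit precision) are assumptions chosen because they reproduce the displayed results; the function name is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_value):
    """Bytes needed to store keys and values for every layer and token."""
    # The leading factor of 2 accounts for storing both a key and a value tensor.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Assumed inputs (not shown on the page).
total = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1, bytes_per_value=2)
per_token = kv_cache_bytes(32, 32, 128, seq_len=1, batch_size=1, bytes_per_value=2)

print(f"{total / 2**30:.2f} GB total")
print(f"{per_token / 2**10:.2f} KB per token")
```

Under these assumptions the units are binary (1 GB = 2^30 bytes, 1 KB = 2^10 bytes), which is what makes the totals come out as round numbers.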

🎯 Head Dimension

128

💾 Memory Saved (GQA)

0.0%
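The head dimension and the GQA saving can both be derived from the model configuration. The sketch below assumes a hidden size of 4096 with 32 query heads and 32 KV heads (values not shown on the page, picked because they give 128 and 0.0%); the function names are hypothetical.

```python
def head_dimension(hidden_size, num_query_heads):
    """Per-head width: the hidden size split evenly across query heads."""
    if hidden_size % num_query_heads != 0:
        raise ValueError("hidden_size must be divisible by num_query_heads")
    return hidden_size // num_query_heads


def gqa_savings_percent(num_query_heads, num_kv_heads):
    """KV cache reduction relative to one KV head per query head (MHA)."""
    return 100.0 * (1 - num_kv_heads / num_query_heads)


print(head_dimension(4096, 32))                 # assumed inputs
print(f"{gqa_savings_percent(32, 32):.1f}%")    # equal head counts: no saving
print(f"{gqa_savings_percent(32, 8):.1f}%")     # 8 KV heads shared by 32 query heads
```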

Cache Details

Cache per Token: 512.00 KB

Growth per Token: 512.00 KB

Query/KV Head Ratio: 1:1

Est. Max Batch (24 GB GPU): 4

Memory by Sequence Length

512 tokens: 256 MB

1,024 tokens: 512 MB

2,048 tokens: 1.00 GB

4,096 tokens: 2.00 GB

8,192 tokens: 4.00 GB

16,384 tokens: 8.00 GB

32,768 tokens: 16.00 GB
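The table grows linearly with sequence length: each row is the per-token cache (512 KB, as reported above) multiplied by the token count. A small sketch that rebuilds the rows, assuming batch size 1 and binary units; the helper name is hypothetical.

```python
PER_TOKEN_BYTES = 512 * 1024  # per-token cache reported by the calculator


def format_size(num_bytes):
    """Format like the table: GB with two decimals, otherwise whole MB."""
    if num_bytes >= 2**30:
        return f"{num_bytes / 2**30:.2f} GB"
    return f"{num_bytes / 2**20:.0f} MB"


lengths = (512, 1024, 2048, 4096, 8192, 16384, 32768)
rows = [(n, format_size(n * PER_TOKEN_BYTES)) for n in lengths]

for tokens, size in rows:
    print(f"{tokens:,} tokens: {size}")
```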

Performance Metrics

Bandwidth Required: 2000.0 GB/s

MHA Equivalent Cache: 2.00 GB

Tip: Using fewer KV heads than query heads (GQA, or MQA with a single KV head) can significantly reduce KV cache memory with little loss in quality. For example, LLaMA-2 70B uses 8 KV heads with 64 query heads.
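To make the tip concrete, the sketch below compares cache size for three KV head counts while holding everything else fixed. The defaults (32 layers, head dimension 128, 4,096 tokens, batch size 1, 2 bytes per value) are assumptions, not values taken from the page, and the function name is hypothetical.

```python
def kv_cache_gib(num_kv_heads, num_layers=32, head_dim=128,
                 seq_len=4096, batch_size=1, bytes_per_value=2):
    """KV cache size in GiB; only the KV head count varies in this comparison."""
    total = (2 * num_layers * num_kv_heads * head_dim
             * bytes_per_value * seq_len * batch_size)
    return total / 2**30


# 32 KV heads (MHA), 8 KV heads (GQA), 1 KV head (MQA), all assumed to serve 32 query heads.
sizes = {name: kv_cache_gib(heads) for name, heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]}

for name, gib in sizes.items():
    print(f"{name}: {gib:.4f} GiB")
```

Because the cache scales linearly with the number of KV heads, dropping from 32 to 8 heads cuts it to a quarter under these assumptions.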


Editorial Note

MyCalcBuddy Editorial Team

This page is maintained as an educational calculator reference.

📚 Formula Source: Standard Mathematical References

by Various

🔄 Last reviewed: May 2026

✓ Formula checks are based on standard references and internal QA review.