Attention Head Calculator
Calculate multi-head attention parameters and requirements.
Attention Configuration
Attention Parameters: 2.36M
Head Dimension: 64
GFLOPs: 26.27
KV Cache Savings: 0.0%
Parameter Breakdown
Query Projection (W_Q): 0.59M
Key Projection (W_K): 0.59M
Value Projection (W_V): 0.59M
Output Projection (W_O): 0.59M
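The breakdown above can be reproduced with a short sketch. The calculator's inputs are not shown on this page, so the configuration used here (a model width of 768 with 12 heads, no bias terms, and as many key/value heads as query heads) is an assumption chosen because it matches the reported head dimension of 64 and the 0.59M per-projection figure. The function name `attention_param_counts` and its parameters are hypothetical.

```python
def attention_param_counts(d_model, num_heads, num_kv_heads=None):
    """Count weight parameters in one multi-head attention block (biases ignored).

    num_kv_heads defaults to num_heads (standard multi-head attention).
    A smaller value models grouped-query or multi-query attention.
    """
    if num_kv_heads is None:
        num_kv_heads = num_heads
    head_dim = d_model // num_heads
    kv_dim = head_dim * num_kv_heads  # width of the K and V projections

    counts = {
        "W_Q": d_model * d_model,
        "W_K": d_model * kv_dim,
        "W_V": d_model * kv_dim,
        "W_O": d_model * d_model,
    }
    counts["total"] = sum(counts.values())
    counts["head_dim"] = head_dim
    # Fraction of the KV cache saved relative to full multi-head attention.
    counts["kv_cache_savings"] = 1.0 - num_kv_heads / num_heads
    return counts


# Assumed configuration: d_model=768, 12 heads (not shown in the source page).
counts = attention_param_counts(d_model=768, num_heads=12)
for name in ("W_Q", "W_K", "W_V", "W_O", "total"):
    print(f"{name}: {counts[name] / 1e6:.2f}M")
print("head_dim:", counts["head_dim"])
print(f"kv_cache_savings: {counts['kv_cache_savings']:.1%}")
```

Under this assumed configuration each projection has 768 × 768 = 589,824 weights (0.59M) and the four together have 2,359,296 (2.36M). Passing a smaller `num_kv_heads` shrinks W_K and W_V and raises the savings figure above 0%.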
Memory Usage
Attention Scores: 384.00 MB
Q Tensor: 48.00 MB
KV Tensors: 96.00 MB
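The memory figures above follow from tensor shapes and element size. As a rough sketch, one configuration that reproduces them is batch size 32, sequence length 512, model width 768, 12 heads, and 4-byte (fp32) elements, with MB meaning 1024² bytes. None of these inputs appear on the page, so they are assumptions, and other combinations (for example batch 64 with 2-byte elements) give the same totals. The function name `attention_memory_mb` is hypothetical.

```python
def attention_memory_mb(batch, seq_len, d_model, num_heads, bytes_per_elem=4):
    """Estimate activation memory for one attention layer, in MB (1024**2 bytes).

    Assumes standard multi-head attention, where K and V each have
    the same shape as Q: (batch, seq_len, d_model).
    """
    mb = 1024 ** 2
    # Score matrix: one (seq_len x seq_len) map per head per batch item.
    scores = batch * num_heads * seq_len * seq_len * bytes_per_elem
    # Q tensor: (batch, seq_len, d_model).
    q = batch * seq_len * d_model * bytes_per_elem
    # K and V together are twice the size of Q in this setting.
    kv = 2 * q
    return {
        "attention_scores": scores / mb,
        "q_tensor": q / mb,
        "kv_tensors": kv / mb,
    }


# Assumed inputs (not shown in the source page).
usage = attention_memory_mb(batch=32, seq_len=512, d_model=768, num_heads=12)
for name, value in usage.items():
    print(f"{name}: {value:.2f} MB")
# attention_scores: 384.00 MB
# q_tensor: 48.00 MB
# kv_tensors: 96.00 MB
```

The score matrix dominates because it grows with the square of the sequence length, while the Q and KV tensors grow only linearly.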