Attention Head Calculator

Calculate multi-head attention parameters and requirements.

Attention Configuration

Attention Parameters: 2.36M

Head Dimension: 64

⚑GFLOPs
26.27
πŸ’ΎKV Cache Savings
0.0%
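
The page does not state how the GFLOPs or KV cache savings figures are computed. The sketch below shows one accounting that reproduces both numbers. Everything in it is an assumption inferred from the displayed values: batch size 32, sequence length 512, model dimension 768, 12 heads, 2 FLOPs per multiply-add, about 5 FLOPs per softmax element, and no FLOPs counted for the Q/K/V/O projections. The function names are hypothetical.

```python
def attention_gflops(batch, seq_len, d_model, num_heads, softmax_flops_per_score=5):
    """Estimate FLOPs for the attention core only (projections excluded, by assumption)."""
    # Q @ K^T and scores @ V: each costs 2 * batch * seq_len^2 * d_model FLOPs
    matmul = 4 * batch * seq_len * seq_len * d_model
    # Softmax over every score; the per-element cost is an assumed constant
    softmax = softmax_flops_per_score * batch * num_heads * seq_len * seq_len
    return (matmul + softmax) / 1e9


def kv_cache_savings(num_heads, num_kv_heads):
    """Percent reduction in KV cache size when K/V use fewer heads than Q."""
    return 100.0 * (1 - num_kv_heads / num_heads)


gflops = attention_gflops(batch=32, seq_len=512, d_model=768, num_heads=12)
savings = kv_cache_savings(num_heads=12, num_kv_heads=12)
print(f"{gflops:.2f} GFLOPs")      # 26.27 GFLOPs
print(f"{savings:.1f}% savings")   # 0.0% savings
```

With as many key/value heads as query heads (standard multi-head attention), the savings term is zero, which matches the 0.0% shown. Sharing key/value heads, for example 4 KV heads for 12 query heads, would give a nonzero figure under this formula.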

Parameter Breakdown

Query Projection (W_Q): 0.59M
Key Projection (W_K): 0.59M
Value Projection (W_V): 0.59M
Output Projection (W_O): 0.59M
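
The breakdown above is consistent with four square projection matrices of size d_model × d_model and no bias terms. A minimal sketch, assuming d_model = 768 and 12 heads (values inferred from the numbers shown, not stated on the page; the function name is hypothetical):

```python
def attention_params(d_model: int, num_heads: int) -> dict:
    """Count weights in a standard multi-head attention block (no biases assumed)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    per_projection = d_model * d_model  # each of W_Q, W_K, W_V, W_O is d_model x d_model
    return {
        "head_dim": d_model // num_heads,
        "W_Q": per_projection,
        "W_K": per_projection,
        "W_V": per_projection,
        "W_O": per_projection,
        "total": 4 * per_projection,
    }


params = attention_params(d_model=768, num_heads=12)
print(params["head_dim"])                 # 64
print(f"{params['W_Q'] / 1e6:.2f}M")      # 0.59M
print(f"{params['total'] / 1e6:.2f}M")    # 2.36M
```

Note that the parameter count depends only on d_model here. The number of heads changes the head dimension but not the total.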

Memory Usage

Attention Scores: 384.00 MB
Q Tensor: 48.00 MB
KV Tensors: 96.00 MB
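
These sizes can be reproduced from tensor shapes alone. The sketch below assumes batch size 32, sequence length 512, model dimension 768, 12 heads, 4-byte (fp32) elements, and MB meaning 1024² bytes. None of these inputs appear on the page; they are inferred from the figures, and the function name is hypothetical.

```python
MIB = 1024 ** 2  # the page's "MB" appears to be 1024^2 bytes (assumption)


def attention_memory_mb(batch, seq_len, d_model, num_heads, bytes_per_elem=4):
    """Activation sizes for one attention layer, in MB (1024^2 bytes)."""
    q_bytes = batch * seq_len * d_model * bytes_per_elem  # Q: [batch, seq, d_model]
    kv_bytes = 2 * q_bytes                                # K and V have the same shape as Q
    # Scores: one seq_len x seq_len matrix per head per batch element
    score_bytes = batch * num_heads * seq_len * seq_len * bytes_per_elem
    return {
        "attention_scores": score_bytes / MIB,
        "q_tensor": q_bytes / MIB,
        "kv_tensors": kv_bytes / MIB,
    }


mem = attention_memory_mb(batch=32, seq_len=512, d_model=768, num_heads=12)
for name, size in mem.items():
    print(f"{name}: {size:.2f} MB")
# attention_scores: 384.00 MB
# q_tensor: 48.00 MB
# kv_tensors: 96.00 MB
```

The memory figures on their own only fix the product of batch size and bytes per element at 128, so batch 64 with 2-byte elements would give the same sizes. The score matrix grows with the square of the sequence length, which is why it dominates the other tensors here.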