Gradient Accumulation Calculator
Calculate gradient accumulation steps for training.
Training Configuration
Gradient Accumulation Steps: 32
Effective batch: 256
💾 Memory Usage: 384.7%
⚡ Efficiency: 36.0%
Batch Configuration
Micro Batch per GPU: 8
Effective Batch per GPU: 256
Tokens per Micro Batch: 16,384
Tokens per Effective Batch: 524,288
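The batch figures above follow from two inputs: the micro batch size and the accumulation step count. A minimal sketch of the arithmetic, assuming a sequence length of 2,048 tokens (not shown in the report, but implied by 16,384 tokens ÷ 8 sequences):

```python
# Assumed inputs; the sequence length is inferred (16,384 / 8 = 2,048), not reported.
micro_batch = 8                       # sequences per GPU per forward pass
accum_steps = 32                      # gradient accumulation steps
seq_len = 16_384 // micro_batch       # 2,048 tokens per sequence (assumption)

effective_batch = micro_batch * accum_steps        # sequences per optimizer step
tokens_per_micro = micro_batch * seq_len           # tokens per forward pass
tokens_per_effective = effective_batch * seq_len   # tokens per optimizer step

print(effective_batch, tokens_per_micro, tokens_per_effective)  # 256 16384 524288
```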
Memory Breakdown
Model Weights: 13.04 GB
Activations: 1.07 GB
Gradients: 26.08 GB
Optimizer States: 52.15 GB
Total Memory: 92.34 GB
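The breakdown is consistent with a roughly 7B-parameter model trained in mixed precision with Adam: fp16 weights (2 bytes/param), 4-byte gradients, and two fp32 optimizer moments (8 bytes/param). The parameter count and per-value byte sizes below are assumptions reverse-engineered from the reported figures, not values the calculator displays:

```python
params = 7e9     # assumed parameter count (implied by 13.04 GB of fp16 weights)
GIB = 1024**3

weights_gb = params * 2 / GIB   # fp16 weights: 2 bytes per parameter
grads_gb = params * 4 / GIB     # gradients: 4 bytes per parameter (implied by 26.08 GB)
optim_gb = params * 8 / GIB     # Adam m and v in fp32: 8 bytes per parameter
activations_gb = 1.07           # taken from the report; depends on batch, seq len, arch

total_gb = weights_gb + grads_gb + optim_gb + activations_gb
print(round(weights_gb, 2), round(grads_gb, 2), round(optim_gb, 2), round(total_gb, 2))
```

Note that activation memory is the only term here that the micro batch size controls, which is why the suggestion below targets it: weights, gradients, and optimizer states are fixed by the model and optimizer choice.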
Training Estimates
Steps per Epoch (1B tokens): 1,908
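The step count is the token budget divided by tokens per effective batch, rounded up. A sketch, assuming the 1B-token epoch shown above:

```python
import math

tokens_per_epoch = 1_000_000_000
tokens_per_effective_batch = 524_288   # from the batch configuration above

steps_per_epoch = math.ceil(tokens_per_epoch / tokens_per_effective_batch)
print(steps_per_epoch)  # 1908
```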
Suggestions
- Memory utilization is high (384.7%): the estimated 92.34 GB footprint is nearly 4× the available GPU memory. Consider reducing the micro batch size.
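Gradient accumulation itself is a small change to the training loop: run N micro-batch backward passes, scaling each loss by 1/N, then apply the optimizer once. A framework-free sketch (a hypothetical scalar toy problem, not from the calculator) showing that the accumulated gradient equals the full-batch gradient:

```python
# Toy problem: loss(w) = mean over data of (w - x)^2, so dloss/dw = mean of 2*(w - x).
# Illustration only; real training would use an ML framework's autograd.
data = [float(i) for i in range(256)]   # one "effective batch" of 256 samples
accum_steps = 32
micro = len(data) // accum_steps        # micro batch of 8, as configured above
w = 0.5

# Full-batch gradient computed in one shot.
full_grad = sum(2 * (w - x) for x in data) / len(data)

# Accumulated gradient: sum micro-batch gradients, each scaled by 1/accum_steps.
acc_grad = 0.0
for step in range(accum_steps):
    chunk = data[step * micro:(step + 1) * micro]
    micro_grad = sum(2 * (w - x) for x in chunk) / len(chunk)
    acc_grad += micro_grad / accum_steps   # equivalent to scaling each loss by 1/N

print(abs(full_grad - acc_grad) < 1e-9)  # True: identical up to float rounding
```

This is why accumulation trades memory for time: each optimizer step sees the same gradient as a 256-sample batch, but only 8 samples' activations are live at once.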