Gradient Accumulation Calculator

Calculate gradient accumulation steps for training.

Training Configuration

  • Model Size: 7 B parameters (implied by the 13.04 GB fp16 weight footprint below)
  • GPU Memory: 24 GB (implied by the 384.7% memory usage figure)
  • Gradient Accumulation Steps: 32
  • Effective Batch Size: 256
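The effective batch size above follows directly from the two sliders. A minimal sketch of that relationship, using the calculator's values (variable names are illustrative):

```python
# Effective batch size under gradient accumulation: gradients from
# several micro batches are summed before one optimizer step.
micro_batch_per_gpu = 8
grad_accum_steps = 32

effective_batch = micro_batch_per_gpu * grad_accum_steps
print(effective_batch)  # 256
```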

Memory Usage: 384.7%
Efficiency: 36.0%

Batch Configuration

  • Micro Batch per GPU: 8
  • Effective Batch per GPU: 256
  • Tokens per Micro Batch: 16,384
  • Tokens per Effective Batch: 524,288
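The token counts in this table imply a sequence length of 2,048 (16,384 tokens ÷ micro batch of 8). A short sketch of the arithmetic, with the sequence length labeled as an assumption:

```python
# Batch/token arithmetic behind the table above.
micro_batch = 8
grad_accum_steps = 32
seq_len = 2048  # assumption: implied by 16,384 tokens per micro batch of 8

tokens_per_micro_batch = micro_batch * seq_len                           # 16,384
effective_batch = micro_batch * grad_accum_steps                         # 256
tokens_per_effective_batch = tokens_per_micro_batch * grad_accum_steps   # 524,288
```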

Memory Breakdown

  • Model Weights: 13.04 GB
  • Activations: 1.07 GB
  • Gradients: 26.08 GB
  • Optimizer States: 52.15 GB
  • Total Memory: 92.34 GB
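These figures are consistent with a 7B-parameter model using fp16 weights (2 bytes/param), fp32 gradients (4 bytes/param), and Adam optimizer states with two fp32 moments (8 bytes/param). A hedged sketch under those assumptions; the activation figure is model-dependent and taken directly from the calculator:

```python
# Memory breakdown for a 7B-parameter model (assumed precisions:
# fp16 weights, fp32 gradients, fp32 Adam moments).
params = 7e9
GiB = 2**30

weights_gb = params * 2 / GiB    # fp16 weights:   ~13.04 GB
grads_gb = params * 4 / GiB      # fp32 gradients: ~26.08 GB
optimizer_gb = params * 8 / GiB  # Adam m + v:     ~52.15 GB
activations_gb = 1.07            # from the calculator (architecture-dependent)

total_gb = weights_gb + grads_gb + optimizer_gb + activations_gb  # ~92.34 GB
```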

Training Estimates

Steps per Epoch (1B tokens): 1,908

Suggestions

  • Memory usage is 384.7% of available GPU memory, so this configuration does not fit on the GPU. Consider reducing the micro batch size (and increasing gradient accumulation steps to keep the effective batch at 256).
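The suggestion can be reproduced with a simple threshold check. This sketch assumes a 24 GB GPU, which is implied by the reported figure (92.34 GB ÷ 24 GB ≈ 3.847):

```python
# High-memory warning check; the GPU size is an assumption inferred
# from the 384.7% figure, not an input shown in the dump above.
gpu_memory_gb = 24.0   # assumption
total_memory_gb = 92.34

utilization_pct = total_memory_gb / gpu_memory_gb * 100  # 384.75 with these rounded inputs

if utilization_pct > 90:
    print(f"Memory utilization is high ({utilization_pct:.1f}%). "
          "Consider reducing micro batch size.")
```

The calculator's 384.7% differs in the last digit because it divides unrounded component totals.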