Mixture of Experts Calculator

Calculate MoE model parameters and efficiency.

MoE Configuration

Total Parameters: 47.5B (13.7B active per forward pass)

💾 Total Memory: 88.5 GB
🎯 Expert Utilization: 25%

Parameter Breakdown

Params per Expert (per layer): 176.2M
Params per Layer: 1.48B
Attention Params/Layer: 67.1M
Router Params/Layer: 32.8K
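
These per-layer figures can be reproduced from a standard MoE transformer layout. Below is a minimal sketch; the config values (hidden size 4096, FFN size 14336, SwiGLU experts with three projection matrices, full multi-head attention, 32 layers, 32K vocab) are assumptions chosen to match the readouts above, not values taken from the calculator.

```python
# Minimal sketch of the parameter breakdown. All config values are
# assumptions (Mixtral-8x7B-like) chosen to match the readouts above.
HIDDEN = 4096       # model dimension
FFN = 14336         # expert feed-forward dimension
N_EXPERTS = 8       # routed experts per layer
N_LAYERS = 32
VOCAB = 32000

# Each expert is a SwiGLU FFN: gate, up, and down projections.
params_per_expert = 3 * HIDDEN * FFN                  # ~176.2M

# Full multi-head attention: Q, K, V, O projections (no GQA assumed).
attn_per_layer = 4 * HIDDEN * HIDDEN                  # ~67.1M

# Router: one linear layer producing a score per expert.
router_per_layer = HIDDEN * N_EXPERTS                 # ~32.8K

params_per_layer = (N_EXPERTS * params_per_expert
                    + attn_per_layer + router_per_layer)   # ~1.48B

# Untied input and output embeddings round out the total.
total_params = N_LAYERS * params_per_layer + 2 * VOCAB * HIDDEN  # ~47.5B

print(f"per expert: {params_per_expert / 1e6:.1f}M")
print(f"per layer:  {params_per_layer / 1e9:.2f}B")
print(f"total:      {total_params / 1e9:.1f}B")
```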

Efficiency Metrics

Parameter Efficiency: 28.8%
Memory Overhead: 247.1%
Equivalent Dense Model: 13.7B
Equiv. Dense Memory: 25.5 GB
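
These metrics follow directly from the totals: parameter efficiency is active over total parameters, the equivalent dense model is simply the active count, and memory overhead compares the two footprints. A sketch, assuming 2 bytes per parameter (fp16/bf16) and binary gigabytes; both are assumptions that happen to match the readouts:

```python
# Sketch of the efficiency metrics, assuming 2 bytes/param (fp16/bf16)
# and GiB (1024**3 bytes). Both assumptions match the readouts above.
BYTES_PER_PARAM = 2
GIB = 1024 ** 3

total_params = 47.5e9
active_params = 13.7e9   # params touched per forward pass

param_efficiency = active_params / total_params          # ~28.8%
total_mem_gib = total_params * BYTES_PER_PARAM / GIB     # ~88.5
dense_mem_gib = active_params * BYTES_PER_PARAM / GIB    # ~25.5 (equivalent dense)

# Extra memory carried relative to a dense model of the same active size;
# ~247% (the UI's 247.1% comes from the rounded totals).
mem_overhead = (total_mem_gib - dense_mem_gib) / dense_mem_gib

print(f"efficiency {param_efficiency:.1%}, overhead {mem_overhead:.1%}")
```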

Expert Configuration

Routed Experts: 8
Shared Experts: 0
Tokens per Expert: 16%
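
The 25% utilization readout above is just the routing fan-out: the fraction of routed experts that fire for any single token. A sketch, assuming top-2 routing; the value of k is an inference from the 25% readout and the active-parameter count, not something the calculator displays:

```python
# Sketch of expert utilization, assuming top-k = 2 routing.
# k is an inferred assumption; it is not shown by the calculator.
n_routed_experts = 8
n_shared_experts = 0
top_k = 2

# Fraction of routed experts active for any single token.
utilization = top_k / n_routed_experts
print(f"expert utilization: {utilization:.0%}")  # 25%
```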

Tip: MoE models like Mixtral 8x7B have ~47B total parameters but only ~13B active per token, giving dense-model quality at a much lower compute cost.