Mixture of Experts Calculator
Calculate MoE model parameters and efficiency.
MoE Configuration
Total Parameters
47.5B
13.7B active per forward pass
Total Memory
88.5 GiB
Expert Utilization
25%
Parameter Breakdown
Params per Expert: 176.2M
Params per Layer: 1.48B
Attention Params/Layer: 67.1M
Router Params/Layer: 32.8K
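The breakdown above follows directly from the layer dimensions. Below is a minimal sketch of that arithmetic, assuming Mixtral-8x7B-like dimensions (hidden size 4096, FFN size 14336, 32 layers, a 32K vocabulary, SwiGLU experts, no grouped-query attention); these are assumptions rather than values read from the calculator, but they reproduce the figures shown here to within rounding.

```python
# Hypothetical Mixtral-like configuration (assumed, not taken from the calculator)
HIDDEN = 4096        # model hidden size
FFN = 14336          # expert FFN intermediate size
N_LAYERS = 32        # transformer layers
N_EXPERTS = 8        # routed experts per layer
TOP_K = 2            # routed experts activated per token
VOCAB = 32_000       # vocabulary size

# Per-layer pieces
expert_params = 3 * HIDDEN * FFN      # gate/up/down projections (SwiGLU)  ~176.2M
attn_params   = 4 * HIDDEN * HIDDEN   # Q, K, V, O projections             ~67.1M
router_params = HIDDEN * N_EXPERTS    # linear router                      ~32.8K
layer_params  = attn_params + router_params + N_EXPERTS * expert_params   # ~1.48B

# Model totals (embedding + unembedding included; norms and biases ignored)
embed_params  = 2 * VOCAB * HIDDEN
total_params  = N_LAYERS * layer_params + embed_params                     # ~47.5B
active_params = (N_LAYERS * (attn_params + router_params + TOP_K * expert_params)
                 + embed_params)                                           # ~13.7B

print(f"per expert {expert_params / 1e6:.1f}M, per layer {layer_params / 1e9:.2f}B")
print(f"total {total_params / 1e9:.1f}B, active {active_params / 1e9:.1f}B")
```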
Efficiency Metrics
Parameter Efficiency: 28.8%
Memory Overhead: 247.1%
Equivalent Dense Model: 13.7B
Equiv. Dense Memory: 25.5 GiB
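The efficiency metrics are simple ratios of the headline figures above. A short sketch, assuming 2-byte (FP16/BF16) weights:

```python
# Headline figures from the cards above
TOTAL_PARAMS    = 47.5e9   # total parameters
ACTIVE_PARAMS   = 13.7e9   # parameters active per forward pass
BYTES_PER_PARAM = 2        # FP16/BF16 weights (assumed)
GIB = 2**30

total_mem = TOTAL_PARAMS  * BYTES_PER_PARAM / GIB   # ~88.5 GiB (full MoE)
dense_mem = ACTIVE_PARAMS * BYTES_PER_PARAM / GIB   # ~25.5 GiB (equivalent dense model)

param_efficiency = ACTIVE_PARAMS / TOTAL_PARAMS          # ~28.8% of params used per token
memory_overhead  = (total_mem - dense_mem) / dense_mem   # ~247% extra memory vs. dense

print(f"parameter efficiency {param_efficiency:.1%}")
print(f"memory overhead {memory_overhead:.1%}")
print(f"MoE {total_mem:.1f} GiB vs dense-equivalent {dense_mem:.1f} GiB")
```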
Expert Configuration
Routed Experts: 8
Shared Experts: 0
Tokens per Expert: 16%
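The routing figures follow from the expert counts. A sketch of that arithmetic, assuming top-2 routing over the 8 routed experts; note that a perfectly load-balanced router would send 25% of tokens through each expert, so the 16% tokens-per-expert shown above presumably reflects the calculator's own load assumptions.

```python
N_ROUTED = 8   # routed experts per layer
N_SHARED = 0   # always-on shared experts (none in this config)
TOP_K = 2      # routed experts selected per token (assumed)

# Fraction of the routed experts that fire for any single token
expert_utilization = TOP_K / N_ROUTED            # 0.25 -> 25%

# Experts that actually execute per token (shared experts always run)
experts_per_token = TOP_K + N_SHARED             # 2

# Share of all tokens each routed expert processes if the router is
# perfectly load-balanced (real routers deviate from this ideal)
tokens_per_expert_balanced = TOP_K / N_ROUTED    # 0.25 -> 25%

print(f"utilization {expert_utilization:.0%}, {experts_per_token} experts per token, "
      f"{tokens_per_expert_balanced:.0%} of tokens per expert when balanced")
```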
Tip: MoE models like Mixtral 8x7B have ~47B total parameters but only ~13B active per token, delivering dense-model quality at a much lower compute cost.