Model Distillation Calculator
Estimate the memory, training time, and cost requirements for distilling a large teacher model into a smaller student.
Distillation Configuration
Compression Ratio: 10.0x (90.0% parameter reduction)
Temperature: 2
Est. Quality Retention: 55%
Inference Speedup: 10.0x
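These headline figures follow from simple ratios. Below is a minimal sketch of the likely arithmetic, assuming the calculator derives them directly from the two model sizes in billions of parameters; the model sizes used are illustrative values chosen to reproduce the 10.0x ratio, and the 55% quality-retention heuristic is the tool's own and is not reproduced here.

    def distillation_ratios(teacher_b: float, student_b: float) -> dict:
        """Headline ratios from teacher/student sizes in billions of params."""
        compression = teacher_b / student_b
        reduction_pct = (1.0 - student_b / teacher_b) * 100.0
        # Assumption: inference speedup is taken as roughly equal to the
        # parameter-count ratio, ignoring bandwidth and batching effects.
        return {
            "compression": f"{compression:.1f}x",
            "parameter_reduction": f"{reduction_pct:.1f}%",
            "est_speedup": f"{compression:.1f}x",
        }

    # Illustrative sizes reproducing the figures above (65B -> 6.5B):
    print(distillation_ratios(teacher_b=65.0, student_b=6.5))
    # {'compression': '10.0x', 'parameter_reduction': '90.0%', 'est_speedup': '10.0x'}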
Memory Requirements
Teacher Model: 130.4 GB
Student Training: 52.2 GB
Total Required: 182.5 GB
GPUs Needed: 3
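A hedged reconstruction of this arithmetic follows. The assumptions are not stated by the tool but are consistent with its numbers: the frozen teacher is held in FP16 (2 bytes/param), student training costs roughly 8 bytes/param (weights plus gradients plus optimizer state), and GPUs are 80 GB devices such as A100/H100.

    import math

    GPU_MEM_GB = 80  # assumed 80 GB devices (e.g. A100/H100)

    def memory_plan(teacher_b: float, student_b: float) -> dict:
        # Billions of params x bytes/param gives GB directly.
        teacher_gb = teacher_b * 2   # frozen FP16 teacher: 2 bytes/param
        student_gb = student_b * 8   # training: weights + grads + optimizer state
        total_gb = teacher_gb + student_gb
        return {
            "teacher_gb": round(teacher_gb, 1),
            "student_training_gb": round(student_gb, 1),
            "total_gb": round(total_gb, 1),
            "gpus_needed": math.ceil(total_gb / GPU_MEM_GB),
        }

    # Illustrative 65.2B teacher / 6.52B student, approximately matching
    # the figures above: ~130.4 GB + ~52.2 GB => 3 GPUs.
    print(memory_plan(teacher_b=65.2, student_b=6.52))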
Training Estimates
Training Time: 210.6 hours
Estimated Cost: $2,527.18
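The cost figure is consistent with training time x GPU count x an hourly rate near $4 per GPU-hour, an assumed on-demand cloud price for 80 GB GPUs; the tool's actual rate is not shown.

    def training_cost(hours: float, gpus: int, usd_per_gpu_hour: float = 4.0) -> float:
        # Assumed rate: ~$4/GPU-hour, typical on-demand pricing for 80 GB GPUs.
        return hours * gpus * usd_per_gpu_hour

    print(f"${training_cost(210.6, gpus=3):,.2f}")  # ~= $2,527.20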
Layer Mapping (Feature Distillation)
Student Layer 1 → Teacher Layer 3
Student Layer 2 → Teacher Layer 5
Student Layer 3 → Teacher Layer 8
Student Layer 4 → Teacher Layer 10
Student Layer 5 → Teacher Layer 13
Layer mapping ratio: 1:2.5
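One common uniform scheme reproduces this table exactly: student layer i is supervised by teacher layer ceil(i x ratio). Whether the calculator uses precisely this rule is an assumption; a sketch:

    import math

    def layer_mapping(num_student_layers: int, ratio: float) -> dict[int, int]:
        # Map student layer i to teacher layer ceil(i * ratio), a common
        # uniform scheme for feature (hidden-state) distillation.
        return {s: math.ceil(s * ratio) for s in range(1, num_student_layers + 1)}

    print(layer_mapping(5, ratio=2.5))
    # {1: 3, 2: 5, 3: 8, 4: 10, 5: 13}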
Tip: A temperature of 2-4 works well for most cases. Higher temperatures produce softer probability distributions, transferring more of the teacher's "dark knowledge" to the student.
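For reference, the temperature enters the standard soft-target distillation loss of Hinton et al. (2015). A minimal PyTorch sketch follows; the 0.5 soft/hard weighting is illustrative rather than taken from the tool.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: KL divergence between temperature-scaled distributions.
        # The T**2 factor keeps gradient magnitudes comparable across T values.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T ** 2)
        # Hard targets: ordinary cross-entropy against ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard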