Model Distillation Calculator

Estimate the memory, GPU, training-time, and cost requirements of distilling a large teacher model into a smaller student.

Distillation Configuration

Teacher Model Size: 70B
Student Model Size: 7B

Compression Ratio: 10.0x (90.0% parameter reduction)

📊 Est. Quality Retention: 55%
⚡ Inference Speedup: 10.0x
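
For reference, a minimal sketch of how these headline figures follow from the two model sizes above. The speedup heuristic simply mirrors the compression ratio, as the matching 10.0x values suggest; the quality-retention figure comes from the calculator's internal heuristic and is not reproduced here.

```python
# Headline compression metrics, assuming a 70B teacher and 7B student.
teacher_params = 70e9
student_params = 7e9

compression_ratio = teacher_params / student_params            # 10.0x
param_reduction = (1 - student_params / teacher_params) * 100  # 90.0%
inference_speedup = compression_ratio  # heuristic: speedup ~ compression ratio

print(f"Compression Ratio: {compression_ratio:.1f}x")
print(f"Parameter reduction: {param_reduction:.1f}%")
print(f"Inference Speedup: {inference_speedup:.1f}x")
```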

Memory Requirements

Teacher Model: 130.4 GB
Student Training: 52.2 GB
Total Required: 182.5 GB
GPUs Needed: 3
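
A minimal sketch of the memory arithmetic, assuming FP16 weights (2 bytes/parameter) for the frozen teacher, about 8 bytes/parameter for student training (weights, gradients, and optimizer state, as the displayed 52.2 GB figure implies), and 80 GB GPUs:

```python
import math

GIB = 1024**3
teacher_params, student_params = 70e9, 7e9

teacher_mem = teacher_params * 2 / GIB   # FP16 weights            -> ~130.4 GB
student_mem = student_params * 8 / GIB   # weights + grads + optim -> ~52.2 GB
total_mem = teacher_mem + student_mem    #                         -> ~182.5 GB
gpus_needed = math.ceil(total_mem / 80)  # assuming 80 GB per GPU  -> 3
```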

Training Estimates

Training Time: 210.6 hours
Estimated Cost: $2,527.18
Temperature: 2
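
The cost figure is consistent with a flat per-GPU hourly rate; a sketch, assuming $4.00/GPU-hour (the rate implied by $2,527.18 over 210.6 hours on 3 GPUs; the remaining cents come from rounding the displayed hours):

```python
training_hours = 210.6
gpus = 3
rate_per_gpu_hour = 4.00  # assumed rate, implied by the displayed total

estimated_cost = training_hours * gpus * rate_per_gpu_hour
print(f"Estimated Cost: ${estimated_cost:,.2f}")  # ~$2,527.20
```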

Layer Mapping (Feature Distillation)

Student Layer 1 → Teacher Layer 3
Student Layer 2 → Teacher Layer 5
Student Layer 3 → Teacher Layer 8
Student Layer 4 → Teacher Layer 10
Student Layer 5 → Teacher Layer 13

Layer mapping ratio: 1:2.5
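
The table above is consistent with uniform spacing at the 1:2.5 ratio; a minimal sketch of that mapping rule (rounding half up reproduces the table exactly):

```python
STUDENT_LAYERS = 5
RATIO = 2.5  # teacher layers per student layer (1:2.5)

# Student layer i is supervised by teacher layer round-half-up(i * RATIO):
# 1 -> 3, 2 -> 5, 3 -> 8, 4 -> 10, 5 -> 13.
mapping = {i: int(i * RATIO + 0.5) for i in range(1, STUDENT_LAYERS + 1)}
```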

Tip: A temperature of 2-4 works well for most cases. A higher temperature produces softer probability distributions, transferring more of the teacher's "dark knowledge".
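
A minimal sketch of how the temperature enters the distillation loss, in PyTorch; the function name, the alpha blend weight, and the loss composition are illustrative assumptions, not the calculator's internals:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft (teacher) and hard (ground-truth) target losses."""
    # Temperature-scaled soft targets: higher T flattens the teacher's
    # distribution, exposing more "dark knowledge" in the non-top classes.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```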