Speculative Decoding Calculator
Calculate speculative decoding speedup.
Model Configuration
Inputs: two model sizes (in B) and two latencies (in ms).
Inference Speedup: 1.82x (7020 ms with speculation vs 12800 ms without)
Tokens/Round: 3.29
Throughput: 36.6 tokens/s
Speculation Analysis
Expected Accepted Tokens: 2.53
Rounds Needed: 78
Time per Round: 90.0 ms
Draft Utilization: 63.3%
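The page does not show its formulas or the values that were entered. As a rough sketch, the figures in this section are consistent with the model below, assuming K = 4 draft tokens per round, a 70% acceptance rate, 10 ms per draft token, 50 ms per target pass, and 256 output tokens. All of these inputs, the function name, and the tokens-per-round formula are assumptions inferred from the displayed numbers, not taken from the calculator itself.

```python
import math


def speculation_metrics(alpha, k, t_draft_ms, t_target_ms, n_tokens):
    """Hypothetical reconstruction of the per-round metrics (not the page's code)."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    # Geometric sum 1 + alpha + ... + alpha^(k-1); matches the displayed 2.53.
    accepted = (1 - alpha**k) / (1 - alpha)
    # Assumed form: it reproduces the displayed 3.29 tokens per round,
    # but it is an inference, not a documented formula.
    tokens_per_round = accepted + (1 - alpha**k)
    rounds = math.ceil(n_tokens / tokens_per_round)
    # One round = k draft steps plus one target verification pass.
    time_per_round_ms = k * t_draft_ms + t_target_ms
    total_ms = rounds * time_per_round_ms
    baseline_ms = n_tokens * t_target_ms  # one target pass per token
    return {
        "accepted": accepted,
        "tokens_per_round": tokens_per_round,
        "rounds": rounds,
        "time_per_round_ms": time_per_round_ms,
        "total_ms": total_ms,
        "baseline_ms": baseline_ms,
        "speedup": baseline_ms / total_ms,
        "throughput_per_s": tokens_per_round / (time_per_round_ms / 1000),
        "draft_utilization": accepted / k,
    }


# Assumed inputs (inferred, not shown on the page).
metrics = speculation_metrics(alpha=0.7, k=4, t_draft_ms=10.0,
                              t_target_ms=50.0, n_tokens=256)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed inputs the sketch gives 78 rounds of 90 ms each and a 1.82x speedup, in line with the values displayed; a different set of inputs or formulas could in principle produce the same rounded numbers.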
Memory & Efficiency
Target Model Memory: 130.4 GB
Draft Model Memory: 13.0 GB
Memory Overhead: 10.0%
Target Call Reduction: 69.5%
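The memory figures look like weight-only estimates. A minimal sketch that reproduces them, assuming a 70B-parameter target model, a 7B-parameter draft model, 2 bytes per parameter (FP16/BF16), and sizes reported in units of 2^30 bytes; the model sizes, the helper name `weights_gib`, and the 256-token output length are assumptions, not values shown on the page:

```python
def weights_gib(params_billions, bytes_per_param=2):
    """Weight memory in GiB (2**30 bytes); ignores KV cache and activations."""
    return params_billions * 1e9 * bytes_per_param / 2**30


# Assumed model sizes (inferred from the displayed memory figures).
target_gib = weights_gib(70)
draft_gib = weights_gib(7)
memory_overhead = draft_gib / target_gib

# Assumed definition: fraction of target passes avoided, using the displayed
# 78 rounds and an assumed 256 generated tokens.
target_call_reduction = 1 - 78 / 256

print(f"Target: {target_gib:.1f} GiB, draft: {draft_gib:.1f} GiB")
print(f"Memory overhead: {memory_overhead:.1%}")
print(f"Target call reduction: {target_call_reduction:.1%}")
```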
Speedup by Acceptance Rate
50% acceptance: 1.55x speedup
60% acceptance: 1.67x speedup
70% acceptance: 1.82x speedup
80% acceptance: 1.95x speedup
90% acceptance: 2.09x speedup
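The sweep above can be approximated with a short script. This is a sketch under assumed inputs (K = 4, 10 ms per draft token, 50 ms per target pass, 256 output tokens) and an inferred tokens-per-round formula. None of these are stated on the page, and `estimated_speedup` is a hypothetical name.

```python
import math


def estimated_speedup(alpha, k=4, t_draft_ms=10.0, t_target_ms=50.0, n_tokens=256):
    """Speedup over plain target decoding; all defaults are assumed values."""
    # Expected accepted draft tokens per round (geometric sum over k drafts).
    accepted = (1 - alpha**k) / (1 - alpha)
    # Inferred tokens emitted per round (an assumption, not a documented formula).
    tokens_per_round = accepted + (1 - alpha**k)
    rounds = math.ceil(n_tokens / tokens_per_round)
    speculative_ms = rounds * (k * t_draft_ms + t_target_ms)
    baseline_ms = n_tokens * t_target_ms
    return baseline_ms / speculative_ms


for alpha in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{alpha:.0%} acceptance: {estimated_speedup(alpha):.2f}x speedup")
```

Because the number of rounds is rounded up to a whole number, the speedup moves in small steps rather than smoothly as the acceptance rate changes.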
Optimal K: Based on your latency ratio, try K=3 for potentially better speedup.