Speculative Decoding Calculator

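Estimate the inference speedup from speculative decoding, where a small draft model proposes tokens and a larger target model verifies them.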

Model Configuration

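Model sizes: target and draft, in billions of parameters (B)
Per-token latencies: target and draft, in milliseconds (ms)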

Inference Speedup

1.82x

7020 ms with speculative decoding vs 12800 ms baseline

Tokens/Round: 3.29
Throughput: 36.6 tokens/s
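The headline figures can be checked by hand. The sketch below uses only numbers displayed on this page (the two total times, Tokens/Round, and the 90.0 ms Time per Round from Speculation Analysis). That throughput equals tokens per round divided by time per round is an assumption, chosen because it is consistent with the displayed 36.6/s, not a confirmed description of the calculator's internals.

```python
# Values copied from the calculator's display.
baseline_ms = 12800.0       # total time without speculative decoding
speculative_ms = 7020.0     # total time with speculative decoding
tokens_per_round = 3.29     # "Tokens/Round"
time_per_round_ms = 90.0    # "Time per Round"

# Speedup is the ratio of baseline time to speculative time.
speedup = baseline_ms / speculative_ms

# Assumed throughput definition: tokens produced per round / seconds per round.
throughput = tokens_per_round / (time_per_round_ms / 1000.0)

print(f"{speedup:.2f}x")             # 1.82x
print(f"{throughput:.1f} tokens/s")  # 36.6 tokens/s
```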

Speculation Analysis

Expected Accepted Tokens: 2.53
Rounds Needed: 78
Time per Round: 90.0 ms
Draft Utilization: 63.3%
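These four figures can be reproduced with a short sketch. The page does not show the input values, so everything marked "assumed" below (70% acceptance rate, K = 4 draft tokens per round, 50 ms target latency, 10 ms draft latency, 256 generated tokens) is a guess chosen because it matches the displayed numbers. The geometric-sum form of the expected accepted tokens is likewise inferred from the 2.53 readout, not taken from the calculator's source.

```python
import math

# Assumed inputs (not shown on the page).
ACCEPTANCE = 0.70    # assumed per-token acceptance rate
K = 4                # assumed draft tokens proposed per round
TARGET_MS = 50.0     # assumed target-model latency per verification pass
DRAFT_MS = 10.0      # assumed draft-model latency per proposed token
NUM_TOKENS = 256     # assumed number of tokens to generate

# Value copied from the calculator's display.
TOKENS_PER_ROUND = 3.29


def expected_accepted(alpha: float, k: int) -> float:
    """Geometric sum 1 + alpha + ... + alpha**(k - 1)."""
    return sum(alpha**i for i in range(k))


accepted = expected_accepted(ACCEPTANCE, K)
utilization = accepted / K                         # share of draft tokens kept
time_per_round = K * DRAFT_MS + TARGET_MS          # K draft steps + 1 target pass
rounds = math.ceil(NUM_TOKENS / TOKENS_PER_ROUND)  # whole rounds only
total_ms = rounds * time_per_round

print(f"Expected accepted tokens: {accepted:.2f}")
print(f"Draft utilization: {utilization:.1%}")
print(f"Time per round: {time_per_round:.1f} ms")
print(f"Rounds needed: {rounds}")
print(f"Total time: {total_ms:.0f} ms")
```

Under these assumptions the total comes to 78 rounds of 90 ms, which agrees with the 7020 ms shown in the speedup panel.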

Memory & Efficiency

Target Model Memory: 130.4 GB
Draft Model Memory: 13.0 GB
Memory Overhead: 10.0%
Target Call Reduction: 69.5%
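The memory and call-reduction figures are consistent with a simple weights-only estimate. The sketch below assumes a 70B-parameter target and a 7B-parameter draft stored at 2 bytes per parameter (FP16 or BF16), and 256 generated tokens. None of these inputs appear on the page, and they are chosen only because they reproduce the displayed values. Under these assumptions the "GB" readout matches binary gibibytes (2^30 bytes) and ignores KV cache and activations.

```python
GIB = 2**30                # bytes per gibibyte
BYTES_PER_PARAM = 2        # assumed FP16/BF16 weights

TARGET_PARAMS = 70e9       # assumed target model size
DRAFT_PARAMS = 7e9         # assumed draft model size
NUM_TOKENS = 256           # assumed number of tokens to generate
ROUNDS = 78                # "Rounds Needed" as displayed


def weight_memory_gib(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for model weights only, in GiB."""
    return params * bytes_per_param / GIB


target_mem = weight_memory_gib(TARGET_PARAMS)
draft_mem = weight_memory_gib(DRAFT_PARAMS)

# Extra memory needed to host the draft model alongside the target.
overhead = draft_mem / target_mem

# The baseline calls the target once per token; speculation calls it once per round.
call_reduction = 1 - ROUNDS / NUM_TOKENS

print(f"Target: {target_mem:.1f} GiB, draft: {draft_mem:.1f} GiB")
print(f"Memory overhead: {overhead:.1%}")
print(f"Target call reduction: {call_reduction:.1%}")
```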

Speedup by Acceptance Rate

50% acceptance: 1.55x speedup
60% acceptance: 1.67x speedup
70% acceptance: 1.82x speedup
80% acceptance: 1.95x speedup
90% acceptance: 2.09x speedup
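The five rows above can be regenerated with the sketch below, but it rests on several assumptions. The inputs (K = 4, 50 ms target latency, 10 ms draft latency, 256 tokens) are not shown on the page. The tokens-per-round expression is fitted to the displayed numbers: it is not the textbook expectation (1 - α^(K+1)) / (1 - α), and the calculator's actual formula may differ. The function name `speedup_at` is hypothetical.

```python
import math

# Assumed inputs (not shown on the page).
K = 4               # assumed draft tokens per round
TARGET_MS = 50.0    # assumed target-model latency per pass
DRAFT_MS = 10.0     # assumed draft-model latency per token
NUM_TOKENS = 256    # assumed number of tokens to generate


def speedup_at(alpha: float, k: int = K) -> float:
    """Estimated speedup at acceptance rate alpha (fitted model, see lead-in)."""
    accepted = sum(alpha**i for i in range(k))     # 1 + alpha + ... + alpha**(k-1)
    tokens_per_round = accepted + (1 - alpha**k)   # fitted to the displayed 3.29
    rounds = math.ceil(NUM_TOKENS / tokens_per_round)
    speculative_ms = rounds * (k * DRAFT_MS + TARGET_MS)
    baseline_ms = NUM_TOKENS * TARGET_MS           # one target pass per token
    return baseline_ms / speculative_ms


for alpha in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{alpha:.0%} acceptance: {speedup_at(alpha):.2f}x speedup")
```

Because rounds are rounded up to whole numbers, the speedup moves in small steps as the acceptance rate changes, not smoothly.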

Optimal K: Based on your latency ratio, try K=3 draft tokens per round for a potentially better speedup.