2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6 | Native MTP speculative decoding on Apple Silicon with no external drafter.
metal mtp mlx inference-engine apple-silicon local-ai qwen speculative-decoding speculative-sampling openai-compatible qwen3-next anthropic-compatible native-mtp mtplx
Updated May 11, 2026 - Python