Bench-retune MTP_FLAG_BUNDLE (spec-draft-p-min 0.0 → ~0.75, n-max → 5 for dense) #799

## Problem
Our `MTP_FLAG_BUNDLE` (`src/hal0/config/schema.py`) ships `--spec-draft-p-min 0.0 --spec-draft-n-max 4`. Current research on MTP/speculative-decoding tuning for llama.cpp indicates:
- **`--spec-draft-p-min` matters more than `--spec-draft-n-max`** for effective throughput, and **0.0 is too permissive**: no draft token is filtered out by confidence, so verification work is wasted on low-probability drafts. The recommended sweet spot is ~**0.75**.
- For dense models (e.g. Qwen3.6-27B dense), **`--spec-draft-n-max 5`** is the recommended draft length; MoE wants shorter/none.
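
To make the p-min / n-max interaction concrete, here is a minimal sketch of a confidence cutoff on a draft sequence. It is a simplified model, not llama.cpp's implementation: `draft_length` and the example probabilities are hypothetical, and the exact cutoff semantics (for example, whether the first low-confidence token is itself kept) may differ by llama.cpp version.

```python
def draft_length(draft_probs, p_min, n_max):
    """Count draft tokens proposed under a simplified confidence cutoff.

    Assumption (illustrative only): drafting stops at the first token whose
    draft probability is below p_min, and never exceeds n_max tokens.
    """
    kept = 0
    for p in draft_probs[:n_max]:
        if p < p_min:
            break  # low-confidence token: stop drafting here
        kept += 1
    return kept


# Hypothetical per-token draft probabilities for one speculation step.
probs = [0.9, 0.8, 0.4, 0.95, 0.9]

current = draft_length(probs, p_min=0.0, n_max=4)    # nothing filtered
proposed = draft_length(probs, p_min=0.75, n_max=5)  # stops at the 0.4 token
```

With p-min 0.0 every step submits the full n-max tokens for verification, including the 0.4-probability token that is likely to be rejected. With p-min 0.75 the same step submits only the two confident tokens.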

Our bundle was bench-tuned earlier (`hal0-container-bench-2026-06-08.md`) but predates this guidance; the `rocm-mtp` profile currently benches slower than `rocm` (24.4 vs 52.8 tps) on the MoE workload — expected for MoE, but the dense path may be leaving throughput on the table with p-min 0.0.

## Ask
Bench `MTP_FLAG_BUNDLE` variants on Strix Halo with an **MTP-capable dense GGUF**:
- p-min: 0.0 (current) vs 0.5 vs 0.75
- n-max: 4 (current) vs 5

Measure tok/s and draft acceptance rate for each variant. Update the bundle and `PROFILE_BENCH` if a variant wins, and keep `hal0-container-bench-*.md` in sync.
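
The sweep above is a 3 × 2 grid, so six runs in total. A minimal sketch of enumerating it follows. `bench_variants` and the constant names are hypothetical, and the real `MTP_FLAG_BUNDLE` may carry additional flags that would need to be preserved alongside these two.

```python
from itertools import product

# Values taken from the Ask above; the first entry of each tuple is the current setting.
P_MIN_VALUES = (0.0, 0.5, 0.75)
N_MAX_VALUES = (4, 5)


def bench_variants(p_mins=P_MIN_VALUES, n_maxes=N_MAX_VALUES):
    """Return one argv fragment per (p-min, n-max) combination."""
    return [
        ["--spec-draft-p-min", str(p), "--spec-draft-n-max", str(n)]
        for p, n in product(p_mins, n_maxes)
    ]


variants = bench_variants()
for flags in variants:
    print(" ".join(flags))
```

Running the current bundle (p-min 0.0, n-max 4) as part of the same sweep gives a baseline measured under identical conditions, rather than relying on the earlier bench numbers.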

## Context
Surfaced during the slot-config MTP work (per-slot MTP override + capability-gated pill, PR for Phase 2). MTP helps dense / hurts MoE / needs an MTP-capable GGUF — see `docs/superpowers/specs/2026-06-14-slot-config-grouping-mtp-templates-design.md`.

Sources: github.com/ggml-org/llama.cpp/blob/master/docs/speculative.md; dredyson.com Qwen3.6-27B MTP guide.
