feat: add optional Headroom compression proxy support #22
Open
mydisha wants to merge 1 commit into
Add opt-in Headroom support for local development via a new make target and KEIROUTER_HEADROOM_AUTO flag, keeping the default dev/setup flow unchanged. Track Headroom activity, token savings, and transforms in usage records so compression behavior can be observed alongside existing optimization modes.
What
Adds opt-in integration with the Headroom compression proxy. When enabled, KeiRouter sends requests through Headroom for token compression and tracks exact savings (tokens before/after, compression ratio, transforms) alongside existing optimization modes.
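As a rough illustration of the savings figures mentioned above, the following Go sketch derives tokens saved and a compression ratio from before/after counts. The `Stats` type here, its field names, and the definition of the ratio as saved divided by before are assumptions made for this example, not the PR's actual struct or formula.

```go
package main

import "fmt"

// Stats is a hypothetical stand-in for a compression savings record.
// Field names are illustrative and not taken from the PR's code.
type Stats struct {
	TokensBefore int // token count reported before compression
	TokensAfter  int // token count reported after compression
}

// Saved returns the number of tokens removed by compression.
// It never returns a negative value.
func (s Stats) Saved() int {
	if s.TokensAfter >= s.TokensBefore {
		return 0
	}
	return s.TokensBefore - s.TokensAfter
}

// Ratio returns the fraction of tokens saved (saved / before).
// This definition is an assumption; it returns 0 when there is no input.
func (s Stats) Ratio() float64 {
	if s.TokensBefore <= 0 {
		return 0
	}
	return float64(s.Saved()) / float64(s.TokensBefore)
}

func main() {
	s := Stats{TokensBefore: 1200, TokensAfter: 900}
	fmt.Printf("saved=%d ratio=%.2f\n", s.Saved(), s.Ratio())
}
```

Guarding against a zero `TokensBefore` keeps aggregate queries from dividing by zero on empty requests.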
Why
Headroom reports exact token counts from its compression proxy (unlike RTK's byte/4 estimate), giving more accurate savings analytics. Integration is fully opt-in so the default `make dev`/`make setup` flow remains unchanged.
How
- `backend/internal/headroom/`: HTTP client that sends OpenAI-format messages to Headroom, applies compressed results back to the canonical `core.ChatRequest`, and returns `Stats` (tokens before/after/saved, compression ratio, transforms, CCR hashes). Skips unsupported content parts gracefully.
- `pipeline.go`: Headroom runs as a compression step in `applyTokenSaving`. When `Headroom.Enabled` is set, it takes precedence over Slimmer; otherwise Slimmer/Terse/Caveman run as before. Failures are non-fatal (logs a warning and continues uncompressed).
- `0021_headroom_savings.sql` adds `headroom_*` columns to `usage_records`; `store/models.go` and `repo_usage.go` persist and aggregate headroom stats. These are kept separate from RTK estimates so both systems can be observed independently.
- `meter` and `observ/metrics` track headroom activity and savings; `gateway/insights.go` and `gateway/settings.go` expose headroom config and savings in the admin API.
- `make headroom` target starts the proxy via Docker or native CLI. `KEIROUTER_HEADROOM_AUTO=1 make dev` auto-starts it alongside backend + dashboard. `compose.yaml` adds a `headroom` service with healthcheck and wires `KEIROUTER_HEADROOM__BASE_URL`.
- The `SavingsBreakdown` component surfaces headroom savings separately from RTK.
- `scripts/quickstart.sh` updated with headroom setup instructions.
Changed files (21 files, +967/-61)
- `headroom/headroom.go`, `headroom/headroom_test.go`, `pipeline/pipeline.go`
- `store/migrations/0021_headroom_savings.sql`, `store/models.go`, `store/repo_usage.go`
- `meter/meter.go`, `observ/metrics.go`
- `gateway/insights.go`, `gateway/settings.go`, `gateway/gemini.go`, `gateway/handlers.go`, `app/app.go`
- `pages/Settings.tsx`, `pages/Usage.tsx`, `components/SavingsBreakdown.tsx`, `lib/api.ts`
- `Makefile`, `compose.yaml`, `README.md`, `scripts/quickstart.sh`
Checklist
- `make test` passes
- `make vet` passes
- `npm run typecheck` passes (frontend changed)
- `npm run lint`: N/A (no lint script configured in `frontend/package.json`)