Skip to content

token-economy port: compaction-trigger + compaction-prefilter + pipeline wiring#6

Merged
dancinlife merged 5 commits into
mainfrom
feat/compaction-trigger
May 17, 2026
Merged

token-economy port: compaction-trigger + compaction-prefilter + pipeline wiring#6
dancinlife merged 5 commits into
mainfrom
feat/compaction-trigger

Conversation

@dancinlife
Copy link
Copy Markdown
Contributor

Summary

Ports hive's token-saving system to wilson hexa-native and wires the
previously-dead compaction pipeline. 5 commits:

  • phase 1 9a77786compaction-trigger plugin (opt-in): the WHEN
    decision. Ports hive compaction.hexa's pure subset (CompactionSettings,
    calculate_context_tokens, should_compact), subscribes request_post,
    arms + emits compact_armed, exposes status + /compaction. Wires
    the documented-but-dead max_session_tokens knob.
  • phase 2a 5197122compaction-prefilter plugin (opt-in):
    TF-IDF line-salience (compactByTfidf port, hive git 5bb968618),
    compact_request @Transform priority 80.
  • phase 2b fd03102 — MinHash near-dup paragraph dedup (hive
    minhash-daemon.hexa kernel; pairwise jaccard, no LSH — small P).
  • phase 2c 9406b9b — reduction bench (constructed representative
    fixture: 66.4% byte reduction, honestly framed as NOT hive's -34.6%)
    • an O(n²)→O(n) string-build fix the bench surfaced.
  • phase 1.5 eaab29f — the linchpin: nothing emitted
    compact_request, so compaction-prefilter/-local/-llm were all dead.
    Added host_fire_hook (generic core/host surface) and routed
    harness_cli_compact through it (summarizer now sees the reduced
    transcript — hive's design), plus turn-boundary auto-fire via
    compaction-trigger. Safe degradation when plugins absent (g8).

Test

wilson build default + --with compaction-trigger,compaction-prefilter
OK; wilson test 23/23 (default + custom); per-plugin selftests pass
(void-host / constructed-fixture — wilson's plugin selftest harness is
unwired, same caveat as compaction-local). End-to-end /compact + LLM is
outside wilson test; pipeline correctness is by construction (agent_loop
uses fire_hook identically; event_bus _is_replace matches the prefilter
contract proven by selftest).

Notes

  • New plugins are opt-in (--with), not default bundle.
  • core changes: core/host.hexa host_fire_hook (generic), core/loader.hexa
    taxonomy rows. No g8 violation (generic surface + plugin index).
  • Follow-up flagged: cost plugin's crude trigger uses a cumulative-sum
    metric (not a window-occupancy proxy) — left untouched, documented.

🤖 Generated with Claude Code

ghost and others added 5 commits May 17, 2026 21:02
…nomy port phase 1

wilson had compaction-local / compaction-llm (HOW to compact) but no
trigger deciding WHEN — `max_session_tokens` sat in core/loader.hexa as
a documented-but-dead "(future)" knob.

New opt-in plugin `compaction-trigger` ports the pure-function subset of
hive's core/compaction/compaction.hexa (hive commit 4150a4612):
CompactionSettings + calculate_context_tokens + should_compact, adapted
to wilson's request_post usage shape.

- subscribes request_post @ observe → tracks live context occupancy
  (latest turn's input+output+cache_read — a snapshot, not the cost
  plugin's cumulative sum which is not a window-occupancy proxy)
- threshold = max_session_tokens config when set (wires the dead knob),
  else context_window - reserve_tokens
- on should_compact → arms once (host_render notification + compact_armed
  emit); does NOT auto-fire mid-turn (wilson's deliberate stance)
- `status` action (for the PI-MONO-UIUX g27 meter) + `/compaction` slash

hive's cut-point / prepare_compaction not ported (coupled to hive's
session-entry shape; compaction-local already owns the trim strategy).

core/loader.hexa: taxonomy rows for compaction-trigger (plugin index,
every plugin has one).

Verified: build --with compaction-trigger OK (42 plugins) + default
build OK; wilson test 23/23; test_compaction_trigger.hexa selftest passes.

Phase 2 (separate) = TF-IDF/MinHash pre-filter — the actual -34.6%
summarizer-token saving hive measured (hive commit 11a92ce8b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conomy port phase 2a

Ports hive @need-singularity/hive-compression's compactByTfidf
(tfidf.archive.txt + tokenize.archive.txt, recovered from hive git
5bb968618) to hexa-native — the mechanism behind hive's measured
-34.6% summarizer-token saving (hive commit 11a92ce8b).

hive deferred its own hexa port only because the TS host could call
hexa solely via subprocess (packages/compression/hexa/DEFERRED.md §1);
wilson is hexa-native and calls the kernel in-process, so that blocker
does not apply.

plugins/compaction-prefilter (opt-in, --with):
- compact_request @ transform, priority 80 — runs before
  compaction-local (60) / compaction-llm (10)
- scores every transcript line by TF-IDF salience over the whole
  transcript corpus (Σ tf·log(N/df)), percentile-buckets the nonzero
  scores, truncates each line to 400 / 200 / 80 chars by bucket
- ``` code-fence interiors preserved verbatim; zero-token lines kept
- replace only when it actually shrank (passthrough parity)
- deterministic; no host I/O / LLM in the hot path
- hexa: hand-rolled ASCII fold (n19 .to_lower* trap), no >=/<=,
  explicit float coercion for the log/percentile math

core/loader.hexa: taxonomy rows (plugin index, every plugin has one).

Verified: build --with compaction-trigger,compaction-prefilter OK
(43 plugins) + default build OK; wilson test 23/23 (custom + default);
test_compaction_prefilter.hexa selftest passes (low-salience filler
truncated, rare line kept whole, code fence verbatim).

Phase 2b = MinHash dedup (hive minhash-daemon.hexa kernel exists).
Phase 2c = port bench-compaction to measure wilson's actual reduction %.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds hive pre-filter's second stage: collapse near-duplicate paragraphs
(retry-loop output, repeated tool dumps, banner spam) that TF-IDF
salience cannot shed.

Ports hive minhash-dedup.archive.txt (sync path) + the minhash-daemon.hexa
kernel (git 5bb968618 / db2c790): fnv1a32, xorshift32, (a·x+b) mod
M31 permute, normalize (lower + ws-collapse), 5-char shingles, K=128
signature, jaccard — all hexa-native. seed 0xdeadbeef parity.

Simplification: hive's full LSH banding (b16/r8) is dropped — a
transcript's paragraph count is small, so canonical-vs-candidate
pairwise jaccard (O(C·P·K), deterministic, exact) is sufficient.

Pipeline is now dedup_transcript → compact (TF-IDF). Per-message
(message-boundary-safe; cross-message whole-message dup is partly
covered by compaction-local's head+tail). Drop runs collapse to one
`[duplicate paragraph ×N]` marker.

hexa: decimal (not hex) literals for codegen safety, char_code instead
of .bytes(), >= avoided (j > T-ε), explicit float coercion.

Verified: build --with compaction-trigger,compaction-prefilter OK
(43 plugins) + default build OK; wilson test 23/23 (custom + default);
selftest passes (existing TF-IDF cases unchanged — no blank lines so
dedup is a no-op there — plus new dedup case: 3 dup paras → 1 + ×2
marker). No core changes (taxonomy landed in phase 2a).

Phase 2c = port bench-compaction to measure wilson's actual reduction %.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bench surfaced a real defect: lower()/normalize() built strings via
per-char `out = out + c` — O(n²), AGENTS.tape's string-concat-quadratic
trap. A 15KB tool-dump line hung the interpreter and would also hurt
compiled wilson on large transcripts. Fixed: array-push + one .join("")
(O(n), hexa builtin join); compaction_prefilter_join → xs.join(sep);
signature() now shingles only the leading SIG_MAXCHARS=4096 (near-dups
diverge from the start — deterministic, bounds char-shingle × K=128).

bench_prefilter.hexa: real ~/.wilson session jsonl was unusable
(json_parse rejects the non-UTF-8 / unescaped big tool dumps — the
interesting bulk drops, meaningless ~0%; and the interpreter is the slow
fallback per n6). So the bench builds a deterministic representative
trace (retry loop logging the same verbose output 4× → dedup; long
low-salience boilerplate dump → TF-IDF) and reports:

  5 msgs · 3847 B (~962 tok) → 1292 B (~323 tok) · 66.4% byte reduction

Honestly framed in-output as "this fixture; NOT hive's -34.6%" — hive's
number was its own 2.4MB/446-msg fixture + TS impl, not comparable; this
is wilson's own indicative figure on a constructed fixture.

Verified: selftest passes (correctness preserved after O(n)/SIG change),
build --with compaction-trigger,compaction-prefilter + default OK,
wilson test 23/23 (custom + default). No core changes.

token-economy port phases 1 / 2a / 2b / 2c complete. Remaining: phase
1.5 — wire a harness turn-boundary consumer of compact_armed for actual
auto-fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uto-fire — phase 1.5

Discovery: nothing emitted compact_request. harness_cli_compact did its
own LLM summarize and ignored the transform pipeline, so
compaction-prefilter (2a/2b) AND the pre-existing compaction-local /
compaction-llm were all dead code — phase-2 work was never on the live
path.

A. Pipeline wiring (the linchpin):
   - core/host.hexa: host_fire_hook — a generic host wrapper over
     event_bus fire_hook (validate→transform-fold→observe). g8-sanctioned
     generic surface; agent_loop already drives hooks via fire_hook
     internally, this exposes it to a harness.
   - harness_cli_compact: runs host_fire_hook("compact_request",
     CompactRequest{transcript, target_tokens:8000}) before the summarize
     prompt and summarizes the REDUCED transcript — deterministic
     dedup+TF-IDF shrink so the summarizer LLM sees far fewer tokens
     (hive's -34.6%-summarizer design). No subscriber bundled → fire_hook
     returns payload unchanged → full transcript summarized as before
     (safe degradation, zero coupling — g8).

B. Turn-boundary auto-fire (named phase 1.5):
   - harness_cli_autocompact_tick at handle_user_line's tail (beside
     loop_tick, non-cancelled only): if compaction-trigger is bundled and
     its status says should_compact, enqueue /compact for the NEXT turn
     boundary (never mid-turn — wilson's deliberate stance). Throttled by
     SESSION.messages delta; a real /compact collapses the transcript so
     it self-rearms only when context regrows. compaction-trigger absent
     → no-op. Generalizes the /loop-only Phase 7c auto-compact.

Verified: wilson build (default) + --with compaction-trigger,
compaction-prefilter OK; wilson test 23/23 (default + custom). Pipeline
correctness by construction (agent_loop uses fire_hook identically;
event_bus _is_replace matches action=="replace"; prefilter selftest
proves the {action:replace,value:TranscriptData} contract). End-to-end
(/compact + LLM) is outside `wilson test` (needs interactive + LLM).

token-economy port complete: phase 1 / 2a / 2b / 2c / 1.5. The
prefilter/local/llm plugins now actually run on /compact (manual + auto).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dancinlife dancinlife merged commit 39f231a into main May 17, 2026
0 of 2 checks passed
@dancinlife dancinlife deleted the feat/compaction-trigger branch May 17, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant