fix(dolt): surface + auto-clear stale compact quarantines; label backup-sync timeouts (gc-h7mc0tz)#1

Open
vbtcl wants to merge 1 commit into main from fix/dolt-compact-quarantine-autoclear

vbtcl (Owner) commented Jun 8, 2026

Why

A transient post-flatten value-hash compact-quarantine marker silently disables ALL compaction/GC for a DB, with no alert and no automatic staleness expiry, letting the noms journal grow toward the corrupted-journal city-down threshold. beads_hq sat quarantined for 16 days (journal 5.1G, data dir 13G), on the same path that ends in a city-wide Dolt outage. This is a recurring incident class (see gc-h7mc0tz).

What

Three improvements close the gap (commit b5116b0, examples/dolt source — the deployed .gc/system/packs/dolt is reconciler-managed so the fix must originate here):

  1. mol-dog-doctor: emit a [HIGH] health advisory whenever an active compact-quarantine marker exists. Counts only valid-db-name markers (matching the compactor's own has_compact_marker lookup) so operator archives like beads_hq.stale-cleared-* do not false-alarm.
  2. compact auto-clear: clear a quarantine marker older than GC_DOLT_COMPACT_QUARANTINE_STALE_SECS (default 6h) only once the DB reads clean (row counts) and is quiescent (whole-DB value hash stable across two probes a settle apart), then retry. The post-flatten re-verification re-quarantines on real drift — so this is a supervised retry that never bypasses integrity enforcement.
  3. Label backup-sync timeouts distinctly from hard failures.
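
The gating in item 2 can be sketched as a pure decision function. This is only an illustration in Go: the real logic lives in the pack's compact script, and `probe`, `shouldAutoClear`, and the reason strings are hypothetical names. Only the 6h default for GC_DOLT_COMPACT_QUARANTINE_STALE_SECS and the clean/quiescent conditions come from the description above.

```go
package main

import "fmt"

// probe is a hypothetical snapshot of one health probe against a quarantined DB.
type probe struct {
	RowCountsOK bool   // the DB "reads clean" (row counts)
	ValueHash   string // whole-DB value hash at probe time
}

// shouldAutoClear reports whether a quarantine marker may be cleared before
// retrying compaction. first and second are two probes taken a settle apart.
// All names are illustrative, not the script's actual identifiers.
func shouldAutoClear(markerAgeSecs, staleSecs int64, enabled bool, first, second probe) (bool, string) {
	switch {
	case !enabled:
		return false, "auto-clear disabled"
	case markerAgeSecs < staleSecs:
		return false, "marker still fresh"
	case !first.RowCountsOK || !second.RowCountsOK:
		return false, "row counts not clean"
	case first.ValueHash != second.ValueHash:
		return false, "not quiescent: value hash changed between probes"
	}
	return true, "stale, clean, quiescent: clear marker and retry"
}

func main() {
	const staleSecs = 6 * 60 * 60 // default GC_DOLT_COMPACT_QUARANTINE_STALE_SECS (6h)
	a := probe{RowCountsOK: true, ValueHash: "h1"}
	b := probe{RowCountsOK: true, ValueHash: "h1"}
	ok, why := shouldAutoClear(16*24*60*60, staleSecs, true, a, b)
	fmt.Println(ok, why)
}
```

Clearing the marker only permits a retry. The post-flatten re-verification still runs afterwards and re-quarantines on real drift, which is why the sketch never touches integrity checks itself.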

Tests

7 new hermetic tests + full examples/dolt suite green (2 pre-existing env-specific failures unrelated).

Open for reviewer

  • Merge target: opened against vbtcl/gascity:main (the branch base). Retarget to upstream gastownhall/gascity if that's where the release is cut.
  • Release/deploy: tracked in gc-sffnhkx (P2) — needs a gc release + city reinstall after merge.
  • Independent: gc-o2n5yzz (P3) flags a dolt-2.0.7 ANSI-color leak in compact's remote-HEAD parsing, worth a look.

Filed by gastown.mayor on behalf of claude-1 (gc-wisp-ncb1). Refs gc-h7mc0tz, gc-sffnhkx.

…up sync timeouts

A transient post-flatten value-hash quarantine silently disabled ALL
compaction/GC for a DB with no alert and no auto-staleness, letting the
noms journal grow toward the corrupted-journal city-down threshold
(beads_hq sat quarantined 16 days; journal 5.1G, data dir 13G). Three
improvements close the gap:

1. mol-dog-doctor: emit a [HIGH] health advisory whenever an active
   compact-quarantine marker exists (it silently disables GC on critical
   infra). Counts only valid-db-name markers, matching the compactor's own
   has_compact_marker lookup, so operator archives like
   beads_hq.stale-cleared-20260607 do not false-alarm.

2. compact: auto-clear a quarantine marker older than
   GC_DOLT_COMPACT_QUARANTINE_STALE_SECS (default 6h) once the DB reads
   clean (row counts) and is quiescent (whole-DB value hash stable across
   two probes a settle apart), then retry compaction. The post-flatten
   re-verification re-quarantines if real drift remains, so auto-clear is
   a supervised retry that never bypasses integrity enforcement or GCs
   unverified data. Kill switch: GC_DOLT_COMPACT_QUARANTINE_AUTOCLEAR=0.

3. mol-dog-backup: distinguish a sync timeout (run_bounded rc 124 ->
   "sync timed out >120s; likely journal bloat/size", surfaced in the mail
   subject) from a generic sync error ("sync failed rc=N"), so journal
   bloat is diagnosable from the advisory.
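
The labeling in item 3 amounts to a small mapping from exit status to advisory text. A minimal Go sketch follows, assuming a hypothetical `syncLabel` helper and an empty label on success; the actual change is in the mol-dog-backup script, and only the rc 124 convention and the two message strings are taken from the text above.

```go
package main

import "fmt"

// timeoutRC is the exit status the commit message attributes to a
// run_bounded timeout.
const timeoutRC = 124

// syncLabel maps a backup-sync exit status to an advisory label.
// The function name and the empty-string-on-success convention are
// assumptions made for this sketch.
func syncLabel(rc int) string {
	switch rc {
	case 0:
		return ""
	case timeoutRC:
		return "sync timed out >120s; likely journal bloat/size"
	default:
		return fmt.Sprintf("sync failed rc=%d", rc)
	}
}

func main() {
	for _, rc := range []int{0, 124, 1} {
		fmt.Printf("rc=%d label=%q\n", rc, syncLabel(rc))
	}
}
```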

Tests: 7 new hermetic tests in dog_exec_scripts_test.go (auto-clear when
quiescent, keep-fresh, keep-when-writer-active, kill switch, backup
timeout vs error, doctor advisory) plus a quarantine_writer_active fake
mode. Full examples/dolt suite green except two pre-existing,
environment-specific failures unrelated to this change
(TestCompactScriptRealDoltRemotePush: dolt 2.0.7 ANSI color in remote-HEAD
parse; TestRuntimeScriptManagedStateBeatsStaleEnvPort: port-resolve env).

Refs gc-h7mc0tz.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…(ga-c4w) (gastownhall#3103)

## Summary

Makes the mouse wheel drive **tmux copy-mode scrollback** in interactive `gc`
sessions instead of leaking the wheel to the focused TUI (Claude Code's own
history, a pager, or the shell), durably and out of the box, while **headless
agent sessions stay mouse-off** (controller-poll safety). This is the proper
in-source fix that supersedes the portharbour city-local `po-vtg2` `set-hook`
stopgap.

Two facts made the wheel inert before this change, so the fix has two parts:

- **Part B: runtime default (`internal/api/session_runtime.go`).**
  `sessionCreateHints` now sets `MouseOn: true`. The runtime skips
  `disableMouseAndActivity` only when `MouseOn` is true
  (`internal/runtime/tmux/adapter.go:930`), so the `mouse on` set at session
  create (`tmux-theme.sh`) survives and the wheel binding can fire. This seam
  flips exactly the two human-interactive callers: provider-adhoc
  (`session_resolved_config.go`) and named sessions (`session_resolution.go`).
  The headless agent path resolves `MouseOn` from `cmd/gc/template_resolve.go`
  (`cfgAgent.MouseModeOn()`) and is **not** involved → stays mouse-off.

- **Part A: pack binding (`examples/gastown/.../tmux-keybindings.sh`).**
  Adds root-table `WheelUpPane → copy-mode -e` / `WheelDownPane → send-keys -M`
  bindings (forces copy-mode even over mouse-reporting apps so scrollback wins;
  Shift+wheel keeps native terminal selection). **No** `client-attached`
  `set-hook` stopgap: the `MouseOn` default replaces the prototype's.
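
To make the Part B mechanism concrete, here is a rough Go sketch of the single consumer of `MouseOn`. Only `MouseOn` and the helper name `disableMouseAndActivity` come from the PR text; the `Config` stand-in, the `tmuxOptions` map, `applySessionOptions`, and the assumed starting state (`mouse on`, `monitor-activity on` after session create) are hypothetical simplifications, not the adapter's real code.

```go
package main

import "fmt"

// Config is a stand-in for the runtime config; only MouseOn is from the PR.
type Config struct{ MouseOn bool }

// tmuxOptions is a hypothetical in-memory view of a session's tmux options.
type tmuxOptions map[string]string

// disableMouseAndActivity mirrors the helper named in the PR. Which options
// it sets is inferred from its name, so treat this body as an assumption.
func disableMouseAndActivity(o tmuxOptions) {
	o["mouse"] = "off"
	o["monitor-activity"] = "off"
}

// applySessionOptions shows the gate: the helper runs unless MouseOn is set.
func applySessionOptions(cfg Config) tmuxOptions {
	// Assumed state left behind by session-create theming.
	o := tmuxOptions{"mouse": "on", "monitor-activity": "on"}
	if !cfg.MouseOn {
		disableMouseAndActivity(o)
	}
	return o
}

func main() {
	fmt.Println("interactive:", applySessionOptions(Config{MouseOn: true}))
	fmt.Println("headless:   ", applySessionOptions(Config{MouseOn: false}))
}
```

Because the gate is a single boolean, setting `MouseOn` in the hints builder is enough for interactive sessions, and any path that leaves it false keeps the headless behavior.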

> Why `sessionCreateHints` and not the bead's suggested `mouse_mode='on'`
> template default: provider/named sessions build their runtime hints solely
> via `sessionCreateHints`; their synthetic `&config.Agent{}` is discarded
> after provider resolution, so a template `mouse_mode` would never reach
> them. The hints builder is the minimal correct seam, and it keeps the change
> off the agent-template path entirely (guaranteeing headless behavior is
> unchanged).

## Micro-tasks (TDD red→green, per-task commits)

| task | commit | test |
| --- | --- | --- |
| T-001/T-002 interactive mouse-on default | `19d6a9cdf` | `TestSessionCreateHintsEnablesMouse` |
| T-003 headless stays mouse-off (guard) | `fe1c2149f` | `TestResolveTemplateHeadlessAgentStaysMouseOff` |
| T-004/T-005 pack wheel binding + no stopgap | `6bc2d400a` | `TestTmuxKeybindingsScrollWheel` |
| T-006 build + targeted tests + CHANGELOG | `0745b53d8` | n/a |

## Testing

Run under the hermetic `env -i` wrapper (Makefile `TEST_ENV`) +
`icu4c@78` CGO flags.

- `go build ./...` → **Success**
- `go test ./internal/api/... ./examples/gastown/...` → **1635 passed**
- `go test ./cmd/gc/...` → all ga-c4w tests pass.

**Pre-existing, unrelated failures (not introduced here):**
`TestBdRuntimeEnvManagedCityProjectsHostOverride` and
`TestBdRuntimeEnvForRigInheritedManagedCityProjectsHostOverride` fail
**identically on base `dd3ee8524`** with none of this branch's changes present
(managed-Dolt host-override port resolution; the local sandbox's
proxied-server setup does not produce the override).
`TestProbeDetachedWork_TmuxExitStatus` timeouts were host-env flakes that pass
under the hermetic `env -i` wrapper.

## Manual verification (acceptance #1, #3; not unit-testable)

After merge + pack roll, in a fresh interactive `gc session new <provider>`:
1. Wheel-up in a Claude pane enters copy-mode scrollback; wheel-down scrolls
   down and exits at the bottom.
2. Mouse pane-select, drag-resize, the `MouseDown1StatusRight` mail popup, and
   Shift+wheel native selection all still work.
3. A headless agent session shows `mouse off`
   (`tmux show-options -t <sess> mouse`).

## For the reviewer (open questions, downstream-resolvable)

1. **`monitor-activity` side-effect.** `MouseOn=true` skips the whole
   `disableMouseAndActivity`, so interactive sessions also keep
   `monitor-activity on`. This is the same as `mouse_mode=on` agents already
   get, and benign for a human-attended session. Split the helper (mouse
   conditional, activity always) only if you want activity off regardless.
   Out of scope unless flagged.
2. **`WheelDownPane send-keys -M` at bottom of scrollback:** exit-clean is
   covered by manual verification #1.

## Compliance

- **GDPR:** no-op. Governs tmux mouse-mode / key bindings for dev tooling; no
  personal or special-category data read, written, transmitted, or logged.
- **MDR Class I:** no-op. Outside the voxmemo → voxist-api clinical pipeline.

## Follow-up (separate, not this PR)

Removing the portharbour city-local `po-vtg2` stopgap is a separate city-store
task to file once this ships and the gastown pack is rolled.

Refs: ga-c4w (supersedes po-vtg2). Plan:
`docs/plans/durable-mouse-wheel-scrollback.md`.

---------

Co-authored-by: Eric Cestari <eric@escapevelocity.fr>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…ession) (gastownhall#3139)

## Summary

Post-merge regression fix for **ga-c4w / PR gastownhall#3103**. `internal/api`
`sessionResumeHints` emitted `MouseOn: true` **unconditionally** for every
resumed session, including pool/headless agents resumed through the API worker
factory. That re-enabled tmux mouse on controller-polled sessions and broke
ga-c4w's controller-poll-safety invariant.

This is human reviewer **sjarmak's MAJOR #1** (review 4437810731), which was
dismissed and merged without a code fix.

## Root cause

`resolveWorkerSessionRuntimeWithMetadata` (wired as the worker factory's
`ResolveSessionRuntime` in `worker_factory.go`) calls `sessionResumeHints` and
builds `runtime.Config` **directly**; it never routes through
`cmd/gc/template_resolve.go`. So the in-code assumption that headless agents
"re-resolve MouseOn mouse-off downstream" did not hold for this path, and a
resumed pool agent got mouse **on**. `MouseOn` has exactly one consumer
(`internal/runtime/tmux/adapter.go`: `if !cfg.MouseOn { disableMouseAndActivity }`),
so `MouseOn=true` means mouse is not disabled.

## Fix

Gate `MouseOn` on an explicit interactive signal instead of hardcoding `true`:

- `sessionResumeHints(..., interactive bool)` sets `MouseOn: interactive`.
- `sessionResumeInteractive(metadata)` derives it from
  `session_origin == "manual"`, mirroring the create-path gate
  `templateParamsSessionOrigin(tp) == "manual"` in `templateParamsToConfig`.
- Both resume call sites (`buildSessionResume`,
  `resolveWorkerSessionRuntimeWithMetadata`) pass the metadata-derived signal.

Only interactive (human-attached) resumes keep mouse-on. Pool/headless resumes,
and any unknown/empty origin, resolve mouse-**off** (the safe direction: never
enable mouse on a polled agent).
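
A minimal sketch of the gate described above, under simplifying assumptions: the metadata is modeled as a `map[string]string`, `Hints` is a stand-in struct, and `sessionResumeHints` is reduced to its `interactive` parameter (the real function takes additional arguments, elided as `...` in the list above).

```go
package main

import "fmt"

// Hints is a stand-in for the runtime hints; only MouseOn matters here.
type Hints struct{ MouseOn bool }

// sessionResumeInteractive reports true only for an explicit "manual"
// origin. Unknown or empty origins fall through to false, the safe default.
// Modeling metadata as map[string]string is an assumption of this sketch.
func sessionResumeInteractive(metadata map[string]string) bool {
	return metadata["session_origin"] == "manual"
}

// sessionResumeHints is simplified to the one parameter relevant to the fix.
func sessionResumeHints(interactive bool) Hints {
	return Hints{MouseOn: interactive}
}

func main() {
	for _, origin := range []string{"manual", "worker", ""} {
		md := map[string]string{"session_origin": origin}
		h := sessionResumeHints(sessionResumeInteractive(md))
		fmt.Printf("origin=%q MouseOn=%v\n", origin, h.MouseOn)
	}
}
```

Comparing against the single allowed value, rather than excluding known headless origins, is what makes unknown origins resolve mouse-off.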

## Test plan

- **RED→GREEN:** new
  `TestResolveWorkerSessionRuntimeResolvesMouseOnlyForInteractiveResume`
  exercises the real worker-factory resolver
  (`resolveWorkerSessionRuntimeWithMetadata`, not a stub) for both cases:
  pool agent (`session_origin=worker`) → `MouseOn=false`, interactive
  (`session_origin=manual`) → `MouseOn=true`. Failed first on the pool case
  (`MouseOn = true, want false`), passes after the fix.
- `TestSessionResumeHintsEnablesMouse` extended with the `interactive=false` →
  `MouseOn=false` case (previously proved only the true case).
- `go test ./internal/api/` green; `go vet ./internal/api/` clean.

Refs ga-g7go, ga-c4w #1 (sjarmak review 4437810731), PR gastownhall#3103.

Co-authored-by: Eric Cestari <eric@escapevelocity.fr>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…rted before creation_complete) (gastownhall#3466) (gastownhall#3503)

Fixes the crash-loop reported in gastownhall#3466 (sibling of gastownhall#3109; relates to
gastownhall#534): a tmux-transport agent whose work_dir loads a project-scoped MCP
server blocks on Claude Code's "New MCP server found in this project"
trust modal, which a headless managed agent cannot answer, so the
session-create handshake aborts ("aborted before creation_complete") and
`mode=always` agents crash-loop.

Defense in depth, two independent commits:

1. **Preventive** — `enableAllProjectMcpServers: true` in the projected
Claude settings template (`internal/hooks/config/claude.json`), next to
the existing `skipDangerousModePermissionPrompt`. The modal never
renders for projected agents. (Issue ask #2.)
2. **Reactive** — a new MCP-trust dialog class in
`internal/runtime/dialog.go` that selects option 2 ("Use this and all
future MCP servers in this project"), covering agents gc does not
project settings for. (The narrow, still-open piece of gastownhall#534 / issue ask
#1.)
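
The reactive half can be pictured as a matcher over captured pane text. The Go sketch below is illustrative only: `mcpTrustOption` and its return convention are hypothetical and do not reflect the actual types in `internal/runtime/dialog.go`. The prompt text and the choice of option 2 are taken from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// mcpTrustPrompt is the modal title quoted in the commit message.
const mcpTrustPrompt = "New MCP server found in this project"

// mcpTrustOption inspects captured pane text and, if the MCP-trust modal is
// showing, returns the menu option to select. Option 2 is "Use this and all
// future MCP servers in this project" per the commit message. The function
// shape is an assumption made for this sketch.
func mcpTrustOption(pane string) (option int, matched bool) {
	if !strings.Contains(pane, mcpTrustPrompt) {
		return 0, false
	}
	return 2, true
}

func main() {
	pane := "New MCP server found in this project: example\n1. ...\n2. ...\n"
	if opt, ok := mcpTrustOption(pane); ok {
		fmt.Println("select option", opt)
	}
}
```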

### Verification

- Reproduced and fix-checked the modal directly against Claude Code
2.1.177 in a throwaway tmux session: an untrusted project `.mcp.json`
renders the modal on launch; the same launch with
`enableAllProjectMcpServers: true` in the `--settings` file goes
straight to the prompt with no modal. (Note: `-p`/print mode does not
render the project-MCP gate, so it cannot reproduce this — the modal
only appears on the interactive tmux launch path.)
- Tests: extended `TestInstallClaude` to assert the key reaches the
projected `.gc/settings.json`; added matcher + peek + stream tests in
`internal/runtime/dialog_test.go`.
- `make check` (fmt, lint, vet, full test suite) green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>