fix(dolt): surface + auto-clear stale compact quarantines; label backup-sync timeouts (gc-h7mc0tz) #1
Open
vbtcl wants to merge 1 commit into
…up sync timeouts
A transient post-flatten value-hash quarantine silently disabled ALL
compaction/GC for a DB with no alert and no auto-staleness, letting the
noms journal grow toward the corrupted-journal city-down threshold
(beads_hq sat quarantined 16 days; journal 5.1G, data dir 13G). Three
improvements close the gap:
1. mol-dog-doctor: emit a [HIGH] health advisory whenever an active
compact-quarantine marker exists (it silently disables GC on critical
infra). Counts only valid-db-name markers, matching the compactor's own
has_compact_marker lookup, so operator archives like
beads_hq.stale-cleared-20260607 do not false-alarm.
2. compact: auto-clear a quarantine marker older than
GC_DOLT_COMPACT_QUARANTINE_STALE_SECS (default 6h) once the DB reads
clean (row counts) and is quiescent (whole-DB value hash stable across
two probes a settle apart), then retry compaction. The post-flatten
re-verification re-quarantines if real drift remains, so auto-clear is
a supervised retry that never bypasses integrity enforcement or GCs
unverified data. Kill switch: GC_DOLT_COMPACT_QUARANTINE_AUTOCLEAR=0.
3. mol-dog-backup: distinguish a sync timeout (run_bounded rc 124 ->
"sync timed out >120s; likely journal bloat/size", surfaced in the mail
subject) from a generic sync error ("sync failed rc=N"), so journal
bloat is diagnosable from the advisory.
Tests: 7 new hermetic tests in dog_exec_scripts_test.go (auto-clear when
quiescent, keep-fresh, keep-when-writer-active, kill switch, backup
timeout vs error, doctor advisory) plus a quarantine_writer_active fake
mode. Full examples/dolt suite green except two pre-existing,
environment-specific failures unrelated to this change
(TestCompactScriptRealDoltRemotePush: dolt 2.0.7 ANSI color in remote-HEAD
parse; TestRuntimeScriptManagedStateBeatsStaleEnvPort: port-resolve env).
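
The timeout-versus-error distinction from item 3, which the backup test listed above covers, amounts to mapping an exit code to a label. A minimal Go sketch follows, assuming only what the commit message states (rc 124 means timeout, anything else non-zero is a generic failure). The `syncFailureLabel` name and the empty-string result for rc 0 are assumptions made for illustration; the actual change is in the mol-dog-backup script.

```go
package main

import "fmt"

// timeoutExitCode is the exit status the commit message attributes to a
// run_bounded timeout.
const timeoutExitCode = 124

// syncFailureLabel maps a sync exit code to the advisory text.
// Returning "" for success is an assumption of this sketch.
func syncFailureLabel(rc int) string {
	switch rc {
	case 0:
		return ""
	case timeoutExitCode:
		return "sync timed out >120s; likely journal bloat/size"
	default:
		return fmt.Sprintf("sync failed rc=%d", rc)
	}
}

func main() {
	for _, rc := range []int{0, 1, 124} {
		fmt.Printf("rc=%d label=%q\n", rc, syncFailureLabel(rc))
	}
}
```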
Refs gc-h7mc0tz.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…(ga-c4w) (gastownhall#3103)

## Summary

Makes the mouse wheel drive **tmux copy-mode scrollback** in interactive `gc` sessions instead of leaking the wheel to the focused TUI (Claude Code's own history, a pager, or the shell), durably and out-of-the-box, while **headless agent sessions stay mouse-off** (controller-poll safety). This is the proper in-source fix that supersedes the portharbour city-local `po-vtg2` `set-hook` stopgap.

Two facts made the wheel inert before this change, so the fix has two parts:

- **Part B: runtime default (`internal/api/session_runtime.go`).** `sessionCreateHints` now sets `MouseOn: true`. The runtime skips `disableMouseAndActivity` only when `MouseOn` is true (`internal/runtime/tmux/adapter.go:930`), so the `mouse on` set at session create (`tmux-theme.sh`) survives and the wheel binding can fire. This seam flips exactly the two human-interactive callers: provider-adhoc (`session_resolved_config.go`) and named sessions (`session_resolution.go`). The headless agent path resolves `MouseOn` from `cmd/gc/template_resolve.go` (`cfgAgent.MouseModeOn()`) and is **not** involved → stays mouse-off.
- **Part A: pack binding (`examples/gastown/.../tmux-keybindings.sh`).** Adds root-table `WheelUpPane → copy-mode -e` / `WheelDownPane → send-keys -M` bindings (forces copy-mode even over mouse-reporting apps so scrollback wins; Shift+wheel keeps native terminal selection). **No** `client-attached` `set-hook` stopgap: the `MouseOn` default replaces the prototype's.

> Why `sessionCreateHints` and not the bead's suggested `mouse_mode='on'`
> template default: provider/named sessions build their runtime hints solely via
> `sessionCreateHints`; their synthetic `&config.Agent{}` is discarded after
> provider resolution, so a template `mouse_mode` would never reach them. The
> hints builder is the minimal correct seam, and it keeps the change off the
> agent-template path entirely (guaranteeing headless behavior is unchanged).

## Micro-tasks (TDD red→green, per-task commits)

| task | commit | test |
| --- | --- | --- |
| T-001/T-002 interactive mouse-on default | `19d6a9cdf` | `TestSessionCreateHintsEnablesMouse` |
| T-003 headless stays mouse-off (guard) | `fe1c2149f` | `TestResolveTemplateHeadlessAgentStaysMouseOff` |
| T-004/T-005 pack wheel binding + no stopgap | `6bc2d400a` | `TestTmuxKeybindingsScrollWheel` |
| T-006 build + targeted tests + CHANGELOG | `0745b53d8` | - |

## Testing

Run under the hermetic `env -i` wrapper (Makefile `TEST_ENV`) + `icu4c@78` CGO flags.

- `go build ./...` → **Success**
- `go test ./internal/api/... ./examples/gastown/...` → **1635 passed**
- `go test ./cmd/gc/...` → all ga-c4w tests pass.

**Pre-existing, unrelated failures (not introduced here):** `TestBdRuntimeEnvManagedCityProjectsHostOverride` and `TestBdRuntimeEnvForRigInheritedManagedCityProjectsHostOverride` fail **identically on base `dd3ee8524`** with none of this branch's changes present (managed-Dolt host-override port resolution; the local sandbox's proxied-server setup does not produce the override). `TestProbeDetachedWork_TmuxExitStatus` timeouts were host-env flakes that pass under the hermetic `env -i` wrapper.

## Manual verification (acceptance #1, gastownhall#3; not unit-testable)

After merge + pack roll, in a fresh interactive `gc session new <provider>`:

1. Wheel-up in a Claude pane enters copy-mode scrollback; wheel-down scrolls down and exits at the bottom.
2. Mouse pane-select, drag-resize, the `MouseDown1StatusRight` mail popup, and Shift+wheel native selection all still work.
3. A headless agent session shows `mouse off` (`tmux show-options -t <sess> mouse`).

## For the reviewer (open questions, downstream-resolvable)

1. **`monitor-activity` side-effect.** `MouseOn=true` skips the whole `disableMouseAndActivity`, so interactive sessions also keep `monitor-activity on`, the same as `mouse_mode=on` agents already get and benign for a human-attended session. Split the helper (mouse conditional, activity always) only if you want activity off regardless. Out of scope unless flagged.
2. **`WheelDownPane send-keys -M` at bottom of scrollback**: exit-clean is covered by manual verification #1.

## Compliance

- **GDPR:** no-op. Governs tmux mouse-mode / key bindings for dev tooling; no personal or special-category data read, written, transmitted, or logged.
- **MDR Class I:** no-op. Outside the voxmemo → voxist-api clinical pipeline.

## Follow-up (separate, not this PR)

Removing the portharbour city-local `po-vtg2` stopgap is a separate city-store task to file once this ships and the gastown pack is rolled.

Refs: ga-c4w (supersedes po-vtg2). Plan: `docs/plans/durable-mouse-wheel-scrollback.md`.

---------

Co-authored-by: Eric Cestari <eric@escapevelocity.fr>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…ession) (gastownhall#3139)

## Summary

Post-merge regression fix for **ga-c4w / PR gastownhall#3103**. `internal/api` `sessionResumeHints` emitted `MouseOn: true` **unconditionally** for every resumed session, including pool/headless agents resumed through the API worker factory, re-enabling tmux mouse on controller-polled sessions and breaking ga-c4w's controller-poll-safety invariant. This is human reviewer **sjarmak's MAJOR #1** (review 4437810731), which was dismissed and merged without a code fix.

## Root cause

`resolveWorkerSessionRuntimeWithMetadata` (wired as the worker factory's `ResolveSessionRuntime` in `worker_factory.go`) calls `sessionResumeHints` and builds `runtime.Config` **directly**; it never routes through `cmd/gc/template_resolve.go`. So the in-code assumption that headless agents "re-resolve MouseOn mouse-off downstream" did not hold for this path, and a resumed pool agent got mouse **on**. `MouseOn` has exactly one consumer (`internal/runtime/tmux/adapter.go`: `if !cfg.MouseOn { disableMouseAndActivity }`), so `MouseOn=true` means mouse is not disabled.

## Fix

Gate `MouseOn` on an explicit interactive signal instead of hardcoding `true`:

- `sessionResumeHints(..., interactive bool)` sets `MouseOn: interactive`.
- `sessionResumeInteractive(metadata)` derives it from `session_origin == "manual"`, mirroring the create-path gate `templateParamsSessionOrigin(tp) == "manual"` in `templateParamsToConfig`.
- Both resume call sites (`buildSessionResume`, `resolveWorkerSessionRuntimeWithMetadata`) pass the metadata-derived signal.

Only interactive (human-attached) resumes keep mouse-on. Pool/headless resumes, and any unknown/empty origin, resolve mouse-**off** (the safe direction: never enable mouse on a polled agent).
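
The gate described in the fix can be sketched in a few lines of Go. This is a simplified illustration, not the code from the commit: the function names match those quoted above, but the `resumeHints` type, the `map[string]string` metadata shape, and the reduced parameter lists are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// resumeHints is a hypothetical stand-in for the runtime hints struct;
// only the MouseOn field discussed above is modelled.
type resumeHints struct {
	MouseOn bool
}

// sessionResumeInteractive treats a session as interactive only when its
// origin is exactly "manual". Missing or unknown origins fall through to
// false, which is the safe direction for controller-polled agents.
func sessionResumeInteractive(metadata map[string]string) bool {
	return metadata["session_origin"] == "manual"
}

// sessionResumeHints sets MouseOn from the explicit interactive signal
// instead of hardcoding true.
func sessionResumeHints(interactive bool) resumeHints {
	return resumeHints{MouseOn: interactive}
}

func main() {
	for _, origin := range []string{"manual", "worker", ""} {
		md := map[string]string{"session_origin": origin}
		hints := sessionResumeHints(sessionResumeInteractive(md))
		fmt.Printf("origin=%q MouseOn=%v\n", origin, hints.MouseOn)
	}
}
```

Comparing against the single allowed value, rather than excluding known headless origins, means a new or misspelled origin can never enable the mouse by accident.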

## Test plan

- **RED→GREEN:** new `TestResolveWorkerSessionRuntimeResolvesMouseOnlyForInteractiveResume` exercises the real worker-factory resolver (`resolveWorkerSessionRuntimeWithMetadata`, not a stub) for both cases: pool agent (`session_origin=worker`) → `MouseOn=false`, interactive (`session_origin=manual`) → `MouseOn=true`. Failed first on the pool case (`MouseOn = true, want false`), passes after the fix.
- `TestSessionResumeHintsEnablesMouse` extended with the `interactive=false` → `MouseOn=false` case (previously proved only the true case).
- `go test ./internal/api/` green; `go vet ./internal/api/` clean.

Refs ga-g7go, ga-c4w #1 (sjarmak review 4437810731), PR gastownhall#3103.

Co-authored-by: Eric Cestari <eric@escapevelocity.fr>
vbtcl pushed a commit that referenced this pull request Jun 16, 2026
…rted before creation_complete) (gastownhall#3466) (gastownhall#3503)

Fixes the crash-loop reported in gastownhall#3466 (sibling of gastownhall#3109; relates to gastownhall#534): a tmux-transport agent whose work_dir loads a project-scoped MCP server blocks on Claude Code's "New MCP server found in this project" trust modal, which a headless managed agent cannot answer, so the session-create handshake aborts ("aborted before creation_complete") and `mode=always` agents crash-loop.

Defense in depth, two independent commits:

1. **Preventive**: `enableAllProjectMcpServers: true` in the projected Claude settings template (`internal/hooks/config/claude.json`), next to the existing `skipDangerousModePermissionPrompt`. The modal never renders for projected agents. (Issue ask #2.)
2. **Reactive**: a new MCP-trust dialog class in `internal/runtime/dialog.go` that selects option 2 ("Use this and all future MCP servers in this project"), covering agents gc does not project settings for. (The narrow, still-open piece of gastownhall#534 / issue ask #1.)

### Verification

- Reproduced and fix-checked the modal directly against Claude Code 2.1.177 in a throwaway tmux session: an untrusted project `.mcp.json` renders the modal on launch; the same launch with `enableAllProjectMcpServers: true` in the `--settings` file goes straight to the prompt with no modal. (Note: `-p`/print mode does not render the project-MCP gate, so it cannot reproduce this; the modal only appears on the interactive tmux launch path.)
- Tests: extended `TestInstallClaude` to assert the key reaches the projected `.gc/settings.json`; added matcher + peek + stream tests in `internal/runtime/dialog_test.go`.
- `make check` (fmt, lint, vet, full test suite) green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
A transient post-flatten value-hash compact-quarantine marker silently disables ALL compaction/GC for a DB with no alert and no auto-staleness, letting the noms journal grow toward the corrupted-journal city-down threshold. beads_hq sat quarantined 16 days (journal 5.1G, data dir 13G) — the same path that ends in a city-wide Dolt outage. This is a recurring incident class (see gc-h7mc0tz).
What
Three improvements close the gap (commit b5116b0, `examples/dolt` source; the deployed `.gc/system/packs/dolt` is reconciler-managed, so the fix must originate here):

1. mol-dog-doctor: emit a `[HIGH]` health advisory whenever an active compact-quarantine marker exists. Counts only valid-db-name markers (matching the compactor's own `has_compact_marker` lookup) so operator archives like `beads_hq.stale-cleared-*` do not false-alarm.
2. compact: auto-clear a quarantine marker older than `GC_DOLT_COMPACT_QUARANTINE_STALE_SECS` (default 6h) only once the DB reads clean (row counts) and is quiescent (whole-DB value hash stable across two probes a settle apart), then retry. The post-flatten re-verification re-quarantines on real drift, so this is a supervised retry that never bypasses integrity enforcement.
3. mol-dog-backup: distinguish a sync timeout (`run_bounded` rc 124, reported as "sync timed out >120s; likely journal bloat/size") from a generic sync error ("sync failed rc=N"), so journal bloat is diagnosable from the advisory.

Tests

7 new hermetic tests + full `examples/dolt` suite green (2 pre-existing env-specific failures unrelated).

Open for reviewer

`vbtcl/gascity:main` (the branch base). Retarget to upstream `gastownhall/gascity` if that's where the release is cut.

Filed by gastown.mayor on behalf of claude-1 (gc-wisp-ncb1). Refs gc-h7mc0tz, gc-sffnhkx.