
feat: trust console + memory check layer (v0.15.0)#40

Merged
codejunkie99 merged 24 commits into master from feature/trust-console-tui on May 5, 2026

@codejunkie99
Owner

Summary

  • New agentic-stack doctor, tui, memory ..., verify, team ... commands inspect the file-backed .agent/ data layer with no daemon. Same normalized data model powers text, JSON, and a read-only stdlib curses TUI.
  • Status glyphs (✓ / ! / ✗) replace PASS/WARN/FAIL in the TUI; encoding-aware fallback to + / ! / x under non-UTF-8 stdout.
  • Three codex-review-surfaced fixes: adapter conformance now requires every listed file, doctor --json preserves non-zero exit on failure, glyph fallback for ASCII terminals.
  • 27/27 standalone regression checks (verify_trust_console.py).
  • Branch was 72 behind master; this PR includes the integration merge with conflict resolutions documented in commit bddc63b.

Test plan

  • python3 verify_trust_console.py — 27/27 pass
  • python3 agentic_stack_cli.py doctor — exit 0
  • python3 agentic_stack_cli.py tui --plain — renders glyphs
  • python3 agentic_stack_cli.py verify --all --json — exit 0
  • Fresh install claude-code <tmp> --yes + verify claude-code — all six conformance dimensions pass
  • CI on the branch
  • Brew formula bump after v0.15.0 tag (follow-up commit)

codejunkie99 and others added 24 commits April 25, 2026 15:22
Baseline commit so subsequent per-bug fixes have minimal diffs.
No behavior changes; just brings these files under version control:

- .agent/harness/runtime.py
- .agent/harness/control_plane.py
- .agent/harness/lesson_store.py
- .agent/tools/instances.py

Previously, the second positional arg was assigned to TARGET unconditionally,
so the documented form `agentic-stack claude-code --yes` wrote into a literal
`--yes/.agent` directory. Now flags are filtered out of positional parsing,
TARGET defaults to $PWD when only an adapter and flags are passed, and
unknown -flags are rejected rather than silently consumed.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)

`_parse_args()` previously treated every arg starting with `-` as a flag,
silently dropping target paths that begin with `-` and falling back to
`os.getcwd()`. Now `--` ends flag parsing, only known flags
(--yes/-y/--force/--reconfigure) are consumed as flags, and unrecognized
-tokens warn-and-treat-as-path instead of being eaten.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
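
For reviewers, a minimal sketch of the parsing rule described in this commit. The function name `parse_args`, its return shape, and the warning text are illustrative assumptions, not the project's actual `_parse_args()`; adapter-name handling is left out.

```python
import sys

# Flags named in the commit message; everything else is a positional.
KNOWN_FLAGS = {"--yes", "-y", "--force", "--reconfigure"}


def parse_args(argv, default_target):
    """Split argv into (flags, target). Hypothetical stand-in for _parse_args()."""
    flags, positionals = set(), []
    flags_done = False
    for arg in argv:
        if flags_done:
            # After "--", every token is a path, even if it starts with "-".
            positionals.append(arg)
        elif arg == "--":
            flags_done = True
        elif arg in KNOWN_FLAGS:
            flags.add(arg)
        else:
            if arg.startswith("-"):
                # Unknown dash-token: warn, but keep it as a path.
                print(f"warning: unrecognized flag {arg!r}; treating it as a path",
                      file=sys.stderr)
            positionals.append(arg)
    target = positionals[0] if positionals else default_target
    return flags, target
```

With this shape, `parse_args(["--", "--yes"], cwd)` yields a target literally named `--yes`, while `parse_args(["--yes"], cwd)` falls back to `cwd`.
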
`mark_worker_stopped()` previously left `active_instance` pointing at a
stopped instance, so workers that exited via STOP/SIGINT/SIGTERM kept the
registry routing future work to a dead instance. Now matches the CLI
`stop_instance()` behavior: clears `active_instance` if it points at the
instance being stopped, then persists.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
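
A sketch of the clear-then-persist behavior, assuming the registry is a JSON file with an `active_instance` key and an `instances` mapping. That schema, the `status` field, and the function signature are assumptions for illustration; only the `active_instance` name comes from the commit message.

```python
import json


def mark_worker_stopped(registry_path, instance_id):
    """Mark an instance stopped and drop it as the active routing target."""
    with open(registry_path, encoding="utf-8") as fh:
        registry = json.load(fh)

    instance = registry.get("instances", {}).get(instance_id)
    if instance is not None:
        instance["status"] = "stopped"

    # Only clear the pointer if it refers to the instance being stopped,
    # so stopping a background worker does not disturb the active one.
    if registry.get("active_instance") == instance_id:
        registry["active_instance"] = None

    with open(registry_path, "w", encoding="utf-8") as fh:
        json.dump(registry, fh, indent=2)
    return registry
```
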
The hook previously only checked `blocked_targets` and the `requires_approval` boolean,
ignoring `blocked_patterns` and `requires_approval_patterns` from the
shell schema. Now matches command strings against both pattern lists
via re.search, blocks bad regex with stderr warnings (fail-soft), and
runs before the legacy boolean and permissions.md keyword heuristics.

Catches: `curl ... | sh`, `rm -rf /`, `git push --force`, etc.

Refs: HIGH_PRIORITY_BUG_REPORT.md (Critical)
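
A rough sketch of the pattern pass, for illustration only. The verdict strings, the function name, and the sample patterns are hypothetical. This sketch also assumes that "fail-soft" means an invalid regex is skipped with a stderr warning; the real hook may treat that case differently.

```python
import re
import sys


def match_shell_policy(command, blocked_patterns, approval_patterns):
    """Return 'block', 'approve', or 'allow' for a shell command string."""
    checks = (("block", blocked_patterns), ("approve", approval_patterns))
    for verdict, patterns in checks:
        for pattern in patterns:
            try:
                if re.search(pattern, command):
                    return verdict
            except re.error as exc:
                # Assumed fail-soft behavior: warn and move on to the next pattern.
                print(f"warning: skipping invalid pattern {pattern!r}: {exc}",
                      file=sys.stderr)
    return "allow"
```

Blocked patterns are checked before approval patterns, so a command matching both is blocked outright.
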
No CI previously ran the documented verifier scripts, so high-risk
areas could regress without merge-time signal. Workflow runs on push
and PR to master with three jobs:

- verifiers (ubuntu): test_claude_code_hook.py, verify_codex_fixes.py,
  verify_instances.py
- installer-smoke (ubuntu): exercises both `install.sh claude-code <path> --yes`
  and the documented no-path form `install.sh claude-code --yes`,
  asserting no literal `--yes/` directory is created
- installer-windows-pwsh (windows): pwsh install.ps1 parity

Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)

Windows installer omitted the documented `pi` adapter from the usage
comment, ValidAdapters list, and switch cases. Now mirrors install.sh:
creates `<TARGET>/.pi/AGENTS.md` only if absent, then wires
`.pi/skills` to `.agent/skills` via SymbolicLink, falling back to
Junction, then a recursive copy. Safer than install.sh: an existing
real `.pi/skills` directory is renamed to a timestamped `.bak-` rather
than rm-rf'd.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)

Existing Homebrew test only used the explicit-path form `claude-code <path>
--yes`, so it never exercised the broken documented `claude-code --yes`
ordering. Pre-creating `testpath/.agent/memory/personal` also masked the
install.sh skip-when-exists branch in the .agent copy.

Now the test omits the pre-creation, asserts that `runtime.py` exists after
the explicit-path install (full tree copy), then runs `claude-code --yes`
inside a fresh subdir and asserts that no `--yes/` directory is created.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)

Manifest-provided names and precondition paths were joined under
SKILLS_DIR / ROOT without containment checks, so a poisoned manifest
entry with `../` could probe files outside the skill tree.

Adds `_within(root, candidate)` resolve-and-relative-to check, regex
validation for skill names, and per-file containment checks before
opening SKILL.md / KNOWLEDGE.md. Bad entries warn to stderr and skip
rather than crash the loader.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Concurrent `start` calls could both observe no live worker and spawn
duplicate workers for the same queue (check then subprocess then
mark-started left a TOCTOU window).

Adds an fcntl exclusive non-blocking lock on `<runtime>/spawn.lock`
held across re-check-spawn-mark, so a contended caller bails fast
with "another spawn in flight". Liveness now also checks via
`os.kill(pid, 0)` so a stale-but-non-None pid triggers respawn.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
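
A minimal sketch of the lock-then-recheck pattern on POSIX. It uses `fcntl.flock` for the exclusive non-blocking lock; the commit does not say whether `flock` or `lockf` is used, so that choice, the `SpawnInFlight` exception, and both helper names are assumptions.

```python
import fcntl
import os
from contextlib import contextmanager


class SpawnInFlight(RuntimeError):
    """Raised when another caller already holds the spawn lock."""


@contextmanager
def spawn_lock(runtime_dir):
    """Hold an exclusive, non-blocking lock on <runtime_dir>/spawn.lock."""
    fd = os.open(os.path.join(runtime_dir, "spawn.lock"),
                 os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise SpawnInFlight("another spawn in flight") from None
        yield  # caller re-checks liveness, spawns, and marks started here
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def pid_alive(pid):
    """Probe a pid with signal 0; a missing or dead pid reports False."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

The re-check of `pid_alive` belongs inside the `with spawn_lock(...)` body; checking before taking the lock would reopen the TOCTOU window.
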
Required and optional sections were previously appended unconditionally,
with only matched skills gated by the budget, so an oversized
WORKSPACE.md or lessons file would blow past `budget` regardless.

Now every append checks `_room()` first. Required sections (role,
permissions, paths) are truncated with a marker rather than dropped;
optional sections (lessons, episodes, skills) skip with an "[N items
omitted]" marker. Reserves a per-required-section header floor so an
early section cannot starve later ones. Returns a `_UsedTokens` int
subclass exposing `.overflow` while preserving the `(ctx, used)`
2-tuple shape for existing callers.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
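
A simplified sketch of the budgeting scheme, counting characters instead of tokens. `_UsedTokens`, `_room()`, `.overflow`, and the "[N items omitted]" marker follow the commit message; the `build_context` name, the section format, and the truncation marker text are assumptions.

```python
MARKER = " [truncated]\n"  # assumed marker text


class _UsedTokens(int):
    """An int that also carries how far the context ran over budget."""

    def __new__(cls, used, overflow=0):
        obj = super().__new__(cls, used)
        obj.overflow = overflow
        return obj


def build_context(required, optional, budget):
    """required/optional are lists of (title, body); returns (ctx, used)."""
    parts, used = [], 0

    def _room():
        return budget - used

    for i, (title, body) in enumerate(required):
        header = f"## {title}\n"
        # Reserve header space for the required sections still to come.
        floor = sum(len(f"## {t}\n") for t, _ in required[i + 1:])
        avail = _room() - len(header) - floor
        if len(body) > avail:
            body = body[:max(0, avail - len(MARKER))] + MARKER
        section = header + body
        parts.append(section)
        used += len(section)

    omitted = 0
    for title, body in optional:
        section = f"## {title}\n{body}"
        if len(section) <= _room():
            parts.append(section)
            used += len(section)
        else:
            omitted += 1
    if omitted:
        note = f"[{omitted} items omitted]\n"
        if len(note) <= _room():
            parts.append(note)
            used += len(note)

    return "".join(parts), _UsedTokens(used, overflow=max(0, used - budget))
```

Existing callers that unpack `ctx, used = build_context(...)` keep working, since `_UsedTokens` compares and adds like a plain int.
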
`mark_graduated` / `mark_rejected` / `mark_reopened` joined raw
`candidate_id` into paths without sanitization, so an id with `../`
could resolve outside the candidates directory.

Adds module-level `_validate_candidate_id` (regex
`^[a-zA-Z0-9_-]{1,128}$`) called at the top of each lifecycle entry
point, plus `_ensure_within` realpath-containment defense-in-depth
against symlink shenanigans. Non-atomic write fix is a separate commit.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
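
A sketch of the two-layer defense. The regex and the helper names `_validate_candidate_id` and `_ensure_within` come from the commit message; the exception type, the `.json` suffix, and `candidate_path` are assumptions. The sketch uses `fullmatch`, since `$` with `match` would also accept an id ending in a newline.

```python
import os
import re

_CANDIDATE_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")


def _validate_candidate_id(candidate_id):
    """Reject ids with separators, dots, or anything outside the allowlist."""
    if not isinstance(candidate_id, str) or not _CANDIDATE_ID_RE.fullmatch(candidate_id):
        raise ValueError(f"invalid candidate id: {candidate_id!r}")
    return candidate_id


def _ensure_within(root, path):
    """Resolve symlinks on both sides and require path to sit under root."""
    root_real = os.path.realpath(root)
    path_real = os.path.realpath(path)
    if os.path.commonpath([root_real, path_real]) != root_real:
        raise ValueError(f"{path!r} escapes {root!r}")
    return path_real


def candidate_path(candidates_dir, candidate_id):
    """Hypothetical helper combining both checks before any file access."""
    _validate_candidate_id(candidate_id)
    return _ensure_within(candidates_dir,
                          os.path.join(candidates_dir, candidate_id + ".json"))
```
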
`graduate.py` joined raw `candidate_id` into paths from sys.argv, so a
caller could probe candidate-shaped JSON outside the candidate dir.

Validates candidate_id at the CLI entry point right after parse_args
(rejects with exit code 4) and adds a `_safe_candidate_path` helper
that re-validates plus realpath-checks containment under
CANDIDATES_DIR. Imports `_validate_candidate_id` from review_state
when available, falls back to a local copy with the same regex.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)

Duplicate-detection callers in graduate.py and auto_dream.py read
LESSONS.md, which is rendered accepted-only — so a provisional lesson
could be re-staged or re-graduated as if novel. Adds
`render_dedup_text()` and `_load_all_for_dedup()` that include every
lesson regardless of status (annotated with the real status), and
points the two prefilter call sites at the new function. The
accepted-only `render_visible_lessons_md` is unchanged so agent context
keeps the same trust boundary.

Refs: HIGH_PRIORITY_BUG_REPORT.md (High)

`_write_entries()` did a direct truncate-and-rewrite on
AGENT_LEARNINGS.jsonl, so a crash, disk-full, or concurrent hook append
during run_dream_cycle could lose the entire log. Now snapshots prior
state to `.bak`, writes to `.tmp`, fsyncs, then `os.replace`s
atomically. Cleans up `.tmp` on failure with original file intact.
`_load_entries(report_malformed=True)` surfaces bad-line counts via
stderr from `run_dream_cycle` so corruption isn't silent.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)

`stage()` deterministically computed the candidate id and wrote a fresh
record with `rejection_count: 0` and an empty decisions list — so
re-teaching a previously-rejected candidate erased its rejection
history and made churn look novel.

Now `_find_prior` checks candidates/, candidates/rejected/, and
candidates/graduated/. If a non-provisional graduated record exists,
re-staging refuses with exit 3. Otherwise the new record preserves
`rejection_count`, `staged_at`, and the prior decisions list,
appending a fresh `staged` or `re-staged` entry. The old rejected
copy is removed once the new staged file lands so the candidate
lives in exactly one location.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
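
A sketch of the record-merge step only, leaving out the directory lookup and the file moves. `rejection_count`, `staged_at`, and the decisions list are named in the commit message; the `status`, `provisional`, `action`, and `at` fields and the `build_staged_record` function are assumptions about the record layout.

```python
def build_staged_record(candidate_id, claim, prior, now):
    """Build the record to stage, carrying history over from a prior record."""
    if prior is None:
        return {
            "id": candidate_id,
            "claim": claim,
            "status": "staged",
            "rejection_count": 0,
            "staged_at": now,
            "decisions": [{"action": "staged", "at": now}],
        }

    # A fully graduated lesson must not be re-staged as if it were new.
    if prior.get("status") == "graduated" and not prior.get("provisional"):
        raise SystemExit(3)

    decisions = list(prior.get("decisions", []))  # copy, do not mutate prior
    decisions.append({"action": "re-staged", "at": now})
    return {
        "id": candidate_id,
        "claim": claim,
        "status": "staged",
        "rejection_count": prior.get("rejection_count", 0),
        "staged_at": prior.get("staged_at", now),
        "decisions": decisions,
    }
```
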
Auto-promoted candidates were written via direct `open(path, "w")`,
so an interruption mid-write left a partial file that the listing
loop silently skipped. Adds `_atomic_write_json()` helper using
`open(path+".tmp","w")` -> flush -> fsync -> `os.replace(tmp, path)`,
with a try/except cleanup of the temp file on failure. The single
existing JSON write at line 188 now goes through it.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
`render_lessons()` acquired the lock via `_locked_jsonl()` then called
`load_lessons()`, which opened the same path on a separate UNLOCKED
descriptor — so concurrent appends could produce torn reads despite
the comment claiming the read-render-write cycle was locked.

`load_lessons()` now accepts an optional keyword-only `fp=` argument
that reads through a caller-provided locked descriptor. `render_lessons()`
binds the locked fp from `_locked_jsonl()` and passes it in. Existing
positional callers are unaffected.

Refs: HIGH_PRIORITY_BUG_REPORT.md (High)
`post_execution.py` and `on_failure.py` both did raw `open(...,"a").write(json.dumps(...))`
into AGENT_LEARNINGS.jsonl, so parallel hook invocations could
interleave writes. The dream-cycle rewrite path also raced.

Adds `append_episodic_entry()` to `_provenance.py` that takes
`fcntl.flock(LOCK_EX)` on the open fd, writes the JSON line, flushes
+ fsyncs, releases on context exit. Both hooks now go through it.
Documents the residual locking-model gap with auto_dream.py's atomic
rename rewrite (different mechanisms; worst case is a single lost
entry written between snapshot read and rename — acceptable).

Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
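
A sketch of the locked append, assuming one JSON object per line. The function name and the `flock(LOCK_EX)`, flush, fsync sequence follow the commit message; the signature is an assumption. Per the merge notes further down, master's `_episodic_io.append_jsonl` replaced this helper, so treat it as illustrating the technique rather than the merged code.

```python
import fcntl
import json
import os


def append_episodic_entry(path, entry):
    """Append one JSON line while holding an exclusive lock on the file."""
    line = json.dumps(entry, ensure_ascii=False) + "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            fh.write(line)
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
```
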
`claim_next_job()` removed jobs whose JSON failed to parse, silently
losing partial writes or manually corrupted entries with no diagnostic
artifact.

Now moves the file from `running/` to `failed/<job>.json` via
`os.replace` (atomic) and writes a `<job>.json.error.json` sidecar
containing the parse error, UTC ISO timestamp, and the original
queued/ path. Stderr warning emitted so callers/operators can find
the quarantine.

Refs: HIGH_PRIORITY_BUG_REPORT.md (P2)
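
A sketch of the quarantine step. The `failed/<job>.json` destination, the `.error.json` sidecar suffix, and the three sidecar contents come from the commit message; the function names, the sidecar key names, and the warning text are assumptions.

```python
import json
import os
import sys
from datetime import datetime, timezone


def quarantine_corrupt_job(running_path, failed_dir, queued_path, error):
    """Move an unparseable job into failed/ and record why next to it."""
    os.makedirs(failed_dir, exist_ok=True)
    dest = os.path.join(failed_dir, os.path.basename(running_path))
    os.replace(running_path, dest)  # atomic move on the same filesystem
    sidecar = dest + ".error.json"
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump({
            "error": str(error),
            "quarantined_at": datetime.now(timezone.utc).isoformat(),
            "original_path": queued_path,
        }, fh)
    print(f"warning: corrupt job quarantined at {dest}", file=sys.stderr)
    return dest, sidecar


def load_job_or_quarantine(running_path, failed_dir, queued_path):
    """Return the parsed job, or None after quarantining a corrupt file."""
    try:
        with open(running_path, encoding="utf-8") as fh:
            return json.load(fh)
    except json.JSONDecodeError as exc:
        quarantine_corrupt_job(running_path, failed_dir, queued_path, exc)
        return None
```
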
- adapter installed: require all listed files (was passing if any
  existed; caused false positives in `verify` for opencode/pi/etc.)
- doctor --json: preserve non-zero exit code on failed checks (JSON
  path was always returning 0, masking failures in CI)
- tui glyphs: swap PASS/WARN/FAIL text labels for ✓/!/✗ glyphs in
  curses + plain modes; encoding-aware fallback to +/!/x on non-UTF-8
  terminals (PYTHONIOENCODING=ascii, LANG=C)
- gitignore: exclude `.agent/memory/**/*.bak` runtime backups
- tests: 6 new regression checks (27/27 passing) covering opencode
  partial install, hermes single-file, doctor --json broken-project
  exit code, and glyph fallback across encodings
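
Two of these fixes are small enough to sketch. The glyph sets match the ones listed above; the function names, the dictionary keys, and the probe-by-encoding approach are assumptions about how the fallback might be implemented.

```python
import os

UNICODE_GLYPHS = {"pass": "✓", "warn": "!", "fail": "✗"}
ASCII_GLYPHS = {"pass": "+", "warn": "!", "fail": "x"}


def pick_glyphs(encoding):
    """Use Unicode glyphs only if the output encoding can represent them."""
    try:
        "".join(UNICODE_GLYPHS.values()).encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):
        return ASCII_GLYPHS
    return UNICODE_GLYPHS


def adapter_installed(target_dir, listed_files):
    """Conformance needs every listed file, not just any one of them."""
    return bool(listed_files) and all(
        os.path.exists(os.path.join(target_dir, rel)) for rel in listed_files
    )
```

A caller would typically pass `sys.stdout.encoding` to `pick_glyphs`, which is what makes `PYTHONIOENCODING=ascii` select the ASCII set.
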
Integrates 72 commits from master (v0.13.0..v0.14.0 + post-tag work)
into the trust console branch, then resolves 11 file conflicts.

Resolutions:
- install.sh / install.ps1: took master (rewrote to thin Python dispatcher;
  feature's bash flag-parsing fixes are obsoleted).
- Formula/agentic-stack.rb: combined master's harness_manager+scripts+
  transfer test with feature's agentic_stack_cli.py + runtime.py + no-path
  test. Wrapper still delegates to install.sh; trust console CLI is
  installed alongside but not the bin entrypoint (follow-up: integrate
  trust commands into harness_manager.cli).
- README.md, CHANGELOG.md: combined entries from both sides.
- .gitignore: combined; .bak exclusion added under master's structure.
- .agent/tools/learn.py: kept feature's prior-record merge logic, adopted
  master's UTC timestamp.
- .agent/tools/skill_loader.py: kept both feature's _SAFE_NAME_RE
  containment check and master's skill_enabled() guard.
- .agent/harness/hooks/on_failure.py, post_execution.py: took master
  (uses _episodic_io.append_jsonl; feature's _provenance.append_episodic_entry
  is now redundant).
- .agent/memory/auto_dream.py: took master (flock-based atomic writes
  supersede feature's tempfile+.bak approach).

Verified post-merge: 27/27 regression checks pass; doctor, tui --plain,
verify all exit 0.

Known follow-ups (defer to post-merge):
- Formula version/sha bump for v0.15.0 release tag (P1 from codex review).
- Wire trust console commands into harness_manager.cli or update bin
  wrapper so `agentic-stack doctor` resolves to the trust console CLI.
codejunkie99 merged commit cebe245 into master on May 5, 2026
0 of 3 checks passed