Skip to content

feat(plugins): Pianola manager-agent + plugin system + MCP tool bridge (beta)#1139

Draft
jSydorowicz21 wants to merge 66 commits into
rcfrom
feat/autonomous-manager-agent
Draft

feat(plugins): Pianola manager-agent + plugin system + MCP tool bridge (beta)#1139
jSydorowicz21 wants to merge 66 commits into
rcfrom
feat/autonomous-manager-agent

Conversation

@jSydorowicz21

@jSydorowicz21 jSydorowicz21 commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Beta / visibility draft

Opening as a draft for visibility — the autonomous-manager-agent feature is behind the off-by-default plugins / pianola Encore flags. The plugin authorization gate + UI surface is now complete and verified (see Update below), but the PR stays a draft pending team review and a rebase onto rc (currently merge-DIRTY); it is not proposed for immediate merge. Sharing the full diff + the audit so the team can see the shape and weigh in.

Update — authorization gate + full UI surface + security locks (since the description above)

Since this description was written, the plugin authorization spine and UI capability surface landed and were verified, and the two highest-risk capabilities are now locked as regression-guarded deferrals:

  • Auth gate (Phase A) — CDP-verified end-to-end. Isolated consent minter, host-owned consent window, sealed authorization ledger as the live grant source, refresh-time re-authorization + central tamper exclusion. Verified: unconsented enable rejected; consent mints the approved subset + enables; getGrants reads the sealed ledger; revoke drops + disables.
  • UI capabilities (Phases B/C/D) + HOST_API 1.4.0. Per-capability consent UI; declarative UI-slot render host with per-plugin error isolation; all panel placements wired/rendering/tested; panel self-navigation egress is now FIXED (bridge origin-gating + will-frame-navigate subframe backstop + unit tests) — it was a follow-up below, now resolved.
  • Highest-risk caps stay inert + regression-locked (NOT implemented — deferred to the Phase-3 OS sandbox):
    • agents:dispatch / process:spawn (arbitrary-code-execution grade): the handler factory omits both verbs, and an AST source-guard asserts the live buildHostCallHandlers({…}) call in index.ts passes no dispatch/spawn dep — wiring either fails CI.
    • ui:render-unsafe (highest-risk render escape hatch): declared, consent-gated, renders nothing; a gate test proves holding only it still yields uiItems=[]/panels=[], so it can't silently unlock host-rendered surfaces.

tsc ×3 + ESLint + Prettier clean; affected suites green.

What's in it (all Encore-gated, off by default)

  • Pianola — autonomous manager-agent: watches tabs, classifies awaiting-input prompts, auto-answers/escalates per user rules; task-DAG orchestrator, supervised watcher daemon, and a learn→synthesize re-learn loop.
  • Plugin system — tier-0 data + tier-1 sandboxed-compute/UI plugins: hand-rolled manifest/permission/signature contracts, a default-deny permission broker, a utilityProcess sandbox, net-egress SSRF guard, an ed25519 signing CLI + @maestro/plugin-sdk authoring package.
  • MCP tool bridge (Raychaser/claude path 01 #3) — exposes registered plugin tools to a spawned agent's model over MCP (maestro-cli mcp serve); every model-initiated call is risk-gated before the broker runs it.

~220 files vs rc.

Audit (this PR)

Greptile-5/5-style audit over the full diff: 4 reviewer subagents (MCP, plugin system, Pianola, integration/SDK) + 2 GPT-5.5 codex read-only passes (MCP bridge, plugin security core). Triaged every finding; fixed the real ones:

Critical / High

  • Skip MCP injection on the claude interactive (maestro-p) spawn path — it was prepending --mcp-config before the script path and breaking the launch.
  • tools/call can no longer reach an arbitrary plugin command handler: the bridge rejects unmapped names and the app validates the toolId is a declared tool.
  • plugins:set-grants now enforces the transcripts:read + egress mutual-exclusion at the consent boundary (was UI/runtime only — a direct IPC call could persist the conflicting grant).
  • Unified plugin sign/pack/host-verify on one exclusion policy (the scaffold's own sign→pack flow produced archives that failed host verification).

Correctness

  • Risk gate rates the tool name + description + args, not the stable toolId slug (avoids false-negatives and permanently-blocking benign tools like deploy/publish).
  • Per-bucket contribution-id uniqueness (a tool sharing a command's localId was silently dropped).
  • Pianola: full-turn risk rating (cross-message high-risk bypass), failed-auto-answer rehydrate fix, orchestrate try/catch, decision-log per-record byte cap.
  • getPanelHtml realpath containment; drop --strict-mcp-config; forward data-dir env to the bridge; -32700 on malformed MCP frames; SDK SPDX license.

+28 tests. tsc ×3 clean · ESLint + Prettier clean · SDK drift intact · affected suites green.

MCP auto-injection status (honest)

  • Live: claude API-mode + codex (verified, ephemeral, no global config mutation).
  • Built but gated (verified:false, not auto-injected): opencode (its OPENCODE_CONFIG vs OPENCODE_CONFIG_CONTENT merge needs a live check or it could re-enable prompt-hangs); best-guess adapters for gemini/qwen/copilot/droid/hermes/pi.
  • Not yet: claude interactive (maestro-p) — needs arg-forwarding through the wrapper.

Known follow-ups (not blocking the draft)

  • Per-tool risk metadata + a user-approval path for HIGH tool calls (today HIGH is a hard block).
  • Tier-1 vm realm-escape — the accepted Phase-3 OS-sandbox decision.
  • Pre-existing Windows-only unit failures (POSIX-path / symlink test assumptions) unrelated to this work.

Pianola is a new Encore-gated autonomous manager agent that watches agent
tabs, detects when an agent is awaiting the user, classifies the ask and its
risk, and either auto-answers low-risk prompts from the user's rules or
escalates uncertain/high-risk ones.

Foundation (steps 0-1 of Plans/pianola-implementation-plan.md):

- Encore flag `pianola`, off by default, end-to-end: EncoreFeatureFlags type,
  DEFAULT_ENCORE_FEATURES and settingsMetadata defaults, CLI encore FEATURES +
  aliases (auto-pilot/autopilot/pilot/manager), and a Settings -> Encore toggle.
- Shared contracts in src/shared/pianola/types.ts.
- Pure, I/O-free classifier: prefers a structured AwaitingInputSignal, falls
  back to conservative heuristics, rates risk via a focused lexicon.
- Pure, safety-first policy engine: high-risk always escalates; auto-answer
  only on an explicit matching rule; default is escalate.
- 31 classifier/policy unit tests, plus CLI alias/default coverage.

Nothing consumes the flag yet, so the feature is fully inert when off.
Plans/ captures the verified codebase investigation and the build plan.
Addresses all findings from the codex review of the foundation:

- CRITICAL: high-risk prompts now always escalate before any rule action, so a
  broad `ignore` rule can no longer suppress the most important alerts. Added a
  regression test.
- Risk lexicon extracted into pianola-risk.ts with word-boundary regexes instead
  of raw substrings, fixing false positives (`auth` in `author`, `token` in
  `tokenizer`) and adding outward-facing/destructive coverage (push/publish/
  release/merge PR/deploy/email/.env/private key/payment). Boundary tests added.
- Empty-match `auto_answer` rules now escalate (an auto-answer rule must narrow
  its scope), preventing an accidental catch-all from auto-answering everything.
- Numbered-choice detection (1) .. 2) ..) implemented to match the documented
  behavior; trailing-question-mark check tightened to end-of-message only.
- Project/tab rule scoping normalizes path separators and casing for
  cross-platform robustness.
- PianolaDecision is now a discriminated union so `answer` only exists on an
  auto_answer decision.
- Added a compile-time drift guard tying PianolaMessage to the web-server
  SessionHistoryMessage contract.

Typecheck clean; 58 Pianola/encore tests pass.
Step 2 of the Pianola plan, realized as a pure detector module rather than
parser hot-path surgery (more maintainable; no parser/IPC/WebSocket contract
changes, and the watcher derives the signal from session-show output it already
has).

- pianola-awaiting-detector.ts: detectAwaitingInput(content) returns a typed
  AwaitingInputSignal (precedence plan_review > permission > choice > question)
  with extracted options (numbered and slash/bracket forms); enrichWithAwaiting
  Input(messages) fills it onto assistant turns immutably before classification.
- The classifier already treats a present signal as authoritative (high
  confidence), so structured prompts now upgrade from heuristic to structured.
- Tests cover each kind, option extraction, null cases, immutability, and the
  detector+classifier integration.

Typecheck clean; 64 Pianola tests pass.
Steps 3-4 of the plan; Pianola is now runnable end-to-end from the CLI.

Layering: moved the pure brain (classifier, policy, risk, awaiting-detector)
from src/main/pianola to src/shared/pianola, since both the CLI watcher and the
future in-app engine consume it. Tests moved alongside.

Storage:
- shared/pianola/storage.ts: filenames, the PianolaDecisionRecord type, and a
  pure rule validator (drops malformed rules so a bad hand-edit cannot break the
  engine).
- cli/services/pianola-store.ts: reads the rules file and appends to the JSON
  Lines decision audit log in the Maestro config dir (shared with the desktop,
  no native deps).

Watcher + CLI:
- shared/pianola/pianola-watcher.ts: runWatchIteration ties enrich -> classify
  -> decide -> dispatch/record with all I/O injected, so it is fully unit tested
  and reusable by a future engine. Dedups by last-handled assistant message id.
- cli/commands/pianola.ts: `pianola watch <tab>` polls get_session_history and
  acts per rules (--dry-run, --interval, --once, --agent); `pianola rules` and
  `pianola log` are read views. Every command hard-gates on the pianola Encore
  flag.

Typecheck clean across all configs; 108 Pianola/encore tests pass.
Addresses the runnable-MVP review findings:

- HIGH audit-before-dispatch: the watcher records the decision BEFORE sending a
  message (intent record), then appends the dispatch outcome under the same id.
  A pre-dispatch audit-write failure now fails closed (no message sent).
- HIGH bounded retry: a failed dispatch no longer advances the dedup cursor, so
  the prompt is retried up to MAX_DISPATCH_ATTEMPTS, then Pianola gives up instead
  of either looping forever or skipping it forever.
- HIGH injection hardening: option extraction is strict (clean short tokens; no
  file paths, markdown links, version numbers) and a `choice` requires real
  question/choice context, so prose cannot be shaped into an auto-answerable
  choice. The classifier reuses the same option/asking helpers (one source).
- MEDIUM decision records are validated and folded by id on read, so a malformed
  JSONL line cannot crash `pianola log`.
- MEDIUM projectPath is threaded through the get_session_history result so
  project-scoped rules work in the CLI watcher.
- MEDIUM the watch loop now contains iteration errors (logs and continues).
- LOW a malformed rules file surfaces a one-line warning instead of silently
  disabling all rules.
- Added watcher safety tests (audit ordering, retry/give-up, fail-closed),
  detector false-positive fixtures, store folding/validation tests, and CLI
  watch tests (mocked client + dispatch).

Typecheck clean across all configs; 123 Pianola/encore tests pass.
Adds the single pinned Pianola agent at the top of the Left Bar, gated by
the pianola Encore flag. Pianola is a real claude-code-backed agent so it
chats through the existing agent system. It is excluded from normal
categories and cannot be renamed, duplicated, bookmarked, moved to a
group, or deleted.

- types: isPianola flag on Session
- usePianolaAgent: one-time creation after sessions hydrate, flag-gated
- useSessionCategories: exclude Pianola from all categories
- SessionList: pinned Pianola section above Starred
- SessionContextMenu: guard mutating actions for Pianola
…(v2 L2-4)

Turns the pinned Pianola agent into Maestro's orchestrator using the
existing, tested CLI surface (the chosen action layer over MCP).

- CLI: new `maestro-cli pianola add-rule` so Pianola turns a conversation
  into a durable watcher rule; backed by writePianolaRules (validated,
  atomic) in the CLI store.
- Prompt: src/prompts/pianola-system.md (identity, exact CLI invocations,
  task-dump orchestration, and the Hybrid confirmation discipline -
  auto for read/observe/watch, confirm before spawning or production sends).
  Registered in promptDefinitions (CORE_PROMPTS, PROMPT_IDS, quick actions).
- Spawn: prepareMaestroSystemPrompt appends the Pianola prompt for the
  isPianola agent; process.ts injects MAESTRO_CLI_JS + MAESTRO_AGENT_ID env
  so Pianola's Bash can reach the CLI with no PATH assumptions. Reuses the
  existing maestro-cli path resolver (now exported).
Phase 1 of learning-from-history, plus the earlier fixes for the broken
MAESTRO_CLI_JS path and the list-agents command name.

- shared/pianola/transcript-mining.ts (pure, 23 unit tests): per-format line
  parsers (Claude Code + Codex), decision-pair extraction reusing
  enrichWithAwaitingInput + classifyMessages, reply-polarity heuristic, aggregation.
- cli pianola learn: crawls ~/.claude/projects + ~/.codex/sessions into a labeled
  decision corpus (JSON or --out file). Verified: 154 pairs from real history.
- logger: diagnostic notices to stderr so they no longer corrupt CLI --json output.
- cue-cli-executor: add correct ../../cli candidate so MAESTRO_CLI_JS resolves in
  the preserved dev layout (also fixes Cue's latent dev path).
- pianola-system prompt: use 'list agents' (real subcommand), add self-correction
  guidance to run --help and not substitute a stale CLI.
Three fixes to the transcript miner so Phase 2 gets a cleaner, fuller corpus:

1. Scoping flags on pianola learn: --since <date> (filter by file mtime),
   --project <substr> and --exclude <substr> (filter by session cwd). Lets the
   user target representative history and drop noise.
2. Drop topic-based aggregation (the classifier's topic is a per-ask snippet, not
   a stable category - it produced misleading topTopics). Replaced with a
   risk x polarity cross-tab, which is genuinely meaningful.
3. Widen mining recall: trigger extraction on the classifier itself (structured
   signal OR heuristic asks) instead of double-gating on the strict structured
   detector. Captures prose-style asks the detector missed. Live watcher is
   untouched. Result on real data: 154 -> 287 pairs, 20 -> 29 sessions.

24 unit tests (added a recall test for heuristic prose asks).
Add learned decision profiles so Pianola can recall how the user
decides in a given project and judge low/medium-risk agent asks the
way they would, escalating high-risk regardless.

- storage contract: PianolaProfileEntry/PianolaProfiles types,
  validatePianolaProfileEntry/validatePianolaProfiles, resolveProfile
  (project profile, else global fallback, else none), with a
  PIANOLA_PROFILE_MAX_CHARS guard on profile size
- CLI store: read/write/get/set profile helpers with atomic writes
- CLI commands: pianola profile (read) and pianola set-profile
  (write from --file or piped stdin, optional --pair-count)
- pianola-system prompt: learning/onboarding flow describing how to
  crawl history, synthesize a profile, get user sign-off, and recall
  it at decision time
- tests: profile entry/collection validators and resolveProfile
When a watched agent hits an ask that no rule covers and that is not
high risk, the watcher now hands the decision to Pianola (an LLM agent)
to judge against the user's learned per-project profile instead of
escalating straight to the user. High risk always escalates; dry runs
never hand off; and the watcher stays purely rule-driven when no
Pianola agent is available to hand off to.

- watcher: optional resolveProfile + requestJudgment deps; a handoff
  branch that audits intent before the side effect (mirroring the
  auto-answer invariant), records the escalation, and marks the prompt
  handled so it is not re-handed-off
- CLI: wires the handoff deps only when MAESTRO_AGENT_ID is set, builds
  a structured handoff prompt (waiting agent, ask, profile, how to
  answer or escalate), and guards against handing back to Pianola itself
- prompt: tells Pianola what a decision handoff is and how to act on it,
  and to offer a rule when it keeps making the same call
- tests: handoff fires/skips correctly across risk, profile presence,
  rule coverage, dry-run, and missing deps; audit-before-side-effect
The Pianola section drew a bold uppercase category header plus an
accent-bordered box around the single manager agent (using the
ungrouped variant, which sits flush with no horizontal margin), so one
pinned agent read like a whole category and looked misaligned next to
Starred/Ungrouped.

Render it instead as a single flat row (normal row styling) with a pin
marker and a divider below, so it reads as "the manager, pinned" rather
than a section. The pin marker is shown for any isPianola session.
Give Pianola two pinned, non-closable views in its workspace - a
Dashboard and its Chat - in place of the normal file/terminal/browser
tab bar (Pianola is a manager surface, not a coding workspace). The
Dashboard is the default view and badges a live count of agents waiting
on the user.

The dashboard combines two live signals - desktop session states
(busy / waiting_input / idle) and Pianola's decision audit log
(escalations, handoffs, auto-answers) - into four sections: needs your
input, working now, recently done, and recent decisions. Rows jump to
the owning agent on click. The decision data flows through the existing
pianola.getDecisions IPC; the bucket derivation is a pure, tested
function.

- uiStore: pianolaView ('chat' | 'dashboard') + setter, default dashboard
- PianolaDashboard: view, data hook (live + polled), pure deriveDashboard
- PianolaWorkspaceTabs: the pinned Dashboard|Chat strip
- MainPanel: render the strip + swap content for Pianola
- tests: deriveDashboard bucket mapping
Captures the multi-agent audit: Pianola is a strong supervisor but not
an orchestrator (4/10), Maestro is not plugin-ready (3/10), the Sprint 0
safety bundle, Track A orchestrator spine, and the tiered Track B plugin
system roadmap, plus the resolved scope (full orchestrator, desktop
daemon, full plugin SDK).
Closes four holes that undermined the human-in-the-loop premise of the
autonomous watcher:

- Durable watch-state rehydration: rehydrateWatchState(records, tabId)
  folds the audit log to seed lastHandledMessageId before the poll loop,
  so a restarted watcher no longer re-answers a prompt it already
  handled (the prior code recreated state fresh every start).
- Watcher self-stop on Encore revoke: re-read encoreFeatures.pianola
  each poll and break, so toggling Pianola off in Settings actually
  halts in-flight autonomous answering instead of running until killed.
- Proactive escalation notifications: optional deps.notify on the pure
  watcher; the CLI fires a notify_toast (jump-session click, sourceAgent
  Pianola, sticky+red for high-risk) on escalate, handoff failure, and
  handoff timeout, so blocking asks reach the user, not just a badge.
- Handoff-failure fallback + timeout: a failed handoff no longer drops
  the ask; it falls back to an audited user escalation + notify. A
  successful handoff is tracked as pendingHandoff and, if Pianola never
  answers within HANDOFF_TIMEOUT_POLLS, re-escalates to the user.

Watcher suite: 27 tests pass. CLI + main typecheck clean; CLI rebuilt.
The pure foundation the multi-agent coordinator consumes. Nothing in
Pianola previously represented a task, a dependency, or an ordering, so
a task-dump fired everything at once against incomplete upstream state.

- pianola-tasks.ts (pure, immutable): PianolaTask/PianolaPlan,
  findPlanCycle (DFS, reports the cycle), validatePlan (shape + unknown/
  self/duplicate deps + cycle detection -> errors, null plan on fatal),
  computeReadyTasks (pending with all deps done), markTaskStatus,
  propagateBlocked (fixed-point cascade from failed/skipped upstream),
  planProgress.
- storage.ts: PIANOLA_PLANS_FILENAME + PianolaPlansFile +
  validatePianolaPlansFile (drops malformed plans).
- CLI + desktop stores: read/write/get/upsert plans (atomic temp+rename).

37 tests pass; cli + main typecheck clean.
The pure trigger the orchestrator uses to advance the DAG: detects when
a dispatched task finished or failed so dependents can start and botched
tasks are noticed. Previously the classifier only emitted question|
blocked|none, so there was no done/failed signal at all.

- pianola-completion-detector.ts (pure): detectTaskOutcome({previousState,
  currentState, recentMessages}) -> done|failed|working with a reason.
  done = working-state -> idle transition with no failure marker; failed
  = error state or a failure marker in the latest message; waiting_input
  stays working (the watcher owns the ask). Failure lexicon ported from
  parsers/error-patterns.ts; success heuristic mirrors cue-completion.
- hasFailureMarker + FAILURE_MARKER_PATTERNS exported for reuse.

24 tests pass; cli + main typecheck clean.
The coordinator that drives a task DAG to completion with concurrency
control. Sibling of the watcher: pure, dependency-injected, looped by a
CLI/daemon shell.

- pianola-orchestrator.ts: runOrchestratorIteration(state, deps,
  {concurrencyLimit}) polls running tasks via detectTaskOutcome, marks
  done/failed, propagateBlocked, then dispatches up to the concurrency
  limit from computeReadyTasks (ensure/reuse agent -> dispatch -> mark
  running). Failures leave tasks pending to retry without consuming a
  slot. Persists the plan each iteration; done flips when all tasks are
  terminal or blocked.
- Seeds prevStates='connecting' on dispatch so a fast task's first
  idle poll resolves as done instead of hanging.

15 tests pass; cli + main typecheck clean.
The I/O shell that runs the pure orchestration engine end to end.

- pianola plan set/list/show: author a task DAG as JSON (validated:
  cycles, unknown/self deps, and bad shape rejected), inspect plans and
  per-task status/progress.
- pianola orchestrate <planId>: loops runOrchestratorIteration against
  the live desktop. Deps wired to real WS contracts - getRunState from
  list_desktop_sessions (2s memo), getRecentMessages from
  get_session_history, ensureAgent via create_session (reuse agentId or
  create by agentType), dispatch via runDispatch, persist via
  upsertPianolaPlan, task_failed toasts. --interval/--concurrency/--once.
  Mirrors the watcher loop: SIGINT, per-iteration Encore self-stop,
  finally-disconnect.
- pianola-system.md: "Orchestrating a task DAG" section teaching Pianola
  to author a plan for interdependent tasks vs the dump-and-babysit flow
  for independent ones.

cli + main typecheck clean; CLI builds; plan/orchestrate help verified.
Adds PianolaSupervisor, a main-process registry that owns Pianola's
long-running watch and orchestrate processes as managed children:

- Reconciles desired targets from the shared supervisor store and spawns
  supervised child CLI processes (pianola watch/orchestrate) via
  child_process, replacing orphaned nohup backgrounding.
- Bounded-backoff restart on crash, health tracking, and fs.watch-driven
  reconcile so adds/removes/enable/disable take effect live.
- Encore-gated on pianola; relaunches active targets on app start and
  tears them down on shutdown (lifecycle wired in main/index.ts).
- Shared supervisor store (CLI + main read/write the same files):
  PIANOLA_SUPERVISOR_FILENAME + PianolaSupervisedTarget +
  validatePianolaSupervisorFile in shared storage; upsert/remove/list in
  both pianola-store (CLI) and pianola-store-main (desktop).
- IPC: supervisor list/add/set-enabled/remove handlers + preload
  namespace + global.d.ts typings.
- CLI: pianola supervise watch|orchestrate|list|remove|enable|disable.
- Prompt: prefer `pianola supervise` over nohup for watch/orchestrate.
- 9 supervisor-storage tests.
Stands up the permanent, semver-managed contract every later plugin tier
builds on. Inert by default: gated behind a new `plugins` Encore flag,
list-only, and no plugin code executes yet.

Pure, bundle-safe contract (src/shared/plugins):
- host-api.ts: HOST_API_VERSION + isHostApiCompatible (same-major, host
  >= declared minHostApi; strict on malformed).
- plugin-manifest.ts: the frozen plugin.json shape + hand-rolled
  validator (matches the pianola storage convention; no new dep), with a
  strict plugin-id pattern. Host incompatibility is surfaced as a status,
  not a validation error, so incompatible plugins stay visible.
- plugin-registry.ts: pure, immutable registry (buildRecord, upsert,
  remove, setEnabled refusing non-ok plugins, listActive, toEnableState).
- storage.ts: versioned plugin state file + a generic forward migration
  runner (electron-store has no schema versioning) seeded with a v0
  bare-boolean -> v1 migration.

Main process (src/main/plugins):
- plugin-store-main.ts: atomic temp+rename writes, ENOENT-as-empty,
  validation at the persistence boundary, safe-folder-name guard.
- plugin-manager.ts: discover plugin dirs, validate manifests, derive the
  registry, persist enable toggles, install (copy dir by manifest id)
  and uninstall (dir removal guarded to the plugins dir). Self-gates on
  the Encore flag.

Wiring:
- IPC plugins:list/set-enabled/install/uninstall (handler-level Encore
  gate that throws PluginsDisabled, mirroring pianola).
- window.maestro.plugins preload API + global.d.ts typings.
- main/index.ts constructs the manager, refreshes on boot, broadcasts
  plugins:changed; registered in both production and the aggregate
  registrar.
- `plugins` Encore flag added to the type, both default literals, the
  settings metadata, an EncoreTab toggle card, and searchableSettings.

34 new tests (host-api, manifest, registry, storage/migrations).
…Phase 1)

Makes plugins real (data-only): a plugin can declare themes, prompts,
settings, and command macros, and the user can install/manage them.

- contributions.ts (pure, bundle-safe): typed Tier 0 contribution shapes
  + per-type validators that drop bad items with a reason rather than
  failing the whole plugin. Ids are namespaced `<pluginId>/<localId>` so
  two plugins never collide and every contribution traces to its source.
  collectContributions (one plugin) and aggregateContributions (across
  active plugins, with per-plugin error buckets and duplicate-id guard).
- PluginManager.getContributions() aggregates contributions from active,
  loadable plugins; empty when the Encore flag is off.
- IPC plugins:contributions (the read seam host registries will consume
  plugin data from) + window.maestro.plugins.contributions() + typings.
- PluginsPanel: Settings -> Encore -> Plugins management UI. Lists
  installed plugins with status badges (enabled/disabled/invalid/
  incompatible), enable/disable toggle, install-from-folder (folder
  picker), and uninstall, with toast feedback. Rendered inside the
  Plugins Encore card when the flag is on.

7 new contribution tests.
… (Track B Phase 3)

Lets a plugin run code, confined and capability-scoped. Security model:
signature trust + explicit install-time consent gate WHICH code runs; a
default-deny permission broker gates WHAT it can do; a utilityProcess +
vm context limit blast radius. Still tier-gated and Encore-gated; the
highest-risk host methods (agents.dispatch, process.spawn) are left
unwired pending a dedicated review.

Pure, bundle-safe contracts (src/shared/plugins):
- permissions.ts: fixed capability vocabulary, risk tiers, manifest
  permission parsing (unknown caps rejected, never degraded to allow-all),
  and isPermitted - default-deny matcher with path-prefix (separator
  boundary) and host-suffix (dot boundary) scope checks.
- rpc-protocol.ts: the frozen host<->sandbox message shapes, the
  method->capability map, and target extraction (never throws).
- signing.ts: signature.json shape, deterministic signing payload
  (sorted path:hash, excludes the signature file), and trust helpers.
- manifest: tier-gated `entry` (relative, traversal-checked) + validated
  `permissions`; tier 0 forbidden from declaring either.

Main process (src/main/plugins):
- plugin-signature.ts: ed25519 verify over the canonical payload, with an
  EXACT file-set check so adding/removing/altering any file invalidates
  it; resolves unsigned/invalid/untrusted/trusted.
- permission-broker.ts: the single authorization gate; re-reads grants per
  call (live revocation), audits every decision.
- plugin-sandbox-entry.ts: child bootstrap that runs plugin code in a vm
  context with a frozen minimal global (no require/process/Buffer); all
  effects go through broker-gated RPC. Documents that vm is defense-in-
  depth, not the primary boundary.
- plugin-sandbox-host.ts: forks one utilityProcess per running plugin,
  validates+size-caps every message, broker-authorizes, executes via
  injected handlers, graceful-then-hard teardown, crash isolation.
- plugin-host-handlers.ts: fs/net(size-capped)/settings(secret-redacted)/
  agents.list/notifications.toast implementations; dispatch/spawn injected.
- grants store (atomic, validated) + manager integration: verify on
  discovery, default tier-1 to DISABLED (enable = consent), reconcile
  sandboxes on enable/disable/uninstall, forget grants on uninstall.

UI + IPC:
- PluginConsentDialog: lists requested capabilities (risk-colored, scoped,
  with reason) + signature trust; user approves a subset. Tampered
  (invalid-signature) plugins cannot be enabled.
- plugins:get-grants/set-grants(subset-of-requested only)/revoke-grants
  IPC + preload + typings; signature badge in the panel.
- main/index.ts wires broker + sandbox host + handlers + trusted keys into
  the manager and tears sandboxes down on quit.

29 new tests (permissions, rpc, signing incl. real ed25519 tamper/forge/
file-add/remove detection, broker isolation + live revocation). 82 plugin
tests total.
Addresses the CRITICAL/HIGH findings from the Phase 3 security review:

- CRITICAL: broker path-scope bypass via `..`. normalizePath now collapses
  `.`/`..` (leading `..` on absolute paths dropped) so `/scope/../../etc`
  no longer prefix-matches `/scope`. (permissions.ts)
- HIGH: symlink/realpath escape after authorization. fs.read/fs.write now
  resolve the real path (symlinks + deepest existing ancestor) and
  RE-authorize it against the grant before touching disk. (host-handlers)
- HIGH: SSRF via redirect. net.fetch forces redirect:'error' and
  allowlists init (method/body/headers) so a 3xx to a non-granted host
  (metadata/localhost) cannot be followed. (host-handlers)
- HIGH: vm realm escape surface. The sandbox context no longer receives
  host intrinsics (Object/Array/Promise/URL/...) - V8 supplies isolated
  context-native ones - and timers are wrapped, closing prototype
  pollution of the host and `URL.constructor('return process')` escapes.
  Threat-model comment corrected (an escape is NOT broker-firewalled).
- HIGH: signed plugins could ship unsigned symlinks. Signature
  verification now FAILS (invalid) on any symlink in the tree, and install
  refuses a source containing symlinks.
- MEDIUM/LOW: fs.read size cap; expanded secret-key denylist for
  settings.get; unscoped-net consent warning; Windows reserved folder
  names rejected; per-plugin RPC rate limit + in-flight cap.

Adds traversal + symlink tests (85 plugin tests). Residual caveats logged
in Plans/plugin-build-review.md.
… (Track B Phase 2)

A plugin can declare scheduled triggers (interval or daily clock times)
that fire without touching the per-project Cue engine. Fired by a
supervised, encore-gated scheduler reusing the Pianola-supervisor
lifecycle pattern (managed, self-gating per tick, torn down on quit).

- contributions.ts: new `cueTriggers` contribution type + validator
  (interval everyMinutes >= 1, or dailyTimes as HH:MM; action notify or
  dispatch; dispatch requires agentId). Invalid times are dropped, bad
  triggers rejected with a reason. Added to collect/aggregate.
- plugin-scheduler.ts (pure): computeDueTriggers - intervals seed on
  first observation (no startup thundering herd) then fire on elapse;
  daily times fire at most once per matching clock-minute per day; state
  for removed triggers is dropped so it cannot grow unbounded.
- plugin-scheduler-host.ts (main): poll loop, self-gates on the plugins
  flag, notify -> toast. dispatch is honored only when an implementation
  is injected (agents:dispatch is not wired yet); otherwise skipped with
  a log line, never silently dropped.
- main/index.ts: construct + start the scheduler at boot, stop on quit.

Deeper Cue-engine integration (file/agent-completion event triggers,
prompt-dispatch actions) is deferred and logged in
Plans/plugin-build-review.md - the Cue engine is per-project-yaml and
flagged complex; global plugin triggers do not fit it without surgery.

17 new tests (scheduler timing + cueTrigger parsing).
…ion lock (#1)

#5 Marketplace core (local, OS-agnostic):
- PluginManager.update(sourceDir): version-aware in-place update of an installed
  plugin - validates the source manifest, requires a strictly-higher semver
  (rejects downgrade/equal), rejects symlinks, stops the sandbox, then swaps the
  plugin dir atomically (staging dir inside pluginsDir + rename + rollback on
  failure). Preserves the enable toggle but re-verifies signature/validation
  from the NEW bytes (old trust is never carried forward). Gated plugins:update
  IPC channel + preload/global bridge. A hosted registry stays out of scope.

#1 Sandbox hardening (portable limit + regression lock):
- The OS-agnostic isolation is already at its limit: curated globals (no
  require/process/Buffer/globalThis), codeGeneration disabled, and the child
  utilityProcess forked with env:{} (no inherited secrets). Added a regression
  test locking the empty-child-env fork. The residual vm realm-escape is
  inherent to the in-process bridge; true closure is the per-OS Phase-3 sandbox,
  out of the OS-agnostic scope by design (already documented in the entry file).

Verification: tsc (lint/main/cli) clean; ESLint + Prettier clean; affected
plugin suites green (109 tests, incl. update = 5 and isolation = 1).
#3 Agent-tool contribution + brokered invoke:
- New `tools` contribution type. A tool is invokable with a RESULT via a
  correlated request/response round-trip to the sandbox (invokeTool control
  message + toolResult reply, 30s timeout, bounded pending map, rejects on early
  child exit). maestro.tools.register, plugin-manager.invokeTool, a gated
  plugins:invoke-tool IPC channel + preload/global bridge. Exposing contributed
  tools to a specific agent's model is the documented follow-up.

#4 Plugin event topics wired:
- The catalog topics that never fired now have real metadata-only emit sites:
  session.created / session.removed / agent.statusChanged / agent.awaiting (via
  a pure, tested session-lifecycle differ on persist) and cue.fired (via a
  cue-dispatch DI seam). Only session.updated fired before. Payloads stay
  ids/status/type only - the metadata-only event contract is intact.
- New `keybindings` contribution type (its command is validated as a
  plugin-local id, so it cannot target built-in/other commands). Parsed,
  aggregated, and discoverable via plugins:contributions; chord-binding
  consumption is the documented follow-up.

Bumps HOST_API_VERSION 1.2.0 -> 1.3.0 (new contribution points); re-vendored
@maestro/plugin-sdk (types + version + maestro.tools) with its drift guard;
doc host-API references updated to 1.3.0.

Verification: tsc (lint/main/cli) clean; ESLint + Prettier clean; affected
plugin/ipc/renderer suites green (257 tests) + SDK package (15).
Closes the agent-tool loop: a spawned agent's model can now call registered
plugin `tools`, each call risk-gated before the broker invokes it.

- MCP bridge: `maestro-cli mcp serve` runs a newline-delimited JSON-RPC stdio
  server (initialize / tools/list / tools/call / ping) that the agent spawns;
  it bridges to the running app over the CLI WebSocket. stdout is MCP-only,
  all logs go to stderr. Core is hand-rolled + dependency-free (matches the
  repo's bundle-safe contract convention) and unit-tested with real frames.
- App handlers `plugins_list_tools` / `plugins_call_tool`: flag-gated on
  encoreFeatures.plugins and risk-gated via evaluatePluginDispatch - a HIGH-risk
  model-initiated call is surfaced and NEVER executed.
- Per-agent ephemeral MCP-config adapters (no global config mutation): claude
  (--mcp-config + --strict-mcp-config), codex (-c mcp_servers.*), opencode
  (OPENCODE_CONFIG temp file) verified against the live CLIs; gemini / qwen /
  copilot / droid / hermes / pi carried as best-guess (verified:false). The
  spawn wiring auto-injects ONLY verified strategies.
- Spawn wiring (process.ts): local + verified + tools-present spawns only; SSH
  skipped (bridge needs localhost WS); per-spawn mkdtemp dir + recursive cleanup.

Verification: 26 new tests (protocol handshake, per-agent injection shapes,
bridge mapping/de-collision/long-timeout); tsc x3 clean; ESLint + Prettier
clean; SDK drift unaffected (no vendored-contract change).
…ge confirmed

opencode batch mode relies on OPENCODE_CONFIG_CONTENT (inline JSON setting
permission/question:deny to prevent prompt hangs). A separate OPENCODE_CONFIG
file with only an mcp block has unconfirmed merge-vs-replace precedence against
that inline content; replacing would re-enable prompts and hang the agent. Mark
opencode verified:false (adapter still built + tested, just not auto-injected) so
only claude + codex auto-inject. Covers hermes/pi too.
…ntributions, pianola)

Audit pass (4 reviewer subagents + 2 gpt-5.5 codex reviews) over the full diff
vs origin/rc surfaced real gaps. Fixes:

MCP bridge:
- CRITICAL: skip MCP injection on the claude interactive (maestro-p / electron-as-node)
  spawn path - prepending --mcp-config before the script path broke the launch.
- HIGH: bridge rejects unmapped tool names + the app validates the toolId is a DECLARED
  tool, never an arbitrary command handler in the sandbox's shared handler map.
- Rate the risk gate on the tool name + description + the model's args (not the raw
  toolId slug): catches destructive tools without false-positive-blocking benign ones.
- Forward MAESTRO_USER_DATA / XDG_CONFIG_HOME to the bridge (custom-data-dir installs).
- Drop --strict-mcp-config so the user's own MCP servers stay loaded.
- Return JSON-RPC -32700 on malformed stdin; validate tool inputSchema before advertising.

Security / plugins:
- HIGH: enforce transcriptReadEgressConflict in plugins:set-grants (the consent boundary),
  not only the renderer/runtime - a direct IPC call can no longer persist transcripts:read
  together with net:fetch / process:spawn for an untrusted plugin.
- Per-bucket contribution-id uniqueness: a tool sharing a command's localId is no longer dropped.
- getPanelHtml realpath-resolves + re-checks containment (symlink escape).
- Unify plugin sign / pack / host-verify on one exclusion policy so the scaffold's
  sign-then-pack flow verifies on install; SDK license -> valid SPDX.

Pianola:
- Rate risk over the full agent turn (not just the last message): closes a cross-message
  high-risk auto-answer bypass of the always-escalate invariant.
- Don't adopt a failed (undispatched) auto-answer as the handled cursor on restart.
- Wrap the orchestrate iteration in try/catch so a transient WS error doesn't end the run.
- Per-record byte cap on the decision log.

Tests: +28 (bridge reject/timeout, app-side gate branches, set-grants conflict, contribution
dedup, full-turn risk, rehydrate, orchestrate, byte cap, sign/pack/verify agreement).
tsc x3 clean; ESLint + Prettier clean; SDK drift intact.
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 452a7fa9-1e2c-45d0-8079-94c8d02d1f4f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/autonomous-manager-agent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mintlify

mintlify Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
Maestro 🟢 Ready View Preview Jun 28, 2026, 1:21 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

…d ESM)

`maestro plugin init` wrote an ESM entry.js (export function activate / export
default), but the sandbox loads the file raw via `new vm.Script(code)` and reads
`module.exports` - CommonJS, no module loader. `vm.Script` can't parse `export`,
so it threw SyntaxError and every freshly-scaffolded tier-1 plugin silently
failed to activate ("failed to start plugin").

- entryJsTemplate -> `module.exports = { activate, deactivate }` (runtime-compatible),
  keeping the JSDoc @import types so it still type-checks without a build step.
- package.json scaffold `type: commonjs`; tsconfig `module`/`moduleResolution`
  NodeNext (verified: module.exports type-checks under checkJs with no @types/node
  and no node10 deprecation).
- Regression guard: load the scaffolded entry.js exactly as the sandbox does
  (`new vm.Script` + bare `module` shim) and assert activate/deactivate bind.

12/12 plugin CLI tests pass; tsc (cli + lint) clean; ESLint + Prettier clean.
The dev guide already documented the CommonJS module.exports contract, but had
no top-level authoring walkthrough and showed entry.js examples without the
JSDoc @import typing - plus a section-14 ESM `import` that could be misread as
the runtime entry shape (which would not parse in the vm.Script sandbox).

- Quickstart mirroring `maestro plugin init` exactly: tier-0 vs tier-1 scaffold
  output (package.json type:commonjs, tsconfig NodeNext), the CommonJS entry.js
  shape, an explicit "no ESM in entry.js" warning, then validate/sign/pack/install.
- Type the canonical entry.js examples (section 6 + quickstart) with the JSDoc
  @import tag so they are copy-paste-accurate and type-check.
- Section 14: clarify SDK types come via a JSDoc @import in the plain-JS entry;
  the ESM `import` form is only for authoring in TypeScript compiled to CommonJS.

Prettier-clean. CLAUDE-PLUGINS.md already delegates authoring here (no change).
…ion (Phase A)

Plan: Plans/plugin-full-surface-plan.md — the full-surface plugin model where
enabling a plugin is a full-trust decision and the security boundary is the
unforgeable authorization gate, not runtime sandboxing of authorized code.

Lands the gate's core (more Phase-A wiring follows):
- AuthorizationStore: ONE profile-wide sealed ledger (Electron safeStorage),
  replacing the plain-JSON enable + grants files. Enable and grant are unified —
  enabling IS minting an authorization; there is no separate set-enabled path.
- Anti-rollback: a monotonic epoch + per-install secret held in a NAMED OS
  credential entry (outside the rollable file tree). A restored old sealed ledger
  fails the epoch check -> re-consent. Closes the replay/rollback hole.
- Tombstones: uninstall is authoritative (always bumps epoch + records a
  tombstone, even for a never-enabled plugin) so a restored folder can never
  silently re-enable.
- Identity binding: a grant binds {contentHash, signatureStatus, signerKey}, so a
  post-consent code OR signer/trust change forces re-consent (the content digest
  excludes signature.json, so signer identity is tracked explicitly).
- Fail-safe = session-only: missing seal/anchor (keyring-less Linux) or a locked
  keyring -> grants live in memory for the session, never silently persisted.
  persist() catches seal/anchor/disk failures and degrades safely; no mode
  persists authorization without the anchor.
- computePluginContentHash() reuses the signer's canonical file digest.

Dependency-injected (seal/anchor) so rollback, tombstone, tamper, locked-keyring,
and session-only paths are unit-tested. 11 security tests green; tsc (main+lint)
clean; Prettier clean.

Not yet wired (next Phase-A commits): production safeStorage + keyring factory
(adds @napi-rs/keyring, externalized + asarUnpack), the isolated minter IPC
(sender-frame + nonce + user-activation), the refresh-time verifier, and the
separate non-extensible consent window.
… (Phase A)

ConsentMinter is the only path that creates trust, enforcing the three
contract checks the nonce core alone could not:

- nonce issued ONLY in the main-owned open path (requestConsent), never a
  renderer-callable endpoint, and bound to THIS prompt (a superseded
  prompt's still-live nonce cannot validate against a newer one);
- confirm accepted ONLY from the recorded consent frame via a frame-level
  ConsentSender token (webContents id + frame routing id + url) - a bare
  webContents id a subframe/navigation could share is not enough;
- no renderer-supplied "user activation" flag trusted; trust derives from
  sender-frame identity + one-time nonce alone.

On confirm it consumes the nonce, resolves the plugin's current identity,
re-checks the transcripts:read + egress mutual-exclusion at the mint
boundary, and mints into the sealed AuthorizationStore.

pluginIdentity() maps an installed dir -> AuthIdentity (content digest +
signature status + signer key), guarding both the content-hash and the
signature read so an unhashable/unreadable plugin returns null (never
mintable) instead of crashing the minter/refresh verifier.

Pure given injected deps (no Electron) - 28 tests. The IPC layer wires
openPrompt to a real consent BrowserWindow/WebContentsView and builds the
ConsentSender from event.senderFrame; switching the broker/getGrants to
read the ledger is the remaining app-level step.
…bilities

Phase B added ui:contribute / ui:panel / ui:render-unsafe to the host
capability vocabulary but never bumped the host-API version or re-vendored
the standalone @maestro/plugin-sdk, so the drift guard was red.

- HOST_API_VERSION 1.3.0 -> 1.4.0 (source + vendored): a backward-compatible
  capability addition is a MINOR bump per the host-api contract.
- Re-vendor the SDK capability surface: the three caps in the PluginCapability
  union, PLUGIN_CAPABILITIES, CAPABILITY_RISK, CAPABILITY_SCOPE_KIND, and
  describeCapability - byte-for-byte with src/shared/plugins/permissions.ts.
- Vendor the UI-item contribution surface the SDK was also missing:
  UiSurface, UI_SURFACES, isUiSurface, UiItemContribution, and the uiItems
  field on PluginContributions + AggregatedContributions.
- Extend the drift guard: pin 1.4.0 and add UI_SURFACES parity (the prior
  guard only covered capabilities/tiers/topics/methods, so the contribution
  surface could silently fall behind).
- Bump @maestro/plugin-sdk 0.1.0 -> 0.2.0 in lockstep with the contract.
- Docs: add the ui:* rows to the capability tables in PLUGIN-DEVELOPMENT.md
  and CLAUDE-PLUGINS.md, the gating note, and the 1.4.0 minHostApi refs.

SDK 16 tests green; host-api/permissions/contributions 61 green; tsc clean
across SDK + main.
…ibutions

The runtime drift guard compares vocabularies/catalogs but can't see a new
or changed FIELD on a contribution interface. Add a vitest typecheck pass
(drift.test-d.ts via tsconfig.test.json) asserting the vendored
UiItemContribution / PluginContributions / AggregatedContributions are
type-identical to their host sources, so a future un-vendored field fails
tsc. Confirms the full contribution graph is currently in parity.
…sion (Phase A)

Two layers wired into the manager:

1. getActiveRecords() now excludes records whose signature is 'invalid'.
   This is the single authoritative "active" filter (contributions, agents),
   so tampered code can never contribute on ANY path - refresh, setEnabled,
   or a future toggle - even though listActive()/the enable toggle do not
   themselves check the signature. (isRunnable already excluded it for the
   sandbox; this closes the contribution/agent side.)

2. refresh() consults an optional verifyRecord seam: an enabled, runnable
   code-tier plugin the injected gate rejects (consented identity no longer
   matches the bytes on disk, or it was removed) is force-DISABLED during
   discovery. Only force-disables; absent by default (the enable toggle +
   consent govern), so current behavior is unchanged until production injects
   the seam (AuthorizationStore.verify() + pluginIdentity() against the live
   ledger) - the deferred app-integration step, same pattern as getGrants.

7 tests, incl. a setEnabled-bypass proof for the tamper exclusion; tsc +
eslint clean; 175 plugin tests green.
Wires the dedicated, non-extensible consent window and the isolated minter
end to end so the sealed AuthorizationStore is populated through the only
trusted path. CDP-verified in the built app: requesting consent for an
installed plugin opens the consent window, approving a subset mints ONLY
that subset into the ledger, and the window auto-closes.

- src/main/plugins/consent-window.ts: opens a modal, non-resizable, menu-less
  BrowserWindow that loads its OWN minimal consent.html with its OWN minimal
  consent preload (NOT the main SPA / full preload), hands the offer + one-time
  nonce in via additionalArguments only, denies navigation/new windows, and
  returns the window's ConsentSender (webContents id + main frame routing id +
  url) for confirm-time frame validation.
- src/main/preload/consent.ts + src/main/consent/consent.html: the isolated
  surface; exposes ONLY window.pluginConsent { offer, confirm, cancel }.
- scripts/build-preload.mjs: also builds consent-preload.js + copies consent.html.
- index.ts: constructs the AuthorizationStore (safeStorage seal + default
  noAnchor -> session-only until the keyring/packaging step) and the
  ConsentMinter; registers plugins:request-consent (main-renderer-only, opens
  the window), plugins:confirm-consent (validates event.senderFrame against the
  recorded consent frame + one-time nonce, then mints + audit-logs the granted
  subset), and plugins:cancel-consent. Superseding a prompt closes the prior
  window so a stale nonce can't close the new one.
- main preload + global.d.ts: requestConsent bridge for the trusted renderer.

Trust derives from the host-owned window + sender-frame + one-time nonce, not
a renderer-supplied activation flag (which a sandboxed surface could forge).
Reads (broker/manager/getGrants) still use the on-disk store: the read-flip to
make the ledger the live enforcement source is the documented next step, so
set-grants stays the live path until it is replaced.

tsc x3 + ESLint clean; 175 plugin tests green.
…p, Phase A)

Completes "Verifier in plugin-manager.refresh": the broker, contribution
gating, event bus, transcript-conflict check, scheduler dispatch, and the
get-grants IPC now ALL read the sealed AuthorizationStore instead of the
forgeable on-disk grants store. The manager's verifyRecord seam is wired to
authStore.verify() + pluginIdentity(), so a refresh force-disables an enabled
code-tier plugin whose consented identity no longer matches the bytes on disk
(tamper) or that was removed.

Consent flow is now the only grant path:
- plugins:set-grants is REMOVED. A code-tier plugin can only be granted via the
  host-owned consent window (plugins:request-consent -> ConsentMinter.confirm),
  which mints the approved subset and then enables + reconciles the plugin.
- plugins:set-enabled GATES code-tier activation: enabling a tier>=1 plugin is
  rejected (PluginNotAuthorized) unless it already holds a ledger grant, so the
  renderer can never start plugin code without consent. (4 new handler tests.)
- revoke-grants drops the sealed grant AND disables the plugin; uninstall
  tombstones it in the ledger.
- PluginsPanel routes code-tier enable through requestConsent; the in-renderer
  PluginConsentDialog (superseded by the host window) is removed.

Removed the dead `registerAllHandlers` plugin block (a divergent local
registration; production registers individually in index.ts).

Session-only fallback (noAnchor) until the keyring persistent anchor lands -
re-consent each launch, acceptable for the flag-gated feature; the keyring
anchor is a separate packaging task.

Verified end-to-end via CDP in the built app: ungranted tier-1 enable rejected;
consent mints fs:read only (not the unchecked notifications:toast) + enables;
getGrants reads the ledger; revoke drops the grant + disables. tsc x3 + ESLint
clean; 176 plugin tests green.
…gress (Phase D)

Reduces the documented plugin-panel self-navigation egress residual (a panel
setting window.location to a remote URL to leak data obtained via its granted
caps) with two defense-in-depth layers:

- Bridge origin-gating: the maestro:invokeCommand handler now requires
  event.origin === 'null' (the sandboxed srcDoc frame's opaque origin) in
  addition to the source check. If a panel self-navigates to a real origin the
  WindowProxy source can survive, but its messages then carry a non-null origin
  and are rejected, so a navigated frame can no longer reuse the bridge.
- Main-process subframe backstop: blocksSubframeNavigation (pure, unit-tested)
  wired into the main window's will-frame-navigate blocks any subframe navigating
  away from its initial about:srcdoc/about:blank document (data: included - it
  would drop the CSP while keeping the opaque origin). All main-window iframes
  are srcDoc (plugin panels + file-preview); browser tabs are separate guest
  WebContents, so the guard is scoped without keying off a mutable window.name.

HONEST SCOPE: the load-bearing barriers stay the iframe sandbox (no
allow-top-navigation), the SPA frame CSP, and the origin-gated bridge.
will-frame-navigate did not reliably surface sandboxed-srcdoc self-navigation in
CDP probing, so the new guard is a belt-and-suspenders main-process layer, not
the sole barrier - the comments say so.

6 new unit tests for blocksSubframeNavigation; tsc x3 + ESLint clean; 331
plugin/renderer tests green.
…rity-deferrals

The two remaining plugin-surface items are deliberate Phase-3 security
deferrals, not implementations. Convert them from prose intentions into
ENFORCED regression invariants so the boundary fails loudly if weakened.

E (agents:dispatch / process:spawn — arbitrary-code-execution grade):
- factory test: buildHostCallHandlers omits both verbs with default deps,
  still exposes read-only agents.get, and the deps are the sole gate.
- AST source guard (plugin-host-deps-wiring.test.ts): the live
  buildHostCallHandlers({...}) call in index.ts must pass no dispatch/spawn
  dep (parses the TS AST, not text), so wiring either fails CI.

D (ui:render-unsafe — highest-risk render escape hatch, still inert):
- gate test: holding only ui:render-unsafe yields uiItems=[]/panels=[], so
  the escape-hatch grant can't silently unlock host-rendered surfaces.

No production wiring changed; both verbs and the unsafe render surface stay
inert. tsc x3 + ESLint + Prettier clean; 63 affected tests green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant