cluesmith
diff --git a/‎codev/plans/591-af-workspace-failure-with-code.md‎
Lines changed: 193 additions & 0 deletions b/‎codev/plans/591-af-workspace-failure-with-code.md‎
Lines changed: 193 additions & 0 deletions
diff --git a/‎codev/reviews/591-af-workspace-failure-with-code.md‎
Lines changed: 122 additions & 0 deletions b/‎codev/reviews/591-af-workspace-failure-with-code.md‎
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,193 @@
+# Plan: Agent Harness Abstraction
+
+## Metadata
+- **Specification**: `codev/specs/591-af-workspace-failure-with-code.md`
+
+## Executive Summary
+
+Implement an extensible agent harness system that replaces hardcoded Claude-specific `--append-system-prompt` flags with per-harness role injection. Three phases: (1) create the harness module with built-in providers and config parsing, (2) refactor all call sites to use it, (3) add integration tests.
+
+## Phases (Machine Readable)
+
+```json
+{
+  "phases": [
+    {"id": "harness-module", "title": "Harness Provider Module + Config"},
+    {"id": "call-site-refactor", "title": "Call Site Refactoring"},
+    {"id": "integration-tests", "title": "Integration Tests"}
+  ]
+}
+```
+
+## Phase Breakdown
+
+### Phase 1: Harness Provider Module + Config
+**Dependencies**: None
+
+#### Objectives
+- Create the harness provider module with interface, built-in providers, custom config parsing, and resolution function
+- Add harness config fields to the UserConfig type system
+
+#### Deliverables
+- `packages/codev/src/agent-farm/utils/harness.ts` — new file
+- `packages/codev/src/agent-farm/types.ts` — updated UserConfig
+- `packages/codev/src/lib/config.ts` — updated CodevConfig + custom harness validation at load time
+- `packages/codev/src/agent-farm/__tests__/harness.test.ts` — new test file
+
+#### Implementation Details
+
+**`harness.ts`** — New module containing:
+
+1. `HarnessProvider` interface with `buildRoleInjection()` and `buildScriptRoleInjection()`
+2. Built-in provider objects: `CLAUDE_HARNESS`, `CODEX_HARNESS`, `GEMINI_HARNESS`
+3. `BUILTIN_HARNESSES` registry map: `{ claude, codex, gemini }`
+4. `buildCustomHarnessProvider(config)` — constructs a HarnessProvider from a custom config definition, expanding `${ROLE_FILE}` and `${ROLE_CONTENT}` template variables
+5. `resolveHarness(harnessName: string | undefined, userConfig: UserConfig)` — resolves a harness name to a provider:
+   - `undefined` → returns `CLAUDE_HARNESS` (default)
+   - matches built-in → returns built-in
+   - matches key in `userConfig.harness` → builds custom provider
+   - otherwise → throws descriptive error
+
+**`types.ts`** — Add to `UserConfig`:
+```typescript
+shell?: {
+  architect?: string | string[];
+  architectHarness?: string;    // NEW
+  builder?: string | string[];
+  builderHarness?: string;      // NEW
+  shell?: string | string[];
+};
+harness?: Record<string, {       // NEW
+  roleArgs: string[];
+  roleEnv?: Record<string, string>;
+  roleScriptFragment: string;
+  roleScriptEnv?: Record<string, string>;
+}>;
+```
+
+**`lib/config.ts`** — Add same fields to `CodevConfig`:
+```typescript
+shell?: {
+  architect?: string | string[];
+  architectHarness?: string;    // NEW
+  builder?: string | string[];
+  builderHarness?: string;      // NEW
+  shell?: string | string[];
+};
+harness?: Record<string, { ... }>;  // NEW — same shape as UserConfig
+```
+
+Add a `validateHarnessConfig()` function called during `loadConfig()` that checks:
+- Each custom harness entry has required fields (`roleArgs`, `roleScriptFragment`)
+- `roleArgs` is an array of strings
+- `roleScriptFragment` is a string
+If validation fails, throw a descriptive error at config load time (per spec: "missing required fields produce a descriptive error at config load time").
+
+#### Acceptance Criteria
+- `resolveHarness('claude')` returns claude provider
+- `resolveHarness('codex')` returns codex provider with `-c model_instructions_file=<path>` args
+- `resolveHarness('gemini')` returns gemini provider with `GEMINI_SYSTEM_MD` env var
+- `resolveHarness(undefined)` defaults to claude
+- `resolveHarness('nonexistent')` throws clear error
+- Custom harness from config correctly expands `${ROLE_FILE}` and `${ROLE_CONTENT}`
+- Missing required fields in custom harness throw descriptive error
+
+#### Test Plan
+- **Unit Tests**: Test each built-in provider's `buildRoleInjection()` and `buildScriptRoleInjection()` outputs. Test resolution logic (built-in, custom, default, error). Test template variable expansion. Test custom harness validation (missing required fields).
+
+---
+
+### Phase 2: Call Site Refactoring
+**Dependencies**: Phase 1
+
+#### Objectives
+- Replace all 5 hardcoded `--append-system-prompt` locations with harness provider calls
+- Update `buildArchitectArgs()` return type and its 3 callers
+- Fix side issues (deprecated Codex flag, "Claude exited" message)
+
+#### Deliverables
+- `packages/codev/src/agent-farm/commands/spawn-worktree.ts` — updated
+- `packages/codev/src/agent-farm/commands/architect.ts` — updated
+- `packages/codev/src/agent-farm/servers/tower-utils.ts` — updated
+- `packages/codev/src/agent-farm/servers/tower-terminals.ts` — updated (2 call sites)
+- `packages/codev/src/agent-farm/servers/tower-instances.ts` — updated (1 call site)
+- `packages/codev/src/commands/consult/index.ts` — side fix
+- `packages/codev/src/agent-farm/utils/config.ts` — add harness resolution helpers
+- `packages/codev/src/agent-farm/__tests__/spawn-worktree.test.ts` — updated
+- `packages/codev/src/agent-farm/__tests__/af-architect.test.ts` — updated
+- `packages/codev/src/commands/consult/__tests__/codex-sdk.test.ts` — updated (`experimental_instructions_file` → `model_instructions_file`)
+
+#### Implementation Details
+
+**`spawn-worktree.ts`** — Two functions to update:
+
+1. `startBuilderSession()` (line 597): Import `resolveHarness`. Get builder harness from config. Call `provider.buildScriptRoleInjection(roleContent, roleFile)`. Generate `export` lines from returned `env`. Replace hardcoded `--append-system-prompt "$(cat '${roleFile}')"` with the harness fragment. Change "Claude exited" to "Agent exited" in restart message.
+
+2. `buildWorktreeLaunchScript()` (line 668): Same pattern. Both functions need the harness name passed in (from config) or resolved internally.
+
+**`architect.ts`** (line 29): Write role to `<workspaceRoot>/.architect-role.md` (align with `tower-utils.ts`). Call `resolveHarness(harnessName).buildRoleInjection(role.content, roleFilePath)`. Spread returned args into spawn args. Merge returned env into spawn env.
+
+**`tower-utils.ts`** (line 175): `buildArchitectArgs()` return type changes from `string[]` to `{ args: string[]; env: Record<string, string> }`. Resolve harness, call `buildRoleInjection()`, return combined args and env.
+
+**Caller updates** — Three callers of `buildArchitectArgs()`:
+- `tower-terminals.ts:536` — destructure `{ args, env }`, pass env to session creation
+- `tower-terminals.ts:725` — same pattern for resume
+- `tower-instances.ts:377` — same pattern for workspace launch
+
+**Config helpers** — Add `getArchitectHarness()` and `getBuilderHarness()` to `agent-farm/utils/config.ts` that read `architectHarness`/`builderHarness` from UserConfig and call `resolveHarness()`. These helpers are the single point where harness resolution happens — callers (`startBuilderSession()`, `buildWorktreeLaunchScript()`, `architect()`, `buildArchitectArgs()`) call the helper rather than resolving inline, avoiding duplication.
+
+**Consult side fix** — `consult/index.ts:383`: Change `experimental_instructions_file` to `model_instructions_file` in the Codex SDK config.
+
+#### Acceptance Criteria
+- Claude workflows unchanged (regression-safe)
+- No remaining references to `--append-system-prompt` except inside `CLAUDE_HARNESS` provider
+- `buildArchitectArgs()` returns `{ args, env }` and all callers handle it
+- `architect.ts` writes role to `.architect-role.md`
+- "Agent exited" message in restart loop
+- `consult` uses `model_instructions_file` for Codex
+
+#### Test Plan
+- **Unit Tests**: Update `spawn-worktree.test.ts` to test with claude harness (regression) and verify no hardcoded `--append-system-prompt` outside the provider. Update `af-architect.test.ts` similarly.
+- **Manual Testing**: If possible, verify `afx workspace start` with claude config still works.
+
+---
+
+### Phase 3: Integration Tests
+**Dependencies**: Phase 2
+
+#### Objectives
+- Add comprehensive integration tests verifying all call sites produce correct commands for each harness type
+- Verify custom harness config end-to-end
+
+#### Deliverables
+- `packages/codev/src/agent-farm/__tests__/harness-integration.test.ts` — new test file
+
+#### Implementation Details
+
+Integration tests that verify the final output of each call site for each harness type:
+
+1. **`buildWorktreeLaunchScript()` integration**: For each harness (claude, codex, gemini, custom), verify the generated bash script contains the correct role injection fragment and env exports.
+
+2. **`buildArchitectArgs()` integration**: For each harness, verify the returned args and env are correct. For claude: `--append-system-prompt` with content. For codex: `-c model_instructions_file=<path>`. For gemini: env `GEMINI_SYSTEM_MD`.
+
+3. **Custom harness end-to-end**: Define a custom harness in mock config, resolve it, verify template expansion works correctly with `${ROLE_FILE}` and `${ROLE_CONTENT}`.
+
+4. **Error cases**: Unknown harness name, missing required fields in custom harness config.
+
+5. **No-role behavior**: Verify all call sites skip harness injection when role is `null`.
+
+#### Acceptance Criteria
+- All test scenarios from spec section "Test Scenarios" (1-8) have corresponding test cases
+- Tests pass for all 3 built-in harnesses + custom harness + error cases
+- No-role behavior tested
+
+#### Test Plan
+- **Integration Tests**: Parameterized tests across harness types for each call site function.
+
+## Risk Assessment
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| `tower-terminals.ts` env propagation may not reach PTY sessions | High | Verify how `createPtySession()` handles env vars; may need to update PTY creation |
+| `architect.ts` `shell: true` spawn with multiline role content | Medium | Existing issue — not introduced by this change. Document for future fix. |
+| Codex `model_instructions_file` flag may change again | Low | Verified against current Codex v0.117.0; easy to update |
@@ -0,0 +1,122 @@
+# Review: Agent Harness Abstraction
+
+## Summary
+
+Implemented an extensible agent harness system that replaces hardcoded Claude-specific `--append-system-prompt` flags with per-harness role injection. Built-in providers for Claude, Codex, and Gemini. Custom harness definitions configurable in `.codev/config.json`. Fixes GitHub issue #591 where `afx workspace start` fails with Codex.
+
+## Spec Compliance
+
+- [x] `afx workspace start` does not crash with `architect: "codex"` — role injected via `model_instructions_file`
+- [x] `afx spawn` does not crash with `builder: "codex"`
+- [x] Existing Claude-based workflows unchanged (no regression)
+- [x] All existing tests pass (updated as needed)
+- [x] Explicit `architectHarness`/`builderHarness` config resolves to built-in or custom providers
+- [x] Gemini support via `GEMINI_SYSTEM_MD` env var
+- [x] Custom harness definitions with template variable expansion (`${ROLE_FILE}`, `${ROLE_CONTENT}`)
+- [x] Unknown harness names produce clear error and fail to launch
+- [x] Deprecated `experimental_instructions_file` → `model_instructions_file` in consult
+
+## Deviations from Plan
+
+- **Phase 2**: Added `shellEscapeSingleQuote()` utility (not in original plan) — identified during Codex review as necessary for safe shell quoting in script fragments and env exports. Paths containing single quotes (e.g., `/Users/O'Neil/...`) would otherwise break generated bash scripts.
+- **Phase 2**: Changed architect error message from "install claude" to generic "Check .codev/config.json shell.architect setting" — identified during Codex review.
+- **Phase 3**: Added `workspaceRoot` parameter to `buildWorktreeLaunchScript()` — needed for harness resolution from config. Optional parameter, backward compatible.
+
+## Lessons Learned
+
+### What Went Well
+
+- The forge/consult pattern was a good model for extensibility — the harness provider abstraction follows the same shape
+- Splitting `buildRoleInjection` (Node spawn) from `buildScriptRoleInjection` (bash scripts) cleanly handles the two distinct integration patterns
+- 3-way consultation caught real issues at every phase (env value validation, shell quoting, error message, call-site test coverage)
+
+### Challenges Encountered
+
+- **Gemini consultation reliability**: Gemini CLI frequently failed to produce verdicts, requiring re-runs. Appears to be an issue with the Gemini CLI's agent recursion prevention when reviewing complex codebases.
+- **ESM module mocking in tests**: `vi.doMock` with dynamic imports didn't work as expected for changing mock behavior per-test. Solved by using shared mock functions with `mockReturnValue()` per-test instead.
+
+### What Would Be Done Differently
+
+- Would have included `shellEscapeSingleQuote()` in the original plan — shell quoting for generated scripts is a predictable need
+- Would have tested call-site functions directly from the start in Phase 3 rather than testing providers in isolation (which overlapped with Phase 1 unit tests)
+
+## Technical Debt
+
+- `architect.ts` uses `spawn({ shell: true })` which means multiline role content passed as args may have escaping issues. This is a pre-existing issue, not introduced by this change. The harness abstraction doesn't make it worse (Claude passes content directly, Codex/Gemini use file paths).
+
+## Consultation Feedback
+
+### Specify Phase (Round 1)
+
+#### Claude
+- **Concern**: Codex `-c experimental_instructions_file` may only work via SDK, not CLI
+  - **Addressed**: Verified via CLI testing — works, but deprecated. Updated to `model_instructions_file`.
+- **Concern**: `--dangerously-skip-permissions` and equivalent flags not addressed
+  - **N/A**: These are user-configured, not hardcoded in source.
+
+#### Codex
+- **Concern**: Generic fallback for interactive modes underspecified
+  - **Addressed**: Changed to explicit config, no auto-detection, no fallback.
+- **Concern**: Shell detection by substring too ambiguous
+  - **Addressed**: Removed auto-detection per architect review.
+
+#### Gemini
+- **Concern**: `buildRoleArgs` array escaping breaks Claude via Node spawn
+  - **Addressed**: Split into two APIs — `buildRoleInjection` (content) and `buildScriptRoleInjection` (file path).
+- **Concern**: `architect.ts` doesn't write role to disk
+  - **Addressed**: Added file write in architect.ts.
+
+### Plan Phase (Round 1)
+
+#### Claude: APPROVE — No concerns
+#### Gemini: APPROVE — No concerns
+#### Codex
+- **Concern**: Missing `lib/config.ts` updates for custom harness validation
+  - **Addressed**: Added `CodevConfig` updates and `validateHarnessConfig()` at load time.
+- **Concern**: Missing `codex-sdk.test.ts` in deliverables
+  - **Addressed**: Added to Phase 2 deliverables.
+
+### Phase 1: harness-module (Implementation Round 1)
+
+#### Claude: APPROVE
+#### Gemini: APPROVE
+#### Codex
+- **Concern**: `roleEnv`/`roleScriptEnv` entries not validated as strings
+  - **Addressed**: Added per-entry string validation with descriptive error messages.
+
+### Phase 2: call-site-refactor (Implementation Round 1)
+
+#### Claude: APPROVE
+#### Gemini: APPROVE
+#### Codex
+- **Concern**: Unsafe shell quoting in script env exports
+  - **Addressed**: Added `shellEscapeSingleQuote()` to all script fragments and env exports.
+- **Concern**: Claude-specific "install claude" error message
+  - **Addressed**: Changed to generic message.
+
+### Phase 3: integration-tests (Implementation Round 1)
+
+#### Gemini: APPROVE
+#### Claude: COMMENT
+- **Concern**: Scenario 7 is a no-op, Scenario 8 tests providers not call sites
+  - **Addressed**: Rewrote with real call-site integration tests using dynamic imports and per-test mock configuration.
+#### Codex
+- **Concern**: Tests don't exercise real integration points
+  - **Addressed**: Same fix as Claude.
+
+## Architecture Updates
+
+New module added: `packages/codev/src/agent-farm/utils/harness.ts` — agent harness provider abstraction. Provides `HarnessProvider` interface, built-in providers (Claude, Codex, Gemini), custom harness config parsing with template variable expansion, and resolution logic. Integrated at all 5 call sites that previously hardcoded `--append-system-prompt`. Config extended with `shell.architectHarness`, `shell.builderHarness`, and `harness` section in both `UserConfig` and `CodevConfig`.
+
+## Lessons Learned Updates
+
+No generalizable lessons beyond what's documented in the existing lessons-learned.md entries about extensibility patterns and consultation value.
+
+## Flaky Tests
+
+No flaky tests encountered.
+
+## Follow-up Items
+
+- Consider a standalone `harness` CLI command for testing harness configs without spawning a full workspace
+- Consider addressing the pre-existing `shell: true` spawn issue in `architect.ts` where multiline content in args may have escaping problems