Skip to content

Commit bbfa70f

Browse files
authored
Merge pull request #654 from cluesmith/builder/spir-591-af-workspace-failure-with-code
[Spec 591] Agent Harness Abstraction
2 parents c86fa1a + 225c5af commit bbfa70f

20 files changed

Lines changed: 1704 additions & 65 deletions
Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
# Plan: Agent Harness Abstraction
2+
3+
## Metadata
4+
- **Specification**: `codev/specs/591-af-workspace-failure-with-code.md`
5+
6+
## Executive Summary
7+
8+
Implement an extensible agent harness system that replaces hardcoded Claude-specific `--append-system-prompt` flags with per-harness role injection. Three phases: (1) create the harness module with built-in providers and config parsing, (2) refactor all call sites to use it, (3) add integration tests.
9+
10+
## Phases (Machine Readable)
11+
12+
```json
13+
{
14+
"phases": [
15+
{"id": "harness-module", "title": "Harness Provider Module + Config"},
16+
{"id": "call-site-refactor", "title": "Call Site Refactoring"},
17+
{"id": "integration-tests", "title": "Integration Tests"}
18+
]
19+
}
20+
```
21+
22+
## Phase Breakdown
23+
24+
### Phase 1: Harness Provider Module + Config
25+
**Dependencies**: None
26+
27+
#### Objectives
28+
- Create the harness provider module with interface, built-in providers, custom config parsing, and resolution function
29+
- Add harness config fields to the UserConfig type system
30+
31+
#### Deliverables
32+
- `packages/codev/src/agent-farm/utils/harness.ts` — new file
33+
- `packages/codev/src/agent-farm/types.ts` — updated UserConfig
34+
- `packages/codev/src/lib/config.ts` — updated CodevConfig + custom harness validation at load time
35+
- `packages/codev/src/agent-farm/__tests__/harness.test.ts` — new test file
36+
37+
#### Implementation Details
38+
39+
**`harness.ts`** — New module containing:
40+
41+
1. `HarnessProvider` interface with `buildRoleInjection()` and `buildScriptRoleInjection()`
42+
2. Built-in provider objects: `CLAUDE_HARNESS`, `CODEX_HARNESS`, `GEMINI_HARNESS`
43+
3. `BUILTIN_HARNESSES` registry map: `{ claude, codex, gemini }`
44+
4. `buildCustomHarnessProvider(config)` — constructs a HarnessProvider from a custom config definition, expanding `${ROLE_FILE}` and `${ROLE_CONTENT}` template variables
45+
5. `resolveHarness(harnessName: string | undefined, userConfig: UserConfig)` — resolves a harness name to a provider:
46+
- `undefined` → returns `CLAUDE_HARNESS` (default)
47+
- matches built-in → returns built-in
48+
- matches key in `userConfig.harness` → builds custom provider
49+
- otherwise → throws descriptive error
50+
51+
**`types.ts`** — Add to `UserConfig`:
52+
```typescript
53+
shell?: {
54+
architect?: string | string[];
55+
architectHarness?: string; // NEW
56+
builder?: string | string[];
57+
builderHarness?: string; // NEW
58+
shell?: string | string[];
59+
};
60+
harness?: Record<string, { // NEW
61+
roleArgs: string[];
62+
roleEnv?: Record<string, string>;
63+
roleScriptFragment: string;
64+
roleScriptEnv?: Record<string, string>;
65+
}>;
66+
```
67+
68+
**`lib/config.ts`** — Add same fields to `CodevConfig`:
69+
```typescript
70+
shell?: {
71+
architect?: string | string[];
72+
architectHarness?: string; // NEW
73+
builder?: string | string[];
74+
builderHarness?: string; // NEW
75+
shell?: string | string[];
76+
};
77+
harness?: Record<string, { ... }>; // NEW — same shape as UserConfig
78+
```
79+
80+
Add a `validateHarnessConfig()` function called during `loadConfig()` that checks:
81+
- Each custom harness entry has required fields (`roleArgs`, `roleScriptFragment`)
82+
- `roleArgs` is an array of strings
83+
- `roleScriptFragment` is a string
84+
If validation fails, throw a descriptive error at config load time (per spec: "missing required fields produce a descriptive error at config load time").
85+
86+
#### Acceptance Criteria
87+
- `resolveHarness('claude')` returns claude provider
88+
- `resolveHarness('codex')` returns codex provider with `-c model_instructions_file=<path>` args
89+
- `resolveHarness('gemini')` returns gemini provider with `GEMINI_SYSTEM_MD` env var
90+
- `resolveHarness(undefined)` defaults to claude
91+
- `resolveHarness('nonexistent')` throws clear error
92+
- Custom harness from config correctly expands `${ROLE_FILE}` and `${ROLE_CONTENT}`
93+
- Missing required fields in custom harness throw descriptive error
94+
95+
#### Test Plan
96+
- **Unit Tests**: Test each built-in provider's `buildRoleInjection()` and `buildScriptRoleInjection()` outputs. Test resolution logic (built-in, custom, default, error). Test template variable expansion. Test custom harness validation (missing required fields).
97+
98+
---
99+
100+
### Phase 2: Call Site Refactoring
101+
**Dependencies**: Phase 1
102+
103+
#### Objectives
104+
- Replace all 5 hardcoded `--append-system-prompt` locations with harness provider calls
105+
- Update `buildArchitectArgs()` return type and its 3 callers
106+
- Fix side issues (deprecated Codex flag, "Claude exited" message)
107+
108+
#### Deliverables
109+
- `packages/codev/src/agent-farm/commands/spawn-worktree.ts` — updated
110+
- `packages/codev/src/agent-farm/commands/architect.ts` — updated
111+
- `packages/codev/src/agent-farm/servers/tower-utils.ts` — updated
112+
- `packages/codev/src/agent-farm/servers/tower-terminals.ts` — updated (2 call sites)
113+
- `packages/codev/src/agent-farm/servers/tower-instances.ts` — updated (1 call site)
114+
- `packages/codev/src/commands/consult/index.ts` — side fix
115+
- `packages/codev/src/agent-farm/utils/config.ts` — add harness resolution helpers
116+
- `packages/codev/src/agent-farm/__tests__/spawn-worktree.test.ts` — updated
117+
- `packages/codev/src/agent-farm/__tests__/af-architect.test.ts` — updated
118+
- `packages/codev/src/commands/consult/__tests__/codex-sdk.test.ts` — updated (`experimental_instructions_file``model_instructions_file`)
119+
120+
#### Implementation Details
121+
122+
**`spawn-worktree.ts`** — Two functions to update:
123+
124+
1. `startBuilderSession()` (line 597): Import `resolveHarness`. Get builder harness from config. Call `provider.buildScriptRoleInjection(roleContent, roleFile)`. Generate `export` lines from returned `env`. Replace hardcoded `--append-system-prompt "$(cat '${roleFile}')"` with the harness fragment. Change "Claude exited" to "Agent exited" in restart message.
125+
126+
2. `buildWorktreeLaunchScript()` (line 668): Same pattern. Both functions need the harness name passed in (from config) or resolved internally.
127+
128+
**`architect.ts`** (line 29): Write role to `<workspaceRoot>/.architect-role.md` (align with `tower-utils.ts`). Call `resolveHarness(harnessName).buildRoleInjection(role.content, roleFilePath)`. Spread returned args into spawn args. Merge returned env into spawn env.
129+
130+
**`tower-utils.ts`** (line 175): `buildArchitectArgs()` return type changes from `string[]` to `{ args: string[]; env: Record<string, string> }`. Resolve harness, call `buildRoleInjection()`, return combined args and env.
131+
132+
**Caller updates** — Three callers of `buildArchitectArgs()`:
133+
- `tower-terminals.ts:536` — destructure `{ args, env }`, pass env to session creation
134+
- `tower-terminals.ts:725` — same pattern for resume
135+
- `tower-instances.ts:377` — same pattern for workspace launch
136+
137+
**Config helpers** — Add `getArchitectHarness()` and `getBuilderHarness()` to `agent-farm/utils/config.ts` that read `architectHarness`/`builderHarness` from UserConfig and call `resolveHarness()`. These helpers are the single point where harness resolution happens — callers (`startBuilderSession()`, `buildWorktreeLaunchScript()`, `architect()`, `buildArchitectArgs()`) call the helper rather than resolving inline, avoiding duplication.
138+
139+
**Consult side fix**`consult/index.ts:383`: Change `experimental_instructions_file` to `model_instructions_file` in the Codex SDK config.
140+
141+
#### Acceptance Criteria
142+
- Claude workflows unchanged (regression-safe)
143+
- No remaining references to `--append-system-prompt` except inside `CLAUDE_HARNESS` provider
144+
- `buildArchitectArgs()` returns `{ args, env }` and all callers handle it
145+
- `architect.ts` writes role to `.architect-role.md`
146+
- "Agent exited" message in restart loop
147+
- `consult` uses `model_instructions_file` for Codex
148+
149+
#### Test Plan
150+
- **Unit Tests**: Update `spawn-worktree.test.ts` to test with claude harness (regression) and verify no hardcoded `--append-system-prompt` outside the provider. Update `af-architect.test.ts` similarly.
151+
- **Manual Testing**: If possible, verify `afx workspace start` with claude config still works.
152+
153+
---
154+
155+
### Phase 3: Integration Tests
156+
**Dependencies**: Phase 2
157+
158+
#### Objectives
159+
- Add comprehensive integration tests verifying all call sites produce correct commands for each harness type
160+
- Verify custom harness config end-to-end
161+
162+
#### Deliverables
163+
- `packages/codev/src/agent-farm/__tests__/harness-integration.test.ts` — new test file
164+
165+
#### Implementation Details
166+
167+
Integration tests that verify the final output of each call site for each harness type:
168+
169+
1. **`buildWorktreeLaunchScript()` integration**: For each harness (claude, codex, gemini, custom), verify the generated bash script contains the correct role injection fragment and env exports.
170+
171+
2. **`buildArchitectArgs()` integration**: For each harness, verify the returned args and env are correct. For claude: `--append-system-prompt` with content. For codex: `-c model_instructions_file=<path>`. For gemini: env `GEMINI_SYSTEM_MD`.
172+
173+
3. **Custom harness end-to-end**: Define a custom harness in mock config, resolve it, verify template expansion works correctly with `${ROLE_FILE}` and `${ROLE_CONTENT}`.
174+
175+
4. **Error cases**: Unknown harness name, missing required fields in custom harness config.
176+
177+
5. **No-role behavior**: Verify all call sites skip harness injection when role is `null`.
178+
179+
#### Acceptance Criteria
180+
- All test scenarios from spec section "Test Scenarios" (1-8) have corresponding test cases
181+
- Tests pass for all 3 built-in harnesses + custom harness + error cases
182+
- No-role behavior tested
183+
184+
#### Test Plan
185+
- **Integration Tests**: Parameterized tests across harness types for each call site function.
186+
187+
## Risk Assessment
188+
189+
| Risk | Impact | Mitigation |
190+
|------|--------|------------|
191+
| `tower-terminals.ts` env propagation may not reach PTY sessions | High | Verify how `createPtySession()` handles env vars; may need to update PTY creation |
192+
| `architect.ts` `shell: true` spawn with multiline role content | Medium | Existing issue — not introduced by this change. Document for future fix. |
193+
| Codex `model_instructions_file` flag may change again | Low | Verified against current Codex v0.117.0; easy to update |
Lines changed: 122 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,122 @@
1+
# Review: Agent Harness Abstraction
2+
3+
## Summary
4+
5+
Implemented an extensible agent harness system that replaces hardcoded Claude-specific `--append-system-prompt` flags with per-harness role injection. Built-in providers for Claude, Codex, and Gemini. Custom harness definitions configurable in `.codev/config.json`. Fixes GitHub issue #591 where `afx workspace start` fails with Codex.
6+
7+
## Spec Compliance
8+
9+
- [x] `afx workspace start` does not crash with `architect: "codex"` — role injected via `model_instructions_file`
10+
- [x] `afx spawn` does not crash with `builder: "codex"`
11+
- [x] Existing Claude-based workflows unchanged (no regression)
12+
- [x] All existing tests pass (updated as needed)
13+
- [x] Explicit `architectHarness`/`builderHarness` config resolves to built-in or custom providers
14+
- [x] Gemini support via `GEMINI_SYSTEM_MD` env var
15+
- [x] Custom harness definitions with template variable expansion (`${ROLE_FILE}`, `${ROLE_CONTENT}`)
16+
- [x] Unknown harness names produce clear error and fail to launch
17+
- [x] Deprecated `experimental_instructions_file``model_instructions_file` in consult
18+
19+
## Deviations from Plan
20+
21+
- **Phase 2**: Added `shellEscapeSingleQuote()` utility (not in original plan) — identified during Codex review as necessary for safe shell quoting in script fragments and env exports. Paths containing single quotes (e.g., `/Users/O'Neil/...`) would otherwise break generated bash scripts.
22+
- **Phase 2**: Changed architect error message from "install claude" to generic "Check .codev/config.json shell.architect setting" — identified during Codex review.
23+
- **Phase 3**: Added `workspaceRoot` parameter to `buildWorktreeLaunchScript()` — needed for harness resolution from config. Optional parameter, backward compatible.
24+
25+
## Lessons Learned
26+
27+
### What Went Well
28+
29+
- The forge/consult pattern was a good model for extensibility — the harness provider abstraction follows the same shape
30+
- Splitting `buildRoleInjection` (Node spawn) from `buildScriptRoleInjection` (bash scripts) cleanly handles the two distinct integration patterns
31+
- 3-way consultation caught real issues at every phase (env value validation, shell quoting, error message, call-site test coverage)
32+
33+
### Challenges Encountered
34+
35+
- **Gemini consultation reliability**: Gemini CLI frequently failed to produce verdicts, requiring re-runs. Appears to be an issue with the Gemini CLI's agent recursion prevention when reviewing complex codebases.
36+
- **ESM module mocking in tests**: `vi.doMock` with dynamic imports didn't work as expected for changing mock behavior per-test. Solved by using shared mock functions with `mockReturnValue()` per-test instead.
37+
38+
### What Would Be Done Differently
39+
40+
- Would have included `shellEscapeSingleQuote()` in the original plan — shell quoting for generated scripts is a predictable need
41+
- Would have tested call-site functions directly from the start in Phase 3 rather than testing providers in isolation (which overlapped with Phase 1 unit tests)
42+
43+
## Technical Debt
44+
45+
- `architect.ts` uses `spawn({ shell: true })` which means multiline role content passed as args may have escaping issues. This is a pre-existing issue, not introduced by this change. The harness abstraction doesn't make it worse (Claude passes content directly, Codex/Gemini use file paths).
46+
47+
## Consultation Feedback
48+
49+
### Specify Phase (Round 1)
50+
51+
#### Claude
52+
- **Concern**: Codex `-c experimental_instructions_file` may only work via SDK, not CLI
53+
- **Addressed**: Verified via CLI testing — works, but deprecated. Updated to `model_instructions_file`.
54+
- **Concern**: `--dangerously-skip-permissions` and equivalent flags not addressed
55+
- **N/A**: These are user-configured, not hardcoded in source.
56+
57+
#### Codex
58+
- **Concern**: Generic fallback for interactive modes underspecified
59+
- **Addressed**: Changed to explicit config, no auto-detection, no fallback.
60+
- **Concern**: Shell detection by substring too ambiguous
61+
- **Addressed**: Removed auto-detection per architect review.
62+
63+
#### Gemini
64+
- **Concern**: `buildRoleArgs` array escaping breaks Claude via Node spawn
65+
- **Addressed**: Split into two APIs — `buildRoleInjection` (content) and `buildScriptRoleInjection` (file path).
66+
- **Concern**: `architect.ts` doesn't write role to disk
67+
- **Addressed**: Added file write in architect.ts.
68+
69+
### Plan Phase (Round 1)
70+
71+
#### Claude: APPROVE — No concerns
72+
#### Gemini: APPROVE — No concerns
73+
#### Codex
74+
- **Concern**: Missing `lib/config.ts` updates for custom harness validation
75+
- **Addressed**: Added `CodevConfig` updates and `validateHarnessConfig()` at load time.
76+
- **Concern**: Missing `codex-sdk.test.ts` in deliverables
77+
- **Addressed**: Added to Phase 2 deliverables.
78+
79+
### Phase 1: harness-module (Implementation Round 1)
80+
81+
#### Claude: APPROVE
82+
#### Gemini: APPROVE
83+
#### Codex
84+
- **Concern**: `roleEnv`/`roleScriptEnv` entries not validated as strings
85+
- **Addressed**: Added per-entry string validation with descriptive error messages.
86+
87+
### Phase 2: call-site-refactor (Implementation Round 1)
88+
89+
#### Claude: APPROVE
90+
#### Gemini: APPROVE
91+
#### Codex
92+
- **Concern**: Unsafe shell quoting in script env exports
93+
- **Addressed**: Added `shellEscapeSingleQuote()` to all script fragments and env exports.
94+
- **Concern**: Claude-specific "install claude" error message
95+
- **Addressed**: Changed to generic message.
96+
97+
### Phase 3: integration-tests (Implementation Round 1)
98+
99+
#### Gemini: APPROVE
100+
#### Claude: COMMENT
101+
- **Concern**: Scenario 7 is a no-op, Scenario 8 tests providers not call sites
102+
- **Addressed**: Rewrote with real call-site integration tests using dynamic imports and per-test mock configuration.
103+
#### Codex
104+
- **Concern**: Tests don't exercise real integration points
105+
- **Addressed**: Same fix as Claude.
106+
107+
## Architecture Updates
108+
109+
New module added: `packages/codev/src/agent-farm/utils/harness.ts` — agent harness provider abstraction. Provides `HarnessProvider` interface, built-in providers (Claude, Codex, Gemini), custom harness config parsing with template variable expansion, and resolution logic. Integrated at all 5 call sites that previously hardcoded `--append-system-prompt`. Config extended with `shell.architectHarness`, `shell.builderHarness`, and `harness` section in both `UserConfig` and `CodevConfig`.
110+
111+
## Lessons Learned Updates
112+
113+
No generalizable lessons beyond what's documented in the existing lessons-learned.md entries about extensibility patterns and consultation value.
114+
115+
## Flaky Tests
116+
117+
No flaky tests encountered.
118+
119+
## Follow-up Items
120+
121+
- Consider a standalone `harness` CLI command for testing harness configs without spawning a full workspace
122+
- Consider addressing the pre-existing `shell: true` spawn issue in `architect.ts` where multiline content in args may have escaping problems

0 commit comments

Comments
 (0)