feat: phantom_loop - autonomous iteration primitive with evolution integration #48
Open
electronicBlacksmith wants to merge 5 commits into ghostwright:main from
Introduces phantom_loop - an in-process MCP tool that lets the agent spawn iterative tasks where each tick is a fresh SDK session with state persisted in a markdown file. Termination signals: agent self-declares via status: done in the state file frontmatter, iteration or cost budget exhausted, optional success_command returns exit 0, or operator interrupt via Slack button. Runner is fully deterministic: budgets are enforced by TypeScript, the agent only reasons about the task. Crash recovery is just re-scheduling a tick against any loop still marked running; the state file is the source of truth.
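The termination rules in this commit message can be read as a pure decision function evaluated after each tick. The following is a minimal sketch, not the PR's runner code: `LoopSnapshot`, `StopReason`, and `decideTermination` are hypothetical names, and the order in which the signals are checked is an assumption.

```typescript
// Hypothetical sketch: none of these names are taken from the PR's source.
type StopReason =
  | "interrupted"
  | "done"
  | "success_command"
  | "iteration_budget"
  | "cost_budget";

interface LoopSnapshot {
  frontmatterStatus: string | undefined; // `status` read from the state file frontmatter
  iterations: number; // ticks completed so far
  maxIterations: number;
  costUsd: number; // accumulated spend
  maxCostUsd: number;
  successExitCode: number | undefined; // exit code of success_command, if one was run
  interruptRequested: boolean; // operator pressed the stop button
}

// Returns the reason to stop, or undefined if another tick should be scheduled.
// The priority order below is an assumption made for illustration.
function decideTermination(s: LoopSnapshot): StopReason | undefined {
  if (s.interruptRequested) return "interrupted";
  if (s.frontmatterStatus === "done") return "done";
  if (s.successExitCode === 0) return "success_command";
  if (s.iterations >= s.maxIterations) return "iteration_budget";
  if (s.costUsd >= s.maxCostUsd) return "cost_budget";
  return undefined;
}
```

Keeping the decision in a function like this, outside the agent's prompt, is one way to get the deterministic budget enforcement the message describes: the agent can only influence the `frontmatterStatus` input.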
Addresses two issues from code review:

1. parseFrontmatter now strips inline YAML comments (`status: done # yay`) and surrounding quotes (`status: "done"`). Without this, an agent writing natural-looking YAML would leave the loop status unparseable, silently burning through its iteration budget instead of terminating.
2. Loop.conversationId was stored but its only use was gating whether the start notice posted - tick and final updates posted regardless. It was meant to be the Slack thread target for loop status updates. Extend SlackChannel.postToChannel with an optional thread_ts and use it for the start notice, so status updates land in the caller's thread when provided.
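To illustrate the first fix, here is a minimal sketch of value normalization for a frontmatter `status` field. It is not the PR's `parseFrontmatter`; `normalizeFrontmatterValue` and `readStatus` are hypothetical helpers, and the sketch handles only the two cases named above (inline comments and surrounding quotes), not full YAML.

```typescript
// Hypothetical helpers; the PR's parseFrontmatter may be implemented differently.

// Strips surrounding quotes and a trailing inline comment from a raw YAML scalar.
function normalizeFrontmatterValue(raw: string): string {
  let value = raw.trim();
  const first = value.charAt(0);
  if (first === '"' || first === "'") {
    // Quoted scalar: keep everything up to the matching closing quote.
    const close = value.indexOf(first, 1);
    if (close !== -1) return value.slice(1, close);
  }
  // Unquoted scalar: an inline comment starts at whitespace followed by '#'.
  const hash = value.search(/\s#/);
  if (hash !== -1) value = value.slice(0, hash);
  return value.trim();
}

// Reads `status` from a markdown document that starts with a `---` frontmatter block.
function readStatus(markdown: string): string | undefined {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return undefined;
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const m = /^status:\s*(.*)$/.exec(line);
    if (m) return normalizeFrontmatterValue(m[1] ?? "");
  }
  return undefined;
}
```

With these helpers, `status: done # yay`, `status: "done"`, and `status: done` all normalize to the same string, so the runner's comparison against `done` does not depend on how the agent formatted the line.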
- Guard workspace paths against traversal out of dataDir in start()
- Document success_command env vars and timeout in MCP tool description
- Narrow RunnerDeps.runtime to Pick<AgentRuntime,"handleMessage">, drop `as never` casts from tests
- Harden SDK internals access in tool.test.ts via exported LOOP_TOOL_NAME constant with SDK version pinned in a comment
- Add end-to-end test exercising autoSchedule:true setImmediate path
- Collapse finalize() UPDATE+SELECT roundtrip: LoopStore.finalize now returns the updated Loop directly
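The first item, guarding workspace paths against traversal, is commonly done by resolving the requested path and checking that it stays under the root. The sketch below assumes that approach; `resolveInsideDataDir` is a hypothetical helper, not the guard used in `start()`, and it does not follow symlinks.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolves `requested` against `dataDir` and rejects any
// result that falls outside it. Symlinks are not resolved in this sketch.
function resolveInsideDataDir(dataDir: string, requested: string): string {
  const root = path.resolve(dataDir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`workspace path escapes dataDir: ${requested}`);
  }
  return target;
}
```

Comparing via `path.relative` rather than a string prefix check avoids accepting a sibling directory such as `/data-evil` when the root is `/data`.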
Closes #5. AsyncLocalStorage context injection, reaction ladder, progress bar, state.md summary on completion. Stop button persists across tick edits. Tick/finalize race eliminated. LoopNotifier extracted from runner.ts. Verified end-to-end in Slack + non-Slack trigger. 945 tests passing.
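For readers unfamiliar with the AsyncLocalStorage context injection mentioned here, the following is a minimal sketch of the general pattern using Node's `AsyncLocalStorage`. The context shape and the function names (`SlackThreadContext`, `withSlackContext`, `describeNotificationTarget`) are assumptions for illustration and are not taken from `src/agent/slack-context.ts`.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the PR's actual fields may differ.
interface SlackThreadContext {
  channelId: string;
  threadTs?: string;
}

const slackContext = new AsyncLocalStorage<SlackThreadContext>();

// Runs `fn` with `ctx` available to everything it calls, sync or async.
function withSlackContext<T>(ctx: SlackThreadContext, fn: () => T): T {
  return slackContext.run(ctx, fn);
}

// A tool handler can read the ambient context without it being passed as an argument.
function describeNotificationTarget(): string {
  const ctx = slackContext.getStore();
  if (!ctx) return "no Slack context (non-Slack trigger)";
  return ctx.threadTs ? `${ctx.channelId} thread ${ctx.threadTs}` : ctx.channelId;
}
```

The point of the pattern is that code invoked deep inside a message handler can find the originating thread, while the same code invoked from a non-Slack trigger simply sees no context.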
* feat(loop): integrate evolution, memory, and mid-loop critique into loop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal, injected into every tick prompt via TickPromptOptions. Cleared on finalize, rebuilt on resume.

Phase 2 - Post-loop evolution and consolidation: bounded transcript accumulation (first tick + rolling 10 summaries + last tick), SessionData synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic memory consolidation with cost-cap guards matching the interactive path.

Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param lets the agent request Sonnet 4.6 review every N ticks. Guard requires evolution enabled, LLM judges active, and cost cap not exceeded. Critique is awaited before next tick to avoid race conditions.

Closes #8

* fix(loop): address code review findings from PR #9

- Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)

* fix(evolution): support OAuth tokens for LLM judge auth

resolveJudgeMode() and judge client now check ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN in addition to ANTHROPIC_API_KEY. Enables LLM judges on Max subscription deployments using OAuth bearer tokens.

* docs: add phantom_loop documentation for upstream PR

Covers MCP tool parameters, state file contract, tick lifecycle, Slack integration, mid-loop critique, post-loop evolution pipeline, memory context injection, and tips for writing effective goals.

Closes #12

* fix(test): stabilize trigger-auth and judge-activation tests for CI

trigger-auth: use inline Bun.serve instead of startServer to avoid module-level globals and disk I/O that can race across test files.

judge-activation: save/restore ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN alongside ANTHROPIC_API_KEY so tests that expect "no credentials" actually clear all auth env vars.

---------

Co-authored-by: electronicBlacksmith <electronicBlacksmith@users.noreply.github.com>
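The bounded transcript accumulation described in Phase 2 (first tick + rolling 10 summaries + last tick) can be sketched as follows. This is an illustration under assumptions, not the PR's `recordTranscript`: `BoundedTranscript`, `createTranscript`, and `recordTick` are hypothetical names, and how each tick is summarized is left to the caller.

```typescript
// Hypothetical sketch of a transcript that stays bounded however long the loop runs.
interface BoundedTranscript {
  firstTick: string | undefined; // full text of the first tick, kept for the original framing
  summaries: string[]; // rolling window of the most recent tick summaries
  lastTick: string | undefined; // full text of the most recent tick
}

const MAX_SUMMARIES = 10; // window size taken from the commit message

function createTranscript(): BoundedTranscript {
  return { firstTick: undefined, summaries: [], lastTick: undefined };
}

// Records one tick. Memory use is bounded by two full texts plus MAX_SUMMARIES summaries.
function recordTick(t: BoundedTranscript, fullText: string, summary: string): void {
  if (t.firstTick === undefined) t.firstTick = fullText;
  t.lastTick = fullText;
  t.summaries.push(summary);
  if (t.summaries.length > MAX_SUMMARIES) t.summaries.shift(); // drop the oldest summary
}
```

A structure like this keeps the input to the post-loop evolution step roughly constant in size, so a 100-tick loop does not cost more to consolidate than a 12-tick one.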
Summary
Adds `phantom_loop` - an in-process MCP tool that lets the agent spawn iterative tasks where each tick is a fresh SDK session with state persisted in a markdown file.
Core loop:
- `query()` call with the goal + accumulated state
Slack integration:
Evolution integration:
Also includes:
- `docs/loop.md`
Files added/changed
- `src/loop/` - Runner, store, state file, tool, prompt, notifications, critique, post-loop (8 files)
- `src/loop/__tests__/` - 6 test files
- `src/agent/slack-context.ts` - AsyncLocalStorage for Slack thread context
- `src/index.ts` - Wiring
- `docs/loop.md` - Documentation
Test plan
- `bun run typecheck` clean
- `bun run lint` clean