
feat: phantom_loop - autonomous iteration primitive with evolution integration #48

Open
electronicBlacksmith wants to merge 5 commits into ghostwright:main from electronicBlacksmith:upstream/feat/loop-primitive
Summary

Adds phantom_loop - an in-process MCP tool that lets the agent spawn iterative tasks where each tick is a fresh SDK session with state persisted in a markdown file.

Core loop:

  • Each tick is a fresh query() call with the goal + accumulated state
  • State persisted in markdown frontmatter (the agent reads/writes it naturally)
  • Termination: agent self-declares done, budget exhausted, success_command exits 0, or operator interrupt via Slack button
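
The core loop above can be sketched roughly as follows. This is a minimal illustration, not the PR's runner: `TickDeps`, `runLoop`, `readStatus`, and the `StopReason` names are all hypothetical, and the real check ordering may differ.

```typescript
// Hypothetical types for illustration; none of these names come from the PR.
type StopReason = "agent_done" | "budget_exhausted" | "success_command" | "operator_stop";

interface TickDeps {
  // Runs one fresh session with the goal plus the current state; returns the new state text.
  runTick(goal: string, state: string): Promise<string>;
  // Resolves true when the optional success command exited 0.
  successCommandPassed(): Promise<boolean>;
  // True once the operator has asked the loop to stop.
  stopRequested(): boolean;
}

// Simplified status check: looks for a `status: done` line in the state text.
function readStatus(state: string): "running" | "done" {
  return /^status:\s*done\s*$/m.test(state) ? "done" : "running";
}

async function runLoop(
  goal: string,
  maxIterations: number,
  deps: TickDeps,
): Promise<{ reason: StopReason; ticks: number; state: string }> {
  let state = "---\nstatus: running\n---\n";
  for (let tick = 1; tick <= maxIterations; tick++) {
    // The budget and the stop flag are enforced by the runner, not by the agent.
    if (deps.stopRequested()) return { reason: "operator_stop", ticks: tick - 1, state };
    state = await deps.runTick(goal, state);
    if (readStatus(state) === "done") return { reason: "agent_done", ticks: tick, state };
    if (await deps.successCommandPassed()) return { reason: "success_command", ticks: tick, state };
  }
  return { reason: "budget_exhausted", ticks: maxIterations, state };
}
```

Because every tick receives only the goal and the persisted state, nothing carries over between ticks except what the agent wrote to the state file.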

Slack integration:

  • AsyncLocalStorage context injection so loop ticks auto-target the operator's thread
  • Reaction ladder, progress bar, stop button
  • Status updates land in the originating thread
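
As a rough sketch of the context-injection idea, Node's `AsyncLocalStorage` can carry the originating thread through a tick without threading it through every call. The `SlackThreadContext` shape and the helper names below are assumptions for illustration, not the contents of src/agent/slack-context.ts.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the PR's actual fields may differ.
interface SlackThreadContext {
  channel: string;
  threadTs: string;
}

const slackContext = new AsyncLocalStorage<SlackThreadContext>();

// Run a unit of work with the operator's thread bound as ambient context.
function withSlackContext<T>(ctx: SlackThreadContext, fn: () => T): T {
  return slackContext.run(ctx, fn);
}

// Code deep in the call stack reads the target thread; undefined outside a bound context.
function currentThreadTarget(): SlackThreadContext | undefined {
  return slackContext.getStore();
}

async function postStatus(text: string): Promise<string> {
  await Promise.resolve(); // the store survives across await boundaries
  const target = currentThreadTarget();
  return target ? `${target.channel}/${target.threadTs}: ${text}` : `no thread: ${text}`;
}
```

Wrapping each tick in `withSlackContext(...)` would let any status helper called during that tick find the operator's thread, while non-Slack triggers simply see `undefined`.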

Evolution integration:

  • Post-loop evolution pipeline: bounded transcript accumulation, SessionData synthesis, fire-and-forget evolution + memory consolidation
  • Mid-loop critique checkpoints: optional Sonnet 4.6 review every N ticks
  • Memory context injection: cached at loop start, injected into every tick

Also includes:

  • OAuth token support for LLM judges (ANTHROPIC_AUTH_TOKEN, CLAUDE_CODE_OAUTH_TOKEN)
  • Documentation at docs/loop.md

Files added/changed

  • src/loop/ - Runner, store, state file, tool, prompt, notifications, critique, post-loop (8 files)
  • src/loop/__tests__/ - 6 test files
  • src/agent/slack-context.ts - AsyncLocalStorage for Slack thread context
  • src/index.ts - Wiring
  • docs/loop.md - Documentation

Test plan

  • 945+ tests passing
  • End-to-end verified in Slack + non-Slack trigger
  • bun run typecheck clean
  • bun run lint clean

electronicBlacksmith and others added 5 commits April 8, 2026 03:41
Introduces phantom_loop - an in-process MCP tool that lets the agent spawn
iterative tasks where each tick is a fresh SDK session with state persisted
in a markdown file. Termination signals: agent self-declares via status: done
in the state file frontmatter, iteration or cost budget exhausted, optional
success_command returns exit 0, or operator interrupt via Slack button.

Runner is fully deterministic: budgets are enforced by TypeScript, the agent
only reasons about the task. Crash recovery is just re-scheduling a tick
against any loop still marked running; the state file is the source of truth.

Addresses two issues from code review:

1. parseFrontmatter now strips inline YAML comments (`status: done # yay`)
   and surrounding quotes (`status: "done"`). Without this, an agent
   writing natural-looking YAML would leave the loop status unparseable,
   silently burning through its iteration budget instead of terminating.

2. Loop.conversationId was stored but its only use was gating whether the
   start notice posted - tick and final updates posted regardless. It was
   meant to be the Slack thread target for loop status updates. Extend
   SlackChannel.postToChannel with an optional thread_ts and use it for
   the start notice, so status updates land in the caller's thread when
   provided.
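
The frontmatter fix in item 1 could look something like the sketch below. It is an assumed, simplified parser for flat `key: value` frontmatter, not the PR's `parseFrontmatter`; the function names are hypothetical.

```typescript
// Strip surrounding quotes, or an inline `# comment`, from a raw YAML scalar.
function normalizeScalar(raw: string): string {
  const value = raw.trim();
  const quote = value.charAt(0);
  if (quote === '"' || quote === "'") {
    // Quoted value: take everything up to the matching closing quote.
    const end = value.indexOf(quote, 1);
    if (end !== -1) return value.slice(1, end);
  }
  // Unquoted value: a comment starts at a `#` that begins the value or follows whitespace.
  const hash = value.search(/(^|\s)#/);
  return (hash === -1 ? value : value.slice(0, hash)).trim();
}

// Parse flat `key: value` pairs from a leading `---` block (no nesting, no lists).
function parseFrontmatterSketch(text: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!match) return fields;
  const block = match[1] ?? "";
  for (const line of block.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1 || line.trimStart().startsWith("#")) continue;
    fields[line.slice(0, sep).trim()] = normalizeScalar(line.slice(sep + 1));
  }
  return fields;
}
```

With this normalization, `status: done # yay`, `status: "done"`, and `status: done` all yield the same `done` value, so the loop terminates instead of spending the rest of its budget.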

- Guard workspace paths against traversal out of dataDir in start()
- Document success_command env vars and timeout in MCP tool description
- Narrow RunnerDeps.runtime to Pick<AgentRuntime,"handleMessage">, drop
  `as never` casts from tests
- Harden SDK internals access in tool.test.ts via exported LOOP_TOOL_NAME
  constant with SDK version pinned in a comment
- Add end-to-end test exercising autoSchedule:true setImmediate path
- Collapse finalize() UPDATE+SELECT roundtrip: LoopStore.finalize now
  returns the updated Loop directly
Closes #5.
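
A common way to implement the traversal guard from the first bullet is to resolve the workspace against the data directory and reject anything whose relative path climbs out. The sketch below assumes that approach; `resolveInside` is a hypothetical helper, not the code in `start()`.

```typescript
import * as path from "node:path";

// Resolve `workspace` under `dataDir`, throwing if the result escapes `dataDir`.
function resolveInside(dataDir: string, workspace: string): string {
  const root = path.resolve(dataDir);
  const target = path.resolve(root, workspace);
  const rel = path.relative(root, target);
  // `..` or `../x` means the target is above root; an absolute `rel` means a different drive.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`workspace escapes data directory: ${workspace}`);
  }
  return target;
}
```

Comparing via `path.relative` rather than a string prefix avoids accepting sibling directories such as `/data-evil` when the root is `/data`. This check is purely lexical and does not follow symlinks.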

AsyncLocalStorage context injection, reaction ladder, progress bar,
state.md summary on completion. Stop button persists across tick edits.
Tick/finalize race eliminated. LoopNotifier extracted from runner.ts.

Verified end-to-end in Slack + non-Slack trigger. 945 tests passing.

* feat(loop): integrate evolution, memory, and mid-loop critique into loop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal,
injected into every tick prompt via TickPromptOptions. Cleared on finalize,
rebuilt on resume.

Phase 2 - Post-loop evolution and consolidation: bounded transcript
accumulation (first tick + rolling 10 summaries + last tick), SessionData
synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic
memory consolidation with cost-cap guards matching the interactive path.
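
One plausible reading of the bounded accumulation in Phase 2 is sketched below: keep the first tick verbatim, the last tick verbatim, and only the ten most recent summaries in between. The `BoundedTranscript` shape and `recordTick` are hypothetical, and the PR's `recordTranscript` may differ in detail.

```typescript
// Hypothetical transcript shape; memory use stays constant regardless of tick count.
interface BoundedTranscript {
  firstTick: string | null;
  summaries: string[];
  lastTick: string | null;
}

const MAX_SUMMARIES = 10;

function emptyTranscript(): BoundedTranscript {
  return { firstTick: null, summaries: [], lastTick: null };
}

// Returns a new transcript with this tick recorded; the input is not mutated.
function recordTick(t: BoundedTranscript, fullText: string, summary: string): BoundedTranscript {
  return {
    firstTick: t.firstTick ?? fullText, // set once, on the first tick
    summaries: [...t.summaries, summary].slice(-MAX_SUMMARIES), // rolling window
    lastTick: fullText, // always overwritten by the newest tick
  };
}
```

After 25 ticks, for example, such a transcript would hold tick 1 in full, the summaries for ticks 16 through 25, and tick 25 in full.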

Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param
lets the agent request Sonnet 4.6 review every N ticks. Guard requires
evolution enabled, LLM judges active, and cost cap not exceeded. Critique
is awaited before next tick to avoid race conditions.
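
The Phase 3 guard can be pictured as a pure predicate evaluated after each tick. The sketch below is an assumption about its shape: `CritiqueGate` and `shouldRunCritique` are hypothetical names, and the terminal-tick check reflects the follow-up review fix listed further down.

```typescript
// Hypothetical inputs to the checkpoint decision.
interface CritiqueGate {
  checkpointInterval: number | null; // null when the caller did not request checkpoints
  evolutionEnabled: boolean;
  llmJudgesActive: boolean;
  costSpentUsd: number;
  costCapUsd: number;
}

function shouldRunCritique(tick: number, isTerminalTick: boolean, gate: CritiqueGate): boolean {
  if (gate.checkpointInterval === null || gate.checkpointInterval < 1) return false;
  if (isTerminalTick) return false; // no point reviewing a loop that is about to finalize
  if (!gate.evolutionEnabled || !gate.llmJudgesActive) return false;
  if (gate.costSpentUsd >= gate.costCapUsd) return false;
  return tick % gate.checkpointInterval === 0; // fire every N ticks
}
```

The runner would then `await` the critique before scheduling the next tick whenever this returns true, which is what keeps the review from racing with the following tick.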

Closes #8

* fix(loop): address code review findings from PR #9

- Decouple postLoopDeps so evolution and memory run independently
  (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)
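
To illustrate the failure-path cost tracking, the sketch below attaches usage data to the parse error so the caller can still charge the call against its budget. Only the `JudgeParseError` name comes from the PR; its fields, `JudgeUsage`, `parseVerdict`, and `judgeAndTrackCost` are assumptions.

```typescript
// Hypothetical usage record returned with every judge response.
interface JudgeUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface JudgeResponse {
  text: string;
  usage: JudgeUsage;
}

// Error that carries the usage of the failed call, so its cost is not lost.
class JudgeParseError extends Error {
  usage: JudgeUsage;
  constructor(message: string, usage: JudgeUsage) {
    super(message);
    this.name = "JudgeParseError";
    this.usage = usage;
    Object.setPrototypeOf(this, JudgeParseError.prototype); // keeps instanceof working on ES5 targets
  }
}

function parseVerdict(response: JudgeResponse): { pass: boolean } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(response.text);
  } catch {
    throw new JudgeParseError("judge returned invalid JSON", response.usage);
  }
  if (typeof parsed !== "object" || parsed === null || typeof (parsed as { pass?: unknown }).pass !== "boolean") {
    throw new JudgeParseError("judge verdict is missing a boolean `pass`", response.usage);
  }
  return { pass: (parsed as { pass: boolean }).pass };
}

// Adds the call's cost to the ledger whether or not the verdict parsed.
function judgeAndTrackCost(response: JudgeResponse, ledger: { spentUsd: number }): { pass: boolean } | null {
  try {
    const verdict = parseVerdict(response);
    ledger.spentUsd += response.usage.costUsd;
    return verdict;
  } catch (err) {
    if (err instanceof JudgeParseError) {
      ledger.spentUsd += err.usage.costUsd;
      return null;
    }
    throw err;
  }
}
```

Without the usage on the error, a run of malformed judge responses would spend money that the cost cap never sees.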

* fix(evolution): support OAuth tokens for LLM judge auth

resolveJudgeMode() and judge client now check ANTHROPIC_AUTH_TOKEN and
CLAUDE_CODE_OAUTH_TOKEN in addition to ANTHROPIC_API_KEY. Enables LLM
judges on Max subscription deployments using OAuth bearer tokens.
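
A credential lookup along these lines might be sketched as follows. The three environment variable names come from the commit message; the `JudgeAuth` type, the `resolveJudgeAuth` name, and the precedence order (API key first, then the two OAuth tokens) are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical result type: API keys and OAuth bearer tokens are sent differently.
type JudgeAuth =
  | { kind: "api_key"; value: string }
  | { kind: "bearer"; value: string }
  | { kind: "none" };

function resolveJudgeAuth(env: Env): JudgeAuth {
  const apiKey = env["ANTHROPIC_API_KEY"]?.trim();
  if (apiKey) return { kind: "api_key", value: apiKey };

  // Assumed precedence between the two OAuth variables.
  const token = env["ANTHROPIC_AUTH_TOKEN"]?.trim() || env["CLAUDE_CODE_OAUTH_TOKEN"]?.trim();
  if (token) return { kind: "bearer", value: token };

  return { kind: "none" }; // callers would fall back to heuristic judges
}
```

In production code this would be called as `resolveJudgeAuth(process.env)`; taking the environment as a parameter keeps the function easy to test.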

* docs: add phantom_loop documentation for upstream PR

Covers MCP tool parameters, state file contract, tick lifecycle,
Slack integration, mid-loop critique, post-loop evolution pipeline,
memory context injection, and tips for writing effective goals.

Closes #12

* fix(test): stabilize trigger-auth and judge-activation tests for CI

trigger-auth: use inline Bun.serve instead of startServer to avoid
module-level globals and disk I/O that can race across test files.

judge-activation: save/restore ANTHROPIC_AUTH_TOKEN and
CLAUDE_CODE_OAUTH_TOKEN alongside ANTHROPIC_API_KEY so tests that
expect "no credentials" actually clear all auth env vars.
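
The judge-activation fix amounts to saving, clearing, and restoring all three variables around each test. A minimal synchronous sketch, with a hypothetical `withClearedAuthEnv` helper (the actual tests may use setup and teardown hooks instead):

```typescript
const AUTH_VARS = ["ANTHROPIC_API_KEY", "ANTHROPIC_AUTH_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"] as const;

// Runs `fn` with every auth variable removed from `env`, then restores the originals.
function withClearedAuthEnv<T>(env: Record<string, string | undefined>, fn: () => T): T {
  const saved = AUTH_VARS.map((name) => ({ name, value: env[name] }));
  for (const name of AUTH_VARS) delete env[name];
  try {
    return fn();
  } finally {
    for (const { name, value } of saved) {
      if (value === undefined) delete env[name]; // was unset before; keep it unset
      else env[name] = value;
    }
  }
}
```

A test would pass `process.env` as `env`. Clearing only `ANTHROPIC_API_KEY` is not enough once the other two variables also count as credentials, since a token set on the CI host would make a "no credentials" test see credentials.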

---------

Co-authored-by: electronicBlacksmith <electronicBlacksmith@users.noreply.github.com>