
feat(session): task checkpointing — blocked agents hand off to next session#351

Open
krrish-berri-2 wants to merge 1 commit into main from worktree-session-checkpointing

@krrish-berri-2
Contributor

Summary

  • New MCP server session-task-mcp.mjs exposes 3 tools: save_task_progress, list_blocked_tasks, get_blocked_task
  • Platform routes: POST/GET /sessions/{id}/task_checkpoint and GET /sessions/{id}/blocked_tasks
  • DB migration: adds task_checkpoint JSONB column to managed_agent_session
  • gen-mcp-config.mjs: wires lap-session-task MCP into opencode.json at harness boot
  • E2E test: tests/agent-task-checkpoint.spec.ts checks implicit tool use behavior

How it works

Agent calls save_task_progress({summary, status, blocked_reason}) at each milestone and before giving up. Status blocked means another session should pick it up.

On next session start, agent calls list_blocked_tasks (per system prompt instruction), sees prior blocked work, calls get_blocked_task for full context, and resumes instead of picking a new ticket.
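
The handoff described above can be sketched as a small decision function. This is only an illustration: the checkpoint fields come from the description, but `BlockedTask`, `StartupDecision`, and `decideAtStartup` are hypothetical names, and the real response shape of list_blocked_tasks may differ.

```typescript
// Checkpoint shape taken from the PR description: {summary, status, blocked_reason}.
type CheckpointStatus = "in_progress" | "blocked" | "complete";

interface TaskCheckpoint {
  summary: string;
  status: CheckpointStatus;
  blocked_reason?: string;
}

// Hypothetical shape of one entry returned by list_blocked_tasks.
interface BlockedTask {
  session_id: string;
  task_checkpoint: TaskCheckpoint;
}

// Hypothetical result type: resume prior work or pick a new ticket.
type StartupDecision =
  | { action: "resume"; from_session: string; summary: string }
  | { action: "pick_new_ticket" };

// Resume the first blocked task if there is one; otherwise start fresh.
function decideAtStartup(blocked: BlockedTask[]): StartupDecision {
  for (const task of blocked) {
    if (task.task_checkpoint.status === "blocked") {
      return {
        action: "resume",
        from_session: task.session_id,
        summary: task.task_checkpoint.summary,
      };
    }
  }
  return { action: "pick_new_ticket" };
}
```

In the PR itself this decision is made by the agent following its system prompt, not by code; the sketch only makes the intended order of operations explicit.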

Verified locally

  • All 3 tools appear as lap-session-task_* in agent toolset ✓
  • list_blocked_tasks returns blocked checkpoints from prior sessions ✓
  • save_task_progress writes to DB and GET task_checkpoint reads it back ✓
  • Agent calls list_blocked_tasks unprompted when asked to pick a GitHub issue ✓

Test plan

  • Deploy inline harness image (new session-task-mcp.mjs + updated gen-mcp-config.mjs must be in the Docker image)
  • Run npx prisma migrate deploy on prod DB (adds task_checkpoint column)
  • Run tests/agent-task-checkpoint.spec.ts against staging
  • Manual: ask Shin to pick a ticket, confirm it calls list_blocked_tasks first
  • Manual: let a session get blocked (sandbox failure), verify next session sees it and resumes

🤖 Generated with Claude Code

…ext session

Adds three MCP tools (save_task_progress, list_blocked_tasks, get_blocked_task)
and backing platform routes so agents can persist their work state before giving
up, and successor sessions can resume blocked tasks instead of starting from scratch.

- harnesses/opencode/session-task-mcp.mjs: new standalone stdio MCP server
  exposing the three tools; follows same env/retry/proxy pattern as report-issue-mcp
- harnesses/opencode/gen-mcp-config.mjs: wire lap-session-task into opencode.json
- prisma/schema.prisma + migration: add task_checkpoint JSONB to managed_agent_session
- POST /sessions/{id}/task_checkpoint: agent writes {summary, status, blocked_reason}
- GET  /sessions/{id}/task_checkpoint: read checkpoint for any session (same agent)
- GET  /sessions/{id}/blocked_tasks: list other sessions with status=blocked
- tests/agent-task-checkpoint.spec.ts: e2e behavioral tests

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR introduces a task-checkpointing system for managed agents: a new stdio MCP server (session-task-mcp.mjs) exposes three tools for saving and retrieving task progress, two new API routes persist and read checkpoint state from a new task_checkpoint JSONB column, and an E2E test validates the behavior end-to-end.

  • MCP server + routes: save_task_progress, list_blocked_tasks, and get_blocked_task are wired to POST/GET /sessions/{id}/task_checkpoint and GET /sessions/{id}/blocked_tasks. Auth correctly scopes reads and writes to the owning agent.
  • DB migration: Additive JSONB column on managed_agent_session, safe to deploy and roll back.
  • Core gap: When a new session picks up a blocked task, the old session's task_checkpoint.status is never updated from "blocked", because save_task_progress always writes to the caller's own session_id. Every future session will continue seeing the old blocked entry in list_blocked_tasks, leading to repeated duplicate pickup attempts with no way to suppress them short of a manual DB update.

Confidence Score: 3/5

The blocked-task handoff mechanism has a persistent-state bug that will cause every subsequent session to re-attempt already-claimed work, and the E2E test will fail immediately on any environment without pre-provisioning a specific agent.

Once a new session picks up a blocked task, the prior session's checkpoint remains in state 'blocked' indefinitely. Every future session for that agent will discover the same stale entry, attempt to resume it, and produce duplicate or conflicting work. There is no MCP tool path that allows the new session to update the old session's record. The E2E test also relies on a hardcoded agent UUID that will not exist outside the original dev environment, so the test plan cannot be executed without additional setup steps that are not yet documented.

src/app/api/v1/managed_agents/sessions/[session_id]/blocked_tasks/route.ts (stale-blocked logic) and tests/agent-task-checkpoint.spec.ts (hardcoded agent UUID) need attention before this is ready to ship.

Important Files Changed

| Filename | Overview |
|---|---|
| src/app/api/v1/managed_agents/sessions/[session_id]/blocked_tasks/route.ts | New route returning blocked checkpoints for the same agent; has a logic gap where picked-up sessions are never cleared from the list, and ordering by created_at is less useful than ordering by last_seen_at |
| src/app/api/v1/managed_agents/sessions/[session_id]/task_checkpoint/route.ts | New POST/GET routes for writing and reading task checkpoints; auth is correct; Zod validation is missing a refine constraint to require blocked_reason when status=blocked |
| harnesses/opencode/session-task-mcp.mjs | New stdio MCP server exposing three task-checkpoint tools; retry logic and token refresh are solid; save_task_progress always writes to the caller's own session, leaving no mechanism to clear a prior session's blocked status |
| harnesses/opencode/gen-mcp-config.mjs | Adds lap-session-task MCP entry guarded by the same issueBase+issueAccess check as the existing lap-issue-reporter; no issues |
| prisma/migrations/20260526000001_add_task_checkpoint/migration.sql | Adds nullable JSONB column task_checkpoint to managed_agent_session; additive-only migration, safe to deploy and roll back |
| prisma/schema.prisma | Adds task_checkpoint Json? field to Session model with an inline comment; matches migration |
| tests/agent-task-checkpoint.spec.ts | E2E behavioral tests for checkpoint tools; AGENT_ID falls back to a hardcoded UUID that won't exist in any environment except the original dev setup, causing immediate failures on staging or CI |

Reviews (1): Last reviewed commit: "feat(session): add task checkpointing so..."

Comment on lines +46 to +64
const rows = await prisma.session.findMany({
  where: {
    agent_id: sessionRow.agent_id,
    session_id: { not: session_id },
    task_checkpoint: {
      path: ["status"],
      equals: "blocked",
    },
  },
  select: {
    session_id: true,
    task_checkpoint: true,
    // Fall back to session updated_at for ordering when checkpoint updated_at is unavailable.
    last_seen_at: true,
    created_at: true,
  },
  orderBy: { created_at: "desc" },
  take: 10,
});

P1 Blocked tasks stay visible after pickup — persistent duplicate resumption

When a new session (S2) picks up a blocked task from S1, it calls save_task_progress on its own session_id (S2). S1's task_checkpoint.status remains "blocked" forever because neither the MCP tool nor any API operation updates S1's record. Every subsequent session will also see S1 in list_blocked_tasks and attempt to resume the same work, with no way to know S2 is already handling it.

The MCP tool callSaveTaskProgress always writes to process.env.SESSION_ID || input.session_id (the caller's current session), so there is no code path today that clears the prior session's blocked status after pickup.
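
One possible way to close this gap is to let the resuming session mark the prior checkpoint as claimed. The sketch below is an in-memory illustration only, not the PR's implementation: the "claimed" status, the `claimed_by` field, and the functions `claimBlockedTask` and `listBlocked` are all hypothetical, and a plain object stands in for the task_checkpoint column.

```typescript
// "claimed" is a hypothetical extra status; the PR defines only the first three.
type Status = "in_progress" | "blocked" | "complete" | "claimed";

interface Checkpoint {
  summary: string;
  status: Status;
  blocked_reason?: string;
  claimed_by?: string; // hypothetical: session_id of the session that resumed the work
}

// In-memory stand-in for task_checkpoint, keyed by session_id.
type CheckpointStore = Record<string, Checkpoint>;

// Mark a prior session's blocked checkpoint as claimed.
// Returns false when there is nothing to claim (missing or no longer blocked).
function claimBlockedTask(
  store: CheckpointStore,
  priorSessionId: string,
  claimingSessionId: string
): boolean {
  const prior = store[priorSessionId];
  if (prior === undefined || prior.status !== "blocked") return false;
  store[priorSessionId] = { ...prior, status: "claimed", claimed_by: claimingSessionId };
  return true;
}

// Mirrors the route's filter: other sessions whose status is still "blocked".
function listBlocked(store: CheckpointStore, callerSessionId: string): string[] {
  return Object.keys(store).filter(
    (id) => id !== callerSessionId && store[id].status === "blocked"
  );
}
```

Against a real database the check and the write would need to happen atomically (for example as a single conditional update), so that two sessions starting at the same time cannot both claim the same task.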

Comment on lines +37 to +41
const CheckpointBody = z.object({
  summary: z.string().min(1),
  status: z.enum(["in_progress", "blocked", "complete"]),
  blocked_reason: z.string().optional(),
});

P2 blocked_reason is not enforced when status is "blocked"

The Zod schema marks blocked_reason as optional unconditionally. When an agent sets status="blocked" without providing a reason, the next agent receiving this checkpoint from list_blocked_tasks has no context to determine whether the original blocker is still present. The tool description says "Required when status=blocked" but the server doesn't enforce it, so agents can silently omit it.

Suggested change
- const CheckpointBody = z.object({
-   summary: z.string().min(1),
-   status: z.enum(["in_progress", "blocked", "complete"]),
-   blocked_reason: z.string().optional(),
- });
+ const CheckpointBody = z.object({
+   summary: z.string().min(1),
+   status: z.enum(["in_progress", "blocked", "complete"]),
+   blocked_reason: z.string().optional(),
+ }).refine(
+   (data) => data.status !== "blocked" || (typeof data.blocked_reason === "string" && data.blocked_reason.length > 0),
+   { message: "blocked_reason is required when status is 'blocked'", path: ["blocked_reason"] },
+ );
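
To make the effect of the refine rule concrete, here is a dependency-free sketch of the same check in plain TypeScript. It is not Zod and not the route's code; `CheckpointInput` and `validateCheckpoint` are hypothetical names used only to show which payloads the suggested schema would accept or reject.

```typescript
type Status = "in_progress" | "blocked" | "complete";

interface CheckpointInput {
  summary: string;
  status: Status;
  blocked_reason?: string;
}

// Returns a list of validation errors; an empty list means the payload is valid.
function validateCheckpoint(input: CheckpointInput): string[] {
  const errors: string[] = [];
  // Same intent as z.string().min(1) on summary.
  if (input.summary.length < 1) {
    errors.push("summary must be non-empty");
  }
  // Same predicate as the suggested refine: a blocked status needs a non-empty reason.
  const hasReason =
    typeof input.blocked_reason === "string" && input.blocked_reason.length > 0;
  if (input.status === "blocked" && !hasReason) {
    errors.push("blocked_reason is required when status is 'blocked'");
  }
  return errors;
}
```

A payload such as `{ summary: "stuck", status: "blocked" }` is rejected, while the same payload with status "in_progress" passes, because the reason is only required for blocked checkpoints.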

    last_seen_at: true,
    created_at: true,
  },
  orderBy: { created_at: "desc" },

P2 Ordering by created_at instead of checkpoint's updated_at may surface stale tasks first

Sessions are sorted by when they were created, not by when they were blocked. A session created last week that only became blocked yesterday would appear after a session created three days ago that blocked immediately, even though its checkpoint is more recent. Ordering by last_seen_at (already selected) is a better approximation than created_at.

Suggested change
- orderBy: { created_at: "desc" },
+ orderBy: { last_seen_at: "desc" },
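
If the checkpoint JSON ever carries its own timestamp, the ordering could prefer that and fall back to the session-level fields. The sketch below sorts rows in memory under that assumption: `checkpoint_updated_at` is a hypothetical field, the nullability of last_seen_at is assumed, and `recencyMs` and `sortMostRecentlyBlockedFirst` are illustrative names.

```typescript
interface BlockedRow {
  session_id: string;
  created_at: string; // ISO 8601 timestamp
  last_seen_at: string | null; // assumed nullable for illustration
  checkpoint_updated_at?: string; // hypothetical: when the checkpoint was last written
}

// Best available estimate of when the task was last touched, in epoch milliseconds.
function recencyMs(row: BlockedRow): number {
  const stamp = row.checkpoint_updated_at ?? row.last_seen_at ?? row.created_at;
  return Date.parse(stamp);
}

// Returns a new array, most recently touched first; the input is not mutated.
function sortMostRecentlyBlockedFirst(rows: BlockedRow[]): BlockedRow[] {
  return rows.slice().sort((a, b) => recencyMs(b) - recencyMs(a));
}
```

With this ordering, an older session that was active yesterday sorts ahead of a newer session that has been idle for longer, which is the case the created_at ordering gets wrong.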

Comment on lines +18 to +20
process.env.CHECKPOINT_TEST_AGENT_ID ?? "9cbb91a6-e66d-43c5-92ed-68a570429527";

const TURN_TIMEOUT_MS = 90_000;

P1 Hardcoded agent UUID will fail in every non-dev environment

AGENT_ID falls back to a hardcoded UUID when CHECKPOINT_TEST_AGENT_ID is not set. spawnAndWait calls POST /agents/{AGENT_ID}/session, which will return a 404 on any staging or CI environment that doesn't have this specific agent. The test plan mentions running against staging, but the test will immediately fail there unless CHECKPOINT_TEST_AGENT_ID is explicitly configured — and there is no documentation of this requirement in the test file or the test plan.
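
One way to make the requirement explicit is to drop the fallback and fail fast with a descriptive error. This is a sketch rather than the test file's code: `resolveAgentId` is a hypothetical helper, and it takes the environment as a parameter so it can be exercised without a real process environment.

```typescript
// Hypothetical helper: read the agent ID from the environment or fail with a clear message.
function resolveAgentId(env: Record<string, string | undefined>): string {
  const raw = env["CHECKPOINT_TEST_AGENT_ID"];
  if (raw === undefined || raw.trim() === "") {
    throw new Error(
      "CHECKPOINT_TEST_AGENT_ID must be set to the ID of an agent that exists in the target environment"
    );
  }
  return raw.trim();
}
```

In the spec file this would be called once with the process environment, so a missing variable surfaces as a single configuration error instead of a 404 from the session-spawn request.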
