
feat(runtime): replan oversized tasks after repeated recovery failures#131

Open
jafreck wants to merge 3 commits into main from cadre/issue-104

@jafreck (Owner) commented Mar 6, 2026

Summary

Adds an optional replanning stage that triggers after repeated recovery failures, so AAMF can split or re-scope an oversized task instead of repeatedly remediating the same broad unit.

Closes #104

Changes

Config (schema.ts)

  • Add options.replanning config block: enabled, triggerAttempts, maxSubtasks, minIssueOverlapForTrigger
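Below is a minimal TypeScript sketch of how the options.replanning block could be typed and validated. The four field names come from this PR. Everything else is an assumption and does not reflect the contents of schema.ts: the interface name, the `resolveReplanningOptions` helper, the default values, and the reading of minIssueOverlapForTrigger as a fraction in [0, 1].

```typescript
// Hypothetical shape of options.replanning. Field names are from the PR;
// the defaults and validation bounds below are illustrative assumptions.
interface ReplanningOptions {
  enabled: boolean;
  triggerAttempts: number;           // recovery attempts before replanning is considered
  maxSubtasks: number;               // upper bound on sub-tasks per replanned task
  minIssueOverlapForTrigger: number; // assumed to be a fraction in [0, 1]
}

const ASSUMED_DEFAULTS: ReplanningOptions = {
  enabled: false,
  triggerAttempts: 2,
  maxSubtasks: 4,
  minIssueOverlapForTrigger: 0.5,
};

function resolveReplanningOptions(raw: Partial<ReplanningOptions> = {}): ReplanningOptions {
  // User-supplied values override the assumed defaults.
  const opts: ReplanningOptions = { ...ASSUMED_DEFAULTS, ...raw };
  if (!Number.isInteger(opts.triggerAttempts) || opts.triggerAttempts < 1) {
    throw new Error("triggerAttempts must be a positive integer");
  }
  // Sub-task IDs carry a single letter suffix (task-001a .. task-001z), so cap at 26.
  if (!Number.isInteger(opts.maxSubtasks) || opts.maxSubtasks < 2 || opts.maxSubtasks > 26) {
    throw new Error("maxSubtasks must be an integer in [2, 26]");
  }
  if (opts.minIssueOverlapForTrigger < 0 || opts.minIssueOverlapForTrigger > 1) {
    throw new Error("minIssueOverlapForTrigger must be in [0, 1]");
  }
  return opts;
}
```

Validating at load time means a misconfigured maxSubtasks fails fast instead of surfacing mid-run as an unrepresentable sub-task ID.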

Types (agents/types.ts)

  • Add parentTaskId field to MigrationTask for tracking sub-task lineage

Orchestrator (core/orchestrator.ts)

  • Detect non-convergence: track unresolved issues across parity retry attempts and compute issue-set overlap
  • Trigger replanning when recovery attempts >= threshold AND overlap >= configured minimum
  • replanTask(): invoke task-decomposer agent to split failing task → generate deterministic sub-task IDs (task-001a, task-001b, ...)
  • Inject sub-tasks into Phase 4 queue with dependency-safe rewiring
  • Depth-1 guard: skip replanning for sub-tasks (prevent recursive replan loops)
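The trigger condition can be sketched as follows. This sketch assumes two things the PR does not define: issues are identified by stable string keys, and "overlap" means the fraction of the latest attempt's unresolved issues that were also unresolved on the previous attempt. Both `issueOverlap` and `shouldReplan` are hypothetical names.

```typescript
// Fraction of the current attempt's unresolved issues that persisted from the previous one.
function issueOverlap(previous: ReadonlySet<string>, current: ReadonlySet<string>): number {
  if (current.size === 0) return 0; // nothing unresolved, so no non-convergence signal
  let shared = 0;
  for (const issue of current) {
    if (previous.has(issue)) shared++;
  }
  return shared / current.size;
}

interface TriggerInput {
  attempts: number;         // recovery attempts made so far
  issueHistory: string[][]; // unresolved issues per attempt, oldest first
  parentTaskId?: string;    // set only on sub-tasks
}

function shouldReplan(
  input: TriggerInput,
  opts: { enabled: boolean; triggerAttempts: number; minIssueOverlapForTrigger: number },
): boolean {
  if (!opts.enabled) return false;
  if (input.parentTaskId !== undefined) return false; // depth-1 guard: never replan a sub-task
  if (input.attempts < opts.triggerAttempts) return false;
  const n = input.issueHistory.length;
  if (n < 2) return false; // need two attempts to compare
  const overlap = issueOverlap(new Set(input.issueHistory[n - 2]), new Set(input.issueHistory[n - 1]));
  return overlap >= opts.minIssueOverlapForTrigger;
}
```

Requiring both conditions means a task that keeps failing on *new* issues (low overlap) continues to be remediated, while one stuck on the same issues is split.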

Checkpoint (core/checkpoint.ts)

  • recordReplanningEvent(): persist replanning events for resume safety
  • blockTask(): mark parent tasks as blocked after replanning
  • Expose replanningEvents in checkpoint state

Task Queue (execution/task-queue.ts)

  • replaceWithSubtasks(): mark parent as blocked + inject sub-tasks into active task map

Schema (task-decomposer.tasks.schema.json)

  • Relax ID patterns to ^task-[0-9]+[a-z]?$ to support sub-task IDs
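Here is a quick check of the relaxed pattern against a few plausible IDs. The single optional letter allows only one level of suffix, so an ID like task-001ab is rejected. That matches the depth-1 guard and implies at most 26 sub-tasks per parent, although the PR does not state this limit explicitly.

```typescript
// Pattern as stated in the PR for task-decomposer.tasks.schema.json.
const TASK_ID_PATTERN = /^task-[0-9]+[a-z]?$/;

const samples = ["task-001", "task-001a", "task-12z", "task-001ab", "task-a", "Task-001"];
for (const id of samples) {
  console.log(id, TASK_ID_PATTERN.test(id));
}
// Expected: task-001 true, task-001a true, task-12z true,
//           task-001ab false, task-a false, Task-001 false
```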

Resume support

  • Replay prior replanning events from checkpoint on resume
  • Skip already-completed sub-tasks from prior replanning runs

Observability

  • Emit task-replanned and subtasks-injected events
  • Record replanning events in progress file
  • Log overlap % and issue counts at trigger time

Tests

  • 20 files changed, 1898 insertions, 3014 deletions (net reduction from test consolidation)
  • New: task-decomposer-schema.test.ts (7 cases for ID pattern validation)
  • New: orchestrator replanning tests (trigger detection, sub-task generation, maxSubtasks enforcement, depth guard, checkpoint resume)
  • New: config schema tests for replanning options
  • New: checkpoint tests for replanning event recording
  • All 1086 tests pass, 185 skipped

#104)

Add an optional replanning stage that triggers after repeated recovery
failures, splitting oversized tasks into smaller sub-tasks instead of
repeatedly remediating the same broad unit.

Config: Add options.replanning block (enabled, triggerAttempts,
maxSubtasks, minIssueOverlapForTrigger).

Orchestrator: Detect non-convergence via issue-set overlap across
parity retry attempts. Trigger replanning when threshold met.
Invoke task-decomposer to split failing task, generate deterministic
sub-task IDs (task-001a, task-001b, ...), inject into Phase 4 queue
with dependency-safe rewiring. Depth-1 guard prevents recursive
replanning of sub-tasks.
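Deterministic sub-task ID generation with maxSubtasks enforcement might look like the sketch below. `subtaskIds` is a hypothetical helper. It rejects an oversized decomposition rather than truncating it, because the PR does not say which it does. It also detects sub-tasks by their letter suffix instead of the parentTaskId field the PR adds, purely to keep the example self-contained.

```typescript
// Naming scheme (task-001 -> task-001a, task-001b, ...) is from the PR;
// the function itself and its error handling are assumptions.
function subtaskIds(parentId: string, count: number, maxSubtasks: number): string[] {
  // Depth-1 guard: an ID that already ends in a letter is itself a sub-task.
  if (/[a-z]$/.test(parentId)) {
    throw new Error(`${parentId} is a sub-task; recursive replanning is not allowed`);
  }
  // A single-letter suffix caps the scheme at 26 sub-tasks per parent.
  const limit = Math.min(maxSubtasks, 26);
  if (!Number.isInteger(count) || count < 1 || count > limit) {
    throw new Error(`expected 1..${limit} sub-tasks, got ${count}`);
  }
  const ids: string[] = [];
  for (let i = 0; i < count; i++) {
    // 'a' + i yields a, b, c, ... in a stable order, so reruns produce the same IDs.
    ids.push(parentId + String.fromCharCode("a".charCodeAt(0) + i));
  }
  return ids;
}
```

Deterministic IDs matter for resume: a replayed replanning event regenerates the same sub-task IDs, so completed sub-tasks can be matched against the checkpoint.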

Checkpoint: Record replanning events and blocked tasks for resume
safety. Replay prior replanning on resume, skip completed sub-tasks.

Schema: Relax ID patterns to ^task-[0-9]+[a-z]?$ for sub-task IDs.

TaskQueue: Add replaceWithSubtasks() for queue injection.

Tests: New task-decomposer-schema tests, replanning trigger detection,
sub-task generation, maxSubtasks enforcement, depth guard, checkpoint
resume, and observability event tests.
jafreck added 2 commits March 6, 2026 13:12
Forward parityIssues, lineRange, parentTaskId, sourceFiles, and
targetFiles into the task-decomposer agentPayload during Phase 4
replanning so the agent has the failing task's scope and unresolved
issues.
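One way to picture the forwarded fields is as a payload builder. Only the five field names (parityIssues, lineRange, parentTaskId, sourceFiles, targetFiles) come from the commit message. The types, the extra taskId field, and `buildReplanPayload` are assumptions for illustration.

```typescript
// Hypothetical types; only the forwarded field names come from the commit message.
interface LineRange { start: number; end: number }

interface FailingTask {
  id: string;
  sourceFiles: string[];
  targetFiles: string[];
  lineRange?: LineRange;
  parentTaskId?: string;
}

interface ReplanPayload {
  taskId: string;          // assumption: identifies the task being split
  parityIssues: string[];  // unresolved issues from the failed parity attempts
  lineRange?: LineRange;
  parentTaskId?: string;
  sourceFiles: string[];
  targetFiles: string[];
}

function buildReplanPayload(task: FailingTask, unresolvedIssues: string[]): ReplanPayload {
  // Copy arrays so the agent payload cannot mutate orchestrator state.
  return {
    taskId: task.id,
    parityIssues: [...unresolvedIssues],
    lineRange: task.lineRange,
    parentTaskId: task.parentTaskId,
    sourceFiles: [...task.sourceFiles],
    targetFiles: [...task.targetFiles],
  };
}
```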

Change replanning to mark the parent task as completed (not blocked)
since its work is now represented by the generated sub-tasks.
Update TaskQueue.replaceWithSubtasks(), checkpoint calls, and resume
logic accordingly. Remove the now-unnecessary replannedParentIds
skip in the resume path.

Add context-builder tests for replanning payload forwarding and
update orchestrator replanning tests to assert completedTasks.
…#104)

Replace the incorrect completed/blocked marking of replanned parents
with a dedicated "replanned" state that properly handles dependencies
and interruption recovery.

TaskQueue: Add replanned set and subtaskMap. getReady() skips
replanned tasks. isComplete() treats them as terminal.
isDepSatisfied() resolves a replanned dependency only when ALL of
its replacement sub-tasks are completed or blocked, so downstream
tasks correctly wait for sub-task completion.
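The sketch below is a minimal, hypothetical TaskQueue that models the replanned state described above:

- getReady() skips replanned parents.
- isComplete() treats them as terminal.
- isDepSatisfied() waits on every replacement sub-task.

Method names follow the commit message. The internals are an assumption and are not the code in execution/task-queue.ts. The sketch also assumes that callers copy the parent's dependencies onto its sub-tasks.

```typescript
interface QueueTask { id: string; dependsOn: string[] }

class TaskQueue {
  private tasks = new Map<string, QueueTask>();
  private completed = new Set<string>();
  private blocked = new Set<string>();
  private replanned = new Set<string>();
  private subtaskMap = new Map<string, string[]>(); // parent ID -> sub-task IDs

  constructor(tasks: QueueTask[]) {
    for (const t of tasks) this.tasks.set(t.id, t);
  }

  complete(id: string): void { this.completed.add(id); }
  block(id: string): void { this.blocked.add(id); }

  replaceWithSubtasks(parentId: string, subtasks: QueueTask[]): void {
    this.replanned.add(parentId);
    this.subtaskMap.set(parentId, subtasks.map((s) => s.id));
    for (const s of subtasks) this.tasks.set(s.id, s);
  }

  private isTerminal(id: string): boolean {
    return this.completed.has(id) || this.blocked.has(id) || this.replanned.has(id);
  }

  private isDepSatisfied(depId: string): boolean {
    const subs = this.subtaskMap.get(depId);
    if (this.replanned.has(depId) && subs) {
      // A replanned dependency resolves only once every replacement
      // sub-task is completed or blocked.
      return subs.every((s) => this.completed.has(s) || this.blocked.has(s));
    }
    return this.completed.has(depId);
  }

  getReady(): string[] {
    return [...this.tasks.values()]
      .filter((t) => !this.isTerminal(t.id)) // replanned parents are never handed out
      .filter((t) => t.dependsOn.every((d) => this.isDepSatisfied(d)))
      .map((t) => t.id);
  }

  isComplete(): boolean {
    return [...this.tasks.keys()].every((id) => this.isTerminal(id));
  }
}
```

Consider a queue where task-002 depends on task-001, and task-001 is replaced by task-001a and task-001b, with task-001b depending on task-001a. In that case, getReady() returns task-001a first, then task-001b, and only then task-002.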

Checkpoint: Add replannedTasks[] field with replanTask() method that
moves the parent out of failedTasks/blockedTasks into replannedTasks.
Backward-compat default for old checkpoints without the field.
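The replanTask() transition and the backward-compatible default can be sketched as pure functions over the checkpoint state. Field names follow the commit message. `loadCheckpoint`, the exact state shape, and the idempotent insert are assumptions.

```typescript
interface ReplanningEvent { parentId: string; subtaskIds: string[] }

interface CheckpointState {
  completedTasks: string[];
  failedTasks: string[];
  blockedTasks: string[];
  replannedTasks: string[];
  replanningEvents: ReplanningEvent[];
}

// Older checkpoints lack replannedTasks (and possibly replanningEvents);
// missing fields default to empty arrays on load.
function loadCheckpoint(raw: Partial<CheckpointState>): CheckpointState {
  return {
    completedTasks: raw.completedTasks ?? [],
    failedTasks: raw.failedTasks ?? [],
    blockedTasks: raw.blockedTasks ?? [],
    replannedTasks: raw.replannedTasks ?? [],
    replanningEvents: raw.replanningEvents ?? [],
  };
}

// Move the parent out of failed/blocked into replanned; repeated calls are no-ops.
function replanTask(state: CheckpointState, parentId: string): CheckpointState {
  return {
    ...state,
    failedTasks: state.failedTasks.filter((id) => id !== parentId),
    blockedTasks: state.blockedTasks.filter((id) => id !== parentId),
    replannedTasks: state.replannedTasks.includes(parentId)
      ? state.replannedTasks
      : [...state.replannedTasks, parentId],
  };
}
```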

Orchestrator: replanTask() calls checkpoint.replanTask() instead of
completeTask(). Resume path replays replanning events, then applies
replannedTasks from checkpoint. Result filtering excludes replanned
parents from failed/blocked lists.

Progress: Track replanned status in progress writer for observability.

Interruption safety: The checkpoint write order is:
1. recordReplanningEvent() — persists parent→subtask mapping
2. replanTask() — marks parent as replanned
On resume, replanning events are replayed first (restoring sub-tasks
into the queue), then replannedTasks are applied. If interrupted
between steps 1 and 2, the parent is still in the original plan and
replaying the recorded event restores its sub-tasks; if interrupted
before step 1, nothing was persisted and the parent retries from
scratch.
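The resume ordering can be modeled as a pure function over task IDs. This is a hypothetical sketch (`resumePlan` and its types are illustrative). It assumes that replaying a replanning event marks its parent as replanned in the queue, as replaceWithSubtasks() does. That assumption is what makes a crash between steps 1 and 2 safe.

```typescript
interface ReplanningEvent { parentId: string; subtaskIds: string[] }

interface ResumeCheckpoint {
  replanningEvents: ReplanningEvent[];
  replannedTasks: string[];
  completedTasks: string[];
}

function resumePlan(originalPlan: string[], cp: ResumeCheckpoint): { pending: string[]; replanned: string[] } {
  const plan = [...originalPlan];
  const replanned = new Set<string>();

  // Step 1: replay events. This restores sub-tasks even if the crash happened
  // before replanTask() persisted the parent's replanned status.
  for (const ev of cp.replanningEvents) {
    for (const id of ev.subtaskIds) {
      if (!plan.includes(id)) plan.push(id);
    }
    replanned.add(ev.parentId);
  }
  // Step 2: apply replannedTasks recorded by replanTask().
  for (const id of cp.replannedTasks) replanned.add(id);

  // Skip replanned parents and sub-tasks completed before the interruption.
  const done = new Set(cp.completedTasks);
  const pending = plan.filter((id) => !replanned.has(id) && !done.has(id));
  return { pending, replanned: [...replanned] };
}
```

Take an original plan of task-001 and task-002, with one recorded event splitting task-001 into task-001a and task-001b. Suppose replannedTasks is empty because of a crash before step 2, and task-001a had already completed. In that case, the sketch resumes with task-002 and task-001b pending.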

Tests: Add 6 checkpoint tests for replannedTasks lifecycle. Update
orchestrator replanning tests to use replanTask().