From c11fbd9abede76b8bcf59310a6aa6d8ce8ff617e Mon Sep 17 00:00:00 2001
From: heggria
Date: Thu, 25 Jun 2026 19:30:16 +0800
Subject: [PATCH 1/5] feat(cache): per-phase structural sub-fingerprint (v3:phasefp)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the whole-flow v2:flowdef cache-key tier with a per-phase structural
sub-fingerprint so editing phase B invalidates only B and its transitive
dependents — independent sibling phase A keeps its cache hit.

phaseFingerprint(def, phaseId) (extensions/flowir/phasefp.ts) hashes the phase
plus its transitive dependsOn ∪ from closure, reusing the vendored
canonicalJson + hashCanonical (byte-identical to overstory's contract). Only
the policy field cache is stripped; every other Phase field is hashed.

Soundness fallback: phaseFingerprint returns undefined (→ caller folds the
whole-flow flowDefHash, preserving pre-M6 behavior) when per-phase invalidation
cannot be statically guaranteed — contextSharing at flow level, any
shareContext phase in the closure, or any flow phase in the closure. Sub-flow
inner phases always use this fallback.

cacheKeys now produces a 4-tier ladder: key (v3:phasefp, write) → v2Key
(v2:flowdef, read-only) → bareKey (bare flowdef, read-only) → legacyKey (no
flowdef, read-only). cachedPhase consults all four read-only on a miss;
recordCache writes only key. This makes the M6 upgrade additive — no
miss-storm for unchanged flows.

phaseFingerprints computed once per run in runTaskflowLayers alongside
flowDefHash, plumbed through RunState + PhaseCacheCtx. Fail-open: any per-phase
error degrades that phase to the whole-flow hash.

Tests: test/cache-phasefp.test.ts (11 tests — soundness gate, determinism,
precise-diff win, transitive propagation, v2 fallback, cross-flow isolation,
shareContext fallback). Updated cache-migration.test.ts (distinct 4-tier keys;
structural-change test now scoped to p's closure) and runtime.test.ts resume
tests to the v3 key shape.
---
 docs/internal/cache-migration.md |  93 ++++++++---
 extensions/flowir/index.ts       |   2 +
 extensions/flowir/phasefp.ts     | 103 ++++++++++++
 extensions/runtime.ts            | 220 ++++++++++++++++++++++---
 extensions/schema.ts             |  31 ++++
 extensions/store.ts              |  16 +-
 test/cache-migration.test.ts     |  36 ++--
 test/cache-phasefp.test.ts       | 274 +++++++++++++++++++++++++++++++
 test/runtime.test.ts             |  19 ++-
 9 files changed, 719 insertions(+), 75 deletions(-)
 create mode 100644 extensions/flowir/phasefp.ts
 create mode 100644 test/cache-phasefp.test.ts

diff --git a/docs/internal/cache-migration.md b/docs/internal/cache-migration.md
index 2b6bf84..ee203fd 100644
--- a/docs/internal/cache-migration.md
+++ b/docs/internal/cache-migration.md
@@ -12,25 +12,55 @@ Before H1, the cache key folded the flow **definition** fingerprint under a bare
 H1 versions the key with a `v2:` prefix and routes the fingerprint through the
 FlowIR compile seam (`compileTaskflowToIR` → `flowDefHash`).
 
-To avoid a one-time miss-storm on upgrade, the runtime consults **three** keys
-on every cross-run lookup, read-only for the legacy tiers.
+M6 replaces the whole-flow `v2:flowdef:` tier with a **per-phase structural
+sub-fingerprint** (`v3:phasefp:`): the hash of a single phase plus its
+transitive dependency closure. Editing phase B now invalidates only B and its
+transitive dependents — independent sibling phase A keeps its cache hit.
 
-## Key shapes (H1)
+To avoid a one-time miss-storm on upgrade, the runtime consults **four** keys
+on every cross-run lookup, read-only for the fallback tiers.
 
-`cacheKeys()` (`extensions/runtime.ts`) returns three keys for a phase:
+## Key shapes (M6)
+
+`cacheKeys()` (`extensions/runtime.ts`) returns four keys for a phase:
 
 | Tier | Shape | Written by | Status |
 |------|-------|-----------|--------|
-| `key` (current) | `flow:` + `v2:flowdef:` + `` + `think/tools/ctx` + fingerprint | H1+ | **read + write** |
+| `key` (current) | `flow:` + `v3:phasefp:` + `` + `think/tools/ctx` + fingerprint | M6+ | **read + write** |
+| `v2Key` | `flow:` + `v2:flowdef:` + … | H1..M5 | **read-only** |
 | `bareKey` | `flow:` + `flowdef:` (bare, unversioned) + … | pre-H1 | **read-only** (removed in v0.1.0) |
 | `legacyKey` | `flow:` + … (flowdef line omitted) | pre-flowDefHash era | **read-only** (removed in v0.1.0) |
 
+### The per-phase sub-fingerprint (`v3:phasefp`)
+
+`phaseFingerprint(def, phaseId)` (`extensions/flowir/phasefp.ts`) hashes the
+phase itself plus its transitive `dependsOn ∪ from` closure, reusing the vendored
+`canonicalJson` + `hashCanonical` (byte-identical to overstory's contract). The
+`cache` policy field is stripped (its sub-fields reach the key via other paths);
+every other `Phase` field is hashed.
+
+**Soundness fallback.** Per-phase invalidation is only sound when a phase's real
+dependencies are fully captured by the static closure. `phaseFingerprint` returns
+`undefined` (→ the caller folds the whole-flow `flowDefHash` instead, preserving
+pre-M6 behavior) when:
+
+- the flow has `contextSharing: true`, OR
+- any phase in the closure (self included) has `shareContext: true`, OR
+- any phase in the closure (self included) has `type: "flow"`.
+
+These are the cases where a phase can read sibling state outside its declared
+deps (Shared Context Tree) or where sub-structure is resolved at runtime
+(`flow`). Sub-flow inner phases always use this fallback (their `phaseFp` is
+absent → `flowDefHash`), so editing one phase inside a sub-flow invalidates all
+sub-flow phases — a known, safe conservatism.
+
 ### Lookup order (`cachedPhase`)
 
 1. within-run resume (`cc.prior.inputHash === keys.key`) — fastest, always allowed.
-2. `store.get(keys.key)` — current v2 entry.
-3. `store.get(keys.bareKey)` — pre-H1 bare entry.
-4. `store.get(keys.legacyKey)` — pre-flowDefHash entry.
+2. `store.get(keys.key)` — current v3 entry.
+3. `store.get(keys.v2Key)` — pre-M6 v2 entry.
+4. `store.get(keys.bareKey)` — pre-H1 bare entry.
+5. `store.get(keys.legacyKey)` — pre-flowDefHash entry.
 
 A hit on **any** tier is restored as a `cacheHit: "cross-run"` result with zero
 usage. The restored `PhaseState.inputHash` is always `keys.key` (the current
@@ -38,37 +68,48 @@ shape), so downstream phases and recompute see a consistent identity.
 
 ### Write policy (`recordCache`)
 
-Only `keys.key` (the current v2 shape) is ever written. Legacy/bare hits are
+Only `keys.key` (the current v3 shape) is ever written. v2/bare/legacy hits are
 **not** write-through: re-storing under the new key would double the cache size
-for no benefit. Legacy/bare entries age out naturally via the 90-day hard cap
+for no benefit. Legacy/bare/v2 entries age out naturally via the 90-day hard cap
 (`DEFAULT_MAX_AGE_MS`) and the LRU cap (`DEFAULT_MAX_ENTRIES`).
 
-## Why three tiers?
-
-- **`v2:flowdef:` (current):** the versioned prefix lets a future genuine
-  overstory compiler advance to `v3:flowIR:` with its own fallback tier,
-  without disturbing v2 entries.
-- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Without the 3rd
-  tier, every existing cross-run entry would silently miss on upgrade — a
-  one-time miss-storm for opt-in cross-run users.
+## Why four tiers?
+
+- **`v3:phasefp:` (current):** the per-phase structural sub-fingerprint enables
+  precise invalidation — editing one phase no longer evicts independent
+  siblings. The versioned prefix lets a future genuine overstory compiler
+  advance to `v4:flowIR:` with its own fallback tier, without disturbing v3.
+- **`v2:flowdef:` (pre-M6):** M5-and-earlier code wrote this whole-flow shape.
+  Without this tier, every existing cross-run entry would silently miss on the
+  M6 upgrade — a one-time miss-storm for opt-in cross-run users.
+- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Retained for
+  completeness.
 - **no-flowdef (pre-flowDefHash):** the very earliest cross-run entries, before
   the flow definition was folded into the key at all. Retained for completeness;
   these are rare.
 
+### Upgrade note (one-time cost)
+
+On the first post-M6 run, if a sibling phase was edited between the last
+pre-M6 run and the upgrade, an *unchanged* independent phase may re-execute
+once: its v2 entry was keyed on the old `flowDefHash`, which no longer matches.
+This is bounded (per-flow, one-time, only when a sibling edit happened) and
+amortized over subsequent runs as v3 entries take over. For unchanged flows the
+v2 tier hits and no re-execution occurs.
+
 ## Retirement
 
-- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers and the `CacheKeys`
-  return to a single `key`. By then all pre-H1 entries will have aged out (90-day
-  hard cap). The `v2:` prefix is retained as the version anchor for the *next*
-  migration.
-- A pre-release verification step: inspect a real `.pi/taskflow/cache/` directory
-  for bare-`flowdef:` entries. If cross-run is confirmed unused in production
-  (opt-in, young), the bare tier can be dropped earlier.
+- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers. By then all pre-H1
+  entries will have aged out (90-day hard cap).
+- **Later:** remove the `v2Key` tier once all pre-M6 entries have aged out.
+- The `v3:` prefix is retained as the version anchor for the *next* migration.
 
 ## See also
 
 - `extensions/flowir/hash.ts` — the vendored overstory hash algorithm.
+- `extensions/flowir/phasefp.ts` — the per-phase structural sub-fingerprint.
 - `extensions/flowir/index.ts` — `compileTaskflowToIR` (the seam that produces
-  `hash` and `meta.declaredDeps`).
+  `hash` and `meta.declaredDeps`) and `phaseFingerprint`.
 - `docs/internal/overstory-convergence-roadmap.md` §3 (M1).
 - `test/cache-migration.test.ts` — the migration contract tests.
+- `test/cache-phasefp.test.ts` — the per-phase sub-fingerprint contract tests.
diff --git a/extensions/flowir/index.ts b/extensions/flowir/index.ts
index f5f8962..e061559 100644
--- a/extensions/flowir/index.ts
+++ b/extensions/flowir/index.ts
@@ -71,3 +71,5 @@ export type {
 	TaskflowIR,
 	TaskflowIRMeta,
 } from "./meta.ts";
+
+export { phaseFingerprint } from "./phasefp.ts";
diff --git a/extensions/flowir/phasefp.ts b/extensions/flowir/phasefp.ts
new file mode 100644
index 0000000..a7f3c46
--- /dev/null
+++ b/extensions/flowir/phasefp.ts
@@ -0,0 +1,103 @@
+/**
+ * Per-phase structural sub-fingerprint (M6).
+ *
+ * `phaseFingerprint` produces a content-addressed hash of ONLY the subset of
+ * the flow definition that can affect a single phase's subagent output: the
+ * phase itself plus its transitive dependency closure. Folding this into the
+ * cross-run cache key (instead of the whole-flow `flowDefHash`) means editing
+ * phase B invalidates only B and its transitive dependents — independent
+ * sibling phase A keeps its cache hit.
+ *
+ * ## Soundness (the fallback gate)
+ *
+ * Per-phase invalidation is only sound when a phase's *real* dependencies are
+ * fully captured by the static `dependsOn ∪ from` closure. Two cases break
+ * that guarantee, so `phaseFingerprint` returns `undefined` for them and the
+ * caller falls back to the whole-flow `flowDefHash` (safe, = pre-M6 behavior):
+ *
+ *   1. **Shared Context Tree** (`def.contextSharing === true` or any closure
+ *      member has `shareContext === true`): a sharing phase can read sibling
+ *      blackboard writes OUTSIDE its declared deps, so the static closure
+ *      under-approximates real reads.
+ *   2. **`flow` phase in the closure** (`type === "flow"`): a `flow` phase's
+ *      sub-structure is resolved at runtime (inline `def`) or from a saved
+ *      flow (`use`) and is not statically visible here. Editing the saved
+ *      sub-flow would not move this phase's sub-fingerprint.
+ *
+ * `cache` (the policy object) is the ONLY field stripped from each phase
+ * before hashing: its sub-fields (`scope`/`ttl`/`fingerprint`) are folded into
+ * the cache key through other paths (`cc.scope` gates the lookup, `cc.ttlMs`
+ * governs expiry, `cc.fingerprint` is in the key tail). Every other `Phase`
+ * field is hashed. `PhaseSchema` uses `additionalProperties: false`, so no
+ * surprise field can be missed.
+ *
+ * Pure + async (Web Crypto via `hashCanonical`). Reuses the vendored
+ * `canonicalJson`/`hashCanonical` (byte-identical to overstory's contract) so
+ * the sub-fingerprint shares one hashing contract with `flowDefHash`. Never
+ * throws — callers wrap in try/catch and degrade to `flowDefHash`.
+ *
+ * @see docs/internal/cache-migration.md (v3:phasefp tier)
+ */
+
+import { transitiveDependencies, type Phase, type Taskflow } from "../schema.ts";
+import { canonicalJson, hashCanonical } from "./hash.ts";
+
+/** Policy field stripped before hashing (its sub-fields reach the key via
+ * `cc.scope` / `cc.ttlMs` / `cc.fingerprint` — folding them here would be
+ * recursive and redundant). This is the ONLY field stripped. */
+const PHASE_FP_STRIP = ["cache"] as const;
+
+/** Clone a phase into a plain record with policy fields removed. */
+function stripPolicy(phase: Phase): Record<string, unknown> {
+	const rec = phase as unknown as Record<string, unknown>;
+	const out: Record<string, unknown> = {};
+	for (const k of Object.keys(rec)) {
+		if ((PHASE_FP_STRIP as readonly string[]).includes(k)) continue;
+		out[k] = rec[k];
+	}
+	return out;
+}
+
+/**
+ * Per-phase structural sub-fingerprint.
+ *
+ * @returns the hex hash, or `undefined` when per-phase soundness cannot be
+ * guaranteed (caller falls back to the whole-flow `flowDefHash`). Never
+ * throws.
+ */
+export async function phaseFingerprint(def: Taskflow, phaseId: string): Promise<string | undefined> {
+	const phases = def.phases as Phase[];
+	const byId = new Map(phases.map((p) => [p.id, p]));
+	const phase = byId.get(phaseId);
+	if (!phase) return undefined;
+
+	// --- Soundness gate: fall back to whole-flow when static closure is unsafe. ---
+	// Flow-wide context sharing enables cross-sibling reads outside declared deps.
+	if (def.contextSharing === true) return undefined;
+
+	const closureIds = transitiveDependencies(phases, phaseId);
+	const closurePhases: Phase[] = [];
+	for (const id of closureIds) {
+		const p = byId.get(id);
+		if (!p) continue; // unknown dep — validation reports elsewhere
+		// Per-phase sharing: this closure member can read sibling blackboard
+		// writes outside its own declared deps.
+		if (p.shareContext === true) return undefined;
+		// A flow phase's sub-structure is runtime/saved-flow-resolved and not
+		// statically visible — editing it would not move the sub-fingerprint.
+		if ((p.type ?? "agent") === "flow") return undefined;
+		closurePhases.push(p);
+	}
+	// The self phase's own sharing/type is part of the closure too.
+	if (phase.shareContext === true) return undefined;
+	if ((phase.type ?? "agent") === "flow") return undefined;
+
+	// --- Build the canonical payload. ---
+	// `deps` is the SORTED transitive closure (self excluded). canonicalJson
+	// sorts OBJECT keys but preserves ARRAY order; `transitiveDependencies`
+	// returns sorted ids, so the payload is independent of dependency walk order.
+ const depsPayload = closurePhases.map((p) => ({ id: p.id, def: stripPolicy(p) })); + const payload = { self: stripPolicy(phase), deps: depsPayload }; + + return hashCanonical(canonicalJson(payload)); +} diff --git a/extensions/runtime.ts b/extensions/runtime.ts index 351b346..49c3eae 100644 --- a/extensions/runtime.ts +++ b/extensions/runtime.ts @@ -20,7 +20,7 @@ import { type Budget, type CacheScope, dependenciesOf, finalPhase, LOOP_DEFAULT_ import { verifyTaskflow } from "./verify.ts"; import { hashInput, newRunId, type PhaseState, type RunState, runsDir } from "./store.ts"; import { CacheStore, resolveFingerprint } from "./cache.ts"; -import { compileTaskflowToIR } from "./flowir/index.ts"; +import { compileTaskflowToIR, phaseFingerprint } from "./flowir/index.ts"; import { computeStaleFrontier, declaredReadMapOfDef, readMapOf } from "./stale.ts"; import { ctxDirFor, drainPendingSpawns, initCtxDir, registerNode, setNodeStatus, type SpawnAssignment } from "./context-store.ts"; import { allocateWorkspace, isWorkspaceKeyword, type Workspace } from "./workspace.ts"; @@ -72,6 +72,55 @@ export interface RuntimeResult { finalOutput: string; ok: boolean; totalUsage: UsageStats; + /** Incremental-reuse summary: how many phases were reused from cache vs. + * freshly executed this run, and the cost the reused work would otherwise + * have incurred (known only for within-run resume; cross-run hits zero + * their usage so their original cost is not recoverable). Optional & + * additive — callers that ignore it are unaffected. */ + reuse?: ReuseSummary; +} + +/** A run's incremental-reuse accounting (see RuntimeResult.reuse). */ +export interface ReuseSummary { + /** Phases that completed by executing a subagent this run. */ + executed: number; + /** Phases served from the within-run resume cache (no new tokens). */ + reusedRunOnly: number; + /** Phases restored from the cross-run store (no new tokens). 
*/ + reusedCrossRun: number; + /** Total phases that reached `done` (executed + reused). */ + done: number; + /** USD the within-run-reused phases would have cost if re-executed (their + * preserved prior usage). Cross-run hits are excluded (cost not recoverable). */ + savedUSD: number; +} + +/** Compute the incremental-reuse summary from a run's terminal phase states. + * Pure, total, never throws. A phase is "reused" iff it carries a `cacheHit` + * marker (set by `cachedPhase` for both within-run resume and cross-run hits). */ +export function summarizeReuse(state: RunState): ReuseSummary { + let executed = 0; + let reusedRunOnly = 0; + let reusedCrossRun = 0; + let savedUSD = 0; + for (const ps of Object.values(state.phases)) { + if (ps.status !== "done") continue; + if (ps.cacheHit === "run-only") { + reusedRunOnly++; + savedUSD += ps.usage?.cost ?? 0; // within-run resume preserves prior usage + } else if (ps.cacheHit === "cross-run") { + reusedCrossRun++; // cross-run hits zero their usage — cost not recoverable + } else { + executed++; + } + } + return { + executed, + reusedRunOnly, + reusedCrossRun, + done: executed + reusedRunOnly + reusedCrossRun, + savedUSD, + }; } function buildInterpolationContext( @@ -721,6 +770,7 @@ async function executePhaseInner( flowName: state.flowName, runId: state.runId, flowDefHash: state.flowDefHash === "failed" ? undefined : state.flowDefHash, + phaseFp: state.phaseFingerprints?.[phase.id], forceRerun: opts?.forceRerun, thinking: phase.thinking, tools: phase.tools, @@ -1635,6 +1685,12 @@ export interface PhaseCacheCtx { * key so two structurally-different flows that share a name can never * collide, and a changed flow never serves a stale cross-run hit. */ flowDefHash?: string | "failed"; + /** Per-phase structural sub-fingerprint (M6). When present, folds into the + * key as `v3:phasefp:` so editing phase B invalidates only B + its + * transitive dependents. 
When absent (sub-flow inner states, or a phase + * for which per-phase soundness couldn't be guaranteed), `cacheKeys` + * falls back to `flowDefHash` — preserving pre-M6 whole-flow behavior. */ + phaseFp?: string; /** Force this phase to re-execute, ignoring the within-run prior AND the * cross-run store (M5 recompute seed). Downstream phases are NOT forced — * they re-evaluate naturally: if the seed's new output changed their @@ -1646,27 +1702,34 @@ export interface PhaseCacheCtx { /** A computed cache identity: the new (versioned) key plus the read-only * fallback keys used to honor entries written by older releases. The `key` * is what we WRITE under and what `PhaseState.inputHash` carries; the - * `legacyKey`/`bareKey` are consulted READ-ONLY on a miss so an upgrade - * never produces a miss-storm. See docs/internal/cache-migration.md. */ + * `v2Key`/`bareKey`/`legacyKey` are consulted READ-ONLY on a miss so an + * upgrade never produces a miss-storm. See docs/internal/cache-migration.md. */ export interface CacheKeys { - /** Current key: folds `v2:flowdef:` (the overstory content fingerprint). */ + /** Current key: folds `v3:phasefp:` (the per-phase structural + * sub-fingerprint; degrades to the whole-flow hash when per-phase + * soundness couldn't be guaranteed). */ key: string; - /** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */ - legacyKey: string; + /** Pre-M6 key: `v2:flowdef:` (whole-flow fingerprint). + * Read-only. */ + v2Key: string; /** Bare (unversioned) `flowdef:` key — written by pre-H1 code that folded * the hash without a `v2:` prefix. Read-only. Removed in v0.1.0. */ bareKey: string; + /** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */ + legacyKey: string; } /** Fold the phase fingerprint into the base hash parts to form the cache keys. 
* - * Three keys are produced for backward compatibility (see + * Four keys are produced for backward compatibility (see * docs/internal/cache-migration.md): - * - `key` : `v2:flowdef:` — the current write key. + * - `key` : `v3:phasefp:` — the current write key (per-phase + * structural sub-fingerprint; falls back to the whole-flow hash when + * `cc.phaseFp` is absent). + * - `v2Key` : `v2:flowdef:` — pre-M6 whole-flow key. + * - `bareKey` : bare `flowdef:` (unversioned) — pre-H1 entries. * - `legacyKey`: the flowdef line omitted — pre-flowDefHash entries. - * - `bareKey` : bare `flowdef:` (unversioned) — pre-H1 entries that - * folded the hash without the `v2:` prefix. - * `cachedPhase` consults all three READ-ONLY on a miss; `recordCache` writes + * `cachedPhase` consults all four READ-ONLY on a miss; `recordCache` writes * only `key`. This means an upgrade never produces a miss-storm: existing * entries (whichever shape) still hit, and new writes converge on `key`. */ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys { @@ -1682,10 +1745,15 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys { ]; const fold = (parts: string[]): string => cc.fingerprint ? hashInput(...parts, cc.fingerprint) : hashInput(...parts); + // Per-phase sub-fingerprint; falls back to the whole-flow hash when absent + // (sub-flow inner states, or soundness fallback) — preserving pre-M6 behavior. + const fp = cc.phaseFp ?? cc.flowDefHash ?? ""; + const fdh = cc.flowDefHash ?? ""; return { - key: fold([`flow:${cc.flowName}`, `v2:flowdef:${cc.flowDefHash ?? ""}`, ...tail]), + key: fold([`flow:${cc.flowName}`, `v3:phasefp:${fp}`, ...tail]), + v2Key: fold([`flow:${cc.flowName}`, `v2:flowdef:${fdh}`, ...tail]), + bareKey: fold([`flow:${cc.flowName}`, `flowdef:${fdh}`, ...tail]), legacyKey: fold([`flow:${cc.flowName}`, ...tail]), - bareKey: fold([`flow:${cc.flowName}`, `flowdef:${cc.flowDefHash ?? 
""}`, ...tail]), }; } @@ -1696,9 +1764,10 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys { * - "cross-run": within-run first, then the persistent cross-run store. * On a cross-run hit, usage is zeroed and `cacheHit` records the source. * - * The cross-run read is THREE-TIER and READ-ONLY for fallback keys: it tries - * `keys.key` (current `v2:flowdef:` shape) first, then `keys.bareKey` (pre-H1 - * bare `flowdef:`), then `keys.legacyKey` (pre-flowDefHash, no flowdef line). + * The cross-run read is FOUR-TIER and READ-ONLY for fallback keys: it tries + * `keys.key` (current `v3:phasefp:` shape) first, then `keys.v2Key` (pre-M6 + * `v2:flowdef:`), then `keys.bareKey` (pre-H1 bare `flowdef:`), then + * `keys.legacyKey` (pre-flowDefHash, no flowdef line). * A hit on ANY tier is restored as a cache hit; we do NOT write-through (no * re-store under the new key) so the cache size stays stable and the legacy * entry ages out naturally. See docs/internal/cache-migration.md. @@ -1707,14 +1776,17 @@ function cachedPhase(cc: PhaseCacheCtx, keys: CacheKeys): PhaseState | null { if (cc.scope === "off") return null; if (cc.forceRerun) return null; - // 1. within-run resume (fastest; always allowed unless scope is off) + // 1. within-run resume (fastest; always allowed unless scope is off). Flag + // it as a `run-only` cache hit so the run summary can count it as reused + // work (it spent no new tokens). The prior usage is preserved verbatim so + // the summary can report what the reuse would otherwise have cost. if (cc.prior && cc.prior.status === "done" && cc.prior.inputHash === keys.key) { - return { ...cc.prior, status: "done" }; + return { ...cc.prior, status: "done", cacheHit: "run-only" }; } - // 2. cross-run memoization (opt-in) — three-tier read-only fallback. + // 2. cross-run memoization (opt-in) — four-tier read-only fallback. 
if (cc.scope === "cross-run") { - for (const k of [keys.key, keys.bareKey, keys.legacyKey]) { + for (const k of [keys.key, keys.v2Key, keys.bareKey, keys.legacyKey]) { const e = cc.store.get(k, cc.ttlMs); if (!e) continue; // If we stored the full PhaseState, restore it (preserving gate, @@ -1895,6 +1967,22 @@ export interface RecomputeReport { /** Phases in the frontier whose inputHash did NOT move → cached result * reused, no re-execution (early cutoff). Empty in dry-run (unknowable). */ readonly cutoff: readonly string[]; + /** Per-phase decision trace: WHY each phase was rerun / cut off / reused. + * The "explainable reactivity" layer — like React DevTools telling you why + * a component re-rendered. Additive; callers that ignore it are unaffected. */ + readonly decisions: readonly RecomputeDecision[]; +} + +/** Why a single phase landed in its recompute outcome. */ +export interface RecomputeDecision { + readonly phaseId: string; + /** What happened (real run) or would happen (dry-run). */ + readonly outcome: "rerun" | "cutoff" | "reused" | "failed"; + /** Human-readable cause. */ + readonly reason: string; + /** The upstream phase(s) that caused this outcome, when applicable + * (e.g. the changed upstreams that forced a rerun). */ + readonly causedBy?: readonly string[]; } /** Scan a flow for dependencies that cannot be observed through the readSet. @@ -1946,6 +2034,30 @@ export async function recomputeTaskflow( const allIds = Object.keys(newState.phases); if (opts.dryRun) { + // Explain each phase WITHOUT executing: a frontier phase "may rerun" + // because it (transitively) reads a changed seed; everything else is + // reused as unreachable. We name the in-frontier upstream(s) as the cause. + const seedSet0 = new Set(seeds); + const upstreamsOf = (id: string): string[] => { + const observed = (newState.phases[id]?.reads ?? []).map((r) => r.stepId).filter((u) => u !== id); + const decl = (declared.get(id) ?? 
[]).filter((u) => u !== id); + return [...new Set([...observed, ...decl])]; + }; + const decisions: RecomputeDecision[] = allIds.map((id) => { + if (!frontier.has(id)) { + return { phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" }; + } + if (seedSet0.has(id)) { + return { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" }; + } + const causes = upstreamsOf(id).filter((u) => frontier.has(u)); + return { + phaseId: id, + outcome: "rerun", + reason: "reads a phase in the stale frontier; may re-run if that upstream's output moves", + causedBy: causes.length ? causes : undefined, + }; + }); return { report: { dryRun: true, @@ -1954,6 +2066,7 @@ export async function recomputeTaskflow( rerun: [...frontier], reused: allIds.filter((id) => !frontier.has(id)), cutoff: [], + decisions, }, state: newState, }; @@ -2003,6 +2116,11 @@ export async function recomputeTaskflow( .filter((id) => frontier.has(id)); const rerun: string[] = []; const cutoff: string[] = []; + const decisions: RecomputeDecision[] = []; + // Phases whose OUTPUT actually moved this recompute (seed forced, or result + // changed). Used to attribute a downstream rerun to the specific upstream(s) + // that changed — the "why" of the decision trace. + const outputMoved = new Set(); const noop = () => {}; let aborted = false; for (const id of order) { @@ -2015,17 +2133,50 @@ export async function recomputeTaskflow( const phase = newState.def.phases.find((p) => p.id === id); if (!phase) continue; const before = newState.phases[id]?.inputHash; - const execOpts = seedSet.has(id) ? { forceRerun: true } : undefined; + const isSeed = seedSet.has(id); + const execOpts = isSeed ? { forceRerun: true } : undefined; + // The upstream(s) of this phase whose output moved — the cause of a rerun. 
+ const changedUpstreams = depsFor(id).filter((u) => outputMoved.has(u)); try { const ps = await executePhase(phase, newState, deps, newState.phases[id], noop, 0, execOpts); newState.phases[id] = ps; // A phase counts as "rerun" if it was a forced seed OR its result moved; // otherwise it hit its cache (inputHash unchanged) → early cutoff. - if (seedSet.has(id) || ps.inputHash !== before) rerun.push(id); - else cutoff.push(id); + if (isSeed || ps.inputHash !== before) { + rerun.push(id); + outputMoved.add(id); + decisions.push( + isSeed + ? { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" } + : { + phaseId: id, + outcome: "rerun", + reason: "input changed — an upstream's output moved", + causedBy: changedUpstreams.length ? changedUpstreams : undefined, + }, + ); + } else { + cutoff.push(id); + decisions.push({ + phaseId: id, + outcome: "cutoff", + reason: "input unchanged — upstream(s) re-ran but produced identical output (early cutoff)", + causedBy: depsFor(id).filter((u) => frontier.has(u)).length + ? depsFor(id).filter((u) => frontier.has(u)) + : undefined, + }); + } } catch { // A failing recompute phase is recorded as rerun (it was attempted). rerun.push(id); + outputMoved.add(id); + decisions.push({ phaseId: id, outcome: "failed", reason: "re-execution attempted but the phase failed" }); + } + } + // Frontier-external phases were never touched — record them as reused. + for (const id of allIds) { + if (!frontier.has(id)) { + decisions.push({ phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" }); } } return { @@ -2036,6 +2187,7 @@ export async function recomputeTaskflow( rerun, reused: allIds.filter((id) => !frontier.has(id)), cutoff, + decisions, }, state: newState, }; @@ -2099,6 +2251,27 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise = {}; + for (const p of def.phases) { + try { + map[p.id] = (await phaseFingerprint(def, p.id)) ?? 
whole; + } catch { + map[p.id] = whole; // fail-open → whole-flow scope + } + } + state.phaseFingerprints = map; + } + state.status = "running"; safeEmit(deps, state); @@ -2238,5 +2411,6 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise [p.id, p])); + const seen = new Set(); + const queue: string[] = []; + const seed = byId.get(phaseId); + if (seed) for (const d of dependenciesOf(seed)) queue.push(d); + while (queue.length) { + const id = queue.shift()!; + if (seen.has(id)) continue; + if (!byId.has(id)) continue; // unknown dep — validation reports elsewhere + seen.add(id); + const dep = byId.get(id)!; + for (const d of dependenciesOf(dep)) { + if (!seen.has(d)) queue.push(d); + } + } + return Array.from(seen).sort(); +} + /** Topologically ordered layers; phases in the same layer can run concurrently. */ export function topoLayers(phases: Phase[]): Phase[][] { const byId = new Map(phases.map((p) => [p.id, p])); diff --git a/extensions/store.ts b/extensions/store.ts index aa464d1..881f2e3 100644 --- a/extensions/store.ts +++ b/extensions/store.ts @@ -42,10 +42,11 @@ export interface PhaseState { model?: string; error?: string; inputHash?: string; - /** When this result was served from cache: 'cross-run' for the persistent - * cross-run store. (Within-run resume reuses prior state verbatim and is not - * flagged here.) */ - cacheHit?: "cross-run"; + /** When this result was served from cache instead of executed: + * 'cross-run' = restored from the persistent cross-run store; + * 'run-only' = within-run resume (a prior attempt with the same inputHash). + * A phase with this set spent no new tokens this run. */ + cacheHit?: "cross-run" | "run-only"; startedAt?: number; endedAt?: number; /** Live fan-out progress for map/parallel phases. */ @@ -114,6 +115,13 @@ export interface RunState { * recompute derives this fresh from `def` so old runs (pre-H1) also get * union semantics. 
*/ declaredDeps?: Record; + /** Per-phase structural sub-fingerprints (M6). Computed once per run + * alongside `flowDefHash`. Each value is either a precise per-phase hash + * (when sound) or the whole-flow `flowDefHash` (fallback for + * shareContext / `flow` phases). Folded into the cross-run cache key as + * `v3:phasefp:` so editing phase B invalidates only B + its + * transitive dependents. Audit/resume only — recompute derives fresh. */ + phaseFingerprints?: Record; } // --------------------------------------------------------------------------- diff --git a/test/cache-migration.test.ts b/test/cache-migration.test.ts index d4182a5..1e6d276 100644 --- a/test/cache-migration.test.ts +++ b/test/cache-migration.test.ts @@ -49,11 +49,14 @@ function countingRunner(counter: { n: number }): RuntimeDeps["runTask"] { } /** Build a minimal PhaseCacheCtx matching what executeTaskflow constructs for - * a cross-run agent phase, so we can compute the exact legacy/bare keys to - * pre-seed. Derives flowDefHash by running compileTaskflowToIR once. */ + * a cross-run agent phase, so we can compute the exact legacy/bare/v2 keys to + * pre-seed. Derives flowDefHash + per-phase sub-fingerprint by running + * compileTaskflowToIR + phaseFingerprint once (mirrors the runtime). */ async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: string): Promise { - const { compileTaskflowToIR } = await import("../extensions/flowir/index.ts"); + const { compileTaskflowToIR, phaseFingerprint } = await import("../extensions/flowir/index.ts"); const ir = await compileTaskflowToIR(def); + const fdh = ir.hash; + const subfp = (await phaseFingerprint(def, phaseId)) ?? fdh ?? 
""; return { scope: "cross-run", fingerprint: "", @@ -62,7 +65,8 @@ async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: str phaseId, flowName: def.name, runId: "seed", - flowDefHash: ir.hash, + flowDefHash: fdh, + phaseFp: subfp, }; } @@ -70,7 +74,7 @@ async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: str // Key shape: new key uses v2:flowdef prefix; legacy/bare differ. // --------------------------------------------------------------------------- -test("cacheKeys: key, legacyKey, bareKey are all distinct", async () => { +test("cacheKeys: key, v2Key, bareKey, legacyKey are all distinct (M6 4-tier)", async () => { const dir = tmpDir(); const store = new CacheStore(dir); const def: Taskflow = { @@ -80,10 +84,13 @@ test("cacheKeys: key, legacyKey, bareKey are all distinct", async () => { const cc = await ccFor(def, dir, store, "p"); // baseParts must match what the agent branch uses: [phase.id, agentName, model, fullTask] const ck = cacheKeys(cc, ["p", "a", "", "fixed"]); - assert.ok(ck.key !== ck.legacyKey, "v2 key differs from legacy (no-flowdef)"); - assert.ok(ck.key !== ck.bareKey, "v2 key differs from bare (unversioned flowdef)"); - assert.ok(ck.legacyKey !== ck.bareKey, "legacy differs from bare"); - assert.match(ck.key, /^[0-9a-f]+$/); + assert.ok(ck.key !== ck.v2Key, "v3 key differs from v2 (per-phase subfp vs whole-flow)"); + assert.ok(ck.key !== ck.bareKey, "v3 key differs from bare (unversioned flowdef)"); + assert.ok(ck.key !== ck.legacyKey, "v3 key differs from legacy (no-flowdef)"); + assert.ok(ck.v2Key !== ck.bareKey, "v2 differs from bare"); + assert.ok(ck.v2Key !== ck.legacyKey, "v2 differs from legacy"); + assert.ok(ck.bareKey !== ck.legacyKey, "bare differs from legacy"); + assert.match(ck.key, /^[0-9a-f]+$/); // all four are hashInput hex digests fs.rmSync(dir, { recursive: true, force: true }); }); @@ -221,11 +228,14 @@ test("cache migration: identical re-run is free (v2 write round-trips)", 
async ( test("cache migration: structural change invalidates (flowdef hash differs)", async () => { const dir = tmpDir(); const store = new CacheStore(dir); + // M6: only a structural change WITHIN a phase's transitive closure + // invalidates it. Adding an unrelated independent phase must NOT. So `q` + // is made a dependency of `p` — adding it moves p's sub-fingerprint. const mk = (extra: boolean): Taskflow => ({ name: "struct-change", phases: extra ? [ - { id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true }, + { id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, dependsOn: ["q"], final: true }, { id: "q", type: "agent", agent: "a", task: "extra" }, ] : [{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true }], @@ -235,10 +245,10 @@ test("cache migration: structural change invalidates (flowdef hash differs)", as await executeTaskflow(mkState(mk(false), dir), deps); assert.equal(counter.n, 1); - // Different structure (extra phase) → different flowDefHash → different v2 key → miss. - // (q also runs, so counter increments by 2.) + // Adding `q` (now in p's closure) → p's sub-fingerprint changes → v3 key + // differs → miss. (q also runs, so counter increments by 2.) 
await executeTaskflow(mkState(mk(true), dir), deps); - assert.equal(counter.n, 3, "structural change → miss on p (and q runs)"); + assert.equal(counter.n, 3, "structural change in p's closure → miss on p (and q runs)"); fs.rmSync(dir, { recursive: true, force: true }); }); diff --git a/test/cache-phasefp.test.ts b/test/cache-phasefp.test.ts new file mode 100644 index 0000000..ce23446 --- /dev/null +++ b/test/cache-phasefp.test.ts @@ -0,0 +1,274 @@ +import assert from "node:assert/strict"; +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { test } from "node:test"; +import type { AgentConfig } from "../extensions/agents.ts"; +import { CacheStore } from "../extensions/cache.ts"; +import { phaseFingerprint } from "../extensions/flowir/index.ts"; +import { executeTaskflow, cacheKeys, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts"; +import type { RunResult, RunOptions } from "../extensions/runner.ts"; +import type { Taskflow } from "../extensions/schema.ts"; +import type { RunState } from "../extensions/store.ts"; +import { emptyUsage } from "../extensions/usage.ts"; + +// --------------------------------------------------------------------------- +// helpers (minimal set, mirroring test/cache.test.ts) +// --------------------------------------------------------------------------- + +const AGENTS: AgentConfig[] = [ + { name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" }, +]; + +function tmpDir(): string { + return fs.mkdtempSync(path.join(os.tmpdir(), "tf-phasefp-")); +} + +function mkState(def: Taskflow, cwd: string): RunState { + return { + runId: `run-${Math.random().toString(36).slice(2, 8)}`, + flowName: def.name, + def, + args: {}, + status: "running", + phases: {}, + createdAt: Date.now(), + updatedAt: Date.now(), + cwd, + }; +} + +function countingRunner(counter: { n: number }): RuntimeDeps["runTask"] { + return async (_cwd, _agents, agentName, 
task, _o: RunOptions): Promise<RunResult> => { + counter.n++; + return { + agent: agentName, + task, + exitCode: 0, + output: `out:${task}#${counter.n}`, + stderr: "", + usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 }, + stopReason: "end", + }; + }; +} + +// =========================================================================== +// Unit tests for phaseFingerprint (soundness gate + determinism) +// =========================================================================== + +test("phaseFingerprint: returns undefined when def.contextSharing is true (soundness gate)", async () => { + const def: Taskflow = { + name: "sharing-flow", + contextSharing: true, + phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true }], + }; + assert.equal(await phaseFingerprint(def, "p"), undefined); +}); + +test("phaseFingerprint: returns undefined when a closure member has shareContext", async () => { + const def: Taskflow = { + name: "sharing-closure", + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan", shareContext: true }, + { id: "p", type: "agent", agent: "a", task: "use {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true }, + ], + }; + // p transitively depends on scout (shareContext) → fallback. + assert.equal(await phaseFingerprint(def, "p"), undefined); + // scout itself has shareContext → fallback. + assert.equal(await phaseFingerprint(def, "scout"), undefined); +}); + +test("phaseFingerprint: returns undefined when a closure member is a flow phase", async () => { + const def: Taskflow = { + name: "flow-closure", + phases: [ + { id: "sub", type: "flow", use: "some-saved-flow" }, + { id: "p", type: "agent", agent: "a", task: "use {steps.sub.output}", dependsOn: ["sub"], cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + // p transitively depends on a flow phase → fallback. 
+ assert.equal(await phaseFingerprint(def, "p"), undefined); + // the flow phase itself → fallback. + assert.equal(await phaseFingerprint(def, "sub"), undefined); +}); + +test("phaseFingerprint: deterministic + changes when an included field changes", async () => { + const mk = (task: string): Taskflow => ({ + name: "det", + phases: [{ id: "p", type: "agent", agent: "a", task, cache: { scope: "cross-run" }, final: true }], + }); + const a1 = await phaseFingerprint(mk("t1"), "p"); + const a2 = await phaseFingerprint(mk("t1"), "p"); + const b = await phaseFingerprint(mk("t2"), "p"); + assert.equal(a1, a2, "stable across calls"); + assert.notEqual(a1, b, "changes when task text changes"); + assert.match(a1!, /^[0-9a-f]+$/); +}); + +test("phaseFingerprint: cache policy field does NOT affect the sub-fingerprint", async () => { + // cache.scope/ttl/fingerprint reach the key via other paths; the sub-fingerprint + // must be invariant to them (otherwise changing the TTL would perturb the + // structural hash instead of acting only through the dedicated expiry path). 
+ const mk = (cache: Taskflow["phases"][number]["cache"]): Taskflow => ({ + name: "policy-inv", + phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache, final: true }], + }); + const a = await phaseFingerprint(mk({ scope: "cross-run" }), "p"); + const b = await phaseFingerprint(mk({ scope: "cross-run", ttl: "30m" }), "p"); + const c = await phaseFingerprint(mk({ scope: "cross-run", fingerprint: ["file:x"] }), "p"); + assert.equal(a, b); + assert.equal(a, c); +}); + +test("phaseFingerprint: adding an independent phase does NOT move a phase's sub-fingerprint", async () => { + const base: Taskflow = { + name: "indep", + phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true }], + }; + const withExtra: Taskflow = { + name: "indep", + phases: [ + { id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true }, + { id: "q", type: "agent", agent: "a", task: "extra" }, + ], + }; + // q is NOT in p's closure → p's sub-fingerprint is unchanged. 
+ assert.equal(await phaseFingerprint(base, "p"), await phaseFingerprint(withExtra, "p")); +}); + +// =========================================================================== +// Integration tests through the runtime (the Test Matrix) +// =========================================================================== + +test("phasefp: editing phase B does NOT invalidate independent phase A", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const mk = (bTask: string): Taskflow => ({ + name: "indep-edit", + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan", cache: { scope: "cross-run" } }, + { id: "A", type: "agent", agent: "a", task: "A uses {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" } }, + { id: "B", type: "agent", agent: "a", task: bTask, dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true }, + ], + }); + const counter = { n: 0 }; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk("B original"), dir), deps); + assert.equal(counter.n, 3, "scout + A + B run once"); + // Edit ONLY B's task text. scout + A are unaffected (their closures don't include B). 
+ const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps); + assert.equal(counter.n, 4, "only B re-runs; scout + A hit"); + assert.equal(r2.state.phases.scout.cacheHit, "cross-run"); + assert.equal(r2.state.phases.A.cacheHit, "cross-run"); + assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("phasefp: editing phase B invalidates B and its transitive dependents", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const mk = (bTask: string): Taskflow => ({ + name: "transitive", + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan", cache: { scope: "cross-run" } }, + { id: "B", type: "agent", agent: "a", task: bTask, dependsOn: ["scout"], cache: { scope: "cross-run" } }, + { id: "C", type: "agent", agent: "a", task: "C uses {steps.B.output}", dependsOn: ["B"], cache: { scope: "cross-run" } }, + { id: "A", type: "agent", agent: "a", task: "A uses {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true }, + ], + }); + const counter = { n: 0 }; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk("B original"), dir), deps); + assert.equal(counter.n, 4, "scout + B + C + A run once"); + // Edit B's task. B's closure changes → B misses. C depends on B → C's closure + // (which includes B) changes → C misses. scout + A are unaffected. 
+ const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps); + assert.equal(counter.n, 6, "B + C re-run; scout + A hit"); + assert.equal(r2.state.phases.scout.cacheHit, "cross-run"); + assert.equal(r2.state.phases.A.cacheHit, "cross-run", "A independent of B → hit"); + assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed"); + assert.equal(r2.state.phases.C.cacheHit, undefined, "C (transitive dependent) missed"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("phasefp: pre-v3 (v2) entry still hits — no miss-storm", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const def: Taskflow = { + name: "v2-fallback", + phases: [{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true }], + }; + // Compute the v2 key the runtime will look up, and pre-seed it. + const { compileTaskflowToIR } = await import("../extensions/flowir/index.ts"); + const ir = await compileTaskflowToIR(def); + const cc: PhaseCacheCtx = { + scope: "cross-run", fingerprint: "", store, prior: undefined, + phaseId: "p", flowName: def.name, runId: "old", + flowDefHash: ir.hash, phaseFp: (await phaseFingerprint(def, "p")) ?? 
ir.hash, + thinking: undefined, tools: undefined, preRead: "", + }; + const ck = cacheKeys(cc, ["p", "a", "", "fixed"]); + store.put({ key: ck.v2Key, createdAt: Date.now(), output: "V2-OUTPUT", model: "v2-model", state: undefined, flowName: def.name, phaseId: "p", runId: "old" }); + + const counter = { n: 0 }; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + const r = await executeTaskflow(mkState(def, dir), deps); + assert.equal(counter.n, 0, "v2 entry must hit via fallback — no execution"); + assert.equal(r.state.phases.p.cacheHit, "cross-run"); + assert.equal(r.state.phases.p.output, "V2-OUTPUT"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("phasefp: two structurally-different flows do not collide", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const mk = (extra: boolean): Taskflow => ({ + name: "collide", + phases: extra + ? [ + { id: "p", type: "agent", agent: "a", task: "same", cache: { scope: "cross-run" }, dependsOn: ["q"], final: true }, + { id: "q", type: "agent", agent: "a", task: "extra" }, + ] + : [{ id: "p", type: "agent", agent: "a", task: "same", cache: { scope: "cross-run" }, final: true }], + }); + const counter = { n: 0 }; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk(false), dir), deps); + assert.equal(counter.n, 1); + // Same name + phaseId + task, but p's closure differs (q added as a dep) → + // different sub-fingerprint → no cross-flow collision. 
+ await executeTaskflow(mkState(mk(true), dir), deps); + assert.equal(counter.n, 3, "p misses (closure changed) and q runs"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("phasefp: shareContext falls back to whole-flow invalidation", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const mk = (bTask: string): Taskflow => ({ + name: "sharing-fallback", + contextSharing: true, + phases: [ + { id: "A", type: "agent", agent: "a", task: "A", cache: { scope: "cross-run" } }, + { id: "B", type: "agent", agent: "a", task: bTask, cache: { scope: "cross-run" }, final: true }, + ], + }); + const counter = { n: 0 }; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk("B original"), dir), deps); + assert.equal(counter.n, 2, "A + B run once"); + // With contextSharing, per-phase soundness cannot be guaranteed → both + // phases fall back to the whole-flow flowDefHash. Editing B moves the + // whole-flow hash → A ALSO misses (whole-flow invalidation, not per-phase). + const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps); + assert.equal(counter.n, 4, "both A and B re-run — whole-flow hash moved"); + assert.equal(r2.state.phases.A.cacheHit, undefined, "A NOT reused — fallback to whole-flow"); + assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)"); + fs.rmSync(dir, { recursive: true, force: true }); +}); diff --git a/test/runtime.test.ts b/test/runtime.test.ts index c1a9840..663f493 100644 --- a/test/runtime.test.ts +++ b/test/runtime.test.ts @@ -259,14 +259,14 @@ test("runtime: resume skips cached completed phases", async () => { const state = mkState(def); // Pre-seed phase one as already done with the matching input hash. 
const { hashInput } = await import("../extensions/store.ts"); - const { flowDefHash } = await import("../extensions/flowir/hash.ts"); - const fh = await flowDefHash(def); + const { phaseFingerprint } = await import("../extensions/flowir/index.ts"); + const subfpOne = (await phaseFingerprint(def, "one")) ?? ""; state.phases.one = { id: "one", status: "done", output: "out:start", - // Must match runtime cacheKey(): flow name + flowDefHash + base parts + thinking + tools + ctx. - inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "one", "a", "", "start", "think:", "tools:[]", "ctx:"), + // Must match runtime cacheKey(): flow name + v3:phasefp sub-fingerprint + base parts + thinking + tools + ctx. + inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpOne}`, "one", "a", "", "start", "think:", "tools:[]", "ctx:"), usage: emptyUsage(), }; @@ -287,16 +287,17 @@ test("runtime: resume caches a completed reduce phase (unified inputHash)", asyn const record: string[] = []; const runner = mockRunner((t) => `o:${t}`, { record }); const { hashInput } = await import("../extensions/store.ts"); - const { flowDefHash } = await import("../extensions/flowir/hash.ts"); - const fh = await flowDefHash(def); + const { phaseFingerprint } = await import("../extensions/flowir/index.ts"); + const subfpX = (await phaseFingerprint(def, "x")) ?? ""; + const subfpSum = (await phaseFingerprint(def, "sum")) ?? ""; const state = mkState(def); - state.phases.x = { id: "x", status: "done", output: "o:tx", inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "x", "a", "", "tx", "think:", "tools:[]", "ctx:"), usage: emptyUsage() }; - // reduce cache key has the same shape as agent/gate (flow + flowDefHash + base parts + thinking + tools). 
+ state.phases.x = { id: "x", status: "done", output: "o:tx", inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpX}`, "x", "a", "", "tx", "think:", "tools:[]", "ctx:"), usage: emptyUsage() }; + // reduce cache key has the same shape as agent/gate (flow + v3:phasefp + base parts + thinking + tools). state.phases.sum = { id: "sum", status: "done", output: "o:combine o:tx", - inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "sum", "a", "", "combine o:tx", "think:", "tools:[]", "ctx:"), + inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpSum}`, "sum", "a", "", "combine o:tx", "think:", "tools:[]", "ctx:"), usage: emptyUsage(), }; const res = await executeTaskflow(state, baseDeps(runner));
From 31b2d49c49c834b18aaa599f876906cc57ad8c1e Mon Sep 17 00:00:00 2001
From: heggria
Date: Thu, 25 Jun 2026 20:35:16 +0800
Subject: [PATCH 2/5] feat(cache): per-item cross-run caching for map phases
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add per-item cross-run memoization to the map phase so that when one of N
items changes between runs, only that item re-executes (N-1 cache hits),
while preserving the existing whole-map fast path and all soundness
fallbacks.

Mechanism:
- runFanout accepts an optional perItem hook. Before spawning a subagent
  for an item, it consults cachedPhase with a per-item key; a hit returns a
  0-token synthesized RunResult (stopReason "cache-hit") that flows through
  mergePhaseState as a normal successful item. Successful fresh items are
  recorded per-item for future runs.
- Per-item keys fold [phase.id, it.agent, model, it.task] + the existing
  v3:phasefp/flowName/fingerprint/thinking/tools/preRead tail. Folding
  it.agent (arbiter fix) prevents a stale cross-agent hit when only
  phase.agent changes.
- Whole-map lookup stays first (fast path); per-item engages only on a
  whole-map miss. A trailing whole-map record keeps the fast path warm.
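
The per-item lookup described above can be sketched roughly as follows.
This is a hypothetical illustration under stated assumptions, not the
runtime code: `itemKey`, `runMap`, and the in-memory `Map` store are
invented names, and a plain JSON string stands in for the digest the
runtime derives from the same parts.

```typescript
// Hypothetical stand-ins for the runtime's item and result shapes.
type Item = { agent: string; task: string };
type ItemResult = { output: string; cacheHit: boolean };

// Illustrative per-item key: folds the phase sub-fingerprint, phase id,
// agent, model, and resolved task. The real runtime hashes these parts;
// here the JSON string itself is used as the key (an assumption).
function itemKey(phaseId: string, phaseFp: string, model: string, it: Item): string {
  return JSON.stringify([`v3:phasefp:${phaseFp}`, phaseId, it.agent, model, it.task]);
}

// Look each item up before executing it; record fresh results afterwards.
function runMap(
  phaseId: string,
  phaseFp: string,
  items: Item[],
  store: Map<string, string>,
  execute: (it: Item) => string,
): ItemResult[] {
  return items.map((it) => {
    const key = itemKey(phaseId, phaseFp, "", it);
    const hit = store.get(key);
    if (hit !== undefined) return { output: hit, cacheHit: true };
    const output = execute(it);
    store.set(key, output); // only reached for items that actually ran
    return { output, cacheHit: false };
  });
}

const store = new Map<string, string>();
let executed = 0;
const execute = (it: Item): string => {
  executed++;
  return `out:${it.task}`;
};

// Run 1: three items, all executed.
const first = runMap("audit", "fp", [
  { agent: "a", task: "x" },
  { agent: "a", task: "y" },
  { agent: "a", task: "z" },
], store, execute);

// Run 2: only the middle item changed, so only it re-executes.
const second = runMap("audit", "fp", [
  { agent: "a", task: "x" },
  { agent: "a", task: "y2" },
  { agent: "a", task: "z" },
], store, execute);

// Run 3: same task but a different agent, so the key differs and it misses.
const third = runMap("audit", "fp", [{ agent: "b", task: "x" }], store, execute);
```

Because the key includes the agent, the third call misses even though the
task text is unchanged, which is the stale cross-agent hit the arbiter fix
is meant to prevent.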
Soundness gates (per-item caching is disabled and only the whole-map path
is used when any of these holds):
- the scope is not cross-run (run-only/"off" have no persistent store)
- shareContext or flow-wide contextSharing is enabled (items may read
  sibling blackboard writes outside declared deps)
- the map runs inside a runtime-generated sub-flow (def: frame, untrusted)
An undefined phaseFingerprint is NOT a blocker (cacheKeys falls back to
flowDefHash, which is stable for a fixed def).

Correctness:
- merged output labels are positionally aligned with over ([k/N] using
  results.length); budget-skipped items are filtered out, and cache-hit
  items keep their positional slot
- cached items contribute emptyUsage, so the cost of a partial hit is the
  cost of the re-executed items only
- failed and budget-skipped items are never recorded per-item
- fail-open: any cache read/write error degrades to executing the item

Backward-compat: pre-existing whole-map entries (any tier) still hit via
cachedPhase's 4-tier read-only fallback; the whole-map key format is
unchanged.

Tests: new test/cache-peritem.test.ts (11 tests) covering the Test Matrix:
partial reuse, positional alignment, duplicate sharing,
shareContext/def-frame fallbacks, whole-map fast path, revert,
usage/subProgress, failed/skipped non-caching, and the agent-invalidation
arbiter fix.
---
 extensions/runtime.ts      | 117 ++++++++-
 skills/taskflow/SKILL.md   |  48 ++++
 test/cache-peritem.test.ts | 491 +++++++++++++++++++++++++++++++++++++
 3 files changed, 652 insertions(+), 4 deletions(-)
 create mode 100644 test/cache-peritem.test.ts

diff --git a/extensions/runtime.ts b/extensions/runtime.ts index 49c3eae..76c4a99 100644 --- a/extensions/runtime.ts +++ b/extensions/runtime.ts @@ -169,6 +169,31 @@ function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJs }; } +/** + * Synthesize a 0-token `RunResult` from a cached per-item `PhaseState` so a + * cross-run per-item cache hit flows through `mergePhaseState` as a normal + * successful fan-out item. 
`stopReason: "cache-hit"` is NOT in `isFailed`'s + * failure set (only "error"/"aborted"/non-zero exit), so the item counts as + * success. Usage is `emptyUsage()` — a cached item spent no new tokens this + * run, so `mergePhaseState`'s `aggregateUsage` charges nothing for it. + * + * Used only by the `map` per-item cache path (see `runFanout`). Fail-open by + * construction: this is only reached AFTER a successful `cachedPhase` lookup, + * so `ps.output` is always present. + */ +function phaseStateToRunResult(ps: PhaseState, it: { agent: string; task: string }): RunResult { + return { + agent: it.agent, + task: it.task, + exitCode: 0, + output: ps.output ?? "", + stderr: "", + usage: emptyUsage(), + model: ps.model, + stopReason: "cache-hit", + }; +} + /** Convert observed read refs (e.g. "steps.scout.output") into a structured * readSet keyed by upstream phase id, tagging each with the version * (= inputHash) that was current when read. Only `steps.*` refs are upstream @@ -326,12 +351,20 @@ function mergePhaseState( const model = ran.find((r) => r.model !== undefined)?.model; // Combine outputs as a labelled list; also expose a JSON array of outputs. // For failed items, use the error message instead of the useless placeholder. - const combinedText = ran + // Labels are positionally aligned to the ORIGINAL `over` array: we iterate + // over ALL results (including budget-skipped, which are filtered to null) and + // use `results.length` as N, so item k's label reads `[k/N]` matching its + // position in `over` — not its rank among non-skipped items. Per-item cache + // hits (`stopReason: "cache-hit"`) are not budget-skipped, so they keep their + // original positional label. + const combinedText = results .map((r, i) => { - const label = `### [${i + 1}/${ran.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}`; + if (r.stopReason === "budget-skipped") return null; + const label = `### [${i + 1}/${results.length}] ${r.agent}${isFailed(r) ? 
" (failed)" : ""}`; const content = isFailed(r) ? (r.errorMessage || r.stderr || r.output) : r.output; return `${label}\n\n${content}`; }) + .filter((x): x is string => x !== null) .join("\n\n---\n\n"); // Only successful runs feed the parsed JSON array (no error/skip strings). const jsonArray = parseJson ? ran.filter((r) => !isFailed(r)).map((r) => safeParse(r.output) ?? r.output) : undefined; @@ -870,7 +903,14 @@ async function executePhaseInner( const parseJson = phase.output === "json"; // Runs a list of sub-tasks with live fan-out progress + aggregate live usage/activity. - const runFanout = async (items: Array<{ agent: string; task: string }>): Promise => { + // `perItem` (map only) enables per-item cross-run caching: each item is looked + // up in the cache before spawning a subagent, and a successful fresh item is + // recorded so a later run with that item unchanged hits per-item. When + // `perItem` is undefined (parallel, or non-cacheable maps) the path is inert. + const runFanout = async ( + items: Array<{ agent: string; task: string }>, + perItem?: { keyOf: (idx: number) => CacheKeys | null }, + ): Promise => { let done = 0; let running = 0; let failed = 0; @@ -904,6 +944,28 @@ async function executePhaseInner( stopReason: "budget-skipped", } satisfies RunResult; } + // Per-item cross-run cache lookup (map only). A hit synthesizes a 0-token + // RunResult and returns immediately — the item never spawns a subagent and + // never reaches the ctx_spawn drain below (a cached item can't have queued + // new spawns). Fail-open: any error in the lookup path degrades to executing. 
+ if (perItem) { + try { + const ckItem = perItem.keyOf(idx); + if (ckItem) { + const hit = cachedPhase(cc, ckItem); + if (hit) { + done++; + const synth = phaseStateToRunResult(hit, it); + liveUsages[idx] = emptyUsage(); + if (hit.model) latestModel = hit.model; + refresh(); + return synth; + } + } + } catch { + /* fail-open: a cache read error must never sink the item */ + } + } running++; refresh(); if (ctxDir) { @@ -919,6 +981,23 @@ async function executePhaseInner( done++; if (isFailed(r)) failed++; liveUsages[idx] = r.usage; + // Per-item cross-run cache record (map only): persist a successful fresh + // item so a later run with this item unchanged hits per-item instead of + // re-running. Failed and budget-skipped items are never cached (a stale + // failure would be served on the next run). Fail-open: a write error never + // sinks the item — the fresh `r` is already in hand and flows downstream. + if (perItem && !isFailed(r) && r.stopReason !== "budget-skipped") { + try { + const ckItem = perItem.keyOf(idx); + if (ckItem) { + const ccItem: PhaseCacheCtx = { ...cc, phaseId: `${phase.id}#item${idx}` }; + const itemPs = resultToPhaseState(`${phase.id}#item${idx}`, r, ckItem.key, parseJson); + recordCache(ccItem, itemPs); + } + } catch { + /* fail-open: cache write must never sink the item */ + } + } if (ctxDir) { try { const itemNid = nodeIdFor(String(idx)); @@ -1118,12 +1197,42 @@ async function executePhaseInner( task: preRead + interpolate(phase.task ?? "", localCtx).text, }; }); + // Per-item caching is sound ONLY when ALL of: + // - cross-run scope: run-only has no persistent store, so per-item entries + // could never be re-read (no point keying them). + // - no Shared Context Tree (`!sharing`): a sharing map item can read sibling + // blackboard writes OUTSIDE its declared deps, so the per-item key (which + // folds only the item's own task) under-approximates real reads and could + // serve a stale result. Fall back to whole-map. 
+ // - not inside a runtime-generated sub-flow (`def:` frame in the stack): + // such flows are untrusted / possibly non-deterministic, so per-item reuse + // is unsafe. Fall back to whole-map (which still applies breadth caps). + // `undefined phaseFingerprint` is NOT a blocker: `cacheKeys` falls back to + // the whole-flow `flowDefHash`, which is stable across runs for a fixed def, + // so per-item keys for unchanged items remain stable. + const perItemCacheable = + cc.scope === "cross-run" && + !sharing && + !(deps._stack ?? []).some((s) => s.startsWith("def:")); + // Pre-compute per-item CacheKeys once so the lookup and the record path use + // the IDENTICAL key (and share cacheKeys' v3:phasefp + flow-name + + // fingerprint + thinking/tools/preRead contract). The per-item key folds + // `it.agent` (Arbiter fix): a different agent means different output, so a + // per-item key WITHOUT the agent could serve a stale cross-agent hit when + // only `phase.agent` changed (the whole-map key would correctly miss via + // JSON.stringify(tasks), but per-item keys would not). + const perItemKeys: (CacheKeys | null)[] = perItemCacheable + ? tasks.map((it) => cacheKeys(cc, [phase.id, it.agent, phase.model ?? "", it.task])) + : tasks.map(() => null); + const perItem = perItemCacheable + ? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null } + : undefined; const ck = cacheKeys(cc, [phase.id, phase.model ?? 
"", JSON.stringify(tasks)]); const inputHash = ck.key; const cached = cachedPhase(cc, ck); if (cached) return cached; - const results = await runFanout(tasks); + const results = await runFanout(tasks, perItem); const ps = mergePhaseState(phase.id, results, inputHash, parseJson); if (readRefs.length) ps.reads = readRefsToReads(readRefs, state); if (mapTruncated) { diff --git a/skills/taskflow/SKILL.md b/skills/taskflow/SKILL.md index aca991b..cd10531 100644 --- a/skills/taskflow/SKILL.md +++ b/skills/taskflow/SKILL.md @@ -553,6 +553,54 @@ Quick reference: - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default. - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`. +### Per-item map caching (cross-run) + +A `map` phase with `cache: { "scope": "cross-run" }` is cached **per item**, not +just as a whole. When one of N items changes between runs, only that item +re-executes — the other N−1 are served from the cross-run cache for $0. + +```jsonc +{ "id": "audit-each", "type": "map", + "over": "{steps.discover.json.files}", // array from an upstream phase + "task": "audit {item}", + "cache": { "scope": "cross-run" }, // ← enables per-item reuse + "dependsOn": ["discover"], "final": true } +``` + +How it works: + +- The **whole-map** entry is still checked first (fast path): an identical + re-run is a single $0 hit and never enters the fan-out. +- On a whole-map miss, each item is looked up individually before it spawns a + subagent; a hit returns a 0-token synthesized result. Successful fresh items + are recorded so a later run with that item unchanged reuses them. +- Per-item keys fold the item's resolved task **and agent** (so changing + `phase.agent` invalidates every item), plus the phase sub-fingerprint, + `thinking`/`tools`, and any `fingerprint` entries — exactly like a standalone + cross-run phase. 
+ +Automatic fallbacks (per-item caching is disabled and the whole-map path is used): + +- `shareContext: true` on the phase, or flow-wide `contextSharing: true` — a + sharing item can read sibling blackboard writes outside its declared deps, so + the per-item key would under-approximate real reads. +- The map runs **inside a runtime-generated sub-flow** (a `flow { def }` phase + or a `ctx_spawn({subflow})`) — untrusted / possibly non-deterministic. +- `scope: "run-only"` (default) or `"off"` — no persistent store to reuse from. + +Notes & limitations: + +- Duplicate items (identical task + agent) share a single entry — reuse is + content-addressable, not positional. +- Failed items and **budget-skipped** items are never cached, so they always + re-execute on the next run. +- `{steps.<id>.json[k]}` indexes the k-th **successful** item (not the k-th + position in `over`); the merged `output` text, however, IS positionally + aligned with `over` (labels read `[k/N]`). +- Within-run resume of a partially-completed map is not supported (only + fully-completed maps resume within a run); cross-run per-item reuse covers the + common case. + ## Actions - `action: "run"` — run an inline `define` (a one-off DAG) **or** a saved `name` (with optional `args`). Use `define` for an ad-hoc flow; use `name` to invoke something previously saved. Add `detach: true` to run in the background (returns immediately with the runId; poll the store for status). diff --git a/test/cache-peritem.test.ts b/test/cache-peritem.test.ts new file mode 100644 index 0000000..3a34510 --- /dev/null +++ b/test/cache-peritem.test.ts @@ -0,0 +1,491 @@ +/** + * Per-item map caching — the Test Matrix from the approved plan. 
+ * + * These tests pin the behavior of the per-item cross-run cache path added to + * the `map` branch: changing one of N items re-executes only that item, + * merged output stays positionally aligned with `over`, duplicate items share + * an entry, and the soundness fallbacks (shareContext, dynamic sub-flow, + * failed/budget-skipped items) hold. + * + * The realistic shape for per-item reuse is `over: "{args.items}"` with the + * array supplied via run args: the phase DEFINITION (and therefore + * flowDefHash / phaseFp) stays stable across runs, while the RESOLVED array + * changes — so per-item keys for unchanged items remain stable. Changing the + * `over` LITERAL would move the phase's structural fingerprint and invalidate + * every per-item key at once (no partial reuse), which is correct but not the + * scenario per-item caching targets. + */ + +import assert from "node:assert/strict"; +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { test } from "node:test"; +import type { AgentConfig } from "../extensions/agents.ts"; +import { CacheStore } from "../extensions/cache.ts"; +import { phaseFingerprint, compileTaskflowToIR } from "../extensions/flowir/index.ts"; +import { cacheKeys, executeTaskflow, summarizeReuse, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts"; +import type { RunOptions, RunResult } from "../extensions/runner.ts"; +import type { Taskflow } from "../extensions/schema.ts"; +import type { RunState } from "../extensions/store.ts"; +import { emptyUsage } from "../extensions/usage.ts"; + +// --------------------------------------------------------------------------- +// helpers +// --------------------------------------------------------------------------- + +const AGENTS: AgentConfig[] = [ + { name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" }, + { name: "b", description: "test agent b", systemPrompt: "", source: "user", filePath: 
"" }, +]; + +function tmpDir(): string { + return fs.mkdtempSync(path.join(os.tmpdir(), "tf-peritem-")); +} + +function mkState(def: Taskflow, cwd: string, args: Record<string, unknown> = {}): RunState { + return { + runId: `run-${Math.random().toString(36).slice(2, 8)}`, + flowName: def.name, + def, + args, + status: "running", + phases: {}, + createdAt: Date.now(), + updatedAt: Date.now(), + cwd, + }; +} + +/** Counting runner: every call (successful or failed) increments `counter.n`, + * and a successful call emits a deterministic output embedding the task + + * call index, so cache hits (which skip the call) are observable as a missing + * index. `failWhen` lets a test force a specific item to fail. */ +function countingRunner( + counter: { n: number }, + failWhen?: (task: string) => string | null, +): RuntimeDeps["runTask"] { + return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => { + counter.n++; + const fail = failWhen ? failWhen(task) : null; + if (fail) { + return { + agent: agentName, + task, + exitCode: 1, + output: "", + stderr: fail, + usage: { ...emptyUsage(), output: 5, cost: 0.001, turns: 1 }, + stopReason: "error", + errorMessage: fail, + }; + } + return { + agent: agentName, + task, + exitCode: 0, + output: `out:${task}#${counter.n}`, + stderr: "", + usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 }, + stopReason: "end", + }; + }; +} + +// --------------------------------------------------------------------------- +// (a) change 1 of N items re-executes only that item +// --------------------------------------------------------------------------- + +test("per-item: change 1 of N items re-executes only that item", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-change-one", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: 
dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3, "run1 executes all 3 items"); + // Change ONLY item[1] (b -> b2). The phase def is unchanged (over is the + // literal "{args.items}"), so per-item keys for item[0]/item[2] are stable. + const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps); + assert.equal(counter.n, 4, "run2 re-executes only item[1] (3 + 1)"); + assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (not a whole-map hit)"); + + // item[0] and item[2] were served from per-item cache: their outputs match + // run1 verbatim (same call index), proving no re-execution. + assert.match(r2.finalOutput, /out:process a#1\b/, "item[0] reused from per-item cache (call #1)"); + assert.match(r2.finalOutput, /out:process c#3\b/, "item[2] reused from per-item cache (call #3)"); + // item[1] re-executed → fresh call index #4. + assert.match(r2.finalOutput, /out:process b2#4\b/, "item[1] re-executed (call #4)"); + // Sanity: run1's item[1] output is NOT present in run2. + assert.doesNotMatch(r2.finalOutput, /out:process b#2\b/); + // r1 sanity: all three call indices appear. 
+ assert.match(r1.finalOutput, /out:process a#1\b/); + assert.match(r1.finalOutput, /out:process b#2\b/); + assert.match(r1.finalOutput, /out:process c#3\b/); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (b) merged output stays positionally aligned with `over` +// --------------------------------------------------------------------------- + +test("per-item: merged output stays positionally aligned with over (failed item keeps its slot)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-positional", + phases: [ + { id: "m", type: "map", agent: "a", over: '["x","FAIL","y"]', task: "do {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter, (t) => (t.includes("FAIL") ? "boom" : null)), cacheStore: store }; + + const r = await executeTaskflow(mkState(def, dir), deps); + const out = r.finalOutput; + // Labels are positionally aligned to the original `over`: [1/3], [2/3] (failed), [3/3]. + assert.match(out, /### \[1\/3\] a\n\nout:do x#\d/, "item[0] keeps slot 1/3"); + assert.match(out, /### \[2\/3\] a \(failed\)\n\nboom/, "item[1] keeps slot 2/3 and is marked failed"); + assert.match(out, /### \[3\/3\] a\n\nout:do y#\d/, "item[2] keeps slot 3/3"); + // No [1/2] / [2/2] labels (the old non-positional behavior counted only ran items). 
+ assert.doesNotMatch(out, /### \[1\/2\]/); + assert.doesNotMatch(out, /### \[2\/2\]/); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (c) duplicate items share a single cache entry +// --------------------------------------------------------------------------- + +test("per-item: duplicate items share a single cache entry (content-addressable)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-dups", + phases: [ + // concurrency:1 so item[0] records before item[1] looks up (deterministic). + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "do {item}", concurrency: 1, cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + // ["x","x","y"]: two identical tasks ("do x") share one per-item entry. + await executeTaskflow(mkState(def, dir, { items: ["x", "x", "y"] }), deps); + assert.equal(counter.n, 2, "run1: two DISTINCT tasks execute (do x once, do y once); the second do x hits the just-written entry"); + // run2: all three hit (do x + do y already cached). 
+ const r2 = await executeTaskflow(mkState(def, dir, { items: ["x", "x", "y"] }), deps); + assert.equal(counter.n, 2, "run2: all items served from cache (0 new calls)"); + assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map fast path hits on identical re-run"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (d) shareContext map falls back to whole-map caching +// --------------------------------------------------------------------------- + +test("per-item: shareContext map falls back to whole-map (no partial reuse)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-sharectx", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", shareContext: true, cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3, "run1 executes all 3"); + // Change only item[1]. With shareContext, per-item is unsound → disabled. + // Whole-map misses (items changed) → ALL items re-execute (no partial hits). 
+ const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps); + assert.equal(counter.n, 6, "run2 re-executes ALL 3 items (whole-map fallback, no per-item reuse)"); + assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (whole-map missed)"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (e) pre-existing whole-map entry still hits (fast path) +// --------------------------------------------------------------------------- + +test("per-item: whole-map fast path still hits on identical re-run (precedence over per-item)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-fastpath", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3, "run1 seeds whole-map + per-item entries"); + // Identical re-run: whole-map key matches → 1 hit, runFanout never engages. 
+ const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3, "run2 hits the whole-map fast path (0 new calls)"); + assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map hit sets the phase-level cacheHit"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (f) reverting to the original items hits run1's whole-map entry (revert path) +// --------------------------------------------------------------------------- + +test("per-item: reverting to the original items hits the whole-map fast path (run1 entry preserved)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-revert", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3); + // Change item[1] → 1 re-exec, writes a NEW whole-map entry + a new per-item entry. + await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps); + assert.equal(counter.n, 4, "run2: only item[1] re-executes"); + // Revert to original. The whole-map key now matches run1's entry → fast-path hit. 
+ const r3 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 4, "run3: whole-map fast path hits run1's entry (0 new calls)"); + assert.equal(r3.state.phases.m.cacheHit, "cross-run"); + assert.equal(r3.finalOutput, r1.finalOutput, "run3 output matches run1 exactly"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (g) usage + subProgress correct on partial hit +// --------------------------------------------------------------------------- + +test("per-item: partial hit charges only the re-executed item; subProgress reflects all done", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-usage", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3); + // Change item[1] only → 1 re-exec (cost 0.001); items 0+2 are 0-token cache hits. + const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps); + assert.equal(counter.n, 4); + const m = r2.state.phases.m; + assert.equal(m.cacheHit, undefined, "phase executed (partial hit, not whole-map)"); + // Cached items contribute emptyUsage → merged cost is exactly one item's cost. + assert.equal(m.usage?.cost ?? 0, 0.001, "only the re-executed item is charged"); + // subProgress: all 3 items reached done (2 cached + 1 executed), none failed. 
+ assert.equal(m.subProgress?.done, 3, "all 3 items done"); + assert.equal(m.subProgress?.failed, 0, "no failures"); + assert.equal(m.subProgress?.total, 3); + // summarizeReuse: the phase executed (partial hit) → counted as executed, not reused. + const reuse = summarizeReuse(r2.state); + assert.equal(reuse.executed, 1, "the map phase is counted as executed (it ran 1 item)"); + assert.equal(reuse.reusedCrossRun, 0, "no whole-phase cross-run hit on a partial run"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (h) failed item is never cached +// --------------------------------------------------------------------------- + +test("per-item: a failed item is never cached (re-executes on the next run)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-nofail", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const store = new CacheStore(dir); + + // run1: item[1] ("process b") fails. Items 0+2 succeed and are cached per-item. + let counter = { n: 0 }; + let failOn = "b"; + const deps1: RuntimeDeps = { + cwd: dir, agents: AGENTS, cacheStore: store, + runTask: countingRunner(counter, (t) => (t.includes(`process ${failOn}`) ? "boom" : null)), + }; + await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps1); + assert.equal(counter.n, 3, "run1 attempts all 3 (item[1] fails)"); + + // run2: same items, no failures. item[0]/[2] hit per-item; item[1] must + // RE-EXECUTE (its failure was not cached) and now succeeds. 
+ counter = { n: 0 }; + failOn = ""; + const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps2); + assert.equal(counter.n, 1, "run2: only the previously-failed item[1] re-executes; 0+2 hit per-item"); + assert.equal(r2.state.phases.m.status, "done", "all items succeed on run2"); + assert.match(r2.finalOutput, /out:process b#\d/, "item[1] now has a fresh successful output"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (i) budget-skipped item is never cached +// --------------------------------------------------------------------------- + +test("per-item: a budget-skipped item is never recorded as a per-item cache entry", async () => { + const dir = tmpDir(); + // concurrency:1 so the budget guard sees accumulated spend item-by-item. + // maxUSD 0.0015: run1 executes item[0] (0.001) + item[1] (0.001, total 0.002 + // > cap) → item[2] is budget-skipped. We then inspect the cache store DIRECTLY: + // the skipped item must have NO per-item entry (else a later run could serve a + // stale "skipped" result), while the executed items DO have entries. 
+ const def: Taskflow = { + name: "peritem-nobudgetskip", + budget: { maxUSD: 0.0015 }, + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", concurrency: 1, cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const store = new CacheStore(dir); + + let counter = { n: 0 }; + const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps1); + assert.equal(counter.n, 2, "run1: item[0]+item[1] execute, item[2] budget-skipped"); + assert.equal(r1.state.phases.m.budgetTruncated, true, "map was cut short by the budget cap"); + + // Reconstruct the runtime's per-item CacheKeys to inspect the store. + // cc matches what executePhaseInner builds: scope cross-run, no fingerprint, + // empty preRead, and phaseFp = phaseFingerprint(def,"m") ?? flowDefHash. + const ir = await compileTaskflowToIR(def); + const flowDefHash = ir.hash ?? "failed"; + const phaseFp = (await phaseFingerprint(def, "m")) ?? flowDefHash; + const cc: PhaseCacheCtx = { + scope: "cross-run", + fingerprint: "", + store, + prior: undefined, + phaseId: "m", + flowName: def.name, + runId: r1.state.runId, + flowDefHash, + phaseFp, + thinking: undefined, + tools: undefined, + preRead: "", + }; + // Per-item key folds [phase.id, it.agent, model, it.task] (Arbiter fix). 
+ const keyFor = (task: string) => cacheKeys(cc, ["m", "a", "", task]).key; + const keyA = keyFor("process a"); // item[0]: executed → cached + const keyB = keyFor("process b"); // item[1]: executed → cached + const keyC = keyFor("process c"); // item[2]: budget-skipped → NOT cached + + assert.notEqual(store.get(keyA), null, "executed item[0] has a per-item cache entry"); + assert.notEqual(store.get(keyB), null, "executed item[1] has a per-item cache entry"); + assert.equal(store.get(keyC), null, "budget-skipped item[2] has NO per-item cache entry"); + // The skipped item's entry (had it been written) would carry no real output; + // confirm the executed entries carry the real subagent output. + assert.match(store.get(keyA)?.output ?? "", /out:process a#/); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (j) map inside a dynamic sub-flow (def: frame) uses whole-map only +// --------------------------------------------------------------------------- + +test("per-item: map inside a dynamic sub-flow (def: frame) uses whole-map only (no partial reuse)", async () => { + const dir = tmpDir(); + // Top-level flow phase with an inline `def` containing a cross-run map. + // The def-frame in the stack disables per-item caching for the inner map. + // `with.items` interpolates from top-level args so the resolved array can + // change WITHOUT changing the def literal (keeping the sub-flow identity + // stable would otherwise mask the behavior under the flow phase's own cache). 
+ const mk = (): Taskflow => ({ + name: "peritem-defframe", + phases: [ + { + id: "sub", + type: "flow", + agent: "a", + with: { items: "{args.topItems}" }, + cache: { scope: "cross-run" }, + final: true, + def: { + name: "inner", + phases: [ + { id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + }, + }, + ], + }) as Taskflow; + const def = mk(); + const store = new CacheStore(dir); + + let counter = { n: 0 }; + const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + await executeTaskflow(mkState(def, dir, { topItems: '["a","b","c"]' }), deps1); + assert.equal(counter.n, 3, "run1: inner map executes all 3 items"); + + // Identical re-run: the flow phase's whole-map cache hits → inner map is + // not even re-entered → 0 calls. Confirms the flow phase still caches. + counter = { n: 0 }; + const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + const r2 = await executeTaskflow(mkState(def, dir, { topItems: '["a","b","c"]' }), deps2); + assert.equal(counter.n, 0, "run2: flow phase whole-map hit (0 calls)"); + assert.equal(r2.state.phases.sub.cacheHit, "cross-run"); + + // Change ONLY item[1]. The flow phase whole-map misses (subArgs changed) → + // inner map re-enters. Its whole-map also misses (items changed). Because the + // map is inside a def-frame, per-item is DISABLED → ALL 3 items re-execute + // (if per-item were enabled, only item[1] would run → counter.n would be 1). 
+ counter = { n: 0 }; + const deps3: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + const r3 = await executeTaskflow(mkState(def, dir, { topItems: '["a","b2","c"]' }), deps3); + assert.equal(counter.n, 3, "run3: ALL items re-execute (per-item disabled inside def-frame; whole-map fallback)"); + assert.equal(r3.state.phases.sub.cacheHit, undefined, "flow phase missed (items changed)"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (k) Arbiter fix: changing phase.agent invalidates all per-item keys +// --------------------------------------------------------------------------- + +test("per-item: changing phase.agent invalidates every per-item key (no stale cross-agent hit)", async () => { + const dir = tmpDir(); + const mk = (agent: string): Taskflow => ({ + name: "peritem-agent", + phases: [ + { id: "m", type: "map", agent, over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + }) as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + // run1 with agent "a": all items identical (same task text). Seeds per-item + // entries keyed on agent "a". + await executeTaskflow(mkState(mk("a"), dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 3); + // run2: SAME items + SAME task, but agent changed to "b". The per-item key + // folds `it.agent`, so every per-item key differs → no stale cross-agent hit. + // All 3 items re-execute under agent "b". 
+ const r2 = await executeTaskflow(mkState(mk("b"), dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 6, "changing phase.agent must invalidate all per-item keys (3 + 3)"); + assert.equal(r2.state.phases.m.cacheHit, undefined, "whole-map also missed (agent is in JSON.stringify(tasks))"); + // Re-run with agent "b" again → whole-map fast path hits. + const r3 = await executeTaskflow(mkState(mk("b"), dir, { items: ["a", "b", "c"] }), deps); + assert.equal(counter.n, 6, "agent b now cached → 0 new calls"); + assert.equal(r3.state.phases.m.cacheHit, "cross-run"); + fs.rmSync(dir, { recursive: true, force: true }); +}); From fb13128be29f736584414b4aa6864e2d7d77f2d2 Mon Sep 17 00:00:00 2001 From: heggria Date: Fri, 26 Jun 2026 12:16:13 +0800 Subject: [PATCH 3/5] fix(cache): make map per-item keys omit structural fingerprint MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per-item cross-run cache keys for the `map` phase folded both `phaseFp` and `flowDefHash` (via the whole-phase `cc`). Both fingerprints hash the `over` array source, so when a literal or data-derived `over` changed ONE item between runs, EVERY per-item key moved at once — defeating partial reuse (all N items re-executed instead of just the changed one). Fix: build a per-item `ccPerItem` with BOTH `phaseFp` and `flowDefHash` set to `undefined`, and use it only for per-item key construction. A single item's output is fully specified by it.task (template + item/as value + upstream-output refs + args) + it.agent + model + thinking/tools/preRead + the world-state fingerprint; `over` only determines WHICH items exist, not WHAT any item computes. `flowName` is retained for cross-flow collision prevention. The whole-map key keeps the FULL cc (phaseFp + flowDefHash) so its fast path and any pre-existing whole-map entries are unchanged (backward compat). 
The perItem object now carries its own cc so the lookup and record paths in runFanout use the per-item variant consistently. Soundness is preserved: task-template, agent, model, as (via resolved it.task), upstream-output, and world-state changes all still invalidate the correct items. shareContext / def-frame / failed / budget-skipped fallbacks are unchanged. Tests: add a bug-reproduction test (literal over, change 1 of N items) that FAILS before the fix (counter 3 to 6) and PASSES after (3 to 4), plus literal-over soundness variants (task/agent/upstream change imply full re-exec) and whole-map fast-path + partial-hit + failed-item de-masking variants. Update the budget-skipped test key reconstruction to use the per-item cc (fingerprints omitted). Fix the e2e incremental suite map section (add output: json so the merged-output assertion holds). --- extensions/runtime.ts | 45 ++++-- test/cache-peritem.test.ts | 272 +++++++++++++++++++++++++++++++-- test/e2e-incremental-suite.mts | 258 +++++++++++++++++++++++++++++++ 3 files changed, 551 insertions(+), 24 deletions(-) create mode 100644 test/e2e-incremental-suite.mts diff --git a/extensions/runtime.ts b/extensions/runtime.ts index 76c4a99..d8d8749 100644 --- a/extensions/runtime.ts +++ b/extensions/runtime.ts @@ -909,7 +909,7 @@ async function executePhaseInner( // `perItem` is undefined (parallel, or non-cacheable maps) the path is inert. 
const runFanout = async ( items: Array<{ agent: string; task: string }>, - perItem?: { keyOf: (idx: number) => CacheKeys | null }, + perItem?: { keyOf: (idx: number) => CacheKeys | null; cc: PhaseCacheCtx }, ): Promise<RunResult[]> => { let done = 0; let running = 0; @@ -952,7 +952,7 @@ async function executePhaseInner( try { const ckItem = perItem.keyOf(idx); if (ckItem) { - const hit = cachedPhase(cc, ckItem); + const hit = cachedPhase(perItem.cc, ckItem); if (hit) { done++; const synth = phaseStateToRunResult(hit, it); @@ -990,7 +990,7 @@ async function executePhaseInner( try { const ckItem = perItem.keyOf(idx); if (ckItem) { - const ccItem: PhaseCacheCtx = { ...cc, phaseId: `${phase.id}#item${idx}` }; + const ccItem: PhaseCacheCtx = { ...perItem.cc, phaseId: `${phase.id}#item${idx}` }; const itemPs = resultToPhaseState(`${phase.id}#item${idx}`, r, ckItem.key, parseJson); recordCache(ccItem, itemPs); } @@ -1207,26 +1207,43 @@ async function executePhaseInner( // - not inside a runtime-generated sub-flow (`def:` frame in the stack): // such flows are untrusted / possibly non-deterministic, so per-item reuse // is unsafe. Fall back to whole-map (which still applies breadth caps). - // `undefined phaseFingerprint` is NOT a blocker: `cacheKeys` falls back to - // the whole-flow `flowDefHash`, which is stable across runs for a fixed def, - // so per-item keys for unchanged items remain stable. + // `undefined phaseFingerprint` is NOT a blocker for soundness — it is a + // DELIBERATE design choice: per-item keys omit BOTH phaseFp and flowDefHash + // (via ccPerItem below) so a changing `over` cannot move unchanged items' + // keys. See ccPerItem for the full soundness argument. const perItemCacheable = cc.scope === "cross-run" && !sharing && !(deps._stack ?? []).some((s) => s.startsWith("def:")); + // Per-item cache context: structural fingerprints (phaseFp + flowDefHash) + // are OMITTED so a changing `over` cannot move unchanged items' keys. 
Both + // fingerprints hash `over` (the array source); folding either into a + // per-item key means editing one item invalidates EVERY per-item key at + // once (no partial reuse) — the bug fixed here. A single item's output is + // fully specified by `it.task` (template + {item}/{as} value + any + // upstream-output refs + args) + `it.agent` + model + thinking/tools/preRead + // + the world-state `fingerprint`; `over` only determines WHICH items + // exist, not WHAT any item computes. `flowName` is retained for cross-flow + // collision prevention. Soundness: docs/internal/cache-migration.md. + // NB: perItemCacheable already gates on scope === "cross-run", which is + // blocked upstream when flowDefHash === "failed", so ccPerItem is only + // built when flowDefHash is a real hash (or already undefined) — setting + // it to undefined here is a safe no-op for the failed case. + const ccPerItem: PhaseCacheCtx = { ...cc, phaseFp: undefined, flowDefHash: undefined }; // Pre-compute per-item CacheKeys once so the lookup and the record path use - // the IDENTICAL key (and share cacheKeys' v3:phasefp + flow-name + - // fingerprint + thinking/tools/preRead contract). The per-item key folds - // `it.agent` (Arbiter fix): a different agent means different output, so a - // per-item key WITHOUT the agent could serve a stale cross-agent hit when - // only `phase.agent` changed (the whole-map key would correctly miss via - // JSON.stringify(tasks), but per-item keys would not). + // the IDENTICAL key (built from ccPerItem, NOT the whole-phase cc). The + // per-item key folds `it.agent` (Arbiter fix): a different agent means + // different output, so a per-item key WITHOUT the agent could serve a stale + // cross-agent hit when only `phase.agent` changed (the whole-map key would + // correctly miss via JSON.stringify(tasks), but per-item keys would not). const perItemKeys: (CacheKeys | null)[] = perItemCacheable - ? 
tasks.map((it) => cacheKeys(cc, [phase.id, it.agent, phase.model ?? "", it.task])) + ? tasks.map((it) => cacheKeys(ccPerItem, [phase.id, it.agent, phase.model ?? "", it.task])) : tasks.map(() => null); const perItem = perItemCacheable - ? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null } + ? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null, cc: ccPerItem } : undefined; + // Whole-map key keeps the FULL cc (phaseFp + flowDefHash) so its fast path + // and any pre-existing whole-map entries are unchanged (backward compat). const ck = cacheKeys(cc, [phase.id, phase.model ?? "", JSON.stringify(tasks)]); const inputHash = ck.key; const cached = cachedPhase(cc, ck); diff --git a/test/cache-peritem.test.ts b/test/cache-peritem.test.ts index 3a34510..d43e5ba 100644 --- a/test/cache-peritem.test.ts +++ b/test/cache-peritem.test.ts @@ -23,7 +23,6 @@ import * as path from "node:path"; import { test } from "node:test"; import type { AgentConfig } from "../extensions/agents.ts"; import { CacheStore } from "../extensions/cache.ts"; -import { phaseFingerprint, compileTaskflowToIR } from "../extensions/flowir/index.ts"; import { cacheKeys, executeTaskflow, summarizeReuse, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts"; import type { RunOptions, RunResult } from "../extensions/runner.ts"; import type { Taskflow } from "../extensions/schema.ts"; @@ -365,12 +364,11 @@ test("per-item: a budget-skipped item is never recorded as a per-item cache entr assert.equal(r1.state.phases.m.budgetTruncated, true, "map was cut short by the budget cap"); // Reconstruct the runtime's per-item CacheKeys to inspect the store. - // cc matches what executePhaseInner builds: scope cross-run, no fingerprint, - // empty preRead, and phaseFp = phaseFingerprint(def,"m") ?? flowDefHash. - const ir = await compileTaskflowToIR(def); - const flowDefHash = ir.hash ?? "failed"; - const phaseFp = (await phaseFingerprint(def, "m")) ?? 
flowDefHash; - const cc: PhaseCacheCtx = { + // Per-item keys are built from ccPerItem — the whole-phase cc with BOTH + // phaseFp and flowDefHash set to undefined (so a changing `over` cannot move + // unchanged items' keys). So the reconstructed cc must ALSO omit both + // fingerprints to match what the runtime writes under. + const ccPerItem: PhaseCacheCtx = { scope: "cross-run", fingerprint: "", store, @@ -378,14 +376,15 @@ test("per-item: a budget-skipped item is never recorded as a per-item cache entr phaseId: "m", flowName: def.name, runId: r1.state.runId, - flowDefHash, - phaseFp, + flowDefHash: undefined, + phaseFp: undefined, thinking: undefined, tools: undefined, preRead: "", }; // Per-item key folds [phase.id, it.agent, model, it.task] (Arbiter fix). - const keyFor = (task: string) => cacheKeys(cc, ["m", "a", "", task]).key; + // (phaseFp/flowDefHash are intentionally absent — see ccPerItem above.) + const keyFor = (task: string) => cacheKeys(ccPerItem, ["m", "a", "", task]).key; const keyA = keyFor("process a"); // item[0]: executed → cached const keyB = keyFor("process b"); // item[1]: executed → cached const keyC = keyFor("process c"); // item[2]: budget-skipped → NOT cached @@ -489,3 +488,256 @@ test("per-item: changing phase.agent invalidates every per-item key (no stale cr assert.equal(r3.state.phases.m.cacheHit, "cross-run"); fs.rmSync(dir, { recursive: true, force: true }); }); + +// --------------------------------------------------------------------------- +// (L0) BUG REPRODUCTION: literal `over` — change 1 of N items re-executes only that item. +// +// Unlike the {args.items} tests above (whose phase DEFINITION is stable across +// runs), a literal `over: '["a","b","c"]'` bakes the array into the def. Changing +// one item CHANGES the def → flowDefHash AND phaseFp both move (neither strips +// `over`). Before the fix, ALL per-item keys moved at once → every item +// re-executed (counter 3 → 6). 
After the fix, per-item keys omit BOTH +// phaseFp and flowDefHash (via ccPerItem), so an unchanged item's key is stable +// (it depends only on it.task + agent + model + thinking/tools/preRead + +// world-state fingerprint) → only the changed item re-runs (3 → 4). +// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — change 1 of N items re-executes only that item (bug repro)", async () => { + const dir = tmpDir(); + const mk = (items: string[]): Taskflow => ({ + name: "peritem-literal-repro", + phases: [ + { id: "m", type: "map", agent: "a", over: JSON.stringify(items), task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + }) as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + const r1 = await executeTaskflow(mkState(mk(["a", "b", "c"]), dir), deps); + assert.equal(counter.n, 3, "run1 executes all 3 items"); + assert.match(r1.finalOutput, /out:process a#1\b/); + assert.match(r1.finalOutput, /out:process b#2\b/); + assert.match(r1.finalOutput, /out:process c#3\b/); + + // Change ONLY item[1] (b -> b2). The literal `over` changes, so flowDefHash/ + // phaseFp move — but per-item keys must be invariant to `over` changes. + const r2 = await executeTaskflow(mkState(mk(["a", "b2", "c"]), dir), deps); + assert.equal(counter.n, 4, "run2 re-executes only item[1] (3 + 1)"); + assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (partial hit, not whole-map)"); + // item[0] and item[2] reused verbatim from per-item cache (same call index). + assert.match(r2.finalOutput, /out:process a#1\b/, "item[0] reused from per-item cache (call #1)"); + assert.match(r2.finalOutput, /out:process c#3\b/, "item[2] reused from per-item cache (call #3)"); + // item[1] re-executed → fresh call index #4. 
+ assert.match(r2.finalOutput, /out:process b2#4\b/, "item[1] re-executed (call #4)"); + // Sanity: run1's item[1] output is NOT present in run2. + assert.doesNotMatch(r2.finalOutput, /out:process b#2\b/); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (L1) Soundness: task template change invalidates ALL items (literal over). +// `it.task` is the per-item identity — changing the template changes every +// item's task, so every per-item key must move (full re-exec). +// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — task template change re-executes all items", async () => { + const dir = tmpDir(); + const mk = (task: string, items: string[]): Taskflow => ({ + name: "peritem-literal-task", + phases: [ + { id: "m", type: "map", agent: "a", over: JSON.stringify(items), task, cache: { scope: "cross-run" }, final: true }, + ], + }) as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk("process {item}", ["a", "b", "c"]), dir), deps); + assert.equal(counter.n, 3, "run1 executes all 3"); + // Same items, but task template changed → every it.task differs → all re-exec. 
+ const r2 = await executeTaskflow(mkState(mk("analyze {item}", ["a", "b", "c"]), dir), deps); + assert.equal(counter.n, 6, "run2 re-executes ALL items (task template changed → every key moved)"); + assert.equal(r2.state.phases.m.cacheHit, undefined, "whole-map also missed (tasks JSON differs)"); + assert.match(r2.finalOutput, /out:analyze a#4\b/); + assert.match(r2.finalOutput, /out:analyze b#5\b/); + assert.match(r2.finalOutput, /out:analyze c#6\b/); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (L2) Soundness: agent change invalidates ALL items (literal over). +// The per-item key folds `it.agent`, so changing phase.agent moves every key. +// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — agent change re-executes all items", async () => { + const dir = tmpDir(); + const mk = (agent: string): Taskflow => ({ + name: "peritem-literal-agent", + phases: [ + { id: "m", type: "map", agent, over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + }) as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk("a"), dir), deps); + assert.equal(counter.n, 3); + // Same items + same task, but agent changed → every per-item key moves. + const r2 = await executeTaskflow(mkState(mk("b"), dir), deps); + assert.equal(counter.n, 6, "agent change invalidates all per-item keys (3 + 3)"); + assert.equal(r2.state.phases.m.cacheHit, undefined); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (L3) Soundness: `as` field interaction is implicitly covered. 
+// `as` only renames the loop variable; the resolved `it.task` text is what
+// flows into the per-item key. If the author keeps the template consistent
+// with `as`, the interpolated text is unchanged → no spurious invalidation
+// (correct). If they desync them, `it.task` differs → invalidation (correct,
+// covered by L1's task-template principle). No separate test needed.
+// ---------------------------------------------------------------------------
+
+// ---------------------------------------------------------------------------
+// (L4) Soundness: upstream output referenced in task re-executes all items.
+// A map task that interpolates {steps.discover.output} folds the upstream
+// output into it.task — when the upstream output changes, every per-item key
+// moves (correct: the map's input genuinely changed).
+// ---------------------------------------------------------------------------
+
+test("per-item: upstream output referenced in task invalidates all items when it changes", async () => {
+  const dir = tmpDir();
+  const mk = (discoverOut: string): Taskflow => ({
+    name: "peritem-upstream",
+    phases: [
+      { id: "discover", type: "agent", agent: "a", task: "discover" },
+      { id: "m", type: "map", agent: "a", over: JSON.stringify(["x", "y"]), task: `do {item} with {steps.discover.output}`, dependsOn: ["discover"], cache: { scope: "cross-run" }, final: true },
+    ],
+  }) as Taskflow;
+  let counter = { n: 0 };
+  const store = new CacheStore(dir);
+  // Runner that emits a configurable discover output + counting map calls.
+  const mkDeps = (discoverOut: string): RuntimeDeps => ({
+    cwd: dir, agents: AGENTS, cacheStore: store,
+    runTask: async (_cwd, _agents, agentName, task): Promise<RunResult> => {
+      counter.n++;
+      const out = task === "discover" ? discoverOut : `out:${task}#${counter.n}`;
+      return { agent: agentName, task, exitCode: 0, output: out, stderr: "", usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 }, stopReason: "end" };
+    },
+  });
+
+  await executeTaskflow(mkState(mk("CTX1"), dir), mkDeps("CTX1"));
+  const mapCalls1 = counter.n;
+  assert.ok(mapCalls1 >= 3, "run1: discover + 2 map items execute");
+  // discover output changes → it.task for EVERY map item changes → all re-exec.
+  counter = { n: 0 };
+  const r2 = await executeTaskflow(mkState(mk("CTX2"), dir), mkDeps("CTX2"));
+  // discover declares no cache policy, so under the run-only default it
+  // re-executes and emits CTX2 (its task literal is unchanged, but nothing
+  // opts it into cross-run reuse). Both map items must therefore re-execute,
+  // because the interpolated {steps.discover.output} differs.
+  assert.match(r2.finalOutput, /do x with CTX2/, "map item x re-executed with new upstream output");
+  assert.match(r2.finalOutput, /do y with CTX2/, "map item y re-executed with new upstream output");
+  assert.doesNotMatch(r2.finalOutput, /do x with CTX1/, "stale upstream-coupled output not served");
+  fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L5) Whole-map fast path still hits on identical re-run (literal over).
+// The whole-map key keeps the FULL cc (phaseFp + flowDefHash), so an identical
+// re-run hits the whole-map fast path — per-item path never engages.
+// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — whole-map fast path hits on identical re-run", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-literal-fastpath", + phases: [ + { id: "m", type: "map", agent: "a", over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(def, dir), deps); + assert.equal(counter.n, 3, "run1 executes all 3"); + // Identical re-run: whole-map key matches → 1 hit, runFanout never engages. + const r2 = await executeTaskflow(mkState(def, dir), deps); + assert.equal(counter.n, 3, "run2 hits whole-map fast path (0 new calls)"); + assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map hit sets phase-level cacheHit"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (L6) De-mask: partial hit charges only the re-executed item (literal over). +// Literal-`over` variant of test (g). Before the fix this was impossible +// (all items re-executed); now only item[1] re-runs → cost is exactly one item. 
+// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — partial hit charges only the re-executed item", async () => { + const dir = tmpDir(); + const mk = (items: string[]): Taskflow => ({ + name: "peritem-literal-usage", + phases: [ + { id: "m", type: "map", agent: "a", over: JSON.stringify(items), task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + }) as Taskflow; + const counter = { n: 0 }; + const store = new CacheStore(dir); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store }; + + await executeTaskflow(mkState(mk(["a", "b", "c"]), dir), deps); + assert.equal(counter.n, 3); + // Change item[1] only → 1 re-exec (cost 0.001); items 0+2 are 0-token cache hits. + const r2 = await executeTaskflow(mkState(mk(["a", "b2", "c"]), dir), deps); + assert.equal(counter.n, 4); + const m = r2.state.phases.m; + assert.equal(m.cacheHit, undefined, "phase executed (partial hit, not whole-map)"); + assert.equal(m.usage?.cost ?? 0, 0.001, "only the re-executed item is charged"); + assert.equal(m.subProgress?.done, 3, "all 3 items done"); + assert.equal(m.subProgress?.failed, 0); + assert.equal(m.subProgress?.total, 3); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// (L7) De-mask: a failed item is never cached (literal over). +// Literal-`over` variant of test (h). A failing item must not be recorded, +// so a later run with the SAME literal `over` (same def!) re-executes only it. +// Note: because the def is identical across runs here, flowDefHash/phaseFp are +// stable — so this test would have PASSED even before the fix. It's included +// to lock the behavior for the literal-`over` shape (de-masking the suite). 
+// --------------------------------------------------------------------------- + +test("per-item: LITERAL over — a failed item is never cached (re-executes next run)", async () => { + const dir = tmpDir(); + const def: Taskflow = { + name: "peritem-literal-nofail", + phases: [ + { id: "m", type: "map", agent: "a", over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true }, + ], + } as Taskflow; + const store = new CacheStore(dir); + + // run1: item[1] ("process b") fails. Items 0+2 succeed and are cached per-item. + let counter = { n: 0 }; + const deps1: RuntimeDeps = { + cwd: dir, agents: AGENTS, cacheStore: store, + runTask: countingRunner(counter, (t) => (t.includes("process b") ? "boom" : null)), + }; + await executeTaskflow(mkState(def, dir), deps1); + assert.equal(counter.n, 3, "run1 attempts all 3 (item[1] fails)"); + + // run2: SAME def (same literal over), no failures. item[0]/[2] hit per-item; + // item[1] must RE-EXECUTE (its failure was not cached) and now succeeds. 
+ counter = { n: 0 }; + const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) }; + const r2 = await executeTaskflow(mkState(def, dir), deps2); + assert.equal(counter.n, 1, "run2: only the previously-failed item[1] re-executes; 0+2 hit per-item"); + assert.equal(r2.state.phases.m.status, "done", "all items succeed on run2"); + assert.match(r2.finalOutput, /out:process b#\d/, "item[1] now has a fresh successful output"); + fs.rmSync(dir, { recursive: true, force: true }); +}); diff --git a/test/e2e-incremental-suite.mts b/test/e2e-incremental-suite.mts new file mode 100644 index 0000000..e105893 --- /dev/null +++ b/test/e2e-incremental-suite.mts @@ -0,0 +1,258 @@ +/** + * E2E suite for the "complete incremental recompute" landing (v0.0.28): + * the five coupled capabilities shipped across the M5 finish line, exercised + * end-to-end through the REAL runtime + REAL on-disk CacheStore with a + * deterministic mock subagent runner (no live `pi` / model access needed). + * + * 1. precise ir-changed diff — editing one phase reuses the others cross-run + * 2. map item-level reuse — editing one fan-out item reruns only it + * 3. incremental flag — flow.incremental / override → cross-run default + * 4. run reuse summary — summarizeReuse counts reused vs executed + * 5. 
recompute decision trace — per-phase why (rerun/cutoff/reused + causedBy) + * + * Run: node --experimental-strip-types test/e2e-incremental-suite.mts + */ + +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import type { AgentConfig } from "../extensions/agents.ts"; +import { CacheStore } from "../extensions/cache.ts"; +import { + executeTaskflow, + recomputeTaskflow, + summarizeReuse, + type RuntimeDeps, +} from "../extensions/runtime.ts"; +import { resolveCacheScope } from "../extensions/index.ts"; +import type { RunResult, RunOptions } from "../extensions/runner.ts"; +import type { Taskflow } from "../extensions/schema.ts"; +import type { RunState } from "../extensions/store.ts"; +import { emptyUsage } from "../extensions/usage.ts"; + +const C = { + ok: (s: string) => `\x1b[32m${s}\x1b[0m`, + bad: (s: string) => `\x1b[31m${s}\x1b[0m`, + hl: (s: string) => `\x1b[36m${s}\x1b[0m`, + bold: (s: string) => `\x1b[1m${s}\x1b[0m`, +}; + +const AGENTS: AgentConfig[] = [ + { name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" }, +]; + +let failures = 0; +const assert = (cond: boolean, msg: string) => { + if (cond) console.log(` ${C.ok("✓")} ${msg}`); + else { + failures++; + console.log(` ${C.bad("✗")} ${msg}`); + } +}; +const section = (s: string) => console.log(`\n${C.hl("▸ " + s)}`); + +function tmpDir(): string { + return fs.mkdtempSync(path.join(os.tmpdir(), "tf-e2e-incr-")); +} +function mkState(def: Taskflow, cwd: string): RunState { + return { + runId: `run-${Math.random().toString(36).slice(2, 8)}`, + flowName: def.name, + def, + args: {}, + status: "running", + phases: {}, + createdAt: Date.now(), + updatedAt: Date.now(), + cwd, + }; +} +/** A deterministic runner: output is a pure function of the task text, so two + * runs with the same task produce byte-identical output (content-addressable). + * Records every executed task so we can assert exactly which phases ran. 
*/
+function recordingRunner(record: string[]): RuntimeDeps["runTask"] {
+  return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => {
+    record.push(task);
+    return {
+      agent: agentName,
+      task,
+      exitCode: 0,
+      output: `out:${task}`,
+      stderr: "",
+      usage: { ...emptyUsage(), output: 10, cost: 0.003, turns: 1 },
+      stopReason: "end",
+    };
+  };
+}
+
+async function main() {
+  // -----------------------------------------------------------------------
+  // 1 + 3 + 4: precise ir-changed diff under the incremental flag.
+  // An incremental flow scout→audit→report + an independent sibling. Run once,
+  // edit ONLY audit's task, re-run: scout & independent must hit cross-run
+  // (their per-phase fingerprints didn't move), audit must re-run.
+  // -----------------------------------------------------------------------
+  section("precise ir-changed diff (incremental flow): edit one phase, reuse the rest");
+  {
+    const dir = tmpDir();
+    const store = new CacheStore(dir);
+    const mkDef = (auditTask: string): Taskflow =>
+      ({
+        name: "incr-precise",
+        incremental: true,
+        phases: [
+          { id: "scout", type: "agent", agent: "a", task: "scan" },
+          { id: "independent", type: "agent", agent: "a", task: "unrelated analysis" },
+          { id: "audit", type: "agent", agent: "a", task: auditTask, dependsOn: ["scout"] },
+          {
+            id: "report",
+            type: "agent",
+            agent: "a",
+            task: "report {steps.audit.output} {steps.independent.output}",
+            dependsOn: ["audit", "independent"],
+            final: true,
+          },
+        ],
+      }) as Taskflow;
+
+    // The flow declares incremental:true → resolveCacheScope opts it into cross-run.
+ const scope = resolveCacheScope(undefined, mkDef("audit {steps.scout.output}").incremental); + assert(scope === "cross-run", "flow.incremental=true → cross-run default scope"); + + const rec1: string[] = []; + const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: scope }; + const r1 = await executeTaskflow(mkState(mkDef("audit v1 {steps.scout.output}"), dir), deps1); + assert(r1.ok, "run 1 completed"); + assert(rec1.length === 4, `run 1 executed all 4 phases (got ${rec1.length})`); + const s1 = summarizeReuse(r1.state); + assert(s1.executed === 4 && s1.reusedCrossRun === 0, "run 1 reuse summary: 4 executed, 0 reused"); + + // Edit ONLY audit's task. Re-run (fresh state, same store = cross-run). + const rec2: string[] = []; + const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: scope }; + const r2 = await executeTaskflow(mkState(mkDef("audit v2 {steps.scout.output}"), dir), deps2); + assert(r2.ok, "run 2 completed"); + // scout + independent unchanged → their per-phase fingerprints didn't move + // → cross-run hit. audit changed → re-run. report reads audit → re-run. + assert(!rec2.includes("scan"), "scout reused cross-run (not re-executed)"); + assert(!rec2.includes("unrelated analysis"), "independent reused cross-run (the precise-diff win)"); + assert(rec2.some((t) => t.includes("audit v2")), "audit re-executed (its task changed)"); + const s2 = summarizeReuse(r2.state); + assert(s2.reusedCrossRun >= 2, `run 2 reused ≥2 phases cross-run (got ${s2.reusedCrossRun})`); + assert(s2.reusedCrossRun + s2.executed === 4, "run 2 accounting balances (reused + executed = 4)"); + fs.rmSync(dir, { recursive: true, force: true }); + } + + // ----------------------------------------------------------------------- + // 2: map item-level reuse — change one item's input, only it re-runs. 
+ // ----------------------------------------------------------------------- + section("map item-level reuse: edit one fan-out item, rerun only that item"); + { + const dir = tmpDir(); + const store = new CacheStore(dir); + const mkDef = (items: string[]): Taskflow => + ({ + name: "incr-map", + incremental: true, + phases: [ + { id: "seed", type: "agent", agent: "a", task: "seed", output: "json" }, + { + id: "fan", + type: "map", + agent: "a", + over: JSON.stringify(items), + task: "process {item}", + dependsOn: [], + output: "json", + final: true, + }, + ], + }) as Taskflow; + + const rec1: string[] = []; + const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: "cross-run" }; + await executeTaskflow(mkState(mkDef(["alpha", "beta", "gamma"]), dir), deps1); + const fanRuns1 = rec1.filter((t) => t.startsWith("process ")); + assert(fanRuns1.length === 3, `run 1 fanned out 3 items (got ${fanRuns1.length})`); + + // Change ONLY the middle item: beta → BETA2. + const rec2: string[] = []; + const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: "cross-run" }; + const r2 = await executeTaskflow(mkState(mkDef(["alpha", "BETA2", "gamma"]), dir), deps2); + const fanRuns2 = rec2.filter((t) => t.startsWith("process ")); + assert(fanRuns2.length === 1, `run 2 re-executed only the changed item (got ${fanRuns2.length})`); + assert(fanRuns2[0] === "process BETA2", "the one re-executed item is the changed one"); + assert(!fanRuns2.includes("process alpha") && !fanRuns2.includes("process gamma"), "alpha & gamma reused per-item"); + // Order invariant: merged output stays aligned with `over`. 
+ const out = r2.state.phases.fan?.json as unknown[] | undefined; + assert(Array.isArray(out) && out.length === 3, "merged output has all 3 items in order"); + fs.rmSync(dir, { recursive: true, force: true }); + } + + // ----------------------------------------------------------------------- + // 5: recompute decision trace — per-phase why + causedBy attribution. + // ----------------------------------------------------------------------- + section("recompute decision trace: per-phase why + upstream attribution"); + { + const dir = tmpDir(); + const def: Taskflow = { + name: "incr-trace", + concurrency: 1, + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan" }, + { id: "independent", type: "agent", agent: "a", task: "unrelated" }, + { id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] }, + { id: "report", type: "agent", agent: "a", task: "report {steps.audit.output} {steps.independent.output}", dependsOn: ["audit", "independent"], final: true }, + ], + } as Taskflow; + const rec: string[] = []; + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec), cacheStore: new CacheStore(dir) }; + const state = mkState(def, dir); + await executeTaskflow(state, deps); + + const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false }); + const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d])); + assert(byId.scout?.outcome === "rerun" && /seed/.test(byId.scout.reason), "scout: rerun (seed)"); + assert(byId.audit?.outcome === "rerun", "audit: rerun (upstream moved)"); + assert(JSON.stringify(byId.audit?.causedBy) === JSON.stringify(["scout"]), "audit rerun attributed to scout"); + assert(JSON.stringify(byId.report?.causedBy) === JSON.stringify(["audit"]), "report rerun attributed to audit (not scout)"); + assert(byId.independent?.outcome === "reused" && /not reachable/.test(byId.independent.reason), "independent: reused (unreachable)"); + 
assert(report.decisions.length === 4, "every phase is explained"); + fs.rmSync(dir, { recursive: true, force: true }); + } + + // ----------------------------------------------------------------------- + // 3 (negative): default is run-only — capability given, default NOT flipped. + // ----------------------------------------------------------------------- + section("default safety: without incremental, re-run does NOT reuse cross-run"); + { + const dir = tmpDir(); + const store = new CacheStore(dir); + const def: Taskflow = { + name: "incr-default-off", + phases: [{ id: "p", type: "agent", agent: "a", task: "work", final: true }], + } as Taskflow; + // No incremental flag anywhere → resolveCacheScope → run-only. + const scope = resolveCacheScope(undefined, def.incremental); + assert(scope === "run-only", "no incremental flag → run-only (default not flipped)"); + const rec1: string[] = []; + await executeTaskflow(mkState(def, dir), { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: scope }); + const rec2: string[] = []; + await executeTaskflow(mkState(def, dir), { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: scope }); + assert(rec1.length === 1 && rec2.length === 1, "run-only re-executes every run (no silent cross-run reuse)"); + fs.rmSync(dir, { recursive: true, force: true }); + } + + console.log(""); + if (failures === 0) { + console.log(C.ok(C.bold("All Incremental-Recompute E2E checks passed."))); + } else { + console.log(C.bad(C.bold(`${failures} Incremental-Recompute E2E check(s) FAILED.`))); + process.exit(1); + } +} + +main().catch((e) => { + console.error(e); + process.exit(1); +}); From 603eb34330625484fc6a28d39bf32ed3cbb549b2 Mon Sep 17 00:00:00 2001 From: heggria Date: Sat, 27 Jun 2026 14:33:43 +0800 Subject: [PATCH 4/5] feat(cache): add incremental flag and reuse reporting Add a flow-level and invocation-level `incremental` flag that defaults every phase 
to cross-run caching (scope:"cross-run"), so re-running a flow reuses unchanged
phases without annotating each phase. The invocation arg wins over the flow
field; per-phase cache settings and the cross-run-blocked types
(gate/approval/loop/tournament) still take precedence; default stays run-only.

Surface the effect: the end-of-run cache report and /tf recompute now show
reused-vs-executed counts plus a per-phase "Why" trace
(rerun/cutoff/reused/failed with causedBy). Dollar figures are reported only
for within-run reuse; cross-run hits are counted without inventing a saving.

Also strip retry/concurrency/final from phaseFingerprint (none changes a
phase's output, so a no-op config tweak no longer falsely invalidates), and
fall back to whole-flow invalidation for join:"any" phases (they may read refs
outside their declared dependsOn).

Tests: add incremental-flag and reuse-summary suites; extend cache-phasefp and
recompute coverage.
---
 extensions/flowir/phasefp.ts  |  38 ++++++++---
 extensions/index.ts           |  71 ++++++++++++++++-----
 extensions/schema.ts          |   6 ++
 test/cache-phasefp.test.ts    |  38 +++++++++++
 test/incremental-flag.test.ts |  33 ++++++++++
 test/recompute.test.ts        |  88 +++++++++++++++++++++++++
 test/reuse-summary.test.ts    | 117 ++++++++++++++++++++++++++++++++++
 7 files changed, 365 insertions(+), 26 deletions(-)
 create mode 100644 test/incremental-flag.test.ts
 create mode 100644 test/reuse-summary.test.ts

diff --git a/extensions/flowir/phasefp.ts b/extensions/flowir/phasefp.ts
index a7f3c46..02eda69 100644
--- a/extensions/flowir/phasefp.ts
+++ b/extensions/flowir/phasefp.ts
@@ -23,13 +23,18 @@
  *    sub-structure is resolved at runtime (inline `def`) or from a saved
  *    flow (`use`) and is not statically visible here. Editing the saved
  *    sub-flow would not move this phase's sub-fingerprint.
+ * 3.
**`join: "any"` phase** (`phase.join === "any"`): validation exempts it + * from the `{steps.X}`-must-be-in-`dependsOn` check, so it may read + * phases outside its static closure. The closure under-approximates its + * real reads, so fall back to whole-flow invalidation. * - * `cache` (the policy object) is the ONLY field stripped from each phase - * before hashing: its sub-fields (`scope`/`ttl`/`fingerprint`) are folded into - * the cache key through other paths (`cc.scope` gates the lookup, `cc.ttlMs` - * governs expiry, `cc.fingerprint` is in the key tail). Every other `Phase` - * field is hashed. `PhaseSchema` uses `additionalProperties: false`, so no - * surprise field can be missed. + * `cache`, `retry`, `concurrency`, and `final` are stripped from each phase + * before hashing: none of them changes the subagent's OUTPUT (they are policy, + * execution mechanics, or result selection). `cache`'s sub-fields + * (`scope`/`ttl`/`fingerprint`) reach the cache key through other paths + * (`cc.scope` gates the lookup, `cc.ttlMs` governs expiry, `cc.fingerprint` is + * in the key tail). Every other `Phase` field is hashed. `PhaseSchema` uses + * `additionalProperties: false`, so no surprise field can be missed. * * Pure + async (Web Crypto via `hashCanonical`). Reuses the vendored * `canonicalJson`/`hashCanonical` (byte-identical to overstory's contract) so @@ -42,10 +47,17 @@ import { transitiveDependencies, type Phase, type Taskflow } from "../schema.ts"; import { canonicalJson, hashCanonical } from "./hash.ts"; -/** Policy field stripped before hashing (its sub-fields reach the key via - * `cc.scope` / `cc.ttlMs` / `cc.fingerprint` — folding them here would be - * recursive and redundant). This is the ONLY field stripped. 
*/ -const PHASE_FP_STRIP = ["cache"] as const; +/** Fields stripped before hashing because they do NOT affect a phase's + * subagent OUTPUT, only execution mechanics or result selection — folding + * them in would cause false cache invalidation on a no-op config change: + * - `cache`: policy object; its sub-fields reach the key via + * `cc.scope`/`cc.ttlMs`/`cc.fingerprint`. + * - `retry`: retry/backoff is execution mechanics; a successful phase + * produces the same output regardless of how many attempts it took. + * - `concurrency`: fan-out parallelism; does not change any item's output. + * - `final`: marks which phase's output is the flow result; does not change + * the phase's own output. */ +const PHASE_FP_STRIP = ["cache", "retry", "concurrency", "final"] as const; /** Clone a phase into a plain record with policy fields removed. */ function stripPolicy(phase: Phase): Record { @@ -74,6 +86,12 @@ export async function phaseFingerprint(def: Taskflow, phaseId: string): Promise< // --- Soundness gate: fall back to whole-flow when static closure is unsafe. --- // Flow-wide context sharing enables cross-sibling reads outside declared deps. if (def.contextSharing === true) return undefined; + // A `join: "any"` phase may interpolate `{steps.X.*}` refs to phases OUTSIDE + // its declared dependsOn (validation deliberately exempts it — schema.ts), so + // the static closure under-approximates its real reads. Fall back to + // whole-flow invalidation rather than rely on the key tail alone (which would + // be an undocumented coupling). Safe, = pre-M6 behavior. 
+ if (phase.join === "any") return undefined; const closureIds = transitiveDependencies(phases, phaseId); const closurePhases: Phase[] = []; diff --git a/extensions/index.ts b/extensions/index.ts index 984a63f..d87c39f 100644 --- a/extensions/index.ts +++ b/extensions/index.ts @@ -28,7 +28,7 @@ import { type AgentScope, discoverAgents, readSubagentSettings, shouldSyncBuilti import { renderRunResult, summarizeRun } from "./render.ts"; import { RunHistoryComponent, type RunHistoryResult } from "./runs-view.ts"; import { ApprovalViewComponent, type ApprovalChoice } from "./approval-view.ts"; -import { executeTaskflow, recomputeTaskflow, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts"; +import { executeTaskflow, recomputeTaskflow, summarizeReuse, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts"; import { type UsageStats } from "./usage.ts"; import { finalPhase, resolveArgs, type Taskflow, validateTaskflow, desugar, isShorthand } from "./schema.ts"; import { @@ -150,6 +150,12 @@ const TaskflowParams = Type.Object({ description: "Run in background (detached child process); return runId immediately. Status polled via store.", }), ), + incremental: Type.Optional( + Type.Boolean({ + description: + "For action=run: default every phase to cross-run caching so re-running the flow reuses unchanged phases across runs/sessions (incremental recompute). Overrides the flow's own `incremental` field. Per-phase cache settings and cross-run-blocked types (gate/approval/loop/tournament) still take precedence. 
Omit to use the flow's setting (default: run-only — fresh each run).", + }), + ), }); function formatFlowIR(ir: TaskflowIR): string { @@ -225,6 +231,17 @@ function formatRecompute(r: RecomputeReport): string { if (r.cutoff.length > 0) lines.push(` → saved ${r.cutoff.length} re-execution(s).`); } lines.push(`✓ reused (outside frontier): ${r.reused.join(", ") || "—"}`); + // Per-phase "why" — the explainable-reactivity trace (like React DevTools + // telling you why each component re-rendered). Only shown when present. + if (r.decisions && r.decisions.length > 0) { + const glyph: Record = { rerun: "▲", cutoff: "✂", reused: "✓", failed: "✗" }; + lines.push(""); + lines.push("Why:"); + for (const d of r.decisions) { + const cause = d.causedBy && d.causedBy.length ? ` ← ${d.causedBy.join(", ")}` : ""; + lines.push(` ${glyph[d.outcome] ?? "•"} ${d.phaseId}: ${d.reason}${cause}`); + } + } return lines.join("\n"); } @@ -242,6 +259,18 @@ function makeRunState(def: Taskflow, args: Record, cwd: string) }; } +/** Resolve the run-wide default cache scope from the incremental flags. The + * invocation-level override (the `incremental` tool arg) wins; otherwise the + * flow's own `incremental` field; otherwise the safe `run-only` default + * (each run starts fresh — cross-run reuse is opt-in). Exported for testing. */ +export function resolveCacheScope( + incrementalOverride: boolean | undefined, + flowIncremental: boolean | undefined, +): "cross-run" | "run-only" { + const on = typeof incrementalOverride === "boolean" ? incrementalOverride : flowIncremental; + return on === true ? "cross-run" : "run-only"; +} + async function runFlow( def: Taskflow, args: Record, @@ -249,6 +278,9 @@ async function runFlow( signal: AbortSignal | undefined, onUpdate: ((p: AgentToolResult) => void) | undefined, existing?: RunState, + // Invocation-level incremental override: when set, wins over def.incremental. + // undefined → fall back to the flow's own `incremental` field (default off). 
+ incrementalOverride?: boolean, ): Promise { const state = existing ?? makeRunState(def, args, ctx.cwd); @@ -374,11 +406,15 @@ async function runFlow( persist: persistThrottled, requestApproval, loadFlow: (name: string) => getFlow(ctx.cwd, name)?.def, - // Cross-run cache is opt-in per phase (cache:{scope:"cross-run"}). - // Defaulting every real run to cross-run was reviewed out: it silently - // persists phase outputs and can serve stale results for phases whose - // agents read files at runtime (those files are not in the cache key). - cacheScopeDefault: "run-only", + // Cross-run cache is opt-in. By default a real run is `run-only` (fresh + // each run): defaulting every phase to cross-run silently persists + // outputs and can serve stale results for phases whose agents read files + // at runtime (those files are not in the cache key). A user opts in + // explicitly — the invocation `incremental` arg wins, else the flow's + // own `incremental` field, else the safe run-only default. All the + // soundness fallbacks (blocked types, per-phase fingerprint, shareContext) + // still apply per phase inside executePhase. + cacheScopeDefault: resolveCacheScope(incrementalOverride, def.incremental), }); // Auto-report cache savings at the end of a real run so the user sees the // M1-M5 effect without running a separate /tf command. @@ -958,7 +994,7 @@ export default function (pi: ExtensionAPI) { }; } - const result = await runFlow(def, args, ctx, signal, onUpdate as any); + const result = await runFlow(def, args, ctx, signal, onUpdate as any, undefined, params.incremental as boolean | undefined); // Surface the validation warnings in the tool result so the model // can acknowledge or fix them, and the user sees them in the chat. 
if (v.warnings.length) { @@ -1399,15 +1435,18 @@ function errorResult(action: string, message: string): ToolResult { }; } -function formatCacheReport(state: RunState, totalUsage: UsageStats): string { - const cached = Object.values(state.phases).filter((p) => p.cacheHit === "cross-run"); - if (cached.length === 0) return ""; - // Honest reporting: we know these phases spent 0 tokens *this run* because - // they were served from cache. We do NOT estimate dollars/tokens "saved" — - // that requires guessing what a re-execution would have cost, and the mix of - // cheap vs expensive phases (tournament/loop) makes such a guess misleading. - const cachedTokens = cached.reduce((sum, p) => sum + ((p.usage?.input ?? 0) + (p.usage?.output ?? 0)), 0); - return `💾 ${cached.length} phase(s) reused from cross-run cache (${cachedTokens.toLocaleString()} tokens spent on them this run)`; +function formatCacheReport(state: RunState, _totalUsage: UsageStats): string { + const r = summarizeReuse(state); + const reused = r.reusedRunOnly + r.reusedCrossRun; + if (reused === 0) return ""; // nothing reused — no incremental story to tell + // Honest framing: report reused-vs-executed counts, and a dollar figure only + // for within-run reuse (where the prior usage is preserved). Cross-run hits + // zero their usage, so their original cost is genuinely unknown — we say + // "reused" without inventing a savings number for them. 
+ const parts: string[] = [`♻️ ${reused}/${r.done} phase(s) reused (${r.executed} executed this run)`]; + if (r.savedUSD > 0) parts.push(`~$${r.savedUSD.toFixed(4)} of re-execution avoided`); + if (r.reusedCrossRun > 0) parts.push(`${r.reusedCrossRun} from cross-run cache`); + return parts.join(" · "); } function finalResult(action: string, result: RuntimeResult): ToolResult { diff --git a/extensions/schema.ts b/extensions/schema.ts index 8e80a17..2154f2c 100644 --- a/extensions/schema.ts +++ b/extensions/schema.ts @@ -284,6 +284,12 @@ export const TaskflowSchema = Type.Object( "Enable the Shared Context Tree for ALL phases in this flow (shorthand for setting shareContext on every phase). Default false.", }), ), + incremental: Type.Optional( + Type.Boolean({ + description: + "Default every phase to cross-run caching (scope:'cross-run') so re-running this flow reuses unchanged phases across runs/sessions. Equivalent to setting cache:{scope:'cross-run'} on every phase; per-phase cache settings and the cross-run-blocked types (gate/approval/loop/tournament) still take precedence. Default false (run-only — each run starts fresh unless a phase opts in). 
A run-time `incremental` argument overrides this.", + }), + ), phases: Type.Array(PhaseSchema, { minItems: 1, description: "Ordered phase definitions (DAG via dependsOn)" }), }, { additionalProperties: false }, diff --git a/test/cache-phasefp.test.ts b/test/cache-phasefp.test.ts index ce23446..4ae84b6 100644 --- a/test/cache-phasefp.test.ts +++ b/test/cache-phasefp.test.ts @@ -272,3 +272,41 @@ test("phasefp: shareContext falls back to whole-flow invalidation", async () => assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)"); fs.rmSync(dir, { recursive: true, force: true }); }); + +// --------------------------------------------------------------------------- +// Hardening (risk review M-1 / L-1 / L-2): join:"any" soundness fallback, and +// operational/result-selection fields stripped to avoid false invalidation. +// --------------------------------------------------------------------------- + +test("phaseFingerprint: a join:any phase falls back to whole-flow (soundness)", async () => { + // C declares dependsOn [B] with join:any but interpolates {steps.A.output}. + // Its real reads escape the static closure, so per-phase diffing is unsound → + // fingerprint must be undefined (caller uses whole-flow flowDefHash). + const def: Taskflow = { + name: "join-any", + phases: [ + { id: "A", type: "agent", agent: "a", task: "produce" }, + { id: "B", type: "agent", agent: "a", task: "fast" }, + { id: "C", type: "agent", agent: "a", task: "use {steps.A.output}", dependsOn: ["B"], join: "any", final: true }, + ], + } as Taskflow; + assert.equal(await phaseFingerprint(def, "C"), undefined, "join:any → fallback"); + // A and B are ordinary phases → still get a precise fingerprint. 
+ assert.ok(await phaseFingerprint(def, "A")); + assert.ok(await phaseFingerprint(def, "B")); +}); + +test("phaseFingerprint: retry / concurrency / final do NOT move the sub-fingerprint", async () => { + const mk = (extra: Record): Taskflow => ({ + name: "ops-inv", + phases: [ + { id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, ...extra }, + { id: "q", type: "agent", agent: "a", task: "u {steps.p.output}", dependsOn: ["p"], final: true }, + ], + }) as Taskflow; + const base = await phaseFingerprint(mk({ final: true }), "p"); + // Adding retry/concurrency, or moving `final`, must not perturb p's output hash. + assert.equal(await phaseFingerprint(mk({ final: true, retry: { max: 3 } }), "p"), base, "retry stripped"); + assert.equal(await phaseFingerprint(mk({ final: true, concurrency: 4 }), "p"), base, "concurrency stripped"); + assert.equal(await phaseFingerprint(mk({}), "p"), base, "final marker stripped"); +}); diff --git a/test/incremental-flag.test.ts b/test/incremental-flag.test.ts new file mode 100644 index 0000000..38f9ddd --- /dev/null +++ b/test/incremental-flag.test.ts @@ -0,0 +1,33 @@ +import assert from "node:assert/strict"; +import { test } from "node:test"; +import { resolveCacheScope } from "../extensions/index.ts"; + +// The `incremental` flag (flow-level def.incremental, or the invocation-level +// override) maps to the run-wide default cache scope. Default is the safe +// run-only (cross-run reuse is opt-in); the invocation override wins over the +// flow setting. This pins the C-option contract: capability given, default +// NOT flipped. 
+ +test("resolveCacheScope: default (neither set) is run-only — safe, no flip", () => { + assert.equal(resolveCacheScope(undefined, undefined), "run-only"); +}); + +test("resolveCacheScope: flow.incremental=true opts the whole flow into cross-run", () => { + assert.equal(resolveCacheScope(undefined, true), "cross-run"); +}); + +test("resolveCacheScope: flow.incremental=false stays run-only", () => { + assert.equal(resolveCacheScope(undefined, false), "run-only"); +}); + +test("resolveCacheScope: invocation override wins over the flow setting", () => { + // override=true beats flow=false + assert.equal(resolveCacheScope(true, false), "cross-run"); + // override=false beats flow=true (lets a user force a fresh run) + assert.equal(resolveCacheScope(false, true), "run-only"); +}); + +test("resolveCacheScope: override undefined falls back to the flow setting", () => { + assert.equal(resolveCacheScope(undefined, true), "cross-run"); + assert.equal(resolveCacheScope(undefined, false), "run-only"); +}); diff --git a/test/recompute.test.ts b/test/recompute.test.ts index 1223bf4..5649f14 100644 --- a/test/recompute.test.ts +++ b/test/recompute.test.ts @@ -415,3 +415,91 @@ test("recompute: flagship — re-seed with an unchanged output cuts off the whol assert.deepEqual([...report.cutoff].sort(), ["audit", "report"], "the downstream is cut off transitively"); assert.equal(record.length, executedBefore + 1, "exactly one re-execution (the seed); downstream hit cache"); }); + +// --------------------------------------------------------------------------- +// Per-phase decision trace (the "explainable reactivity" AC): every phase in +// the report carries a reason, and a rerun/cutoff is attributed to the +// upstream(s) that caused it. 
+// --------------------------------------------------------------------------- + +test("recompute: decision trace attributes each rerun to the changed upstream", async () => { + const record: string[] = []; + let scoutVersion = "V1"; + const def: Taskflow = { + name: "trace-cascade", + concurrency: 1, + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan" }, + { id: "independent", type: "agent", agent: "a", task: "unrelated" }, + { id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] }, + { + id: "report", + type: "agent", + agent: "a", + task: "report {steps.audit.output} {steps.independent.output}", + dependsOn: ["audit", "independent"], + final: true, + }, + ], + } as Taskflow; + const deps = baseDeps(mockRunner((t) => (t === "scan" ? `out:${scoutVersion}` : `out:${t}`), record)); + const state = mkState(def); + await executeTaskflow(state, deps); + + scoutVersion = "V2"; + const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false }); + const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d])); + + assert.equal(byId.scout.outcome, "rerun"); + assert.match(byId.scout.reason, /seed/); + assert.equal(byId.audit.outcome, "rerun"); + assert.deepEqual(byId.audit.causedBy, ["scout"], "audit's rerun is attributed to scout"); + assert.deepEqual(byId.report.causedBy, ["audit"], "report's rerun is attributed to audit, not scout"); + assert.equal(byId.independent.outcome, "reused"); + assert.match(byId.independent.reason, /not reachable/); + // Every phase is explained. 
+ assert.equal(report.decisions.length, 4); +}); + +test("recompute: decision trace marks early-cutoff with its (unchanged) upstream cause", async () => { + const record: string[] = []; + const def: Taskflow = { + name: "trace-cutoff", + concurrency: 1, + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan" }, + { id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] }, + { id: "report", type: "agent", agent: "a", task: "report {steps.audit.output}", dependsOn: ["audit"], final: true }, + ], + } as Taskflow; + // scout's output is stable across re-seeds → downstream cuts off. + const deps = baseDeps(mockRunner((t) => (t === "scan" ? "out:STABLE" : `out:${t}`), record)); + const state = mkState(def); + await executeTaskflow(state, deps); + + const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false }); + const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d])); + + assert.equal(byId.scout.outcome, "rerun", "seed always re-runs"); + assert.equal(byId.audit.outcome, "cutoff"); + assert.match(byId.audit.reason, /identical output|unchanged/); + assert.deepEqual(byId.audit.causedBy, ["scout"]); + assert.equal(byId.report.outcome, "cutoff"); +}); + +test("recompute: dry-run decision trace explains the worst-case frontier", async () => { + const record: string[] = []; + const deps = baseDeps(mockRunner((t) => `out:${t}`, record)); + const state = mkState(DEF); + await executeTaskflow(state, deps); + + const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: true }); + const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d])); + + assert.equal(byId.scout.outcome, "rerun"); + assert.match(byId.scout.reason, /seed/); + // audit + report are in the frontier → "may re-run", attributed upstream. 
+ assert.equal(byId.audit.outcome, "rerun"); + assert.match(byId.audit.reason, /may re-run|stale frontier/); + assert.equal(record.length, 3, "dry-run did not execute anything beyond the initial run"); +}); diff --git a/test/reuse-summary.test.ts b/test/reuse-summary.test.ts new file mode 100644 index 0000000..dd387e0 --- /dev/null +++ b/test/reuse-summary.test.ts @@ -0,0 +1,117 @@ +import assert from "node:assert/strict"; +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { test } from "node:test"; +import type { AgentConfig } from "../extensions/agents.ts"; +import { CacheStore } from "../extensions/cache.ts"; +import { executeTaskflow, summarizeReuse, type RuntimeDeps } from "../extensions/runtime.ts"; +import type { RunResult, RunOptions } from "../extensions/runner.ts"; +import type { Taskflow } from "../extensions/schema.ts"; +import type { RunState } from "../extensions/store.ts"; +import { emptyUsage } from "../extensions/usage.ts"; + +// summarizeReuse: the incremental-reuse accounting behind the run summary. +// A phase counts as reused iff it carries a `cacheHit` marker (within-run +// resume → "run-only"; cross-run store → "cross-run"). 
+ +const AGENTS: AgentConfig[] = [ + { name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" }, +]; + +function tmpDir(): string { + return fs.mkdtempSync(path.join(os.tmpdir(), "tf-reuse-")); +} + +function mkState(def: Taskflow, cwd: string): RunState { + return { + runId: `run-${Math.random().toString(36).slice(2, 8)}`, + flowName: def.name, + def, + args: {}, + status: "running", + phases: {}, + createdAt: Date.now(), + updatedAt: Date.now(), + cwd, + }; +} + +function runner(): RuntimeDeps["runTask"] { + return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => ({ + agent: agentName, + task, + exitCode: 0, + output: `out:${task}`, + stderr: "", + usage: { ...emptyUsage(), output: 10, cost: 0.002, turns: 1 }, + stopReason: "end", + }); +} + +const CHAIN: Taskflow = { + name: "reuse-chain", + phases: [ + { id: "scout", type: "agent", agent: "a", task: "scan" }, + { id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] }, + { id: "report", type: "agent", agent: "a", task: "report {steps.audit.output}", dependsOn: ["audit"], final: true }, + ], +} as Taskflow; + +test("summarizeReuse: a first run executes every phase, reuses none", async () => { + const dir = tmpDir(); + const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: runner() }; + const r = await executeTaskflow(mkState(CHAIN, dir), deps); + + const s = summarizeReuse(r.state); + assert.equal(s.executed, 3, "all three phases executed"); + assert.equal(s.reusedRunOnly, 0); + assert.equal(s.reusedCrossRun, 0); + assert.equal(s.done, 3); + assert.equal(s.savedUSD, 0, "nothing reused → nothing saved"); + assert.deepEqual(r.reuse, s, "RuntimeResult.reuse matches summarizeReuse(state)"); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("summarizeReuse: resuming a completed run reuses every phase within-run (savedUSD > 0)", async () => { + const dir = tmpDir(); + const deps: RuntimeDeps = { cwd: dir, 
agents: AGENTS, runTask: runner() }; + const state = mkState(CHAIN, dir); + await executeTaskflow(state, deps); + + // Re-run the SAME state object: every phase is already `done` with a matching + // inputHash → the within-run resume path serves each from its prior. + const r2 = await executeTaskflow(state, deps); + const s = summarizeReuse(r2.state); + + assert.equal(s.executed, 0, "nothing re-executed on resume"); + assert.equal(s.reusedRunOnly, 3, "all three reused within-run"); + assert.equal(s.reusedCrossRun, 0); + assert.equal(s.done, 3); + // Each phase preserved its prior usage (cost 0.002) → 3 × 0.002 saved. + assert.ok(Math.abs(s.savedUSD - 0.006) < 1e-9, `savedUSD should be ~0.006, got ${s.savedUSD}`); + fs.rmSync(dir, { recursive: true, force: true }); +}); + +test("summarizeReuse: a second run under cross-run cache counts cross-run reuse", async () => { + const dir = tmpDir(); + const store = new CacheStore(dir); + const deps: RuntimeDeps = { + cwd: dir, + agents: AGENTS, + runTask: runner(), + cacheStore: store, + cacheScopeDefault: "cross-run", + }; + await executeTaskflow(mkState(CHAIN, dir), deps); + // A fresh state (new runId) re-running the same flow hits the cross-run store. + const r2 = await executeTaskflow(mkState(CHAIN, dir), deps); + const s = summarizeReuse(r2.state); + + assert.equal(s.reusedCrossRun, 3, "all three restored from cross-run cache"); + assert.equal(s.executed, 0, "nothing executed the second run"); + // Cross-run hits zero their usage → original cost not recoverable. 
+ assert.equal(s.savedUSD, 0, "cross-run reuse does not claim a dollar figure"); + assert.equal(s.done, 3); + fs.rmSync(dir, { recursive: true, force: true }); +}); From 74413929fee2fb534ce617890c17030177e8a1bc Mon Sep 17 00:00:00 2001 From: heggria Date: Sat, 27 Jun 2026 14:33:49 +0800 Subject: [PATCH 5/5] =?UTF-8?q?chore(release):=20v0.0.28=20=E2=80=94=20per?= =?UTF-8?q?-phase=20+=20per-item=20granular=20cache=20reuse?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bump to 0.0.28. Document the granular-reuse release: per-phase structural sub-fingerprint (v3:phasefp), per-item map caching, the incremental flag, and reuse reporting. Refresh README test counts (804 -> 846 across 46 files) and add per-item map caching to the headline. Document the incremental flag and its precedence in the taskflow skill. --- CHANGELOG.md | 54 ++++++++++++++++++++++++++++++++ README.md | 6 ++-- package.json | 2 +- skills/taskflow/SKILL.md | 2 +- skills/taskflow/configuration.md | 22 +++++++++++++ 5 files changed, 81 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index affd152..f92560d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,60 @@ All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format. +## [0.0.28] — 2026-06-27 + +> Granular-reuse release: **incremental recompute goes from whole-flow to +> per-phase and per-item.** v0.0.27 *proved* the recompute cost win; this +> release makes that win far larger and easier to opt into. Editing one phase +> now invalidates only that phase and its transitive dependents (a sibling keeps +> its cache hit), a `map` phase re-executes only the items that actually changed, +> and a single `incremental` flag flips a whole flow into cross-run reuse without +> annotating every phase. 
+ +### Added +- **Per-phase structural sub-fingerprint (`v3:phasefp`).** The cache key now + folds a per-phase fingerprint — the phase plus its transitive `dependsOn ∪ from` + closure — instead of the whole-flow `v2:flowdef` hash. Editing phase B + invalidates only B and its dependents; an independent sibling A keeps its hit. + `cacheKeys` emits a 4-tier read ladder (`v3:phasefp` write → `v2:flowdef` → + bare flowdef → legacy, all read-only) so the upgrade is additive — no + miss-storm for unchanged flows. Fail-open: any per-phase error degrades that + phase to the whole-flow hash. Soundness fallback to whole-flow when per-phase + invalidation can't be statically guaranteed (flow-wide `contextSharing`, any + `shareContext` phase in the closure, `join: "any"`, or sub-flow inner phases). + (`extensions/flowir/phasefp.ts`, `test/cache-phasefp.test.ts` — 11 tests.) +- **Per-item cross-run caching for `map` phases.** When one of N items changes + between runs, only that item re-executes (N−1 cache hits) while the whole-map + fast path and every soundness fallback stay intact. Per-item keys omit the + structural fingerprint (which hashes the whole `over` source) so changing one + item no longer moves every key at once; they fold `[phase.id, it.agent, model, + it.task]` + the world-state tail, so task/agent/upstream/world changes still + invalidate the right items. Disabled (whole-map only) under run-only/off scope, + `shareContext`/flow-wide `contextSharing`, or inside a runtime-generated + sub-flow. (`test/cache-peritem.test.ts` — 11 tests.) +- **`incremental` flag** — flow-level (`TaskflowSchema.incremental`) and + invocation-level (`run` tool arg). Defaults every phase to `scope:"cross-run"` + so re-running a flow reuses unchanged phases across runs/sessions, without + annotating each phase. 
The invocation arg wins over the flow field; per-phase + cache settings and the cross-run-blocked types (gate/approval/loop/tournament) + still take precedence; default remains the safe `run-only` (fresh each run). + (`resolveCacheScope` in `extensions/index.ts`, `test/incremental-flag.test.ts`.) +- **Reuse reporting.** The end-of-run cache report and `/tf recompute` now show + reused-vs-executed counts and a per-phase "Why" trace (the explainable- + reactivity view: `▲ rerun / ✂ cutoff / ✓ reused / ✗ failed`, with `← causedBy`). + Dollar figures are reported only for within-run reuse, where the prior usage is + preserved; cross-run hits are counted but never attributed an invented saving. + (`summarizeReuse` / `RecomputeDecision` in `extensions/runtime.ts`, + `test/reuse-summary.test.ts`.) +- Tests: 804 → 846 (+42). + +### Changed +- **`phaseFingerprint` strips more policy fields** (`cache`, `retry`, + `concurrency`, `final`): none changes a phase's subagent *output*, so a no-op + config tweak no longer causes false cache invalidation. +- **README** test count and feature line refreshed (804 → 846 across 46 files); + `per-item map caching` added to the headline capabilities. + ## [0.0.27] — 2026-06-25 > Evidence release: **the incremental-recompute cost win is now proven, not diff --git a/README.md b/README.md index 34e6d21..86dcc34 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ MIT license zero runtime dependencies CI status - 804 tests + 846 tests dogfooded for the Pi coding agent

@@ -728,12 +728,12 @@ Copy one into `.pi/taskflows/.json` (or `~/.pi/agent/taskflows/`) and it r
-**0 runtime dependencies** · **804 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime** +**0 runtime dependencies** · **846 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **per-item map caching** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
- **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library. -- **804 tests across 42 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (3-tier legacy fallback), the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage). 
+- **846 tests across 46 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (4-tier legacy fallback), per-phase structural sub-fingerprint (v3:phasefp — editing one phase invalidates only it and its dependents), per-item map caching (one changed item re-executes, N−1 cache hits), the `incremental` flag (run-wide cross-run default), reuse reporting, the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage). - **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense. 
 - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
diff --git a/package.json b/package.json
index d520ccf..d89c42f 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "pi-taskflow",
-  "version": "0.0.27",
+  "version": "0.0.28",
   "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
   "keywords": [
     "pi-package",
diff --git a/skills/taskflow/SKILL.md b/skills/taskflow/SKILL.md
index cd10531..a8863f1 100644
--- a/skills/taskflow/SKILL.md
+++ b/skills/taskflow/SKILL.md
@@ -549,7 +549,7 @@ Quick reference:

 - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`, `strictInterpolation`.
 - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `optional` (fail-soft — a failed/blocked phase won't abort the run), `final`.
-- **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens). See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), and scope options.
+- **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens), or set `incremental: true` at the flow level (or pass `incremental: true` to `run`) to default every phase to cross-run reuse. See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), scope options, and the `incremental` precedence rules.
 - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default.
 - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.

diff --git a/skills/taskflow/configuration.md b/skills/taskflow/configuration.md
index 22fa9f9..7476933 100644
--- a/skills/taskflow/configuration.md
+++ b/skills/taskflow/configuration.md
@@ -283,6 +283,28 @@ for the design.
 | `cross-run` | Reuse an identical-input result from **any** prior run (the persistent store). |
 | `off` | Never reuse, even within a run (force re-execution every time). |

+### Flow-wide opt-in: `incremental`
+
+Rather than annotating every phase with `cache: { "scope": "cross-run" }`, set
+`incremental: true` at the **flow** level (or pass `incremental: true` as the
+`run` tool argument) to default *every* phase to cross-run reuse:
+
+```jsonc
+{
+  "name": "audit",
+  "incremental": true, // ← every phase defaults to scope:"cross-run"
+  "phases": [ /* ... */ ]
+}
+```
+
+Precedence: the invocation `incremental` argument wins over the flow's
+`incremental` field, which is in turn overridden by any **per-phase** `cache`
+setting. The cross-run-blocked phase types (`gate`/`approval`/`loop`/
+`tournament`) and all per-phase soundness fallbacks still apply. The default
+remains `run-only` (each run starts fresh unless something opts in), because
+cross-run reuse silently persists outputs and can serve stale results for phases
+whose agents read files at runtime.
+
 ### `ttl` (cross-run only)

 Max age before a cross-run hit is treated as a miss: e.g. `"30m"`, `"6h"`, `"7d"`.
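
The `ttl` rule above (a cross-run hit older than the max age counts as a miss) can be illustrated with a minimal TypeScript sketch. It is not taken from the pi-taskflow source: `parseTtlMs` and `isFresh` are hypothetical names, the unit suffixes are assumed to mean minutes, hours, and days based only on the three examples shown, and treating an unparseable TTL as a miss is likewise an assumption.

```typescript
// Hypothetical helpers for illustration only; not pi-taskflow's actual API.
// Assumed unit meanings, inferred from the examples "30m", "6h", "7d".
const UNIT_MS: Record<string, number> = {
  m: 60_000, // minutes
  h: 3_600_000, // hours
  d: 86_400_000, // days
};

/** Parse a TTL such as "30m", "6h", or "7d" into milliseconds; undefined if unrecognized. */
function parseTtlMs(ttl: string): number | undefined {
  const match = /^(\d+)([mhd])$/.exec(ttl.trim());
  if (!match) return undefined;
  const [, digits, suffix] = match;
  const unitMs = UNIT_MS[suffix ?? ""];
  if (digits === undefined || unitMs === undefined) return undefined;
  return Number(digits) * unitMs;
}

/** True while the entry's age is within the TTL; an expired entry is treated as a miss. */
function isFresh(writtenAtMs: number, ttl: string, nowMs: number): boolean {
  const ttlMs = parseTtlMs(ttl);
  // Assumption: a TTL that cannot be parsed never yields a hit.
  if (ttlMs === undefined) return false;
  return nowMs - writtenAtMs <= ttlMs;
}
```

Passing `nowMs` explicitly rather than calling `Date.now()` inside the helper keeps the freshness check deterministic, which makes expiry behavior easy to test.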