From c11fbd9abede76b8bcf59310a6aa6d8ce8ff617e Mon Sep 17 00:00:00 2001
From: heggria <bshengtao@gmail.com>
Date: Thu, 25 Jun 2026 19:30:16 +0800
Subject: [PATCH 1/5] feat(cache): per-phase structural sub-fingerprint
 (v3:phasefp)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the whole-flow v2:flowdef cache-key tier with a per-phase
structural sub-fingerprint so editing phase B invalidates only B and its
transitive dependents — independent sibling phase A keeps its cache hit.

phaseFingerprint(def, phaseId) (extensions/flowir/phasefp.ts) hashes the
phase plus its transitive dependsOn ∪ from closure, reusing the vendored
canonicalJson + hashCanonical (byte-identical to overstory's contract).
Only the policy field cache is stripped; every other Phase field is hashed.

Soundness fallback: phaseFingerprint returns undefined (→ caller folds the
whole-flow flowDefHash, preserving pre-M6 behavior) when per-phase
invalidation cannot be statically guaranteed — contextSharing at flow level,
any shareContext phase in the closure, or any flow phase in the closure.
Sub-flow inner phases always use this fallback.

cacheKeys now produces a 4-tier ladder: key (v3:phasefp, read + write) → v2Key
(v2:flowdef, read-only) → bareKey (bare flowdef, read-only) → legacyKey
(no flowdef, read-only). cachedPhase tries the four keys in that order;
recordCache writes only key. This makes the M6 upgrade additive: unchanged
flows see no miss-storm.

phaseFingerprints are computed once per run in runTaskflowLayers alongside
flowDefHash and plumbed through RunState + PhaseCacheCtx. Fail-open: any
per-phase error degrades that phase to the whole-flow hash.

Tests: test/cache-phasefp.test.ts (11 tests — soundness gate, determinism,
precise-diff win, transitive propagation, v2 fallback, cross-flow
isolation, shareContext fallback). Updated cache-migration.test.ts
(distinct 4-tier keys; structural-change test now scoped to p's closure)
and runtime.test.ts resume tests to the v3 key shape.
---
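Reviewer note (not part of the commit message). A minimal sketch of the
precise-invalidation property described above, under simplifying assumptions:
`SketchPhase`, `canonical`, `closure` and `sketchFingerprint` are hypothetical
stand-ins, the phase shape is reduced to three fields, and SHA-256 from
node:crypto replaces the vendored `canonicalJson` + `hashCanonical`. It is an
illustration of the idea, not the code in phasefp.ts.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified phase shape (the real `Phase` has more fields).
interface SketchPhase {
  id: string;
  task: string;
  dependsOn?: string[];
}

// Deterministic JSON: object keys sorted, array order preserved.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const rec = value as Record<string, unknown>;
    const keys = Object.keys(rec).filter((k) => rec[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(rec[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Sorted transitive upstream closure; the visited set makes it cycle-safe.
function closure(phases: SketchPhase[], id: string): string[] {
  const byId = new Map(phases.map((p) => [p.id, p]));
  const seen = new Set<string>();
  const queue = [...(byId.get(id)?.dependsOn ?? [])];
  while (queue.length > 0) {
    const next = queue.shift() as string;
    if (seen.has(next) || !byId.has(next)) continue;
    seen.add(next);
    queue.push(...(byId.get(next)?.dependsOn ?? []));
  }
  return [...seen].sort();
}

// Hash the phase plus the definitions of everything it transitively depends on.
function sketchFingerprint(phases: SketchPhase[], id: string): string | undefined {
  const byId = new Map(phases.map((p) => [p.id, p]));
  const self = byId.get(id);
  if (!self) return undefined;
  const deps = closure(phases, id).map((d) => byId.get(d));
  return createHash("sha256").update(canonical({ self, deps })).digest("hex");
}

const flow: SketchPhase[] = [
  { id: "a", task: "lint" },
  { id: "b", task: "build" },
  { id: "c", task: "test", dependsOn: ["b"] },
];
// Edit only phase b.
const edited = flow.map((p) => (p.id === "b" ? { ...p, task: "build --release" } : p));

console.log(sketchFingerprint(flow, "a") === sketchFingerprint(edited, "a")); // true: a is independent of b
console.log(sketchFingerprint(flow, "c") === sketchFingerprint(edited, "c")); // false: c depends on b
```

The sketch omits the soundness gate (contextSharing, shareContext, flow
phases) and the stripping of the `cache` field; both would sit in front of the
hashing step.
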
 docs/internal/cache-migration.md |  93 ++++++++---
 extensions/flowir/index.ts       |   2 +
 extensions/flowir/phasefp.ts     | 103 ++++++++++++
 extensions/runtime.ts            | 220 ++++++++++++++++++++++---
 extensions/schema.ts             |  31 ++++
 extensions/store.ts              |  16 +-
 test/cache-migration.test.ts     |  36 ++--
 test/cache-phasefp.test.ts       | 274 +++++++++++++++++++++++++++++++
 test/runtime.test.ts             |  19 ++-
 9 files changed, 719 insertions(+), 75 deletions(-)
 create mode 100644 extensions/flowir/phasefp.ts
 create mode 100644 test/cache-phasefp.test.ts
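
Reviewer note (not part of the commit). A rough sketch of the four-tier read
ladder, assuming an in-memory Map as the store. `SketchKeys`, `sketchKeys` and
`lookup` are hypothetical names, and the key parts are joined as plain strings
here rather than hashed with `hashInput`; only the tier order and the
write-only-`key` policy are taken from the patch.

```typescript
// Hypothetical names throughout; only the tier order mirrors the patch.
interface SketchKeys {
  key: string;
  v2Key: string;
  bareKey: string;
  legacyKey: string;
}

function sketchKeys(flowName: string, phaseId: string, flowDefHash: string, phaseFp?: string): SketchKeys {
  // Soundness fallback: without a per-phase fingerprint, use the whole-flow hash.
  const fp = phaseFp ?? flowDefHash;
  const join = (...parts: string[]): string => parts.join("\n");
  return {
    key: join(`flow:${flowName}`, `v3:phasefp:${fp}`, phaseId),
    v2Key: join(`flow:${flowName}`, `v2:flowdef:${flowDefHash}`, phaseId),
    bareKey: join(`flow:${flowName}`, `flowdef:${flowDefHash}`, phaseId),
    legacyKey: join(`flow:${flowName}`, phaseId),
  };
}

// Try the current key first, then each read-only fallback tier in order.
function lookup(
  store: Map<string, string>,
  keys: SketchKeys,
): { tier: keyof SketchKeys; value: string } | undefined {
  const order: (keyof SketchKeys)[] = ["key", "v2Key", "bareKey", "legacyKey"];
  for (const tier of order) {
    const value = store.get(keys[tier]);
    if (value !== undefined) return { tier, value };
  }
  return undefined;
}

const store = new Map<string, string>();
const keys = sketchKeys("demo", "build", "flowhash-1", "phasefp-1");

store.set(keys.v2Key, "output written before the upgrade");
console.log(lookup(store, keys)?.tier); // "v2Key": the pre-upgrade entry still hits

store.set(keys.key, "output written after the upgrade"); // writes only ever target `key`
console.log(lookup(store, keys)?.tier); // "key"
```

Because fallback hits are not written through, the older entry stays where it
is and simply stops being reached once a `key` entry exists.
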
diff --git a/docs/internal/cache-migration.md b/docs/internal/cache-migration.md
index 2b6bf84..ee203fd 100644
--- a/docs/internal/cache-migration.md
+++ b/docs/internal/cache-migration.md
@@ -12,25 +12,55 @@ Before H1, the cache key folded the flow **definition** fingerprint under a bare
 H1 versions the key with a `v2:` prefix and routes the fingerprint through the
 FlowIR compile seam (`compileTaskflowToIR` → `flowDefHash`).
 
-To avoid a one-time miss-storm on upgrade, the runtime consults **three** keys
-on every cross-run lookup, read-only for the legacy tiers.
+M6 replaces the whole-flow `v2:flowdef:` tier with a **per-phase structural
+sub-fingerprint** (`v3:phasefp:`): the hash of a single phase plus its
+transitive dependency closure. Editing phase B now invalidates only B and its
+transitive dependents — independent sibling phase A keeps its cache hit.
 
-## Key shapes (H1)
+To avoid a one-time miss-storm on upgrade, the runtime consults **four** keys
+on every cross-run lookup, read-only for the fallback tiers.
 
-`cacheKeys()` (`extensions/runtime.ts`) returns three keys for a phase:
+## Key shapes (M6)
+
+`cacheKeys()` (`extensions/runtime.ts`) returns four keys for a phase:
 
 | Tier | Shape | Written by | Status |
 |------|-------|-----------|--------|
-| `key` (current) | `flow:<name>` + `v2:flowdef:<hash>` + `<phase>` + `think/tools/ctx` + fingerprint | H1+ | **read + write** |
+| `key` (current) | `flow:<name>` + `v3:phasefp:<subfp>` + `<phase>` + `think/tools/ctx` + fingerprint | M6+ | **read + write** |
+| `v2Key` | `flow:<name>` + `v2:flowdef:<flowDefHash>` + … | H1..M5 | **read-only** |
 | `bareKey` | `flow:<name>` + `flowdef:<hash>` (bare, unversioned) + … | pre-H1 | **read-only** (removed in v0.1.0) |
 | `legacyKey` | `flow:<name>` + … (flowdef line omitted) | pre-flowDefHash era | **read-only** (removed in v0.1.0) |
 
+### The per-phase sub-fingerprint (`v3:phasefp`)
+
+`phaseFingerprint(def, phaseId)` (`extensions/flowir/phasefp.ts`) hashes the
+phase itself plus its transitive `dependsOn ∪ from` closure, reusing the vendored
+`canonicalJson` + `hashCanonical` (byte-identical to overstory's contract). The
+`cache` policy field is stripped (its sub-fields take effect through other paths);
+every other `Phase` field is hashed.
+
+**Soundness fallback.** Per-phase invalidation is only sound when a phase's real
+dependencies are fully captured by the static closure. `phaseFingerprint` returns
+`undefined` (→ the caller folds the whole-flow `flowDefHash` instead, preserving
+pre-M6 behavior) when:
+
+- the flow has `contextSharing: true`, OR
+- any phase in the closure (self included) has `shareContext: true`, OR
+- any phase in the closure (self included) has `type: "flow"`.
+
+These are the cases where a phase can read sibling state outside its declared
+deps (Shared Context Tree) or where sub-structure is resolved at runtime
+(`flow`). Sub-flow inner phases always use this fallback (their `phaseFp` is
+absent → `flowDefHash`), so editing one phase inside a sub-flow invalidates all
+sub-flow phases — a known, safe conservatism.
+
 ### Lookup order (`cachedPhase`)
 
 1. within-run resume (`cc.prior.inputHash === keys.key`) — fastest, always allowed.
-2. `store.get(keys.key)` — current v2 entry.
-3. `store.get(keys.bareKey)` — pre-H1 bare entry.
-4. `store.get(keys.legacyKey)` — pre-flowDefHash entry.
+2. `store.get(keys.key)` — current v3 entry.
+3. `store.get(keys.v2Key)` — pre-M6 v2 entry.
+4. `store.get(keys.bareKey)` — pre-H1 bare entry.
+5. `store.get(keys.legacyKey)` — pre-flowDefHash entry.
 
 A hit on **any** tier is restored as a `cacheHit: "cross-run"` result with zero
 usage. The restored `PhaseState.inputHash` is always `keys.key` (the current
@@ -38,37 +68,48 @@ shape), so downstream phases and recompute see a consistent identity.
 
 ### Write policy (`recordCache`)
 
-Only `keys.key` (the current v2 shape) is ever written. Legacy/bare hits are
+Only `keys.key` (the current v3 shape) is ever written. v2/bare/legacy hits are
 **not** write-through: re-storing under the new key would double the cache size
-for no benefit. Legacy/bare entries age out naturally via the 90-day hard cap
+for no benefit. Legacy/bare/v2 entries age out naturally via the 90-day hard cap
 (`DEFAULT_MAX_AGE_MS`) and the LRU cap (`DEFAULT_MAX_ENTRIES`).
 
-## Why three tiers?
-
-- **`v2:flowdef:` (current):** the versioned prefix lets a future genuine
-  overstory compiler advance to `v3:flowIR:` with its own fallback tier,
-  without disturbing v2 entries.
-- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Without the 3rd
-  tier, every existing cross-run entry would silently miss on upgrade — a
-  one-time miss-storm for opt-in cross-run users.
+## Why four tiers?
+
+- **`v3:phasefp:` (current):** the per-phase structural sub-fingerprint enables
+  precise invalidation — editing one phase no longer evicts independent
+  siblings. The versioned prefix lets a future genuine overstory compiler
+  advance to `v4:flowIR:` with its own fallback tier, without disturbing v3.
+- **`v2:flowdef:` (pre-M6):** M5-and-earlier code wrote this whole-flow shape.
+  Without this tier, every existing cross-run entry would silently miss on the
+  M6 upgrade — a one-time miss-storm for opt-in cross-run users.
+- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Retained for
+  completeness.
 - **no-flowdef (pre-flowDefHash):** the very earliest cross-run entries, before
   the flow definition was folded into the key at all. Retained for completeness;
   these are rare.
 
+### Upgrade note (one-time cost)
+
+On the first post-M6 run, if a sibling phase was edited between the last
+pre-M6 run and the upgrade, an *unchanged* independent phase may re-execute
+once: its v2 entry was keyed on the old `flowDefHash`, which no longer matches.
+This is bounded (per-flow, one-time, only when a sibling edit happened) and
+amortized over subsequent runs as v3 entries take over. For unchanged flows the
+v2 tier hits and no re-execution occurs.
+
 ## Retirement
 
-- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers and the `CacheKeys`
-  return to a single `key`. By then all pre-H1 entries will have aged out (90-day
-  hard cap). The `v2:` prefix is retained as the version anchor for the *next*
-  migration.
-- A pre-release verification step: inspect a real `.pi/taskflow/cache/` directory
-  for bare-`flowdef:` entries. If cross-run is confirmed unused in production
-  (opt-in, young), the bare tier can be dropped earlier.
+- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers. By then all pre-H1
+  entries will have aged out (90-day hard cap).
+- **Later:** remove the `v2Key` tier once all pre-M6 entries have aged out.
+- The `v3:` prefix is retained as the version anchor for the *next* migration.
 
 ## See also
 
 - `extensions/flowir/hash.ts` — the vendored overstory hash algorithm.
+- `extensions/flowir/phasefp.ts` — the per-phase structural sub-fingerprint.
 - `extensions/flowir/index.ts` — `compileTaskflowToIR` (the seam that produces
-  `hash` and `meta.declaredDeps`).
+  `hash` and `meta.declaredDeps`) and `phaseFingerprint`.
 - `docs/internal/overstory-convergence-roadmap.md` §3 (M1).
 - `test/cache-migration.test.ts` — the migration contract tests.
+- `test/cache-phasefp.test.ts` — the per-phase sub-fingerprint contract tests.
diff --git a/extensions/flowir/index.ts b/extensions/flowir/index.ts
index f5f8962..e061559 100644
--- a/extensions/flowir/index.ts
+++ b/extensions/flowir/index.ts
@@ -71,3 +71,5 @@ export type {
 	TaskflowIR,
 	TaskflowIRMeta,
 } from "./meta.ts";
+
+export { phaseFingerprint } from "./phasefp.ts";
diff --git a/extensions/flowir/phasefp.ts b/extensions/flowir/phasefp.ts
new file mode 100644
index 0000000..a7f3c46
--- /dev/null
+++ b/extensions/flowir/phasefp.ts
@@ -0,0 +1,103 @@
+/**
+ * Per-phase structural sub-fingerprint (M6).
+ *
+ * `phaseFingerprint` produces a content-addressed hash of ONLY the subset of
+ * the flow definition that can affect a single phase's subagent output: the
+ * phase itself plus its transitive dependency closure. Folding this into the
+ * cross-run cache key (instead of the whole-flow `flowDefHash`) means editing
+ * phase B invalidates only B and its transitive dependents — independent
+ * sibling phase A keeps its cache hit.
+ *
+ * ## Soundness (the fallback gate)
+ *
+ * Per-phase invalidation is only sound when a phase's *real* dependencies are
+ * fully captured by the static `dependsOn ∪ from` closure. Two cases break
+ * that guarantee, so `phaseFingerprint` returns `undefined` for them and the
+ * caller falls back to the whole-flow `flowDefHash` (safe, = pre-M6 behavior):
+ *
+ *   1. **Shared Context Tree** (`def.contextSharing === true` or any closure
+ *      member has `shareContext === true`): a sharing phase can read sibling
+ *      blackboard writes OUTSIDE its declared deps, so the static closure
+ *      under-approximates real reads.
+ *   2. **`flow` phase in the closure** (`type === "flow"`): a `flow` phase's
+ *      sub-structure is resolved at runtime (inline `def`) or from a saved
+ *      flow (`use`) and is not statically visible here. Editing the saved
+ *      sub-flow would not move this phase's sub-fingerprint.
+ *
+ * `cache` (the policy object) is the ONLY field stripped from each phase
+ * before hashing: its sub-fields (`scope`/`ttl`/`fingerprint`) take effect
+ * through other paths (`cc.scope` gates the lookup, `cc.ttlMs`
+ * governs expiry, `cc.fingerprint` is in the key tail). Every other `Phase`
+ * field is hashed. `PhaseSchema` uses `additionalProperties: false`, so no
+ * surprise field can be missed.
+ *
+ * Pure + async (Web Crypto via `hashCanonical`). Reuses the vendored
+ * `canonicalJson`/`hashCanonical` (byte-identical to overstory's contract) so
+ * the sub-fingerprint shares one hashing contract with `flowDefHash`. Not
+ * expected to throw; callers still wrap it in try/catch and degrade to `flowDefHash`.
+ *
+ * @see docs/internal/cache-migration.md (v3:phasefp tier)
+ */
+
+import { transitiveDependencies, type Phase, type Taskflow } from "../schema.ts";
+import { canonicalJson, hashCanonical } from "./hash.ts";
+
+/** Policy field stripped before hashing (its sub-fields reach the key via
+ *  `cc.scope` / `cc.ttlMs` / `cc.fingerprint` — folding them here would be
+ *  recursive and redundant). This is the ONLY field stripped. */
+const PHASE_FP_STRIP = ["cache"] as const;
+
+/** Clone a phase into a plain record with policy fields removed. */
+function stripPolicy(phase: Phase): Record<string, unknown> {
+	const rec = phase as unknown as Record<string, unknown>;
+	const out: Record<string, unknown> = {};
+	for (const k of Object.keys(rec)) {
+		if ((PHASE_FP_STRIP as readonly string[]).includes(k)) continue;
+		out[k] = rec[k];
+	}
+	return out;
+}
+
+/**
+ * Per-phase structural sub-fingerprint.
+ *
+ * @returns the hex hash, or `undefined` when per-phase soundness cannot be
+ *   guaranteed (caller falls back to the whole-flow `flowDefHash`). Has no
+ *   internal try/catch, so a `hashCanonical` rejection propagates to the caller.
+ */
+export async function phaseFingerprint(def: Taskflow, phaseId: string): Promise<string | undefined> {
+	const phases = def.phases as Phase[];
+	const byId = new Map(phases.map((p) => [p.id, p]));
+	const phase = byId.get(phaseId);
+	if (!phase) return undefined;
+
+	// --- Soundness gate: fall back to whole-flow when static closure is unsafe. ---
+	// Flow-wide context sharing enables cross-sibling reads outside declared deps.
+	if (def.contextSharing === true) return undefined;
+
+	const closureIds = transitiveDependencies(phases, phaseId);
+	const closurePhases: Phase[] = [];
+	for (const id of closureIds) {
+		const p = byId.get(id);
+		if (!p) continue; // unknown dep — validation reports elsewhere
+		// Per-phase sharing: this closure member can read sibling blackboard
+		// writes outside its own declared deps.
+		if (p.shareContext === true) return undefined;
+		// A flow phase's sub-structure is runtime/saved-flow-resolved and not
+		// statically visible — editing it would not move the sub-fingerprint.
+		if ((p.type ?? "agent") === "flow") return undefined;
+		closurePhases.push(p);
+	}
+	// The self phase's own sharing/type is part of the closure too.
+	if (phase.shareContext === true) return undefined;
+	if ((phase.type ?? "agent") === "flow") return undefined;
+
+	// --- Build the canonical payload. ---
+	// `deps` is the transitive closure (self excluded). canonicalJson sorts
+	// OBJECT keys but preserves ARRAY order; `transitiveDependencies` returns
+	// ids already sorted, so `deps` order is independent of dependency walk order.
+	const depsPayload = closurePhases.map((p) => ({ id: p.id, def: stripPolicy(p) }));
+	const payload = { self: stripPolicy(phase), deps: depsPayload };
+
+	return hashCanonical(canonicalJson(payload));
+}
diff --git a/extensions/runtime.ts b/extensions/runtime.ts
index 351b346..49c3eae 100644
--- a/extensions/runtime.ts
+++ b/extensions/runtime.ts
@@ -20,7 +20,7 @@ import { type Budget, type CacheScope, dependenciesOf, finalPhase, LOOP_DEFAULT_
 import { verifyTaskflow } from "./verify.ts";
 import { hashInput, newRunId, type PhaseState, type RunState, runsDir } from "./store.ts";
 import { CacheStore, resolveFingerprint } from "./cache.ts";
-import { compileTaskflowToIR } from "./flowir/index.ts";
+import { compileTaskflowToIR, phaseFingerprint } from "./flowir/index.ts";
 import { computeStaleFrontier, declaredReadMapOfDef, readMapOf } from "./stale.ts";
 import { ctxDirFor, drainPendingSpawns, initCtxDir, registerNode, setNodeStatus, type SpawnAssignment } from "./context-store.ts";
 import { allocateWorkspace, isWorkspaceKeyword, type Workspace } from "./workspace.ts";
@@ -72,6 +72,55 @@ export interface RuntimeResult {
 	finalOutput: string;
 	ok: boolean;
 	totalUsage: UsageStats;
+	/** Incremental-reuse summary: how many phases were reused from cache vs.
+	 *  freshly executed this run, and the cost the reused work would otherwise
+	 *  have incurred (known only for within-run resume; cross-run hits zero
+	 *  their usage so their original cost is not recoverable). Optional &
+	 *  additive — callers that ignore it are unaffected. */
+	reuse?: ReuseSummary;
+}
+
+/** A run's incremental-reuse accounting (see RuntimeResult.reuse). */
+export interface ReuseSummary {
+	/** Phases that completed by executing a subagent this run. */
+	executed: number;
+	/** Phases served from the within-run resume cache (no new tokens). */
+	reusedRunOnly: number;
+	/** Phases restored from the cross-run store (no new tokens). */
+	reusedCrossRun: number;
+	/** Total phases that reached `done` (executed + reused). */
+	done: number;
+	/** USD the within-run-reused phases would have cost if re-executed (their
+	 *  preserved prior usage). Cross-run hits are excluded (cost not recoverable). */
+	savedUSD: number;
+}
+
+/** Compute the incremental-reuse summary from a run's terminal phase states.
+ *  Pure, total, never throws. A phase is "reused" iff it carries a `cacheHit`
+ *  marker (set by `cachedPhase` for both within-run resume and cross-run hits). */
+export function summarizeReuse(state: RunState): ReuseSummary {
+	let executed = 0;
+	let reusedRunOnly = 0;
+	let reusedCrossRun = 0;
+	let savedUSD = 0;
+	for (const ps of Object.values(state.phases)) {
+		if (ps.status !== "done") continue;
+		if (ps.cacheHit === "run-only") {
+			reusedRunOnly++;
+			savedUSD += ps.usage?.cost ?? 0; // within-run resume preserves prior usage
+		} else if (ps.cacheHit === "cross-run") {
+			reusedCrossRun++; // cross-run hits zero their usage — cost not recoverable
+		} else {
+			executed++;
+		}
+	}
+	return {
+		executed,
+		reusedRunOnly,
+		reusedCrossRun,
+		done: executed + reusedRunOnly + reusedCrossRun,
+		savedUSD,
+	};
 }
 
 function buildInterpolationContext(
@@ -721,6 +770,7 @@ async function executePhaseInner(
 		flowName: state.flowName,
 		runId: state.runId,
 		flowDefHash: state.flowDefHash === "failed" ? undefined : state.flowDefHash,
+		phaseFp: state.phaseFingerprints?.[phase.id],
 		forceRerun: opts?.forceRerun,
 		thinking: phase.thinking,
 		tools: phase.tools,
@@ -1635,6 +1685,12 @@ export interface PhaseCacheCtx {
 	 *  key so two structurally-different flows that share a name can never
 	 *  collide, and a changed flow never serves a stale cross-run hit. */
 	flowDefHash?: string | "failed";
+	/** Per-phase structural sub-fingerprint (M6). When present, folds into the
+	 *  key as `v3:phasefp:<subfp>` so editing phase B invalidates only B + its
+	 *  transitive dependents. When absent (sub-flow inner states, or a phase
+	 *  for which per-phase soundness couldn't be guaranteed), `cacheKeys`
+	 *  falls back to `flowDefHash` — preserving pre-M6 whole-flow behavior. */
+	phaseFp?: string;
 	/** Force this phase to re-execute, ignoring the within-run prior AND the
 	 *  cross-run store (M5 recompute seed). Downstream phases are NOT forced —
 	 *  they re-evaluate naturally: if the seed's new output changed their
@@ -1646,27 +1702,34 @@ export interface PhaseCacheCtx {
 /** A computed cache identity: the new (versioned) key plus the read-only
  *  fallback keys used to honor entries written by older releases. The `key`
  *  is what we WRITE under and what `PhaseState.inputHash` carries; the
- *  `legacyKey`/`bareKey` are consulted READ-ONLY on a miss so an upgrade
- *  never produces a miss-storm. See docs/internal/cache-migration.md. */
+ *  `v2Key`/`bareKey`/`legacyKey` are consulted READ-ONLY on a miss so an
+ *  upgrade never produces a miss-storm. See docs/internal/cache-migration.md. */
 export interface CacheKeys {
-	/** Current key: folds `v2:flowdef:<hash>` (the overstory content fingerprint). */
+	/** Current key: folds `v3:phasefp:<subfp>` (the per-phase structural
+	 *  sub-fingerprint; degrades to the whole-flow hash when per-phase
+	 *  soundness couldn't be guaranteed). */
 	key: string;
-	/** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */
-	legacyKey: string;
+	/** Pre-M6 key: `v2:flowdef:<flowDefHash>` (whole-flow fingerprint).
+	 *  Read-only. */
+	v2Key: string;
 	/** Bare (unversioned) `flowdef:` key — written by pre-H1 code that folded
 	 *  the hash without a `v2:` prefix. Read-only. Removed in v0.1.0. */
 	bareKey: string;
+	/** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */
+	legacyKey: string;
 }
 
 /** Fold the phase fingerprint into the base hash parts to form the cache keys.
  *
- *  Three keys are produced for backward compatibility (see
+ *  Four keys are produced for backward compatibility (see
  *  docs/internal/cache-migration.md):
- *    - `key`      : `v2:flowdef:<hash>` — the current write key.
+ *    - `key`      : `v3:phasefp:<subfp>` — the current write key (per-phase
+ *      structural sub-fingerprint; falls back to the whole-flow hash when
+ *      `cc.phaseFp` is absent).
+ *    - `v2Key`    : `v2:flowdef:<flowDefHash>` — pre-M6 whole-flow key.
+ *    - `bareKey`  : bare `flowdef:<flowDefHash>` (unversioned) — pre-H1 entries.
  *    - `legacyKey`: the flowdef line omitted — pre-flowDefHash entries.
- *    - `bareKey`  : bare `flowdef:<hash>` (unversioned) — pre-H1 entries that
- *      folded the hash without the `v2:` prefix.
- *  `cachedPhase` consults all three READ-ONLY on a miss; `recordCache` writes
+ *  `cachedPhase` reads all four keys in ladder order; `recordCache` writes
  *  only `key`. This means an upgrade never produces a miss-storm: existing
  *  entries (whichever shape) still hit, and new writes converge on `key`. */
 export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
@@ -1682,10 +1745,15 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
 	];
 	const fold = (parts: string[]): string =>
 		cc.fingerprint ? hashInput(...parts, cc.fingerprint) : hashInput(...parts);
+	// Per-phase sub-fingerprint; falls back to the whole-flow hash when absent
+	// (sub-flow inner states, or soundness fallback) — preserving pre-M6 behavior.
+	const fp = cc.phaseFp ?? cc.flowDefHash ?? "";
+	const fdh = cc.flowDefHash ?? "";
 	return {
-		key: fold([`flow:${cc.flowName}`, `v2:flowdef:${cc.flowDefHash ?? ""}`, ...tail]),
+		key: fold([`flow:${cc.flowName}`, `v3:phasefp:${fp}`, ...tail]),
+		v2Key: fold([`flow:${cc.flowName}`, `v2:flowdef:${fdh}`, ...tail]),
+		bareKey: fold([`flow:${cc.flowName}`, `flowdef:${fdh}`, ...tail]),
 		legacyKey: fold([`flow:${cc.flowName}`, ...tail]),
-		bareKey: fold([`flow:${cc.flowName}`, `flowdef:${cc.flowDefHash ?? ""}`, ...tail]),
 	};
 }
 
@@ -1696,9 +1764,10 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
  *   - "cross-run": within-run first, then the persistent cross-run store.
  * On a cross-run hit, usage is zeroed and `cacheHit` records the source.
  *
- * The cross-run read is THREE-TIER and READ-ONLY for fallback keys: it tries
- * `keys.key` (current `v2:flowdef:` shape) first, then `keys.bareKey` (pre-H1
- * bare `flowdef:`), then `keys.legacyKey` (pre-flowDefHash, no flowdef line).
+ * The cross-run read is FOUR-TIER and READ-ONLY for fallback keys: it tries
+ * `keys.key` (current `v3:phasefp:` shape) first, then `keys.v2Key` (pre-M6
+ * `v2:flowdef:`), then `keys.bareKey` (pre-H1 bare `flowdef:`), then
+ * `keys.legacyKey` (pre-flowDefHash, no flowdef line).
  * A hit on ANY tier is restored as a cache hit; we do NOT write-through (no
  * re-store under the new key) so the cache size stays stable and the legacy
  * entry ages out naturally. See docs/internal/cache-migration.md.
@@ -1707,14 +1776,17 @@ function cachedPhase(cc: PhaseCacheCtx, keys: CacheKeys): PhaseState | null {
 	if (cc.scope === "off") return null;
 	if (cc.forceRerun) return null;
 
-	// 1. within-run resume (fastest; always allowed unless scope is off)
+	// 1. within-run resume (fastest; always allowed unless scope is off). Flag
+	// it as a `run-only` cache hit so the run summary can count it as reused
+	// work (it spent no new tokens). The prior usage is preserved verbatim so
+	// the summary can report what the reuse would otherwise have cost.
 	if (cc.prior && cc.prior.status === "done" && cc.prior.inputHash === keys.key) {
-		return { ...cc.prior, status: "done" };
+		return { ...cc.prior, status: "done", cacheHit: "run-only" };
 	}
 
-	// 2. cross-run memoization (opt-in) — three-tier read-only fallback.
+	// 2. cross-run memoization (opt-in) — four-tier read-only fallback.
 	if (cc.scope === "cross-run") {
-		for (const k of [keys.key, keys.bareKey, keys.legacyKey]) {
+		for (const k of [keys.key, keys.v2Key, keys.bareKey, keys.legacyKey]) {
 			const e = cc.store.get(k, cc.ttlMs);
 			if (!e) continue;
 			// If we stored the full PhaseState, restore it (preserving gate,
@@ -1895,6 +1967,22 @@ export interface RecomputeReport {
 	/** Phases in the frontier whose inputHash did NOT move → cached result
 	 *  reused, no re-execution (early cutoff). Empty in dry-run (unknowable). */
 	readonly cutoff: readonly string[];
+	/** Per-phase decision trace: WHY each phase was rerun / cut off / reused.
+	 *  The "explainable reactivity" layer — like React DevTools telling you why
+	 *  a component re-rendered. Additive; callers that ignore it are unaffected. */
+	readonly decisions: readonly RecomputeDecision[];
+}
+
+/** Why a single phase landed in its recompute outcome. */
+export interface RecomputeDecision {
+	readonly phaseId: string;
+	/** What happened (real run) or would happen (dry-run). */
+	readonly outcome: "rerun" | "cutoff" | "reused" | "failed";
+	/** Human-readable cause. */
+	readonly reason: string;
+	/** The upstream phase(s) that caused this outcome, when applicable
+	 *  (e.g. the changed upstreams that forced a rerun). */
+	readonly causedBy?: readonly string[];
 }
 
 /** Scan a flow for dependencies that cannot be observed through the readSet.
@@ -1946,6 +2034,30 @@ export async function recomputeTaskflow(
 	const allIds = Object.keys(newState.phases);
 
 	if (opts.dryRun) {
+		// Explain each phase WITHOUT executing: a frontier phase "may rerun"
+		// because it (transitively) reads a changed seed; everything else is
+		// reused as unreachable. We name the in-frontier upstream(s) as the cause.
+		const seedSet0 = new Set(seeds);
+		const upstreamsOf = (id: string): string[] => {
+			const observed = (newState.phases[id]?.reads ?? []).map((r) => r.stepId).filter((u) => u !== id);
+			const decl = (declared.get(id) ?? []).filter((u) => u !== id);
+			return [...new Set([...observed, ...decl])];
+		};
+		const decisions: RecomputeDecision[] = allIds.map((id) => {
+			if (!frontier.has(id)) {
+				return { phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" };
+			}
+			if (seedSet0.has(id)) {
+				return { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" };
+			}
+			const causes = upstreamsOf(id).filter((u) => frontier.has(u));
+			return {
+				phaseId: id,
+				outcome: "rerun",
+				reason: "reads a phase in the stale frontier; may re-run if that upstream's output moves",
+				causedBy: causes.length ? causes : undefined,
+			};
+		});
 		return {
 			report: {
 				dryRun: true,
@@ -1954,6 +2066,7 @@ export async function recomputeTaskflow(
 				rerun: [...frontier],
 				reused: allIds.filter((id) => !frontier.has(id)),
 				cutoff: [],
+				decisions,
 			},
 			state: newState,
 		};
@@ -2003,6 +2116,11 @@ export async function recomputeTaskflow(
 		.filter((id) => frontier.has(id));
 	const rerun: string[] = [];
 	const cutoff: string[] = [];
+	const decisions: RecomputeDecision[] = [];
+	// Phases whose OUTPUT actually moved this recompute (seed forced, or result
+	// changed). Used to attribute a downstream rerun to the specific upstream(s)
+	// that changed — the "why" of the decision trace.
+	const outputMoved = new Set<string>();
 	const noop = () => {};
 	let aborted = false;
 	for (const id of order) {
@@ -2015,17 +2133,50 @@ export async function recomputeTaskflow(
 		const phase = newState.def.phases.find((p) => p.id === id);
 		if (!phase) continue;
 		const before = newState.phases[id]?.inputHash;
-		const execOpts = seedSet.has(id) ? { forceRerun: true } : undefined;
+		const isSeed = seedSet.has(id);
+		const execOpts = isSeed ? { forceRerun: true } : undefined;
+		// The upstream(s) of this phase whose output moved — the cause of a rerun.
+		const changedUpstreams = depsFor(id).filter((u) => outputMoved.has(u));
 		try {
 			const ps = await executePhase(phase, newState, deps, newState.phases[id], noop, 0, execOpts);
 			newState.phases[id] = ps;
 			// A phase counts as "rerun" if it was a forced seed OR its result moved;
 			// otherwise it hit its cache (inputHash unchanged) → early cutoff.
-			if (seedSet.has(id) || ps.inputHash !== before) rerun.push(id);
-			else cutoff.push(id);
+			if (isSeed || ps.inputHash !== before) {
+				rerun.push(id);
+				outputMoved.add(id);
+				decisions.push(
+					isSeed
+						? { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" }
+						: {
+								phaseId: id,
+								outcome: "rerun",
+								reason: "input changed — an upstream's output moved",
+								causedBy: changedUpstreams.length ? changedUpstreams : undefined,
+							},
+				);
+			} else {
+				cutoff.push(id);
+				decisions.push({
+					phaseId: id,
+					outcome: "cutoff",
+					reason: "input unchanged — upstream(s) re-ran but produced identical output (early cutoff)",
+					causedBy: depsFor(id).filter((u) => frontier.has(u)).length
+						? depsFor(id).filter((u) => frontier.has(u))
+						: undefined,
+				});
+			}
 		} catch {
 			// A failing recompute phase is recorded as rerun (it was attempted).
 			rerun.push(id);
+			outputMoved.add(id);
+			decisions.push({ phaseId: id, outcome: "failed", reason: "re-execution attempted but the phase failed" });
+		}
+	}
+	// Frontier-external phases were never touched — record them as reused.
+	for (const id of allIds) {
+		if (!frontier.has(id)) {
+			decisions.push({ phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" });
 		}
 	}
 	return {
@@ -2036,6 +2187,7 @@ export async function recomputeTaskflow(
 			rerun,
 			reused: allIds.filter((id) => !frontier.has(id)),
 			cutoff,
+			decisions,
 		},
 		state: newState,
 	};
@@ -2099,6 +2251,27 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise<Ru
 		}
 	}
 
+	// M6: per-phase structural sub-fingerprints. Computed once per run (when
+	// cross-run is potentially active) so editing phase B invalidates only B +
+	// its transitive dependents, not independent siblings. Each value is either
+	// a precise per-phase hash or the whole-flow `flowDefHash` (soundness
+	// fallback for contextSharing / shareContext / `flow`). Skipped entirely when
+	// `flowDefHash === "failed"` (cross-run is disabled for the run anyway).
+	// Never throws into the run — a per-phase error degrades that phase to the
+	// whole-flow hash (safe, = pre-M6 behavior).
+	if (state.flowDefHash !== "failed" && state.phaseFingerprints === undefined) {
+		const whole = state.flowDefHash ?? "";
+		const map: Record<string, string> = {};
+		for (const p of def.phases) {
+			try {
+				map[p.id] = (await phaseFingerprint(def, p.id)) ?? whole;
+			} catch {
+				map[p.id] = whole; // fail-open → whole-flow scope
+			}
+		}
+		state.phaseFingerprints = map;
+	}
+
 	state.status = "running";
 	safeEmit(deps, state);
 
@@ -2238,5 +2411,6 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise<Ru
 		finalOutput,
 		ok: state.status === "completed",
 		totalUsage,
+		reuse: summarizeReuse(state),
 	};
 }
diff --git a/extensions/schema.ts b/extensions/schema.ts
index bda1e0a..8e80a17 100644
--- a/extensions/schema.ts
+++ b/extensions/schema.ts
@@ -855,6 +855,37 @@ export function dependenciesOf(phase: Phase): string[] {
 	return Array.from(set);
 }
 
+/**
+ * Transitive upstream dependency closure of a phase: every id reachable via
+ * `dependsOn ∪ from`, including indirect ancestors. Cycle-safe (visited set).
+ * Returns the closure EXCLUDING `phaseId` itself. Sorted for deterministic
+ * hashing. Shares the exact edge semantics with `topoLayers`/`detectCycle` so
+ * the closure is complete for every valid flow (validation already rejects
+ * `{steps.X}` refs that aren't reachable via these edges, except for
+ * `join: "any"` phases — handled by callers as needed).
+ *
+ * Hoisted out of `validateTaskflow` so `phaseFingerprint` (M6) and validation
+ * share one source of truth for "what does this phase structurally depend on".
+ */
+export function transitiveDependencies(phases: Phase[], phaseId: string): string[] {
+	const byId = new Map(phases.map((p) => [p.id, p]));
+	const seen = new Set<string>();
+	const queue: string[] = [];
+	const seed = byId.get(phaseId);
+	if (seed) for (const d of dependenciesOf(seed)) queue.push(d);
+	while (queue.length) {
+		const id = queue.shift()!;
+		if (seen.has(id)) continue;
+		if (!byId.has(id)) continue; // unknown dep — validation reports elsewhere
+		seen.add(id);
+		const dep = byId.get(id)!;
+		for (const d of dependenciesOf(dep)) {
+			if (!seen.has(d)) queue.push(d);
+		}
+	}
+	return Array.from(seen).sort();
+}
+
 /** Topologically ordered layers; phases in the same layer can run concurrently. */
 export function topoLayers(phases: Phase[]): Phase[][] {
 	const byId = new Map(phases.map((p) => [p.id, p]));
diff --git a/extensions/store.ts b/extensions/store.ts
index aa464d1..881f2e3 100644
--- a/extensions/store.ts
+++ b/extensions/store.ts
@@ -42,10 +42,11 @@ export interface PhaseState {
 	model?: string;
 	error?: string;
 	inputHash?: string;
-	/** When this result was served from cache: 'cross-run' for the persistent
-	 *  cross-run store. (Within-run resume reuses prior state verbatim and is not
-	 *  flagged here.) */
-	cacheHit?: "cross-run";
+	/** When this result was served from cache instead of executed:
+	 *  'cross-run' = restored from the persistent cross-run store;
+	 *  'run-only'  = within-run resume (a prior attempt with the same inputHash).
+	 *  A phase with this set spent no new tokens this run. */
+	cacheHit?: "cross-run" | "run-only";
 	startedAt?: number;
 	endedAt?: number;
 	/** Live fan-out progress for map/parallel phases. */
@@ -114,6 +115,13 @@ export interface RunState {
 	 *  recompute derives this fresh from `def` so old runs (pre-H1) also get
 	 *  union semantics. */
 	declaredDeps?: Record<string, DeclaredDeps>;
+	/** Per-phase structural sub-fingerprints (M6). Computed once per run
+	 *  alongside `flowDefHash`. Each value is either a precise per-phase hash
+	 *  (when sound) or the whole-flow `flowDefHash` (fallback for
+	 *  shareContext / `flow` phases). Folded into the cross-run cache key as
+	 *  `v3:phasefp:<subfp>` so editing phase B invalidates only B + its
+	 *  transitive dependents. Audit/resume only — recompute derives fresh. */
+	phaseFingerprints?: Record<string, string>;
 }
 
 // ---------------------------------------------------------------------------
diff --git a/test/cache-migration.test.ts b/test/cache-migration.test.ts
index d4182a5..1e6d276 100644
--- a/test/cache-migration.test.ts
+++ b/test/cache-migration.test.ts
@@ -49,11 +49,14 @@ function countingRunner(counter: { n: number }): RuntimeDeps["runTask"] {
 }
 
 /** Build a minimal PhaseCacheCtx matching what executeTaskflow constructs for
- *  a cross-run agent phase, so we can compute the exact legacy/bare keys to
- *  pre-seed. Derives flowDefHash by running compileTaskflowToIR once. */
+ *  a cross-run agent phase, so we can compute the exact legacy/bare/v2 keys to
+ *  pre-seed. Derives flowDefHash + per-phase sub-fingerprint by running
+ *  compileTaskflowToIR + phaseFingerprint once (mirrors the runtime). */
 async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: string): Promise<PhaseCacheCtx> {
-	const { compileTaskflowToIR } = await import("../extensions/flowir/index.ts");
+	const { compileTaskflowToIR, phaseFingerprint } = await import("../extensions/flowir/index.ts");
 	const ir = await compileTaskflowToIR(def);
+	const fdh = ir.hash;
+	const subfp = (await phaseFingerprint(def, phaseId)) ?? fdh ?? "";
 	return {
 		scope: "cross-run",
 		fingerprint: "",
@@ -62,7 +65,8 @@ async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: str
 		phaseId,
 		flowName: def.name,
 		runId: "seed",
-		flowDefHash: ir.hash,
+		flowDefHash: fdh,
+		phaseFp: subfp,
 	};
 }
 
@@ -70,7 +74,7 @@ async function ccFor(def: Taskflow, cwd: string, store: CacheStore, phaseId: str
 // Key shape: new key uses v2:flowdef prefix; legacy/bare differ.
 // ---------------------------------------------------------------------------
 
-test("cacheKeys: key, legacyKey, bareKey are all distinct", async () => {
+test("cacheKeys: key, v2Key, bareKey, legacyKey are all distinct (M6 4-tier)", async () => {
 	const dir = tmpDir();
 	const store = new CacheStore(dir);
 	const def: Taskflow = {
@@ -80,10 +84,13 @@ test("cacheKeys: key, legacyKey, bareKey are all distinct", async () => {
 	const cc = await ccFor(def, dir, store, "p");
 	// baseParts must match what the agent branch uses: [phase.id, agentName, model, fullTask]
 	const ck = cacheKeys(cc, ["p", "a", "", "fixed"]);
-	assert.ok(ck.key !== ck.legacyKey, "v2 key differs from legacy (no-flowdef)");
-	assert.ok(ck.key !== ck.bareKey, "v2 key differs from bare (unversioned flowdef)");
-	assert.ok(ck.legacyKey !== ck.bareKey, "legacy differs from bare");
-	assert.match(ck.key, /^[0-9a-f]+$/);
+	assert.ok(ck.key !== ck.v2Key, "v3 key differs from v2 (per-phase subfp vs whole-flow)");
+	assert.ok(ck.key !== ck.bareKey, "v3 key differs from bare (unversioned flowdef)");
+	assert.ok(ck.key !== ck.legacyKey, "v3 key differs from legacy (no-flowdef)");
+	assert.ok(ck.v2Key !== ck.bareKey, "v2 differs from bare");
+	assert.ok(ck.v2Key !== ck.legacyKey, "v2 differs from legacy");
+	assert.ok(ck.bareKey !== ck.legacyKey, "bare differs from legacy");
+	assert.match(ck.key, /^[0-9a-f]+$/); // all four are hashInput hex digests
 	fs.rmSync(dir, { recursive: true, force: true });
 });
 
@@ -221,11 +228,14 @@ test("cache migration: identical re-run is free (v2 write round-trips)", async (
 test("cache migration: structural change invalidates (flowdef hash differs)", async () => {
 	const dir = tmpDir();
 	const store = new CacheStore(dir);
+	// M6: only a structural change WITHIN a phase's transitive closure
+	// invalidates it. Adding an unrelated independent phase must NOT. So `q`
+	// is made a dependency of `p` — adding it moves p's sub-fingerprint.
 	const mk = (extra: boolean): Taskflow => ({
 		name: "struct-change",
 		phases: extra
 			? [
-					{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true },
+					{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, dependsOn: ["q"], final: true },
 					{ id: "q", type: "agent", agent: "a", task: "extra" },
 				]
 			: [{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true }],
@@ -235,10 +245,10 @@ test("cache migration: structural change invalidates (flowdef hash differs)", as
 
 	await executeTaskflow(mkState(mk(false), dir), deps);
 	assert.equal(counter.n, 1);
-	// Different structure (extra phase) → different flowDefHash → different v2 key → miss.
-	// (q also runs, so counter increments by 2.)
+	// Adding `q` (now in p's closure) → p's sub-fingerprint changes → v3 key
+	// differs → miss. (q also runs, so counter increments by 2.)
 	await executeTaskflow(mkState(mk(true), dir), deps);
-	assert.equal(counter.n, 3, "structural change → miss on p (and q runs)");
+	assert.equal(counter.n, 3, "structural change in p's closure → miss on p (and q runs)");
 	fs.rmSync(dir, { recursive: true, force: true });
 });
 
diff --git a/test/cache-phasefp.test.ts b/test/cache-phasefp.test.ts
new file mode 100644
index 0000000..ce23446
--- /dev/null
+++ b/test/cache-phasefp.test.ts
@@ -0,0 +1,274 @@
+import assert from "node:assert/strict";
+import * as fs from "node:fs";
+import * as os from "node:os";
+import * as path from "node:path";
+import { test } from "node:test";
+import type { AgentConfig } from "../extensions/agents.ts";
+import { CacheStore } from "../extensions/cache.ts";
+import { phaseFingerprint } from "../extensions/flowir/index.ts";
+import { executeTaskflow, cacheKeys, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts";
+import type { RunResult, RunOptions } from "../extensions/runner.ts";
+import type { Taskflow } from "../extensions/schema.ts";
+import type { RunState } from "../extensions/store.ts";
+import { emptyUsage } from "../extensions/usage.ts";
+
+// ---------------------------------------------------------------------------
+// helpers (minimal set, mirroring test/cache.test.ts)
+// ---------------------------------------------------------------------------
+
+const AGENTS: AgentConfig[] = [
+	{ name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" },
+];
+
+function tmpDir(): string {
+	return fs.mkdtempSync(path.join(os.tmpdir(), "tf-phasefp-"));
+}
+
+function mkState(def: Taskflow, cwd: string): RunState {
+	return {
+		runId: `run-${Math.random().toString(36).slice(2, 8)}`,
+		flowName: def.name,
+		def,
+		args: {},
+		status: "running",
+		phases: {},
+		createdAt: Date.now(),
+		updatedAt: Date.now(),
+		cwd,
+	};
+}
+
+function countingRunner(counter: { n: number }): RuntimeDeps["runTask"] {
+	return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => {
+		counter.n++;
+		return {
+			agent: agentName,
+			task,
+			exitCode: 0,
+			output: `out:${task}#${counter.n}`,
+			stderr: "",
+			usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 },
+			stopReason: "end",
+		};
+	};
+}
+
+// ===========================================================================
+// Unit tests for phaseFingerprint (soundness gate + determinism)
+// ===========================================================================
+
+test("phaseFingerprint: returns undefined when def.contextSharing is true (soundness gate)", async () => {
+	const def: Taskflow = {
+		name: "sharing-flow",
+		contextSharing: true,
+		phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true }],
+	};
+	assert.equal(await phaseFingerprint(def, "p"), undefined);
+});
+
+test("phaseFingerprint: returns undefined when a closure member has shareContext", async () => {
+	const def: Taskflow = {
+		name: "sharing-closure",
+		phases: [
+			{ id: "scout", type: "agent", agent: "a", task: "scan", shareContext: true },
+			{ id: "p", type: "agent", agent: "a", task: "use {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true },
+		],
+	};
+	// p transitively depends on scout (shareContext) → fallback.
+	assert.equal(await phaseFingerprint(def, "p"), undefined);
+	// scout itself has shareContext → fallback.
+	assert.equal(await phaseFingerprint(def, "scout"), undefined);
+});
+
+test("phaseFingerprint: returns undefined when a closure member is a flow phase", async () => {
+	const def: Taskflow = {
+		name: "flow-closure",
+		phases: [
+			{ id: "sub", type: "flow", use: "some-saved-flow" },
+			{ id: "p", type: "agent", agent: "a", task: "use {steps.sub.output}", dependsOn: ["sub"], cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	// p transitively depends on a flow phase → fallback.
+	assert.equal(await phaseFingerprint(def, "p"), undefined);
+	// the flow phase itself → fallback.
+	assert.equal(await phaseFingerprint(def, "sub"), undefined);
+});
+
+test("phaseFingerprint: deterministic + changes when an included field changes", async () => {
+	const mk = (task: string): Taskflow => ({
+		name: "det",
+		phases: [{ id: "p", type: "agent", agent: "a", task, cache: { scope: "cross-run" }, final: true }],
+	});
+	const a1 = await phaseFingerprint(mk("t1"), "p");
+	const a2 = await phaseFingerprint(mk("t1"), "p");
+	const b = await phaseFingerprint(mk("t2"), "p");
+	assert.equal(a1, a2, "stable across calls");
+	assert.notEqual(a1, b, "changes when task text changes");
+	assert.match(a1!, /^[0-9a-f]+$/);
+});
+
+test("phaseFingerprint: cache policy field does NOT affect the sub-fingerprint", async () => {
+	// cache.scope/ttl/fingerprint reach the key via other paths; the sub-fingerprint
+	// must be invariant to them (otherwise a TTL change would perturb the structural
+	// hash instead of being handled solely by the dedicated expiry path).
+	const mk = (cache: Taskflow["phases"][number]["cache"]): Taskflow => ({
+		name: "policy-inv",
+		phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache, final: true }],
+	});
+	const a = await phaseFingerprint(mk({ scope: "cross-run" }), "p");
+	const b = await phaseFingerprint(mk({ scope: "cross-run", ttl: "30m" }), "p");
+	const c = await phaseFingerprint(mk({ scope: "cross-run", fingerprint: ["file:x"] }), "p");
+	assert.equal(a, b);
+	assert.equal(a, c);
+});
+
+test("phaseFingerprint: adding an independent phase does NOT move a phase's sub-fingerprint", async () => {
+	const base: Taskflow = {
+		name: "indep",
+		phases: [{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true }],
+	};
+	const withExtra: Taskflow = {
+		name: "indep",
+		phases: [
+			{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, final: true },
+			{ id: "q", type: "agent", agent: "a", task: "extra" },
+		],
+	};
+	// q is NOT in p's closure → p's sub-fingerprint is unchanged.
+	assert.equal(await phaseFingerprint(base, "p"), await phaseFingerprint(withExtra, "p"));
+});
+
+// ===========================================================================
+// Integration tests through the runtime (the Test Matrix)
+// ===========================================================================
+
+test("phasefp: editing phase B does NOT invalidate independent phase A", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const mk = (bTask: string): Taskflow => ({
+		name: "indep-edit",
+		phases: [
+			{ id: "scout", type: "agent", agent: "a", task: "scan", cache: { scope: "cross-run" } },
+			{ id: "A", type: "agent", agent: "a", task: "A uses {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" } },
+			{ id: "B", type: "agent", agent: "a", task: bTask, dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true },
+		],
+	});
+	const counter = { n: 0 };
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk("B original"), dir), deps);
+	assert.equal(counter.n, 3, "scout + A + B run once");
+	// Edit ONLY B's task text. scout + A are unaffected (their closures don't include B).
+	const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps);
+	assert.equal(counter.n, 4, "only B re-runs; scout + A hit");
+	assert.equal(r2.state.phases.scout.cacheHit, "cross-run");
+	assert.equal(r2.state.phases.A.cacheHit, "cross-run");
+	assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("phasefp: editing phase B invalidates B and its transitive dependents", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const mk = (bTask: string): Taskflow => ({
+		name: "transitive",
+		phases: [
+			{ id: "scout", type: "agent", agent: "a", task: "scan", cache: { scope: "cross-run" } },
+			{ id: "B", type: "agent", agent: "a", task: bTask, dependsOn: ["scout"], cache: { scope: "cross-run" } },
+			{ id: "C", type: "agent", agent: "a", task: "C uses {steps.B.output}", dependsOn: ["B"], cache: { scope: "cross-run" } },
+			{ id: "A", type: "agent", agent: "a", task: "A uses {steps.scout.output}", dependsOn: ["scout"], cache: { scope: "cross-run" }, final: true },
+		],
+	});
+	const counter = { n: 0 };
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk("B original"), dir), deps);
+	assert.equal(counter.n, 4, "scout + B + C + A run once");
+	// Edit B's task. B's closure changes → B misses. C depends on B → C's closure
+	// (which includes B) changes → C misses. scout + A are unaffected.
+	const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps);
+	assert.equal(counter.n, 6, "B + C re-run; scout + A hit");
+	assert.equal(r2.state.phases.scout.cacheHit, "cross-run");
+	assert.equal(r2.state.phases.A.cacheHit, "cross-run", "A independent of B → hit");
+	assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed");
+	assert.equal(r2.state.phases.C.cacheHit, undefined, "C (transitive dependent) missed");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("phasefp: pre-v3 (v2) entry still hits — no miss-storm", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const def: Taskflow = {
+		name: "v2-fallback",
+		phases: [{ id: "p", type: "agent", agent: "a", task: "fixed", cache: { scope: "cross-run" }, final: true }],
+	};
+	// Compute the v2 key the runtime will look up, and pre-seed it.
+	const { compileTaskflowToIR } = await import("../extensions/flowir/index.ts");
+	const ir = await compileTaskflowToIR(def);
+	const cc: PhaseCacheCtx = {
+		scope: "cross-run", fingerprint: "", store, prior: undefined,
+		phaseId: "p", flowName: def.name, runId: "old",
+		flowDefHash: ir.hash, phaseFp: (await phaseFingerprint(def, "p")) ?? ir.hash,
+		thinking: undefined, tools: undefined, preRead: "",
+	};
+	const ck = cacheKeys(cc, ["p", "a", "", "fixed"]);
+	store.put({ key: ck.v2Key, createdAt: Date.now(), output: "V2-OUTPUT", model: "v2-model", state: undefined, flowName: def.name, phaseId: "p", runId: "old" });
+
+	const counter = { n: 0 };
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+	const r = await executeTaskflow(mkState(def, dir), deps);
+	assert.equal(counter.n, 0, "v2 entry must hit via fallback — no execution");
+	assert.equal(r.state.phases.p.cacheHit, "cross-run");
+	assert.equal(r.state.phases.p.output, "V2-OUTPUT");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("phasefp: two structurally-different flows do not collide", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const mk = (extra: boolean): Taskflow => ({
+		name: "collide",
+		phases: extra
+			? [
+					{ id: "p", type: "agent", agent: "a", task: "same", cache: { scope: "cross-run" }, dependsOn: ["q"], final: true },
+					{ id: "q", type: "agent", agent: "a", task: "extra" },
+				]
+			: [{ id: "p", type: "agent", agent: "a", task: "same", cache: { scope: "cross-run" }, final: true }],
+	});
+	const counter = { n: 0 };
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk(false), dir), deps);
+	assert.equal(counter.n, 1);
+	// Same name + phaseId + task, but p's closure differs (q added as a dep) →
+	// different sub-fingerprint → no cross-flow collision.
+	await executeTaskflow(mkState(mk(true), dir), deps);
+	assert.equal(counter.n, 3, "p misses (closure changed) and q runs");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("phasefp: shareContext falls back to whole-flow invalidation", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const mk = (bTask: string): Taskflow => ({
+		name: "sharing-fallback",
+		contextSharing: true,
+		phases: [
+			{ id: "A", type: "agent", agent: "a", task: "A", cache: { scope: "cross-run" } },
+			{ id: "B", type: "agent", agent: "a", task: bTask, cache: { scope: "cross-run" }, final: true },
+		],
+	});
+	const counter = { n: 0 };
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk("B original"), dir), deps);
+	assert.equal(counter.n, 2, "A + B run once");
+	// With contextSharing, per-phase soundness cannot be guaranteed → both
+	// phases fall back to the whole-flow flowDefHash. Editing B moves the
+	// whole-flow hash → A ALSO misses (whole-flow invalidation, not per-phase).
+	const r2 = await executeTaskflow(mkState(mk("B edited"), dir), deps);
+	assert.equal(counter.n, 4, "both A and B re-run — whole-flow hash moved");
+	assert.equal(r2.state.phases.A.cacheHit, undefined, "A NOT reused — fallback to whole-flow");
+	assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
diff --git a/test/runtime.test.ts b/test/runtime.test.ts
index c1a9840..663f493 100644
--- a/test/runtime.test.ts
+++ b/test/runtime.test.ts
@@ -259,14 +259,14 @@ test("runtime: resume skips cached completed phases", async () => {
 	const state = mkState(def);
 	// Pre-seed phase one as already done with the matching input hash.
 	const { hashInput } = await import("../extensions/store.ts");
-	const { flowDefHash } = await import("../extensions/flowir/hash.ts");
-	const fh = await flowDefHash(def);
+	const { phaseFingerprint } = await import("../extensions/flowir/index.ts");
+	const subfpOne = (await phaseFingerprint(def, "one")) ?? "";
 	state.phases.one = {
 		id: "one",
 		status: "done",
 		output: "out:start",
-		// Must match runtime cacheKey(): flow name + flowDefHash + base parts + thinking + tools + ctx.
-		inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "one", "a", "", "start", "think:", "tools:[]", "ctx:"),
+		// Must match runtime cacheKeys(): flow name + v3:phasefp sub-fingerprint + base parts + thinking + tools + ctx.
+		inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpOne}`, "one", "a", "", "start", "think:", "tools:[]", "ctx:"),
 		usage: emptyUsage(),
 	};
 
@@ -287,16 +287,17 @@ test("runtime: resume caches a completed reduce phase (unified inputHash)", asyn
 	const record: string[] = [];
 	const runner = mockRunner((t) => `o:${t}`, { record });
 	const { hashInput } = await import("../extensions/store.ts");
-	const { flowDefHash } = await import("../extensions/flowir/hash.ts");
-	const fh = await flowDefHash(def);
+	const { phaseFingerprint } = await import("../extensions/flowir/index.ts");
+	const subfpX = (await phaseFingerprint(def, "x")) ?? "";
+	const subfpSum = (await phaseFingerprint(def, "sum")) ?? "";
 	const state = mkState(def);
-	state.phases.x = { id: "x", status: "done", output: "o:tx", inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "x", "a", "", "tx", "think:", "tools:[]", "ctx:"), usage: emptyUsage() };
-	// reduce cache key has the same shape as agent/gate (flow + flowDefHash + base parts + thinking + tools).
+	state.phases.x = { id: "x", status: "done", output: "o:tx", inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpX}`, "x", "a", "", "tx", "think:", "tools:[]", "ctx:"), usage: emptyUsage() };
+	// reduce cache key has the same shape as agent/gate (flow + v3:phasefp + base parts + thinking + tools).
 	state.phases.sum = {
 		id: "sum",
 		status: "done",
 		output: "o:combine o:tx",
-		inputHash: hashInput(`flow:${def.name}`, `v2:flowdef:${fh}`, "sum", "a", "", "combine o:tx", "think:", "tools:[]", "ctx:"),
+		inputHash: hashInput(`flow:${def.name}`, `v3:phasefp:${subfpSum}`, "sum", "a", "", "combine o:tx", "think:", "tools:[]", "ctx:"),
 		usage: emptyUsage(),
 	};
 	const res = await executeTaskflow(state, baseDeps(runner));

From 31b2d49c49c834b18aaa599f876906cc57ad8c1e Mon Sep 17 00:00:00 2001
From: heggria <bshengtao@gmail.com>
Date: Thu, 25 Jun 2026 20:35:16 +0800
Subject: [PATCH 2/5] feat(cache): per-item cross-run caching for map phases
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add per-item cross-run memoization to the map phase so that when one of N
items changes between runs, only that item re-executes (N-1 cache hits) —
while preserving the existing whole-map fast path and all soundness fallbacks.

Mechanism:
- runFanout accepts an optional perItem hook. Before spawning a subagent for
  an item, it consults cachedPhase with a per-item key; a hit returns a
  0-token synthesized RunResult (stopReason "cache-hit") that flows through
  mergePhaseState as a normal successful item. Successful fresh items are
  recorded per-item for future runs.
- Per-item keys fold [phase.id, it.agent, model, it.task] + the existing
  v3:phasefp/flowName/fingerprint/thinking/tools/preRead tail. Folding
  it.agent (arbiter fix) prevents a stale cross-agent hit when only
  phase.agent changes.
- Whole-map lookup stays first (fast path); per-item engages only on a
  whole-map miss. A trailing whole-map record keeps the fast path warm.
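
The lookup-then-record flow above can be sketched in isolation. This is a
minimal illustration under stated assumptions, not the code in
extensions/runtime.ts: runWithPerItemCache and itemKey are hypothetical
names, the cache is a plain in-memory Map rather than CacheStore, the key
is a JSON string rather than a hash, the v3:phasefp tail is omitted, and
execution is synchronous. Failure handling is left out; in the patch,
failed and budget-skipped items are not recorded.

```typescript
type Item = { agent: string; task: string };

// Hypothetical key: the real key is a hash that also folds the
// v3:phasefp/flowName/fingerprint/thinking/tools/preRead tail.
const itemKey = (phaseId: string, model: string, it: Item): string =>
	JSON.stringify([phaseId, it.agent, model, it.task]);

function runWithPerItemCache(
	phaseId: string,
	model: string,
	items: Item[],
	cache: Map<string, string>,
	exec: (it: Item) => string,
): { outputs: string[]; executed: number } {
	let executed = 0;
	const outputs: string[] = [];
	for (const it of items) {
		const key = itemKey(phaseId, model, it);
		const hit = cache.get(key);
		if (hit !== undefined) {
			// Cache hit: reuse the stored output, execute nothing.
			outputs.push(hit);
			continue;
		}
		// Miss: execute the item, then record it for later runs.
		executed++;
		const out = exec(it);
		cache.set(key, out);
		outputs.push(out);
	}
	return { outputs, executed };
}
```

Because the agent name is part of the key, changing only the agent produces
a miss for every item even when the task text is identical.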

Soundness gates (per-item is disabled and the phase uses whole-map caching
only, if any of these holds):
- scope is not cross-run (run-only/"off" have no persistent store)
- shareContext or flow-wide contextSharing is enabled (items may read sibling
  blackboard writes outside declared deps)
- the map runs inside a runtime-generated sub-flow (def: frame, untrusted)

An undefined phaseFingerprint is NOT a blocker: cacheKeys falls back to
flowDefHash, which is stable for a fixed def.
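
The gates above can be read as a single predicate. The following is a
sketch only: PerItemGateInput and perItemCachingAllowed are hypothetical
names, and representing the frame stack as strings prefixed with "def:" is
an assumption made for illustration.

```typescript
// Hypothetical input shape; field names are illustrative, not the runtime's.
interface PerItemGateInput {
	scope: "cross-run" | "run-only" | "off";
	sharing: boolean; // phase shareContext or flow-wide contextSharing
	frameStack: string[]; // assumed: generated sub-flows push a "def:" frame
	phaseFingerprint?: string; // may be undefined; deliberately not consulted
}

function perItemCachingAllowed(g: PerItemGateInput): boolean {
	if (g.scope !== "cross-run") return false; // no persistent store to re-read
	if (g.sharing) return false; // items may read undeclared sibling writes
	if (g.frameStack.some((f) => f.startsWith("def:"))) return false; // untrusted
	return true; // an undefined phaseFingerprint does not block per-item caching
}
```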

Correctness:
- merged output labels are positionally aligned with over: [k/N] uses the
  item's index and results.length; budget-skipped items are dropped from the
  merged text and cache-hit items keep their positional slot
- cached items contribute emptyUsage, so a partial hit costs only the
  re-executed items
- failed and budget-skipped items are never recorded per-item
- fail-open: any cache read/write error degrades to executing the item
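
The positional labelling can be shown on its own. This sketch mirrors the
map/filter/join shape of the mergePhaseState change in this patch but omits
the failure suffix; ItemResult and labelledOutput are hypothetical names
used only for this example.

```typescript
interface ItemResult {
	agent: string;
	output: string;
	stopReason?: string;
}

function labelledOutput(results: ItemResult[]): string {
	return results
		.map((r, i) => {
			// Skipped items produce no section but still occupy an index.
			if (r.stopReason === "budget-skipped") return null;
			return `### [${i + 1}/${results.length}] ${r.agent}\n\n${r.output}`;
		})
		.filter((x): x is string => x !== null)
		.join("\n\n---\n\n");
}

const merged = labelledOutput([
	{ agent: "a", output: "first", stopReason: "cache-hit" },
	{ agent: "a", output: "", stopReason: "budget-skipped" },
	{ agent: "a", output: "third", stopReason: "end" },
]);
```

Here the third item is labelled [3/3] rather than [2/2], so a reader can
map each section back to its position in over.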

Backward-compat: pre-existing whole-map entries (any tier) still hit via
cachedPhase's 4-tier read-only fallback; the whole-map key format is unchanged.

Tests: new test/cache-peritem.test.ts (11 tests) covering the Test Matrix —
partial reuse, positional alignment, duplicate sharing, shareContext/def-frame
fallbacks, whole-map fast path, revert, usage/subProgress, failed/skipped
non-caching, and the agent-invalidation arbiter fix.
---
 extensions/runtime.ts      | 117 ++++++++-
 skills/taskflow/SKILL.md   |  48 ++++
 test/cache-peritem.test.ts | 491 +++++++++++++++++++++++++++++++++++++
 3 files changed, 652 insertions(+), 4 deletions(-)
 create mode 100644 test/cache-peritem.test.ts

diff --git a/extensions/runtime.ts b/extensions/runtime.ts
index 49c3eae..76c4a99 100644
--- a/extensions/runtime.ts
+++ b/extensions/runtime.ts
@@ -169,6 +169,31 @@ function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJs
 	};
 }
 
+/**
+ * Synthesize a 0-token `RunResult` from a cached per-item `PhaseState` so a
+ * cross-run per-item cache hit flows through `mergePhaseState` as a normal
+ * successful fan-out item. `stopReason: "cache-hit"` is NOT in `isFailed`'s
+ * failure set (only "error"/"aborted"/non-zero exit), so the item counts as
+ * success. Usage is `emptyUsage()` — a cached item spent no new tokens this
+ * run, so `mergePhaseState`'s `aggregateUsage` charges nothing for it.
+ *
+ * Used only by the `map` per-item cache path (see `runFanout`). Reached only
+ * AFTER a successful `cachedPhase` lookup, so `ps.output` is expected to be
+ * present; the `?? ""` below is a defensive default.
+ */
+function phaseStateToRunResult(ps: PhaseState, it: { agent: string; task: string }): RunResult {
+	return {
+		agent: it.agent,
+		task: it.task,
+		exitCode: 0,
+		output: ps.output ?? "",
+		stderr: "",
+		usage: emptyUsage(),
+		model: ps.model,
+		stopReason: "cache-hit",
+	};
+}
+
 /** Convert observed read refs (e.g. "steps.scout.output") into a structured
  *  readSet keyed by upstream phase id, tagging each with the version
  *  (= inputHash) that was current when read. Only `steps.*` refs are upstream
@@ -326,12 +351,20 @@ function mergePhaseState(
 	const model = ran.find((r) => r.model !== undefined)?.model;
 	// Combine outputs as a labelled list; also expose a JSON array of outputs.
 	// For failed items, use the error message instead of the useless placeholder.
-	const combinedText = ran
+	// Labels are positionally aligned to the ORIGINAL `over` array: we map over
+	// ALL results (budget-skipped ones become null and are filtered out) and use
+	// `results.length` as N, so item k's label reads `[k/N]` matching its
+	// position in `over`, not its rank among non-skipped items. Per-item cache
+	// hits (`stopReason: "cache-hit"`) are not budget-skipped, so they keep their
+	// original positional label.
+	const combinedText = results
 		.map((r, i) => {
-			const label = `### [${i + 1}/${ran.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}`;
+			if (r.stopReason === "budget-skipped") return null;
+			const label = `### [${i + 1}/${results.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}`;
 			const content = isFailed(r) ? (r.errorMessage || r.stderr || r.output) : r.output;
 			return `${label}\n\n${content}`;
 		})
+		.filter((x): x is string => x !== null)
 		.join("\n\n---\n\n");
 	// Only successful runs feed the parsed JSON array (no error/skip strings).
 	const jsonArray = parseJson ? ran.filter((r) => !isFailed(r)).map((r) => safeParse(r.output) ?? r.output) : undefined;
@@ -870,7 +903,14 @@ async function executePhaseInner(
 	const parseJson = phase.output === "json";
 
 	// Runs a list of sub-tasks with live fan-out progress + aggregate live usage/activity.
-	const runFanout = async (items: Array<{ agent: string; task: string }>): Promise<RunResult[]> => {
+	// `perItem` (map only) enables per-item cross-run caching: each item is looked
+	// up in the cache before spawning a subagent, and a successful fresh item is
+	// recorded so a later run with that item unchanged hits per-item. When
+	// `perItem` is undefined (parallel, or non-cacheable maps) the path is inert.
+	const runFanout = async (
+		items: Array<{ agent: string; task: string }>,
+		perItem?: { keyOf: (idx: number) => CacheKeys | null },
+	): Promise<RunResult[]> => {
 		let done = 0;
 		let running = 0;
 		let failed = 0;
@@ -904,6 +944,28 @@ async function executePhaseInner(
 					stopReason: "budget-skipped",
 				} satisfies RunResult;
 			}
+			// Per-item cross-run cache lookup (map only). A hit synthesizes a 0-token
+			// RunResult and returns immediately — the item never spawns a subagent and
+			// never reaches the ctx_spawn drain below (a cached item can't have queued
+			// new spawns). Fail-open: any error in the lookup path degrades to executing.
+			if (perItem) {
+				try {
+					const ckItem = perItem.keyOf(idx);
+					if (ckItem) {
+						const hit = cachedPhase(cc, ckItem);
+						if (hit) {
+							done++;
+							const synth = phaseStateToRunResult(hit, it);
+							liveUsages[idx] = emptyUsage();
+							if (hit.model) latestModel = hit.model;
+							refresh();
+							return synth;
+						}
+					}
+				} catch {
+					/* fail-open: a cache read error must never sink the item */
+				}
+			}
 			running++;
 			refresh();
 			if (ctxDir) {
@@ -919,6 +981,23 @@ async function executePhaseInner(
 			done++;
 			if (isFailed(r)) failed++;
 			liveUsages[idx] = r.usage;
+			// Per-item cross-run cache record (map only): persist a successful fresh
+			// item so a later run with this item unchanged hits per-item instead of
+			// re-running. Failed and budget-skipped items are never cached (a stale
+			// failure would be served on the next run). Fail-open: a write error never
+			// sinks the item — the fresh `r` is already in hand and flows downstream.
+			if (perItem && !isFailed(r) && r.stopReason !== "budget-skipped") {
+				try {
+					const ckItem = perItem.keyOf(idx);
+					if (ckItem) {
+						const ccItem: PhaseCacheCtx = { ...cc, phaseId: `${phase.id}#item${idx}` };
+						const itemPs = resultToPhaseState(`${phase.id}#item${idx}`, r, ckItem.key, parseJson);
+						recordCache(ccItem, itemPs);
+					}
+				} catch {
+					/* fail-open: cache write must never sink the item */
+				}
+			}
 			if (ctxDir) {
 				try {
 					const itemNid = nodeIdFor(String(idx));
@@ -1118,12 +1197,42 @@ async function executePhaseInner(
 				task: preRead + interpolate(phase.task ?? "", localCtx).text,
 			};
 		});
+		// Per-item caching is sound ONLY when ALL of:
+		//  - cross-run scope: run-only has no persistent store, so per-item entries
+		//    could never be re-read (no point keying them).
+		//  - no Shared Context Tree (`!sharing`): a sharing map item can read sibling
+		//    blackboard writes OUTSIDE its declared deps, so the per-item key (which
+		//    folds only the item's own task) under-approximates real reads and could
+		//    serve a stale result. Fall back to whole-map.
+		//  - not inside a runtime-generated sub-flow (`def:` frame in the stack):
+		//    such flows are untrusted / possibly non-deterministic, so per-item reuse
+		//    is unsafe. Fall back to whole-map (which still applies breadth caps).
+		// An undefined `phaseFingerprint` is NOT a blocker: `cacheKeys` falls back to
+		// the whole-flow `flowDefHash`, which is stable across runs for a fixed def,
+		// so per-item keys for unchanged items remain stable.
+		const perItemCacheable =
+			cc.scope === "cross-run" &&
+			!sharing &&
+			!(deps._stack ?? []).some((s) => s.startsWith("def:"));
+		// Pre-compute per-item CacheKeys once so the lookup and the record path use
+		// the IDENTICAL key (and share cacheKeys' v3:phasefp + flow-name +
+		// fingerprint + thinking/tools/preRead contract). The per-item key folds
+		// `it.agent` (Arbiter fix): a different agent means different output, so a
+		// per-item key WITHOUT the agent could serve a stale cross-agent hit when
+		// only `phase.agent` changed (the whole-map key would correctly miss via
+		// JSON.stringify(tasks), but per-item keys would not).
+		const perItemKeys: (CacheKeys | null)[] = perItemCacheable
+			? tasks.map((it) => cacheKeys(cc, [phase.id, it.agent, phase.model ?? "", it.task]))
+			: tasks.map(() => null);
+		const perItem = perItemCacheable
+			? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null }
+			: undefined;
 		const ck = cacheKeys(cc, [phase.id, phase.model ?? "", JSON.stringify(tasks)]);
 		const inputHash = ck.key;
 		const cached = cachedPhase(cc, ck);
 		if (cached) return cached;
 
-		const results = await runFanout(tasks);
+		const results = await runFanout(tasks, perItem);
 		const ps = mergePhaseState(phase.id, results, inputHash, parseJson);
 		if (readRefs.length) ps.reads = readRefsToReads(readRefs, state);
 		if (mapTruncated) {
diff --git a/skills/taskflow/SKILL.md b/skills/taskflow/SKILL.md
index aca991b..cd10531 100644
--- a/skills/taskflow/SKILL.md
+++ b/skills/taskflow/SKILL.md
@@ -553,6 +553,54 @@ Quick reference:
 - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default.
 - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.
 
+### Per-item map caching (cross-run)
+
+A `map` phase with `cache: { "scope": "cross-run" }` is cached **per item**, not
+just as a whole. When one of N items changes between runs, only that item
+re-executes — the other N−1 are served from the cross-run cache for $0.
+
+```jsonc
+{ "id": "audit-each", "type": "map",
+  "over": "{steps.discover.json.files}",   // array from an upstream phase
+  "task": "audit {item}",
+  "cache": { "scope": "cross-run" },        // ← enables per-item reuse
+  "dependsOn": ["discover"], "final": true }
+```
+
+How it works:
+
+- The **whole-map** entry is still checked first (fast path): an identical
+  re-run is a single $0 hit and never enters the fan-out.
+- On a whole-map miss, each item is looked up individually before it spawns a
+  subagent; a hit returns a 0-token synthesized result. Successful fresh items
+  are recorded so a later run with that item unchanged reuses them.
+- Per-item keys fold the item's resolved task **and agent** (so changing
+  `phase.agent` invalidates every item), plus the phase sub-fingerprint,
+  `thinking`/`tools`, and any `fingerprint` entries — exactly like a standalone
+  cross-run phase.
+
+Automatic fallbacks (per-item caching is disabled and the whole-map path is used):
+
+- `shareContext: true` on the phase, or flow-wide `contextSharing: true` — a
+  sharing item can read sibling blackboard writes outside its declared deps, so
+  the per-item key would under-approximate real reads.
+- The map runs **inside a runtime-generated sub-flow** (a `flow { def }` phase
+  or a `ctx_spawn({subflow})`) — untrusted / possibly non-deterministic.
+- `scope: "run-only"` (default) or `"off"` — no persistent store to reuse from.
+
+Notes & limitations:
+
+- Duplicate items (identical task + agent) share a single entry — reuse is
+  content-addressable, not positional.
+- Failed items and **budget-skipped** items are never cached, so they always
+  re-execute on the next run.
+- `{steps.<map>.json[k]}` indexes the k-th **successful** item (not the k-th
+  position in `over`); the merged `output` text, however, IS positionally
+  aligned with `over` (labels read `[k/N]`).
+- Within-run resume of a partially-completed map is not supported (only
+  fully-completed maps resume within a run); cross-run per-item reuse covers the
+  common case.
+
 ## Actions
 
 - `action: "run"` — run an inline `define` (a one-off DAG) **or** a saved `name` (with optional `args`). Use `define` for an ad-hoc flow; use `name` to invoke something previously saved. Add `detach: true` to run in the background (returns immediately with the runId; poll the store for status).
diff --git a/test/cache-peritem.test.ts b/test/cache-peritem.test.ts
new file mode 100644
index 0000000..3a34510
--- /dev/null
+++ b/test/cache-peritem.test.ts
@@ -0,0 +1,491 @@
+/**
+ * Per-item map caching — the Test Matrix from the approved plan.
+ *
+ * These tests pin the behavior of the per-item cross-run cache path added to
+ * the `map` branch: changing one of N items re-executes only that item,
+ * merged output stays positionally aligned with `over`, duplicate items share
+ * an entry, and the soundness fallbacks (shareContext, dynamic sub-flow,
+ * failed/budget-skipped items) hold.
+ *
+ * The realistic shape for per-item reuse is `over: "{args.items}"` with the
+ * array supplied via run args: the phase DEFINITION (and therefore
+ * flowDefHash / phaseFp) stays stable across runs, while the RESOLVED array
+ * changes — so per-item keys for unchanged items remain stable. Changing the
+ * `over` LITERAL would move the phase's structural fingerprint and invalidate
+ * every per-item key at once (no partial reuse), which is correct but not the
+ * scenario per-item caching targets.
+ */
+
+import assert from "node:assert/strict";
+import * as fs from "node:fs";
+import * as os from "node:os";
+import * as path from "node:path";
+import { test } from "node:test";
+import type { AgentConfig } from "../extensions/agents.ts";
+import { CacheStore } from "../extensions/cache.ts";
+import { phaseFingerprint, compileTaskflowToIR } from "../extensions/flowir/index.ts";
+import { cacheKeys, executeTaskflow, summarizeReuse, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts";
+import type { RunOptions, RunResult } from "../extensions/runner.ts";
+import type { Taskflow } from "../extensions/schema.ts";
+import type { RunState } from "../extensions/store.ts";
+import { emptyUsage } from "../extensions/usage.ts";
+
+// ---------------------------------------------------------------------------
+// helpers
+// ---------------------------------------------------------------------------
+
+const AGENTS: AgentConfig[] = [
+	{ name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" },
+	{ name: "b", description: "test agent b", systemPrompt: "", source: "user", filePath: "" },
+];
+
+function tmpDir(): string {
+	return fs.mkdtempSync(path.join(os.tmpdir(), "tf-peritem-"));
+}
+
+function mkState(def: Taskflow, cwd: string, args: Record<string, unknown> = {}): RunState {
+	return {
+		runId: `run-${Math.random().toString(36).slice(2, 8)}`,
+		flowName: def.name,
+		def,
+		args,
+		status: "running",
+		phases: {},
+		createdAt: Date.now(),
+		updatedAt: Date.now(),
+		cwd,
+	};
+}
+
+/** Counting runner: each successful call increments `counter.n` and emits a
+ *  deterministic output embedding the task + call index, so cache hits (which
+ *  skip the call) are observable as a missing index. `failWhen` lets a test
+ *  force a specific item to fail. */
+function countingRunner(
+	counter: { n: number },
+	failWhen?: (task: string) => string | null,
+): RuntimeDeps["runTask"] {
+	return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => {
+		counter.n++;
+		const fail = failWhen ? failWhen(task) : null;
+		if (fail) {
+			return {
+				agent: agentName,
+				task,
+				exitCode: 1,
+				output: "",
+				stderr: fail,
+				usage: { ...emptyUsage(), output: 5, cost: 0.001, turns: 1 },
+				stopReason: "error",
+				errorMessage: fail,
+			};
+		}
+		return {
+			agent: agentName,
+			task,
+			exitCode: 0,
+			output: `out:${task}#${counter.n}`,
+			stderr: "",
+			usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 },
+			stopReason: "end",
+		};
+	};
+}
+
+// ---------------------------------------------------------------------------
+// (a) change 1 of N items re-executes only that item
+// ---------------------------------------------------------------------------
+
+test("per-item: change 1 of N items re-executes only that item", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-change-one",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3, "run1 executes all 3 items");
+	// Change ONLY item[1] (b -> b2). The phase def is unchanged (over is the
+	// literal "{args.items}"), so per-item keys for item[0]/item[2] are stable.
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps);
+	assert.equal(counter.n, 4, "run2 re-executes only item[1] (3 + 1)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (not a whole-map hit)");
+
+	// item[0] and item[2] were served from per-item cache: their outputs match
+	// run1 verbatim (same call index), proving no re-execution.
+	assert.match(r2.finalOutput, /out:process a#1\b/, "item[0] reused from per-item cache (call #1)");
+	assert.match(r2.finalOutput, /out:process c#3\b/, "item[2] reused from per-item cache (call #3)");
+	// item[1] re-executed → fresh call index #4.
+	assert.match(r2.finalOutput, /out:process b2#4\b/, "item[1] re-executed (call #4)");
+	// Sanity: run1's item[1] output is NOT present in run2.
+	assert.doesNotMatch(r2.finalOutput, /out:process b#2\b/);
+	// r1 sanity: all three call indices appear.
+	assert.match(r1.finalOutput, /out:process a#1\b/);
+	assert.match(r1.finalOutput, /out:process b#2\b/);
+	assert.match(r1.finalOutput, /out:process c#3\b/);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (b) merged output stays positionally aligned with `over`
+// ---------------------------------------------------------------------------
+
+test("per-item: merged output stays positionally aligned with over (failed item keeps its slot)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-positional",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: '["x","FAIL","y"]', task: "do {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter, (t) => (t.includes("FAIL") ? "boom" : null)), cacheStore: store };
+
+	const r = await executeTaskflow(mkState(def, dir), deps);
+	const out = r.finalOutput;
+	// Labels are positionally aligned to the original `over`: [1/3], [2/3] (failed), [3/3].
+	assert.match(out, /### \[1\/3\] a\n\nout:do x#\d/, "item[0] keeps slot 1/3");
+	assert.match(out, /### \[2\/3\] a \(failed\)\n\nboom/, "item[1] keeps slot 2/3 and is marked failed");
+	assert.match(out, /### \[3\/3\] a\n\nout:do y#\d/, "item[2] keeps slot 3/3");
+	// No [1/2] / [2/2] labels (the old non-positional behavior counted only ran items).
+	assert.doesNotMatch(out, /### \[1\/2\]/);
+	assert.doesNotMatch(out, /### \[2\/2\]/);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (c) duplicate items share a single cache entry
+// ---------------------------------------------------------------------------
+
+test("per-item: duplicate items share a single cache entry (content-addressable)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-dups",
+		phases: [
+			// concurrency:1 so item[0] records before item[1] looks up (deterministic).
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "do {item}", concurrency: 1, cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	// ["x","x","y"]: two identical tasks ("do x") share one per-item entry.
+	await executeTaskflow(mkState(def, dir, { items: ["x", "x", "y"] }), deps);
+	assert.equal(counter.n, 2, "run1: two DISTINCT tasks execute (do x once, do y once); the second do x hits the just-written entry");
+	// run2: all three hit (do x + do y already cached).
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["x", "x", "y"] }), deps);
+	assert.equal(counter.n, 2, "run2: all items served from cache (0 new calls)");
+	assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map fast path hits on identical re-run");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (d) shareContext map falls back to whole-map caching
+// ---------------------------------------------------------------------------
+
+test("per-item: shareContext map falls back to whole-map (no partial reuse)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-sharectx",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", shareContext: true, cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3, "run1 executes all 3");
+	// Change only item[1]. With shareContext, per-item is unsound → disabled.
+	// Whole-map misses (items changed) → ALL items re-execute (no partial hits).
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps);
+	assert.equal(counter.n, 6, "run2 re-executes ALL 3 items (whole-map fallback, no per-item reuse)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (whole-map missed)");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (e) pre-existing whole-map entry still hits (fast path)
+// ---------------------------------------------------------------------------
+
+test("per-item: whole-map fast path still hits on identical re-run (precedence over per-item)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-fastpath",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3, "run1 seeds whole-map + per-item entries");
+	// Identical re-run: whole-map key matches → 1 hit, runFanout never engages.
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3, "run2 hits the whole-map fast path (0 new calls)");
+	assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map hit sets the phase-level cacheHit");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (f) cross-run resume reuses completed items after re-seed (revert path)
+// ---------------------------------------------------------------------------
+
+test("per-item: reverting to the original items hits the whole-map fast path (run1 entry preserved)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-revert",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3);
+	// Change item[1] → 1 re-exec, writes a NEW whole-map entry + a new per-item entry.
+	await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps);
+	assert.equal(counter.n, 4, "run2: only item[1] re-executes");
+	// Revert to original. The whole-map key now matches run1's entry → fast-path hit.
+	const r3 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 4, "run3: whole-map fast path hits run1's entry (0 new calls)");
+	assert.equal(r3.state.phases.m.cacheHit, "cross-run");
+	assert.equal(r3.finalOutput, r1.finalOutput, "run3 output matches run1 exactly");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (g) usage + subProgress correct on partial hit
+// ---------------------------------------------------------------------------
+
+test("per-item: partial hit charges only the re-executed item; subProgress reflects all done", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-usage",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3);
+	// Change item[1] only → 1 re-exec (cost 0.001); items 0+2 are 0-token cache hits.
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b2", "c"] }), deps);
+	assert.equal(counter.n, 4);
+	const m = r2.state.phases.m;
+	assert.equal(m.cacheHit, undefined, "phase executed (partial hit, not whole-map)");
+	// Cached items contribute emptyUsage → merged cost is exactly one item's cost.
+	assert.equal(m.usage?.cost ?? 0, 0.001, "only the re-executed item is charged");
+	// subProgress: all 3 items reached done (2 cached + 1 executed), none failed.
+	assert.equal(m.subProgress?.done, 3, "all 3 items done");
+	assert.equal(m.subProgress?.failed, 0, "no failures");
+	assert.equal(m.subProgress?.total, 3);
+	// summarizeReuse: the phase executed (partial hit) → counted as executed, not reused.
+	const reuse = summarizeReuse(r2.state);
+	assert.equal(reuse.executed, 1, "the map phase is counted as executed (it ran 1 item)");
+	assert.equal(reuse.reusedCrossRun, 0, "no whole-phase cross-run hit on a partial run");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (h) failed item is never cached
+// ---------------------------------------------------------------------------
+
+test("per-item: a failed item is never cached (re-executes on the next run)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-nofail",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const store = new CacheStore(dir);
+
+	// run1: item[1] ("process b") fails. Items 0+2 succeed and are cached per-item.
+	let counter = { n: 0 };
+	let failOn = "b";
+	const deps1: RuntimeDeps = {
+		cwd: dir, agents: AGENTS, cacheStore: store,
+		runTask: countingRunner(counter, (t) => (t.includes(`process ${failOn}`) ? "boom" : null)),
+	};
+	await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps1);
+	assert.equal(counter.n, 3, "run1 attempts all 3 (item[1] fails)");
+
+	// run2: same items, no failures. item[0]/[2] hit per-item; item[1] must
+	// RE-EXECUTE (its failure was not cached) and now succeeds.
+	counter = { n: 0 };
+	failOn = "";
+	const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	const r2 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps2);
+	assert.equal(counter.n, 1, "run2: only the previously-failed item[1] re-executes; 0+2 hit per-item");
+	assert.equal(r2.state.phases.m.status, "done", "all items succeed on run2");
+	assert.match(r2.finalOutput, /out:process b#\d/, "item[1] now has a fresh successful output");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (i) budget-skipped item is never cached
+// ---------------------------------------------------------------------------
+
+test("per-item: a budget-skipped item is never recorded as a per-item cache entry", async () => {
+	const dir = tmpDir();
+	// concurrency:1 so the budget guard sees accumulated spend item-by-item.
+	// maxUSD 0.0015: run1 executes item[0] (0.001) + item[1] (0.001, total 0.002
+	// > cap) → item[2] is budget-skipped. We then inspect the cache store DIRECTLY:
+	// the skipped item must have NO per-item entry (else a later run could serve a
+	// stale "skipped" result), while the executed items DO have entries.
+	const def: Taskflow = {
+		name: "peritem-nobudgetskip",
+		budget: { maxUSD: 0.0015 },
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", concurrency: 1, cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const store = new CacheStore(dir);
+
+	let counter = { n: 0 };
+	const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	const r1 = await executeTaskflow(mkState(def, dir, { items: ["a", "b", "c"] }), deps1);
+	assert.equal(counter.n, 2, "run1: item[0]+item[1] execute, item[2] budget-skipped");
+	assert.equal(r1.state.phases.m.budgetTruncated, true, "map was cut short by the budget cap");
+
+	// Reconstruct the runtime's per-item CacheKeys to inspect the store.
+	// cc matches what executePhaseInner builds: scope cross-run, no fingerprint,
+	// empty preRead, and phaseFp = phaseFingerprint(def,"m") ?? flowDefHash.
+	const ir = await compileTaskflowToIR(def);
+	const flowDefHash = ir.hash ?? "failed";
+	const phaseFp = (await phaseFingerprint(def, "m")) ?? flowDefHash;
+	const cc: PhaseCacheCtx = {
+		scope: "cross-run",
+		fingerprint: "",
+		store,
+		prior: undefined,
+		phaseId: "m",
+		flowName: def.name,
+		runId: r1.state.runId,
+		flowDefHash,
+		phaseFp,
+		thinking: undefined,
+		tools: undefined,
+		preRead: "",
+	};
+	// Per-item key folds [phase.id, it.agent, model, it.task] (Arbiter fix).
+	const keyFor = (task: string) => cacheKeys(cc, ["m", "a", "", task]).key;
+	const keyA = keyFor("process a"); // item[0]: executed → cached
+	const keyB = keyFor("process b"); // item[1]: executed → cached
+	const keyC = keyFor("process c"); // item[2]: budget-skipped → NOT cached
+
+	assert.notEqual(store.get(keyA), null, "executed item[0] has a per-item cache entry");
+	assert.notEqual(store.get(keyB), null, "executed item[1] has a per-item cache entry");
+	assert.equal(store.get(keyC), null, "budget-skipped item[2] has NO per-item cache entry");
+	// The skipped item's entry (had it been written) would carry no real output;
+	// confirm the executed entries carry the real subagent output.
+	assert.match(store.get(keyA)?.output ?? "", /out:process a#/);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (j) map inside a dynamic sub-flow (def: frame) uses whole-map only
+// ---------------------------------------------------------------------------
+
+test("per-item: map inside a dynamic sub-flow (def: frame) uses whole-map only (no partial reuse)", async () => {
+	const dir = tmpDir();
+	// Top-level flow phase with an inline `def` containing a cross-run map.
+	// The def-frame in the stack disables per-item caching for the inner map.
+	// `with.items` interpolates from top-level args so the resolved array can
+	// change WITHOUT changing the def literal (keeping the sub-flow identity
+	// stable would otherwise mask the behavior under the flow phase's own cache).
+	const mk = (): Taskflow => ({
+		name: "peritem-defframe",
+		phases: [
+			{
+				id: "sub",
+				type: "flow",
+				agent: "a",
+				with: { items: "{args.topItems}" },
+				cache: { scope: "cross-run" },
+				final: true,
+				def: {
+					name: "inner",
+					phases: [
+						{ id: "m", type: "map", agent: "a", over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+					],
+				},
+			},
+		],
+	}) as Taskflow;
+	const def = mk();
+	const store = new CacheStore(dir);
+
+	let counter = { n: 0 };
+	const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	await executeTaskflow(mkState(def, dir, { topItems: '["a","b","c"]' }), deps1);
+	assert.equal(counter.n, 3, "run1: inner map executes all 3 items");
+
+	// Identical re-run: the flow phase's whole-phase cache hits → inner map is
+	// not even re-entered → 0 calls. Confirms the flow phase still caches.
+	counter = { n: 0 };
+	const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	const r2 = await executeTaskflow(mkState(def, dir, { topItems: '["a","b","c"]' }), deps2);
+	assert.equal(counter.n, 0, "run2: flow phase whole-phase hit (0 calls)");
+	assert.equal(r2.state.phases.sub.cacheHit, "cross-run");
+
+	// Change ONLY item[1]. The flow phase's whole-phase key misses (subArgs changed) →
+	// inner map re-enters. Its whole-map also misses (items changed). Because the
+	// map is inside a def-frame, per-item is DISABLED → ALL 3 items re-execute
+	// (if per-item were enabled, only item[1] would run → counter.n would be 1).
+	counter = { n: 0 };
+	const deps3: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	const r3 = await executeTaskflow(mkState(def, dir, { topItems: '["a","b2","c"]' }), deps3);
+	assert.equal(counter.n, 3, "run3: ALL items re-execute (per-item disabled inside def-frame; whole-map fallback)");
+	assert.equal(r3.state.phases.sub.cacheHit, undefined, "flow phase missed (items changed)");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (k) Arbiter fix: changing phase.agent invalidates all per-item keys
+// ---------------------------------------------------------------------------
+
+test("per-item: changing phase.agent invalidates every per-item key (no stale cross-agent hit)", async () => {
+	const dir = tmpDir();
+	const mk = (agent: string): Taskflow => ({
+		name: "peritem-agent",
+		phases: [
+			{ id: "m", type: "map", agent, over: "{args.items}", task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	// run1 with agent "a" seeds per-item entries keyed on agent "a"; run2 below
+	// reuses the same items and the same task text.
+	await executeTaskflow(mkState(mk("a"), dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 3);
+	// run2: SAME items + SAME task, but agent changed to "b". The per-item key
+	// folds `it.agent`, so every per-item key differs → no stale cross-agent hit.
+	// All 3 items re-execute under agent "b".
+	const r2 = await executeTaskflow(mkState(mk("b"), dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 6, "changing phase.agent must invalidate all per-item keys (3 + 3)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined, "whole-map also missed (agent is in JSON.stringify(tasks))");
+	// Re-run with agent "b" again → whole-map fast path hits.
+	const r3 = await executeTaskflow(mkState(mk("b"), dir, { items: ["a", "b", "c"] }), deps);
+	assert.equal(counter.n, 6, "agent b now cached → 0 new calls");
+	assert.equal(r3.state.phases.m.cacheHit, "cross-run");
+	fs.rmSync(dir, { recursive: true, force: true });
+});

From fb13128be29f736584414b4aa6864e2d7d77f2d2 Mon Sep 17 00:00:00 2001
From: heggria <bshengtao@gmail.com>
Date: Fri, 26 Jun 2026 12:16:13 +0800
Subject: [PATCH 3/5] fix(cache): make map per-item keys omit structural
 fingerprint
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per-item cross-run cache keys for the `map` phase folded both `phaseFp`
and `flowDefHash` (via the whole-phase `cc`). Both fingerprints hash the
`over` array source, so when a literal or data-derived `over` changed ONE
item between runs, EVERY per-item key moved at once — defeating partial
reuse (all N items re-executed instead of just the changed one).

Fix: build a per-item `ccPerItem` with BOTH `phaseFp` and `flowDefHash`
set to `undefined`, and use it only for per-item key construction. A
single item's output is fully specified by it.task (template + item/as
value + upstream-output refs + args) + it.agent + model +
thinking/tools/preRead + the world-state fingerprint; `over` only
determines WHICH items exist, not WHAT any item computes. `flowName` is
retained for cross-flow collision prevention.
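
The effect can be illustrated with a minimal, self-contained sketch. It
is not the runtime code: `Ctx`, `structuralFp`, `itemKey`, and `keysFor`
are hypothetical stand-ins for PhaseCacheCtx / cacheKeys, and SHA-256
over a JSON array is an assumed key encoding chosen only for
illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for PhaseCacheCtx (illustration only).
interface Ctx {
	flowName: string;
	phaseFp?: string; // structural fingerprint; assumed to hash `over`
}

const sha = (s: string): string => createHash("sha256").update(s).digest("hex");

// Models a structural fingerprint that covers the literal `over` array.
const structuralFp = (over: string[]): string => sha(JSON.stringify(over));

// Per-item key: folds the fingerprint only when the context carries one.
const itemKey = (cc: Ctx, agent: string, task: string): string =>
	sha(JSON.stringify([cc.flowName, cc.phaseFp ?? null, agent, task]));

// Builds one key per item, either from the full cc or from ccPerItem.
const keysFor = (over: string[], withFp: boolean): string[] => {
	const cc: Ctx = { flowName: "demo", phaseFp: structuralFp(over) };
	const ccPerItem: Ctx = { ...cc, phaseFp: undefined };
	return over.map((item) => itemKey(withFp ? cc : ccPerItem, "a", `process ${item}`));
};

// Counts positions whose key is identical across the two runs.
const countStable = (before: string[], after: string[]): number =>
	before.filter((k, i) => k === after[i]).length;

const run1 = ["a", "b", "c"];
const run2 = ["a", "b2", "c"]; // only item[1] changed

const stableWithFp = countStable(keysFor(run1, true), keysFor(run2, true));
const stableWithoutFp = countStable(keysFor(run1, false), keysFor(run2, false));
console.log({ stableWithFp, stableWithoutFp });
```

Under these assumptions, changing one of three items leaves no per-item
key stable when the fingerprint is folded in, and two stable when it is
omitted. That mirrors the 3 to 6 versus 3 to 4 counter change the tests
assert.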

The whole-map key keeps the FULL cc (phaseFp + flowDefHash) so its fast
path and any pre-existing whole-map entries are unchanged (backward
compat). The perItem object now carries its own cc so the lookup and
record paths in runFanout use the per-item variant consistently.

Soundness is preserved: task-template, agent, model, as (via resolved
it.task), upstream-output, and world-state changes all still invalidate
the correct items. shareContext / def-frame / failed / budget-skipped
fallbacks are unchanged.

Tests: add a bug-reproduction test (literal over, change 1 of N items)
that FAILS before the fix (counter 3 to 6) and PASSES after (3 to 4),
plus literal-over soundness variants (task/agent/upstream change imply
full re-exec) and whole-map fast-path + partial-hit + failed-item
de-masking variants. Update the budget-skipped test key reconstruction
to use the per-item cc (fingerprints omitted). Fix the e2e incremental
suite map section (add output: json so the merged-output assertion holds).
---
 extensions/runtime.ts          |  45 ++++--
 test/cache-peritem.test.ts     | 272 +++++++++++++++++++++++++++++++--
 test/e2e-incremental-suite.mts | 258 +++++++++++++++++++++++++++++++
 3 files changed, 551 insertions(+), 24 deletions(-)
 create mode 100644 test/e2e-incremental-suite.mts

diff --git a/extensions/runtime.ts b/extensions/runtime.ts
index 76c4a99..d8d8749 100644
--- a/extensions/runtime.ts
+++ b/extensions/runtime.ts
@@ -909,7 +909,7 @@ async function executePhaseInner(
 	// `perItem` is undefined (parallel, or non-cacheable maps) the path is inert.
 	const runFanout = async (
 		items: Array<{ agent: string; task: string }>,
-		perItem?: { keyOf: (idx: number) => CacheKeys | null },
+		perItem?: { keyOf: (idx: number) => CacheKeys | null; cc: PhaseCacheCtx },
 	): Promise<RunResult[]> => {
 		let done = 0;
 		let running = 0;
@@ -952,7 +952,7 @@ async function executePhaseInner(
 				try {
 					const ckItem = perItem.keyOf(idx);
 					if (ckItem) {
-						const hit = cachedPhase(cc, ckItem);
+						const hit = cachedPhase(perItem.cc, ckItem);
 						if (hit) {
 							done++;
 							const synth = phaseStateToRunResult(hit, it);
@@ -990,7 +990,7 @@ async function executePhaseInner(
 				try {
 					const ckItem = perItem.keyOf(idx);
 					if (ckItem) {
-						const ccItem: PhaseCacheCtx = { ...cc, phaseId: `${phase.id}#item${idx}` };
+						const ccItem: PhaseCacheCtx = { ...perItem.cc, phaseId: `${phase.id}#item${idx}` };
 						const itemPs = resultToPhaseState(`${phase.id}#item${idx}`, r, ckItem.key, parseJson);
 						recordCache(ccItem, itemPs);
 					}
@@ -1207,26 +1207,43 @@ async function executePhaseInner(
 		//  - not inside a runtime-generated sub-flow (`def:` frame in the stack):
 		//    such flows are untrusted / possibly non-deterministic, so per-item reuse
 		//    is unsafe. Fall back to whole-map (which still applies breadth caps).
-		// `undefined phaseFingerprint` is NOT a blocker: `cacheKeys` falls back to
-		// the whole-flow `flowDefHash`, which is stable across runs for a fixed def,
-		// so per-item keys for unchanged items remain stable.
+		// `undefined phaseFingerprint` is NOT a blocker for soundness — it is a
+		// DELIBERATE design choice: per-item keys omit BOTH phaseFp and flowDefHash
+		// (via ccPerItem below) so a changing `over` cannot move unchanged items'
+		// keys. See ccPerItem for the full soundness argument.
 		const perItemCacheable =
 			cc.scope === "cross-run" &&
 			!sharing &&
 			!(deps._stack ?? []).some((s) => s.startsWith("def:"));
+		// Per-item cache context: structural fingerprints (phaseFp + flowDefHash)
+		// are OMITTED so a changing `over` cannot move unchanged items' keys. Both
+		// fingerprints hash `over` (the array source); folding either into a
+		// per-item key means editing one item invalidates EVERY per-item key at
+		// once (no partial reuse) — the bug fixed here. A single item's output is
+		// fully specified by `it.task` (template + {item}/{as} value + any
+		// upstream-output refs + args) + `it.agent` + model + thinking/tools/preRead
+		// + the world-state `fingerprint`; `over` only determines WHICH items
+		// exist, not WHAT any item computes. `flowName` is retained for cross-flow
+		// collision prevention. Soundness: docs/internal/cache-migration.md.
+		// NB: ccPerItem is only USED when perItemCacheable holds, which requires
+		// scope === "cross-run"; that scope is blocked upstream when flowDefHash
+		// === "failed". So per-item keys are only built when flowDefHash is a real
+		// hash (or already undefined), and clearing it cannot mask the failed case.
+		const ccPerItem: PhaseCacheCtx = { ...cc, phaseFp: undefined, flowDefHash: undefined };
 		// Pre-compute per-item CacheKeys once so the lookup and the record path use
-		// the IDENTICAL key (and share cacheKeys' v3:phasefp + flow-name +
-		// fingerprint + thinking/tools/preRead contract). The per-item key folds
-		// `it.agent` (Arbiter fix): a different agent means different output, so a
-		// per-item key WITHOUT the agent could serve a stale cross-agent hit when
-		// only `phase.agent` changed (the whole-map key would correctly miss via
-		// JSON.stringify(tasks), but per-item keys would not).
+		// the IDENTICAL key (built from ccPerItem, NOT the whole-phase cc). The
+		// per-item key folds `it.agent` (Arbiter fix): a different agent means
+		// different output, so a per-item key WITHOUT the agent could serve a stale
+		// cross-agent hit when only `phase.agent` changed (the whole-map key would
+		// correctly miss via JSON.stringify(tasks), but per-item keys would not).
 		const perItemKeys: (CacheKeys | null)[] = perItemCacheable
-			? tasks.map((it) => cacheKeys(cc, [phase.id, it.agent, phase.model ?? "", it.task]))
+			? tasks.map((it) => cacheKeys(ccPerItem, [phase.id, it.agent, phase.model ?? "", it.task]))
 			: tasks.map(() => null);
 		const perItem = perItemCacheable
-			? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null }
+			? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null, cc: ccPerItem }
 			: undefined;
+		// Whole-map key keeps the FULL cc (phaseFp + flowDefHash) so its fast path
+		// and any pre-existing whole-map entries are unchanged (backward compat).
 		const ck = cacheKeys(cc, [phase.id, phase.model ?? "", JSON.stringify(tasks)]);
 		const inputHash = ck.key;
 		const cached = cachedPhase(cc, ck);
diff --git a/test/cache-peritem.test.ts b/test/cache-peritem.test.ts
index 3a34510..d43e5ba 100644
--- a/test/cache-peritem.test.ts
+++ b/test/cache-peritem.test.ts
@@ -23,7 +23,6 @@ import * as path from "node:path";
 import { test } from "node:test";
 import type { AgentConfig } from "../extensions/agents.ts";
 import { CacheStore } from "../extensions/cache.ts";
-import { phaseFingerprint, compileTaskflowToIR } from "../extensions/flowir/index.ts";
 import { cacheKeys, executeTaskflow, summarizeReuse, type PhaseCacheCtx, type RuntimeDeps } from "../extensions/runtime.ts";
 import type { RunOptions, RunResult } from "../extensions/runner.ts";
 import type { Taskflow } from "../extensions/schema.ts";
@@ -365,12 +364,11 @@ test("per-item: a budget-skipped item is never recorded as a per-item cache entr
 	assert.equal(r1.state.phases.m.budgetTruncated, true, "map was cut short by the budget cap");
 
 	// Reconstruct the runtime's per-item CacheKeys to inspect the store.
-	// cc matches what executePhaseInner builds: scope cross-run, no fingerprint,
-	// empty preRead, and phaseFp = phaseFingerprint(def,"m") ?? flowDefHash.
-	const ir = await compileTaskflowToIR(def);
-	const flowDefHash = ir.hash ?? "failed";
-	const phaseFp = (await phaseFingerprint(def, "m")) ?? flowDefHash;
-	const cc: PhaseCacheCtx = {
+	// Per-item keys are built from ccPerItem — the whole-phase cc with BOTH
+	// phaseFp and flowDefHash set to undefined (so a changing `over` cannot move
+	// unchanged items' keys). So the reconstructed cc must ALSO omit both
+	// fingerprints to match what the runtime writes under.
+	const ccPerItem: PhaseCacheCtx = {
 		scope: "cross-run",
 		fingerprint: "",
 		store,
@@ -378,14 +376,15 @@ test("per-item: a budget-skipped item is never recorded as a per-item cache entr
 		phaseId: "m",
 		flowName: def.name,
 		runId: r1.state.runId,
-		flowDefHash,
-		phaseFp,
+		flowDefHash: undefined,
+		phaseFp: undefined,
 		thinking: undefined,
 		tools: undefined,
 		preRead: "",
 	};
 	// Per-item key folds [phase.id, it.agent, model, it.task] (Arbiter fix).
-	const keyFor = (task: string) => cacheKeys(cc, ["m", "a", "", task]).key;
+	// (phaseFp/flowDefHash are intentionally absent — see ccPerItem above.)
+	const keyFor = (task: string) => cacheKeys(ccPerItem, ["m", "a", "", task]).key;
 	const keyA = keyFor("process a"); // item[0]: executed → cached
 	const keyB = keyFor("process b"); // item[1]: executed → cached
 	const keyC = keyFor("process c"); // item[2]: budget-skipped → NOT cached
@@ -489,3 +488,256 @@ test("per-item: changing phase.agent invalidates every per-item key (no stale cr
 	assert.equal(r3.state.phases.m.cacheHit, "cross-run");
 	fs.rmSync(dir, { recursive: true, force: true });
 });
+
+// ---------------------------------------------------------------------------
+// (L0) BUG REPRODUCTION: literal `over` — change 1 of N items re-executes only that item.
+//
+// Unlike the {args.items} tests above (whose phase DEFINITION is stable across
+// runs), a literal `over: '["a","b","c"]'` bakes the array into the def. Changing
+// one item CHANGES the def → flowDefHash AND phaseFp both move (neither strips
+// `over`). Before the fix, ALL per-item keys moved at once → every item
+// re-executed (counter 3 → 6). After the fix, per-item keys omit BOTH
+// phaseFp and flowDefHash (via ccPerItem), so an unchanged item's key is stable
+// (it depends only on it.task + agent + model + thinking/tools/preRead +
+// world-state fingerprint) → only the changed item re-runs (3 → 4).
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — change 1 of N items re-executes only that item (bug repro)", async () => {
+	const dir = tmpDir();
+	const mk = (items: string[]): Taskflow => ({
+		name: "peritem-literal-repro",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(items), task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	const r1 = await executeTaskflow(mkState(mk(["a", "b", "c"]), dir), deps);
+	assert.equal(counter.n, 3, "run1 executes all 3 items");
+	assert.match(r1.finalOutput, /out:process a#1\b/);
+	assert.match(r1.finalOutput, /out:process b#2\b/);
+	assert.match(r1.finalOutput, /out:process c#3\b/);
+
+	// Change ONLY item[1] (b -> b2). The literal `over` changes, so flowDefHash/
+	// phaseFp move — but per-item keys must be invariant to `over` changes.
+	const r2 = await executeTaskflow(mkState(mk(["a", "b2", "c"]), dir), deps);
+	assert.equal(counter.n, 4, "run2 re-executes only item[1] (3 + 1)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined, "phase executed (partial hit, not whole-map)");
+	// item[0] and item[2] reused verbatim from per-item cache (same call index).
+	assert.match(r2.finalOutput, /out:process a#1\b/, "item[0] reused from per-item cache (call #1)");
+	assert.match(r2.finalOutput, /out:process c#3\b/, "item[2] reused from per-item cache (call #3)");
+	// item[1] re-executed → fresh call index #4.
+	assert.match(r2.finalOutput, /out:process b2#4\b/, "item[1] re-executed (call #4)");
+	// Sanity: run1's item[1] output is NOT present in run2.
+	assert.doesNotMatch(r2.finalOutput, /out:process b#2\b/);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L1) Soundness: task template change invalidates ALL items (literal over).
+// `it.task` is the per-item identity — changing the template changes every
+// item's task, so every per-item key must move (full re-exec).
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — task template change re-executes all items", async () => {
+	const dir = tmpDir();
+	const mk = (task: string, items: string[]): Taskflow => ({
+		name: "peritem-literal-task",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(items), task, cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk("process {item}", ["a", "b", "c"]), dir), deps);
+	assert.equal(counter.n, 3, "run1 executes all 3");
+	// Same items, but task template changed → every it.task differs → all re-exec.
+	const r2 = await executeTaskflow(mkState(mk("analyze {item}", ["a", "b", "c"]), dir), deps);
+	assert.equal(counter.n, 6, "run2 re-executes ALL items (task template changed → every key moved)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined, "whole-map also missed (tasks JSON differs)");
+	assert.match(r2.finalOutput, /out:analyze a#4\b/);
+	assert.match(r2.finalOutput, /out:analyze b#5\b/);
+	assert.match(r2.finalOutput, /out:analyze c#6\b/);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L2) Soundness: agent change invalidates ALL items (literal over).
+// The per-item key folds `it.agent`, so changing phase.agent moves every key.
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — agent change re-executes all items", async () => {
+	const dir = tmpDir();
+	const mk = (agent: string): Taskflow => ({
+		name: "peritem-literal-agent",
+		phases: [
+			{ id: "m", type: "map", agent, over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk("a"), dir), deps);
+	assert.equal(counter.n, 3);
+	// Same items + same task, but agent changed → every per-item key moves.
+	const r2 = await executeTaskflow(mkState(mk("b"), dir), deps);
+	assert.equal(counter.n, 6, "agent change invalidates all per-item keys (3 + 3)");
+	assert.equal(r2.state.phases.m.cacheHit, undefined);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L3) Soundness: `as` field interaction is implicitly covered.
+// `as` only renames the loop variable; the resolved `it.task` text is what
+// flows into the per-item key. If the author keeps the template consistent
+// with `as`, the interpolated text is unchanged → no spurious invalidation
+// (correct). If they desync them, `it.task` differs → invalidation (correct,
+// covered by L1's task-template principle). No separate test needed.
+// ---------------------------------------------------------------------------
+
+// ---------------------------------------------------------------------------
+// (L4) Soundness: upstream output referenced in task re-executes all items.
+// A map task that interpolates {steps.discover.output} folds the upstream
+// output into it.task — when the upstream output changes, every per-item key
+// moves (correct: the map's input genuinely changed).
+// ---------------------------------------------------------------------------
+
+test("per-item: upstream output referenced in task invalidates all items when it changes", async () => {
+	const dir = tmpDir();
+	const mk = (discoverOut: string): Taskflow => ({
+		name: "peritem-upstream",
+		phases: [
+			{ id: "discover", type: "agent", agent: "a", task: "discover" },
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(["x", "y"]), task: `do {item} with {steps.discover.output}`, dependsOn: ["discover"], cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	let counter = { n: 0 };
+	const store = new CacheStore(dir);
+	// Runner that emits a configurable discover output + counting map calls.
+	const mkDeps = (discoverOut: string): RuntimeDeps => ({
+		cwd: dir, agents: AGENTS, cacheStore: store,
+		runTask: async (_cwd, _agents, agentName, task): Promise<RunResult> => {
+			counter.n++;
+			const out = task === "discover" ? discoverOut : `out:${task}#${counter.n}`;
+			return { agent: agentName, task, exitCode: 0, output: out, stderr: "", usage: { ...emptyUsage(), output: 10, cost: 0.001, turns: 1 }, stopReason: "end" };
+		},
+	});
+
+	await executeTaskflow(mkState(mk("CTX1"), dir), mkDeps("CTX1"));
+	const calls1 = counter.n;
+	assert.ok(calls1 >= 3, "run1: discover + 2 map items execute");
+	// discover output changes → it.task for EVERY map item changes → all re-exec.
+	counter = { n: 0 };
+	const r2 = await executeTaskflow(mkState(mk("CTX2"), dir), mkDeps("CTX2"));
+	// discover's task literal is unchanged, but it has to re-execute here: the
+	// assertions below only hold if the new CTX2 output reaches the map phase.
+	// {steps.discover.output} is interpolated into it.task, so both map items
+	// get new per-item keys and must re-execute.
+	assert.match(r2.finalOutput, /do x with CTX2/, "map item x re-executed with new upstream output");
+	assert.match(r2.finalOutput, /do y with CTX2/, "map item y re-executed with new upstream output");
+	assert.doesNotMatch(r2.finalOutput, /do x with CTX1/, "stale upstream-coupled output not served");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L5) Whole-map fast path still hits on identical re-run (literal over).
+// The whole-map key keeps the FULL cc (phaseFp + flowDefHash), so an identical
+// re-run hits the whole-map fast path — per-item path never engages.
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — whole-map fast path hits on identical re-run", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-literal-fastpath",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(def, dir), deps);
+	assert.equal(counter.n, 3, "run1 executes all 3");
+	// Identical re-run: whole-map key matches → 1 hit, runFanout never engages.
+	const r2 = await executeTaskflow(mkState(def, dir), deps);
+	assert.equal(counter.n, 3, "run2 hits whole-map fast path (0 new calls)");
+	assert.equal(r2.state.phases.m.cacheHit, "cross-run", "whole-map hit sets phase-level cacheHit");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L6) De-mask: partial hit charges only the re-executed item (literal over).
+// Literal-`over` variant of test (g). Before the fix this was impossible
+// (all items re-executed); now only item[1] re-runs → cost is exactly one item.
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — partial hit charges only the re-executed item", async () => {
+	const dir = tmpDir();
+	const mk = (items: string[]): Taskflow => ({
+		name: "peritem-literal-usage",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(items), task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	}) as Taskflow;
+	const counter = { n: 0 };
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: countingRunner(counter), cacheStore: store };
+
+	await executeTaskflow(mkState(mk(["a", "b", "c"]), dir), deps);
+	assert.equal(counter.n, 3);
+	// Change item[1] only → 1 re-exec (cost 0.001); items 0+2 are 0-token cache hits.
+	const r2 = await executeTaskflow(mkState(mk(["a", "b2", "c"]), dir), deps);
+	assert.equal(counter.n, 4);
+	const m = r2.state.phases.m;
+	assert.equal(m.cacheHit, undefined, "phase executed (partial hit, not whole-map)");
+	assert.equal(m.usage?.cost ?? 0, 0.001, "only the re-executed item is charged");
+	assert.equal(m.subProgress?.done, 3, "all 3 items done");
+	assert.equal(m.subProgress?.failed, 0);
+	assert.equal(m.subProgress?.total, 3);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+// ---------------------------------------------------------------------------
+// (L7) De-mask: a failed item is never cached (literal over).
+// Literal-`over` variant of test (h). A failing item must not be recorded,
+// so a later run with the SAME literal `over` (same def!) re-executes only it.
+// Note: because the def is identical across runs here, flowDefHash/phaseFp are
+// stable — so this test would have PASSED even before the fix. It's included
+// to lock the behavior for the literal-`over` shape (de-masking the suite).
+// ---------------------------------------------------------------------------
+
+test("per-item: LITERAL over — a failed item is never cached (re-executes next run)", async () => {
+	const dir = tmpDir();
+	const def: Taskflow = {
+		name: "peritem-literal-nofail",
+		phases: [
+			{ id: "m", type: "map", agent: "a", over: JSON.stringify(["a", "b", "c"]), task: "process {item}", cache: { scope: "cross-run" }, final: true },
+		],
+	} as Taskflow;
+	const store = new CacheStore(dir);
+
+	// run1: item[1] ("process b") fails. Items 0+2 succeed and are cached per-item.
+	let counter = { n: 0 };
+	const deps1: RuntimeDeps = {
+		cwd: dir, agents: AGENTS, cacheStore: store,
+		runTask: countingRunner(counter, (t) => (t.includes("process b") ? "boom" : null)),
+	};
+	await executeTaskflow(mkState(def, dir), deps1);
+	assert.equal(counter.n, 3, "run1 attempts all 3 (item[1] fails)");
+
+	// run2: SAME def (same literal over), no failures. item[0]/[2] hit per-item;
+	// item[1] must RE-EXECUTE (its failure was not cached) and now succeeds.
+	counter = { n: 0 };
+	const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, cacheStore: store, runTask: countingRunner(counter) };
+	const r2 = await executeTaskflow(mkState(def, dir), deps2);
+	assert.equal(counter.n, 1, "run2: only the previously-failed item[1] re-executes; 0+2 hit per-item");
+	assert.equal(r2.state.phases.m.status, "done", "all items succeed on run2");
+	assert.match(r2.finalOutput, /out:process b#\d/, "item[1] now has a fresh successful output");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
diff --git a/test/e2e-incremental-suite.mts b/test/e2e-incremental-suite.mts
new file mode 100644
index 0000000..e105893
--- /dev/null
+++ b/test/e2e-incremental-suite.mts
@@ -0,0 +1,258 @@
+/**
+ * E2E suite for the "complete incremental recompute" landing (v0.0.28):
+ * the five coupled capabilities shipped across the M5 finish line, exercised
+ * end-to-end through the REAL runtime + REAL on-disk CacheStore with a
+ * deterministic mock subagent runner (no live `pi` / model access needed).
+ *
+ *   1. precise ir-changed diff   — editing one phase reuses the others cross-run
+ *   2. map item-level reuse       — editing one fan-out item reruns only it
+ *   3. incremental flag           — flow.incremental / override → cross-run default
+ *   4. run reuse summary          — summarizeReuse counts reused vs executed
+ *   5. recompute decision trace   — per-phase why (rerun/cutoff/reused + causedBy)
+ *
+ * Run:  node --experimental-strip-types test/e2e-incremental-suite.mts
+ */
+
+import * as fs from "node:fs";
+import * as os from "node:os";
+import * as path from "node:path";
+import type { AgentConfig } from "../extensions/agents.ts";
+import { CacheStore } from "../extensions/cache.ts";
+import {
+	executeTaskflow,
+	recomputeTaskflow,
+	summarizeReuse,
+	type RuntimeDeps,
+} from "../extensions/runtime.ts";
+import { resolveCacheScope } from "../extensions/index.ts";
+import type { RunResult, RunOptions } from "../extensions/runner.ts";
+import type { Taskflow } from "../extensions/schema.ts";
+import type { RunState } from "../extensions/store.ts";
+import { emptyUsage } from "../extensions/usage.ts";
+
+const C = {
+	ok: (s: string) => `\x1b[32m${s}\x1b[0m`,
+	bad: (s: string) => `\x1b[31m${s}\x1b[0m`,
+	hl: (s: string) => `\x1b[36m${s}\x1b[0m`,
+	bold: (s: string) => `\x1b[1m${s}\x1b[0m`,
+};
+
+const AGENTS: AgentConfig[] = [
+	{ name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" },
+];
+
+let failures = 0;
+const assert = (cond: boolean, msg: string) => {
+	if (cond) console.log(`  ${C.ok("✓")} ${msg}`);
+	else {
+		failures++;
+		console.log(`  ${C.bad("✗")} ${msg}`);
+	}
+};
+const section = (s: string) => console.log(`\n${C.hl("▸ " + s)}`);
+
+function tmpDir(): string {
+	return fs.mkdtempSync(path.join(os.tmpdir(), "tf-e2e-incr-"));
+}
+function mkState(def: Taskflow, cwd: string): RunState {
+	return {
+		runId: `run-${Math.random().toString(36).slice(2, 8)}`,
+		flowName: def.name,
+		def,
+		args: {},
+		status: "running",
+		phases: {},
+		createdAt: Date.now(),
+		updatedAt: Date.now(),
+		cwd,
+	};
+}
+/** A deterministic runner: output is a pure function of the task text, so two
+ *  runs with the same task produce byte-identical output (content-addressable).
+ *  Records every executed task so we can assert exactly which phases ran. */
+function recordingRunner(record: string[]): RuntimeDeps["runTask"] {
+	return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => {
+		record.push(task);
+		return {
+			agent: agentName,
+			task,
+			exitCode: 0,
+			output: `out:${task}`,
+			stderr: "",
+			usage: { ...emptyUsage(), output: 10, cost: 0.003, turns: 1 },
+			stopReason: "end",
+		};
+	};
+}
+
+async function main() {
+	// -----------------------------------------------------------------------
+	// 1 + 3 + 4: precise ir-changed diff under the incremental flag.
+	// An incremental flow scout→audit→report + an independent sibling. Run once,
+	// edit ONLY audit's task, re-run: scout & independent must hit cross-run
+	// (their per-phase fingerprints didn't move), audit must re-run.
+	// -----------------------------------------------------------------------
+	section("precise ir-changed diff (incremental flow): edit one phase, reuse the rest");
+	{
+		const dir = tmpDir();
+		const store = new CacheStore(dir);
+		const mkDef = (auditTask: string): Taskflow =>
+			({
+				name: "incr-precise",
+				incremental: true,
+				phases: [
+					{ id: "scout", type: "agent", agent: "a", task: "scan" },
+					{ id: "independent", type: "agent", agent: "a", task: "unrelated analysis" },
+					{ id: "audit", type: "agent", agent: "a", task: auditTask, dependsOn: ["scout"] },
+					{
+						id: "report",
+						type: "agent",
+						agent: "a",
+						task: "report {steps.audit.output} {steps.independent.output}",
+						dependsOn: ["audit", "independent"],
+						final: true,
+					},
+				],
+			}) as Taskflow;
+
+		// The flow declares incremental:true → resolveCacheScope opts it into cross-run.
+		const scope = resolveCacheScope(undefined, mkDef("audit {steps.scout.output}").incremental);
+		assert(scope === "cross-run", "flow.incremental=true → cross-run default scope");
+
+		const rec1: string[] = [];
+		const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: scope };
+		const r1 = await executeTaskflow(mkState(mkDef("audit v1 {steps.scout.output}"), dir), deps1);
+		assert(r1.ok, "run 1 completed");
+		assert(rec1.length === 4, `run 1 executed all 4 phases (got ${rec1.length})`);
+		const s1 = summarizeReuse(r1.state);
+		assert(s1.executed === 4 && s1.reusedCrossRun === 0, "run 1 reuse summary: 4 executed, 0 reused");
+
+		// Edit ONLY audit's task. Re-run (fresh state, same store = cross-run).
+		const rec2: string[] = [];
+		const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: scope };
+		const r2 = await executeTaskflow(mkState(mkDef("audit v2 {steps.scout.output}"), dir), deps2);
+		assert(r2.ok, "run 2 completed");
+		// scout + independent unchanged → their per-phase fingerprints didn't move
+		// → cross-run hit. audit changed → re-run. report reads audit → re-run.
+		assert(!rec2.includes("scan"), "scout reused cross-run (not re-executed)");
+		assert(!rec2.includes("unrelated analysis"), "independent reused cross-run (the precise-diff win)");
+		assert(rec2.some((t) => t.includes("audit v2")), "audit re-executed (its task changed)");
+		const s2 = summarizeReuse(r2.state);
+		assert(s2.reusedCrossRun >= 2, `run 2 reused ≥2 phases cross-run (got ${s2.reusedCrossRun})`);
+		assert(s2.reusedCrossRun + s2.executed === 4, "run 2 accounting balances (reused + executed = 4)");
+		fs.rmSync(dir, { recursive: true, force: true });
+	}
+
+	// -----------------------------------------------------------------------
+	// 2: map item-level reuse — change one item's input, only it re-runs.
+	// -----------------------------------------------------------------------
+	section("map item-level reuse: edit one fan-out item, rerun only that item");
+	{
+		const dir = tmpDir();
+		const store = new CacheStore(dir);
+		const mkDef = (items: string[]): Taskflow =>
+			({
+				name: "incr-map",
+				incremental: true,
+				phases: [
+					{ id: "seed", type: "agent", agent: "a", task: "seed", output: "json" },
+					{
+						id: "fan",
+						type: "map",
+						agent: "a",
+						over: JSON.stringify(items),
+						task: "process {item}",
+						dependsOn: [],
+						output: "json",
+						final: true,
+					},
+				],
+			}) as Taskflow;
+
+		const rec1: string[] = [];
+		const deps1: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: "cross-run" };
+		await executeTaskflow(mkState(mkDef(["alpha", "beta", "gamma"]), dir), deps1);
+		const fanRuns1 = rec1.filter((t) => t.startsWith("process "));
+		assert(fanRuns1.length === 3, `run 1 fanned out 3 items (got ${fanRuns1.length})`);
+
+		// Change ONLY the middle item: beta → BETA2.
+		const rec2: string[] = [];
+		const deps2: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: "cross-run" };
+		const r2 = await executeTaskflow(mkState(mkDef(["alpha", "BETA2", "gamma"]), dir), deps2);
+		const fanRuns2 = rec2.filter((t) => t.startsWith("process "));
+		assert(fanRuns2.length === 1, `run 2 re-executed only the changed item (got ${fanRuns2.length})`);
+		assert(fanRuns2[0] === "process BETA2", "the one re-executed item is the changed one");
+		assert(!fanRuns2.includes("process alpha") && !fanRuns2.includes("process gamma"), "alpha & gamma reused per-item");
+		// Merged output still has one entry per `over` item (length check only).
+		const out = r2.state.phases.fan?.json as unknown[] | undefined;
+		assert(Array.isArray(out) && out.length === 3, "merged output has all 3 items");
+		fs.rmSync(dir, { recursive: true, force: true });
+	}
+
+	// -----------------------------------------------------------------------
+	// 5: recompute decision trace — per-phase why + causedBy attribution.
+	// -----------------------------------------------------------------------
+	section("recompute decision trace: per-phase why + upstream attribution");
+	{
+		const dir = tmpDir();
+		const def: Taskflow = {
+			name: "incr-trace",
+			concurrency: 1,
+			phases: [
+				{ id: "scout", type: "agent", agent: "a", task: "scan" },
+				{ id: "independent", type: "agent", agent: "a", task: "unrelated" },
+				{ id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] },
+				{ id: "report", type: "agent", agent: "a", task: "report {steps.audit.output} {steps.independent.output}", dependsOn: ["audit", "independent"], final: true },
+			],
+		} as Taskflow;
+		const rec: string[] = [];
+		const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec), cacheStore: new CacheStore(dir) };
+		const state = mkState(def, dir);
+		await executeTaskflow(state, deps);
+
+		const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false });
+		const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d]));
+		assert(byId.scout?.outcome === "rerun" && /seed/.test(byId.scout.reason), "scout: rerun (seed)");
+		assert(byId.audit?.outcome === "rerun", "audit: rerun (upstream moved)");
+		assert(JSON.stringify(byId.audit?.causedBy) === JSON.stringify(["scout"]), "audit rerun attributed to scout");
+		assert(JSON.stringify(byId.report?.causedBy) === JSON.stringify(["audit"]), "report rerun attributed to audit (not scout)");
+		assert(byId.independent?.outcome === "reused" && /not reachable/.test(byId.independent.reason), "independent: reused (unreachable)");
+		assert(report.decisions.length === 4, "every phase is explained");
+		fs.rmSync(dir, { recursive: true, force: true });
+	}
+
+	// -----------------------------------------------------------------------
+	// 3 (negative): default is run-only — capability given, default NOT flipped.
+	// -----------------------------------------------------------------------
+	section("default safety: without incremental, re-run does NOT reuse cross-run");
+	{
+		const dir = tmpDir();
+		const store = new CacheStore(dir);
+		const def: Taskflow = {
+			name: "incr-default-off",
+			phases: [{ id: "p", type: "agent", agent: "a", task: "work", final: true }],
+		} as Taskflow;
+		// No incremental flag anywhere → resolveCacheScope → run-only.
+		const scope = resolveCacheScope(undefined, def.incremental);
+		assert(scope === "run-only", "no incremental flag → run-only (default not flipped)");
+		const rec1: string[] = [];
+		await executeTaskflow(mkState(def, dir), { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec1), cacheStore: store, cacheScopeDefault: scope });
+		const rec2: string[] = [];
+		await executeTaskflow(mkState(def, dir), { cwd: dir, agents: AGENTS, runTask: recordingRunner(rec2), cacheStore: store, cacheScopeDefault: scope });
+		assert(rec1.length === 1 && rec2.length === 1, "run-only re-executes every run (no silent cross-run reuse)");
+		fs.rmSync(dir, { recursive: true, force: true });
+	}
+
+	console.log("");
+	if (failures === 0) {
+		console.log(C.ok(C.bold("All Incremental-Recompute E2E checks passed.")));
+	} else {
+		console.log(C.bad(C.bold(`${failures} Incremental-Recompute E2E check(s) FAILED.`)));
+		process.exit(1);
+	}
+}
+
+main().catch((e) => {
+	console.error(e);
+	process.exit(1);
+});

From 603eb34330625484fc6a28d39bf32ed3cbb549b2 Mon Sep 17 00:00:00 2001
From: heggria <bshengtao@gmail.com>
Date: Sat, 27 Jun 2026 14:33:43 +0800
Subject: [PATCH 4/5] feat(cache): add incremental flag and reuse reporting

Add a flow-level and invocation-level `incremental` flag that defaults every
phase to cross-run caching (scope:"cross-run"), so re-running a flow reuses
unchanged phases without annotating each phase. The invocation arg wins over
the flow field; per-phase cache settings and the cross-run-blocked types
(gate/approval/loop/tournament) still take precedence; default stays run-only.
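
A minimal sketch of that precedence, assuming a simplified phase shape.
`resolveCacheScope` follows the helper this patch adds; `effectiveScope` and
`CROSS_RUN_BLOCKED` are hypothetical names used only to illustrate the
per-phase layer, not the runtime's actual code path:

```typescript
type CacheScope = "cross-run" | "run-only";

// Run-wide default: invocation arg > flow field > safe run-only default.
function resolveCacheScope(
	incrementalOverride: boolean | undefined,
	flowIncremental: boolean | undefined,
): CacheScope {
	const on = typeof incrementalOverride === "boolean" ? incrementalOverride : flowIncremental;
	return on === true ? "cross-run" : "run-only";
}

// ASSUMPTION: illustrative per-phase layer. Blocked types never go cross-run,
// and an explicit per-phase scope beats the run-wide default.
const CROSS_RUN_BLOCKED = new Set(["gate", "approval", "loop", "tournament"]);

function effectiveScope(
	phase: { type: string; cache?: { scope?: CacheScope } },
	runDefault: CacheScope,
): CacheScope {
	if (CROSS_RUN_BLOCKED.has(phase.type)) return "run-only";
	return phase.cache?.scope ?? runDefault;
}
```

Passing `false` as the invocation arg forces a fresh run even when the flow
sets `incremental: true`, because only an `undefined` override defers to the
flow field.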

Surface the effect: the end-of-run cache report and /tf recompute now show
reused-vs-executed counts plus a per-phase "Why" trace (rerun/cutoff/reused/
failed with causedBy). Dollar figures are reported only for within-run reuse;
cross-run hits are counted without inventing a saving.
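
The accounting can be sketched as follows, assuming a reduced phase record.
The `PhaseResult` shape and the `summarize`/`formatReport` names are
illustrative stand-ins, not the actual `summarizeReuse`/`formatCacheReport`
signatures:

```typescript
// ASSUMPTION: reduced phase record, just the fields the accounting needs.
interface PhaseResult {
	status: "done" | "failed" | "skipped";
	cacheHit?: "run-only" | "cross-run";
	usage?: { cost?: number };
}

interface ReuseSummary {
	done: number;
	executed: number;
	reusedRunOnly: number;
	reusedCrossRun: number;
	savedUSD: number;
}

function summarize(phases: Record<string, PhaseResult>): ReuseSummary {
	const s: ReuseSummary = { done: 0, executed: 0, reusedRunOnly: 0, reusedCrossRun: 0, savedUSD: 0 };
	for (const p of Object.values(phases)) {
		if (p.status !== "done") continue;
		s.done++;
		if (p.cacheHit === "run-only") {
			// Within-run reuse keeps the prior usage, so its cost is known.
			s.reusedRunOnly++;
			s.savedUSD += p.usage?.cost ?? 0;
		} else if (p.cacheHit === "cross-run") {
			// Cross-run hits carry zeroed usage: count them, claim no dollars.
			s.reusedCrossRun++;
		} else {
			s.executed++;
		}
	}
	return s;
}

function formatReport(s: ReuseSummary): string {
	const reused = s.reusedRunOnly + s.reusedCrossRun;
	if (reused === 0) return "";
	const parts = [`${reused}/${s.done} phase(s) reused (${s.executed} executed this run)`];
	if (s.savedUSD > 0) parts.push(`~$${s.savedUSD.toFixed(4)} of re-execution avoided`);
	if (s.reusedCrossRun > 0) parts.push(`${s.reusedCrossRun} from cross-run cache`);
	return parts.join(" · ");
}
```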

Also strip retry/concurrency/final from phaseFingerprint (none changes a
phase's output, so a no-op config tweak no longer falsely invalidates), and
fall back to whole-flow invalidation for join:"any" phases (they may read
refs outside their declared dependsOn).
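
A rough sketch of the strip-then-hash idea, under loud assumptions: it hashes
a single phase only (the real fingerprint also folds the transitive closure),
`canonical` is a stand-in for the vendored `canonicalJson`, and SHA-256 via
`node:crypto` replaces `hashCanonical`. `sketchFingerprint` is a hypothetical
name:

```typescript
import { createHash } from "node:crypto";

// Fields named above as not affecting a phase's output.
const STRIPPED = new Set(["cache", "retry", "concurrency", "final"]);

// ASSUMPTION: stand-in canonical form (sorted keys, undefined dropped).
function canonical(value: unknown): string {
	if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
	if (value !== null && typeof value === "object") {
		const entries = Object.entries(value as Record<string, unknown>)
			.filter(([, v]) => v !== undefined)
			.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
		return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonical(v)}`).join(",")}}`;
	}
	return JSON.stringify(value) ?? "null";
}

// Returns undefined for join:"any", signalling "use the whole-flow hash".
function sketchFingerprint(phase: Record<string, unknown>): string | undefined {
	if (phase.join === "any") return undefined;
	const kept = Object.fromEntries(Object.entries(phase).filter(([k]) => !STRIPPED.has(k)));
	return createHash("sha256").update(canonical(kept)).digest("hex");
}
```

Under this sketch, adding `retry: { max: 3 }` or toggling `final` leaves the
hash unchanged, while editing `task` moves it.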

Tests: add incremental-flag and reuse-summary suites; extend cache-phasefp
and recompute coverage.
---
 extensions/flowir/phasefp.ts  |  38 ++++++++---
 extensions/index.ts           |  71 ++++++++++++++++-----
 extensions/schema.ts          |   6 ++
 test/cache-phasefp.test.ts    |  38 +++++++++++
 test/incremental-flag.test.ts |  33 ++++++++++
 test/recompute.test.ts        |  88 +++++++++++++++++++++++++
 test/reuse-summary.test.ts    | 117 ++++++++++++++++++++++++++++++++++
 7 files changed, 365 insertions(+), 26 deletions(-)
 create mode 100644 test/incremental-flag.test.ts
 create mode 100644 test/reuse-summary.test.ts

diff --git a/extensions/flowir/phasefp.ts b/extensions/flowir/phasefp.ts
index a7f3c46..02eda69 100644
--- a/extensions/flowir/phasefp.ts
+++ b/extensions/flowir/phasefp.ts
@@ -23,13 +23,18 @@
  *      sub-structure is resolved at runtime (inline `def`) or from a saved
  *      flow (`use`) and is not statically visible here. Editing the saved
  *      sub-flow would not move this phase's sub-fingerprint.
+ *   3. **`join: "any"` phase** (`phase.join === "any"`): validation exempts it
+ *      from the `{steps.X}`-must-be-in-`dependsOn` check, so it may read
+ *      phases outside its static closure. The closure under-approximates its
+ *      real reads, so fall back to whole-flow invalidation.
  *
- * `cache` (the policy object) is the ONLY field stripped from each phase
- * before hashing: its sub-fields (`scope`/`ttl`/`fingerprint`) are folded into
- * the cache key through other paths (`cc.scope` gates the lookup, `cc.ttlMs`
- * governs expiry, `cc.fingerprint` is in the key tail). Every other `Phase`
- * field is hashed. `PhaseSchema` uses `additionalProperties: false`, so no
- * surprise field can be missed.
+ * `cache`, `retry`, `concurrency`, and `final` are stripped from each phase
+ * before hashing: none of them changes the subagent's OUTPUT (they are policy,
+ * execution mechanics, or result selection). `cache`'s sub-fields
+ * (`scope`/`ttl`/`fingerprint`) reach the cache key through other paths
+ * (`cc.scope` gates the lookup, `cc.ttlMs` governs expiry, `cc.fingerprint` is
+ * in the key tail). Every other `Phase` field is hashed. `PhaseSchema` uses
+ * `additionalProperties: false`, so no surprise field can be missed.
  *
  * Pure + async (Web Crypto via `hashCanonical`). Reuses the vendored
  * `canonicalJson`/`hashCanonical` (byte-identical to overstory's contract) so
@@ -42,10 +47,17 @@
 import { transitiveDependencies, type Phase, type Taskflow } from "../schema.ts";
 import { canonicalJson, hashCanonical } from "./hash.ts";
 
-/** Policy field stripped before hashing (its sub-fields reach the key via
- *  `cc.scope` / `cc.ttlMs` / `cc.fingerprint` — folding them here would be
- *  recursive and redundant). This is the ONLY field stripped. */
-const PHASE_FP_STRIP = ["cache"] as const;
+/** Fields stripped before hashing because they do NOT affect a phase's
+ *  subagent OUTPUT, only execution mechanics or result selection — folding
+ *  them in would cause false cache invalidation on a no-op config change:
+ *   - `cache`: policy object; its sub-fields reach the key via
+ *     `cc.scope`/`cc.ttlMs`/`cc.fingerprint`.
+ *   - `retry`: retry/backoff is execution mechanics; a successful phase
+ *     produces the same output regardless of how many attempts it took.
+ *   - `concurrency`: fan-out parallelism; does not change any item's output.
+ *   - `final`: marks which phase's output is the flow result; does not change
+ *     the phase's own output. */
+const PHASE_FP_STRIP = ["cache", "retry", "concurrency", "final"] as const;
 
 /** Clone a phase into a plain record with policy fields removed. */
 function stripPolicy(phase: Phase): Record<string, unknown> {
@@ -74,6 +86,12 @@ export async function phaseFingerprint(def: Taskflow, phaseId: string): Promise<
 	// --- Soundness gate: fall back to whole-flow when static closure is unsafe. ---
 	// Flow-wide context sharing enables cross-sibling reads outside declared deps.
 	if (def.contextSharing === true) return undefined;
+	// A `join: "any"` phase may interpolate `{steps.X.*}` refs to phases OUTSIDE
+	// its declared dependsOn (validation deliberately exempts it — schema.ts), so
+	// the static closure under-approximates its real reads. Fall back to
+	// whole-flow invalidation rather than rely on the key tail alone (which would
+	// be an undocumented coupling). This is safe and matches pre-M6 behavior.
+	if (phase.join === "any") return undefined;
 
 	const closureIds = transitiveDependencies(phases, phaseId);
 	const closurePhases: Phase[] = [];
diff --git a/extensions/index.ts b/extensions/index.ts
index 984a63f..d87c39f 100644
--- a/extensions/index.ts
+++ b/extensions/index.ts
@@ -28,7 +28,7 @@ import { type AgentScope, discoverAgents, readSubagentSettings, shouldSyncBuilti
 import { renderRunResult, summarizeRun } from "./render.ts";
 import { RunHistoryComponent, type RunHistoryResult } from "./runs-view.ts";
 import { ApprovalViewComponent, type ApprovalChoice } from "./approval-view.ts";
-import { executeTaskflow, recomputeTaskflow, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts";
+import { executeTaskflow, recomputeTaskflow, summarizeReuse, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts";
 import { type UsageStats } from "./usage.ts";
 import { finalPhase, resolveArgs, type Taskflow, validateTaskflow, desugar, isShorthand } from "./schema.ts";
 import {
@@ -150,6 +150,12 @@ const TaskflowParams = Type.Object({
 			description: "Run in background (detached child process); return runId immediately. Status polled via store.",
 		}),
 	),
+	incremental: Type.Optional(
+		Type.Boolean({
+			description:
+				"For action=run: default every phase to cross-run caching so re-running the flow reuses unchanged phases across runs/sessions (incremental recompute). Overrides the flow's own `incremental` field. Per-phase cache settings and cross-run-blocked types (gate/approval/loop/tournament) still take precedence. Omit to use the flow's setting (default: run-only — fresh each run).",
+		}),
+	),
 });
 
 function formatFlowIR(ir: TaskflowIR): string {
@@ -225,6 +231,17 @@ function formatRecompute(r: RecomputeReport): string {
 		if (r.cutoff.length > 0) lines.push(`   → saved ${r.cutoff.length} re-execution(s).`);
 	}
 	lines.push(`✓ reused (outside frontier): ${r.reused.join(", ") || "—"}`);
+	// Per-phase "why" — the explainable-reactivity trace (like React DevTools
+	// telling you why each component re-rendered). Only shown when present.
+	if (r.decisions && r.decisions.length > 0) {
+		const glyph: Record<string, string> = { rerun: "▲", cutoff: "✂", reused: "✓", failed: "✗" };
+		lines.push("");
+		lines.push("Why:");
+		for (const d of r.decisions) {
+			const cause = d.causedBy && d.causedBy.length ? `  ← ${d.causedBy.join(", ")}` : "";
+			lines.push(`  ${glyph[d.outcome] ?? "•"} ${d.phaseId}: ${d.reason}${cause}`);
+		}
+	}
 	return lines.join("\n");
 }
 
@@ -242,6 +259,18 @@ function makeRunState(def: Taskflow, args: Record<string, unknown>, cwd: string)
 	};
 }
 
+/** Resolve the run-wide default cache scope from the incremental flags. The
+ *  invocation-level override (the `incremental` tool arg) wins; otherwise the
+ *  flow's own `incremental` field; otherwise the safe `run-only` default
+ *  (each run starts fresh — cross-run reuse is opt-in). Exported for testing. */
+export function resolveCacheScope(
+	incrementalOverride: boolean | undefined,
+	flowIncremental: boolean | undefined,
+): "cross-run" | "run-only" {
+	const on = typeof incrementalOverride === "boolean" ? incrementalOverride : flowIncremental;
+	return on === true ? "cross-run" : "run-only";
+}
+
 async function runFlow(
 	def: Taskflow,
 	args: Record<string, unknown>,
@@ -249,6 +278,9 @@ async function runFlow(
 	signal: AbortSignal | undefined,
 	onUpdate: ((p: AgentToolResult<TaskflowDetails>) => void) | undefined,
 	existing?: RunState,
+	// Invocation-level incremental override: when set, wins over def.incremental.
+	// undefined → fall back to the flow's own `incremental` field (default off).
+	incrementalOverride?: boolean,
 ): Promise<RuntimeResult> {
 	const state = existing ?? makeRunState(def, args, ctx.cwd);
 
@@ -374,11 +406,15 @@ async function runFlow(
 			persist: persistThrottled,
 			requestApproval,
 			loadFlow: (name: string) => getFlow(ctx.cwd, name)?.def,
-			// Cross-run cache is opt-in per phase (cache:{scope:"cross-run"}).
-			// Defaulting every real run to cross-run was reviewed out: it silently
-			// persists phase outputs and can serve stale results for phases whose
-			// agents read files at runtime (those files are not in the cache key).
-			cacheScopeDefault: "run-only",
+			// Cross-run cache is opt-in. By default a real run is `run-only` (fresh
+			// each run): defaulting every phase to cross-run silently persists
+			// outputs and can serve stale results for phases whose agents read files
+			// at runtime (those files are not in the cache key). A user opts in
+			// explicitly — the invocation `incremental` arg wins, else the flow's
+			// own `incremental` field, else the safe run-only default. All the
+			// soundness fallbacks (blocked types, per-phase fingerprint, shareContext)
+			// still apply per phase inside executePhase.
+			cacheScopeDefault: resolveCacheScope(incrementalOverride, def.incremental),
 		});
 		// Auto-report cache savings at the end of a real run so the user sees the
 		// M1-M5 effect without running a separate /tf command.
@@ -958,7 +994,7 @@ export default function (pi: ExtensionAPI) {
 				};
 			}
 
-			const result = await runFlow(def, args, ctx, signal, onUpdate as any);
+			const result = await runFlow(def, args, ctx, signal, onUpdate as any, undefined, params.incremental as boolean | undefined);
 			// Surface the validation warnings in the tool result so the model
 			// can acknowledge or fix them, and the user sees them in the chat.
 			if (v.warnings.length) {
@@ -1399,15 +1435,18 @@ function errorResult(action: string, message: string): ToolResult {
 	};
 }
 
-function formatCacheReport(state: RunState, totalUsage: UsageStats): string {
-	const cached = Object.values(state.phases).filter((p) => p.cacheHit === "cross-run");
-	if (cached.length === 0) return "";
-	// Honest reporting: we know these phases spent 0 tokens *this run* because
-	// they were served from cache. We do NOT estimate dollars/tokens "saved" —
-	// that requires guessing what a re-execution would have cost, and the mix of
-	// cheap vs expensive phases (tournament/loop) makes such a guess misleading.
-	const cachedTokens = cached.reduce((sum, p) => sum + ((p.usage?.input ?? 0) + (p.usage?.output ?? 0)), 0);
-	return `💾 ${cached.length} phase(s) reused from cross-run cache (${cachedTokens.toLocaleString()} tokens spent on them this run)`;
+function formatCacheReport(state: RunState, _totalUsage: UsageStats): string {
+	const r = summarizeReuse(state);
+	const reused = r.reusedRunOnly + r.reusedCrossRun;
+	if (reused === 0) return ""; // nothing reused — no incremental story to tell
+	// Honest framing: report reused-vs-executed counts, and a dollar figure only
+	// for within-run reuse (where the prior usage is preserved). Cross-run hits
+	// zero their usage, so their original cost is genuinely unknown — we say
+	// "reused" without inventing a savings number for them.
+	const parts: string[] = [`♻️ ${reused}/${r.done} phase(s) reused (${r.executed} executed this run)`];
+	if (r.savedUSD > 0) parts.push(`~$${r.savedUSD.toFixed(4)} of re-execution avoided`);
+	if (r.reusedCrossRun > 0) parts.push(`${r.reusedCrossRun} from cross-run cache`);
+	return parts.join(" · ");
 }
 
 function finalResult(action: string, result: RuntimeResult): ToolResult {
diff --git a/extensions/schema.ts b/extensions/schema.ts
index 8e80a17..2154f2c 100644
--- a/extensions/schema.ts
+++ b/extensions/schema.ts
@@ -284,6 +284,12 @@ export const TaskflowSchema = Type.Object(
 					"Enable the Shared Context Tree for ALL phases in this flow (shorthand for setting shareContext on every phase). Default false.",
 			}),
 		),
+		incremental: Type.Optional(
+			Type.Boolean({
+				description:
+					"Default every phase to cross-run caching (scope:'cross-run') so re-running this flow reuses unchanged phases across runs/sessions. Equivalent to setting cache:{scope:'cross-run'} on every phase; per-phase cache settings and the cross-run-blocked types (gate/approval/loop/tournament) still take precedence. Default false (run-only — each run starts fresh unless a phase opts in). A run-time `incremental` argument overrides this.",
+			}),
+		),
 		phases: Type.Array(PhaseSchema, { minItems: 1, description: "Ordered phase definitions (DAG via dependsOn)" }),
 	},
 	{ additionalProperties: false },
diff --git a/test/cache-phasefp.test.ts b/test/cache-phasefp.test.ts
index ce23446..4ae84b6 100644
--- a/test/cache-phasefp.test.ts
+++ b/test/cache-phasefp.test.ts
@@ -272,3 +272,41 @@ test("phasefp: shareContext falls back to whole-flow invalidation", async () =>
 	assert.equal(r2.state.phases.B.cacheHit, undefined, "B missed (its task changed)");
 	fs.rmSync(dir, { recursive: true, force: true });
 });
+
+// ---------------------------------------------------------------------------
+// Hardening (risk review M-1 / L-1 / L-2): join:"any" soundness fallback, and
+// operational/result-selection fields stripped to avoid false invalidation.
+// ---------------------------------------------------------------------------
+
+test("phaseFingerprint: a join:any phase falls back to whole-flow (soundness)", async () => {
+	// C declares dependsOn [B] with join:any but interpolates {steps.A.output}.
+	// Its real reads escape the static closure, so per-phase diffing is unsound →
+	// fingerprint must be undefined (caller uses whole-flow flowDefHash).
+	const def: Taskflow = {
+		name: "join-any",
+		phases: [
+			{ id: "A", type: "agent", agent: "a", task: "produce" },
+			{ id: "B", type: "agent", agent: "a", task: "fast" },
+			{ id: "C", type: "agent", agent: "a", task: "use {steps.A.output}", dependsOn: ["B"], join: "any", final: true },
+		],
+	} as Taskflow;
+	assert.equal(await phaseFingerprint(def, "C"), undefined, "join:any → fallback");
+	// A and B are ordinary phases → still get a precise fingerprint.
+	assert.ok(await phaseFingerprint(def, "A"));
+	assert.ok(await phaseFingerprint(def, "B"));
+});
+
+test("phaseFingerprint: retry / concurrency / final do NOT move the sub-fingerprint", async () => {
+	const mk = (extra: Record<string, unknown>): Taskflow => ({
+		name: "ops-inv",
+		phases: [
+			{ id: "p", type: "agent", agent: "a", task: "t", cache: { scope: "cross-run" }, ...extra },
+			{ id: "q", type: "agent", agent: "a", task: "u {steps.p.output}", dependsOn: ["p"], final: true },
+		],
+	}) as Taskflow;
+	const base = await phaseFingerprint(mk({ final: true }), "p");
+	// Adding retry/concurrency, or moving `final`, must not perturb p's output hash.
+	assert.equal(await phaseFingerprint(mk({ final: true, retry: { max: 3 } }), "p"), base, "retry stripped");
+	assert.equal(await phaseFingerprint(mk({ final: true, concurrency: 4 }), "p"), base, "concurrency stripped");
+	assert.equal(await phaseFingerprint(mk({}), "p"), base, "final marker stripped");
+});
diff --git a/test/incremental-flag.test.ts b/test/incremental-flag.test.ts
new file mode 100644
index 0000000..38f9ddd
--- /dev/null
+++ b/test/incremental-flag.test.ts
@@ -0,0 +1,33 @@
+import assert from "node:assert/strict";
+import { test } from "node:test";
+import { resolveCacheScope } from "../extensions/index.ts";
+
+// The `incremental` flag (flow-level def.incremental, or the invocation-level
+// override) maps to the run-wide default cache scope. Default is the safe
+// run-only (cross-run reuse is opt-in); the invocation override wins over the
+// flow setting. This pins the C-option contract: capability given, default
+// NOT flipped.
+
+test("resolveCacheScope: default (neither set) is run-only — safe, no flip", () => {
+	assert.equal(resolveCacheScope(undefined, undefined), "run-only");
+});
+
+test("resolveCacheScope: flow.incremental=true opts the whole flow into cross-run", () => {
+	assert.equal(resolveCacheScope(undefined, true), "cross-run");
+});
+
+test("resolveCacheScope: flow.incremental=false stays run-only", () => {
+	assert.equal(resolveCacheScope(undefined, false), "run-only");
+});
+
+test("resolveCacheScope: invocation override wins over the flow setting", () => {
+	// override=true beats flow=false
+	assert.equal(resolveCacheScope(true, false), "cross-run");
+	// override=false beats flow=true (lets a user force a fresh run)
+	assert.equal(resolveCacheScope(false, true), "run-only");
+});
+
+test("resolveCacheScope: override undefined falls back to the flow setting", () => {
+	assert.equal(resolveCacheScope(undefined, true), "cross-run");
+	assert.equal(resolveCacheScope(undefined, false), "run-only");
+});
diff --git a/test/recompute.test.ts b/test/recompute.test.ts
index 1223bf4..5649f14 100644
--- a/test/recompute.test.ts
+++ b/test/recompute.test.ts
@@ -415,3 +415,91 @@ test("recompute: flagship — re-seed with an unchanged output cuts off the whol
 	assert.deepEqual([...report.cutoff].sort(), ["audit", "report"], "the downstream is cut off transitively");
 	assert.equal(record.length, executedBefore + 1, "exactly one re-execution (the seed); downstream hit cache");
 });
+
+// ---------------------------------------------------------------------------
+// Per-phase decision trace (the "explainable reactivity" AC): every phase in
+// the report carries a reason, and a rerun/cutoff is attributed to the
+// upstream(s) that caused it.
+// ---------------------------------------------------------------------------
+
+test("recompute: decision trace attributes each rerun to the changed upstream", async () => {
+	const record: string[] = [];
+	let scoutVersion = "V1";
+	const def: Taskflow = {
+		name: "trace-cascade",
+		concurrency: 1,
+		phases: [
+			{ id: "scout", type: "agent", agent: "a", task: "scan" },
+			{ id: "independent", type: "agent", agent: "a", task: "unrelated" },
+			{ id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] },
+			{
+				id: "report",
+				type: "agent",
+				agent: "a",
+				task: "report {steps.audit.output} {steps.independent.output}",
+				dependsOn: ["audit", "independent"],
+				final: true,
+			},
+		],
+	} as Taskflow;
+	const deps = baseDeps(mockRunner((t) => (t === "scan" ? `out:${scoutVersion}` : `out:${t}`), record));
+	const state = mkState(def);
+	await executeTaskflow(state, deps);
+
+	scoutVersion = "V2";
+	const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false });
+	const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d]));
+
+	assert.equal(byId.scout.outcome, "rerun");
+	assert.match(byId.scout.reason, /seed/);
+	assert.equal(byId.audit.outcome, "rerun");
+	assert.deepEqual(byId.audit.causedBy, ["scout"], "audit's rerun is attributed to scout");
+	assert.deepEqual(byId.report.causedBy, ["audit"], "report's rerun is attributed to audit, not scout");
+	assert.equal(byId.independent.outcome, "reused");
+	assert.match(byId.independent.reason, /not reachable/);
+	// Every phase is explained.
+	assert.equal(report.decisions.length, 4);
+});
+
+test("recompute: decision trace marks early-cutoff with its (unchanged) upstream cause", async () => {
+	const record: string[] = [];
+	const def: Taskflow = {
+		name: "trace-cutoff",
+		concurrency: 1,
+		phases: [
+			{ id: "scout", type: "agent", agent: "a", task: "scan" },
+			{ id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] },
+			{ id: "report", type: "agent", agent: "a", task: "report {steps.audit.output}", dependsOn: ["audit"], final: true },
+		],
+	} as Taskflow;
+	// scout's output is stable across re-seeds → downstream cuts off.
+	const deps = baseDeps(mockRunner((t) => (t === "scan" ? "out:STABLE" : `out:${t}`), record));
+	const state = mkState(def);
+	await executeTaskflow(state, deps);
+
+	const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: false });
+	const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d]));
+
+	assert.equal(byId.scout.outcome, "rerun", "seed always re-runs");
+	assert.equal(byId.audit.outcome, "cutoff");
+	assert.match(byId.audit.reason, /identical output|unchanged/);
+	assert.deepEqual(byId.audit.causedBy, ["scout"]);
+	assert.equal(byId.report.outcome, "cutoff");
+});
+
+test("recompute: dry-run decision trace explains the worst-case frontier", async () => {
+	const record: string[] = [];
+	const deps = baseDeps(mockRunner((t) => `out:${t}`, record));
+	const state = mkState(DEF);
+	await executeTaskflow(state, deps);
+
+	const { report } = await recomputeTaskflow(state, deps, ["scout"], { dryRun: true });
+	const byId = Object.fromEntries(report.decisions.map((d) => [d.phaseId, d]));
+
+	assert.equal(byId.scout.outcome, "rerun");
+	assert.match(byId.scout.reason, /seed/);
+	// audit + report are in the frontier → "may re-run", attributed upstream.
+	assert.equal(byId.audit.outcome, "rerun");
+	assert.match(byId.audit.reason, /may re-run|stale frontier/);
+	assert.equal(record.length, 3, "dry-run did not execute anything beyond the initial run");
+});
diff --git a/test/reuse-summary.test.ts b/test/reuse-summary.test.ts
new file mode 100644
index 0000000..dd387e0
--- /dev/null
+++ b/test/reuse-summary.test.ts
@@ -0,0 +1,117 @@
+import assert from "node:assert/strict";
+import * as fs from "node:fs";
+import * as os from "node:os";
+import * as path from "node:path";
+import { test } from "node:test";
+import type { AgentConfig } from "../extensions/agents.ts";
+import { CacheStore } from "../extensions/cache.ts";
+import { executeTaskflow, summarizeReuse, type RuntimeDeps } from "../extensions/runtime.ts";
+import type { RunResult, RunOptions } from "../extensions/runner.ts";
+import type { Taskflow } from "../extensions/schema.ts";
+import type { RunState } from "../extensions/store.ts";
+import { emptyUsage } from "../extensions/usage.ts";
+
+// summarizeReuse: the incremental-reuse accounting behind the run summary.
+// A phase counts as reused iff it carries a `cacheHit` marker (within-run
+// resume → "run-only"; cross-run store → "cross-run").
+
+const AGENTS: AgentConfig[] = [
+	{ name: "a", description: "test agent", systemPrompt: "", source: "user", filePath: "" },
+];
+
+function tmpDir(): string {
+	return fs.mkdtempSync(path.join(os.tmpdir(), "tf-reuse-"));
+}
+
+function mkState(def: Taskflow, cwd: string): RunState {
+	return {
+		runId: `run-${Math.random().toString(36).slice(2, 8)}`,
+		flowName: def.name,
+		def,
+		args: {},
+		status: "running",
+		phases: {},
+		createdAt: Date.now(),
+		updatedAt: Date.now(),
+		cwd,
+	};
+}
+
+function runner(): RuntimeDeps["runTask"] {
+	return async (_cwd, _agents, agentName, task, _o: RunOptions): Promise<RunResult> => ({
+		agent: agentName,
+		task,
+		exitCode: 0,
+		output: `out:${task}`,
+		stderr: "",
+		usage: { ...emptyUsage(), output: 10, cost: 0.002, turns: 1 },
+		stopReason: "end",
+	});
+}
+
+const CHAIN: Taskflow = {
+	name: "reuse-chain",
+	phases: [
+		{ id: "scout", type: "agent", agent: "a", task: "scan" },
+		{ id: "audit", type: "agent", agent: "a", task: "audit {steps.scout.output}", dependsOn: ["scout"] },
+		{ id: "report", type: "agent", agent: "a", task: "report {steps.audit.output}", dependsOn: ["audit"], final: true },
+	],
+} as Taskflow;
+
+test("summarizeReuse: a first run executes every phase, reuses none", async () => {
+	const dir = tmpDir();
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: runner() };
+	const r = await executeTaskflow(mkState(CHAIN, dir), deps);
+
+	const s = summarizeReuse(r.state);
+	assert.equal(s.executed, 3, "all three phases executed");
+	assert.equal(s.reusedRunOnly, 0);
+	assert.equal(s.reusedCrossRun, 0);
+	assert.equal(s.done, 3);
+	assert.equal(s.savedUSD, 0, "nothing reused → nothing saved");
+	assert.deepEqual(r.reuse, s, "RuntimeResult.reuse matches summarizeReuse(state)");
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("summarizeReuse: resuming a completed run reuses every phase within-run (savedUSD > 0)", async () => {
+	const dir = tmpDir();
+	const deps: RuntimeDeps = { cwd: dir, agents: AGENTS, runTask: runner() };
+	const state = mkState(CHAIN, dir);
+	await executeTaskflow(state, deps);
+
+	// Re-run the SAME state object: every phase is already `done` with a matching
+	// inputHash → the within-run resume path serves each from its prior.
+	const r2 = await executeTaskflow(state, deps);
+	const s = summarizeReuse(r2.state);
+
+	assert.equal(s.executed, 0, "nothing re-executed on resume");
+	assert.equal(s.reusedRunOnly, 3, "all three reused within-run");
+	assert.equal(s.reusedCrossRun, 0);
+	assert.equal(s.done, 3);
+	// Each phase preserved its prior usage (cost 0.002) → 3 × 0.002 saved.
+	assert.ok(Math.abs(s.savedUSD - 0.006) < 1e-9, `savedUSD should be ~0.006, got ${s.savedUSD}`);
+	fs.rmSync(dir, { recursive: true, force: true });
+});
+
+test("summarizeReuse: a second run under cross-run cache counts cross-run reuse", async () => {
+	const dir = tmpDir();
+	const store = new CacheStore(dir);
+	const deps: RuntimeDeps = {
+		cwd: dir,
+		agents: AGENTS,
+		runTask: runner(),
+		cacheStore: store,
+		cacheScopeDefault: "cross-run",
+	};
+	await executeTaskflow(mkState(CHAIN, dir), deps);
+	// A fresh state (new runId) re-running the same flow hits the cross-run store.
+	const r2 = await executeTaskflow(mkState(CHAIN, dir), deps);
+	const s = summarizeReuse(r2.state);
+
+	assert.equal(s.reusedCrossRun, 3, "all three restored from cross-run cache");
+	assert.equal(s.executed, 0, "nothing executed the second run");
+	// Cross-run hits zero their usage → original cost not recoverable.
+	assert.equal(s.savedUSD, 0, "cross-run reuse does not claim a dollar figure");
+	assert.equal(s.done, 3);
+	fs.rmSync(dir, { recursive: true, force: true });
+});

From 74413929fee2fb534ce617890c17030177e8a1bc Mon Sep 17 00:00:00 2001
From: heggria <bshengtao@gmail.com>
Date: Sat, 27 Jun 2026 14:33:49 +0800
Subject: [PATCH 5/5] =?UTF-8?q?chore(release):=20v0.0.28=20=E2=80=94=20per?=
 =?UTF-8?q?-phase=20+=20per-item=20granular=20cache=20reuse?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bump to 0.0.28. Document the granular-reuse release: per-phase structural
sub-fingerprint (v3:phasefp), per-item map caching, the incremental flag, and
reuse reporting. Refresh README test counts (804 -> 846 across 46 files) and
add per-item map caching to the headline. Document the incremental flag and
its precedence in the taskflow skill.
---
 CHANGELOG.md                     | 54 ++++++++++++++++++++++++++++++++
 README.md                        |  6 ++--
 package.json                     |  2 +-
 skills/taskflow/SKILL.md         |  2 +-
 skills/taskflow/configuration.md | 22 +++++++++++++
 5 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index affd152..f92560d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,60 @@
 
 All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
 
+## [0.0.28] — 2026-06-27
+
+> Granular-reuse release: **incremental recompute goes from whole-flow to
+> per-phase and per-item.** v0.0.27 *proved* the recompute cost win; this
+> release makes that win far larger and easier to opt into. Editing one phase
+> now invalidates only that phase and its transitive dependents (a sibling keeps
+> its cache hit), a `map` phase re-executes only the items that actually changed,
+> and a single `incremental` flag flips a whole flow into cross-run reuse without
+> annotating every phase.
+
+### Added
+- **Per-phase structural sub-fingerprint (`v3:phasefp`).** The cache key now
+  folds a per-phase fingerprint — the phase plus its transitive `dependsOn ∪ from`
+  closure — instead of the whole-flow `v2:flowdef` hash. Editing phase B
+  invalidates only B and its dependents; an independent sibling A keeps its hit.
+  `cacheKeys` emits a 4-tier ladder: `v3:phasefp` (written) → `v2:flowdef` →
+  bare flowdef → legacy (the last three read-only), so the upgrade is additive: no
+  miss-storm for unchanged flows. Fail-open: any per-phase error degrades that
+  phase to the whole-flow hash. Soundness fallback to whole-flow when per-phase
+  invalidation can't be statically guaranteed (flow-wide `contextSharing`, any
+  `shareContext` phase in the closure, `join: "any"`, or sub-flow inner phases).
+  (`extensions/flowir/phasefp.ts`, `test/cache-phasefp.test.ts` — 11 tests.)
+- **Per-item cross-run caching for `map` phases.** When one of N items changes
+  between runs, only that item re-executes (N−1 cache hits) while the whole-map
+  fast path and every soundness fallback stay intact. Per-item keys omit the
+  structural fingerprint (which hashes the whole `over` source) so changing one
+  item no longer moves every key at once; they fold `[phase.id, it.agent, model,
+  it.task]` + the world-state tail, so task/agent/upstream/world changes still
+  invalidate the right items. Disabled (whole-map only) under run-only/off scope,
+  `shareContext`/flow-wide `contextSharing`, or inside a runtime-generated
+  sub-flow. (`test/cache-peritem.test.ts` — 11 tests.)
+- **`incremental` flag** — flow-level (`TaskflowSchema.incremental`) and
+  invocation-level (`run` tool arg). Defaults every phase to `scope:"cross-run"`
+  so re-running a flow reuses unchanged phases across runs/sessions, without
+  annotating each phase. The invocation arg wins over the flow field; per-phase
+  cache settings and the cross-run-blocked types (gate/approval/loop/tournament)
+  still take precedence; default remains the safe `run-only` (fresh each run).
+  (`resolveCacheScope` in `extensions/index.ts`, `test/incremental-flag.test.ts`.)
+- **Reuse reporting.** The end-of-run cache report and `/tf recompute` now show
+  reused-vs-executed counts and a per-phase "Why" trace (the explainable-
+  reactivity view: `▲ rerun / ✂ cutoff / ✓ reused / ✗ failed`, with `← causedBy`).
+  Dollar figures are reported only for within-run reuse, where the prior usage is
+  preserved; cross-run hits are counted but never attributed an invented saving.
+  (`summarizeReuse` / `RecomputeDecision` in `extensions/runtime.ts`,
+  `test/reuse-summary.test.ts`.)
+- Tests: 804 → 846 (+42).
+
+### Changed
+- **`phaseFingerprint` strips more policy fields** (`cache`, `retry`,
+  `concurrency`, `final`): none changes a phase's subagent *output*, so a no-op
+  config tweak no longer causes false cache invalidation.
+- **README** test count and feature line refreshed (804 → 846 tests, 42 → 46 test files);
+  `per-item map caching` added to the headline capabilities.
+
 ## [0.0.27] — 2026-06-25
 
 > Evidence release: **the incremental-recompute cost win is now proven, not
diff --git a/README.md b/README.md
index 34e6d21..86dcc34 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
   <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
   <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
-  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-804-6E8BFF?style=flat-square" alt="804 tests"></a>
+  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-846-6E8BFF?style=flat-square" alt="846 tests"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
   <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
 </p>
@@ -728,12 +728,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
 
 <div align="center">
 
-**0 runtime dependencies** · **804 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
+**0 runtime dependencies** · **846 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **per-item map caching** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
 
 </div>
 
 - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
-- **804 tests across 42 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (3-tier legacy fallback), the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
+- **846 tests across 46 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (4-tier legacy fallback), per-phase structural sub-fingerprint (v3:phasefp — editing one phase invalidates only it and its dependents), per-item map caching (one changed item re-executes, N−1 cache hits), the `incremental` flag (run-wide cross-run default), reuse reporting, the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
 - **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.
 - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
 
diff --git a/package.json b/package.json
index d520ccf..d89c42f 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "pi-taskflow",
-  "version": "0.0.27",
+  "version": "0.0.28",
   "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
   "keywords": [
     "pi-package",
diff --git a/skills/taskflow/SKILL.md b/skills/taskflow/SKILL.md
index cd10531..a8863f1 100644
--- a/skills/taskflow/SKILL.md
+++ b/skills/taskflow/SKILL.md
@@ -549,7 +549,7 @@ Quick reference:
 
 - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`, `strictInterpolation`.
 - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `optional` (fail-soft — a failed/blocked phase won't abort the run), `final`.
-- **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens). See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), and scope options.
+- **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens), or set `incremental: true` at the flow level (or pass `incremental: true` to `run`) to default every phase to cross-run reuse. See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), scope options, and the `incremental` precedence rules.
 - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default.
 - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.
 
diff --git a/skills/taskflow/configuration.md b/skills/taskflow/configuration.md
index 22fa9f9..7476933 100644
--- a/skills/taskflow/configuration.md
+++ b/skills/taskflow/configuration.md
@@ -283,6 +283,28 @@ for the design.
 | `cross-run` | Reuse an identical-input result from **any** prior run (the persistent store). |
 | `off` | Never reuse, even within a run (force re-execution every time). |
 
+### Flow-wide opt-in: `incremental`
+
+Rather than annotating every phase with `cache: { "scope": "cross-run" }`, set
+`incremental: true` at the **flow** level (or pass `incremental: true` as the
+`run` tool argument) to default *every* phase to cross-run reuse:
+
+```jsonc
+{
+  "name": "audit",
+  "incremental": true,          // ← every phase defaults to scope:"cross-run"
+  "phases": [ /* ... */ ]
+}
+```
+
+Precedence: any **per-phase** `cache` setting overrides both forms of
+`incremental`; between the two, the invocation argument wins over the flow's
+`incremental` field. The cross-run-blocked phase types (`gate`/`approval`/
+`loop`/`tournament`) and all per-phase soundness fallbacks still apply. The
+default remains `run-only` (each run starts fresh unless something opts in),
+because cross-run reuse silently persists outputs and can serve stale results
+for phases whose agents read files at runtime.
+
 ### `ttl` (cross-run only)
 
 Max age before a cross-run hit is treated as a miss: e.g. `"30m"`, `"6h"`, `"7d"`.