v0.0.28: per-phase + per-item granular cache reuse #2
Merged
Replace the whole-flow v2:flowdef cache-key tier with a per-phase structural sub-fingerprint, so that editing phase B invalidates only B and its transitive dependents; an independent sibling phase A keeps its cache hit.

phaseFingerprint(def, phaseId) (extensions/flowir/phasefp.ts) hashes the phase plus the transitive closure of its dependsOn ∪ from edges, reusing the vendored canonicalJson + hashCanonical (byte-identical to overstory's contract). Only the policy field cache is stripped; every other Phase field is hashed.

Soundness fallback: phaseFingerprint returns undefined (the caller then folds the whole-flow flowDefHash, preserving pre-M6 behavior) when per-phase invalidation cannot be statically guaranteed: contextSharing at flow level, any shareContext phase in the closure, or any flow phase in the closure. Sub-flow inner phases always use this fallback.

cacheKeys now produces a 4-tier ladder: key (v3:phasefp, write) → v2Key (v2:flowdef, read-only) → bareKey (bare flowdef, read-only) → legacyKey (no flowdef, read-only). On lookup, cachedPhase tries all four tiers in that order; recordCache writes only key. This makes the M6 upgrade additive: no miss-storm for unchanged flows.

phaseFingerprints is computed once per run in runTaskflowLayers alongside flowDefHash and plumbed through RunState + PhaseCacheCtx. Fail-open: any per-phase error degrades that phase to the whole-flow hash.

Tests: test/cache-phasefp.test.ts (11 tests: soundness gate, determinism, precise-diff win, transitive propagation, v2 fallback, cross-flow isolation, shareContext fallback). Updated cache-migration.test.ts (distinct 4-tier keys; structural-change test now scoped to p's closure) and the runtime.test.ts resume tests to the v3 key shape.
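The closure-based fingerprint described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in extensions/flowir/phasefp.ts: the `Phase` and `FlowDef` shapes are simplified, the `type: "flow"` discriminator is an assumption, and `canonicalJson` / `hashCanonical` are small stand-ins (sorted-key JSON and a djb2 string hash) for the vendored helpers.

```typescript
// Simplified, hypothetical shapes; the real Phase and FlowDef carry more fields.
interface Phase {
  id: string;
  type?: string; // assumed discriminator; "flow" marks a sub-flow phase here
  task?: string;
  dependsOn?: string[];
  from?: string[];
  shareContext?: boolean;
  cache?: unknown; // policy field, stripped before hashing
}

interface FlowDef {
  name: string;
  contextSharing?: boolean;
  phases: Phase[];
}

// Stand-in for the vendored canonicalJson: JSON with recursively sorted keys.
function canonicalJson(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalJson(v)).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Stand-in for hashCanonical: a djb2 string hash, NOT the real digest.
function hashCanonical(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) h = ((h << 5) + h + text.charCodeAt(i)) | 0;
  return (h >>> 0).toString(16);
}

function phaseFingerprint(def: FlowDef, phaseId: string): string | undefined {
  if (def.contextSharing) return undefined; // flow-level soundness gate
  const byId: Record<string, Phase> = {};
  for (const p of def.phases) byId[p.id] = p;

  // Walk the transitive dependsOn ∪ from closure, bailing out on unsound phases.
  const closure: string[] = [];
  const stack: string[] = [phaseId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (closure.indexOf(id) !== -1) continue;
    const p = byId[id];
    if (!p || p.shareContext || p.type === "flow") return undefined;
    closure.push(id);
    for (const dep of (p.dependsOn || []).concat(p.from || [])) stack.push(dep);
  }

  // Hash every closure phase with only the `cache` policy field removed.
  const stripped = closure.sort().map((id) => {
    const { cache, ...rest } = byId[id];
    return rest;
  });
  return hashCanonical(canonicalJson(stripped));
}

const base: FlowDef = {
  name: "demo",
  phases: [
    { id: "A", task: "lint" },
    { id: "B", task: "build" },
    { id: "C", task: "test", dependsOn: ["B"] },
  ],
};
// Same flow with only phase B's task edited.
const edited: FlowDef = {
  name: "demo",
  phases: base.phases.map((p) => (p.id === "B" ? { ...p, task: "build --release" } : p)),
};
console.log(phaseFingerprint(base, "A") === phaseFingerprint(edited, "A")); // independent sibling A
console.log(phaseFingerprint(base, "C") === phaseFingerprint(edited, "C")); // C depends on B
```

In this sketch, A's fingerprint is computed from identical input in both flows, so its key does not move, while B and its dependent C hash different closures. Returning `undefined` rather than throwing leaves the caller free to fold the whole-flow hash instead.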
Add per-item cross-run memoization to the map phase so that when one of N items changes between runs, only that item re-executes (N-1 cache hits), while preserving the existing whole-map fast path and all soundness fallbacks.

Mechanism:
- runFanout accepts an optional perItem hook. Before spawning a subagent for an item, it consults cachedPhase with a per-item key; a hit returns a 0-token synthesized RunResult (stopReason "cache-hit") that flows through mergePhaseState as a normal successful item. Successful fresh items are recorded per-item for future runs.
- Per-item keys fold [phase.id, it.agent, model, it.task] + the existing v3:phasefp/flowName/fingerprint/thinking/tools/preRead tail. Folding it.agent (arbiter fix) prevents a stale cross-agent hit when only phase.agent changes.
- Whole-map lookup stays first (fast path); per-item engages only on a whole-map miss. A trailing whole-map record keeps the fast path warm.

Soundness gates (if any fails, per-item is disabled -> whole-map only):
- cross-run scope is required (run-only/"off" have no persistent store)
- shareContext and flow-wide contextSharing must be disabled (items may read sibling blackboard writes outside declared deps)
- the phase must not be inside a runtime-generated sub-flow (def: frame, untrusted)
- an undefined phaseFingerprint is NOT a blocker (cacheKeys falls back to flowDefHash, which is stable for a fixed def)

Correctness:
- merged output labels are positionally aligned with over ([k/N] using results.length), budget-skipped items filtered to null; cache-hit items keep their positional slot
- cached items contribute emptyUsage -> partial-hit cost == re-executed item only
- failed and budget-skipped items are never recorded per-item
- fail-open: any cache read/write error degrades to executing the item

Backward-compat: pre-existing whole-map entries (any tier) still hit via cachedPhase's 4-tier read-only fallback; the whole-map key format is unchanged.
Tests: new test/cache-peritem.test.ts (11 tests) covering the Test Matrix — partial reuse, positional alignment, duplicate sharing, shareContext/def-frame fallbacks, whole-map fast path, revert, usage/subProgress, failed/skipped non-caching, and the agent-invalidation arbiter fix.
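As a rough illustration of the per-item mechanism, the sketch below memoizes each fanout item under its own key. It is deliberately simplified and synchronous: the `ItemCache` interface, the `ok` flag, the `"end"` stop reason, and the model name are assumptions made for the example, and the key omits the thinking/tools/preRead tail. The real runFanout spawns subagents and goes through cachedPhase / recordCache.

```typescript
// Hypothetical, simplified types; the real runFanout is async and spawns subagents.
interface Item { agent: string; task: string }
interface RunResult { ok: boolean; output: string; tokens: number; stopReason: string }
interface ItemCache {
  get(key: string): RunResult | undefined;
  set(key: string, result: RunResult): void;
}

// Per-item key: folds only what determines this item's output (tail fields omitted).
function itemKey(phaseId: string, model: string, it: Item): string {
  return JSON.stringify([phaseId, it.agent, model, it.task]);
}

function runFanout(
  phaseId: string,
  model: string,
  items: Item[],
  cache: ItemCache,
  execute: (it: Item) => RunResult
): RunResult[] {
  return items.map((it) => {
    const key = itemKey(phaseId, model, it);
    try {
      const hit = cache.get(key);
      // A hit is synthesized as a zero-token successful result in the same slot.
      if (hit) return { ok: true, output: hit.output, tokens: 0, stopReason: "cache-hit" };
    } catch {
      // Fail-open: a cache read error just means the item executes.
    }
    const fresh = execute(it);
    if (fresh.ok) {
      try {
        cache.set(key, fresh);
      } catch {
        // Fail-open: a cache write error is ignored.
      }
    }
    return fresh; // failed items are returned but never recorded
  });
}

// In-memory stand-in for the persistent cross-run store.
function memoryCache(): ItemCache {
  const data: Record<string, RunResult> = {};
  return {
    get: (key) => data[key],
    set: (key, result) => { data[key] = result; },
  };
}

let executions = 0;
const execute = (it: Item): RunResult => {
  executions++;
  return { ok: true, output: "done: " + it.task, tokens: 100, stopReason: "end" };
};
const mk = (file: string): Item => ({ agent: "reviewer", task: "review " + file });

const cache = memoryCache();
runFanout("map1", "some-model", ["a.ts", "b.ts", "c.ts"].map((f) => mk(f)), cache, execute);
const second = runFanout("map1", "some-model", ["a.ts", "b.ts", "d.ts"].map((f) => mk(f)), cache, execute);

console.log(executions); // 4: three on the first run, one on the second
console.log(second.map((r) => r.stopReason).join(",")); // cache-hit,cache-hit,end
```

Because hits are returned from inside the same `map` callback, a cached item keeps its positional slot in the result array, which is what lets the merged output stay aligned with the item order.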
Per-item cross-run cache keys for the `map` phase folded both `phaseFp` and `flowDefHash` (via the whole-phase `cc`). Both fingerprints hash the `over` array source, so when a literal or data-derived `over` changed ONE item between runs, EVERY per-item key moved at once, defeating partial reuse (all N items re-executed instead of just the changed one).

Fix: build a per-item `ccPerItem` with BOTH `phaseFp` and `flowDefHash` set to `undefined`, and use it only for per-item key construction. A single item's output is fully specified by it.task (template + item/as value + upstream-output refs + args) + it.agent + model + thinking/tools/preRead + the world-state fingerprint; `over` only determines WHICH items exist, not WHAT any item computes. `flowName` is retained for cross-flow collision prevention.

The whole-map key keeps the FULL cc (phaseFp + flowDefHash), so its fast path and any pre-existing whole-map entries are unchanged (backward compat). The perItem object now carries its own cc so that the lookup and record paths in runFanout use the per-item variant consistently.

Soundness is preserved: task-template, agent, model, as (via resolved it.task), upstream-output, and world-state changes all still invalidate the correct items. shareContext / def-frame / failed / budget-skipped fallbacks are unchanged.

Tests: add a bug-reproduction test (literal over, change 1 of N items) that FAILS before the fix (counter 3 to 6) and PASSES after (3 to 4), plus literal-over soundness variants (task/agent/upstream change imply full re-exec) and whole-map fast-path + partial-hit + failed-item de-masking variants. Update the budget-skipped test key reconstruction to use the per-item cc (fingerprints omitted). Fix the e2e incremental suite map section (add output: json so the merged-output assertion holds).
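The effect of this fix can be reproduced with a toy key builder. In the sketch below, `CacheCtx`, `perItemKey`, and `keysFor` are hypothetical names, and `fingerprintOf` is a stand-in that simply serializes the whole `over` array; any fingerprint derived from the full `over` source would behave the same way for this purpose.

```typescript
// Hypothetical cache context; in the commit this role is played by the whole-phase `cc`.
interface CacheCtx {
  flowName: string;
  phaseFp: string | undefined;
  flowDefHash: string | undefined;
}

// Stand-in fingerprint: it depends on the entire `over` source, like the real ones.
function fingerprintOf(over: string[]): string {
  return "fp:" + JSON.stringify(over);
}

function perItemKey(cc: CacheCtx, phaseId: string, agent: string, model: string, task: string): string {
  // undefined fingerprints serialize as null, so they add nothing that varies between runs
  return JSON.stringify([cc.flowName, cc.phaseFp, cc.flowDefHash, phaseId, agent, model, task]);
}

function keysFor(over: string[], omitFingerprints: boolean): string[] {
  const fp = fingerprintOf(over);
  const cc: CacheCtx = { flowName: "demo", phaseFp: fp, flowDefHash: fp };
  const ccPerItem: CacheCtx = { flowName: cc.flowName, phaseFp: undefined, flowDefHash: undefined };
  const used = omitFingerprints ? ccPerItem : cc;
  return over.map((item) => perItemKey(used, "map1", "reviewer", "some-model", "review " + item));
}

// Number of keys from the second run that already existed after the first run.
function sharedKeys(a: string[], b: string[]): number {
  return b.filter((k) => a.indexOf(k) !== -1).length;
}

const run1 = ["a.ts", "b.ts", "c.ts"];
const run2 = ["a.ts", "b.ts", "d.ts"]; // one of three items changed

console.log(sharedKeys(keysFor(run1, false), keysFor(run2, false))); // 0: every key moved
console.log(sharedKeys(keysFor(run1, true), keysFor(run2, true))); // 2: only the changed item misses
```

Keeping `flowName` in the per-item context means two different flows that happen to run the same task text still get distinct keys in this sketch.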
Add a flow-level and invocation-level `incremental` flag that defaults every phase to cross-run caching (scope:"cross-run"), so re-running a flow reuses unchanged phases without annotating each phase. The invocation arg wins over the flow field; per-phase cache settings and the cross-run-blocked types (gate/approval/loop/tournament) still take precedence; default stays run-only.

Surface the effect: the end-of-run cache report and /tf recompute now show reused-vs-executed counts plus a per-phase "Why" trace (rerun/cutoff/reused/failed with causedBy). Dollar figures are reported only for within-run reuse; cross-run hits are counted without inventing a saving.

Also strip retry/concurrency/final from phaseFingerprint (none changes a phase's output, so a no-op config tweak no longer falsely invalidates), and fall back to whole-flow invalidation for join:"any" phases (they may read refs outside their declared dependsOn).

Tests: add incremental-flag and reuse-summary suites; extend cache-phasefp and recompute coverage.
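The precedence rules for the `incremental` flag can be summarized in a small resolver. This is a sketch under assumptions: `effectiveScope`, `PhaseLike`, `FlowLike`, and the `"agent"` phase type are hypothetical names for illustration. The commit does not say how an explicit per-phase cross-run setting on a blocked type is handled, so the sketch simply passes explicit per-phase settings through.

```typescript
type Scope = "cross-run" | "run-only" | "off";

// Hypothetical, simplified shapes for illustration.
interface PhaseLike { type: string; cache?: { scope?: Scope } }
interface FlowLike { incremental?: boolean }

const CROSS_RUN_BLOCKED = ["gate", "approval", "loop", "tournament"];

function effectiveScope(phase: PhaseLike, flow: FlowLike, invocationIncremental?: boolean): Scope {
  // 1. An explicit per-phase cache setting takes precedence over the flag.
  if (phase.cache && phase.cache.scope) return phase.cache.scope;
  // 2. Cross-run-blocked types are never defaulted to cross-run.
  if (CROSS_RUN_BLOCKED.indexOf(phase.type) !== -1) return "run-only";
  // 3. The invocation arg beats the flow field; with neither set, incremental is off.
  const incremental =
    invocationIncremental !== undefined ? invocationIncremental : flow.incremental === true;
  return incremental ? "cross-run" : "run-only";
}

console.log(effectiveScope({ type: "agent" }, {})); // run-only (default)
console.log(effectiveScope({ type: "agent" }, { incremental: true })); // cross-run (flow field)
console.log(effectiveScope({ type: "agent" }, { incremental: true }, false)); // run-only (invocation wins)
console.log(effectiveScope({ type: "gate" }, {}, true)); // run-only (blocked type)
console.log(effectiveScope({ type: "agent", cache: { scope: "off" } }, {}, true)); // off (per-phase wins)
```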
Bump to 0.0.28. Document the granular-reuse release: per-phase structural sub-fingerprint (v3:phasefp), per-item map caching, the incremental flag, and reuse reporting. Refresh README test counts (804 -> 846 across 46 files) and add per-item map caching to the headline. Document the incremental flag and its precedence in the taskflow skill.
Summary
Granular-reuse release. v0.0.27 proved the incremental-recompute cost win; this release makes that win far larger and easier to opt into — invalidation drops from whole-flow to per-phase and per-item, with a single flag to flip a whole flow into cross-run reuse.
Changes
- Per-phase structural sub-fingerprint (`v3:phasefp`): editing one phase invalidates only it and its transitive dependents; an independent sibling keeps its cache hit. Emits a 4-tier read ladder (v3 write → v2/bare/legacy read-only) so the upgrade is additive (no miss-storm). Fail-open to whole-flow under `contextSharing`, `shareContext`, `join:"any"`, or sub-flow inner phases.
- Per-item `map` caching: when one of N items changes between runs, only that item re-executes (N−1 cache hits). Per-item keys omit the structural fingerprint (which hashes the whole `over` source) so changing one item no longer moves every key at once. Whole-map fast path and all soundness fallbacks preserved.
- `incremental` flag: flow-level + invocation-level. Defaults every phase to `scope:"cross-run"` without per-phase annotation. Invocation arg wins over the flow field; per-phase `cache` and cross-run-blocked types (gate/approval/loop/tournament) still take precedence; default stays safe `run-only`.
- Reuse reporting: the end-of-run cache report and `/tf recompute` now show reused-vs-executed counts and a per-phase "Why" trace (`▲ rerun / ✂ cutoff / ✓ reused / ✗ failed` with `← causedBy`). Dollar figures only for within-run reuse; cross-run hits counted without an invented saving.
- `phaseFingerprint` strips more policy fields (`cache`, `retry`, `concurrency`, `final`): none changes a phase's output, so a no-op config tweak no longer falsely invalidates.
Tests
- `cache-phasefp`, `cache-peritem`, `incremental-flag`, `reuse-summary`.
- `npm run typecheck` clean, `npm test` 846/846 green.
Release
- `0.0.28`, CHANGELOG entry added, README counts refreshed, taskflow skill docs updated, tag `v0.0.28` pushed.