Merged
54 changes: 54 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,60 @@

All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.

## [0.0.28] — 2026-06-27

> Granular-reuse release: **incremental recompute goes from whole-flow to
> per-phase and per-item.** v0.0.27 *proved* the recompute cost win; this
> release makes that win far larger and easier to opt into. Editing one phase
> now invalidates only that phase and its transitive dependents (a sibling keeps
> its cache hit), a `map` phase re-executes only the items that actually changed,
> and a single `incremental` flag flips a whole flow into cross-run reuse without
> annotating every phase.

### Added
- **Per-phase structural sub-fingerprint (`v3:phasefp`).** The cache key now
folds a per-phase fingerprint — the phase plus its transitive `dependsOn ∪ from`
closure — instead of the whole-flow `v2:flowdef` hash. Editing phase B
invalidates only B and its dependents; an independent sibling A keeps its hit.
`cacheKeys` emits a 4-tier read ladder (`v3:phasefp`, read + write → `v2:flowdef` →
bare flowdef → legacy, the last three read-only) so the upgrade is additive: no
miss-storm for unchanged flows. Fail-open: any per-phase error degrades that
phase to the whole-flow hash. Soundness fallback: the key reverts to the
whole-flow hash when per-phase invalidation can't be statically guaranteed
(flow-wide `contextSharing`, any `shareContext` phase in the closure,
`join: "any"`, or sub-flow inner phases).
(`extensions/flowir/phasefp.ts`, `test/cache-phasefp.test.ts` — 11 tests.)
- **Per-item cross-run caching for `map` phases.** When one of N items changes
between runs, only that item re-executes (N−1 cache hits) while the whole-map
fast path and every soundness fallback stay intact. Per-item keys omit the
structural fingerprint (which hashes the whole `over` source) so changing one
item no longer moves every key at once; they fold `[phase.id, it.agent, model,
it.task]` + the world-state tail, so task/agent/upstream/world changes still
invalidate the right items. Disabled (whole-map only) under run-only/off scope,
`shareContext`/flow-wide `contextSharing`, or inside a runtime-generated
sub-flow. (`test/cache-peritem.test.ts` — 11 tests.)
- **`incremental` flag** — flow-level (`TaskflowSchema.incremental`) and
invocation-level (`run` tool arg). Defaults every phase to `scope:"cross-run"`
so re-running a flow reuses unchanged phases across runs/sessions, without
annotating each phase. The invocation arg wins over the flow field; per-phase
cache settings and the cross-run-blocked types (gate/approval/loop/tournament)
still take precedence; with the flag unset, the default remains the safe
`run-only` (fresh each run).
(`resolveCacheScope` in `extensions/index.ts`, `test/incremental-flag.test.ts`.)
- **Reuse reporting.** The end-of-run cache report and `/tf recompute` now show
reused-vs-executed counts and a per-phase "Why" trace (the explainable-
reactivity view: `▲ rerun / ✂ cutoff / ✓ reused / ✗ failed`, with `← causedBy`).
Dollar figures are reported only for within-run reuse, where the prior usage is
preserved; cross-run hits are counted but never attributed an invented saving.
(`summarizeReuse` / `RecomputeDecision` in `extensions/runtime.ts`,
`test/reuse-summary.test.ts`.)
- Tests: 804 → 846 (+42).
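
The per-item keying described in the `map` entry above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `MapItem` shape, the `hashParts` helper, the use of sha256 over a JSON array, and the `worldTail` string are hypothetical stand-ins, not the runtime's actual code. It only shows why leaving the whole-`over` structural fingerprint out of the key lets N−1 items keep their hits.

```typescript
import { createHash } from "node:crypto";

// Hypothetical item shape for illustration; the real types live in the runtime.
interface MapItem {
  agent: string;
  task: string;
}

// Hash a list of key parts. sha256 over a JSON array is an assumption,
// standing in for whatever hashing the runtime actually uses.
function hashParts(parts: string[]): string {
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}

// Per-item key: [phase.id, it.agent, model, it.task] + the world-state tail.
// The structural fingerprint (which hashes the whole `over` source) is left
// out on purpose, so one changed item moves only its own key.
function perItemKey(phaseId: string, it: MapItem, model: string, worldTail: string): string {
  return hashParts([phaseId, it.agent, model, it.task, worldTail]);
}

// Count how many keys of the new run were already present in the prior run.
function countHits(prior: Set<string>, keys: string[]): number {
  return keys.filter((k) => prior.has(k)).length;
}

const items: MapItem[] = [
  { agent: "reviewer", task: "review a.ts" },
  { agent: "reviewer", task: "review b.ts" },
  { agent: "reviewer", task: "review c.ts" },
];
const keysRun1 = items.map((it) => perItemKey("review", it, "model-x", "world-1"));

// Second run: only the middle item changed.
const edited = items.map((it, i) => (i === 1 ? { ...it, task: "review b2.ts" } : it));
const keysRun2 = edited.map((it) => perItemKey("review", it, "model-x", "world-1"));

const hits = countHits(new Set(keysRun1), keysRun2);
console.log(`${hits} of ${keysRun2.length} items reused`); // 2 of 3 items reused
```

Changing `worldTail` or `model` in this sketch would move every key at once, which mirrors the entry's point that task/agent/upstream/world changes still invalidate the right items.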

### Changed
- **`phaseFingerprint` strips more policy fields** (`cache`, `retry`,
`concurrency`, `final`): none changes a phase's subagent *output*, so a no-op
config tweak no longer causes false cache invalidation.
- **README** test count and feature line refreshed (804 → 846 across 46 files);
`per-item map caching` added to the headline capabilities.
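
As a rough illustration of the `phaseFingerprint` change above, the sketch below strips the four named policy fields before hashing. The `stripPolicy`, `canonical`, and `policyFreeHash` helpers are hypothetical, and the sorted-key serializer is a simplified stand-in for the vendored `canonicalJson` + `hashCanonical`; the real implementation in `extensions/flowir/phasefp.ts` may differ.

```typescript
import { createHash } from "node:crypto";

// Policy fields named in the changelog entry; none affects subagent output.
const POLICY_FIELDS: readonly string[] = ["cache", "retry", "concurrency", "final"];

type PhaseLike = Record<string, unknown>;

// Return a shallow copy of the phase without its policy fields.
function stripPolicy(phase: PhaseLike): PhaseLike {
  return Object.fromEntries(
    Object.entries(phase).filter(([k]) => !POLICY_FIELDS.includes(k)),
  );
}

// Simplified canonical JSON: object keys are sorted recursively so that key
// order never changes the hash. An assumption, not the vendored algorithm.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash only the output-relevant part of a phase.
function policyFreeHash(phase: PhaseLike): string {
  return createHash("sha256").update(canonical(stripPolicy(phase))).digest("hex");
}

const base = { id: "build", task: "compile the project", retry: { max: 1 } };
const retryTweak = { ...base, retry: { max: 3 }, concurrency: 4 };
const taskEdit = { ...base, task: "compile and lint the project" };

console.log(policyFreeHash(base) === policyFreeHash(retryTweak)); // true
console.log(policyFreeHash(base) === policyFreeHash(taskEdit)); // false
```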

## [0.0.27] — 2026-06-25

> Evidence release: **the incremental-recompute cost win is now proven, not
6 changes: 3 additions & 3 deletions README.md
@@ -8,7 +8,7 @@
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
<a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
<a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
<a href="#whats-inside"><img src="https://img.shields.io/badge/tests-804-6E8BFF?style=flat-square" alt="804 tests"></a>
<a href="#whats-inside"><img src="https://img.shields.io/badge/tests-846-6E8BFF?style=flat-square" alt="846 tests"></a>
<a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
<a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
</p>
@@ -728,12 +728,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r

<div align="center">

**0 runtime dependencies** · **804 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
**0 runtime dependencies** · **846 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **per-item map caching** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**

</div>

- **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
- **804 tests across 42 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (3-tier legacy fallback), the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
- **846 tests across 46 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (4-tier legacy fallback), per-phase structural sub-fingerprint (v3:phasefp — editing one phase invalidates only it and its dependents), per-item map caching (one changed item re-executes, N−1 cache hits), the `incremental` flag (run-wide cross-run default), reuse reporting, the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
- **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.
- **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.

93 changes: 67 additions & 26 deletions docs/internal/cache-migration.md
@@ -12,63 +12,104 @@ Before H1, the cache key folded the flow **definition** fingerprint under a bare
H1 versions the key with a `v2:` prefix and routes the fingerprint through the
FlowIR compile seam (`compileTaskflowToIR` → `flowDefHash`).

To avoid a one-time miss-storm on upgrade, the runtime consults **three** keys
on every cross-run lookup, read-only for the legacy tiers.
M6 replaces the whole-flow `v2:flowdef:` tier with a **per-phase structural
sub-fingerprint** (`v3:phasefp:`): the hash of a single phase plus its
transitive dependency closure. Editing phase B now invalidates only B and its
transitive dependents — independent sibling phase A keeps its cache hit.

## Key shapes (H1)
To avoid a one-time miss-storm on upgrade, the runtime consults **four** keys
on every cross-run lookup, read-only for the fallback tiers.

`cacheKeys()` (`extensions/runtime.ts`) returns three keys for a phase:
## Key shapes (M6)

`cacheKeys()` (`extensions/runtime.ts`) returns four keys for a phase:

| Tier | Shape | Written by | Status |
|------|-------|-----------|--------|
| `key` (current) | `flow:<name>` + `v2:flowdef:<hash>` + `<phase>` + `think/tools/ctx` + fingerprint | H1+ | **read + write** |
| `key` (current) | `flow:<name>` + `v3:phasefp:<subfp>` + `<phase>` + `think/tools/ctx` + fingerprint | M6+ | **read + write** |
| `v2Key` | `flow:<name>` + `v2:flowdef:<flowDefHash>` + … | H1..M5 | **read-only** |
| `bareKey` | `flow:<name>` + `flowdef:<hash>` (bare, unversioned) + … | pre-H1 | **read-only** (removed in v0.1.0) |
| `legacyKey` | `flow:<name>` + … (flowdef line omitted) | pre-flowDefHash era | **read-only** (removed in v0.1.0) |

### The per-phase sub-fingerprint (`v3:phasefp`)

`phaseFingerprint(def, phaseId)` (`extensions/flowir/phasefp.ts`) hashes the
phase itself plus its transitive `dependsOn ∪ from` closure, reusing the vendored
`canonicalJson` + `hashCanonical` (byte-identical to overstory's contract). The
`cache`, `retry`, `concurrency`, and `final` policy fields are stripped (none
changes a phase's subagent output, and the `cache` sub-fields reach the key via
other paths); every other `Phase` field is hashed.

**Soundness fallback.** Per-phase invalidation is only sound when a phase's real
dependencies are fully captured by the static closure. `phaseFingerprint` returns
`undefined` (→ the caller folds the whole-flow `flowDefHash` instead, preserving
pre-M6 behavior) when:

- the flow has `contextSharing: true`, OR
- any phase in the closure (self included) has `shareContext: true`, OR
- any phase in the closure (self included) has `type: "flow"`.

These are the cases where a phase can read sibling state outside its declared
deps (Shared Context Tree) or where sub-structure is resolved at runtime
(`flow`). Sub-flow inner phases always use this fallback (their `phaseFp` is
absent → `flowDefHash`), so editing one phase inside a sub-flow invalidates all
sub-flow phases — a known, safe conservatism.
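
The closure walk and the soundness fallback described above can be sketched as follows. This is a simplified illustration under stated assumptions: the trimmed `Phase` / `FlowDef` shapes (including `from` as a string array), the `closure` helper, and hashing `JSON.stringify` with sha256 are all hypothetical, and `sketchPhaseFingerprint` is not the real `phaseFingerprint`. Returning `undefined` stands for "fold the whole-flow `flowDefHash` instead".

```typescript
import { createHash } from "node:crypto";

// Hypothetical, trimmed-down shapes; the real schema has more fields.
interface Phase {
  id: string;
  type?: string;
  task?: string;
  dependsOn?: string[];
  from?: string[];
  shareContext?: boolean;
}
interface FlowDef {
  contextSharing?: boolean;
  phases: Phase[];
}

// Collect the phase plus its transitive `dependsOn ∪ from` closure,
// returned in a stable (id-sorted) order.
function closure(def: FlowDef, phaseId: string): Phase[] {
  const byId = new Map<string, Phase>();
  for (const p of def.phases) byId.set(p.id, p);
  const seen = new Set<string>();
  const stack: string[] = [phaseId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    const p = byId.get(id);
    if (p) stack.push(...(p.dependsOn ?? []), ...(p.from ?? []));
  }
  return Array.from(seen)
    .sort()
    .flatMap((id) => {
      const p = byId.get(id);
      return p ? [p] : [];
    });
}

// Returns a per-phase hash, or undefined when per-phase invalidation is not
// statically sound and the caller should fall back to the whole-flow hash.
function sketchPhaseFingerprint(def: FlowDef, phaseId: string): string | undefined {
  if (def.contextSharing) return undefined;
  const phases = closure(def, phaseId);
  if (phases.some((p) => p.shareContext || p.type === "flow")) return undefined;
  return createHash("sha256").update(JSON.stringify(phases)).digest("hex");
}

const flow: FlowDef = {
  phases: [
    { id: "A", task: "lint" },
    { id: "B", task: "build" },
    { id: "C", task: "test", dependsOn: ["B"] },
  ],
};
// Edit phase B only.
const editedFlow: FlowDef = {
  phases: flow.phases.map((p) => (p.id === "B" ? { ...p, task: "build --release" } : p)),
};

for (const id of ["A", "B", "C"]) {
  const same = sketchPhaseFingerprint(flow, id) === sketchPhaseFingerprint(editedFlow, id);
  console.log(id, same ? "hit" : "invalidated");
}
// A hit
// B invalidated
// C invalidated
```

In this toy flow, A has no path to B, so its fingerprint is untouched by the edit, while C picks up the change through its `dependsOn` edge.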

### Lookup order (`cachedPhase`)

1. within-run resume (`cc.prior.inputHash === keys.key`) — fastest, always allowed.
2. `store.get(keys.key)` — current v2 entry.
3. `store.get(keys.bareKey)` — pre-H1 bare entry.
4. `store.get(keys.legacyKey)` — pre-flowDefHash entry.
2. `store.get(keys.key)` — current v3 entry.
3. `store.get(keys.v2Key)` — pre-M6 v2 entry.
4. `store.get(keys.bareKey)` — pre-H1 bare entry.
5. `store.get(keys.legacyKey)` — pre-flowDefHash entry.

A hit on **any** tier is restored as a `cacheHit: "cross-run"` result with zero
usage. The restored `PhaseState.inputHash` is always `keys.key` (the current
shape), so downstream phases and recompute see a consistent identity.
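
The cross-run part of this ladder (steps 2 to 5) can be sketched as a first-hit-wins scan. The `CacheKeys` field names come from the table above; the `Entry` and `Hit` shapes, the `Map`-backed store, and the `lookupCrossRun` name are assumptions for illustration, and the within-run resume check from step 1 is omitted.

```typescript
// The four key tiers returned by cacheKeys(), per the table above.
interface CacheKeys {
  key: string;
  v2Key: string;
  bareKey: string;
  legacyKey: string;
}

// Hypothetical cache entry and hit shapes.
interface Entry {
  output: string;
}
interface Hit {
  entry: Entry;
  tier: keyof CacheKeys;
  inputHash: string;
}

// Try the current key first, then each read-only fallback tier in order.
// Whatever tier hits, the restored inputHash is always the current key, so
// downstream phases see a consistent identity.
function lookupCrossRun(store: Map<string, Entry>, keys: CacheKeys): Hit | undefined {
  const order: Array<keyof CacheKeys> = ["key", "v2Key", "bareKey", "legacyKey"];
  for (const tier of order) {
    const entry = store.get(keys[tier]);
    if (entry) return { entry, tier, inputHash: keys.key };
  }
  return undefined;
}

const keys: CacheKeys = {
  key: "v3-key",
  v2Key: "v2-key",
  bareKey: "bare-key",
  legacyKey: "legacy-key",
};

// A store written by a pre-M6 version only holds the v2-shaped entry.
const store = new Map<string, Entry>([["v2-key", { output: "cached result" }]]);

const hit = lookupCrossRun(store, keys);
console.log(hit?.tier, hit?.inputHash); // v2Key v3-key
```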

### Write policy (`recordCache`)

Only `keys.key` (the current v2 shape) is ever written. Legacy/bare hits are
Only `keys.key` (the current v3 shape) is ever written. v2/bare/legacy hits are
**not** write-through: re-storing under the new key would double the cache size
for no benefit. Legacy/bare entries age out naturally via the 90-day hard cap
for no benefit. Legacy/bare/v2 entries age out naturally via the 90-day hard cap
(`DEFAULT_MAX_AGE_MS`) and the LRU cap (`DEFAULT_MAX_ENTRIES`).

## Why three tiers?

- **`v2:flowdef:` (current):** the versioned prefix lets a future genuine
overstory compiler advance to `v3:flowIR:` with its own fallback tier,
without disturbing v2 entries.
- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Without the 3rd
tier, every existing cross-run entry would silently miss on upgrade — a
one-time miss-storm for opt-in cross-run users.
## Why four tiers?

- **`v3:phasefp:` (current):** the per-phase structural sub-fingerprint enables
precise invalidation — editing one phase no longer evicts independent
siblings. The versioned prefix lets a future genuine overstory compiler
advance to `v4:flowIR:` with its own fallback tier, without disturbing v3.
- **`v2:flowdef:` (pre-M6):** M5-and-earlier code wrote this whole-flow shape.
Without this tier, every existing cross-run entry would silently miss on the
M6 upgrade — a one-time miss-storm for opt-in cross-run users.
- **bare `flowdef:` (pre-H1):** pre-H1 code wrote this shape. Retained for
completeness.
- **no-flowdef (pre-flowDefHash):** the very earliest cross-run entries, before
the flow definition was folded into the key at all. Retained for completeness;
these are rare.

### Upgrade note (one-time cost)

On the first post-M6 run, if a sibling phase was edited between the last
pre-M6 run and the upgrade, an *unchanged* independent phase may re-execute
once: its v2 entry was keyed on the old `flowDefHash`, which no longer matches.
This is bounded (per-flow, one-time, only when a sibling edit happened) and
amortized over subsequent runs as v3 entries take over. For unchanged flows the
v2 tier hits and no re-execution occurs.

## Retirement

- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers and the `CacheKeys`
return to a single `key`. By then all pre-H1 entries will have aged out (90-day
hard cap). The `v2:` prefix is retained as the version anchor for the *next*
migration.
- A pre-release verification step: inspect a real `.pi/taskflow/cache/` directory
for bare-`flowdef:` entries. If cross-run is confirmed unused in production
(opt-in, young), the bare tier can be dropped earlier.
- **v0.1.0:** remove the `bareKey` and `legacyKey` tiers. By then all pre-H1
entries will have aged out (90-day hard cap).
- **Later:** remove the `v2Key` tier once all pre-M6 entries have aged out.
- The `v3:` prefix is retained as the version anchor for the *next* migration.

## See also

- `extensions/flowir/hash.ts` — the vendored overstory hash algorithm.
- `extensions/flowir/phasefp.ts` — the per-phase structural sub-fingerprint.
- `extensions/flowir/index.ts` — `compileTaskflowToIR` (the seam that produces
`hash` and `meta.declaredDeps`).
`hash` and `meta.declaredDeps`) and `phaseFingerprint`.
- `docs/internal/overstory-convergence-roadmap.md` §3 (M1).
- `test/cache-migration.test.ts` — the migration contract tests.
- `test/cache-phasefp.test.ts` — the per-phase sub-fingerprint contract tests.
2 changes: 2 additions & 0 deletions extensions/flowir/index.ts
@@ -71,3 +71,5 @@ export type {
TaskflowIR,
TaskflowIRMeta,
} from "./meta.ts";

export { phaseFingerprint } from "./phasefp.ts";