Skip to content

Identify commands by a structural hash, not a string or position#52

Merged
typeless merged 1 commit into
mainfrom
feat/command-identity
May 28, 2026
Merged

Identify commands by a structural hash, not a string or position#52
typeless merged 1 commit into
mainfrom
feat/command-identity

Conversation

@typeless

Copy link
Copy Markdown
Owner

Summary

A command's cross-build identity was overloaded onto two lossy proxies:

  • the rendered command string — used by change detection (detect_new_commands), and
  • the positional NodeId — used by implicit-dep carry-forward (preserve_old_implicit_edges).

Both miss real changes, and each produced a distinct stale-output bug. This PR gives every command a structural identity — a content hash of its effective definition — and uses it as the stable key in both places.

Bugs fixed

  1. Exported env var the subprocess reads is invisible to change detection.
    A var consumed via bare $VAR in the shell (e.g. import FOO + export FOO) never appears in the rendered command text, so changing it left the string identical and the output stale (Nothing to do). Reproducer: [envdep].

  2. Implicit header dep misattributed across a command-id shift.
    Command ids are stable when a command is added (new ones append) but shift down when an earlier-created glob-matched source is removed. preserve_old_implicit_edges carried previously-discovered header → command edges keyed by the old positional id, so after a removal they landed on the wrong command — a later header edit then rebuilt the wrong unit and left the real output stale. Reproducer: [idshift].

What changed

  • Structural identity (graph::compute_command_identity): SHA-256 over the expanded command text plus the (name, value-hash) of every variable the command depends on via a Sticky edge. Folding only Variable nodes (not the Tupfile source) avoids over-rebuilding.
  • Exported vars now emit a Sticky edge (create_command_node), making "the subprocess reads this var" an explicit dependency so its value folds into the identity.
  • Persisted in the index — format v11, RawCommandEntry 16 → 48 bytes (inline hash; to_raw/from_raw round-trip). The reader rejects older versions, so the upgrade costs one full rebuild.
  • detect_new_commands keys change detection on identity.
  • preserve_old_implicit_edges remaps each carried edge's command from its old id to the new id through identity, which is stable across id shifts.
  • remove_stale_outputs deliberately keeps the command string as its structural-shape key — it asks "does a producer for this output still exist?", which is correctly env-insensitive (an env change should rebuild in place, not delete-and-recreate). The string command index is retained for that role.

Deliberately out of scope

The grander "move implicit-dep headers into the in-memory graph and collapse the carry-forward passes" refactor is not done, on purpose: reproduction showed the machinery is otherwise correct (the removal-shift was the only live defect), and system headers (/usr/include/*) have no natural node in the source-rooted path model — forcing them in adds complexity rather than removing it.

Test plan

  • [envdep] and [idshift] added as regression guards (both red before, green after).
  • Full suite green: 465 cases / 23,664 assertions.
  • clang-format, clang-tidy, and clang-include-cleaner clean on the changed code.
  • DESIGN.md updated (v11 format + a "Why Structural Command Identity?" rationale).

🤖 Generated with Claude Code

A command's cross-build identity was overloaded onto two lossy proxies: the rendered command string (change detection) and the positional NodeId (implicit-dep carry-forward). Both miss real changes. An exported env var the subprocess reads via bare $VAR never appears in the command text, so changing it left the rendered string identical and the output stale ("Nothing to do"). And command ids shift when an earlier-created command is removed, so previously-discovered header edges were carried onto the wrong command, and a later header edit rebuilt the wrong unit.

Add a per-command structural identity: SHA-256 over the expanded command text plus the (name, value-hash) of every variable the command depends on via a Sticky edge. Exported vars now contribute a Sticky edge so their values fold in even when absent from the text. It is persisted per command (index format v11; RawCommandEntry 16->48 bytes).

Use it as the stable key. detect_new_commands keys change detection on identity, not the rendered string. preserve_old_implicit_edges remaps each carried edge's command from its old id to the new id through identity, which survives id shifts. remove_stale_outputs keeps the command string as its structural-shape key (correctly env-insensitive), so the string command index is retained for that role.

Reproducers: [envdep] (changing an exported env var the subprocess reads triggers a rebuild) and [idshift] (a header dependency survives the command-id shift from removing an earlier glob source). Full suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@typeless typeless merged commit aecde9a into main May 28, 2026
9 checks passed
@typeless typeless deleted the feat/command-identity branch May 28, 2026 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant