Identify commands by a structural hash, not a string or position#52
Merged
Conversation
A command's cross-build identity was overloaded onto two lossy proxies: the rendered command string (change detection) and the positional NodeId (implicit-dep carry-forward). Both miss real changes. An exported env var the subprocess reads via bare $VAR never appears in the command text, so changing it left the rendered string identical and the output stale ("Nothing to do"). And command ids shift when an earlier-created command is removed, so previously-discovered header edges were carried onto the wrong command, and a later header edit rebuilt the wrong unit.
Add a per-command structural identity: SHA-256 over the expanded command text plus the (name, value-hash) of every variable the command depends on via a Sticky edge. Exported vars now contribute a Sticky edge so their values fold in even when absent from the text. It is persisted per command (index format v11; RawCommandEntry 16->48 bytes).
Use it as the stable key. detect_new_commands keys change detection on identity, not the rendered string. preserve_old_implicit_edges remaps each carried edge's command from its old id to the new id through identity, which survives id shifts. remove_stale_outputs keeps the command string as its structural-shape key (correctly env-insensitive), so the string command index is retained for that role.
Reproducers: [envdep] (changing an exported env var the subprocess reads triggers a rebuild) and [idshift] (a header dependency survives the command-id shift from removing an earlier glob source). Full suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A command's cross-build identity was overloaded onto two lossy proxies:
detect_new_commands), andNodeId— used by implicit-dep carry-forward (preserve_old_implicit_edges).Both miss real changes, and each produced a distinct stale-output bug. This PR gives every command a structural identity — a content hash of its effective definition — and uses it as the stable key in both places.
Bugs fixed
Exported env var the subprocess reads is invisible to change detection.
A var consumed via bare
$VARin the shell (e.g.import FOO+export FOO) never appears in the rendered command text, so changing it left the string identical and the output stale (Nothing to do). Reproducer:[envdep].Implicit header dep misattributed across a command-id shift.
Command ids are stable when a command is added (new ones append) but shift down when an earlier-created glob-matched source is removed.
preserve_old_implicit_edgescarried previously-discoveredheader → commandedges keyed by the old positional id, so after a removal they landed on the wrong command — a later header edit then rebuilt the wrong unit and left the real output stale. Reproducer:[idshift].What changed
graph::compute_command_identity): SHA-256 over the expanded command text plus the(name, value-hash)of every variable the command depends on via aStickyedge. Folding only Variable nodes (not the Tupfile source) avoids over-rebuilding.Stickyedge (create_command_node), making "the subprocess reads this var" an explicit dependency so its value folds into the identity.RawCommandEntry16 → 48 bytes (inline hash;to_raw/from_rawround-trip). The reader rejects older versions, so the upgrade costs one full rebuild.detect_new_commandskeys change detection on identity.preserve_old_implicit_edgesremaps each carried edge's command from its old id to the new id through identity, which is stable across id shifts.remove_stale_outputsdeliberately keeps the command string as its structural-shape key — it asks "does a producer for this output still exist?", which is correctly env-insensitive (an env change should rebuild in place, not delete-and-recreate). The string command index is retained for that role.Deliberately out of scope
The grander "move implicit-dep headers into the in-memory graph and collapse the carry-forward passes" refactor is not done, on purpose: reproduction showed the machinery is otherwise correct (the removal-shift was the only live defect), and system headers (
/usr/include/*) have no natural node in the source-rooted path model — forcing them in adds complexity rather than removing it.Test plan
[envdep]and[idshift]added as regression guards (both red before, green after).clang-format,clang-tidy, andclang-include-cleanerclean on the changed code.DESIGN.mdupdated (v11 format + a "Why Structural Command Identity?" rationale).🤖 Generated with Claude Code