
Herb: Implement Syntax Tree Diff Engine #1518

Draft
marcoroth wants to merge 1 commit into main from diff-engine


@marcoroth (Owner) commented Mar 28, 2026

This pull request introduces a syntax tree diff engine that compares two parsed Herb syntax trees and produces the minimal set of HTML-semantic differences. The engine is implemented in C so it's available across all bindings (Ruby, Node.js/WASM, Rust, Java).

Motivation

The diff engine is designed to power reactivity and smart repopulation of affected nodes in HTML+ERB templates. By computing the minimal set of semantic changes between two ASTs, consumers can determine exactly which elements, attributes, or text content changed and apply targeted updates instead of reprocessing the entire document.

This enables hot-module reloading (HMR) for HTML+ERB templates, where a dev server can patch only what changed in the DOM, thereby preserving element state, focus, scroll position, and event listeners. It also opens the door for incremental re-linting, incremental re-formatting, and language server diagnostics that only recompute affected regions.

How it works

The diff uses a multi-stage approach:

  1. Merkle hashing: a bottom-up pass computes FNV-1a hashes for every node, incorporating all children. Identical subtrees share the same hash and are skipped in O(1). This is the same concept as Merkle trees, used in Git and other content-addressable systems.

  2. LCS-based children diffing: child arrays are compared using the Longest Common Subsequence algorithm to find the minimal edit sequence of insertions, deletions, and keeps. This is the same algorithm behind diff and git diff, applied to AST node arrays instead of text lines.

  3. Move detection: after LCS, unmatched remove+insert pairs are checked for matching identity (same tag name + same attributes). Matches become node_moved operations instead of separate remove+insert. Elements are matched by tag name for the LCS pass, and by tag name + attributes (order-independent, using XOR of attribute hashes) for move detection. This approach is similar to React's reconciliation.

  4. Wrap/unwrap detection: detects when a node is wrapped in a new parent (e.g., <div></div> becoming <% if true? %><div></div><% end %>) or unwrapped from one. Merkle hash matching finds removed nodes that reappear as children of inserted nodes (wrap), and children of removed nodes that match inserted nodes (unwrap).
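
Stages 1 and 2 can be sketched in a few lines of Ruby. This is an illustrative sketch only, not the C implementation from this pull request: `Node`, `fnv1a`, `merkle_hash`, and `diff_children` are hypothetical names, the hash is the 32-bit FNV-1a variant (the description does not state the width), and children are compared by full subtree hash rather than by tag name.

```ruby
# Hypothetical node shape for illustration; Herb's real AST nodes differ.
Node = Struct.new(:type, :value, :children)

FNV_OFFSET = 0x811c9dc5 # 32-bit FNV offset basis
FNV_PRIME  = 0x01000193 # 32-bit FNV prime
MASK32     = 0xffffffff

# FNV-1a: XOR each byte into the hash, then multiply by the prime.
def fnv1a(bytes, hash = FNV_OFFSET)
  bytes.each_byte { |b| hash = ((hash ^ b) * FNV_PRIME) & MASK32 }
  hash
end

# Bottom-up Merkle hash: a node's hash folds in the hashes of all children,
# so equal hashes indicate equal subtrees (barring collisions).
def merkle_hash(node)
  hash = fnv1a("#{node.type}\0#{node.value}")
  node.children.each do |child|
    hash = fnv1a([merkle_hash(child)].pack("N"), hash)
  end
  hash
end

# LCS over the child hash arrays, returned as keep/remove/insert steps.
def diff_children(old_children, new_children)
  a = old_children.map { |n| merkle_hash(n) }
  b = new_children.map { |n| merkle_hash(n) }

  # lcs[i][j] = length of the LCS of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : [lcs[i + 1][j], lcs[i][j + 1]].max
    end
  end

  ops = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      ops << [:keep, i, j]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      ops << [:remove, i]
      i += 1
    else
      ops << [:insert, j]
      j += 1
    end
  end
  (i...a.size).each { |k| ops << [:remove, k] }
  (j...b.size).each { |k| ops << [:insert, k] }
  ops
end

old_children = %w[a b c].map { |text| Node.new(:text, text, []) }
new_children = %w[a c d].map { |text| Node.new(:text, text, []) }
p diff_children(old_children, new_children)
# => [[:keep, 0, 0], [:remove, 1], [:keep, 2, 1], [:insert, 2]]
```

The quadratic LCS table is the simplest correct form; a production engine would likely memoize the hashes and trim common prefixes and suffixes first.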

Operation types

node_inserted           — new child node added
node_removed            — child node removed
node_replaced           — node type changed entirely
node_moved              — node reordered within parent
node_wrapped            — node wrapped in a new parent element or ERB block
node_unwrapped          — node unwrapped from its parent
text_changed            — HTML text content changed
erb_content_changed     — ERB expression/code changed
attribute_added         — new attribute on element
attribute_removed       — attribute removed from element
attribute_value_changed — attribute value changed
tag_name_changed        — element tag name changed (div → span)

Usage

result = Herb.diff(old_source, new_source)

result.identical?  # => false
result.operations  # => [#<Herb::DiffOperation type=attribute_value_changed path=[0, 0]>, ...]
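
A consumer might group operations by type and resolve each path back to a node. The sketch below is self-contained and does not call Herb: `Node`, `Operation`, and `resolve_path` are hypothetical stand-ins, and it assumes a path is a list of child indices from the root, which the inspect output above suggests but does not confirm.

```ruby
# Hypothetical stand-ins; not Herb's actual classes.
Node = Struct.new(:type, :children)
Operation = Struct.new(:type, :path)

# Walk child indices from the root; returns nil if the path leaves the tree.
def resolve_path(root, path)
  path.reduce(root) do |node, index|
    return nil if node.nil?
    node.children[index]
  end
end

attribute = Node.new(:attribute, [])
element   = Node.new(:element, [attribute])
document  = Node.new(:document, [element])

operations = [
  Operation.new(:attribute_value_changed, [0, 0]),
  Operation.new(:attribute_added, [0])
]

# Count operations per type, then look up the node each one points at.
p operations.map(&:type).tally
operations.each { |op| puts "#{op.type} -> #{resolve_path(document, op.path)&.type}" }
```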

CLI:

bin/herb diff old.html.erb new.html.erb
4 differences found:

  1. node_removed at path [0]
     old: AST_HTML_ELEMENT_NODE

  2. node_inserted at path [0]
     new: AST_HTML_ELEMENT_NODE

  3. node_inserted at path [1]
     new: AST_HTML_TEXT_NODE

  4. node_inserted at path [2]
     new: AST_HTML_ELEMENT_NODE

Examples

Attribute and content changes
<div class="container">
  <% if current_user %>
    <p>Hello, <%= current_user.name %></p>
  <% end %>
</div>

changed to:

<div class="wrapper" id="main">
  <% if current_user %>
    <p>Hello, <%= current_user.email %></p>
    <span class="badge">Admin</span>
  <% end %>
</div>

produces:

1. attribute_value_changed  (class="container" → class="wrapper")
2. attribute_added          (id="main")
3. erb_content_changed      (current_user.name → current_user.email)
4. node_inserted            (<span>)
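
The first two operations above come from comparing the attribute sets of the matched `<div>` elements. A minimal sketch of such a comparison, assuming attributes are modeled as a plain name-to-value hash (`diff_attributes` is a hypothetical helper, not part of Herb):

```ruby
# Compare two attribute hashes and emit attribute-level operations.
def diff_attributes(old_attrs, new_attrs)
  ops = []
  old_attrs.each do |name, old_value|
    if !new_attrs.key?(name)
      ops << [:attribute_removed, name, old_value]
    elsif new_attrs[name] != old_value
      ops << [:attribute_value_changed, name, old_value, new_attrs[name]]
    end
  end
  new_attrs.each do |name, value|
    ops << [:attribute_added, name, value] unless old_attrs.key?(name)
  end
  ops
end

p diff_attributes({ "class" => "container" }, { "class" => "wrapper", "id" => "main" })
# => [[:attribute_value_changed, "class", "container", "wrapper"], [:attribute_added, "id", "main"]]
```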

Move detection

<ul><li class="a">A</li><li class="b">B</li></ul>

changed to:

<ul><li class="b">B</li><li class="a">A</li></ul>

produces:

1. node_moved  (old_index=1, new_index=0)
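
The pairing step can be sketched as follows. This is a hedged illustration, not the engine's code: `Element`, `identity`, and `detect_moves` are hypothetical names, and it assumes the LCS pass left `<li class="b">` as one unmatched remove (old index 1) and one unmatched insert (new index 0).

```ruby
# Hypothetical element shape: tag name plus a name => value attribute hash.
Element = Struct.new(:tag, :attributes)

# 32-bit FNV-1a, used here to hash individual attributes.
def fnv1a(bytes, hash = 0x811c9dc5)
  bytes.each_byte { |b| hash = ((hash ^ b) * 0x01000193) & 0xffffffff }
  hash
end

# Identity = tag name + XOR of attribute hashes (order-independent).
def identity(element)
  attrs = element.attributes.reduce(0) do |acc, (name, value)|
    acc ^ fnv1a("#{name}=#{value}")
  end
  [element.tag, attrs]
end

# removed / inserted: arrays of [index, element] left unmatched by the LCS pass.
# Returns [moves, leftover_removes, leftover_inserts].
def detect_moves(removed, inserted)
  moves = []
  leftover_inserts = inserted.dup
  leftover_removes = []
  removed.each do |old_index, element|
    match = leftover_inserts.index { |_, candidate| identity(candidate) == identity(element) }
    if match
      new_index, = leftover_inserts.delete_at(match)
      moves << { type: :node_moved, old_index: old_index, new_index: new_index }
    else
      leftover_removes << [old_index, element]
    end
  end
  [moves, leftover_removes, leftover_inserts]
end

li_b = Element.new("li", { "class" => "b" })
moves, = detect_moves([[1, li_b]], [[0, li_b.dup]])
p moves
# => [{:type=>:node_moved, :old_index=>1, :new_index=>0}] (inspect format varies by Ruby version)
```

XOR makes the combined hash independent of attribute order, at the cost that two identical attribute entries would cancel each other out.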

Wrap/unwrap detection

<div>Content</div>

changed to:

<% if condition? %><div>Content</div><% end %>

produces:

1. node_wrapped  (old: HTMLElementNode, new: ERBIfNode)
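
Wrap and unwrap detection can be sketched in the same spirit. The names below (`Node`, `subtree_hash`, `detect_wraps`, `detect_unwraps`) are hypothetical, and Ruby's built-in `Array#hash` stands in for the Merkle hash; only direct children of the candidate wrapper are inspected.

```ruby
# Hypothetical node shape for illustration.
Node = Struct.new(:type, :value, :children)

# Stand-in for the Merkle hash: hashes the node together with its whole subtree.
def subtree_hash(node)
  [node.type, node.value, node.children.map { |child| subtree_hash(child) }].hash
end

# Wrap: a removed node reappears as a child of an inserted node.
def detect_wraps(removed, inserted)
  removed.filter_map do |old_node|
    wrapper = inserted.find do |new_node|
      new_node.children.any? { |child| subtree_hash(child) == subtree_hash(old_node) }
    end
    { type: :node_wrapped, old: old_node.type, new: wrapper.type } if wrapper
  end
end

# Unwrap: a child of a removed node matches an inserted node.
def detect_unwraps(removed, inserted)
  inserted.filter_map do |new_node|
    parent = removed.find do |old_node|
      old_node.children.any? { |child| subtree_hash(child) == subtree_hash(new_node) }
    end
    { type: :node_unwrapped, old: parent.type, new: new_node.type } if parent
  end
end

def content_div
  Node.new("HTMLElementNode", "div", [Node.new("HTMLTextNode", "Content", [])])
end

erb_if = Node.new("ERBIfNode", "condition?", [content_div])
p detect_wraps([content_div], [erb_if])
```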

Playground

A new "Diff" tab is added to the playground with two modes. Live mode diffs on every keystroke and shows a scrollable feed of changes with undo/rollback. In checkpoint mode, you take a snapshot, edit, and then explicitly diff. Diffing is paused while the parse result has errors.

[Video attachment: CleanShot.2026-03-24.at.02.13.58.mp4]


pkg-pr-new bot commented Mar 28, 2026

npx https://pkg.pr.new/@herb-tools/formatter@1518
npx https://pkg.pr.new/@herb-tools/language-server@1518
npx https://pkg.pr.new/@herb-tools/linter@1518

commit: a2adddc


github-actions bot commented Mar 28, 2026

🌿 Interactive Playground and Documentation Preview

A preview deployment has been built for this pull request. Try out the changes live in the interactive playground:


🌱 Grown from commit a2adddc

@marcoroth changed the title from "Herb: Implement Diff Engine" to "Herb: Implement Syntax Tree Diff Engine" on Mar 30, 2026