Executes compiled query graphs against Tree-sitter syntax trees. See 06-transitions.md for step types.
struct VM<'t> {
cursor: TreeCursor<'t>, // Never reset — preserves descendant_index for checkpointing
ip: StepId, // Current step index
frames: Vec<Frame>, // Call stack
effects: EffectLog<'t>, // Side-effect log
matched_node: Option<Node<'t>>, // Current match slot
}
struct Frame {
return_addr: u16, // Where to jump on Return
}Lifetime 't denotes the parsed tree-sitter tree.
Fetch step at ip → dispatch by type_id → execute → update ip.
- Execute
nav→ checknode_type→ checknode_field - Fail → backtrack
- Success: if
next == 0→ accept; elseip = next
- Execute
pre_effects, clearmatched_node - Execute
nav, checknode_type/node_field(see Epsilon Transitions below) - Success:
matched_node = cursor.node(), verify negated fields absent - Execute
post_effects - Continuation:
succ_count == 0→ acceptsucc_count == 1→ip = successors[0]succ_count >= 2→ branch viasuccessors(backtracking)
A Match8 or Match16–64 with node_type: None, node_field: None, and nav: Stay is an epsilon transition — it succeeds unconditionally without cursor interaction. This enables pure control-flow decisions (branching for quantifiers) even when the cursor is exhausted (EOF).
Common patterns:
- Quantifier branches:
(a)?uses epsilon to decide match-or-skip - Trailing cleanup: Many queries end with epsilon +
Up(n)to restore cursor position after matching, regardless of tree depth
Execute nav (with optional node_field check) → push { return_addr: next } → ip = target
Pop frame → ip = return_addr
Nav byte encodes cursor movement, resolved at compile time.
| Mode | Behavior |
|---|---|
| Stay | No movement |
| Next/Down | Skip any nodes until match |
| NextSkip/DownSkip | Skip trivia only |
| NextExact/DownExact | Immediate match required |
| Up(n) | Ascend n levels |
| UpSkipTrivia(n) | Ascend n, must be last non-trivia |
| UpExact(n) | Ascend n, must be last child |
- Move cursor → try match
- On fail: Exact → fail; Skip → fail if non-trivia, else retry; Any → retry
- On exhaustion: fail
Example: (foo (bar)) vs (foo (foo) (foo) (bar)) with Down mode skips two foo children to find bar. With DownExact, first mismatch fails immediately.
Backtracking needs to restore frames destroyed by failed branches. Solution: arena + parent pointer.
struct FrameArena {
frames: Vec<Frame>, // Append-only
current: Option<u32>, // "Stack pointer"
}
struct Frame {
return_addr: u16,
parent: Option<u32>, // Caller's frame index
}"Pop" just moves current — frames remain for checkpoint restoration.
Problem: (a)+ accumulates frames forever. Solution: high-water mark pruning after Return:
high_water = max(current_frame_idx, checkpoint_stack.max_frame_ref)
arena.truncate(high_water + 1)
Bounds arena to O(max_checkpoint_depth + current_call_depth).
O(1) Invariant: The checkpoint stack maintains max_frame_ref — the highest frame_index referenced by any active checkpoint.
| Operation | Invariant Update | Complexity |
|---|---|---|
| Push | max_frame_ref = max(max_frame_ref, cp.frame_index) |
O(1) |
| Pop | Recompute only if popping the max holder | O(1) amortized |
Amortized analysis: each checkpoint contributes to at most one recomputation over its lifetime.
Each call site stores its return address in the pushed frame.
struct Checkpoint {
descendant_index: u32, // Cursor position
effect_watermark: usize, // Effect stream length
frame_index: Option<u32>, // Frame arena state
ip: StepId, // Resume point
}- Save: Push checkpoint, track
max_frame_watermarkfor pruning - Restore:
goto_descendant(), truncate effects, setframes.current - Resume:
ip = checkpoint.ip
Save checkpoint for successors[1..] → try successors[0] → on fail, restore and try next.
Operations logged instead of inline output. Backtracking: truncate(watermark).
pub enum RuntimeEffect<'t> {
Node(tree_sitter::Node<'t>),
Text(tree_sitter::Node<'t>),
Arr,
Push,
EndArr,
Obj,
Set(u16), // member index
EndObj,
Enum(u16), // variant index
EndEnum,
Clear,
Null,
}
struct EffectLog<'t>(Vec<RuntimeEffect<'t>>);Lifetime 't denotes the parsed tree-sitter tree (per project conventions).
| Effect | Action |
|---|---|
| Node(n) | Capture node n |
| Text(n) | Extract source text from node n |
| Obj/EndObj | Object boundaries |
| Set(idx) | Assign to field at member index |
| Arr/EndArr | Array boundaries |
| Push | Append to array |
| Enum/EndEnum | Tagged union boundaries (variant at index) |
| Clear | Reset current value |
| Null | Null placeholder (optional/alternation) |
The Node and Text variants carry the actual tree_sitter::Node so the materializer has direct access without needing a separate node buffer. This single-stream design allows natural iteration: for effect in log.0 { match effect { ... } }.
Bytecode (EffectOp in bytecode/effects.rs): Compact 2-byte encoding with 6-bit opcode + 10-bit payload. No embedded data — the Node opcode signals "capture matched_node" but doesn't carry it.
Runtime (RuntimeEffect): The VM interprets bytecode effects and produces runtime effects with embedded data. When the VM executes a bytecode Node effect, it emits RuntimeEffect::Node(matched_node).
Materializer consumes EffectLog to build output. Stream is purely structural; nominal types come from Entrypoint.result_type. See docs/wip/materializer.md for the materialization API.
| Limit | Default | Purpose |
|---|---|---|
| Exec fuel | 1,000,000 | Total transitions |
| Recursion fuel | 1,024 | Call depth |
Exhaustion returns RuntimeError, not panic.
Per-language trivia list used for *Skip navigation. A node is never skipped if it matches the current target — (comment) still matches comments.