From 1d1876d41094957ed09d5ee324b1123a0fe24bae Mon Sep 17 00:00:00 2001 From: Sergei Zharinov Date: Thu, 8 Jan 2026 21:46:53 -0300 Subject: [PATCH] docs: Sync binary format docs with implementation - Fix TypeKind enum values in 04-types.md (0=Void, 1=Node, etc.) - Fix TypeId primitives model (all types stored in TypeDefs) - Document Trampoline opcode (0x8) in 06-transitions.md - Fix Epsilon definition in 07-dump-format.md - Replace hand-written examples with real CLI output - Use JSON-based example query that actually runs - Update trace format examples with real plotnik trace output --- AGENTS.md | 6 +- docs/binary-format/04-types.md | 131 ++++--- docs/binary-format/06-transitions.md | 43 ++- docs/binary-format/07-dump-format.md | 185 ++++----- docs/binary-format/08-trace-format.md | 514 +++++++++----------------- 5 files changed, 347 insertions(+), 532 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 2211bd0..b03e5d0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -73,9 +73,9 @@ Nested = (call function: [(id) @name (Nested) @inner]) The `.` anchor adapts to what it's anchoring: -| Pattern | Behavior | -| ----------- | ------------------------------------------- | -| `(a) . (b)` | Skip trivia, no named nodes between | +| Pattern | Behavior | +| ----------- | --------------------------------------------- | +| `(a) . (b)` | Skip trivia, no named nodes between | | `"x" . (b)` | Strict — nothing between (anonymous involved) | | `(a) . "x"` | Strict — nothing between (anonymous involved) | diff --git a/docs/binary-format/04-types.md b/docs/binary-format/04-types.md index 847ea19..da14241 100644 --- a/docs/binary-format/04-types.md +++ b/docs/binary-format/04-types.md @@ -4,16 +4,11 @@ This section defines the type system metadata used for code generation and runti ## 1. Primitives -**TypeId (u16)**: Index into the Type Definition table. 
- -- `0`: `Void` (Captures nothing) -- `1`: `Node` (AST Node reference) -- `2`: `String` (Source text) -- `3..N`: Composite types (Index = `TypeId - 3`) +**TypeId (u16)**: Zero-based index into the TypeDefs array. All types, including primitives, are stored as TypeDef entries. ### Node Semantics -`TYPE_NODE` (1) represents a platform-dependent handle to a tree-sitter AST node: +The `Node` type (`TypeKind = 1`) represents a platform-dependent handle to a tree-sitter AST node: | Context | Representation | | :--------- | :--------------------------------------------------------- | @@ -25,12 +20,15 @@ The handle provides access to node metadata (kind, span, text) without copying t **TypeKind (u8)**: Discriminator for `TypeDef`. -- `0`: `Optional` (Wraps another type) -- `1`: `ArrayStar` (Zero or more) -- `2`: `ArrayPlus` (One or more) -- `3`: `Struct` (Record with named fields) -- `4`: `Enum` (Discriminated union) -- `5`: `Alias` (Named reference to another type, e.g., `@x :: Identifier`) +- `0`: `Void` (Unit type, captures nothing) +- `1`: `Node` (AST node reference) +- `2`: `String` (Source text) +- `3`: `Optional` (Wraps another type) +- `4`: `ArrayZeroOrMore` (Zero or more, aka ArrayStar) +- `5`: `ArrayOneOrMore` (One or more, aka ArrayPlus) +- `6`: `Struct` (Record with named fields) +- `7`: `Enum` (Discriminated union) +- `8`: `Alias` (Named reference to another type, e.g., `@x :: Identifier`) ## 2. 
Layout @@ -96,16 +94,19 @@ struct TypeDef { **Semantics of `data` and `count` fields**: -| Kind | `data` (u16) | `count` (u8) | Interpretation | -| :---------- | :------------- | :------------- | :-------------------- | -| `Optional` | `InnerTypeId` | 0 | Wrapper `T?` | -| `ArrayStar` | `InnerTypeId` | 0 | Wrapper `T[]` | -| `ArrayPlus` | `InnerTypeId` | 0 | Wrapper `[T, ...T[]]` | -| `Struct` | `MemberIndex` | `FieldCount` | Record with fields | -| `Enum` | `MemberIndex` | `VariantCount` | Discriminated union | -| `Alias` | `TargetTypeId` | 0 | Named type reference | +| Kind | `data` (u16) | `count` (u8) | Interpretation | +| :---------------- | :------------- | :------------- | :-------------------- | +| `Void` | 0 | 0 | Unit type | +| `Node` | 0 | 0 | AST node reference | +| `String` | 0 | 0 | Source text | +| `Optional` | `InnerTypeId` | 0 | Wrapper `T?` | +| `ArrayZeroOrMore` | `InnerTypeId` | 0 | Wrapper `T[]` | +| `ArrayOneOrMore` | `InnerTypeId` | 0 | Wrapper `[T, ...T[]]` | +| `Struct` | `MemberIndex` | `FieldCount` | Record with fields | +| `Enum` | `MemberIndex` | `VariantCount` | Discriminated union | +| `Alias` | `TargetTypeId` | 0 | Named type reference | -> **Note**: The interpretation of `data` depends on `kind`. For wrappers and `Alias`, it's a `TypeId`. For `Struct` and `Enum`, it's an index into the TypeMembers section. Parsers must dispatch on `kind` first. +> **Note**: For primitives (`Void`, `Node`, `String`), `data` and `count` are unused. For wrappers and `Alias`, `data` is a `TypeId`. For `Struct` and `Enum`, `data` is an index into the TypeMembers section. Parsers must dispatch on `kind` first. > **Limit**: `count` is u8, so structs/enums are limited to 255 members. @@ -148,24 +149,30 @@ For code generation, build a reverse map (`TypeId → Option`) to look ## 3. Examples +> **Note**: In bytecode, only **used** primitives are emitted to TypeDefs. The emitter writes them first in order (Void, Node, String), then composite types. 
TypeId values depend on which primitives the query actually uses. + ### 3.1. Simple Struct Query: `Q = (function name: (identifier) @name)` -```text -Strings: ["name", "Q"] - Str#0 Str#1 +Run `plotnik dump -q ''` to see: -TypeDefs: - T3: Struct { data=0, count=1, kind=Struct } +``` +[type_defs] +T0 = +T1 = Struct M0:1 ; { name } -TypeMembers: - [0]: name=Str#0 ("name"), ty=1 (Node) +[type_members] +M0: S1 → T0 ; name: -TypeNames: - [0]: name=Str#1 ("Q"), type_id=T3 +[type_names] +N0: S2 → T1 ; Q ``` +- `T0` is the `Node` primitive (only used primitive is emitted) +- `T1` is a `Struct` with 1 member starting at `M0` +- `M0` maps "name" to type `T0` (Node) + ### 3.2. Recursive Enum Query: @@ -177,47 +184,53 @@ List = [ ] ``` -```text -Strings: ["List", "Nil", "Cons", "head", "tail"] - Str#0 Str#1 Str#2 Str#3 Str#4 - -TypeDefs: - T3: Enum { data=0, count=2, kind=Enum } - T4: Struct { data=2, count=2, kind=Struct } // Cons payload (anonymous) - -TypeMembers: - [0]: name=Str#1 ("Nil"), ty=0 (Void) // unit variant - [1]: name=Str#2 ("Cons"), ty=T4 // payload is struct - [2]: name=Str#3 ("head"), ty=1 (Node) - [3]: name=Str#4 ("tail"), ty=T3 // self-reference +Run `plotnik dump -q ''` to see: -TypeNames: - [0]: name=Str#0 ("List"), type_id=T3 +``` +[type_defs] +T0 = +T1 = +T2 = Struct M0:2 ; { head, tail } +T3 = Enum M2:2 ; Nil | Cons + +[type_members] +M0: S1 → T1 ; head: +M1: S2 → T3 ; tail: List +M2: S3 → T0 ; Nil: +M3: S4 → T2 ; Cons: T2 + +[type_names] +N0: S5 → T3 ; List ``` -The `tail` field's type (`T3`) points back to the `List` enum. Recursive types are naturally representable since everything is indexed. +- `T0` (Void) and `T1` (Node) are primitives used by the query +- `T2` is the Cons payload struct with `head` and `tail` fields +- `T3` is the `List` enum with `Nil` and `Cons` variants +- `M1` shows `tail: List` — self-reference to `T3` ### 3.3. 
Custom Type Annotation Query: `Q = (identifier) @name :: Identifier` -```text -Strings: ["Identifier", "name", "Q"] - Str#0 Str#1 Str#2 +Run `plotnik dump -q ''` to see: -TypeDefs: - T3: Alias { data=1 (Node), count=0, kind=Alias } - T4: Struct { data=0, count=1, kind=Struct } +``` +[type_defs] +T0 = +T1 = Alias(T0) +T2 = Struct M0:1 ; { name } -TypeMembers: - [0]: name=Str#1 ("name"), ty=T3 (Identifier alias) +[type_members] +M0: S2 → T1 ; name: Identifier -TypeNames: - [0]: name=Str#0 ("Identifier"), type_id=T3 - [1]: name=Str#2 ("Q"), type_id=T4 +[type_names] +N0: S1 → T1 ; Identifier +N1: S3 → T2 ; Q ``` -The `Alias` type creates a distinct TypeId so the emitter can render `Identifier` instead of `Node`. +- `T0` is the underlying `Node` primitive +- `T1` is an `Alias` pointing to `T0`, named "Identifier" +- The `name` field has type `T1` (the alias), so code generators emit `Identifier` instead of `Node` ## 4. Validation diff --git a/docs/binary-format/06-transitions.md b/docs/binary-format/06-transitions.md index 24668a0..29f1ab7 100644 --- a/docs/binary-format/06-transitions.md +++ b/docs/binary-format/06-transitions.md @@ -42,16 +42,17 @@ type_id (u8) | `10` | Anonymous | Anonymous node check (`"text"` literals) | | `11` | Reserved | Reserved for future use | -| Opcode | Name | Size | Description | -| :----- | :------ | :------- | :----------------------------------- | -| 0x0 | Match8 | 8 bytes | Fast-path match (1 successor, no fx) | -| 0x1 | Match16 | 16 bytes | Extended match with inline payload | -| 0x2 | Match24 | 24 bytes | Extended match with inline payload | -| 0x3 | Match32 | 32 bytes | Extended match with inline payload | -| 0x4 | Match48 | 48 bytes | Extended match with inline payload | -| 0x5 | Match64 | 64 bytes | Extended match with inline payload | -| 0x6 | Call | 8 bytes | Function call | -| 0x7 | Return | 8 bytes | Return from call | +| Opcode | Name | Size | Description | +| :----- | :--------- | :------- | 
:----------------------------------- | +| 0x0 | Match8 | 8 bytes | Fast-path match (1 successor, no fx) | +| 0x1 | Match16 | 16 bytes | Extended match with inline payload | +| 0x2 | Match24 | 24 bytes | Extended match with inline payload | +| 0x3 | Match32 | 32 bytes | Extended match with inline payload | +| 0x4 | Match48 | 48 bytes | Extended match with inline payload | +| 0x5 | Match64 | 64 bytes | Extended match with inline payload | +| 0x6 | Call | 8 bytes | Function call | +| 0x7 | Return | 8 bytes | Return from call | +| 0x8 | Trampoline | 8 bytes | Universal entry point | ### Terminal States @@ -291,6 +292,28 @@ struct Return { } ``` +### 4.6. Trampoline + +Universal entry point instruction. Like Call, but the target comes from VM context (external parameter) rather than being encoded in the instruction. Used at address 0 for the entry preamble. + +```rust +#[repr(C)] +struct Trampoline { + type_id: u8, // segment(2) | 0 | 0x8 + _pad1: u8, + next: u16, // Return address (StepId) + _pad2: [u8; 4], +} +``` + +The preamble at step 0 typically looks like: `Obj → Trampoline → EndObj → Accept`. When executed: + +1. VM pushes `next` (return address) onto call stack +2. VM jumps to `entrypoint_target` (set from entrypoint before execution) +3. When the entrypoint returns, execution continues at `next` + +This allows a single compiled preamble to dispatch to any entrypoint without recompilation. + ## 5. Execution Semantics ### 5.1. Match8 Execution diff --git a/docs/binary-format/07-dump-format.md b/docs/binary-format/07-dump-format.md index 6bb58d4..6bf20c6 100644 --- a/docs/binary-format/07-dump-format.md +++ b/docs/binary-format/07-dump-format.md @@ -1,132 +1,90 @@ -# Bytecode Dump Implementation +# Bytecode Dump Format + +The `dump` command displays compiled bytecode in a human-readable format. 
## Example Query ``` -Ident = (identifier) @name :: string -Expression = [ - Literal: (number) @value - Variable: (identifier) @name -] -Assignment = (assignment_expression - left: (identifier) @target - right: (Expression) @value) +Value = (document [ + Num: (number) @n + Str: (string) @s +]) ``` +Run: `plotnik dump -q ''` + ## Bytecode Dump **Epsilon transitions** (`ε`) succeed unconditionally without cursor interaction. -They require all three conditions: - -- `nav == Stay` (no cursor movement) -- `node_type == None` (no type constraint) -- `node_field == None` (no field constraint) - -A step with `nav == Stay` but with a type constraint (e.g., `(identifier)`) is NOT -epsilon — it matches at the current cursor position. +They are identified by `nav == Epsilon` — a distinct navigation mode (not Stay). **Capture effect consolidation**: Scalar capture effects (`Node`, `Text`, `Set`) are placed directly on match instructions rather than in separate epsilon steps. Structural -effects (`Obj`, `EndObj`, `Arr`, `EndArr`, `Enum`, `EndEnum`) remain in epsilons. +effects (`Obj`, `EndObj`, `Arr`, `EndArr`, `Enum`, `EndEnum`) may appear in epsilons or +be consolidated into match instructions. 
``` [flags] linked = false [strings] -S00 "Beauty will save the world" -S01 "name" -S02 "value" -S03 "Literal" -S04 "Variable" -S05 "target" -S06 "Ident" -S07 "Expression" -S08 "Assignment" -S09 "identifier" -S10 "number" -S11 "assignment_expression" -S12 "right" -S13 "left" +S0 "Beauty will save the world" +S1 "n" +S2 "s" +S3 "Num" +S4 "Str" +S5 "Value" +S6 "document" +S7 "number" +S8 "string" [type_defs] -T00 = -T01 = -T02 = Struct M0:1 ; { name } -T03 = Struct M1:1 ; { value } -T04 = Struct M2:1 ; { name } -T05 = Enum M3:2 ; Literal | Variable -T06 = Struct M5:2 ; { value, target } -T07 = Struct M7:1 ; { target } -T08 = Struct M8:1 ; { value } +T0 = +T1 = Struct M0:1 ; { n } +T2 = Struct M1:1 ; { s } +T3 = Enum M2:2 ; Num | Str [type_members] -M0: S01 → T01 ; name: -M1: S02 → T00 ; value: -M2: S01 → T00 ; name: -M3: S03 → T03 ; Literal: T03 -M4: S04 → T04 ; Variable: T04 -M5: S02 → T05 ; value: Expression -M6: S05 → T00 ; target: -M7: S05 → T00 ; target: -M8: S02 → T05 ; value: Expression +M0: S1 → T0 ; n: +M1: S2 → T0 ; s: +M2: S3 → T1 ; Num: T1 +M3: S4 → T2 ; Str: T2 [type_names] -N0: S06 → T02 ; Ident -N1: S07 → T05 ; Expression -N2: S08 → T06 ; Assignment +N0: S5 → T3 ; Value [entrypoints] -Assignment = 12 :: T06 -Expression = 09 :: T05 -Ident = 01 :: T02 +Value = 06 :: T3 [transitions] - 00 ε ◼ - -Ident: - 01 ε 02 - 02 ε [Obj] 04 - 04 (identifier) [Text Set(M0)] 06 - 06 ε [EndObj] 08 - 08 ▶ - -Expression: - 09 ε 10 - 10 ε 30, 36 - -Assignment: - 12 ε 13 - 13 ε [Obj] 15 - 15 (assignment_expression) 16 - 16 ▽ left: (identifier) [Node Set(M6)] 18 - 18 ▷ right: (Expression) 09 : 19 - 19 ε [Set(M5)] 21 - 21 △ 22 - 22 ε [EndObj] 24 - 24 ▶ - 25 ▶ - 26 ε [EndEnum] 25 - 28 (number) [Node Set(M1)] 26 - 30 ε [Enum(M3)] 28 - 32 ε [EndEnum] 25 - 34 (identifier) [Node Set(M2)] 32 - 36 ε [Enum(M4)] 34 +_ObjWrap: + 00 ε [Obj] 02 + 02 Trampoline 03 + 03 ε [EndObj] 05 + 05 ▶ + +Value: + 06 ε 07 + 07 ! 
(document) 08 + 08 ε 11, 16 + 10 ▶ + 11 !!▽ [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + 14 ... + 15 ... + 16 !!▽ [Enum(M3)] (string) [Node Set(M1) EndEnum] 19 + 19 △ _ 10 ``` -## Files - -- `crates/plotnik-lib/src/bytecode/dump.rs` (new) -- `crates/plotnik-lib/src/bytecode/dump_tests.rs` (new) -- `crates/plotnik-lib/src/bytecode/mod.rs` (add exports) +### Sections Explained -## API +- **`_ObjWrap`**: Universal entry preamble. Wraps all entrypoints with `Obj`/`EndObj` and dispatches via `Trampoline`. +- **`Value`**: The compiled query definition. Step 08 branches to try `Num` (step 11) or `Str` (step 16). +- **`...`**: Padding slots (multi-step instructions occupy consecutive step IDs). -```rust -pub fn dump(module: &Module) -> String -``` +## Files -Future: options for verbosity levels, hiding sections, etc. +- `crates/plotnik-lib/src/bytecode/dump.rs` — Dump formatting logic +- `crates/plotnik-lib/src/bytecode/format.rs` — Shared formatting utilities ## Instruction Format @@ -178,6 +136,7 @@ Examples: | Epsilon | `step ε [effects] succ` | | Call | `step nav field: (Name) target : ret` | | Return | `step ▶` | +| Trampoline | `step Trampoline succ` | Successors aligned in right column. Omit empty `[pre]`, `[post]`, `(type)`, `field:`. @@ -185,23 +144,23 @@ Effects in `[pre]` execute before match attempt; effects in `[post]` execute aft ## Nav Symbols -| Nav | Symbol | Notes | -| --------------- | ------- | ----------------------------------- | -| Stay | (blank) | No movement, 5 spaces | -| Stay (epsilon) | ε | Only when no type/field constraints | -| StayExact | ! 
| No movement, exact match only | -| Down | ▽ | First child, skip any | -| DownSkip | !▽ | First child, skip trivia | -| DownExact | !!▽ | First child, exact | -| Next | ▷ | Next sibling, skip any | -| NextSkip | !▷ | Next sibling, skip trivia | -| NextExact | !!▷ | Next sibling, exact | -| Up(1) | △ | Ascend 1 level (no superscript) | -| Up(n≥2) | △ⁿ | Ascend n levels, skip any | -| UpSkipTrivia(n) | !△ⁿ | Ascend n, must be last non-trivia | -| UpExact(n) | !!△ⁿ | Ascend n, must be last child | - -**Note**: `ε` only appears when all three conditions are met: Stay nav, no type constraint, no field constraint. A step matching `(identifier)` at current position shows spaces, not `ε`. +| Nav | Symbol | Notes | +| --------------- | ------- | ---------------------------------- | +| Epsilon | ε | Pure control flow, no cursor check | +| Stay | (blank) | No movement, 5 spaces | +| StayExact | ! | No movement, exact match only | +| Down | ▽ | First child, skip any | +| DownSkip | !▽ | First child, skip trivia | +| DownExact | !!▽ | First child, exact | +| Next | ▷ | Next sibling, skip any | +| NextSkip | !▷ | Next sibling, skip trivia | +| NextExact | !!▷ | Next sibling, exact | +| Up(1) | △ | Ascend 1 level (no superscript) | +| Up(n≥2) | △ⁿ | Ascend n levels, skip any | +| UpSkipTrivia(n) | !△ⁿ | Ascend n, must be last non-trivia | +| UpExact(n) | !!△ⁿ | Ascend n, must be last child | + +**Note**: `ε` appears for `Nav::Epsilon` — a distinct mode from `Stay`. A step with `nav == Stay` but with type constraints (e.g., `(identifier)`) shows blank, not `ε`. ## Effects diff --git a/docs/binary-format/08-trace-format.md b/docs/binary-format/08-trace-format.md index 43f84bb..a698e23 100644 --- a/docs/binary-format/08-trace-format.md +++ b/docs/binary-format/08-trace-format.md @@ -119,400 +119,224 @@ The step number indicates _where_ we're restoring to. 
`❮❮❮` is centered in From `07-dump-format.md`: ``` -Ident = (identifier) @name :: string -Expression = [ - Literal: (number) @value - Variable: (identifier) @name -] -Assignment = (assignment_expression - left: (identifier) @target - right: (Expression) @value) +Value = (document [ + Num: (number) @n + Str: (string) @s +]) ``` +Run: `plotnik trace -q '' -s '' -l json -v` + ### Bytecode Reference ``` [entrypoints] -Assignment = 12 :: T06 -Expression = 09 :: T05 -Ident = 01 :: T02 +Value = 06 :: T3 [transitions] - 00 ε ◼ - -Ident: - 01 ε 02 - 02 ε [Obj] 04 - 04 (identifier) [Text Set(M0)] 06 - 06 ε [EndObj] 08 - 08 ▶ - -Expression: - 09 ε 10 - 10 ε 30, 36 - -Assignment: - 12 ε 13 - 13 ε [Obj] 15 - 15 (assignment_expression) 16 - 16 ▽ left: (identifier) [Node Set(M6)] 18 - 18 ▷ right: (Expression) 09 : 19 - 19 ε [Set(M5)] 21 - 21 △ 22 - 22 ε [EndObj] 24 - 24 ▶ - 25 ▶ - 26 ε [EndEnum] 25 - 28 (number) [Node Set(M1)] 26 - 30 ε [Enum(M3)] 28 - 32 ε [EndEnum] 25 - 34 (identifier) [Node Set(M2)] 32 - 36 ε [Enum(M4)] 34 +_ObjWrap: + 00 ε [Obj] 02 + 02 Trampoline 03 + 03 ε [EndObj] 05 + 05 ▶ + +Value: + 06 ε 07 + 07 ! (document) 08 + 08 ε 11, 16 + 10 ▶ + 11 !!▽ [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + 14 ... + 15 ... + 16 !!▽ [Enum(M3)] (string) [Node Set(M1) EndEnum] 19 + 19 △ _ 10 ``` --- -**Note**: The following trace examples are illustrative and use simplified step numbers for clarity. The actual step numbers in your output may differ based on the current bytecode generation. The format and sub-line conventions remain the same. 
- -## Trace 1: Successful Match with Backtracking (`-v`) +## Trace 1: Successful Match on First Branch (`-v`) -**Entrypoint:** `Assignment` -**Source:** `x = y` +**Source:** `42` (JSON number) ``` -(assignment_expression ; root - left: (identifier) ; "x" - right: (identifier)) ; "y" +(document + (number "42")) ``` ### Execution Trace ``` -Assignment: - 08 ε 09 - 09 (assignment_expression) 10 - ● assignment_expression x = y - 10 left: (identifier) [Node Set(M6)] 12 - ▽ identifier - ● left: - ● identifier x - ⬥ Node - ⬥ Set "target" - 12 (Expression) 05 : 13 - ▷ identifier - ● right: - ▶ (Expression) -Expression: - 05 ε 06 - 06 ε 22, 28 - 22 ε [Enum(M3)] 20 - ⬥ Enum "Literal" - 20 (number) [Node Set(M1)] 18 - ○ identifier y - 06 ❮❮❮ - 28 ε [Enum(M4)] 26 - ⬥ Enum "Variable" - 26 (identifier) [Node Set(M2)] 24 - ● identifier y +_ObjWrap: + 00 ε [Obj] 02 + ⬥ Obj + 02 Trampoline 03 + ▶ (Value) + +Value: + 06 ε 07 + 07 (document) 08 + document + ● document 42 + -------------------------------------------- + 08 ε 11, 16 + 11 [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + ⬥ Enum "Num" + ▽ number + ● number 42 ⬥ Node - ⬥ Set "name" - 24 ε [EndEnum] 17 + ⬥ Set "n" ⬥ EndEnum - 17 ◀ (Expression) -Assignment: - 13 ε [Set(M5)] 15 - ⬥ Set "value" - 15 16 - △ assignment_expression - 16 ◀ (Assignment) ◼ + -------------------------------------------- + 19 _ 10 + △ document + ● document 42 + -------------------------------------------- + 10 ◀ (Value) + +_ObjWrap: + -------------------------------------------- + 03 ε [EndObj] 05 + ⬥ EndObj + 05 ◀ _ObjWrap ◼ ``` -### Execution Summary - -1. **08→09**: Epsilon entry -2. **09→10**: Match `(assignment_expression)` at root -3. **10→12**: Navigate ▽, match `left: (identifier)`, capture "x" as `@target` -4. **12→05**: Navigate ▷, check `right:`, call Expression -5. **05→06→22**: Expression entry, checkpoint at 28 -6. **22→20**: Start Literal variant, try `(number)` -7. **20**: `(identifier)` found, type mismatch, backtrack to checkpoint -8. 
**28→26**: Start Variable variant, try `(identifier)` -9. **26→24**: `(identifier) "y"` matches, capture as `@name` -10. **24→17**: EndEnum, return from Expression -11. **13→15**: Set `@value` field -12. **15→16**: Navigate △ to root -13. **16**: Return from Assignment +First branch (`Num`) matches — checkpoint at step 16 is never used. --- -## Trace 2: Successful Match without Backtracking (`-v`) +## Trace 2: Successful Match with Backtracking (`-v`) -**Entrypoint:** `Assignment` -**Source:** `x = 1` +**Source:** `"hello"` (JSON string) ``` -(assignment_expression - left: (identifier) ; "x" - right: (number)) ; "1" +(document + (string "\"hello\"")) ``` ### Execution Trace ``` -Assignment: - 08 ε 09 - 09 (assignment_expression) 10 - ● assignment_expression x = 1 - 10 left: (identifier) [Node Set(M6)] 12 - ▽ identifier - ● left: - ● identifier x - ⬥ Node - ⬥ Set "target" - 12 (Expression) 05 : 13 - ▷ number - ● right: - ▶ (Expression) -Expression: - 05 ε 06 - 06 ε 22, 28 - 22 ε [Enum(M3)] 20 - ⬥ Enum "Literal" - 20 (number) [Node Set(M1)] 18 - ● number 1 +_ObjWrap: + 00 ε [Obj] 02 + ⬥ Obj + 02 Trampoline 03 + ▶ (Value) + +Value: + 06 ε 07 + 07 (document) 08 + document + ● document "hello" + -------------------------------------------- + 08 ε 11, 16 + 11 [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + ⬥ Enum "Num" + ▽ string + ○ string "hello" + 08 ❮❮❮ + -------------------------------------------- + 16 [Enum(M3)] (string) [Node Set(M1) EndEnum] 19 + ⬥ Enum "Str" + ▽ string + ● string "hello" ⬥ Node - ⬥ Set "value" - 18 ε [EndEnum] 17 + ⬥ Set "s" ⬥ EndEnum - 17 ◀ (Expression) -Assignment: - 13 ε [Set(M5)] 15 - ⬥ Set "value" - 15 16 - △ assignment_expression - 16 ◀ (Assignment) ◼ + 19 _ 10 + △ document + ● document "hello" + -------------------------------------------- + 10 ◀ (Value) + +_ObjWrap: + -------------------------------------------- + 03 ε [EndObj] 05 + ⬥ EndObj + 05 ◀ _ObjWrap ◼ ``` -First branch (Literal) matches immediately — checkpoint at 28 is never 
used. +### Execution Summary + +1. **00→02**: Preamble starts, emit `Obj` +2. **02→Value**: `Trampoline` dispatches to entrypoint +3. **07→08**: Match `(document)` succeeds +4. **08**: Branch — create checkpoint at 16, try 11 first +5. **11**: Try `Num` branch, navigate down, find `string` — type mismatch (`○`) +6. **08 ❮❮❮**: Backtrack to checkpoint +7. **16**: Try `Str` branch, navigate down, find `string` — match (`●`) +8. **19→10**: Navigate up, return from `Value` +9. **03→05**: Preamble cleanup, emit `EndObj`, accept (`◼`) --- ## Trace 3: Failed Match (`-v`) -**Entrypoint:** `Expression` -**Source:** `"hello"` +**Source:** `true` (JSON boolean — neither number nor string) ``` -(string) ; string literal, not number or identifier +(document + (true "true")) ``` ### Execution Trace ``` -Expression: - 05 ε 06 - 06 ε 22, 28 - 22 ε [Enum(M3)] 20 - ⬥ Enum "Literal" - 20 (number) [Node Set(M1)] 18 - ○ string hello - 06 ❮❮❮ - 28 ε [Enum(M4)] 26 - ⬥ Enum "Variable" - 26 (identifier) [Node Set(M2)] 24 - ○ string hello +_ObjWrap: + 00 ε [Obj] 02 + ⬥ Obj + 02 Trampoline 03 + ▶ (Value) + +Value: + 06 ε 07 + 07 (document) 08 + document + ● document true + -------------------------------------------- + 08 ε 11, 16 + 11 [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + ⬥ Enum "Num" + ▽ true + ○ true true + 08 ❮❮❮ + -------------------------------------------- + 16 [Enum(M3)] (string) [Node Set(M1) EndEnum] 19 + ⬥ Enum "Str" + ▽ true + ○ true true ``` Both branches fail. No more checkpoints — query does not match. The CLI exits with code 1. --- -## Trace 4: Text Effect (String Capture) (`-v`) +## Trace 4: Default Verbosity (Compact) -**Entrypoint:** `Ident` -**Source:** `foo` +Same as Trace 2 but with default verbosity (no `-v` flag). 
Navigation and effect sub-lines are hidden: ``` -(identifier) ; "foo" -``` +_ObjWrap: + 00 ε [Obj] 02 + 02 Trampoline 03 + ▶ (Value) -### Execution Trace +Value: + 06 ε 07 + 07 (document) 08 + ● document + 08 ε 11, 16 + 11 [Enum(M2)] (number) [Node Set(M0) EndEnum] 19 + ○ string + 08 ❮❮❮ + 16 [Enum(M3)] (string) [Node Set(M1) EndEnum] 19 + ● string + 19 _ 10 + ● document + 10 ◀ (Value) -``` -Ident: - 01 ε 02 - 02 (identifier) [Text Set(M0)] 04 - ● identifier foo - ⬥ Text - ⬥ Set "name" - 04 ◀ (Ident) ◼ -``` - -The `Text` effect extracts the node's source text as a string (from `@name :: string`). - ---- - -## Trace 5: Search with Skipping (`-v`) - -To demonstrate skip behavior, consider a different query: - -``` -ReturnVal = (statement_block (return_statement) @ret) -``` - -**Bytecode:** - -``` -ReturnVal: - 01 ε 02 - 02 (statement_block) 03 - 03 ▽ (return_statement) [Node Set(M0)] 04 - 04 △ 05 - 05 ◼ -``` - -**Entrypoint:** `ReturnVal` -**Source:** `{ x; return 1; }` - -``` -(statement_block - (expression_statement) ; "x;" - (return_statement)) ; "return 1;" -``` - -### Execution Trace - -``` -ReturnVal: - 01 ε 02 - 02 (statement_block) 03 - ● statement_block { x; return 1; } - 03 (return_statement) [Node Set(M0)] 04 - ▽ expression_statement - ○ expression_statement x; - ▷ return_statement - ● return_statement return 1; - ⬥ Node - ⬥ Set "ret" - 04 05 - △ statement_block - 05 ◀ (ReturnVal) ◼ -``` - -The navigation lands on `(expression_statement)`, type mismatch, skip `▷` to next sibling, find `(return_statement)`. - ---- - -## Trace 6: Immediate Failure (`-v`) - -**Entrypoint:** `Assignment` -**Source:** `42` - -``` -(number) ; just a number literal -``` - -### Execution Trace - -``` -Assignment: - 08 ε 09 - 09 (assignment_expression) 10 - ○ number 42 -``` - -Type check fails at root — no navigation occurs. The CLI exits with code 1. - ---- - -## Trace 7: Suppressive Capture (`-v`) - -Suppressive captures (`@_`) match structurally but don't emit effects. 
The trace shows: - -- `⬥ SuppressBegin` / `⬥ SuppressEnd` when entering/exiting suppression -- `⬦` for data effects that are suppressed -- `⬦ SuppressBegin` / `⬦ SuppressEnd` for nested suppression (already inside another `@_`) - -**Query:** - -``` -Pair = (pair key: (string) @_ value: (number) @value) -``` - -**Entrypoint:** `Pair` -**Source:** `"x": 1` - -``` -(pair - key: (string) ; "x" - value: (number)) ; 1 -``` - -### Execution Trace - -``` -Pair: - 01 ε 02 - 02 ε [Obj] 04 - ⬥ Obj - 04 (pair) 05 - ● pair "x": 1 - 05 key: (string) [SuppressBegin] 06 - ▽ string - ● key: - ● string "x" - ⬥ SuppressBegin - 06 ε [SuppressEnd] 08 - ⬦ Node - ⬦ Set "key" - ⬥ SuppressEnd - 08 value: (number) [Node Set(M0)] 10 - ▷ number - ● value: - ● number 1 - ⬥ Node - ⬥ Set "value" - 10 12 - △ pair - 12 ε [EndObj] 14 - ⬥ EndObj - 14 ◀ (Pair) ◼ -``` - -The `@_` capture on `key:` wraps its inner effects with `SuppressBegin`/`SuppressEnd`. Effects between them (`Node`, `Set "key"`) appear as `⬦` (suppressed). The `@value` capture emits normally with `⬥`. - ---- - -## Trace 8: Default Verbosity (Compact) - -Same as Trace 1 but with default verbosity (no `-v` flag). Navigation and effect sub-lines are hidden: - -``` -Assignment: - 08 ε 09 - 09 (assignment_expression) 10 - ● assignment_expression - 10 left: (identifier) [Node Set(M6)] 12 - ● identifier - 12 (Expression) 05 : 13 - ▶ (Expression) -Expression: - 05 ε 06 - 06 ε 22, 28 - 22 ε [Enum(M3)] 20 - 20 (number) [Node Set(M1)] 18 - ○ identifier - 06 ❮❮❮ - 28 ε [Enum(M4)] 26 - 26 (identifier) [Node Set(M2)] 24 - ● identifier - 24 ε [EndEnum] 17 - 17 ◀ (Expression) -Assignment: - 13 ε [Set(M5)] 15 - 15 16 - ● assignment_expression - 16 ◀ (Assignment) ◼ +_ObjWrap: + 03 ε [EndObj] 05 + 05 ◀ _ObjWrap ◼ ``` Default shows: @@ -560,21 +384,17 @@ Step number `NN` is the checkpoint we're restoring to. 
Appears as an instruction ## Nav Symbols -| Nav | Symbol | Meaning | -| --------------- | ------- | ------------------------------- | -| Stay | (space) | No movement | -| Stay (epsilon) | ε | No movement, no constraints | -| StayExact | ! | Stay at position, exact only | -| Down | ▽ | First child, skip any | -| DownSkip | !▽ | First child, skip trivia | -| DownExact | !!▽ | First child, exact | -| Next | ▷ | Next sibling, skip any | -| NextSkip | !▷ | Next sibling, skip trivia | -| NextExact | !!▷ | Next sibling, exact | -| Up(1) | △ | Ascend 1 level (no superscript) | -| Up(n≥2) | △ⁿ | Ascend n levels | -| UpSkipTrivia(n) | !△ⁿ | Ascend n, last non-trivia | -| UpExact(n) | !!△ⁿ | Ascend n, last child | +In trace output, navigation symbols are **simplified** — skip/exact variants are not distinguished: + +| Nav | Symbol | Meaning | +| ---------------------------- | ------- | ---------------------------- | +| Epsilon | ε | Pure control flow, no cursor | +| Stay, StayExact | (space) | No movement | +| Down, DownSkip, DownExact | ▽ | Descended to child | +| Next, NextSkip, NextExact | ▷ | Moved to sibling | +| Up(n), UpSkipTrivia, UpExact | △ | Ascended to parent | + +> **Note**: For detailed nav symbols with mode modifiers (`!▽`, `!!▽`, etc.), see [07-dump-format.md](07-dump-format.md#nav-symbols). Trace format simplifies these for readability. ## Effects