|
1 | | -<br/> |
2 | | -<br/> |
3 | | - |
4 | 1 | <p align="center"> |
5 | 2 | <img width="400" alt="The logo: a curled wood shaving on a workbench" src="https://github.com/user-attachments/assets/8f1162aa-5769-415d-babe-56b962256747" /> |
6 | 3 | </p> |
|
11 | 8 |
|
12 | 9 | <p align="center"> |
13 | 10 | A type-safe query language for <a href="https://tree-sitter.github.io">Tree-sitter</a>.<br/> |
14 | | - Query in, typed data out. |
| 11 | + Powered by the <a href="https://github.com/bearcove/arborium">arborium</a> grammar collection. |
15 | 12 | </p> |
16 | 13 |
|
17 | 14 | <br/> |
|
26 | 23 | <br/> |
27 | 24 |
|
28 | 25 | <p align="center"> |
29 | | - ⚠️ <a href="#status">ALPHA STAGE</a>: not for production use ⚠️<br/> |
| 26 | + <sub> |
| 27 | + ⚠️ Beta: not for production use ⚠️<br/> |
| 28 | + </sub> |
30 | 29 | </p> |
31 | 30 |
|
32 | 31 | <br/> |
33 | | -<br/> |
34 | | - |
35 | | -## The problem |
36 | | - |
37 | | -Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub, drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language. |
38 | | - |
39 | | -The hard problem now is what comes _after_ parsing: extracting structured data from the tree: |
40 | | - |
41 | | -```typescript |
42 | | -function extractFunction(node: SyntaxNode): FunctionInfo | null { |
43 | | - if (node.type !== "function_declaration") { |
44 | | - return null; |
45 | | - } |
46 | | - const name = node.childForFieldName("name"); |
47 | | - const body = node.childForFieldName("body"); |
48 | | - if (!name || !body) { |
49 | | - return null; |
50 | | - } |
51 | | - return { |
52 | | - name: name.text, |
53 | | - body, |
54 | | - }; |
55 | | -} |
56 | | -``` |
57 | | - |
58 | | -Every extraction requires a new function, each one a potential source of bugs that won't surface until production. |
59 | | - |
60 | | -## The solution |
61 | | - |
62 | | -Plotnik extends Tree-sitter queries with type annotations: |
63 | | - |
64 | | -```clojure |
65 | | -(function_declaration |
66 | | - name: (identifier) @name :: string |
67 | | - body: (statement_block) @body |
68 | | -) @func :: FunctionInfo |
69 | | -``` |
70 | | - |
71 | | -The query describes structure, and Plotnik infers the output type: |
72 | | - |
73 | | -```typescript |
74 | | -interface FunctionInfo { |
75 | | - name: string; |
76 | | - body: SyntaxNode; |
77 | | -} |
78 | | -``` |
79 | | - |
80 | | -This structure is guaranteed by the query engine. No defensive programming needed. |
81 | | - |
82 | | -## But what about Tree-sitter queries? |
83 | | - |
84 | | -Tree-sitter already has queries: |
85 | | - |
86 | | -```clojure |
87 | | -(function_declaration |
88 | | - name: (identifier) @name |
89 | | - body: (statement_block) @body) |
90 | | -``` |
91 | | - |
92 | | -The result is a flat capture list: |
93 | 32 |
|
94 | | -```typescript |
95 | | -query.matches(tree.rootNode); |
96 | | -// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...] |
97 | | -``` |
98 | | - |
99 | | -The assembly layer is up to you: |
100 | | - |
101 | | -```typescript |
102 | | -const name = match.captures.find((c) => c.name === "name")?.node; |
103 | | -const body = match.captures.find((c) => c.name === "body")?.node; |
104 | | -if (!name || !body) throw new Error("Missing capture"); |
105 | | -return { name: name.text, body }; |
106 | | -``` |
107 | | - |
108 | | -This means string-based lookup, null checks, and manual type definitions kept in sync by convention. |
109 | | - |
110 | | -Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition. |
111 | | - |
112 | | -## Why Plotnik? |
| 33 | +Tree-sitter gives you the syntax tree. Extracting structured data from it still means writing imperative navigation code, adding null checks, and maintaining type definitions by hand. Plotnik makes extraction declarative: write a pattern, get typed data. The query is the type definition.
113 | 34 |
|
114 | | -| Hand-written extraction | Plotnik | |
115 | | -| -------------------------- | ---------------------------- | |
116 | | -| Manual navigation | Declarative pattern matching | |
117 | | -| Runtime type errors | Compile-time type inference | |
118 | | -| Repetitive extraction code | Single-query extraction | |
119 | | -| Ad-hoc data structures | Generated structs/interfaces | |
| 35 | +## Features |
120 | 36 |
|
121 | | -Plotnik extends Tree-sitter's query syntax with: |
| 37 | +- [x] Static type inference from query structure |
| 38 | +- [x] Named expressions for composition and reuse |
| 39 | +- [x] Recursion for nested structures |
| 40 | +- [x] Tagged unions (discriminated unions) |
| 41 | +- [x] TypeScript type generation |
| 42 | +- [x] CLI: `exec` for matches, `infer` for types, `ast`/`trace`/`dump` for debug |
| 43 | +- [ ] Grammar verification (validate queries against tree-sitter node types) |
| 44 | +- [ ] Compile-time queries via proc-macro |
| 45 | +- [ ] LSP server |
| 46 | +- [ ] Editor extensions |
122 | 47 |
|
123 | | -- **Named expressions** for composition and reuse |
124 | | -- **Recursion** for arbitrarily nested structures |
125 | | -- **Type annotations** for precise output shapes |
126 | | -- **Alternations**: untagged for simplicity, tagged for precision (discriminated unions) |
| 48 | +## Example |
127 | 49 |
|
128 | | -## Use cases |
| 50 | +Extract function signatures from Rust. `Type` references itself to handle nested generics like `Option<Vec<String>>`. |
129 | 51 |
|
130 | | -- **Scripting:** Count patterns, extract metrics, audit dependencies |
131 | | -- **Custom linters:** Encode your business rules and architecture constraints |
132 | | -- **LLM Pipelines:** Extract signatures and types as structured data for RAG |
133 | | -- **Code Intelligence:** Outline views, navigation, symbol extraction across grammars |
134 | | - |
135 | | -## Language design |
136 | | - |
137 | | -Start simple—extract all function names from a file: |
| 52 | +`query.ptk`: |
138 | 53 |
|
139 | 54 | ```clojure |
140 | | -Functions = (program |
141 | | - {(function_declaration name: (identifier) @name :: string)}* @functions) |
142 | | -``` |
| 55 | +Type = [ |
| 56 | + Simple: [(type_identifier) (primitive_type)] @name :: string |
| 57 | + Generic: (generic_type |
| 58 | + type: (type_identifier) @name :: string |
| 59 | + type_arguments: (type_arguments (Type)* @args)) |
| 60 | +] |
143 | 61 |
|
144 | | -Plotnik infers the output type: |
| 62 | +Func = (function_item |
| 63 | + name: (identifier) @name :: string |
| 64 | + parameters: (parameters |
| 65 | + (parameter |
| 66 | + pattern: (identifier) @param :: string |
| 67 | + type: (Type) @type |
| 68 | + )* @params)) |
145 | 69 |
|
146 | | -```typescript |
147 | | -type Functions = { |
148 | | - functions: { name: string }[]; |
149 | | -}; |
| 70 | +Funcs = (source_file (Func)* @funcs) |
150 | 71 | ``` |
151 | 72 |
|
152 | | -Scale up to tagged unions for richer structure: |
153 | | - |
154 | | -```clojure |
155 | | -Statement = [ |
156 | | - Assign: (assignment_expression |
157 | | - left: (identifier) @target :: string |
158 | | - right: (Expression) @value) |
159 | | - Call: (call_expression |
160 | | - function: (identifier) @func :: string |
161 | | - arguments: (arguments (Expression)* @args)) |
162 | | -] |
| 73 | +`lib.rs`: |
163 | 74 |
|
164 | | -Expression = [ |
165 | | - Ident: (identifier) @name :: string |
166 | | - Num: (number) @value :: string |
167 | | -] |
| 75 | +```rust |
| 76 | +fn get(key: Option<Vec<String>>) {} |
168 | 77 |
|
169 | | -TopDefinitions = (program (Statement)+ @statements) |
| 78 | +fn set(key: String, val: i32) {} |
170 | 79 | ``` |
171 | 80 |
|
172 | | -This produces: |
| 81 | +Plotnik infers TypeScript types from the query structure. `Type` is recursive: `args: Type[]`. |
173 | 82 |
|
174 | | -```typescript |
175 | | -type Statement = |
176 | | - | { $tag: "Assign"; $data: { target: string; value: Expression } } |
177 | | - | { $tag: "Call"; $data: { func: string; args: Expression[] } }; |
| 83 | +```sh |
| 84 | +❯ plotnik infer query.ptk -l rust |
| 85 | +export type Type = |
| 86 | + | { $tag: "Simple"; $data: { name: string } } |
| 87 | + | { $tag: "Generic"; $data: { name: string; args: Type[] } }; |
178 | 88 |
|
179 | | -type Expression = |
180 | | - | { $tag: "Ident"; $data: { name: string } } |
181 | | - | { $tag: "Num"; $data: { value: string } }; |
| 89 | +export interface Func { |
| 90 | + name: string; |
| 91 | + params: { param: string; type: Type }[]; |
| 92 | +} |
182 | 93 |
|
183 | | -type TopDefinitions = { |
184 | | - statements: [Statement, ...Statement[]]; |
185 | | -}; |
| 94 | +export interface Funcs { |
| 95 | + funcs: Func[]; |
| 96 | +} |
186 | 97 | ``` |
187 | 98 |
|
188 | | -Then process the results: |
189 | | - |
190 | | -```typescript |
191 | | -for (const stmt of result.statements) { |
192 | | - switch (stmt.$tag) { |
193 | | - case "Assign": |
194 | | - console.log(`Assignment to ${stmt.$data.target}`); |
195 | | - break; |
196 | | - case "Call": |
197 | | - console.log( |
198 | | - `Call to ${stmt.$data.func} with ${stmt.$data.args.length} args`, |
199 | | - ); |
200 | | - break; |
201 | | - } |
| 99 | +Run the query against `lib.rs` to extract structured JSON: |
| 100 | +
|
| 101 | +```sh |
| 102 | +❯ plotnik exec query.ptk lib.rs |
| 103 | +{ |
| 104 | + "funcs": [ |
| 105 | + { |
| 106 | + "name": "get", |
| 107 | + "params": [{ |
| 108 | + "param": "key", |
| 109 | + "type": { |
| 110 | + "$tag": "Generic", |
| 111 | + "$data": { |
| 112 | + "name": "Option", |
| 113 | + "args": [{ |
| 114 | + "$tag": "Generic", |
| 115 | + "$data": { |
| 116 | + "name": "Vec", |
| 117 | + "args": [{ "$tag": "Simple", "$data": { "name": "String" } }] |
| 118 | + } |
| 119 | + }] |
| 120 | + } |
| 121 | + } |
| 122 | + }] |
| 123 | + }, |
| 124 | + { |
| 125 | + "name": "set", |
| 126 | + "params": [ |
| 127 | + { "param": "key", "type": { "$tag": "Simple", "$data": { "name": "String" } } }, |
| 128 | + { "param": "val", "type": { "$tag": "Simple", "$data": { "name": "i32" } } } |
| 129 | + ] |
| 130 | + } |
| 131 | + ] |
202 | 132 | } |
203 | 133 | ``` |
204 | 134 |
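As a rough illustration of what the inferred types allow on the consuming side, the sketch below walks the recursive `Type` union and rebuilds Rust-like signatures. The type definitions are copied from the `plotnik infer` output shown above. The helpers `renderType` and `renderSignature` are hypothetical names introduced here, not part of Plotnik. How the JSON reaches your program is also an assumption: a fragment of the `plotnik exec` output is simply pasted in as a literal.

```typescript
// Types copied from the `plotnik infer` output above.
type Type =
  | { $tag: "Simple"; $data: { name: string } }
  | { $tag: "Generic"; $data: { name: string; args: Type[] } };

interface Func {
  name: string;
  params: { param: string; type: Type }[];
}

interface Funcs {
  funcs: Func[];
}

// Hypothetical helper: render a Type back to Rust-like syntax.
// Switching on `$tag` narrows `$data`, so `args` is only reachable
// in the "Generic" branch.
function renderType(t: Type): string {
  switch (t.$tag) {
    case "Simple":
      return t.$data.name;
    case "Generic":
      return `${t.$data.name}<${t.$data.args.map(renderType).join(", ")}>`;
  }
}

// Hypothetical helper: format one extracted function as a signature string.
function renderSignature(f: Func): string {
  const params = f.params.map((p) => `${p.param}: ${renderType(p.type)}`);
  return `fn ${f.name}(${params.join(", ")})`;
}

// Assumption: the JSON printed by `plotnik exec` is pasted here as a literal.
const result: Funcs = {
  funcs: [
    {
      name: "get",
      params: [
        {
          param: "key",
          type: {
            $tag: "Generic",
            $data: {
              name: "Option",
              args: [
                {
                  $tag: "Generic",
                  $data: {
                    name: "Vec",
                    args: [{ $tag: "Simple", $data: { name: "String" } }],
                  },
                },
              ],
            },
          },
        },
      ],
    },
    {
      name: "set",
      params: [
        { param: "key", type: { $tag: "Simple", $data: { name: "String" } } },
        { param: "val", type: { $tag: "Simple", $data: { name: "i32" } } },
      ],
    },
  ],
};

const signatures: string[] = result.funcs.map(renderSignature);
console.log(signatures.join("\n"));
```

Because `Type` is a discriminated union, the compiler checks that both tags are handled, and the recursion in `renderType` mirrors the recursion in the query.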
|
205 | | -For the detailed specification, see the [Language Reference](docs/lang-reference.md). |
| 135 | +## Why |
206 | 136 |
|
207 | | -## Documentation |
| 137 | +Pattern matching over syntax trees is powerful, but tree-sitter queries produce flat capture lists. You still need to assemble the results, handle missing captures, and define types by hand. Plotnik closes this gap: the query describes structure, the engine guarantees it. |
208 | 138 |
|
209 | | -- [CLI Guide](docs/cli.md) — Command-line tool usage |
210 | | -- [Language Reference](docs/lang-reference.md) — Complete syntax and semantics |
211 | | -- [Type System](docs/type-system.md) — How output types are inferred from queries |
212 | | -- [Runtime Engine](docs/runtime-engine.md) — VM execution model (for contributors) |
213 | | - |
214 | | -## Supported Languages |
215 | | - |
216 | | -Plotnik bundles 15 languages out of the box: Bash, C, C++, CSS, Go, HTML, Java, JavaScript, JSON, Python, Rust, TOML, TSX, TypeScript, and YAML. The underlying [arborium](https://github.com/bearcove/arborium) collection includes 60+ permissively-licensed grammars—additional languages can be enabled as needed. |
217 | | - |
218 | | -## Status |
219 | | - |
220 | | -**Working now:** Parser with error recovery, type inference, query execution, CLI tools (`check`, `dump`, `infer`, `exec`, `trace`, `tree`, `langs`). |
221 | | - |
222 | | -**Next up:** CLI distribution (Homebrew, npm), language bindings (TypeScript/WASM, Python), LSP server, editor extensions. |
| 139 | +## Documentation |
223 | 140 |
|
224 | | -⚠️ Alpha stage—API may change. Not for production use. |
| 141 | +- [CLI Guide](docs/cli.md) |
| 142 | +- [Language Reference](docs/lang-reference.md) |
| 143 | +- [Type System](docs/type-system.md) |
225 | 144 |
|
226 | 145 | ## Acknowledgments |
227 | 146 |
|
|