|
5 | 5 | <img width="400" alt="The logo: a curled wood shaving on a workbench" src="https://github.com/user-attachments/assets/1fcef0a9-20f8-4500-960b-f31db3e9fd94" /> |
6 | 6 | </p> |
7 | 7 |
|
8 | | -<h1><p align="center">Plotnik</p></h1> |
| 8 | +<h1><p align="center">Plotnik</p></h1> |
9 | 9 |
|
10 | 10 | <p align="center"> |
11 | | - Typed query language for <a href="https://tree-sitter.github.io/">tree-sitter</a> |
| 11 | + Typed query language for <a href="https://tree-sitter.github.io/">Tree-sitter</a>. Your queries return typed structs.<br/> |
| 12 | + Captures become fields, quantifiers become arrays, alternations become unions. |
12 | 13 | </p> |
13 | 14 |
|
| 15 | +<br/> |
| 16 | + |
14 | 17 | <p align="center"> |
15 | 18 | <a href="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml/badge.svg" alt="stable"></a> |
16 | 19 | <a href="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml/badge.svg" alt="nightly"></a> |
|
21 | 24 | <br/> |
22 | 25 | <br/> |
23 | 26 |
|
24 | | -For more details, see [reference](docs/REFERENCE.md). |
| 27 | +## The problem |
| 28 | + |
| 29 | +Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub and drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language. |
| 30 | + |
| 31 | +The hard problem now is what comes _after_ parsing, namely extracting meaning from the tree: |
| 32 | + |
| 33 | +```typescript |
| 34 | +function extractFunction(node: SyntaxNode): FunctionInfo | null { |
| 35 | + if (node.type !== "function_declaration") { |
| 36 | + return null; |
| 37 | + } |
| 38 | + const name = node.childForFieldName("name"); |
| 39 | + const body = node.childForFieldName("body"); |
| 40 | + if (!name || !body) { |
| 41 | + return null; |
| 42 | + } |
| 43 | + return { |
| 44 | + name: name.text, |
| 45 | + body, |
| 46 | + }; |
| 47 | +} |
| 48 | +``` |
| 49 | + |
| 50 | +Every extraction requires a new function, each one a potential source of bugs that won't surface until production. |
| 51 | + |
| 52 | +## The solution |
| 53 | + |
| 54 | +Plotnik extends Tree-sitter queries with type annotations: |
| 55 | + |
| 56 | +```clojure |
| 57 | +(function_declaration |
| 58 | + name: (identifier) @name :: string |
| 59 | + body: (statement_block) @body |
| 60 | +) @func :: FunctionInfo |
| 61 | +``` |
| 62 | + |
| 63 | +The query describes structure, and Plotnik infers the output type: |
| 64 | + |
| 65 | +```typescript |
| 66 | +interface FunctionInfo { |
| 67 | + name: string; |
| 68 | + body: SyntaxNode; |
| 69 | +} |
| 70 | +``` |
| 71 | + |
| 72 | +This structure is guaranteed by the query engine. No defensive programming needed. |
| 73 | + |
| 74 | +## But what about Tree-sitter queries? |
| 75 | + |
| 76 | +Tree-sitter already has queries: |
| 77 | + |
| 78 | +```scheme |
| 79 | +(function_declaration |
| 80 | + name: (identifier) @name |
| 81 | + body: (statement_block) @body) |
| 82 | +``` |
| 83 | + |
| 84 | +The result is a flat capture list: |
| 85 | + |
| 86 | +```typescript |
| 87 | +query.matches(tree.rootNode); |
| 88 | +// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...] |
| 89 | +``` |
| 90 | + |
| 91 | +The assembly layer is up to you: |
| 92 | + |
| 93 | +```typescript |
| 94 | +const name = match.captures.find((c) => c.name === "name")?.node; |
| 95 | +const body = match.captures.find((c) => c.name === "body")?.node; |
| 96 | +if (!name || !body) throw new Error("Missing capture"); |
| 97 | +return { name: name.text, body }; |
| 98 | +``` |
| 99 | + |
| 100 | +This means string-based lookup, null checks, and manual type definitions kept in sync by convention. |
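To make the "kept in sync by convention" point concrete, here is a minimal sketch of how the hand-written assembly layer can drift from the query. The `SyntaxNode`, `Capture`, and `Match` interfaces are simplified stand-ins assumed for illustration, not the real Tree-sitter binding types, and the `@block` rename is a hypothetical edit to the query.

```typescript
// Hypothetical minimal stand-ins for the Tree-sitter binding types.
interface SyntaxNode {
  type: string;
  text: string;
}
interface Capture {
  name: string;
  node: SyntaxNode;
}
interface Match {
  captures: Capture[];
}

// Hand-maintained type: nothing ties it to the capture names in the query.
interface FunctionInfo {
  name: string;
  body: SyntaxNode;
}

function toFunctionInfo(match: Match): FunctionInfo {
  const name = match.captures.find((c) => c.name === "name")?.node;
  const body = match.captures.find((c) => c.name === "body")?.node;
  if (!name || !body) throw new Error("Missing capture");
  return { name: name.text, body };
}

const nameNode: SyntaxNode = { type: "identifier", text: "greet" };
const bodyNode: SyntaxNode = { type: "statement_block", text: "{}" };

// Captures as produced by the query as written: extraction works.
const ok: Match = {
  captures: [
    { name: "name", node: nameNode },
    { name: "body", node: bodyNode },
  ],
};

// Captures after a hypothetical rename of @body to @block in the query:
// the code still type-checks, but the string lookup fails at runtime.
const renamed: Match = {
  captures: [
    { name: "name", node: nameNode },
    { name: "block", node: bodyNode },
  ],
};

// Returns the extracted name, or the error message if extraction fails.
function tryExtract(match: Match): string {
  try {
    return toFunctionInfo(match).name;
  } catch (e) {
    return (e as Error).message;
  }
}
```

Both `ok` and `renamed` satisfy the `Match` type, so the compiler cannot tell them apart. Only running `tryExtract(renamed)` reveals the mismatch.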
| 101 | + |
| 102 | +Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition. |
| 103 | + |
| 104 | +## Why Plotnik? |
| 105 | + |
| 106 | +| Hand-written extraction | Plotnik | |
| 107 | +| -------------------------- | ---------------------------- | |
| 108 | +| Manual navigation | Declarative pattern matching | |
| 109 | +| Runtime type errors | Compile-time type inference | |
| 110 | +| Repetitive extraction code | Single-query extraction | |
| 111 | +| Ad-hoc data structures | Generated structs/interfaces | |
| 112 | + |
| 113 | +Plotnik extends Tree-sitter's query syntax with: |
| 114 | + |
| 115 | +- **Named expressions** for composition and reuse |
| 116 | +- **Recursion** for arbitrarily nested structures |
| 117 | +- **Type annotations** for precise output shapes |
| 118 | +- **Tagged alternations** for discriminated unions |
| 119 | + |
| 120 | +## Use cases |
| 121 | + |
| 122 | +- **Scripting:** Count patterns, extract metrics, audit dependencies |
| 123 | +- **Custom linters:** Encode your business rules and architecture constraints |
| 124 | +- **LLM pipelines:** Extract signatures and types as structured data for RAG |
| 125 | +- **Code intelligence:** Outline views, navigation, symbol extraction across grammars |
| 126 | + |
| 127 | +## Language design |
| 128 | + |
| 129 | +Plotnik builds on Tree-sitter's query syntax, extending it with the features needed for typed extraction: |
| 130 | + |
| 131 | +```clojure |
| 132 | +Statement = [ |
| 133 | + Assign: (assignment_expression |
| 134 | + left: (identifier) @target :: string |
| 135 | + right: (Expression) @value) |
| 136 | + Call: (call_expression |
| 137 | + function: (identifier) @func :: string |
| 138 | + arguments: (arguments (Expression)* @args)) |
| 139 | +] |
| 140 | + |
| 141 | +Expression = [ |
| 142 | + Ident: (identifier) @name :: string |
| 143 | + Num: (number) @value :: string |
| 144 | +] |
| 145 | + |
| 146 | +TopDefinitions = (program (Statement)+ @statements) |
| 147 | +``` |
| 148 | + |
| 149 | +This produces: |
| 150 | + |
| 151 | +```typescript |
| 152 | +type Statement = |
| 153 | + | { tag: "Assign"; target: string; value: Expression } |
| 154 | + | { tag: "Call"; func: string; args: Expression[] }; |
| 155 | + |
| 156 | +type Expression = |
| 157 | + | { tag: "Ident"; name: string } |
| 158 | + | { tag: "Num"; value: string }; |
| 159 | + |
| 160 | +type TopDefinitions = { |
| 161 | + statements: [Statement, ...Statement[]]; |
| 162 | +}; |
| 163 | +``` |
| 164 | + |
| 165 | +Then process the results: |
| 166 | + |
| 167 | +```typescript |
| 168 | +for (const stmt of result.statements) { |
| 169 | + switch (stmt.tag) { |
| 170 | + case "Assign": |
| 171 | + console.log(`Assignment to ${stmt.target}`); |
| 172 | + break; |
| 173 | + case "Call": |
| 174 | + console.log(`Call to ${stmt.func} with ${stmt.args.length} args`); |
| 175 | + break; |
| 176 | + } |
| 177 | +} |
| 178 | +``` |
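Because the tagged alternation maps to a discriminated union, the TypeScript compiler can also check that every variant is handled. The sketch below assumes the generated types look exactly like the ones shown above (copied by hand here); `assertNever`, `describe`, and the sample data are hypothetical names introduced for illustration.

```typescript
// Types copied by hand from the output shown above (an assumption about
// what the generated code would look like).
type Expression =
  | { tag: "Ident"; name: string }
  | { tag: "Num"; value: string };

type Statement =
  | { tag: "Assign"; target: string; value: Expression }
  | { tag: "Call"; func: string; args: Expression[] };

// Hypothetical helper: it only type-checks when the argument has been
// narrowed to `never`, i.e. when every variant was handled earlier.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describe(stmt: Statement): string {
  switch (stmt.tag) {
    case "Assign":
      return `Assignment to ${stmt.target}`;
    case "Call":
      return `Call to ${stmt.func} with ${stmt.args.length} args`;
    default:
      // Adding a new variant to Statement turns this line into a
      // compile-time error until a matching case is added.
      return assertNever(stmt);
  }
}

// Hand-built sample values standing in for real query results.
const sample: Statement[] = [
  { tag: "Assign", target: "x", value: { tag: "Num", value: "1" } },
  { tag: "Call", func: "print", args: [{ tag: "Ident", name: "x" }] },
];
const lines = sample.map(describe);
```

The `default` branch costs nothing at runtime when the switch is complete. Its purpose is to move "forgot a case" from a production bug to a type error.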
| 179 | + |
| 180 | +For the detailed specification, see the [Language Reference](docs/REFERENCE.md). |
| 181 | + |
| 182 | +## Supported languages |
| 183 | + |
| 184 | +Plotnik ships with schema support for 26 languages: |
| 185 | + |
| 186 | +> Bash, C, C++, C#, CSS, Elixir, Go, Haskell, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, Nix, PHP, Python, Ruby, Rust, Scala, Solidity, Swift, TypeScript, TSX, YAML |
| 187 | + |
| 188 | +Additional languages and dynamic loading are planned. |
| 189 | + |
| 190 | +## Roadmap |
| 191 | + |
| 192 | +### Ignition: the parser ✓ |
| 193 | + |
| 194 | +The foundation is complete: a resilient parser that recovers from errors and keeps going. |
| 195 | + |
| 196 | +- [x] Resilient lexer ([`logos`](https://github.com/maciejhirsz/logos)) and parser ([`rowan`](https://github.com/rust-analyzer/rowan)) with error recovery |
| 197 | +- [x] Typed AST layer over concrete syntax tree |
| 198 | +- [x] Rich diagnostics with spans, colored output, related locations, and suggested fixes |
| 199 | +- [x] Name resolution with two-pass symbol table construction |
| 200 | +- [x] Recursion validation via Tarjan SCC analysis (escape path detection) |
| 201 | +- [x] Shape cardinality inference (One vs Many) for field constraint validation |
| 202 | +- [x] Alternation validation (mixed tagged/untagged detection) |
| 203 | +- [ ] Semantic validation: capture naming rules, type annotation consistency |
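As a rough illustration of the recursion-validation item above, the following sketch runs Tarjan's strongly connected components algorithm over a graph of named expressions and reports which groups are recursive. It is not Plotnik's implementation: the `Graph` shape, the function names, and the example definitions `A` to `D` are assumptions made for this example, and the escape-path check mentioned in the list is not covered.

```typescript
// Hypothetical reference graph: definition name -> names it refers to.
type Graph = Map<string, string[]>;

// Tarjan's algorithm: returns the strongly connected components of the graph.
function tarjanScc(graph: Graph): string[][] {
  let nextIndex = 0;
  const index = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const stack: string[] = [];
  const onStack = new Set<string>();
  const components: string[][] = [];

  function visit(v: string): void {
    index.set(v, nextIndex);
    lowlink.set(v, nextIndex);
    nextIndex++;
    stack.push(v);
    onStack.add(v);

    for (const w of graph.get(v) ?? []) {
      if (!index.has(w)) {
        // Tree edge: recurse, then inherit the child's lowlink.
        visit(w);
        lowlink.set(v, Math.min(lowlink.get(v)!, lowlink.get(w)!));
      } else if (onStack.has(w)) {
        // Back edge into the component currently being built.
        lowlink.set(v, Math.min(lowlink.get(v)!, index.get(w)!));
      }
    }

    // v is the root of a component: pop the stack down to v.
    if (lowlink.get(v) === index.get(v)) {
      const component: string[] = [];
      for (;;) {
        const w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
        if (w === v) break;
      }
      components.push(component);
    }
  }

  for (const v of graph.keys()) {
    if (!index.has(v)) visit(v);
  }
  return components;
}

// A component is recursive if it has several members or a self-reference.
function recursiveGroups(graph: Graph): string[][] {
  return tarjanScc(graph).filter(
    (c) => c.length > 1 || c.some((v) => (graph.get(v) ?? []).includes(v)),
  );
}

// Hypothetical definitions: A and B are mutually recursive, C refers to
// itself, and D only refers to A.
const refs: Graph = new Map([
  ["A", ["B"]],
  ["B", ["A"]],
  ["C", ["C"]],
  ["D", ["A"]],
]);
const groups = recursiveGroups(refs).map((g) => [...g].sort().join(","));
```

Each recursive group would then need a separate check that at least one path through it terminates, which is where the escape-path detection named in the list comes in.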
| 204 | + |
| 205 | +### Liftoff: type inference |
| 206 | + |
| 207 | +The schema infrastructure is built. Type inference is next. |
| 208 | + |
| 209 | +- [x] `node-types.json` parsing and schema representation (`plotnik-core`) |
| 210 | +- [x] Proc macro for compile-time schema embedding (`plotnik-macros`) |
| 211 | +- [x] 26 languages bundled with static node type tables (`plotnik-langs`) |
| 212 | +- [ ] Query validation against language schemas (node types, fields, children) |
| 213 | +- [ ] Full type inference: query → output shape → generated structs |
| 214 | + |
| 215 | +### Acceleration: query engine |
25 | 216 |
|
26 | | -## Roadmap 🚀 |
| 217 | +- [ ] Thompson NFA construction for query IR |
| 218 | +- [ ] Runtime execution with backtracking cursor walker |
| 219 | +- [ ] Advanced validation powered by `grammar.json` (production rules, precedence) |
| 220 | +- [ ] Match result API with typed accessors |
27 | 221 |
|
28 | | -**Ignition** _(the parser)_ |
| 222 | +### Orbit: developer experience |
29 | 223 |
|
30 | | -- [x] Resuilient query language parser |
31 | | -- [x] Basic error messages |
32 | | -- [x] Name resolution |
33 | | -- [x] Recursion validator |
34 | | -- [ ] Semantic analyzer |
| 224 | +The CLI foundation exists. The full developer experience is ahead. |
35 | 225 |
|
36 | | -**Liftoff** _(type inference)_ |
| 226 | +- [x] CLI framework with `debug`, `docs`, `langs` commands |
| 227 | +- [x] Query inspection: AST dump, symbol table, cardinalities, spans |
| 228 | +- [x] Source inspection: Tree-sitter parse tree visualization |
| 229 | +- [ ] CLI distribution: Homebrew, cargo-binstall, npm wrapper |
| 230 | +- [ ] Compiled queries via Rust proc macros (zero-cost: query → native code) |
| 231 | +- [ ] Language bindings: TypeScript (WASM), Python, Ruby |
| 232 | +- [ ] LSP server: diagnostics, completions, hover, go-to-definition |
| 233 | +- [ ] Editor extensions: VS Code, Zed, Neovim |
37 | 234 |
|
38 | | -- [ ] Basic validation against `node-types.json` schemas |
39 | | -- [ ] Type inference of the query result shape |
| 235 | +## Acknowledgments |
40 | 236 |
|
41 | | -**Acceleration** _(query engine)_ |
| 237 | +[Max Brunsfeld](https://github.com/maxbrunsfeld) created Tree-sitter; [Amaan Qureshi](https://github.com/amaanq) and other contributors maintain the parser ecosystem that makes this project possible. |
42 | 238 |
|
43 | | -- [ ] Thompson construction of query IR |
44 | | -- [ ] Runtime execution engine |
45 | | -- [ ] Advanced validation powered by `grammar.json` files |
| 239 | +## License |
46 | 240 |
|
47 | | -**Orbit** _(the tooling)_ |
| 241 | +This project is licensed under the [MIT license]. |
48 | 242 |
|
49 | | -- [ ] The CLI app available via installers |
50 | | -- [ ] Compiled queries (using procedural macros) |
51 | | -- [ ] Enhanced error messages |
52 | | -- [ ] Bindings (TypeScript, Python, Ruby) |
53 | | -- [ ] LSP server |
54 | | -- [ ] Editor support (VSCode, Zed, Neovim) |
| 243 | +[MIT license]: LICENSE.md |