Commit 97397d6

feat: Rewrite README (#56)
1 parent 4388967 commit 97397d6

1 file changed

Lines changed: 213 additions & 24 deletions

README.md

@@ -5,12 +5,15 @@
55
<img width="400" alt="The logo: a curled wood shaving on a workbench" src="https://github.com/user-attachments/assets/1fcef0a9-20f8-4500-960b-f31db3e9fd94" />
66
</p>
77

8-
<h1><p align="center">Plotnik</p></h1>
8+
<h1><p align="center">Plotnik</p></h1>
99

1010
<p align="center">
11-
Typed query language for <a href="https://tree-sitter.github.io/">tree-sitter</a>
11+
Typed query language for <a href="https://tree-sitter.github.io/">Tree-sitter</a>. Your queries return typed structs.<br/>
12+
Captures become fields, quantifiers become arrays, alternations become unions.
1213
</p>
1314

15+
<br/>
16+
1417
<p align="center">
1518
<a href="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml/badge.svg" alt="stable"></a>
1619
<a href="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml/badge.svg" alt="nightly"></a>
@@ -21,34 +24,220 @@
2124
<br/>
2225
<br/>
2326

24-
For more details, see [reference](docs/REFERENCE.md).
27+
## The problem
28+
29+
Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub and drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.
30+
31+
The hard problem now is what comes _after_ parsing, namely extracting meaning from the tree:
32+
33+
```typescript
34+
function extractFunction(node: SyntaxNode): FunctionInfo | null {
35+
if (node.type !== "function_declaration") {
36+
return null;
37+
}
38+
const name = node.childForFieldName("name");
39+
const body = node.childForFieldName("body");
40+
if (!name || !body) {
41+
return null;
42+
}
43+
return {
44+
name: name.text,
45+
body,
46+
};
47+
}
48+
```
49+
50+
Every extraction requires a new function, each one a potential source of bugs that won't surface until production.
51+
52+
## The solution
53+
54+
Plotnik extends Tree-sitter queries with type annotations:
55+
56+
```clojure
57+
(function_declaration
58+
name: (identifier) @name :: string
59+
body: (statement_block) @body
60+
) @func :: FunctionInfo
61+
```
62+
63+
The query describes structure, and Plotnik infers the output type:
64+
65+
```typescript
66+
interface FunctionInfo {
67+
name: string;
68+
body: SyntaxNode;
69+
}
70+
```
71+
72+
This structure is guaranteed by the query engine. No defensive programming needed.
73+
74+
## But what about Tree-sitter queries?
75+
76+
Tree-sitter already has queries:
77+
78+
```scheme
79+
(function_declaration
80+
name: (identifier) @name
81+
body: (statement_block) @body)
82+
```
83+
84+
The result is a flat capture list:
85+
86+
```typescript
87+
query.matches(tree.rootNode);
88+
// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
89+
```
90+
91+
The assembly layer is up to you:
92+
93+
```typescript
94+
const name = match.captures.find((c) => c.name === "name")?.node;
95+
const body = match.captures.find((c) => c.name === "body")?.node;
96+
if (!name || !body) throw new Error("Missing capture");
97+
return { name: name.text, body };
98+
```
99+
100+
This means string-based lookup, null checks, and manual type definitions kept in sync by convention.
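
To make that cost concrete, here is a self-contained sketch of the hand-written assembly layer. The `NodeLike`, `Capture`, and `Match` shapes are hypothetical stand-ins modeled on the flat capture list above, not the real Tree-sitter binding types, and `requireCapture` and `toFunctionInfo` are illustrative helpers:

```typescript
// Hypothetical minimal shapes modeled on the flat capture list above;
// stand-ins for illustration, not the real Tree-sitter binding types.
interface NodeLike {
  text: string;
}
interface Capture {
  name: string;
  node: NodeLike;
}
interface Match {
  captures: Capture[];
}

// The output type is declared by hand and kept in sync with the query
// only by convention.
interface FunctionInfo {
  name: string;
  body: NodeLike;
}

// String-based lookup: a typo in `name` is only caught at runtime.
function requireCapture(match: Match, name: string): NodeLike {
  const found = match.captures.find((c) => c.name === name);
  if (!found) throw new Error(`Missing capture: ${name}`);
  return found.node;
}

function toFunctionInfo(match: Match): FunctionInfo {
  return {
    name: requireCapture(match, "name").text,
    body: requireCapture(match, "body"),
  };
}
```

Nothing ties the strings `"name"` and `"body"` to the query text, so renaming a capture in the query compiles cleanly and fails only when a match is processed.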
101+
102+
Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.
103+
104+
## Why Plotnik?
105+
106+
| Hand-written extraction | Plotnik |
107+
| -------------------------- | ---------------------------- |
108+
| Manual navigation | Declarative pattern matching |
109+
| Runtime type errors | Compile-time type inference |
110+
| Repetitive extraction code | Single-query extraction |
111+
| Ad-hoc data structures | Generated structs/interfaces |
112+
113+
Plotnik extends Tree-sitter's query syntax with:
114+
115+
- **Named expressions** for composition and reuse
116+
- **Recursion** for arbitrarily nested structures
117+
- **Type annotations** for precise output shapes
118+
- **Tagged alternations** for discriminated unions
119+
120+
## Use cases
121+
122+
- **Scripting:** Count patterns, extract metrics, audit dependencies
123+
- **Custom linters:** Encode your business rules and architecture constraints
124+
- **LLM pipelines:** Extract signatures and types as structured data for RAG
125+
- **Code intelligence:** Outline views, navigation, symbol extraction across grammars
126+
127+
## Language design
128+
129+
Plotnik builds on Tree-sitter's query syntax, extending it with the features needed for typed extraction:
130+
131+
```clojure
132+
Statement = [
133+
Assign: (assignment_expression
134+
left: (identifier) @target :: string
135+
right: (Expression) @value)
136+
Call: (call_expression
137+
function: (identifier) @func :: string
138+
arguments: (arguments (Expression)* @args))
139+
]
140+
141+
Expression = [
142+
Ident: (identifier) @name :: string
143+
Num: (number) @value :: string
144+
]
145+
146+
TopDefinitions = (program (Statement)+ @statements)
147+
```
148+
149+
This produces:
150+
151+
```typescript
152+
type Statement =
153+
| { tag: "Assign"; target: string; value: Expression }
154+
| { tag: "Call"; func: string; args: Expression[] };
155+
156+
type Expression =
157+
| { tag: "Ident"; name: string }
158+
| { tag: "Num"; value: string };
159+
160+
type TopDefinitions = {
161+
statements: [Statement, ...Statement[]];
162+
};
163+
```
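
The quantifiers show up in the types: `(Statement)+` becomes the non-empty tuple `[Statement, ...Statement[]]`, while `(Expression)*` becomes a plain `Expression[]`. A small sketch of what that difference buys a consumer, where `NonEmpty`, `firstOf`, and `firstOrNone` are hypothetical names used only for illustration:

```typescript
// Hypothetical alias for the shape the `+` quantifier produces.
type NonEmpty<T> = [T, ...T[]];

// `+` guarantees at least one element, so index 0 is always present
// and the return type needs no `undefined`.
function firstOf<T>(items: NonEmpty<T>): T {
  return items[0];
}

// `*` may match nothing, so the first element can be absent and the
// caller has to handle that case.
function firstOrNone<T>(items: T[]): T | undefined {
  return items.length > 0 ? items[0] : undefined;
}
```

Passing an empty array literal to `firstOf` is rejected by the type checker, which is the kind of guarantee the query's `+` is meant to carry into the result type.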
164+
165+
Then process the results:
166+
167+
```typescript
168+
for (const stmt of result.statements) {
169+
switch (stmt.tag) {
170+
case "Assign":
171+
console.log(`Assignment to ${stmt.target}`);
172+
break;
173+
case "Call":
174+
console.log(`Call to ${stmt.func} with ${stmt.args.length} args`);
175+
break;
176+
}
177+
}
178+
```
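
Because `tag` is a literal discriminant, the TypeScript compiler can also check that every variant is handled. A minimal sketch, restating the generated types above so that it stands alone; `assertNever` and `describeStatement` are hypothetical helpers, not part of Plotnik:

```typescript
// Restated from the generated types above so the sketch is self-contained.
type Expression =
  | { tag: "Ident"; name: string }
  | { tag: "Num"; value: string };

type Statement =
  | { tag: "Assign"; target: string; value: Expression }
  | { tag: "Call"; func: string; args: Expression[] };

// Hypothetical helper: only reachable if a variant was left unhandled.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeStatement(stmt: Statement): string {
  switch (stmt.tag) {
    case "Assign":
      return `Assignment to ${stmt.target}`;
    case "Call":
      return `Call to ${stmt.func} with ${stmt.args.length} args`;
    default:
      // If a new variant is added to Statement, `stmt` is no longer
      // `never` here and this line stops compiling.
      return assertNever(stmt);
  }
}
```

With this pattern, adding a variant to the alternation in the query turns a forgotten `case` into a compile error rather than a silent fall-through.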
179+
180+
For the detailed specification, see the [Language Reference](docs/REFERENCE.md).
181+
182+
## Supported languages
183+
184+
Plotnik ships with schema support for 26 languages:
185+
186+
> Bash, C, C++, C#, CSS, Elixir, Go, Haskell, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, Nix, PHP, Python, Ruby, Rust, Scala, Solidity, Swift, TypeScript, TSX, YAML
187+
188+
Additional languages and dynamic loading are planned.
189+
190+
## Roadmap
191+
192+
### Ignition: the parser ✓
193+
194+
The foundation is complete: a resilient parser that recovers from errors and keeps going.
195+
196+
- [x] Resilient lexer ([`logos`](https://github.com/maciejhirsz/logos)) and parser ([`rowan`](https://github.com/rust-analyzer/rowan)) with error recovery
197+
- [x] Typed AST layer over concrete syntax tree
198+
- [x] Rich diagnostics with spans, colored output, related locations, and suggested fixes
199+
- [x] Name resolution with two-pass symbol table construction
200+
- [x] Recursion validation via Tarjan SCC analysis (escape path detection)
201+
- [x] Shape cardinality inference (One vs Many) for field constraint validation
202+
- [x] Alternation validation (mixed tagged/untagged detection)
203+
- [ ] Semantic validation: capture naming rules, type annotation consistency
204+
205+
### Liftoff: type inference
206+
207+
The schema infrastructure is built. Type inference is next.
208+
209+
- [x] `node-types.json` parsing and schema representation (`plotnik-core`)
210+
- [x] Proc macro for compile-time schema embedding (`plotnik-macros`)
211+
- [x] 26 languages bundled with static node type tables (`plotnik-langs`)
212+
- [ ] Query validation against language schemas (node types, fields, children)
213+
- [ ] Full type inference: query → output shape → generated structs
214+
215+
### Acceleration: query engine
25216

26-
## Roadmap 🚀
217+
- [ ] Thompson NFA construction for query IR
218+
- [ ] Runtime execution with backtracking cursor walker
219+
- [ ] Advanced validation powered by `grammar.json` (production rules, precedence)
220+
- [ ] Match result API with typed accessors
27221

28-
**Ignition** _(the parser)_
222+
### Orbit: developer experience
29223

30-
- [x] Resuilient query language parser
31-
- [x] Basic error messages
32-
- [x] Name resolution
33-
- [x] Recursion validator
34-
- [ ] Semantic analyzer
224+
The CLI foundation exists. The full developer experience is ahead.
35225

36-
**Liftoff** _(type inference)_
226+
- [x] CLI framework with `debug`, `docs`, `langs` commands
227+
- [x] Query inspection: AST dump, symbol table, cardinalities, spans
228+
- [x] Source inspection: Tree-sitter parse tree visualization
229+
- [ ] CLI distribution: Homebrew, cargo-binstall, npm wrapper
230+
- [ ] Compiled queries via Rust proc macros (zero-cost: query → native code)
231+
- [ ] Language bindings: TypeScript (WASM), Python, Ruby
232+
- [ ] LSP server: diagnostics, completions, hover, go-to-definition
233+
- [ ] Editor extensions: VS Code, Zed, Neovim
37234

38-
- [ ] Basic validation against `node-types.json` schemas
39-
- [ ] Type inference of the query result shape
235+
## Acknowledgments
40236

41-
**Acceleration** _(query engine)_
237+
[Max Brunsfeld](https://github.com/maxbrunsfeld) created Tree-sitter; [Amaan Qureshi](https://github.com/amaanq) and other contributors maintain the parser ecosystem that makes this project possible.
42238

43-
- [ ] Thompson construction of query IR
44-
- [ ] Runtime execution engine
45-
- [ ] Advanced validation powered by `grammar.json` files
239+
## License
46240

47-
**Orbit** _(the tooling)_
241+
This project is licensed under the [MIT license].
48242

49-
- [ ] The CLI app available via installers
50-
- [ ] Compiled queries (using procedural macros)
51-
- [ ] Enhanced error messages
52-
- [ ] Bindings (TypeScript, Python, Ruby)
53-
- [ ] LSP server
54-
- [ ] Editor support (VSCode, Zed, Neovim)
243+
[MIT license]: LICENSE.md
