315 changes: 299 additions & 16 deletions gems/decomplex/lib/decomplex/syntax.rb

Large diffs are not rendered by default.

48 changes: 46 additions & 2 deletions gems/decomplex/test/syntax_test.rb
@@ -245,14 +245,16 @@ def test_tree_sitter_python_adapter_extracts_hidden_assignment_and_call_facts

     with_file(<<~PY, ".py") do |path|
       class Worker:
-          def __init__(self, items):
-              self.items = items
+          def __init__(self, items: list[str]):
+              self.items: list[str] = items

           def call(self):
               self.items.append("x")
     PY
       doc = Decomplex::Syntax.parse(path, parser: "tree_sitter", language: :python)

+      assert_includes doc.state_declarations.map { |decl| [decl.owner, decl.field, decl.type] },
+                      ["Worker", "items", "list[str]"]
       assert_includes doc.state_writes.map { |write| [write.receiver, write.field] }, ["self", "items"]
       assert_includes doc.state_param_origins.map { |origin| [origin.owner, origin.function, origin.receiver, origin.field, origin.param] },
                       ["Worker", "__init__", "self", "items", "items"]
@@ -261,6 +263,48 @@ def call(self):
     end
   end

+  def test_tree_sitter_lua_adapter_extracts_structural_facts_when_grammar_is_available
+    grammar = ENV["DECOMPLEX_TS_LUA_PATH"]
+    skip "set DECOMPLEX_TS_LUA_PATH to run Lua structural facts test" unless grammar && File.file?(grammar)
+
+    with_file(<<~LUA, ".lua") do |path|
+      local Worker = {}
+
+      function Worker.new(items)
+        local self = { items = items, count = 0 }
+        return self
+      end
+
+      function Worker:call(value)
+        self.items[#self.items + 1] = value
+        self.client:fetch(value)
+        return { value = value, ok = true }
+      end
+
+      Worker.run = function(self, job)
+        self.status = job
+        self:call(job)
+      end
+    LUA
+      doc = Decomplex::Syntax.parse(path, parser: "tree_sitter", language: :lua)
+
+      assert_equal :lua, doc.language
+      assert_includes doc.function_defs.map { |fn| [fn.owner, fn.name] }, ["Worker", "new"]
+      assert_includes doc.function_defs.map { |fn| [fn.owner, fn.name] }, ["Worker", "call"]
+      assert_includes doc.function_defs.map { |fn| [fn.owner, fn.name] }, ["Worker", "run"]
+      assert_includes doc.state_writes.map { |write| [write.owner, write.function, write.receiver, write.field] },
+                      ["Worker", "call", "self", "items"]
+      assert_includes doc.state_writes.map { |write| [write.owner, write.function, write.receiver, write.field] },
+                      ["Worker", "run", "self", "status"]
+      assert_includes doc.state_param_origins.map { |origin| [origin.owner, origin.function, origin.receiver, origin.field, origin.param] },
+                      ["Worker", "call", "self", "items", "value"]
+      assert_includes doc.state_param_origins.map { |origin| [origin.owner, origin.function, origin.receiver, origin.field, origin.param] },
+                      ["Worker", "run", "self", "status", "job"]
+      assert_includes doc.call_sites.map { |call| [call.owner, call.function, call.receiver, call.message] },
+                      ["Worker", "call", "self.client", "fetch"]
+    end
+  end
+
   def test_tree_sitter_zig_adapter_extracts_structural_facts_when_grammar_is_available
     grammar = ENV["DECOMPLEX_TS_ZIG_PATH"]
     skip "set DECOMPLEX_TS_ZIG_PATH to run Zig structural facts test" unless grammar && File.file?(grammar)
77 changes: 77 additions & 0 deletions gems/lineage/docs/agents/cross-lang-support.md
@@ -0,0 +1,77 @@
# Cross-Language Support Validation

This document tracks the first practical validation pass for building Lineage databases from non-CLEAR repositories and ingesting analyzer, lint, coverage, hazard, and runtime evidence.

`gems/lineage/docs/agents/plugins.md` describes the plugin architecture and broad language targets. It does not prescribe exact repositories, so this pass used representative active OSS projects with enough real code to exercise the adapters.

## Goal

Create one `lineage.db` per target repository, ingest the best available evidence, start a Lineage UI server for each on `0.0.0.0`, and spot-check that the UI can review the project with cross-language data.

## Validation Matrix

| Language | Repository | Local Clone | Database | UI Port | Status |
| --- | --- | --- | --- | --- | --- |
| Python | `https://github.com/Textualize/rich` | `/tmp/lineage-rich` | `/tmp/lineage-rich/lineage.db` | `8081` | Complete |
| TypeScript | `https://github.com/colinhacks/zod` | `/tmp/lineage-zod` | `/tmp/lineage-zod/lineage.db` | `8082` | Complete |
| Go | `https://github.com/junegunn/fzf` | `/tmp/lineage-fzf` | `/tmp/lineage-fzf/lineage.db` | `8083` | Complete |
| Lua | `https://github.com/luarocks/luarocks` | `/tmp/lineage-lua-luarocks` | `/tmp/lineage-lua-luarocks/lineage.db` | `8084` | Complete, no coverage |
| C | `https://github.com/libuv/libuv` | `/tmp/lineage-c-libuv` | `/tmp/lineage-c-libuv/lineage.db` | `8085` | Complete, no coverage |
| C++ | `https://github.com/fmtlib/fmt` | `/tmp/lineage-cpp-fmt` | `/tmp/lineage-cpp-fmt/lineage.db` | `8086` | Complete, no coverage |
| C# | `https://github.com/serilog/serilog` | `/tmp/lineage-csharp-serilog` | `/tmp/lineage-csharp-serilog/lineage.db` | `8087` | Complete, no coverage |
| Java | `https://github.com/google/gson` | `/tmp/lineage-java-gson` | `/tmp/lineage-java-gson/lineage.db` | `8088` | Complete, no coverage |
| Swift | `https://github.com/apple/swift-argument-parser` | `/tmp/lineage-swift-argument-parser` | `/tmp/lineage-swift-argument-parser/lineage.db` | `8089` | Complete, no coverage |
| Kotlin | `https://github.com/square/okio` | `/tmp/lineage-kotlin-okio` | `/tmp/lineage-kotlin-okio/lineage.db` | `8090` | Complete, no coverage |

All UI servers were restarted in detached sessions and smoke-checked with `curl` on ports `8081` through `8090`.
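
The same smoke check can be sketched in Ruby instead of `curl`. This is a minimal illustration, not part of Lineage: `smoke_check` and its `fetch:` parameter are hypothetical names, the host defaults to `127.0.0.1` as an assumption, and the only thing taken from this document is the port range in the matrix above.

```ruby
require "net/http"
require "uri"

# Ports from the Validation Matrix (one Lineage UI server per repository).
UI_PORTS = (8081..8090).to_a

# Hypothetical helper: returns { port => HTTP status code, or error class name }.
# `fetch` is injectable so the check can be exercised without a live server.
def smoke_check(ports, host: "127.0.0.1", fetch: nil)
  fetch ||= ->(uri) { Net::HTTP.get_response(uri).code.to_i }
  ports.to_h do |port|
    uri = URI("http://#{host}:#{port}/")
    status = begin
      fetch.call(uri)
    rescue StandardError => e
      # A refused connection or timeout is reported by class name, not raised.
      e.class.name
    end
    [port, status]
  end
end
```

Calling `smoke_check(UI_PORTS)` against running servers would give one entry per port. Any value that is not an integer status code points at a server that did not answer.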

## Evidence Targets

Each repository received as much of this evidence as the current tools could produce without repository-specific hacks:

- `lineage build`: Git history, logical units, churn, and ownership.
- Decomplex SARIF: structural complexity findings.
- SlopCop SARIF: coverage gaps and constraint findings.
- Boobytrap SARIF: bug-risk findings derived from churn, complexity, and coverage.
- Nil-kill SARIF: optionality, union, hidden enum, and primitive pressure findings where the language adapter supports them.
- Espalier SARIF: architectural pressure findings where the language adapter supports them.
- Lint SARIF: native lint output converted or emitted as SARIF where the repository already had a reasonable local toolchain.
- Coverage: native coverage output ingested through Lineage-supported formats when the toolchain was available.
- Runtime traces: Sentry-style stack trace ingestion for Python smoke coverage.
- Hazards: Go concurrency hazards for `fzf`.
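
Most of the evidence above arrives as SARIF, so a quick sanity check on an artifact before ingestion is to count its results per tool. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].tool.driver.name` and `runs[].results[]`). The helper name `sarif_finding_counts` is hypothetical, and it is an assumption that a "finding" here corresponds to one `results` entry; Lineage's own counting may differ.

```ruby
require "json"

# Hypothetical helper: count SARIF results per tool driver name across all runs.
def sarif_finding_counts(sarif_json)
  doc = JSON.parse(sarif_json)
  counts = Hash.new(0)
  Array(doc["runs"]).each do |run|
    tool = run.dig("tool", "driver", "name") || "unknown"
    # `results` may be absent or null in a valid SARIF run; treat both as zero.
    counts[tool] += Array(run["results"]).size
  end
  counts
end
```

For example, `sarif_finding_counts(File.read(path))` on a single analyzer artifact gives a per-tool total that can be compared by hand against what the UI reports.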

## Current Counts

| Language | Logical Units | SARIF Artifacts | SARIF Findings | Quality Events | Coverage Line Events | Hazards | Runtime Events |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Python / Rich | 2,152 | 6 | 6,270 | 1,022 | 7,792 | 0 | 1 |
| TypeScript / Zod | 2,437 | 6 | 8,112 | 1,365 | 8,908 | 0 | 0 |
| Go / fzf | 1,421 | 7 | 13,316 | 608 | 16,422 | 312 | 0 |
| Lua / LuaRocks | 1,043 | 6 | 5,056 | 0 | 0 | 0 | 0 |
| C / libuv | 3,920 | 6 | 21,895 | 0 | 0 | 0 | 0 |
| C++ / fmt | 6,014 | 6 | 2,982 | 0 | 0 | 0 | 0 |
| C# / Serilog | 615 | 6 | 1,281 | 0 | 0 | 0 | 0 |
| Java / Gson | 4,921 | 6 | 2,624 | 0 | 0 | 0 | 0 |
| Swift / Argument Parser | 1,938 | 6 | 835 | 0 | 0 | 0 | 0 |
| Kotlin / Okio | 3,357 | 6 | 1,900 | 0 | 0 | 0 | 0 |

## Adapter Work Completed

- Replaced generic language placeholders with explicit Decomplex lexicons for Lua, C, C++, C#, Java, Swift, and Kotlin.
- Added real Tree-sitter syntax support and tests for C, C++, C#, Java, Swift, and Kotlin structural facts.
- Added Swift member access and `switch_entry` support.
- Added Kotlin `when_expression` and `when_entry` support.
- Added grammar candidate support for packages that ship `tree_sitter_*_binding.node`, needed by `tree-sitter-kotlin`.
- Added Go concurrency hazard detection through SlopCop/Lineage.
- Fixed Lineage source extraction and coverage ingestion issues found during TypeScript/Go validation.
- Fixed Nil-kill static-only normalization so non-Ruby languages do not accidentally depend on stale runtime traces.
- Replaced Lineage regex-first logical-unit extraction for Ruby, Python, JavaScript/TypeScript, Go, Rust, Zig, C/C++, and C# with Tree-sitter-backed extraction. The regex heuristic path is now only for secondary experimental languages.

## Environment Gaps

- Lua coverage/lint was limited by missing local LuaRocks/Busted tooling.
- C and C++ coverage was not generated in this pass; static analyzer, syntax lint, and SARIF ingestion were validated.
- C#, Java, Swift, and Kotlin native build/lint/coverage were limited by missing `dotnet`, Java, Swift, and Kotlin toolchains in this environment.
- Runtime tracing for TypeScript and Go is still out of scope for this pass.

These are environment/toolchain gaps, not Lineage ingestion blockers. The DBs and UIs exist for all requested languages.