The core processing engine for CodeCartographer. Handles parsing, graph construction, and reference resolution.
```text
cc-core/
├── model/      # Data structures (nodes, edges, graphs)
├── parser/     # Tree-sitter code extraction
├── repo/       # Repository scanning and cloning
└── resolver/   # Reference resolution (imports, calls, types)
```
`NodeId` is a stable identifier for graph nodes, built from relative paths and block locations:

```rust
NodeId::directory("src")                      // Directory node
NodeId::file("src/main.rs")                   // File node
NodeId::code_block("src/main.rs", "main", 1)  // "src/main.rs::main@1"
```

Three node variants represent the code hierarchy:
| Variant | Fields |
|---|---|
| Directory | id, name, path, children |
| File | id, name, path, language, children |
| CodeBlock | id, name, kind, span, signature, visibility, parent, children |
Types of code blocks extracted:
| Kind | Description |
|---|---|
| Function | Function/method definition |
| Class | Class definition |
| Struct | Struct definition |
| Enum | Enum definition |
| Trait | Trait definition |
| Interface | Interface definition |
| Impl | Impl block |
| Module | Module definition |
| Constant | Const/static item |
| TypeAlias | Type alias |
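A minimal sketch of how `NodeId`, the three node variants, and the block kinds could fit together. It assumes `NodeId` wraps a single string key, that the third `code_block` argument is the block's starting line, and that `Span`, `Visibility`, `Language`, and the `id()` accessor look as shown; these are illustrative shapes, not the crate's actual definitions.

```rust
use std::path::PathBuf;

/// Hypothetical: a NodeId backed by one string key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeId(String);

impl NodeId {
    pub fn directory(path: &str) -> Self {
        NodeId(path.to_string())
    }
    pub fn file(path: &str) -> Self {
        NodeId(path.to_string())
    }
    /// Assumption: `line` is the block's starting line.
    pub fn code_block(file: &str, name: &str, line: usize) -> Self {
        NodeId(format!("{file}::{name}@{line}"))
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CodeBlockKind {
    Function, Class, Struct, Enum, Trait, Interface, Impl, Module, Constant, TypeAlias,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language { Python, TypeScript, JavaScript, Rust }

// Hypothetical span and visibility representations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span { pub start_line: usize, pub end_line: usize }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility { Public, Private }

#[derive(Debug, Clone)]
pub enum CodeNode {
    Directory { id: NodeId, name: String, path: PathBuf, children: Vec<NodeId> },
    File { id: NodeId, name: String, path: PathBuf, language: Option<Language>, children: Vec<NodeId> },
    CodeBlock {
        id: NodeId,
        name: String,
        kind: CodeBlockKind,
        span: Span,
        signature: Option<String>,
        visibility: Visibility,
        parent: NodeId,
        children: Vec<NodeId>,
    },
}

impl CodeNode {
    /// Every variant carries an id, so one accessor covers all three.
    pub fn id(&self) -> &NodeId {
        match self {
            CodeNode::Directory { id, .. } | CodeNode::File { id, .. } | CodeNode::CodeBlock { id, .. } => id,
        }
    }
}
```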
Relationship types between nodes:
| Kind | Color | Description |
|---|---|---|
| Import | #6366f1 | Module import |
| FunctionCall | #22c55e | Function invocation |
| MethodCall | #14b8a6 | Method invocation |
| TypeReference | #f59e0b | Type usage |
| Inheritance | #ef4444 | Class inheritance |
| TraitImpl | #a855f7 | Trait implementation |
| VariableUsage | #64748b | Variable reference |
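One way to attach the colors in this table to an edge-kind enum, so a renderer can look them up directly. The `color()` method and the `ALL` constant are hypothetical names used only for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Import, FunctionCall, MethodCall, TypeReference, Inheritance, TraitImpl, VariableUsage,
}

impl EdgeKind {
    /// Every kind, in table order (hypothetical helper for legends and filters).
    pub const ALL: [EdgeKind; 7] = [
        EdgeKind::Import,
        EdgeKind::FunctionCall,
        EdgeKind::MethodCall,
        EdgeKind::TypeReference,
        EdgeKind::Inheritance,
        EdgeKind::TraitImpl,
        EdgeKind::VariableUsage,
    ];

    /// Hex color for this edge kind, taken from the table above.
    pub fn color(self) -> &'static str {
        match self {
            EdgeKind::Import => "#6366f1",
            EdgeKind::FunctionCall => "#22c55e",
            EdgeKind::MethodCall => "#14b8a6",
            EdgeKind::TypeReference => "#f59e0b",
            EdgeKind::Inheritance => "#ef4444",
            EdgeKind::TraitImpl => "#a855f7",
            EdgeKind::VariableUsage => "#64748b",
        }
    }
}
```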
Complete graph representation:
```rust
use std::collections::HashMap;

pub struct CodeGraph {
    pub nodes: HashMap<NodeId, CodeNode>,
    pub edges: Vec<CodeEdge>,
    pub root: NodeId,
    pub forward_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,      // Skipped during serialization
    pub reverse_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,      // Skipped during serialization
    pub edge_dedup: HashMap<(NodeId, NodeId, EdgeKind), usize>,  // Skipped during serialization
}
```

The `edge_dedup` field provides expected O(1) edge deduplication. When `add_edge()` is called with an edge whose `(source, target, kind)` triple already exists, the existing edge's weight is incremented instead of a duplicate entry being created.
Methods: add_node(), add_edge(), node(), node_count(), edge_count(), rebuild_adjacency()
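The deduplication can be sketched with simplified types. Assumptions for this sketch: `NodeId` wraps a `String`, each adjacency entry pairs the neighbor with an index into `edges`, and `add_edge` takes the triple directly rather than a `CodeEdge`. Here `rebuild_adjacency` recomputes the three non-serialized maps from `edges`, which is what a freshly deserialized graph would need.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeId(pub String);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind { Import, FunctionCall, MethodCall, TypeReference }

#[derive(Debug, Clone)]
pub struct CodeEdge {
    pub source: NodeId,
    pub target: NodeId,
    pub kind: EdgeKind,
    pub weight: usize,
}

#[derive(Default)]
pub struct Graph {
    pub edges: Vec<CodeEdge>,
    // Assumption: (neighbor, index into `edges`).
    pub forward_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,
    pub reverse_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,
    // (source, target, kind) -> index into `edges`.
    pub edge_dedup: HashMap<(NodeId, NodeId, EdgeKind), usize>,
}

impl Graph {
    pub fn add_edge(&mut self, source: NodeId, target: NodeId, kind: EdgeKind) {
        let key = (source.clone(), target.clone(), kind);
        if let Some(&idx) = self.edge_dedup.get(&key) {
            // Same triple seen before: bump the weight, store nothing new.
            self.edges[idx].weight += 1;
            return;
        }
        let idx = self.edges.len();
        self.forward_adj.entry(source.clone()).or_default().push((target.clone(), idx));
        self.reverse_adj.entry(target.clone()).or_default().push((source.clone(), idx));
        self.edges.push(CodeEdge { source, target, kind, weight: 1 });
        self.edge_dedup.insert(key, idx);
    }

    /// Recompute the derived maps from `edges` alone.
    pub fn rebuild_adjacency(&mut self) {
        self.forward_adj.clear();
        self.reverse_adj.clear();
        self.edge_dedup.clear();
        for (idx, e) in self.edges.iter().enumerate() {
            self.forward_adj.entry(e.source.clone()).or_default().push((e.target.clone(), idx));
            self.reverse_adj.entry(e.target.clone()).or_default().push((e.source.clone(), idx));
            self.edge_dedup.insert((e.source.clone(), e.target.clone(), e.kind), idx);
        }
    }
}
```

Keying the map by the full triple means two different kinds between the same pair of nodes (say, an import and a call) stay as separate edges.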
Filtered view for rendering:
```rust
pub struct SubGraph {
    pub nodes: Vec<CodeNode>,
    pub edges: Vec<CodeEdge>,
    pub aggregated_edges: Vec<AggregatedEdge>,
}
```

Main entry point for code analysis:
```rust
Extractor::extract_file(
    file_path: &str,
    source: &str,
    language: &Language,
) -> (Vec<CodeNode>, Vec<RawReference>)
```

Uses Tree-sitter to parse `source` and extract:
- Code block nodes (functions, classes, etc.)
- Raw references (imports, calls, type uses)
Python
- `function_definition` → Function
- `class_definition` → Class
- Private if the name starts with `_`

TypeScript/JavaScript
- `function_declaration`, `arrow_function` → Function
- `class_declaration` → Class
- `interface_declaration` → Interface
- `type_alias_declaration` → TypeAlias
- `enum_declaration` → Enum

Rust
- `function_item` → Function
- `struct_item` → Struct
- `enum_item` → Enum
- `trait_item` → Trait
- `impl_item` → Impl
- `mod_item` → Module
- Visibility from the presence of a `visibility_modifier`
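The mappings above amount to a lookup from (language, Tree-sitter node type) to a block kind. A sketch with a hypothetical `block_kind` function and the Python underscore rule as a hypothetical `python_is_private`. It works on node-type strings so it runs without the parser crates, and it mirrors the TypeScript/JavaScript grouping above (the TypeScript-only node types simply never appear in JavaScript trees).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language { Python, TypeScript, JavaScript, Rust }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeBlockKind { Function, Class, Struct, Enum, Trait, Interface, Impl, Module, TypeAlias }

/// Map a Tree-sitter node type to a code block kind; `None` means "not a block".
pub fn block_kind(lang: Language, node_type: &str) -> Option<CodeBlockKind> {
    use Language::{JavaScript as Js, Python as Py, Rust as Rs, TypeScript as Ts};
    let kind = match (lang, node_type) {
        (Py, "function_definition") => CodeBlockKind::Function,
        (Py, "class_definition") => CodeBlockKind::Class,
        (Ts | Js, "function_declaration" | "arrow_function") => CodeBlockKind::Function,
        (Ts | Js, "class_declaration") => CodeBlockKind::Class,
        (Ts | Js, "interface_declaration") => CodeBlockKind::Interface,
        (Ts | Js, "type_alias_declaration") => CodeBlockKind::TypeAlias,
        (Ts | Js, "enum_declaration") => CodeBlockKind::Enum,
        (Rs, "function_item") => CodeBlockKind::Function,
        (Rs, "struct_item") => CodeBlockKind::Struct,
        (Rs, "enum_item") => CodeBlockKind::Enum,
        (Rs, "trait_item") => CodeBlockKind::Trait,
        (Rs, "impl_item") => CodeBlockKind::Impl,
        (Rs, "mod_item") => CodeBlockKind::Module,
        _ => return None,
    };
    Some(kind)
}

/// Python convention: a leading underscore marks a name as private.
pub fn python_is_private(name: &str) -> bool {
    name.starts_with('_')
}
```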
An unresolved reference recorded during parsing:

```rust
pub struct RawReference {
    pub from_node: NodeId,
    pub kind: RawRefKind,
    pub name: String,
    pub span: Span,
}
```

Reference kinds extracted during parsing:
| Kind | Python | TypeScript/JS | Rust |
|---|---|---|---|
| Import | `import`, `from...import` | `import...from` | `use` declarations |
| FunctionCall | `call` with `identifier` | `call_expression` with `identifier` | `call_expression` with `identifier` or `scoped_identifier` |
| MethodCall | `call` with `attribute` (e.g. `obj.method()`) | `call_expression` with `member_expression` | `call_expression` with `field_expression` |
| TypeReference | type annotation nodes (non-builtin) | `type_identifier` nodes (not `predefined_type`) | `type_identifier` nodes (not primitive) |
| Inheritance | `argument_list` in `class_definition` | `extends_clause` in class | N/A |
| TraitImpl | N/A | N/A | `impl_item` with `trait` field |
| VariableUsage | Not yet extracted (requires name resolution) | Not yet extracted | Not yet extracted |
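The FunctionCall and MethodCall rows reduce to classifying a call node by the node type of its callee. A sketch with a hypothetical `classify_call` helper; it takes node-type strings rather than real Tree-sitter nodes so it runs without the parser crates.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language { Python, TypeScript, JavaScript, Rust }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RawRefKind { FunctionCall, MethodCall }

/// `call_node`: the call's node type. `callee_node`: the node type of its function child.
pub fn classify_call(lang: Language, call_node: &str, callee_node: &str) -> Option<RawRefKind> {
    use Language::{JavaScript as Js, Python as Py, Rust as Rs, TypeScript as Ts};
    match (lang, call_node, callee_node) {
        (Py, "call", "identifier") => Some(RawRefKind::FunctionCall),
        (Py, "call", "attribute") => Some(RawRefKind::MethodCall),
        (Ts | Js, "call_expression", "identifier") => Some(RawRefKind::FunctionCall),
        (Ts | Js, "call_expression", "member_expression") => Some(RawRefKind::MethodCall),
        (Rs, "call_expression", "identifier" | "scoped_identifier") => Some(RawRefKind::FunctionCall),
        (Rs, "call_expression", "field_expression") => Some(RawRefKind::MethodCall),
        // Any other callee shape (e.g. a call on a call result) is not recorded.
        _ => None,
    }
}
```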
```rust
RepoScanner::scan(root: &Path) -> CodeGraph
```

Walks the directory tree, respecting `.gitignore`:
- Creates Directory nodes for folders
- Creates File nodes with detected language
- Builds parent-child relationships
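A std-only sketch of the walk and language detection, using hypothetical `walk` and `detect_language` helpers. Extension-to-language detection is assumed from the extensions listed for import resolution. The sketch skips hidden entries as a crude stand-in for `.gitignore` handling; the `ignore` crate in the dependency list is the proper tool for that.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language { Python, TypeScript, JavaScript, Rust }

/// Assumed mapping from file extension to language.
pub fn detect_language(path: &Path) -> Option<Language> {
    match path.extension()?.to_str()? {
        "py" => Some(Language::Python),
        "ts" | "tsx" => Some(Language::TypeScript),
        "js" => Some(Language::JavaScript),
        "rs" => Some(Language::Rust),
        _ => None,
    }
}

/// Collect file paths relative to `root`, skipping hidden entries such as `.git`.
pub fn walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_name().to_string_lossy().starts_with('.') {
                continue;
            }
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                stack.push(path);
            } else if let Ok(rel) = path.strip_prefix(root) {
                out.push(rel.to_path_buf());
            }
        }
    }
    out.sort(); // deterministic order regardless of read_dir ordering
    Ok(out)
}
```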
```rust
clone_repo(url: &str, target_dir: &Path) -> PathBuf
```

Shallow-clones GitHub/GitLab repositories (`--depth 1`).
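A sketch of a shallow clone that shells out to the `git` CLI via `std::process::Command`. Command construction is split into a hypothetical `clone_command` so it can be inspected without network access, and this version returns `io::Result<PathBuf>` instead of the bare `PathBuf` in the signature above; how the real function reports failure is not stated here.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build `git clone --depth 1 <url> <target_dir>` without running it.
pub fn clone_command(url: &str, target_dir: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone").arg("--depth").arg("1").arg(url).arg(target_dir);
    cmd
}

/// Hypothetical fallible variant: run the clone and return the checkout path.
pub fn clone_repo(url: &str, target_dir: &Path) -> io::Result<PathBuf> {
    let status = clone_command(url, target_dir).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, format!("git clone failed: {status}")));
    }
    Ok(target_dir.to_path_buf())
}
```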
Maps symbol names to node IDs:
```rust
SymbolTable::build_from_graph(graph) -> SymbolTable
SymbolTable::resolve_references(refs) -> Vec<CodeEdge>
```

Stores both simple names (`foo`) and qualified names (`path/file.rs::foo`).
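A minimal sketch of the dual-key idea: each block is registered under its simple name and its file-qualified name, so a lookup by `foo` may return several candidates while `path/file.rs::foo` narrows to one. The `insert`/`lookup` methods and the string node IDs are assumptions for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Default)]
pub struct SymbolTable {
    // name -> node ids; a simple name may be defined in many files
    by_name: HashMap<String, Vec<String>>,
}

impl SymbolTable {
    /// Register a block under both `name` and `file_path::name`.
    pub fn insert(&mut self, file_path: &str, name: &str, node_id: &str) {
        self.by_name.entry(name.to_string()).or_default().push(node_id.to_string());
        self.by_name
            .entry(format!("{file_path}::{name}"))
            .or_default()
            .push(node_id.to_string());
    }

    /// All node ids registered under `name` (empty if none).
    pub fn lookup(&self, name: &str) -> &[String] {
        self.by_name.get(name).map(Vec::as_slice).unwrap_or(&[])
    }
}
```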
`resolve_references` applies name normalization before symbol lookup:
- FunctionCall/MethodCall: strips the method receiver (`foo.bar()` -> `bar`) and module path (`module::func` -> `func`)
- TypeReference/Inheritance/TraitImpl: strips generic parameters (`Foo<Bar>` -> `Foo`) and path prefix (`std::vec::Vec` -> `Vec`)
- Falls back to the original qualified name for type references if the simplified name doesn't match
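The normalization rules can be sketched as plain string functions. `normalize_call`, `normalize_type`, and `resolve_type` are hypothetical names, and a `HashMap<String, String>` stands in for the symbol table.

```rust
use std::collections::HashMap;

/// Calls: drop a trailing `()`, the receiver (`foo.bar` -> `bar`),
/// and the module path (`module::func` -> `func`).
pub fn normalize_call(name: &str) -> &str {
    let name = name.trim_end_matches("()");
    let name = name.rsplit('.').next().unwrap_or(name);
    name.rsplit("::").next().unwrap_or(name)
}

/// Types: drop generic parameters first (`Foo<Bar>` -> `Foo`),
/// then the path prefix (`std::vec::Vec` -> `Vec`).
pub fn normalize_type(name: &str) -> &str {
    let base = name.split('<').next().unwrap_or(name).trim();
    base.rsplit("::").next().unwrap_or(base)
}

/// Try the simplified type name, then fall back to the original name.
pub fn resolve_type<'a>(known: &'a HashMap<String, String>, name: &str) -> Option<&'a String> {
    known.get(normalize_type(name)).or_else(|| known.get(name))
}
```

Stripping generics before the path prefix matters: `Vec<std::string::String>` would otherwise be cut at the last `::` inside the type arguments.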
Resolves import statements to file-level edges:
- Handles relative imports (`./`, `../`)
- Python dotted imports (`foo.bar.baz`)
- Rust crate imports (`crate::module::item`)
- Tries multiple extensions (`.ts`, `.tsx`, `.js`, `.py`, `.rs`)
- Creates file-to-file Import edges (separate from the symbol-level edges from `SymbolTable`)
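A sketch of candidate-path generation for the three import styles, picking the first candidate that exists among the scanned files. The Rust rule (try `src/a/b.rs`, then `src/a/b/mod.rs`, dropping trailing segments, since the last segment may name an item rather than a module) and the Python rule (`foo/bar/baz.py` or `foo/bar/baz/__init__.py`, relative to the repository root) are assumptions; `candidates` and `resolve_import` are hypothetical names.

```rust
use std::collections::HashSet;

const EXTENSIONS: [&str; 5] = [".ts", ".tsx", ".js", ".py", ".rs"];

/// Resolve `.` and `..` segments against the importing file's directory.
fn join_relative(from_file: &str, spec: &str) -> String {
    let mut parts: Vec<&str> = from_file.split('/').collect();
    parts.pop(); // drop the importing file's own name
    for seg in spec.split('/') {
        match seg {
            "." | "" => {}
            ".." => {
                parts.pop();
            }
            s => parts.push(s),
        }
    }
    parts.join("/")
}

/// Candidate file paths for one import specifier, most specific first.
pub fn candidates(from_file: &str, spec: &str) -> Vec<String> {
    if spec.starts_with("./") || spec.starts_with("../") {
        let base = join_relative(from_file, spec);
        let mut out = vec![base.clone()]; // spec may already carry an extension
        out.extend(EXTENSIONS.iter().map(|ext| format!("{base}{ext}")));
        out
    } else if let Some(rest) = spec.strip_prefix("crate::") {
        let segs: Vec<&str> = rest.split("::").collect();
        let out: Vec<String> = (1..=segs.len())
            .rev()
            .flat_map(|n| {
                let p = segs[..n].join("/");
                [format!("src/{p}.rs"), format!("src/{p}/mod.rs")]
            })
            .collect();
        out
    } else {
        // Treat anything else as a Python dotted import.
        let p = spec.replace('.', "/");
        vec![format!("{p}.py"), format!("{p}/__init__.py")]
    }
}

/// First candidate that is a known file, if any.
pub fn resolve_import(from_file: &str, spec: &str, files: &HashSet<String>) -> Option<String> {
    candidates(from_file, spec).into_iter().find(|c| files.contains(c))
}
```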
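```rust
// 1. Scan repository
let mut graph = RepoScanner::scan(&path)?;
// 2. Parse each file, collecting code blocks and raw references
let mut all_blocks = Vec::new();
let mut all_refs = Vec::new();
for file in graph.files() {
    let source = std::fs::read_to_string(&file.path)?;
    let (blocks, refs) = Extractor::extract_file(&file.path, &source, &file.language);
    all_blocks.extend(blocks);
    all_refs.extend(refs);
}
// Add blocks after the loop so `graph` is not mutated while `files()` borrows it
for block in all_blocks {
    graph.add_node(block);
}
// 3. Build symbol table
let symbols = SymbolTable::build_from_graph(&graph);
// 4. Resolve symbol references (with name normalization)
let edges = symbols.resolve_references(&all_refs);
for edge in edges {
    graph.add_edge(edge);
}
// 5. Resolve file-level import edges
let import_edges = ImportResolver::resolve(&graph, &all_refs);
for edge in import_edges {
    graph.add_edge(edge);
}
// 6. Filter for rendering (`visible_ids` and `edge_kinds` are supplied by the caller)
let subgraph = SubGraph::from_graph(&graph, &visible_ids, &edge_kinds);
```

| Crate | Purpose |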
|---|---|
| tree-sitter | AST parsing framework |
| tree-sitter-python | Python parser |
| tree-sitter-typescript | TypeScript parser |
| tree-sitter-javascript | JavaScript parser |
| tree-sitter-rust | Rust parser |
| serde | Serialization |
| rayon | Parallel processing |
| ignore | Gitignore-aware walking |
| anyhow/thiserror | Error handling |