cc-core Crate

The core processing engine for CodeCartographer. Handles parsing, graph construction, and reference resolution.

Module Structure

cc-core/
├── model/      # Data structures (nodes, edges, graphs)
├── parser/     # Tree-sitter code extraction
├── repo/       # Repository scanning and cloning
└── resolver/   # Reference resolution (imports, calls, types)

Model Module

NodeId

Stable identifier for graph nodes using relative paths and block locations.

NodeId::directory("src")           // Directory node
NodeId::file("src/main.rs")        // File node
NodeId::code_block("src/main.rs", "main", 1)  // "src/main.rs::main@1"
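
The ID format shown in the comment above can be reproduced with a small string-backed type. This is a minimal sketch, not the crate's actual definition: the `String` newtype, the `as_str` helper, and the reading of the third argument as a start line are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for cc-core's `NodeId` (assumed to wrap a string).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeId(String);

impl NodeId {
    /// Directory IDs are assumed to be the repo-relative path.
    pub fn directory(path: &str) -> Self {
        NodeId(path.to_string())
    }

    /// File IDs are assumed to be the repo-relative path.
    pub fn file(path: &str) -> Self {
        NodeId(path.to_string())
    }

    /// Code block IDs follow the documented "file::name@location" shape.
    /// `start_line` is an assumed name for the third argument.
    pub fn code_block(file: &str, name: &str, start_line: usize) -> Self {
        NodeId(format!("{}::{}@{}", file, name, start_line))
    }

    /// Hypothetical accessor for the underlying string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because the ID is derived only from the relative path, the block name, and its location, the same source tree yields the same IDs across runs.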

CodeNode (Enum)

Three node variants representing the code hierarchy:

Directory

  • id, name, path, children

File

  • id, name, path, language, children

CodeBlock

  • id, name, kind, span, signature, visibility, parent, children

BlockKind

Types of code blocks extracted:

| Kind | Description |
|------|-------------|
| Function | Function/method definition |
| Class | Class definition |
| Struct | Struct definition |
| Enum | Enum definition |
| Trait | Trait definition |
| Interface | Interface definition |
| Impl | Impl block |
| Module | Module definition |
| Constant | Const/static item |
| TypeAlias | Type alias |

EdgeKind

Relationship types between nodes:

| Kind | Color | Description |
|------|-------|-------------|
| Import | #6366f1 | Module import |
| FunctionCall | #22c55e | Function invocation |
| MethodCall | #14b8a6 | Method invocation |
| TypeReference | #f59e0b | Type usage |
| Inheritance | #ef4444 | Class inheritance |
| TraitImpl | #a855f7 | Trait implementation |
| VariableUsage | #64748b | Variable reference |

CodeGraph

Complete graph representation:

pub struct CodeGraph {
    pub nodes: HashMap<NodeId, CodeNode>,
    pub edges: Vec<CodeEdge>,
    pub root: NodeId,
    pub forward_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,           // Skip serialization
    pub reverse_adj: HashMap<NodeId, Vec<(NodeId, usize)>>,           // Skip serialization
    pub edge_dedup: HashMap<(NodeId, NodeId, EdgeKind), usize>,       // Skip serialization
}

The edge_dedup field provides O(1) edge deduplication. When add_edge() is called with an edge that matches an existing (source, target, kind) triple, the weight is incremented instead of creating a duplicate entry.

Methods: add_node(), add_edge(), node(), node_count(), edge_count(), rebuild_adjacency()
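
The deduplication behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `MiniGraph` and `Edge` are hypothetical simplified types, node IDs are plain strings, and the `usize` stored in the adjacency lists is assumed to be an index into `edges`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Import,
    FunctionCall,
}

/// Hypothetical simplified edge; the real `CodeEdge` may carry more fields.
#[derive(Debug, Clone)]
pub struct Edge {
    pub source: String,
    pub target: String,
    pub kind: EdgeKind,
    pub weight: u32,
}

/// Hypothetical cut-down graph holding only what deduplication needs.
#[derive(Default)]
pub struct MiniGraph {
    pub edges: Vec<Edge>,
    pub forward_adj: HashMap<String, Vec<(String, usize)>>,
    pub reverse_adj: HashMap<String, Vec<(String, usize)>>,
    edge_dedup: HashMap<(String, String, EdgeKind), usize>,
}

impl MiniGraph {
    /// Adds an edge and returns its index in `edges`.
    /// A repeated (source, target, kind) triple bumps the weight instead.
    pub fn add_edge(&mut self, source: &str, target: &str, kind: EdgeKind) -> usize {
        let key = (source.to_string(), target.to_string(), kind);
        if let Some(&idx) = self.edge_dedup.get(&key) {
            // Already known: one hash lookup, no scan over `edges`.
            self.edges[idx].weight += 1;
            return idx;
        }
        let idx = self.edges.len();
        self.edges.push(Edge {
            source: source.to_string(),
            target: target.to_string(),
            kind,
            weight: 1,
        });
        self.forward_adj
            .entry(source.to_string())
            .or_default()
            .push((target.to_string(), idx));
        self.reverse_adj
            .entry(target.to_string())
            .or_default()
            .push((source.to_string(), idx));
        self.edge_dedup.insert(key, idx);
        idx
    }
}
```

Keying the map on the full triple means two nodes can still be connected by several edges of different kinds, each with its own weight.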

SubGraph

Filtered view for rendering:

pub struct SubGraph {
    pub nodes: Vec<CodeNode>,
    pub edges: Vec<CodeEdge>,
    pub aggregated_edges: Vec<AggregatedEdge>,
}

Parser Module

Extractor

Main entry point for code analysis:

Extractor::extract_file(
    file_path: &str,
    source: &str,
    language: &Language,
) -> (Vec<CodeNode>, Vec<RawReference>)

Uses Tree-sitter to parse source and extract:

  • Code block nodes (functions, classes, etc.)
  • Raw references (imports, calls, type uses)

Language-Specific Classification

Python

  • function_definition → Function
  • class_definition → Class
  • Private if name starts with _

TypeScript/JavaScript

  • function_declaration, arrow_function → Function
  • class_declaration → Class
  • interface_declaration → Interface
  • type_alias_declaration → TypeAlias
  • enum_declaration → Enum

Rust

  • function_item → Function
  • struct_item → Struct
  • enum_item → Enum
  • trait_item → Trait
  • impl_item → Impl
  • mod_item → Module
  • Visibility from visibility_modifier presence
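
The mappings listed above amount to a lookup from a Tree-sitter node kind string to a `BlockKind`. The sketch below shows one way to express that; the `Lang` enum, the `classify` function, and `python_is_private` are hypothetical names, and the enum covers only the kinds that have a mapping in the lists above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockKind {
    Function,
    Class,
    Struct,
    Enum,
    Trait,
    Interface,
    Impl,
    Module,
    TypeAlias,
}

/// Hypothetical language tag; TypeScript and JavaScript share one arm here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Python,
    TypeScript,
    Rust,
}

/// Maps a Tree-sitter node kind to a block kind, or `None` if the node
/// is not one of the extracted block types.
pub fn classify(lang: Lang, node_kind: &str) -> Option<BlockKind> {
    match (lang, node_kind) {
        (Lang::Python, "function_definition") => Some(BlockKind::Function),
        (Lang::Python, "class_definition") => Some(BlockKind::Class),

        (Lang::TypeScript, "function_declaration") => Some(BlockKind::Function),
        (Lang::TypeScript, "arrow_function") => Some(BlockKind::Function),
        (Lang::TypeScript, "class_declaration") => Some(BlockKind::Class),
        (Lang::TypeScript, "interface_declaration") => Some(BlockKind::Interface),
        (Lang::TypeScript, "type_alias_declaration") => Some(BlockKind::TypeAlias),
        (Lang::TypeScript, "enum_declaration") => Some(BlockKind::Enum),

        (Lang::Rust, "function_item") => Some(BlockKind::Function),
        (Lang::Rust, "struct_item") => Some(BlockKind::Struct),
        (Lang::Rust, "enum_item") => Some(BlockKind::Enum),
        (Lang::Rust, "trait_item") => Some(BlockKind::Trait),
        (Lang::Rust, "impl_item") => Some(BlockKind::Impl),
        (Lang::Rust, "mod_item") => Some(BlockKind::Module),

        _ => None,
    }
}

/// Python convention from the list above: a leading underscore means private.
pub fn python_is_private(name: &str) -> bool {
    name.starts_with('_')
}
```

Matching on the `(language, kind)` pair keeps identically named node kinds in different grammars from being confused with each other.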

RawReference

Unresolved reference found during parsing, recorded by name before it is matched to a target node:

pub struct RawReference {
    pub from_node: NodeId,
    pub kind: RawRefKind,
    pub name: String,
    pub span: Span,
}

Reference kinds extracted during parsing:

| Kind | Python | TypeScript/JS | Rust |
|------|--------|---------------|------|
| Import | import, from...import | import...from | use declarations |
| FunctionCall | call with identifier | call_expression with identifier | call_expression with identifier or scoped_identifier |
| MethodCall | call with attribute (e.g. obj.method()) | call_expression with member_expression | call_expression with field_expression |
| TypeReference | type annotation nodes (non-builtin) | type_identifier nodes (not predefined_type) | type_identifier nodes (not primitive) |
| Inheritance | argument_list in class_definition | extends_clause in class | N/A |
| TraitImpl | N/A | N/A | impl_item with trait field |
| VariableUsage | Not yet extracted (requires name resolution) | Not yet extracted | Not yet extracted |

Repository Module

RepoScanner

RepoScanner::scan(root: &Path) -> Result<CodeGraph>

Walks directory tree respecting .gitignore:

  • Creates Directory nodes for folders
  • Creates File nodes with detected language
  • Builds parent-child relationships

clone_repo

clone_repo(url: &str, target_dir: &Path) -> PathBuf

Shallow clones GitHub/GitLab repositories (--depth 1).
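
A shallow clone of this kind can be done by shelling out to the `git` CLI. The sketch below assumes that approach; the crate may use a Git library instead, and `shallow_clone_command` and `clone_repo_sketch` are hypothetical names rather than the crate's API.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Builds `git clone --depth 1 <url> <target_dir>` without running it.
pub fn shallow_clone_command(url: &str, target_dir: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone")
        .arg("--depth")
        .arg("1")
        .arg(url)
        .arg(target_dir);
    cmd
}

/// Runs the shallow clone and returns the checkout path on success.
/// Requires `git` on PATH and network access to the remote.
pub fn clone_repo_sketch(url: &str, target_dir: &Path) -> io::Result<PathBuf> {
    let status = shallow_clone_command(url, target_dir).status()?;
    if status.success() {
        Ok(target_dir.to_path_buf())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("git clone exited with {}", status),
        ))
    }
}
```

`--depth 1` fetches only the latest commit, which is enough for static analysis of the current tree and keeps clone time and disk use low for large repositories.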

Resolver Module

SymbolTable

Maps symbol names to node IDs:

SymbolTable::build_from_graph(graph) -> SymbolTable
SymbolTable::resolve_references(refs) -> Vec<CodeEdge>

Stores both simple names (foo) and qualified names (path/file.rs::foo).

resolve_references applies name normalization before symbol lookup:

  • FunctionCall/MethodCall: strips method receiver (foo.bar() -> bar) and module path (module::func -> func)
  • TypeReference/Inheritance/TraitImpl: strips generic parameters (Foo<Bar> -> Foo) and path prefix (std::vec::Vec -> Vec)
  • Falls back to the original qualified name for type references if the simplified name doesn't match
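
The normalization rules above can be sketched as a pure string function plus a lookup with fallback. This is an illustration only: `RefKind`, `normalize_name`, and `lookup` are hypothetical names, the symbol table is modelled as a plain `HashMap<String, String>`, and the raw name is assumed to arrive without call parentheses.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefKind {
    Import,
    FunctionCall,
    MethodCall,
    TypeReference,
    Inheritance,
    TraitImpl,
}

/// Reduces a raw reference name to the simple name used for symbol lookup.
pub fn normalize_name(kind: RefKind, raw: &str) -> String {
    match kind {
        RefKind::FunctionCall | RefKind::MethodCall => {
            // module::func -> func, then foo.bar -> bar
            let after_path = raw.rsplit("::").next().unwrap_or(raw);
            let after_receiver = after_path.rsplit('.').next().unwrap_or(after_path);
            after_receiver.to_string()
        }
        RefKind::TypeReference | RefKind::Inheritance | RefKind::TraitImpl => {
            // Foo<Bar> -> Foo, then std::vec::Vec -> Vec
            let without_generics = raw.split('<').next().unwrap_or(raw);
            let simple = without_generics
                .rsplit("::")
                .next()
                .unwrap_or(without_generics);
            simple.trim().to_string()
        }
        // Imports are left untouched in this sketch.
        RefKind::Import => raw.to_string(),
    }
}

/// Looks up the normalized name first; type references fall back to the
/// original qualified name when the simple name has no match.
pub fn lookup<'a>(
    symbols: &'a HashMap<String, String>,
    kind: RefKind,
    raw: &str,
) -> Option<&'a String> {
    let simple = normalize_name(kind, raw);
    symbols.get(&simple).or_else(|| match kind {
        RefKind::TypeReference => symbols.get(raw),
        _ => None,
    })
}
```

Stripping generics before the path prefix matters for names such as `std::vec::Vec<std::string::String>`, where splitting on `::` first would pick up part of the type argument.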

ImportResolver

Resolves import statements to file-level edges:

  • Resolves relative imports (./, ../)
  • Resolves Python dotted imports (foo.bar.baz)
  • Resolves Rust crate imports (crate::module::item)
  • Tries multiple extensions (.ts, .tsx, .js, .py, .rs)
  • Creates file-to-file Import edges (separate from the symbol-level edges produced by SymbolTable)
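
Relative-import resolution with multiple extensions can be pictured as generating candidate paths and checking which one exists in the graph. The sketch below covers only candidate generation; `relative_import_candidates`, `python_module_path`, and the extension order are assumptions for illustration, and Rust `crate::` paths are left out because their file layout rules are not described here.

```rust
use std::path::{Component, Path, PathBuf};

/// Extensions tried for each import, in an assumed order.
const EXTENSIONS: [&str; 5] = ["ts", "tsx", "js", "py", "rs"];

/// Collapses `.` and `..` components lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Candidate target files for a relative import such as `./x` or `../y/z`,
/// resolved against the directory of the importing file.
pub fn relative_import_candidates(importing_file: &Path, spec: &str) -> Vec<PathBuf> {
    let base_dir = importing_file.parent().unwrap_or_else(|| Path::new(""));
    let joined = normalize(&base_dir.join(spec));
    EXTENSIONS
        .iter()
        .map(|ext| {
            // Append rather than replace, so `foo.service` becomes `foo.service.ts`.
            let mut s = joined.clone().into_os_string();
            s.push(".");
            s.push(*ext);
            PathBuf::from(s)
        })
        .collect()
}

/// Converts a Python dotted module name (`foo.bar.baz`) to `foo/bar/baz.py`.
pub fn python_module_path(dotted: &str) -> PathBuf {
    let mut path: PathBuf = dotted.split('.').collect();
    path.set_extension("py");
    path
}
```

A resolver built this way would emit an Import edge for the first candidate that matches a File node, and skip the import if none does.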

Typical Workflow

// 1. Scan repository
let mut graph = RepoScanner::scan(&path)?;

// 2. Parse each file, collecting raw references for later resolution
let mut all_refs = Vec::new();
for file in graph.files() {
    let source = std::fs::read_to_string(&file.path)?;
    let (blocks, refs) = Extractor::extract_file(&file.path, &source, &file.language);
    // Add blocks to graph
    all_refs.extend(refs);
}

// 3. Build symbol table
let symbols = SymbolTable::build_from_graph(&graph);

// 4. Resolve symbol references (with name normalization)
let edges = symbols.resolve_references(&all_refs);
// Add edges to graph

// 5. Resolve file-level import edges
let import_edges = ImportResolver::resolve(&graph, &all_refs);
// Add import edges to graph

// 6. Filter for rendering
let subgraph = SubGraph::from_graph(&graph, &visible_ids, &edge_kinds);

Dependencies

| Crate | Purpose |
|-------|---------|
| tree-sitter | AST parsing framework |
| tree-sitter-python | Python parser |
| tree-sitter-typescript | TypeScript parser |
| tree-sitter-javascript | JavaScript parser |
| tree-sitter-rust | Rust parser |
| serde | Serialization |
| rayon | Parallel processing |
| ignore | Gitignore-aware walking |
| anyhow/thiserror | Error handling |