scanner.py → module_mapper.py → single_file.py → cross_file.py → cli.py → views.py
↑ ↑
symbol_table.py source_resolution.py
scope.py classification.py
sources.py library_usage.py
ir.py decorator_provenance.py
types.py call_graph.py
diagnostics.py
| Layer | Module | Input | Output |
|---|---|---|---|
| Scan | scanner.py |
Project root path | List of .py/.pyi files (excluding venv) |
| Module map | module_mapper.py |
File list | File path ↔ dotted module name |
| Parse + single-file | single_file.py |
Source code | SymbolTable, api_calls (dict list), call_site_objects, symbol_refs |
| Cross-file | cross_file.py |
Per-file tracers | ProjectAnalysis (global symbols, chains, api calls, provenance, library usage) |
| Views | views.py |
ProjectAnalysis |
Dict/list for JSON serialization |
| CLI | cli.py |
Project root + args | Human-readable text or JSON |
scanner.pyproduces a list of absolute file paths.module_mapper.pymaps each file to a dotted module name (e.g.,pkg/sub.py→pkg.sub).
SingleFileAnalyzer is an ast.NodeVisitor that produces:
| Output | Type | Purpose |
|---|---|---|
symbols.direct |
dict[str, object] |
Name → source mapping (module-level or v1 all-level) |
symbols.chains |
dict[str, list] |
Name → resolution chain |
api_calls |
list[dict] |
Legacy call records (keyed by api, top, base, chain, ...) |
call_site_objects |
list[CallSite] |
New typed call-site IR (parallel to api_calls) |
symbol_refs |
list[SymbolRef] |
Symbol references for provenance |
return_sources |
dict[str, object] |
Function name → return expression source (SourceSet for multi-return) |
call_sites |
dict[str, list[dict]] |
Function name → call-site parameter sources (for ad-hoc param tracing) |
function_params |
dict[str, list[str]] |
Function name → parameter name list |
defined_functions |
set[str] |
Names of locally defined functions |
import_from_symbols |
dict[str, str] |
Import alias → fully qualified name |
instance_attrs |
dict[(class, attr), source] |
(ClassName, self.attr) → source (v2 constructor propagation) |
ProjectAnalyzer orchestrates:
- Parse: Iterates files, creates
SingleFileAnalyzerper file. - Resolve:
resolve_cross_file_symbols()traces each symbol through imports/assignments across modules, populatedglobal_symbolsandsymbol_chains. - SourceSet convergence:
SourceSetResolverinsource_resolution.pyresolves multi-source bindings with origin-aware rules. - Classify:
ClassificationPipelineinclassification.pyassigns reason, confidence, and alternatives via priority-ordered rules. - Provenance:
_build_symbol_provenance()traces eachSymbolRefinto aSymbolProvenance. - Library Usage:
build_library_usage()inlibrary_usage.pyaggregates calls and provenance bytop_library. - Decorator evidence:
build_decorator_index()/lookup_decorated_by()indecorator_provenance.pypopulateApiCall.decorated_by. - Call graph:
call_graph.pyholdsFunctionSummary/ClassSummary/CallEdgefacts.
Output: ProjectAnalysis
| Field | Purpose |
|---|---|
files |
Per-file FileAnalysis (symbols, chains, api_calls, provenance) |
all_api_calls |
Flat list of every ApiCall across all files |
all_symbol_provenance |
Flat list of every SymbolProvenance |
library_usage |
dict[library → LibraryUsage] with counts, files, imports |
diagnostics |
Parse/read errors |
stats |
Module counts, scope model |
| Behavior | v1 (legacy) | v2 (default, lexical scopes) |
|---|---|---|
| Function params | Written to module symbols.direct |
Only in function scope |
| Local variables | Written to module symbols.direct |
Only in function scope |
| Comprehension vars | Written to module symbols.direct |
Only in comprehension scope |
| CallResult callee | Raw name (e.g., "nx") |
Scope-resolved name (e.g., "networkx") |
get_base (non-call-lookup) |
Returns name | Returns scope binding source |
get_base (call_lookup=True) |
Returns name | Returns name (kept raw for CallResult) |
v2 is the default as of 1.0.4. v2 fixes scope pollution and no longer produces dataclass repr or structured source display strings as library keys. v1 is available via --scope-model v1 for legacy comparison.
Compatibility surfaces still present in the codebase:
| Surface | Current Status | Notes |
|---|---|---|
SymbolTable.direct |
Still used as module-level fallback | Scope model writes to Scope.bindings; direct is compat bridge |
api_calls (dict list) |
Still the primary single-file output | Typed CallSite collected in parallel |
return_sources (SourceSet) |
Multi-return tracking via SourceSet + CallGraph |
Current default |
_base_top_source() |
Wraps ClassificationPipeline.classify() |
Current default |
| Instance attr propagation | Constructor arg → self.attr tracking | Full class-aware receiver resolution is future work |
--json (dataclass dump) |
Replaced by full provenance schema | 1.0.4+ default |
-
_resolve_structured_source()dispatchescontainer_item,instance_method,container_iter,call_result.SourceSetconvergence is handled bysource_resolution.py::SourceSetResolver. The non-SourceSet branches still live inline here. -
trace_symbol()is the trace orchestration hotspot, mixing cross-module symbol lookup with wildcard import resolution and parameter back-tracing. Call-graph facts (call_graph.py) feed into it for return-object and arg-source propagation. -
_build_symbol_provenance()passes_direct_source=ref.sourcefor all SymbolRefs, enabling per-assignment provenance even when module-level reassignment overwrites the symbol table.