Related Code Files:
- code-intelligence-toolkit/data_flow_tracker_v2.py - Advanced data flow analysis with intelligence layer
- code-intelligence-toolkit/run_any_python_tool.sh - Wrapper script for execution
- code-intelligence-toolkit/templates/ - HTML visualization templates (optional)
Data Flow Tracker V2 is an advanced code intelligence and algorithm analysis tool that provides deep insights into how data flows through your code. Built on the foundation of the original data flow tracker, V2 adds five major capabilities that transform it from a simple dependency tracker into a comprehensive code intelligence platform.
- Impact Analysis - Shows where data escapes scope and causes effects
- Calculation Path Analysis - Extracts minimal critical path for values
- Type and State Tracking - Monitors type evolution and state changes
- Natural Language Explanations - Intuitive explanations of complex analysis
- Interactive HTML Visualization - Self-contained reports with vis.js network graphs
- Safety Analysis: Understand what changing a variable affects before refactoring
- Algorithm Understanding: Trace how complex calculations derive their values
- Debugging: Find the root cause of incorrect values or unexpected behavior
- Code Review: Generate comprehensive reports for technical documentation
- Refactoring: Assess risk levels before making changes to critical code
# Already included in code-intelligence-toolkit
./run_any_python_tool.sh data_flow_tracker_v2.py --help
- Python: Full support with complete AST analysis
- Java: Requires javalang>=0.13.0 (included in requirements-core.txt)
- Enhanced HTML: jinja2>=3.0.0 for templated reports (requirements-optional.txt)
- Graph Export: GraphViz for DOT format visualization exports
# Impact analysis - see what changing a variable affects
./run_any_python_tool.sh data_flow_tracker_v2.py --var config --show-impact --file app.py
# Calculation path - understand how a value is computed
./run_any_python_tool.sh data_flow_tracker_v2.py --var final_price --show-calculation-path --file pricing.py
# Type and state tracking - monitor variable evolution
./run_any_python_tool.sh data_flow_tracker_v2.py --var user_data --track-state --file process.py
# Interactive HTML report
./run_any_python_tool.sh data_flow_tracker_v2.py --var database_config --show-impact --output-html --file app.py
- --file FILE - Source file to analyze (Python or Java)
- --var VAR - Specific variable to track
- --var-all - Generate reports for all variables (batch mode)
- --show-impact - Impact analysis (where data escapes scope)
- --show-calculation-path - Minimal calculation path extraction
- --track-state - Type and state evolution tracking
- --direction {forward,backward} - Standard tracking direction (V1 compatibility)
- --format {text,json,graph} - Output format
- --output-html - Generate interactive HTML visualization
- --html-out PATH - Write HTML to specific path/directory
- --explain - Add natural language explanations
- --max-depth N - Maximum tracking depth (-1 for unlimited)
- --inter-procedural - Enable inter-procedural analysis
- --show-all - Show all variables and dependencies
Purpose: Understand what changing a variable will affect in your codebase.
What it analyzes:
- Return value dependencies
- Side effects (file writes, network calls, console output)
- State changes (global variables, class attributes)
- External function calls
- Risk assessment with severity levels
Example Output:
{
"returns": [
{"type": "return", "function": "calculate_total", "severity": "high"}
],
"side_effects": [
{"type": "file_write", "location": "save_results:45", "severity": "medium"}
],
"summary": {
"total_exit_points": 3,
"functions_affected": 2,
"high_risk_count": 1,
"recommendation": "⚠️ HIGH RISK: Review all high-severity exit points"
}
}

When to use:
- Before refactoring critical variables
- Assessing the scope of a change
- Code review and safety analysis
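The escape-point detection behind impact analysis can be approximated with Python's built-in ast module. The sketch below is illustrative only (find_exit_points and the sample function are invented for this example, not the tool's actual internals): it flags every return statement and function call that mentions the tracked variable.

```python
import ast

def find_exit_points(source: str, var: str):
    """Approximate impact analysis: find returns and calls involving `var`."""
    exit_points = []
    for node in ast.walk(ast.parse(source)):
        # Collect every bare name mentioned anywhere under this node
        names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
        if isinstance(node, ast.Return) and var in names:
            exit_points.append(("return", node.lineno))
        elif isinstance(node, ast.Call) and var in names:
            exit_points.append(("call", node.lineno))
    return exit_points

sample = """def calculate_total(price, tax_rate):
    total = price * (1 + tax_rate)
    log(total)
    return total
"""
print(find_exit_points(sample, "total"))
```

The real analyzer additionally classifies side effects (file writes, globals) and assigns severity; this sketch only shows the AST-walking core of the idea.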
Purpose: Extract the minimal critical path showing how a value is calculated.
What it analyzes:
- Essential calculation steps only (non-essential steps pruned)
- Input variable dependencies
- Function call chains with parameter mapping
- Mathematical operations and transformations
Example Output:
[
{
"variable": "base_price",
"operation": "assignment",
"inputs": ["product.price"],
"location": "pricing.py:23",
"essential": true
},
{
"variable": "tax_amount",
"operation": "calculation",
"inputs": ["base_price", "tax_rate"],
"location": "pricing.py:25",
"essential": true
},
{
"variable": "final_price",
"operation": "calculation",
"inputs": ["base_price", "tax_amount"],
"location": "pricing.py:27",
"essential": true
}
]

When to use:
- Understanding complex algorithms
- Debugging calculation errors
- Optimizing performance bottlenecks
- Creating algorithm documentation
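The pruning idea can be illustrated with a toy backward slice: starting from the target variable, keep only the steps it transitively depends on and drop the rest. This is a hypothetical sketch (minimal_path and the step records are made up to mirror the example output above, not the tool's implementation):

```python
def minimal_path(steps, target):
    """Backward slice: keep only steps the target transitively depends on."""
    by_var = {s["variable"]: s for s in steps}
    needed, stack = set(), [target]
    while stack:
        var = stack.pop()
        step = by_var.get(var)
        if step and var not in needed:
            needed.add(var)
            stack.extend(step["inputs"])
    # Preserve the original order, dropping non-essential steps
    return [s for s in steps if s["variable"] in needed]

steps = [
    {"variable": "base_price",  "inputs": ["product.price"]},
    {"variable": "debug_msg",   "inputs": ["base_price"]},  # pruned: nothing depends on it
    {"variable": "tax_amount",  "inputs": ["base_price", "tax_rate"]},
    {"variable": "final_price", "inputs": ["base_price", "tax_amount"]},
]
print([s["variable"] for s in minimal_path(steps, "final_price")])
# -> ['base_price', 'tax_amount', 'final_price']
```

Note how debug_msg is excluded even though it reads base_price: it is not on any path into final_price.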
Purpose: Monitor how variable types and states evolve through the code.
What it analyzes:
- Type changes and confidence levels
- State modifications (assignments, mutations)
- Loop and conditional context
- Null safety warnings
Example Output:
{
"type_evolution": [
{
"location": "process.py:15",
"type": "str",
"confidence": 0.9,
"nullable": false,
"operation": "assignment"
},
{
"location": "process.py:22",
"type": "dict",
"confidence": 0.8,
"nullable": true,
"operation": "transformation"
}
],
"warnings": [
"Variable may be None - add null checks",
"Type changes detected: str → dict"
]
}

When to use:
- Type safety analysis
- Finding potential null pointer exceptions
- Understanding data transformations
- Optimizing variable usage patterns
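Tracking type evolution across assignments can also be sketched with the ast module. This toy version (type_evolution is an invented name; only literal values are inferred, unlike the tool's context-aware inference) records the inferred type at each assignment to the variable:

```python
import ast

def type_evolution(source: str, var: str):
    """Record the inferred type of `var` at each assignment (literals only)."""
    inferred = {
        ast.Constant: lambda n: type(n.value).__name__,
        ast.Dict: lambda n: "dict",
        ast.List: lambda n: "list",
    }
    events = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if var in targets:
                fn = inferred.get(type(node.value))
                events.append((node.lineno, fn(node.value) if fn else "unknown"))
    return events

code = "data = 'raw'\ndata = {'parsed': True}\n"
print(type_evolution(code, "data"))
# -> [(1, 'str'), (2, 'dict')]
```

A str-to-dict transition like this is exactly the kind of change the real tracker surfaces as a "Type changes detected" warning.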
Purpose: Generate human-readable explanations of analysis results.
Example Output:
📊 Impact Analysis for 'database_config':
⚠️ Medium Risk Change: Modifying 'database_config' affects 7 places in your code.
It affects 2 return values, causes 1 external side effect (like a file write),
and modifies 2 global variables.
💡 Recommendation: Focus testing on the affected areas and verify all return
values and side effects.
🔍 Detailed Breakdown:
- The variable flows into 3 different functions
- It's used in 2 conditional statements that control program flow
- Changes will affect database connection logic in ConnectionManager
When to use:
- Code reviews and documentation
- Teaching and knowledge transfer
- Non-technical stakeholder reports
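The explanation layer is essentially templating over the structured analysis results. A hypothetical renderer (explain_impact and the summary keys follow the impact-analysis example output shown earlier; the wording is illustrative, not the tool's exact phrasing):

```python
def explain_impact(var: str, summary: dict) -> str:
    """Turn an impact-analysis summary dict into a one-line explanation."""
    risk = "HIGH" if summary["high_risk_count"] > 0 else "LOW"
    return (f"Modifying '{var}' touches {summary['total_exit_points']} exit "
            f"points across {summary['functions_affected']} functions "
            f"({risk} risk).")

summary = {"total_exit_points": 3, "functions_affected": 2, "high_risk_count": 1}
print(explain_impact("database_config", summary))
```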
Purpose: Generate self-contained HTML reports with interactive network graphs.
Features:
- Network visualization using vis.js (no external dependencies)
- Expandable/collapsible sections
- Copy-to-clipboard code examples
- Responsive design for different screen sizes
- Dark/light theme support
Generated report includes:
- Executive summary with risk assessment
- Interactive dependency network graph
- Detailed analysis tables
- Code snippets with syntax highlighting
- Downloadable data in JSON format
When to use:
- Formal documentation and reports
- Sharing analysis with team members
- Interactive exploration of complex dependencies
Strengths:
- Complete AST parsing with full Python syntax support
- Advanced type inference using context and assignments
- Comprehensive function call analysis with parameter mapping
- Exception handling and control flow analysis
Supported constructs:
- Functions, classes, methods
- Lambda expressions and comprehensions
- Decorators and context managers
- Async/await patterns
- Import analysis and module dependencies
Strengths:
- Robust class and method structure analysis
- Package and import dependency tracking
- Method overloading and inheritance handling
Limitations:
- Basic type inference (inherits from underlying analyzer)
- Limited state tracking compared to Python
- No generic type parameter analysis
Supported constructs:
- Classes, methods, constructors
- Field and local variable tracking
- Static and instance method calls
- Basic inheritance patterns
# Generate reports for all variables in a module
./run_any_python_tool.sh data_flow_tracker_v2.py --var-all --show-impact --output-html --html-out reports/ --file critical_module.py
# This creates individual HTML reports for each variable

# Combine multiple analysis modes for comprehensive coverage
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var user_input \
--show-impact \
--show-calculation-path \
--track-state \
--explain \
--output-html \
    --file security_module.py

# Risk assessment for continuous integration
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var critical_config \
--show-impact \
--format json \
--file config.py > impact_report.json
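A CI gate can then parse that JSON and fail the build when the risk budget is exceeded. A hypothetical sketch (the summary field names follow the example output shown earlier; gate and the budget default are illustrative):

```python
import json

def gate(report: dict, max_high_risk: int = 0) -> int:
    """Return a nonzero exit code when the impact report exceeds the risk budget."""
    high = report.get("summary", {}).get("high_risk_count", 0)
    if high > max_high_risk:
        print(f"FAIL: {high} high-risk exit points (budget {max_high_risk})")
        return 1
    print("OK: within risk budget")
    return 0

# In CI you would json.load() impact_report.json; inlined here for illustration
report = json.loads('{"summary": {"high_risk_count": 1, "total_exit_points": 3}}')
status = gate(report)
```

Wiring `status` into `sys.exit()` makes the build step fail automatically on high-risk changes.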
# Parse the JSON to fail builds on high-risk changes

# Generate DOT format for inclusion in documentation
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var data_processor \
--format graph \
--file processor.py > data_flow.dot
# Convert to various image formats
dot -Tpng data_flow.dot -o data_flow.png
dot -Tsvg data_flow.dot -o data_flow.svg
- Depth limiting: Use --max-depth to prevent runaway analysis
- Essential path pruning: Calculation paths automatically filter non-critical steps
- Memoization: Results are cached during a single analysis run
- Memory management: Large codebases handled efficiently
- Small files (< 1000 lines): All analysis modes, unlimited depth
- Medium files (1000-5000 lines): Use --max-depth 20 for complex analysis
- Large files (> 5000 lines): Focus on specific variables, use --max-depth 10
- Batch mode: Process files individually rather than analyzing entire projects
- Basic tracking: ~10MB per 1000 lines of code
- Full analysis: ~50MB per 1000 lines with all modes enabled
- HTML reports: Additional ~5MB per report with visualization
1. "Function not found" errors
- Ensure the function exists in the specified file
- Check for typos in function names
- Verify file path is correct
2. Java analysis fails
- Install javalang: pip install "javalang>=0.13.0" (quote the specifier so the shell does not treat >= as a redirect)
- Check Java syntax is valid
- Ensure file is UTF-8 encoded
3. Large analysis times
- Use --max-depth to limit recursion
- Focus on specific variables instead of --var-all
- Check for circular dependencies causing infinite loops
4. HTML reports not generating
- Check write permissions to output directory
- Ensure sufficient disk space
- Use --verbose for detailed error messages
# Enable verbose logging for troubleshooting
./run_any_python_tool.sh data_flow_tracker_v2.py --var debug_var --show-impact --verbose --file problem.py

# Generate documentation using data flow insights
./run_any_python_tool.sh doc_generator.py --function process_data --style technical --depth deep --file data.py

# Find all variables before analyzing
./run_any_python_tool.sh find_text_v7.py "^[a-zA-Z_][a-zA-Z0-9_]*\s*=" --type regex --file code.py
# Analyze each found variable
for var in $(previous_command_output); do
./run_any_python_tool.sh data_flow_tracker_v2.py --var "$var" --show-impact --file code.py
done

# Check impact before renaming
./run_any_python_tool.sh data_flow_tracker_v2.py --var old_name --show-impact --file module.py
# If safe, proceed with rename
./run_any_python_tool.sh replace_text_v8.py "old_name" "new_name" module.py
- Start with Impact Analysis: Always check --show-impact before making changes
- Use Calculation Paths for Debugging: When values are wrong, trace with --show-calculation-path
- Monitor State Changes: Use --track-state for variables that change frequently
- Generate Reports for Reviews: Use --output-html for formal documentation
- HIGH Risk: Changes affect return values or external systems
- MEDIUM Risk: Changes affect internal state or have side effects
- LOW Risk: Changes are local scope only
- Variables with >10 exit points may indicate tight coupling
- Calculation paths >20 steps suggest complex algorithms needing decomposition
- Frequent type changes indicate potential design issues
- Run impact analysis for critical functions
- Generate HTML reports for stakeholder review
- Include calculation paths in algorithm documentation
- Use state tracking for API boundary analysis
# Understand how final price is calculated
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var final_price \
--show-calculation-path \
--explain \
    --file pricing_engine.py

Result: Shows how tax calculations, discount applications, and shipping costs flow into the final price.
# Assess impact of changing database configuration
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var db_config \
--show-impact \
--output-html \
    --file application.py

Result: Reveals which services, cache systems, and logging components are affected.
# Track how user input flows through security checks
./run_any_python_tool.sh data_flow_tracker_v2.py \
--var user_input \
--track-state \
--show-impact \
--explain \
    --file security_handler.py

Result: Shows validation steps, sanitization processes, and where the data exits the system.
While primarily designed as a command-line tool, the V2 analyzer can be imported and used programmatically:
from data_flow_tracker_v2 import DataFlowAnalyzerV2
# Create analyzer instance
analyzer = DataFlowAnalyzerV2(source_code, filename, language="python")
analyzer.analyze()
# Get impact analysis
impact = analyzer.track_forward("variable_name")
print(f"Affects {len(impact['affects'])} components")
# Get calculation path
path = analyzer.show_calculation_path("result_var")
print(f"Calculation involves {len(path)} steps")
# Generate explanations
explanation = analyzer.generate_explanation(impact, "impact", "variable_name")
print(explanation)

Data Flow Tracker V2 transforms code analysis from simple dependency tracking into comprehensive code intelligence. Its five analysis modes provide different lenses for understanding code behavior:
- Use Impact Analysis for safety and refactoring decisions
- Use Calculation Paths for algorithm understanding and debugging
- Use State Tracking for type safety and data transformation analysis
- Use Natural Language Explanations for documentation and teaching
- Use HTML Visualization for interactive exploration and formal reports
The tool scales from quick single-variable checks to comprehensive codebase analysis, making it valuable for developers, code reviewers, and technical writers working with complex codebases.