feat: Add production-ready compute unit benchmarking framework with statistical analysis#5

Merged
levicook merged 3 commits into main from feat/cu-bench-framework on Jun 11, 2025

Conversation

@levicook (Owner) commented Jun 11, 2025

Dual Benchmarking Paradigms

🔬 Instruction Benchmarking - Pure CU measurement

// Measures exactly what you ask for - no hidden overhead
let result = benchmark_instruction(sol_transfer_bench, 100);
// Result: 150 CU (0% variance) - perfectly consistent

🔄 Transaction Benchmarking - Complete workflow analysis

// Real-world multi-program scenarios
let result = benchmark_transaction(token_setup_bench, 100);
// Result: 28,322-38,822 CU range - realistic variance

Statistical Analysis Engine

Percentile-Based Estimates (inspired by Helius Priority Fee API):

{
  "cu_estimate": {
    "min": 28322,           // 0th percentile - absolute minimum
    "conservative": 30145,  // 25th percentile - safe for most cases
    "balanced": 32891,      // 50th percentile - good default  
    "safe": 35123,         // 75th percentile - high reliability
    "very_high": 37456,    // 95th percentile - very reliable
    "unsafe_max": 38822,   // 100th percentile - maximum observed
    "sample_size": 100
  }
}
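The tiers above can be derived with simple nearest-rank percentile selection over the collected samples. The following std-only sketch is illustrative; the function names and the exact rounding/interpolation method are assumptions, not the crate's actual implementation:

```rust
// Illustrative percentile-based CU estimation (nearest-rank selection).
// Sorting first handles unsorted input; duplicates fall out naturally.
fn percentile(sorted: &[u64], pct: f64) -> u64 {
    assert!(!sorted.is_empty());
    // Nearest-rank index; pct = 100.0 maps to the last element.
    let idx = ((pct / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

/// Returns (min, conservative, balanced, safe, very_high, unsafe_max),
/// mirroring the JSON tiers above.
fn cu_estimate(samples: &mut Vec<u64>) -> (u64, u64, u64, u64, u64, u64) {
    samples.sort_unstable();
    (
        percentile(samples, 0.0),
        percentile(samples, 25.0),
        percentile(samples, 50.0),
        percentile(samples, 75.0),
        percentile(samples, 95.0),
        percentile(samples, 100.0),
    )
}
```

Nearest-rank is the simplest scheme that handles the boundary conditions called out below (duplicates, unsorted input, single-sample runs); linear interpolation between ranks is a reasonable alternative.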

🔧 Major Technical Achievements

1. Context Discovery System

  • Two-phase measurement: Simulation for context + execution for statistics
  • Rich execution context: SVM state, program details, CPI analysis
  • Address book system: Human-readable program names vs raw pubkeys

2. Statistical Rigor

  • Fixed percentile calculation bugs that were showing incorrect variance
  • Comprehensive unit tests (7 test cases covering edge cases)
  • Proper handling of duplicates, unsorted input, boundary conditions

3. Clean Architecture

  • Modular design: Separate concerns across focused modules
  • Type-safe domain modeling: StatType enum for instruction vs transaction distinction
  • Professional tooling: env_logger integration, clean JSON serialization

4. Framework Design Excellence

  • No hidden overhead: Removed automatic ComputeBudgetInstruction for transparency
  • SVM state accumulation: Realistic measurements vs isolated tests
  • User control: Benchmark authors control SVM configuration completely

📊 Working Examples & Living Documentation

Benchmarks as Primary Documentation

Comprehensive Documentation

  • 📖 Complete Guide: BENCHMARKING.md - 274 lines of practical documentation
  • 🎯 Enhanced README: Repositions project as testing + benchmarking platform
  • 💡 Learning Path: README → BENCHMARKING.md → working benchmark files

🚀 Key Design Decisions & Evolution

Problems Solved During Development

  1. Multiple SVM Issue: Fixed benchmark runner creating different SVM instances
  2. Account Collision Errors: Generate fresh keypairs to avoid conflicts
  3. Insufficient Funds: Increased funding for long measurement runs
  4. Framework Overhead: Removed hidden ComputeBudgetInstruction for transparency
  5. Percentile Calculation Bug: Fixed incorrect indexing showing false consistency
  6. Domain Modeling: Created proper instruction vs transaction distinction

Architecture Choices

  • Living Documentation: Benchmark files serve as guaranteed-working examples
  • Statistical Approach: Multiple samples → confidence intervals vs single measurements
  • User Control: Framework measures exactly what users ask for, nothing more
  • Professional UX: Quiet by default, rich logging via RUST_LOG=info

📈 Impact & Results

Ecosystem Positioning

Elevates litesvm-testing from "another testing framework" to a unique dual-purpose toolkit:

  • Testing: Comprehensive error assertions and log verification
  • Benchmarking: Systematic CU analysis with statistical confidence

Concrete Value Delivered

| Capability | Before | After |
| --- | --- | --- |
| CU Measurement | Manual, ad-hoc | Systematic, statistical |
| Fee Estimation | Guesswork | Data-driven with confidence intervals |
| Instruction Analysis | None | Pure measurement without overhead |
| Transaction Analysis | None | Multi-program workflow insights |
| Reproducibility | Inconsistent | Professional methodology |

Technical Metrics

  • +1,665 lines, -506 lines across 15 files
  • 324 lines of comprehensive unit tests
  • 274 lines of documentation
  • 3 working benchmarks demonstrating different paradigms

🔄 Migration & Integration

Existing Users: Zero breaking changes - all existing testing functionality preserved

New Capabilities: Opt-in via --features cu_bench for benchmarking functionality
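A minimal Cargo.toml sketch of the opt-in (the version number is illustrative; the `cu_bench` feature name comes from this PR):

```toml
[dev-dependencies]
# Benchmarking is gated behind the `cu_bench` feature; testing-only
# users omit the feature and see no change. Version is hypothetical.
litesvm-testing = { version = "0.1", features = ["cu_bench"] }
```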

Production Integration:

// Load benchmark results for fee estimation.
// ComputeBudgetInstruction comes from solana-sdk; load_benchmark_result
// is the framework's helper for reading saved benchmark JSON.
use solana_sdk::compute_budget::ComputeBudgetInstruction;

let cu_estimate = load_benchmark_result("sol_transfer")?.cu_estimate.conservative;
let compute_budget_ix = ComputeBudgetInstruction::set_compute_unit_limit(cu_estimate);

This PR establishes litesvm-testing as the definitive toolkit for Solana program development - combining comprehensive testing utilities with production-ready performance analysis capabilities not available elsewhere in the ecosystem.

Ready for review! 🎯

- Add InstructionBenchmark trait for clean separation of concerns
  - Benchmark owns: SVM setup, keypairs, signing
  - Framework owns: unsigned tx building, CU measurement, statistics
- Implement benchmark_instruction() runner with SVM state accumulation
- Convert SOL and SPL token transfer benchmarks to use new framework
- Add solana-message dependency for unsigned transaction creation
- Simplify benchmark output to single summary line + JSON data
- Eliminate 150+ lines of boilerplate from benchmark implementations
- Maintain identical CU measurements (300 CU SOL, 4794 CU SPL)

The framework enables measuring any Solana instruction's compute unit
usage with minimal code while providing structured estimates similar
to Helius Priority Fee API.
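The ownership split described above (benchmark owns setup and signing, framework owns repetition and statistics) might look roughly like this std-only sketch. Trait and function names are assumptions for illustration; the real trait works with Solana instruction and SVM types rather than a bare `u64`:

```rust
// Hypothetical shape of the ownership split; not the crate's actual API.
trait InstructionBenchmark {
    /// Benchmark-owned: prepare SVM state, keypairs, funding.
    fn setup(&mut self);
    /// Benchmark-owned: execute one signed instruction and report
    /// the compute units it consumed.
    fn execute_once(&mut self) -> u64;
}

/// Framework-owned runner: reuses one benchmark instance so SVM state
/// accumulates across iterations instead of resetting between samples.
fn benchmark_instruction<B: InstructionBenchmark>(bench: &mut B, iterations: usize) -> Vec<u64> {
    bench.setup();
    (0..iterations).map(|_| bench.execute_once()).collect()
}
```

The collected samples then feed the percentile-based estimator to produce the tiered CU estimates.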
@levicook self-assigned this Jun 11, 2025
levicook added 2 commits June 11, 2025 12:56
Add comprehensive CU benchmarking framework with dual instruction/transaction paradigms:

Framework Features:
- InstructionBenchmark: Pure instruction CU measurement (no framework overhead)
- TransactionBenchmark: Complete workflow measurement with multi-program context
- Rich execution context discovery through simulation
- Percentile-based CU estimates (min/conservative/balanced/safe/very_high/unsafe_max)
- Professional logging with env_logger integration
- Clean JSON output with proper domain modeling

Key Design Decisions:
- Remove automatic ComputeBudgetInstruction from instruction benchmarks for transparency
- Two-phase measurement: simulation for context + execution for statistics
- SVM state accumulation across measurements for realism
- StatType enum for clean instruction vs transaction distinction
- Comprehensive unit tests for percentile calculations

Benchmarks:
- SOL transfer: 150 CU (pure instruction)
- SPL token transfer: instruction-level benchmark
- Token setup workflow: 28,322-38,822 CU transaction benchmark

This provides systematic, reproducible CU analysis for both research and production planning.
Transform project positioning from "testing framework" to "testing and benchmarking framework" with comprehensive documentation:

Documentation Additions:
- Add BENCHMARKING.md: Complete guide with living examples and best practices
- Enhance README: Prominently feature CU benchmarking alongside testing
- Create clear learning path: README → BENCHMARKING.md → benchmark files

Key Documentation Features:
- Dual paradigm explanation (instruction vs transaction benchmarking)
- Statistical output interpretation (percentile-based estimates)
- Production integration patterns for fee estimation
- Troubleshooting guide for common benchmark issues
- Living documentation that references actual working benchmark files

Project Positioning:
- README hero section now highlights both testing AND benchmarking capabilities
- CU benchmarking quick start with concrete examples (SOL transfer: 150 CU, Token setup: 28K-38K CU)
- Updated roadmap showing completed benchmarking framework
- Enhanced examples section showcasing benchmark files as primary documentation

This establishes systematic CU analysis as a unique differentiator alongside the existing comprehensive testing utilities.
@levicook changed the title from "feat: add compute unit benchmarking framework with trait-based design" to "feat: Add comprehensive compute unit benchmarking framework with statistical analysis" Jun 11, 2025
@levicook changed the title from "feat: Add comprehensive compute unit benchmarking framework with statistical analysis" to "feat: Add production-ready compute unit benchmarking framework with statistical analysis" Jun 11, 2025
@levicook marked this pull request as ready for review June 11, 2025 19:13
@levicook merged commit d58a041 into main Jun 11, 2025
2 checks passed