Integrate Stage Profiling #58

@lxsaah

Summary

Add automatic timing instrumentation to AimDB's execution primitives (.source(), .tap(), .link(), and future .transform()), enabling users to identify slow stages without manual instrumentation.

Motivation

AimDB owns the execution boundary for all user-provided callbacks. Because AimDB spawns and awaits these callbacks, it can measure their execution time automatically, giving users immediate visibility into which stage is the bottleneck without any instrumentation code.

User story: "As an AimDB user, I want AimDB to tell me which .source(), .tap(), or .link() callback is slow, so I can identify bottlenecks without adding manual profiling code."

Important: Wall-Clock Time

Stage profiling measures wall-clock time (including await points), not CPU time. This is useful for identifying which stage is slow, but not why. Users should use tracing or tokio-console for detailed CPU vs I/O analysis.
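
To make the wall-clock caveat concrete, here is a minimal sketch using only the standard library. The names `time_wall_clock` and `waiting_stage` are hypothetical and not part of AimDB; the stage below does almost no CPU work, yet the measured time still includes the full wait.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `stage` once and returns its result together with the elapsed
/// wall-clock time (hypothetical helper, not an AimDB API).
pub fn time_wall_clock<T>(stage: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = stage();
    (out, start.elapsed())
}

/// Stand-in stage that spends its time waiting rather than computing,
/// similar to a callback blocked on I/O.
pub fn waiting_stage() -> u32 {
    thread::sleep(Duration::from_millis(20));
    42
}
```

Timing `waiting_stage` this way reports at least 20 ms even though the CPU was idle, which is why the numbers say which stage is slow but not why.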

Design

See full RFC: docs/design/014-M6-stage-profiling.md

Key Points

  1. Feature-flagged: profiling feature, disabled by default (separate from metrics)
  2. Zero-cost when disabled: No timing overhead without feature
  3. Automatic instrumentation: AimDB times callbacks internally
  4. Builder pattern for names: .source(...).with_name("camera_capture")
  5. MCP integration: get_stage_profiling and reset_stage_profiling tools
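
One possible way to get the zero-cost property from points 1 and 2 is to swap the timer for an empty type when the feature is off. This is only a sketch: `StageTimer` and `run_stage` are hypothetical names, and the RFC may structure this differently.

```rust
use std::time::Duration;

// With the `profiling` feature on, the timer captures a real start time.
#[cfg(feature = "profiling")]
mod timer {
    use std::time::{Duration, Instant};

    pub struct StageTimer(Instant);

    impl StageTimer {
        pub fn start() -> Self {
            StageTimer(Instant::now())
        }
        pub fn stop(self) -> Option<Duration> {
            Some(self.0.elapsed())
        }
    }
}

// With the feature off, the timer is a zero-sized type and both calls
// are trivially inlinable, so no clock is ever read.
#[cfg(not(feature = "profiling"))]
mod timer {
    use std::time::Duration;

    pub struct StageTimer;

    impl StageTimer {
        #[inline(always)]
        pub fn start() -> Self {
            StageTimer
        }
        #[inline(always)]
        pub fn stop(self) -> Option<Duration> {
            None
        }
    }
}

use timer::StageTimer;

/// Runs a callback and returns its output plus the elapsed time,
/// which is `None` when profiling is compiled out.
pub fn run_stage<T>(callback: impl FnOnce() -> T) -> (T, Option<Duration>) {
    let timer = StageTimer::start();
    let out = callback();
    (out, timer.stop())
}
```

The call site stays identical in both configurations, so the instrumented code paths need no `cfg` attributes of their own.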

Stage Metrics Tracked

  • call_count - number of invocations
  • total_time_ns - cumulative wall-clock time
  • avg_time_ns - average per invocation
  • min_time_ns - fastest invocation
  • max_time_ns - slowest invocation
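
The five metrics above could be kept in lock-free counters along these lines. This is a sketch, not the actual `StageMetrics` from the RFC: the field layout, the `Relaxed` ordering, and the choice to report 0 for an unused stage are all assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Per-stage counters (assumed layout; the real struct may differ).
#[derive(Debug)]
pub struct StageMetrics {
    call_count: AtomicU64,
    total_time_ns: AtomicU64,
    min_time_ns: AtomicU64,
    max_time_ns: AtomicU64,
}

impl StageMetrics {
    pub fn new() -> Self {
        Self {
            call_count: AtomicU64::new(0),
            total_time_ns: AtomicU64::new(0),
            // Start at MAX so the first recorded sample becomes the minimum.
            min_time_ns: AtomicU64::new(u64::MAX),
            max_time_ns: AtomicU64::new(0),
        }
    }

    /// Records one invocation that took `elapsed` wall-clock time.
    pub fn record(&self, elapsed: Duration) {
        let ns = u64::try_from(elapsed.as_nanos()).unwrap_or(u64::MAX);
        self.call_count.fetch_add(1, Ordering::Relaxed);
        self.total_time_ns.fetch_add(ns, Ordering::Relaxed);
        self.min_time_ns.fetch_min(ns, Ordering::Relaxed);
        self.max_time_ns.fetch_max(ns, Ordering::Relaxed);
    }

    pub fn call_count(&self) -> u64 {
        self.call_count.load(Ordering::Relaxed)
    }

    pub fn total_time_ns(&self) -> u64 {
        self.total_time_ns.load(Ordering::Relaxed)
    }

    /// Average per invocation; 0 if the stage has never run.
    pub fn avg_time_ns(&self) -> u64 {
        self.total_time_ns()
            .checked_div(self.call_count())
            .unwrap_or(0)
    }

    /// Fastest invocation; 0 if the stage has never run.
    pub fn min_time_ns(&self) -> u64 {
        if self.call_count() == 0 {
            0
        } else {
            self.min_time_ns.load(Ordering::Relaxed)
        }
    }

    pub fn max_time_ns(&self) -> u64 {
        self.max_time_ns.load(Ordering::Relaxed)
    }

    /// Clears all counters, e.g. when a reset is requested.
    pub fn reset(&self) {
        self.call_count.store(0, Ordering::Relaxed);
        self.total_time_ns.store(0, Ordering::Relaxed);
        self.min_time_ns.store(u64::MAX, Ordering::Relaxed);
        self.max_time_ns.store(0, Ordering::Relaxed);
    }
}
```

Because each field is updated independently, a reader racing with `record` may briefly see a count and total that are one sample apart. For coarse bottleneck hunting that is probably acceptable.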

Design Decisions

  • Named stages via .with_name("xxx") builder pattern
  • No histogram/percentile support (avg/min/max sufficient)
  • Reset API via MCP tool
  • Stub transform profiling for future implementation
  • Wall-clock time only (not CPU time) - document limitation clearly

Tasks

Phase 1: Core Infrastructure

  • Add StageMetrics struct to aimdb-core/src/profiling/stage_metrics.rs
  • Add profiling feature flag to aimdb-core/Cargo.toml
  • Add RecordProfilingMetrics container for per-record stage tracking
  • Unit tests for StageMetrics (record, reset, accessors)

Phase 2: Instrumentation

  • Instrument .source() execution in TypedRecord - wrap callback with timing
  • Instrument .tap() execution in dispatcher - wrap handler with timing
  • Instrument .link() execution in connector dispatch
  • Stub .transform() metrics collection (for future use)
  • Wire up metrics to RecordProfilingMetrics per record
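
The "wrap callback with timing" step might look roughly like the sketch below. It is runtime-agnostic and uses only the standard library; `Totals`, `timed`, and the tiny `block_on` executor are hypothetical stand-ins so the example can run without Tokio or Embassy.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Instant;

/// Minimal stand-in for the per-stage metrics container.
#[derive(Default)]
pub struct Totals {
    pub calls: AtomicU64,
    pub total_ns: AtomicU64,
}

/// Awaits `callback` and records its wall-clock duration, including any
/// time spent suspended at await points inside the callback.
pub async fn timed<F: Future>(totals: &Totals, callback: F) -> F::Output {
    let start = Instant::now();
    let out = callback.await;
    let ns = u64::try_from(start.elapsed().as_nanos()).unwrap_or(u64::MAX);
    totals.calls.fetch_add(1, Ordering::Relaxed);
    totals.total_ns.fetch_add(ns, Ordering::Relaxed);
    out
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor, only here to make the sketch runnable.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

For example, `block_on(timed(&totals, async { 7 }))` returns 7 and bumps `totals.calls` to 1. Since the measurement brackets a single await, it only completes when the callback itself returns.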

Phase 3: Named Stages API

  • Add .with_name(name: &str) builder method to source registration
  • Add .with_name(name: &str) builder method to tap registration
  • Add .with_name(name: &str) builder method to link registration
  • Store stage names in RecordProfilingMetrics
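
A sketch of how a consuming `.with_name()` builder could carry the stage name. `StageRegistration` and `display_name` are hypothetical, and the `type[index]` fallback for unnamed stages is an assumption, not something this issue specifies.

```rust
/// Hypothetical handle returned by a source/tap/link registration.
pub struct StageRegistration {
    stage_type: &'static str,
    index: usize,
    name: Option<String>,
}

impl StageRegistration {
    pub fn new(stage_type: &'static str, index: usize) -> Self {
        Self {
            stage_type,
            index,
            name: None,
        }
    }

    /// Builder-style setter: consumes and returns the registration so it
    /// can be chained directly after the registration call.
    pub fn with_name(mut self, name: &str) -> Self {
        self.name = Some(name.to_owned());
        self
    }

    /// Name to show in profiling output; falls back to "type[index]"
    /// when no explicit name was given (assumed convention).
    pub fn display_name(&self) -> String {
        match &self.name {
            Some(name) => name.clone(),
            None => format!("{}[{}]", self.stage_type, self.index),
        }
    }
}
```

Keeping the name optional means existing registrations that never call `.with_name()` still produce identifiable entries.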

Phase 4: Metadata & MCP

  • Add StageProfilingInfo struct to aimdb-core/src/remote/metadata.rs
  • Extend RecordMetadata with optional stage_profiling field
  • Update collect_metadata() to include profiling data when feature enabled
  • Add get_stage_profiling MCP tool
  • Add reset_stage_profiling MCP tool
  • Add bottleneck detection logic (identify slowest stage)
  • Update MCP README documentation
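
The bottleneck detection task could be as simple as picking the stage with the highest average time. In this sketch the fields of `StageProfilingInfo` are guessed from the JSON in the acceptance criteria, and `find_bottleneck` is a hypothetical name; ranking by `avg_time_ns` (rather than total or max) is also an assumption.

```rust
/// Per-stage snapshot (fields assumed from the example MCP output).
#[derive(Debug, Clone, PartialEq)]
pub struct StageProfilingInfo {
    pub stage_type: String,
    pub index: usize,
    pub name: Option<String>,
    pub call_count: u64,
    pub avg_time_ns: u64,
}

/// Returns the stage with the highest average time, ignoring stages
/// that have never been invoked. Returns `None` if nothing qualifies.
pub fn find_bottleneck(stages: &[StageProfilingInfo]) -> Option<&StageProfilingInfo> {
    stages
        .iter()
        .filter(|stage| stage.call_count > 0)
        .max_by_key(|stage| stage.avg_time_ns)
}
```

With a source averaging 5 ms and a tap averaging 40 ms, this selects the tap, matching the `bottleneck` object in the acceptance criteria.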

Phase 5: Embassy Support

  • Add portable-atomic dependency (conditional on profiling feature)
  • Verify StageMetrics works with portable-atomic::AtomicU64
  • Cross-compile test: cargo check --target thumbv7em-none-eabihf --features profiling

Phase 6: Testing & Documentation

  • Integration test with examples/remote-access-demo
  • Add profiling example to documentation
  • Document wall-clock limitation in user-facing docs
  • Run full CI: make all

Acceptance Criteria

  1. With --features profiling enabled, get_stage_profiling MCP tool returns:

    {
      "record": "sensor::Temperature",
      "stages": [
        {
          "stage_type": "source",
          "index": 0,
          "name": "sensor_reader",
          "call_count": 1000,
          "avg_time_ns": 5000000
        },
        {
          "stage_type": "tap",
          "index": 0,
          "name": "data_processor",
          "call_count": 1000,
          "avg_time_ns": 40000000
        }
      ],
      "bottleneck": {
        "stage_type": "tap",
        "name": "data_processor",
        "avg_time_ns": 40000000
      }
    }
  2. Without the profiling feature, there is no performance impact

  3. Embassy adapter compiles for thumbv7em-none-eabihf with profiling enabled

  4. All existing tests pass with and without profiling feature

  5. Stage names set via .with_name() appear in MCP output

Example Usage

// Register source with profiling name
db.record::<Temperature>()
    .source(|producer| async move {
        loop {
            let temp = read_sensor().await;
            producer.produce(temp).await;
        }
    })
    .with_name("sensor_reader");

// Register tap with profiling name
db.record::<Temperature>()
    .tap(|value| async move {
        process_temperature(value).await;
    })
    .with_name("data_processor");

Then query via MCP:

mcp_aimdb_get_stage_profiling(socket_path, "sensor::Temperature")

Out of Scope

  • CPU time tracking (only wall-clock time)
  • Histogram/percentile tracking (P50/P95/P99)
  • External StageProfiler API for user code
  • Pipeline-level composition patterns

Related

  • RFC: docs/design/014-M6-stage-profiling.md
  • Buffer metrics feature: docs/design/013-M6-buffer-introspection-metrics.md

Labels: enhancement
