[Benchmark] Benchmark probe issue, memory probe and estimation may be inaccurate for some kernels #1200

@lowdy1

Description

Issue with Current Probe Design

In the current design, estimate_kernel_peak_memory probes a kernel’s peak memory usage and returns peak_bytes:

peak_bytes = estimate_kernel_peak_memory(probe_fn=_probe)
kernel_bpt = peak_bytes // num_tokens  # if needed

This derives a bytes-per-token (BPT) estimate by assuming memory scales linearly with the number of tokens.


Limitation

This assumption only holds for kernels whose memory footprint scales linearly with num_tokens.

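However, for some kernels memory may scale non-linearly (e.g., quadratically) with num_tokens.

For such kernels, the computed kernel_bpt depends on the num_tokens used during the probe, so a value measured at one size can be inaccurate or misleading when applied to another.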
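To make the failure mode concrete, here is a minimal sketch of how a BPT value probed at one size extrapolates for a quadratic kernel. The helper quadratic_peak is a hypothetical stand-in for a kernel that materializes a num_tokens x num_tokens buffer; it is not part of the benchmark code, and the 4-byte element size is an assumption.

```python
def linear_bpt_estimate(peak_bytes: int, num_tokens: int) -> int:
    """Bytes-per-token under the linear-scaling assumption (same formula as above)."""
    return peak_bytes // num_tokens


def quadratic_peak(num_tokens: int, bytes_per_elem: int = 4) -> int:
    """Hypothetical kernel whose peak memory is a num_tokens x num_tokens buffer."""
    return bytes_per_elem * num_tokens * num_tokens


probe_tokens = 1024   # size used when probing
target_tokens = 4096  # size we later want to budget for

# BPT derived from a single probe at probe_tokens.
kernel_bpt = linear_bpt_estimate(quadratic_peak(probe_tokens), probe_tokens)

# Linear extrapolation versus the memory the kernel would really need.
predicted = kernel_bpt * target_tokens
actual = quadratic_peak(target_tokens)
underestimate_factor = actual / predicted

print(kernel_bpt, predicted, actual, underestimate_factor)
```

In this sketch the linear extrapolation underestimates the real peak by a factor equal to target_tokens / probe_tokens, so the error grows the further the target size is from the probed size.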


Summary

The current BPT estimation is:

  • ✅ Reasonable for linear-scaling kernels
  • ❌ Unreliable for kernels with non-linear or structured memory behavior
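One possible way to tell the two cases apart, offered only as a sketch, is to probe at several token counts and fit the slope of log(peak_bytes) against log(num_tokens). A slope near 1 suggests linear scaling, while a slope near 2 suggests quadratic scaling. The names estimate_scaling_exponent, is_roughly_linear, linear_kernel, and quadratic_kernel are hypothetical; in practice peak_fn would wrap a call to the real probe at a given num_tokens.

```python
import math
from typing import Callable, Sequence


def estimate_scaling_exponent(
    peak_fn: Callable[[int], int], token_counts: Sequence[int]
) -> float:
    """Least-squares slope of log(peak_bytes) against log(num_tokens)."""
    xs = [math.log(n) for n in token_counts]
    ys = [math.log(peak_fn(n)) for n in token_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def is_roughly_linear(exponent: float, tol: float = 0.1) -> bool:
    """True when the fitted exponent is within tol of 1.0."""
    return abs(exponent - 1.0) <= tol


# Hypothetical stand-ins for real probes (assumed peak-memory models).
def linear_kernel(num_tokens: int) -> int:
    return 512 * num_tokens


def quadratic_kernel(num_tokens: int) -> int:
    return 4 * num_tokens * num_tokens


sizes = [256, 512, 1024, 2048]
linear_exp = estimate_scaling_exponent(linear_kernel, sizes)
quadratic_exp = estimate_scaling_exponent(quadratic_kernel, sizes)
print(is_roughly_linear(linear_exp), is_roughly_linear(quadratic_exp))
```

A fixed memory overhead that does not depend on num_tokens would pull the fitted slope below the true exponent at small sizes, so the probe sizes would need to be large enough for the token-dependent term to dominate.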
