[AIROCMLIR-707] Make attentionSweeps split_kv filter device-memory-aware #2366

Draft
bogdan-petkovic wants to merge 1 commit into ROCm:develop from bogdan-petkovic:bogdan-petkovic/attention-sweeps-device-memory-filter

@bogdan-petkovic
Contributor

Motivation

Some attention sweep samples with large split_kv values need very large temporary storage and are currently rejected as FAIL only after they have entered the pipeline.
This PR adds an early sweep-side prefilter so memory-heavy splitKV cases can be skipped before expensive execution, reducing wasted sweep effort while keeping existing compiler/runtime behavior unchanged.

Technical Details

Updated mlir/utils/performance/attentionSweeps.py to add a device-memory-aware splitKV prefilter in sampling:

  • Added splitKV extra-storage estimator for sampled attention shapes.
  • Added default splitKV limit policy based on visible device memory:
    • deviceMem / 8, clamped to [1 GiB, 8 GiB]
    • fallback to 1.5 GiB when memory query is unavailable.
  • Added CLI control:
    • --splitkv-extra-bytes-limit (non-negative int64-validated override).
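
The limit policy and CLI override described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names (`default_splitkv_limit`, `non_negative_int64`, `resolve_limit`) are hypothetical, and treating a non-positive memory reading as "unavailable" and using integer division are assumptions.

```python
import argparse

GIB = 1 << 30
INT64_MAX = (1 << 63) - 1


def default_splitkv_limit(device_mem_bytes):
    """Hypothetical helper: deviceMem / 8 clamped to [1 GiB, 8 GiB].

    Falls back to 1.5 GiB when the memory query is unavailable
    (modelled here as None or a non-positive value; an assumption).
    """
    if device_mem_bytes is None or device_mem_bytes <= 0:
        return 3 * GIB // 2
    return max(1 * GIB, min(8 * GIB, device_mem_bytes // 8))


def non_negative_int64(text):
    """argparse type callback: accept only integers in [0, 2**63 - 1]."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer: {text!r}")
    if not 0 <= value <= INT64_MAX:
        raise argparse.ArgumentTypeError(
            f"outside non-negative int64 range: {text!r}")
    return value


def resolve_limit(argv, device_mem_bytes):
    """Use the CLI override when given, otherwise the default policy."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--splitkv-extra-bytes-limit",
                        type=non_negative_int64, default=None)
    args = parser.parse_args(argv)
    # Compare against None so that an explicit 0 is honoured as an override.
    if args.splitkv_extra_bytes_limit is not None:
        return args.splitkv_extra_bytes_limit
    return default_splitkv_limit(device_mem_bytes)
```

Under these assumptions, a device reporting 32 GiB would get a 4 GiB limit, a 4 GiB device would be clamped up to 1 GiB, and a 192 GiB device would be clamped down to 8 GiB.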

Refactored sample filtering to track reasons separately:

  • MAX_TOKENS filter
  • splitKV extra-storage filter
  • cumulative reporting across initial and refill sampling batches.
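
One way to picture the per-reason accounting is a shared counter that survives across sampling batches. The sketch below is illustrative only: `filter_samples`, the sample dictionary fields, and both toy callbacks are hypothetical, and `toy_extra_bytes` in particular is a placeholder, not the estimator added in this PR.

```python
from collections import Counter

GIB = 1 << 30


def filter_samples(samples, max_tokens, splitkv_limit,
                   count_tokens, estimate_extra_bytes, stats):
    """Keep samples that pass both filters; record each rejection reason.

    `stats` is a Counter shared by the caller so that counts accumulate
    across the initial batch and any refill batches.
    """
    kept = []
    for sample in samples:
        if count_tokens(sample) > max_tokens:
            stats["max_tokens"] += 1
        elif estimate_extra_bytes(sample) > splitkv_limit:
            stats["splitkv_extra_bytes"] += 1
        else:
            stats["kept"] += 1
            kept.append(sample)
    return kept


# Placeholder callbacks (assumptions, not the real sweep logic).
def toy_tokens(s):
    return s["batch"] * s["seq_len"]


def toy_extra_bytes(s):
    if s["split_kv"] <= 1:
        return 0
    # Pretend each split stores one f32 partial output per element.
    return (s["batch"] * s["heads"] * s["split_kv"]
            * s["seq_len"] * s["head_dim"] * 4)


stats = Counter()
initial = [
    {"batch": 1, "seq_len": 1024, "heads": 8, "head_dim": 128, "split_kv": 1},
    {"batch": 8, "seq_len": 1024, "heads": 8, "head_dim": 128, "split_kv": 1},
    {"batch": 1, "seq_len": 4096, "heads": 32, "head_dim": 128, "split_kv": 32},
]
refill = [
    {"batch": 1, "seq_len": 4096, "heads": 32, "head_dim": 128, "split_kv": 4},
    {"batch": 16, "seq_len": 512, "heads": 8, "head_dim": 128, "split_kv": 1},
]
kept = filter_samples(initial, 4096, 1 * GIB, toy_tokens, toy_extra_bytes, stats)
kept += filter_samples(refill, 4096, 1 * GIB, toy_tokens, toy_extra_bytes, stats)
print(dict(stats))
```

Because the same `stats` object is passed to both calls, the final report covers the initial and refill batches together, while each rejected sample is attributed to exactly one reason.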

Added focused unit tests in mlir/utils/performance/tests/test_attentionSweeps.py:

  • limit policy behavior (clamp/fallback),
  • splitKV estimator behavior,
  • per-reason filter accounting.

No compiler/verifier logic was changed in this PR; scope is limited to sweep generation/filtering behavior.

Test Plan

  • Run attentionSweeps.py through CI to validate end-to-end sweep behavior with the new splitKV prefilter.
  • Confirm that memory-heavy splitKV cases are filtered before expensive execution stages.

Test Result

  • CI attention-sweep run

Submission Checklist

Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>