feat: Online tuning for hipblaslt gemm #277

Open
Z-Y00 wants to merge 1 commit into AMD-AGI:main from Z-Y00:online_tune

Conversation

@Z-Y00 Z-Y00 commented Apr 10, 2026

No description provided.

@Z-Y00 Z-Y00 force-pushed the online_tune branch 4 times, most recently from 3ff9df4 to 7bb0243 Compare April 17, 2026 21:59
Comment thread docs/examples.md

GlobalBackendManager.set_auto_tune(True) # or set PRIMUS_TURBO_AUTO_TUNE=1
# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)
GlobalBackendManager.set_auto_tune(1)
Collaborator

I think reusing set_auto_tune for this will cause confusion. I suggest adding a separate API, GlobalBackendManager.set_tune_level(), to control the tuning level:

  1. set_auto_tune(False): auto-tuning is disabled.
  2. set_auto_tune(True) with set_tune_level(1): backend selection only.
  3. set_auto_tune(True) with set_tune_level(2): backend selection plus hipblaslt multi-algo tuning.
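
The three cases above could be sketched roughly as follows. This is only an illustration of the proposed split: `BackendManagerSketch` and `effective_tune_level` are hypothetical names, not the real `GlobalBackendManager`, and `set_tune_level` is the API being suggested here rather than one that exists today.

```python
class BackendManagerSketch:
    """Hypothetical stand-in for GlobalBackendManager (not the real class)."""

    _auto_tune = False  # on/off switch, as set_auto_tune controls today
    _tune_level = 1     # proposed: 1 = backend selection, 2 = + multi-algo tuning

    @classmethod
    def set_auto_tune(cls, enabled: bool) -> None:
        cls._auto_tune = bool(enabled)

    @classmethod
    def set_tune_level(cls, level: int) -> None:
        # Assumption: only levels 1 and 2 are meaningful, per the list above.
        if level not in (1, 2):
            raise ValueError(f"unsupported tune level: {level!r}")
        cls._tune_level = level

    @classmethod
    def effective_tune_level(cls) -> int:
        """Return 0 when auto-tune is off, otherwise the configured level."""
        return cls._tune_level if cls._auto_tune else 0
```

With this split, `set_auto_tune(True)` alone keeps the existing behaviour (level 1), and level 2 is opt-in.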

Comment thread docs/examples.md
from primus_turbo.pytorch.core.backend import GlobalBackendManager

GlobalBackendManager.set_auto_tune(True) # or set PRIMUS_TURBO_AUTO_TUNE=1
# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)
Collaborator

Also add an environment variable to control the auto-tune level.
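
One possible shape for that, as a sketch only: `PRIMUS_TURBO_AUTO_TUNE` is the existing switch from the docs, while the variable name `PRIMUS_TURBO_AUTO_TUNE_LEVEL` and the helper `tune_level_from_env` are assumptions made up for this example.

```python
import os
from typing import Mapping, Optional


def tune_level_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Return 0 (off), 1 (backend selection) or 2 (+ multi-algo tuning)."""
    env = os.environ if env is None else env
    # Existing switch: auto-tune is only on when the variable is exactly "1".
    if env.get("PRIMUS_TURBO_AUTO_TUNE", "0") != "1":
        return 0
    # Hypothetical variable name; defaults to level 1 to keep old behaviour.
    level = int(env.get("PRIMUS_TURBO_AUTO_TUNE_LEVEL", "1"))
    if level not in (1, 2):
        raise ValueError(f"unsupported tune level: {level}")
    return level
```

Defaulting to level 1 means existing users who only set `PRIMUS_TURBO_AUTO_TUNE=1` would see no change.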

@xiaobochen-amd
Collaborator

The new feature added in primus_turbo/pytorch/core/backend.py::AutoKernelDispatcher is too specific to hipblaslt, so I don't recommend adding it here. With a different backend, the tuned choice may no longer be an algo_index.

Collaborator

This benchmark file can be removed. We should try to consolidate everything into the existing bench_gemm_turbo.py as much as possible.

@xiaobochen-amd
Collaborator

Can we remove hipblaslt_gemm_algo_count?

We could replace it with something like:
online_tune_hipblaslt_gemm(..., max_num_tune=50), which directly returns the best algo_index (feel free to decide the exact function name).

This would mean that when Level 2 is enabled, it triggers online_tune_hipblaslt_gemm, obtains the best result, and writes it into the cache.

The idea is that the upper-level framework doesn’t need to be aware of how many algos each backend has. It only needs to specify a maximum tuning budget and retrieve the best-performing result for the current backend.
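
A minimal, backend-agnostic sketch of that flow, under stated assumptions: `online_tune`, `tuned_index`, and the module-level cache are hypothetical names, each candidate is modelled as a plain zero-argument callable, and wall-clock timing stands in for proper GPU timing (a real implementation would need device synchronization or events around each run).

```python
import time
from typing import Callable, Dict, Hashable, Sequence


def online_tune(
    candidates: Sequence[Callable[[], object]],
    max_num_tune: int = 50,
    timer: Callable[[], float] = time.perf_counter,
) -> int:
    """Time at most max_num_tune candidates and return the fastest one's index."""
    best_index, best_time = -1, float("inf")
    for index, run in enumerate(candidates[:max_num_tune]):
        start = timer()
        run()  # assumption: blocks until the work is finished
        elapsed = timer() - start
        if elapsed < best_time:
            best_index, best_time = index, elapsed
    if best_index < 0:
        raise ValueError("no candidates to tune")
    return best_index


# Hypothetical cache keyed by problem description (e.g. shapes and dtypes).
_tune_cache: Dict[Hashable, int] = {}


def tuned_index(
    key: Hashable,
    candidates: Sequence[Callable[[], object]],
    max_num_tune: int = 50,
) -> int:
    """Tune once per key, then serve the cached best index."""
    if key not in _tune_cache:
        _tune_cache[key] = online_tune(candidates, max_num_tune)
    return _tune_cache[key]
```

The caller only passes a budget and gets back a result; how many algos the backend actually has stays inside the backend.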

3 participants