feat: Online tuning for hipblaslt gemm by Z-Y00 · Pull Request #277 · AMD-AGI/Primus-Turbo

Z-Y00 · 2026-04-10T00:53:03Z

No description provided.

RuibinCheung · 2026-04-21T02:42:07Z


-GlobalBackendManager.set_auto_tune(True)  # or set PRIMUS_TURBO_AUTO_TUNE=1
+# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)
+GlobalBackendManager.set_auto_tune(1)


I think reusing set_auto_tune will cause confusion. I suggest adding a separate API, GlobalBackendManager.set_tune_level(), to control the tuning level.

With set_auto_tune(False), auto tuning is disabled.

With set_auto_tune(True) and set_tune_level(1), only backend selection is used.

With set_auto_tune(True) and set_tune_level(2), both backend selection and hipblaslt multi-algo tuning are used.
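
To make the suggested split concrete, here is a minimal sketch of how the on/off switch and the level could be kept separate. The class `TuneConfigSketch` and its `effective_level()` helper are hypothetical stand-ins invented for illustration; they are not the real `GlobalBackendManager`, and only the names `set_auto_tune` and `set_tune_level` come from the comment above.

```python
class TuneConfigSketch:
    """Hypothetical stand-in for GlobalBackendManager (illustration only)."""

    _auto_tune = False  # master on/off switch
    _tune_level = 1     # 1 = backend selection, 2 = + hipblaslt multi-algo tuning

    @classmethod
    def set_auto_tune(cls, enabled: bool) -> None:
        # Keeps the existing boolean meaning: True enables, False disables.
        cls._auto_tune = bool(enabled)

    @classmethod
    def set_tune_level(cls, level: int) -> None:
        # Only the two levels described in the review are accepted here.
        if level not in (1, 2):
            raise ValueError(f"tune level must be 1 or 2, got {level!r}")
        cls._tune_level = level

    @classmethod
    def effective_level(cls) -> int:
        # Assumed helper: 0 means tuning is off, whatever the stored level is.
        return cls._tune_level if cls._auto_tune else 0


TuneConfigSketch.set_auto_tune(True)
TuneConfigSketch.set_tune_level(2)
level_when_enabled = TuneConfigSketch.effective_level()

TuneConfigSketch.set_auto_tune(False)
level_when_disabled = TuneConfigSketch.effective_level()
```

With this shape, existing callers of set_auto_tune(True) keep their behavior (level 1 by default), and the level only matters once auto tuning is switched on.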

RuibinCheung · 2026-04-21T02:43:34Z

 from primus_turbo.pytorch.core.backend import GlobalBackendManager

-GlobalBackendManager.set_auto_tune(True)  # or set PRIMUS_TURBO_AUTO_TUNE=1
+# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)


Also add an environment variable to control the auto-tune level.
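
One possible way to read such a variable is sketched below. The variable name `PRIMUS_TURBO_TUNE_LEVEL` and the function `read_tune_level` are assumptions made up for this example; the PR does not name the variable, and only `PRIMUS_TURBO_AUTO_TUNE` appears in the diff.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; the review does not specify one.
TUNE_LEVEL_ENV = "PRIMUS_TURBO_TUNE_LEVEL"


def read_tune_level(env: Optional[Mapping[str, str]] = None, default: int = 1) -> int:
    """Return the tuning level from the environment, or `default` if unset."""
    env = os.environ if env is None else env
    raw = env.get(TUNE_LEVEL_ENV)
    if raw is None:
        return default
    try:
        level = int(raw)
    except ValueError:
        raise ValueError(f"{TUNE_LEVEL_ENV} must be an integer, got {raw!r}") from None
    if level not in (1, 2):
        raise ValueError(f"{TUNE_LEVEL_ENV} must be 1 or 2, got {level}")
    return level


level_unset = read_tune_level({})
level_two = read_tune_level({TUNE_LEVEL_ENV: "2"})
```

Passing the environment as a mapping keeps the parser easy to test without mutating `os.environ`.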

xiaobochen-amd · 2026-04-21T08:43:44Z

The new feature added in primus_turbo/pytorch/core/backend.py::AutoKernelDispatcher is too specific to hipblaslt, so I don't recommend adding it here. With a different backend, the tuned parameter may no longer be an algo_index.

xiaobochen-amd · 2026-04-21T08:46:04Z

This benchmark file can be removed. We should try to consolidate everything into the existing bench_gemm_turbo.py as much as possible.

xiaobochen-amd · 2026-04-21T09:01:50Z

Can we remove hipblaslt_gemm_algo_count?

We could replace it with something like:
online_tune_hipblaslt_gemm(..., max_num_tune=50), which directly returns the best algo_index (feel free to decide the exact function name).

This would mean that when Level 2 is enabled, it triggers online_tune_hipblaslt_gemm, obtains the best result, and writes it into the cache.

The idea is that the upper-level framework doesn’t need to be aware of how many algos each backend has. It only needs to specify a maximum tuning budget and retrieve the best-performing result for the current backend.
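
The budget-based interface described above can be sketched in a backend-agnostic way. Everything in this block is illustrative: `online_tune`, `best_algo_for`, the `measure` callback, and the dictionary cache are hypothetical names, and real timing of hipblaslt kernels is replaced by a caller-supplied cost function.

```python
from typing import Callable, Dict, Hashable, Iterable

# Hypothetical per-problem cache: key (e.g. GEMM shape and dtype) -> best candidate.
_tune_cache: Dict[Hashable, int] = {}


def online_tune(
    candidates: Iterable[int],
    measure: Callable[[int], float],
    max_num_tune: int = 50,
) -> int:
    """Try at most `max_num_tune` candidates and return the cheapest one.

    The caller never learns how many candidates the backend has; it only
    supplies a budget and receives the best result found within it.
    """
    best, best_cost = None, float("inf")
    for tried, cand in enumerate(candidates):
        if tried >= max_num_tune:
            break
        cost = measure(cand)  # stand-in for timing one kernel launch
        if cost < best_cost:
            best, best_cost = cand, cost
    if best is None:
        raise ValueError("no candidates were tuned")
    return best


def best_algo_for(key, candidates, measure, max_num_tune=50) -> int:
    """Tune once per key, then serve later calls from the cache."""
    if key not in _tune_cache:
        _tune_cache[key] = online_tune(candidates, measure, max_num_tune)
    return _tune_cache[key]


# Toy costs per candidate index; lower is better.
costs = {0: 3.0, 1: 1.0, 2: 2.0, 3: 0.5}
best_small_budget = online_tune(range(4), costs.__getitem__, max_num_tune=3)
best_full_budget = online_tune(range(4), costs.__getitem__, max_num_tune=50)
```

In the toy run, a budget of 3 never reaches candidate 3, so the result is the best of the first three; a larger budget finds the global best. A backend whose tuning knob is not an integer index could return any opaque handle instead.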

Z-Y00 requested review from wenxie-amd and xiaobochen-amd as code owners April 10, 2026 00:53

Z-Y00 force-pushed the online_tune branch 4 times, most recently from 3ff9df4 to 7bb0243 Compare April 17, 2026 21:59

feat: Online tuning for hipblaslt gemm

db92df1

Z-Y00 force-pushed the online_tune branch from 7bb0243 to db92df1 Compare April 17, 2026 22:10

xiaobochen-amd requested a review from RuibinCheung April 21, 2026 02:30

RuibinCheung requested changes Apr 21, 2026

xiaobochen-amd reviewed Apr 21, 2026

jasainio mentioned this pull request Apr 30, 2026

opt(gemm): add hipBLASLt algorithm cache and thread-local workspace #321

Open

12 tasks

Z-Y00 wants to merge 1 commit into AMD-AGI:main from Z-Y00:online_tune


3 participants
