feat: Online tuning for hipblaslt gemm #277
GlobalBackendManager.set_auto_tune(True)  # or set PRIMUS_TURBO_AUTO_TUNE=1
# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)
GlobalBackendManager.set_auto_tune(1)
I think reusing set_auto_tune will cause confusion. I suggest adding a separate API, GlobalBackendManager.set_tune_level(), to control the tuning level:
- With set_auto_tune(False), auto-tuning is disabled.
- With set_auto_tune(True) and set_tune_level(1), only backend selection is used.
- With set_auto_tune(True) and set_tune_level(2), backend selection and hipblaslt multi-algo tuning are used.
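To make the suggestion concrete, here is a minimal sketch of how the two switches could interact. BackendManagerSketch, TuneLevel, and effective_level are hypothetical names used only for illustration; this is not the actual GlobalBackendManager code, and set_tune_level does not exist in the library yet.

```python
from enum import IntEnum


class TuneLevel(IntEnum):
    """Hypothetical tuning levels, mirroring the two levels discussed above."""

    BACKEND_SELECTION = 1   # choose the best backend only
    BACKEND_AND_ALGO = 2    # also run hipblaslt multi-algo tuning


class BackendManagerSketch:
    """Stand-in for GlobalBackendManager (assumption, not the real class)."""

    _auto_tune = False
    _tune_level = TuneLevel.BACKEND_SELECTION

    @classmethod
    def set_auto_tune(cls, enabled: bool) -> None:
        # On/off switch only; it no longer carries the level.
        cls._auto_tune = bool(enabled)

    @classmethod
    def set_tune_level(cls, level: int) -> None:
        # TuneLevel(level) raises ValueError for anything other than 1 or 2.
        cls._tune_level = TuneLevel(level)

    @classmethod
    def effective_level(cls) -> int:
        # 0 means tuning is disabled, regardless of the stored level.
        return int(cls._tune_level) if cls._auto_tune else 0
```

Keeping the boolean and the level separate means set_auto_tune(True) keeps its old meaning (level 1 by default), so existing callers would not need to change.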
from primus_turbo.pytorch.core.backend import GlobalBackendManager

GlobalBackendManager.set_auto_tune(True)  # or set PRIMUS_TURBO_AUTO_TUNE=1
# Level 1: backend selection only (same as the old True / PRIMUS_TURBO_AUTO_TUNE=1)
Also add an environment variable to control the auto-tune level.
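A rough sketch of what reading such a variable might look like. PRIMUS_TURBO_AUTO_TUNE comes from the snippet above; PRIMUS_TURBO_AUTO_TUNE_LEVEL is a made-up name for illustration, as is read_tune_settings, and the assumption that only the value "1" enables tuning may not match how the library parses it.

```python
import os
from typing import Mapping, Optional, Tuple


def read_tune_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, int]:
    """Return (auto_tune_enabled, tune_level) from environment variables.

    PRIMUS_TURBO_AUTO_TUNE_LEVEL is a hypothetical variable name.
    """
    env = os.environ if env is None else env
    # Assumption: "1" turns auto-tuning on, anything else leaves it off.
    enabled = env.get("PRIMUS_TURBO_AUTO_TUNE", "0") == "1"
    # Default to level 1 so existing setups keep backend selection only.
    level = int(env.get("PRIMUS_TURBO_AUTO_TUNE_LEVEL", "1"))
    if level not in (1, 2):
        raise ValueError(f"unsupported tune level: {level}")
    return enabled, level
```

Passing a mapping instead of reading os.environ directly keeps the parsing easy to unit test.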
The new feature added in primus_turbo/pytorch/core/backend.py::AutoKernelDispatcher is too specific to hipblaslt, so I don't recommend adding it here. With a different backend, the tuned parameter may no longer be an algo_index.
This benchmark file can be removed. We should try to consolidate everything into the existing bench_gemm_turbo.py as much as possible.
Can we remove hipblaslt_gemm_algo_count? We could replace it with something like: This would mean that when Level 2 is enabled, it triggers online_tune_hipblaslt_gemm, obtains the best result, and writes it into the cache. The idea is that the upper-level framework doesn’t need to be aware of how many algos each backend has. It only needs to specify a maximum tuning budget and retrieve the best-performing result for the current backend.
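The budget idea could be sketched roughly as follows. This is not the snippet referred to in the comment and not the project's code: tune_with_budget, TuneCache, and their parameters are hypothetical, and the candidates are treated as opaque values so that nothing assumes an algo_index.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def tune_with_budget(
    candidates: Iterable[Hashable],
    measure: Callable[[Hashable], float],
    max_trials: int,
) -> Tuple[Hashable, float]:
    """Time at most max_trials candidates and return (best_config, best_time)."""
    if max_trials < 1:
        raise ValueError("max_trials must be at least 1")
    best: Optional[Tuple[Hashable, float]] = None
    for trial, config in enumerate(candidates):
        if trial >= max_trials:
            break  # budget exhausted; the caller never sees the candidate count
        elapsed = measure(config)
        if best is None or elapsed < best[1]:
            best = (config, elapsed)
    if best is None:
        raise ValueError("backend produced no candidates")
    return best


class TuneCache:
    """Caches the best config per problem key; configs stay backend-opaque."""

    def __init__(self) -> None:
        self._best: Dict[Hashable, Hashable] = {}

    def get_or_tune(self, key, candidates, measure, max_trials):
        # Tune only on the first call for a given key, then reuse the result.
        if key not in self._best:
            self._best[key] = tune_with_budget(candidates, measure, max_trials)[0]
        return self._best[key]
```

With this shape the dispatcher passes only a key and a budget. Each backend decides what a candidate is (an algo index for hipblaslt, something else for another backend) and how to time it.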
No description provided.