ci: use python 3.12 venv for test-kernels install step #913

Draft
mgehre-amd wants to merge 2 commits into gfx11 from matthias.ci-test-kernels-venv


@mgehre-amd mgehre-amd commented Apr 30, 2026

The runners have started injecting `ln -sf /usr/bin/python3 /usr/local/bin/python` into the Docker container. This PR reverses that to keep our CI running.
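
One way to confirm whether a container is affected is to check where the `python` symlink actually resolves. The helper below is a minimal sketch: `python_link_ok` is a hypothetical name, not something from this PR or the workflow, and it assumes GNU `readlink -f` is available in the image.

```shell
# Hypothetical helper: succeed only if LINK resolves to the same file as EXPECTED.
python_link_ok() {
    link="$1"
    expected="$2"
    # readlink -f (GNU coreutils) follows every symlink in the chain.
    actual=$(readlink -f "$link") || return 1
    wanted=$(readlink -f "$expected") || return 1
    if [ "$actual" != "$wanted" ]; then
        echo "$link resolves to $actual, expected $wanted" >&2
        return 1
    fi
}
```

A workflow step could call it as `python_link_ok /usr/local/bin/python /usr/local/bin/python3.12`, where the `python3.12` path is an assumption about the image layout.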

@eble-amd eble-amd force-pushed the matthias.ci-test-kernels-venv branch from 1eccb1b to 0029352 Compare May 1, 2026 19:25

eble-amd commented May 1, 2026

The last push was a rebase.


eble-amd commented May 1, 2026

After the rebase, it looks like this change is an improvement, but now there seems to be a problem accessing the GPU. This is the focal point in the log:

AssertionError: Invalid device id
PyTorch 2.11.0+rocm7.13.0a20260501, HIP: 7.13.26173

@mgehre-amd mgehre-amd force-pushed the matthias.ci-test-kernels-venv branch from 0029352 to f9e5787 Compare May 4, 2026 06:07
@mgehre-amd mgehre-amd requested a review from roberteg16 May 4, 2026 06:49
@mgehre-amd mgehre-amd marked this pull request as ready for review May 4, 2026 06:49
Comment thread .github/workflows/build-rocm-wheels.yml Outdated
@mgehre-amd mgehre-amd force-pushed the matthias.ci-test-kernels-venv branch from f9e5787 to 37adf78 Compare May 4, 2026 07:35
@mgehre-amd mgehre-amd force-pushed the matthias.ci-test-kernels-venv branch 6 times, most recently from 72efe1c to 0c91f4d Compare May 4, 2026 14:39
The self-hosted runner's container startup command runs
`ln -sf /usr/bin/python3 /usr/local/bin/python`, repointing the image's
python3.12 symlink at Debian's PEP-668 externally-managed python3.11.
This breaks /usr/local/bin/rocm-smi (shebang #!/usr/local/bin/python ->
ModuleNotFoundError: rocm_sdk_core) and any later `python ...` call in
the workflow. Restore the symlink to python3.12 as the very first step
so subsequent steps see the image's intended interpreter.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
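
The restore step described in the commit message above could look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: `restore_python_symlink` is a hypothetical helper, and the location of the image's python3.12 binary is not given in the conversation, so it is passed as a parameter.

```shell
# Hypothetical helper: repoint LINK at TARGET, refusing to create a dangling link.
restore_python_symlink() {
    link="$1"      # e.g. /usr/local/bin/python
    target="$2"    # e.g. the image's python3.12 binary (path is an assumption)
    if [ ! -x "$target" ]; then
        echo "target interpreter not found or not executable: $target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a plain file.
    ln -sfn "$target" "$link"
}
```

Running this as the first step of the job, before anything invokes `python` or `rocm-smi`, matches the ordering the commit message calls for. Checking the target first keeps a wrong path from silently replacing one broken link with another.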
@mgehre-amd mgehre-amd force-pushed the matthias.ci-test-kernels-venv branch from 0dc29e2 to e24fb5f Compare May 4, 2026 16:12
@mgehre-amd mgehre-amd marked this pull request as draft May 4, 2026 16:38
@mgehre-amd
Author

The runner configuration was updated, so we don't need this PR anymore.


3 participants