ci(docker): install triton_kernels for ATOM_USE_TRITON_MOE #687
Merged
`triton_kernels` is the companion pure-Python package shipped under `python/triton_kernels` in the ROCm/triton checkout. ATOM imports it when `ATOM_USE_TRITON_MOE=1` (DeepSeek-V4 launch path, fused_moe_triton.py, moe.py). Without it the model fails at import with `ModuleNotFoundError: No module named 'triton_kernels'`.

- build_triton stage: pip install python/triton_kernels after the main triton install (pure-Python, no extra build deps).
- atom_image final stage: copy `triton_kernels` + `triton_kernels-*.dist-info` alongside the existing triton mount-copy.

Derived stages are unaffected:
- atom_oot only uninstalls/restores `triton` (glob does not match).
- atom_sglang reinstalls `triton==3.6.0`, does not touch triton_kernels.
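The failure mode described above could also be surfaced before model import with a small preflight check. The following is a minimal sketch, not ATOM code: `missing_triton_moe_deps` is a hypothetical helper, and treating exactly the string `"1"` as "enabled" is an assumption based on how the flag is written in this PR.

```python
import importlib.util
import os


def missing_triton_moe_deps(env=None):
    """Return the modules the Triton MoE path needs but cannot find.

    Hypothetical helper (not part of ATOM). It only mirrors the rule stated
    in the PR: triton_kernels is required when ATOM_USE_TRITON_MOE=1.
    """
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" enables the Triton MoE path.
    if env.get("ATOM_USE_TRITON_MOE") != "1":
        return []  # Path not requested, so nothing to check.
    required = ("triton", "triton_kernels")
    # find_spec locates a top-level module without importing it.
    return [name for name in required if importlib.util.find_spec(name) is None]
```

Calling `missing_triton_moe_deps()` at startup and aborting on a non-empty result would report the missing package by name instead of failing deep inside the model import.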
Pull request overview
This PR updates the ATOM Docker base image build to ensure the sibling triton_kernels Python package (from the same ROCm/triton checkout) is installed and present in the final atom_image, preventing runtime ModuleNotFoundError: No module named 'triton_kernels' for Triton-MoE code paths.
Changes:
- Install `./python/triton_kernels` in the `build_triton` stage after installing Triton, and validate it imports.
- Copy `triton_kernels/` and its `*.dist-info/` from `build_triton` into the final `atom_image` venv alongside `triton/`, and validate both imports.
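The PR does not say why the `*.dist-info/` directory is copied along with the package. One likely reason is that Python's packaging metadata lives there, so tools that query installed distributions only see a package whose `dist-info` is present. The sketch below illustrates this with a made-up package name (`demo_kernels`) and version in temporary directories; it does not touch a real `triton_kernels` install.

```python
import importlib.metadata
import pathlib
import tempfile


def make_site(root, name, with_dist_info):
    """Create a fake site-packages dir containing a stand-in package."""
    site = root / name
    pkg = site / "demo_kernels"  # made-up package name for illustration
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    if with_dist_info:
        info = site / "demo_kernels-0.1.0.dist-info"
        info.mkdir()
        (info / "METADATA").write_text(
            "Metadata-Version: 2.1\nName: demo_kernels\nVersion: 0.1.0\n"
        )
    return site


def installed_names(site_dir):
    """List distribution names that have metadata under site_dir."""
    dists = importlib.metadata.distributions(path=[str(site_dir)])
    return sorted(dist.metadata["Name"] for dist in dists)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Package directory only: importable, but no distribution metadata.
    package_only = installed_names(make_site(root, "a", with_dist_info=False))
    # Package plus dist-info: visible to importlib.metadata.
    package_and_metadata = installed_names(make_site(root, "b", with_dist_info=True))

print(package_only)
print(package_and_metadata)
```

The package directory alone is enough for `import` to succeed; the `dist-info` directory is what makes the install discoverable through metadata queries.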
Comment on lines +314 to +315
    # Companion `triton_kernels` package (pure-Python). Required by ATOM when
    # `ATOM_USE_TRITON_MOE=1` (e.g. DeepSeek-V4 launch path). Lives under
    # Triton: copy package from build stage into current venv
    # Use RUN --mount to avoid COPY glob issues, and preserve mori already in venv
    # Also copy the companion `triton_kernels` package (pure-Python, built in the
    # same stage) — required by ATOM_USE_TRITON_MOE=1 model paths.
Summary
Install the companion `triton_kernels` package in the ATOM Docker base image so models that opt into the Triton MoE path (e.g. DeepSeek-V4 with `ATOM_USE_TRITON_MOE=1`) can import it.

`triton_kernels` is shipped as a sibling pure-Python package under `python/triton_kernels` in the same `ROCm/triton` checkout we already clone in the `build_triton` stage; it is not installed by `pip install .` of the main triton package and was therefore missing from images.

Without this change, launching V4-Pro fails at import with `ModuleNotFoundError: No module named 'triton_kernels'` (callers: `atom/model_ops/fused_moe_triton.py`, `atom/model_ops/moe.py:692-697`).

Changes
- `build_triton` stage: `pip install ./python/triton_kernels` after the main triton install (pure-Python, no extra build deps).
- `atom_image` final stage: extend the existing `--mount=from=build_triton` cp block to also copy `triton_kernels/` and `triton_kernels-*.dist-info/` alongside `triton/`.
- `python -c "import triton_kernels; ..."` validation so the build fails early if the copy regresses.

Derived stages (verified safe)
- `atom_oot`: triton backup/restore globs only match `triton/` and `triton-*.dist-info`; `triton_kernels` (separate package) survives untouched.
- `atom_sglang`: reinstalls `triton==3.6.0`, does not touch `triton_kernels`.

Test plan
- `docker build --target atom_image` reaches the new validation step and prints `triton_kernels: ...`.
- `python -c "import triton_kernels; print(triton_kernels.__file__)"` resolves inside the image.
- `ATOM_USE_TRITON_MOE=1` in the freshly built image: no `ModuleNotFoundError` and GSM8K-50 ≥ 0.94 (matches V4 baseline noted in PR "feat(deepseek_v4): PR1 skeleton — end-to-end inference with triton MoE" #650).
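The `atom_oot` claim (the triton globs do not reach `triton_kernels`) can be checked with a few lines of Python. This is a sketch under assumptions: the two patterns are taken from the description above, the directory names and version strings are made up, and `fnmatchcase` stands in for shell globbing, which behaves the same way for patterns this simple.

```python
from fnmatch import fnmatchcase

# Patterns as described for the atom_oot backup/restore step.
TRITON_GLOBS = ("triton", "triton-*.dist-info")

# Illustrative site-packages entries; the versions are placeholders.
entries = [
    "triton",
    "triton-0.0.0.dist-info",
    "triton_kernels",
    "triton_kernels-0.0.0.dist-info",
]


def matched(names, patterns):
    """Return the names matched by at least one glob pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


touched = matched(entries, TRITON_GLOBS)
survivors = [n for n in entries if n not in touched]
# For contrast: a looser pattern would also sweep up triton_kernels.
too_broad = matched(entries, ("triton*",))

print("touched:  ", touched)
print("survivors:", survivors)
print("triton* : ", too_broad)
```

The separation holds because `triton_kernels` continues with `_` where the dist-info pattern requires `-`. A future change that loosens the glob to `triton*` would break it, which is a case the build-time import validation is positioned to catch.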