Auto-build flash-attn wheels on push, upload to S3#910
Draft
mgehre-amd wants to merge 4 commits into gfx11
Replace the GitHub Releases / gh-pages publishing path with a direct upload to s3://aig-embd-gfx11-wheels/simple/flash-attn/ (the same PEP 503 index used by build-rocm-wheels.yml).

Each push to gfx11 runs a check job that resolves the upstream Dao-AILab/flash-attention `main` HEAD and queries S3; the build job is skipped when a wheel matching the upstream short SHA already exists.

The wheel version is derived from `git describe` against the latest v2.* tag, e.g. `2.8.4.dev472+gb995b246` for 472 commits past v2.8.3, and becomes plain `2.8.3` (or `2.8.4`) again once upstream lands a new tag.

Changes:
- Switch source from the v2.* release-list to upstream `main` because the latest release (v2.8.3, Aug 2025) is too old to include the gfx11 improvements we want.
- Drop schedule and workflow_dispatch triggers (the workflow file is not on the default branch, so neither would actually fire).
- Drop the create-release and publish-to-gh-pages jobs.
- Drop the FLASH_ATTN_LOCAL_VERSION suffix; the SHA in the version string is enough to identify the build.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
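For illustration, the version derivation and the skip check described in this commit message could be sketched in Python as below. This is not the workflow's actual code. The helper names `version_from_describe` and `wheel_exists_for_sha` are hypothetical, the input is assumed to be the output of `git describe --tags --match 'v2.*'`, and the wheel file names are assumed to have been listed from the S3 index already, with `+` not URL-encoded.

```python
import re

# Matches `git describe` output such as "v2.8.3" or "v2.8.3-472-gb995b246".
_DESCRIBE_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def version_from_describe(describe: str) -> str:
    """Turn `git describe` output into a PEP 440 version string.

    Exactly on a tag, the tag version is returned unchanged. Past a tag,
    the patch number is bumped and a .devN+gSHA suffix is appended.
    """
    m = _DESCRIBE_RE.match(describe.strip())
    if m is None:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    major, minor, patch = int(m["major"]), int(m["minor"]), int(m["patch"])
    distance = int(m["distance"]) if m["distance"] is not None else 0
    if distance == 0:
        return f"{major}.{minor}.{patch}"
    sha = m["sha"]
    return f"{major}.{minor}.{patch + 1}.dev{distance}+g{sha}"


def wheel_exists_for_sha(filenames, short_sha: str) -> bool:
    """Return True if any wheel name carries the given short SHA.

    Assumes the SHA is the whole local version segment, so it is
    followed by "-" in the wheel file name.
    """
    marker = f"+g{short_sha}-"
    return any(name.endswith(".whl") and marker in name for name in filenames)
```

A check job built this way would skip the build whenever `wheel_exists_for_sha` returns True for the upstream short SHA. Matching on the `+g<sha>-` marker rather than the bare SHA avoids false positives when one SHA is a prefix of another.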
Add a pull_request trigger so PRs targeting gfx11 exercise the build, and gate the upload-wheel job on github.event_name == 'push' so PR runs validate the build without populating the S3 index.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
b91bd8a to ce06bc5
Upstream commit 3f94643 ("[AMD] Migrate to Triton Backend to Aiter")
introduced a hard triton==3.5.1 pin and moved the AMD Triton backend
out of the flash_attn package into aiter. This breaks ROCm users:
the triton pin downgrades their ROCm triton, and the wheel is no
longer self-contained.
Pin to bbe25ba (the parent commit) which still bundles
flash_attn_triton_amd/ and has no triton version constraint.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Remove the push-only guard so the upload-wheel job runs on both push and pull_request events, enabling upload from this PR.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>