Auto-build flash-attn wheels on push, upload to S3 #910

Draft
mgehre-amd wants to merge 4 commits into gfx11 from matthias.flash-attn-s3-auto

@mgehre-amd mgehre-amd commented Apr 30, 2026

  • Revert trigger on PR

mgehre-amd added 2 commits May 4, 2026 18:05
Replace the GitHub Releases / gh-pages publishing path with a direct
upload to s3://aig-embd-gfx11-wheels/simple/flash-attn/ (the same PEP 503
index used by build-rocm-wheels.yml).

Each push to gfx11 runs a check job that resolves the upstream
Dao-AILab/flash-attention `main` HEAD and queries S3; the build job is
skipped when a wheel matching the upstream short SHA already exists.
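
The skip decision can be sketched in Python as below. This is a minimal illustration, not the workflow's actual code: the function name `wheel_exists_for_sha` is hypothetical, and it assumes the upstream short SHA appears in the wheel's version as a `+g<sha>` local segment (as in the `git describe` example in this message) and that the list of wheel filenames has already been fetched from the S3 index.

```python
from typing import Iterable
from urllib.parse import unquote


def wheel_exists_for_sha(wheel_names: Iterable[str], short_sha: str) -> bool:
    """Return True if any wheel filename carries the given upstream short SHA.

    Assumption: the SHA is embedded as the local version segment "+g<sha>",
    e.g. flash_attn-2.8.4.dev472+gb995b246-cp312-cp312-linux_x86_64.whl.
    """
    marker = f"+g{short_sha.lower()}"
    for raw_name in wheel_names:
        # Index listings may percent-encode "+" as "%2B"; decode it first.
        name = unquote(raw_name)
        if not name.endswith(".whl"):
            continue
        # Wheel filenames are {dist}-{version}-{python}-{abi}-{platform}.whl,
        # optionally with a build tag after the version.
        parts = name.split("-")
        if len(parts) < 5:
            continue
        version = parts[1].lower()
        if version.endswith(marker):
            return True
    return False
```

A check job built on this idea would skip the build when the function returns True for the resolved upstream HEAD, and run it otherwise.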

The wheel version is derived from `git describe` against the latest
v2.* tag, e.g. `2.8.4.dev472+gb995b246` for 472 commits past v2.8.3, and
becomes plain `2.8.3` (or `2.8.4`) again once upstream lands a new tag.
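
As a rough illustration of that mapping, the following Python sketch converts `git describe` output into the version string. It is an assumption about how the derivation could work, not the workflow's actual script: the helper name `version_from_describe` is hypothetical, and it assumes tags of the form `vX.Y.Z` and that the last release component is bumped for development builds.

```python
import re

# Matches "v2.8.3" (exactly on a tag) or "v2.8.3-472-gb995b246"
# (472 commits past the tag, at abbreviated commit b995b246).
_DESCRIBE_RE = re.compile(
    r"^v(?P<release>\d+(?:\.\d+)*)"
    r"(?:-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def version_from_describe(describe: str) -> str:
    """Turn `git describe` output into a PEP 440 version string."""
    match = _DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unrecognized git describe output: {describe!r}")

    release = match["release"]
    distance = int(match["distance"] or 0)
    if distance == 0:
        # HEAD is exactly on the tag: use the plain release version.
        return release

    # Past the tag: bump the last component and mark it as a dev build,
    # keeping the commit SHA as the local version segment.
    parts = [int(p) for p in release.split(".")]
    parts[-1] += 1
    bumped = ".".join(str(p) for p in parts)
    return f"{bumped}.dev{distance}+g{match['sha']}"
```

With the example from this message, `version_from_describe("v2.8.3-472-gb995b246")` yields `2.8.4.dev472+gb995b246`, and an input of `v2.8.3` yields `2.8.3`.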

Changes:
- Switch source from the v2.* release-list to upstream `main` because
  the latest release (v2.8.3, Aug 2025) is too old to include the gfx11
  improvements we want.
- Drop schedule and workflow_dispatch triggers (the workflow file is
  not on the default branch, so neither would actually fire).
- Drop the create-release and publish-to-gh-pages jobs.
- Drop the FLASH_ATTN_LOCAL_VERSION suffix; the SHA in the version
  string is enough to identify the build.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Adds pull_request trigger so PRs targeting gfx11 exercise the build,
and gates the upload-wheel job on github.event_name == 'push' so PR
runs validate the build without populating the S3 index.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
@mgehre-amd mgehre-amd changed the base branch from matthias.pep503-index to gfx11 May 4, 2026 16:09
@mgehre-amd mgehre-amd force-pushed the matthias.flash-attn-s3-auto branch from b91bd8a to ce06bc5 Compare May 4, 2026 16:09
mgehre-amd added 2 commits May 4, 2026 18:42
Upstream commit 3f94643 ("[AMD] Migrate to Triton Backend to Aiter")
introduced a hard triton==3.5.1 pin and moved the AMD Triton backend
out of the flash_attn package into aiter. This breaks ROCm users:
the triton pin downgrades their ROCm triton, and the wheel is no
longer self-contained.

Pin to bbe25ba (the parent commit) which still bundles
flash_attn_triton_amd/ and has no triton version constraint.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Remove the push-only guard so the upload-wheel job runs on both
push and pull_request events, enabling upload from this PR.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>