fa3 hopper direct fp32 output_buffer #2

Open
SuperGoodGame wants to merge 1 commit into cmriat:feature/hopper_attention_with_sink from SuperGoodGame:feature/fp32_direct

SuperGoodGame (Collaborator) commented Mar 12, 2026

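A patch to emit the output directly in fp32. It adds roughly 200 lines; the lock file accounts for about 3.3k lines of the diff and can be ignored.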

coderabbitai (Bot) commented Mar 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecff3365-b6b2-4ddb-a111-36bb2986598f


chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67392a572c


Comment thread hopper/setup.py
  # Choose between flash_api.cpp and flash_api_stable.cpp based on torch version
  torch_version = parse(torch.__version__)
- target_version = parse("2.9.0.dev20250830")
+ target_version = parse("99.0.0")
P1: Restore reachable torch-version gate for API source

Setting target_version to 99.0.0 makes the torch_version >= target_version branch effectively unreachable, so the build will never select flash_api_stable.cpp even though this section is explicitly version-gated for Torch compatibility. This regresses Torch 2.9+ environments to always compiling flash_api.cpp, bypassing the compatibility path and risking build/runtime breakage on newer Torch releases.
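One possible way to keep the gate reachable while still allowing a local opt-out, instead of hardcoding `99.0.0`, is sketched below. The helper name `select_api_source` and the environment variable `FLASH_ATTN_FORCE_LEGACY_API` are hypothetical and do not exist in the repository; only the two source file names and the `2.9.0.dev20250830` threshold are taken from the diff in this thread.

```python
import os

from packaging.version import parse

# Threshold taken from the original setup.py line in this thread.
STABLE_API_MIN_TORCH = parse("2.9.0.dev20250830")


def select_api_source(torch_version, env=None):
    """Return the API source file name to compile for a given torch version.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    opt-out can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env

    # Hypothetical opt-out flag (an assumption, not an existing option):
    # lets a developer force the legacy source without editing the threshold.
    if env.get("FLASH_ATTN_FORCE_LEGACY_API", "0") == "1":
        return "flash_api.cpp"

    # Reachable version gate: torch at or above the threshold gets the stable API.
    if parse(torch_version) >= STABLE_API_MIN_TORCH:
        return "flash_api_stable.cpp"
    return "flash_api.cpp"
```

In `setup.py` this would be called with `torch.__version__`. The design choice is that a temporary local workaround lives behind an explicit flag, so the default behavior on Torch 2.9+ is unchanged for everyone else.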


Comment thread hopper/setup.py
  # Cutlass 3.8 will expect the new data types in cuda.h from CTK 12.8, which we don't have.
  # For CUDA 13.0+, use system nvcc instead of downloading CUDA 12.x toolchain
- if bare_metal_version >= Version("12.3") and bare_metal_version < Version("13.0") and bare_metal_version != Version("12.8"):
+ if False:
P2: Re-enable CUDA toolchain download condition

Replacing the CUDA-version check with if False disables the entire block that downloads pinned nvcc/ptxas, updates PATH, and sets PYTORCH_NVCC; this removes the repository’s fallback toolchain setup for CUDA 12.x builds. In environments that depended on this path (for example, missing or mismatched system nvcc), extension builds can now fail or silently use an unintended compiler toolchain.
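A minimal sketch of how the original condition could be preserved while still letting a developer skip the download locally. The function name `should_download_toolchain` and the environment variable `FLASH_ATTN_SKIP_TOOLCHAIN_DOWNLOAD` are hypothetical and not part of the repository; the version bounds are copied from the removed line in the diff.

```python
import os

from packaging.version import Version


def should_download_toolchain(bare_metal_version, env=None):
    """Decide whether setup.py should fetch the pinned nvcc/ptxas toolchain.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    opt-out can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env

    # Hypothetical opt-out flag (an assumption, not an existing option):
    # skips the download explicitly instead of replacing the check with `if False`.
    if env.get("FLASH_ATTN_SKIP_TOOLCHAIN_DOWNLOAD", "0") == "1":
        return False

    # Same bounds as the removed condition: CUDA 12.3 up to (not including) 13.0,
    # excluding 12.8.
    return (
        Version("12.3") <= bare_metal_version < Version("13.0")
        and bare_metal_version != Version("12.8")
    )
```

With this shape, the guarded block in `setup.py` would stay intact for CUDA 12.x users, and skipping it becomes an explicit, visible choice rather than a silent default.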
