fa3 hopper direct fp32 output_buffer #2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67392a572c
  # Choose between flash_api.cpp and flash_api_stable.cpp based on torch version
  torch_version = parse(torch.__version__)
- target_version = parse("2.9.0.dev20250830")
+ target_version = parse("99.0.0")
Restore reachable torch-version gate for API source
Setting target_version to 99.0.0 makes the torch_version >= target_version branch effectively unreachable, so the build will never select flash_api_stable.cpp even though this section is explicitly version-gated for Torch compatibility. This regresses Torch 2.9+ environments to always compiling flash_api.cpp, bypassing the compatibility path and risking build/runtime breakage on newer Torch releases.
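A minimal sketch of what a reachable gate could look like, for illustration only. The helper names `release_tuple` and `select_flash_api_source` are hypothetical and not part of the repository. The sketch compares only the numeric release components, so it ignores the `.dev20250830` ordering that the original `parse(...)` comparison handles.

```python
import re

# Assumed threshold: the release part of the original "2.9.0.dev20250830" gate.
STABLE_API_MIN_RELEASE = (2, 9, 0)


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release components of a version string.

    Simplification (assumption): dev/pre-release and local suffixes such as
    ".dev20250830" or "+cu128" are dropped rather than ordered.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def select_flash_api_source(
    torch_version: str,
    minimum: tuple[int, ...] = STABLE_API_MIN_RELEASE,
) -> str:
    """Hypothetical helper: pick the API source file for a torch version."""
    if release_tuple(torch_version) >= minimum:
        return "flash_api_stable.cpp"
    return "flash_api.cpp"
```

Passing `minimum=(99, 0, 0)` reproduces the behavior flagged in this comment: no released Torch version reaches the threshold, so `flash_api.cpp` is always selected.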
  # Cutlass 3.8 will expect the new data types in cuda.h from CTK 12.8, which we don't have.
  # For CUDA 13.0+, use system nvcc instead of downloading CUDA 12.x toolchain
- if bare_metal_version >= Version("12.3") and bare_metal_version < Version("13.0") and bare_metal_version != Version("12.8"):
+ if False:
Re-enable CUDA toolchain download condition
Replacing the CUDA-version check with if False disables the entire block that downloads pinned nvcc/ptxas, updates PATH, and sets PYTORCH_NVCC; this removes the repository’s fallback toolchain setup for CUDA 12.x builds. In environments that depended on this path (for example, missing or mismatched system nvcc), extension builds can now fail or silently use an unintended compiler toolchain.
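For reference, the removed condition can be written as a small predicate. This is a sketch under assumptions: `should_download_pinned_toolchain` is a hypothetical name, and the CUDA version is passed as a `(major, minor)` tuple rather than the `Version` objects used in the build script.

```python
def should_download_pinned_toolchain(cuda_version: tuple[int, int]) -> bool:
    """Hypothetical predicate mirroring the removed check.

    True when 12.3 <= version < 13.0 and version != 12.8, i.e. the range in
    which the original block downloaded the pinned nvcc/ptxas.
    """
    return (12, 3) <= cuda_version < (13, 0) and cuda_version != (12, 8)
```

With `if False:` in place, the build behaves as though this predicate returned `False` for every CUDA version, including the 12.x versions that previously took the download path.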
A patch to output fp32 directly. It adds roughly 200 lines; the lock file accounts for about 3.3k lines and can be ignored.