Skip to content

Benchmark flash attention DSL 140tflops and ptoas smoke test version#635

Draft
MirkoDeVita98 wants to merge 1 commit intohw-native-sys:mainfrom
MirkoDeVita98:ir_fa_test
Draft

Benchmark flash attention DSL 140tflops and ptoas smoke test version#635
MirkoDeVita98 wants to merge 1 commit intohw-native-sys:mainfrom
MirkoDeVita98:ir_fa_test

Conversation

@MirkoDeVita98
Copy link
Copy Markdown

No description provided.

@MirkoDeVita98 MirkoDeVita98 marked this pull request as draft May 7, 2026 11:45
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a FlashAttention sample, including PTO kernel definitions, C++ host wrappers, a compilation script, and a benchmarking suite. The review feedback highlights critical safety concerns in the C++ wrappers where return values from runtime calls are ignored, potentially leading to null pointer dereferences. Additionally, the feedback suggests improving the portability and reliability of the build script by avoiding hardcoded system paths and include directories, recommending the use of environment variables instead.

Comment on lines +24 to +25
(void)rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen);
(void)fftsLen;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The return value of rtGetC2cCtrlAddr is ignored. If this runtime call fails, fftsAddr will remain nullptr, which can lead to a crash or undefined behavior when passed to the kernel. It is safer to check the return code and handle the failure.

    if (rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen) != 0) {
        return;
    }

Comment on lines +24 to +25
(void)rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen);
(void)fftsLen;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The return value of rtGetC2cCtrlAddr is ignored. If this runtime call fails, fftsAddr will remain nullptr, which can lead to a crash or undefined behavior when passed to the kernel. It is safer to check the return code and handle the failure.

    if (rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen) != 0) {
        return;
    }


ptoas --pto-arch=a3 --pto-level=level3 --enable-insert-sync \
fa_patched_s1_256_q3072_s0_8192.pto \
>/tmp/compiler_team_fa.cpp
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using hardcoded paths in /tmp for intermediate build artifacts and shared libraries can lead to conflicts or permission issues in multi-user environments. Consider using a local build directory or allowing the output path to be configurable via an environment variable.

>/tmp/compiler_team_fa.cpp

bisheng \
-I/sources/pto-isa/include \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The include path /sources/pto-isa/include is hardcoded, which makes the script dependent on a specific environment setup. Consider using an environment variable (e.g., $PTO_ISA_INCLUDE) to make the script more portable.

@reedhecre
Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

Summary

PR #635 存在 3 个 P2 问题:benchmark 不会因结果错误而失败、两个 launcher 吞掉了 FFTS 地址获取失败、编译脚本把 pto-isa 头文件路径硬编码为作者容器布局。

Findings

  1. P2 benchmark 脚本在正确性失败时仍然成功退出 test/samples/FlashAttention/compile_and_run/benchmark_flashattention.py:89

check_close()assert_close 失败转换成字符串 FAILED,而 main() 只是把这个状态打印出来后正常结束。这样即使 PTO 输出已经和 fp32 参考或 torch_npu 基线不一致,python3 benchmark_flashattention.py 仍会返回 0,自动化 smoke test/CI 无法据此拦住真实的正确性回归。

  1. P2 FFTS 控制地址获取失败会被吞掉后继续发核 test/samples/FlashAttention/compile_and_run/caller.cpp:24

这里直接忽略了 rtGetC2cCtrlAddr 的返回值,并且无论成功与否都把 fftsAddr 传给 call_both。一旦运行时没有拿到有效的 C2C/FFTS 控制缓冲区,就会带着空指针或无效地址继续执行,结果可能是崩溃、设备侧非法访问或静默错误。仓库里生成的 NPU validation 用例会对这个调用显式报错并中止,这里退化了现有运行时契约;同样的问题也出现在 caller_140tflops.cpp:24

  1. P2 编译脚本把 PTO-ISA 头文件路径硬编码成 `/sources/pto-isa/include` test/samples/FlashAttention/compile_and_run/compile_flashattention.sh:19

compile_flashattention.sh 只能在作者那套容器目录布局下工作,既不接受 PTO_ISA_ROOT,也不像仓库里其他 sample 脚本那样做路径探测。只要 pto-isa 不在 /sources/pto-isa/includebisheng 就会直接编译失败;当前评审工作区里这个路径就不存在,因此 README 里的 bash compile_flashattention.sh 无法按文档执行。这会把新 sample 变成明确的环境兼容性问题。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants