Benchmark flash attention DSL 140tflops and ptoas smoke test version#635
MirkoDeVita98 wants to merge 1 commit into hw-native-sys:main
Code Review
This pull request introduces a FlashAttention sample, including PTO kernel definitions, C++ host wrappers, a compilation script, and a benchmarking suite. The review feedback highlights critical safety concerns in the C++ wrappers where return values from runtime calls are ignored, potentially leading to null pointer dereferences. Additionally, the feedback suggests improving the portability and reliability of the build script by avoiding hardcoded system paths and include directories, recommending the use of environment variables instead.
(void)rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen);
(void)fftsLen;
The return value of rtGetC2cCtrlAddr is ignored. If this runtime call fails, fftsAddr will remain nullptr, which can lead to a crash or undefined behavior when passed to the kernel. It is safer to check the return code and handle the failure.
if (rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen) != 0) {
    return;
}

(void)rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen);
(void)fftsLen;
The return value of rtGetC2cCtrlAddr is ignored. If this runtime call fails, fftsAddr will remain nullptr, which can lead to a crash or undefined behavior when passed to the kernel. It is safer to check the return code and handle the failure.
if (rtGetC2cCtrlAddr(reinterpret_cast<uint64_t *>(&fftsAddr), &fftsLen) != 0) {
    return;
}
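The suggested early `return` avoids launching with a null address, but a `void` launcher still hides the failure from its caller. As a rough sketch of one alternative, the launcher could return the runtime status. Everything here is illustrative: `rtGetC2cCtrlAddrStub`, `g_stubStatus`, and `launchChecked` are hypothetical names, the stub only mimics the two out-parameters seen in the diff, and the convention that non-zero means failure is taken from the suggestion above rather than from the runtime documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime call so the sketch compiles on its own.
// Assumption: non-zero return means failure, as in the review suggestion.
static int g_stubStatus = 0;
static int rtGetC2cCtrlAddrStub(uint64_t *addr, uint32_t *len) {
    if (g_stubStatus != 0) {
        return g_stubStatus;  // simulate a failed runtime call; outputs untouched
    }
    *addr = 0x1000;  // arbitrary non-null address for the demo
    *len = 64;
    return 0;
}

// Hypothetical launcher shape: propagate the status instead of returning void.
// On failure, *fftsAddrOut is left unchanged and no kernel would be launched.
int launchChecked(void **fftsAddrOut) {
    uint64_t addr = 0;
    uint32_t len = 0;
    const int rc = rtGetC2cCtrlAddrStub(&addr, &len);
    if (rc != 0) {
        return rc;  // caller decides how to report or recover
    }
    (void)len;  // length is unused here, as in the original wrapper
    *fftsAddrOut = reinterpret_cast<void *>(static_cast<uintptr_t>(addr));
    // ... the kernel launch would go here, using *fftsAddrOut ...
    return 0;
}
```

A caller can then check `launchChecked(&ptr) != 0` and surface the error, instead of silently skipping the launch.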
ptoas --pto-arch=a3 --pto-level=level3 --enable-insert-sync \
  fa_patched_s1_256_q3072_s0_8192.pto \
  >/tmp/compiler_team_fa.cpp
bisheng \
  -I/sources/pto-isa/include \
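Codex Review
This comment is automatically updated by the review bot.
Summary
PR #635 has 3 P2 issues: the benchmark does not fail when its results are wrong, both launchers swallow failures to obtain the FFTS address, and the build script hardcodes the pto-isa header path to the author's container layout.
Findings
This directly ignores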
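The first issue in the summary, a benchmark that does not fail on wrong results, could be handled with a correctness gate along these lines. This is only a sketch: the PR's benchmark code is not shown here, so `allClose` and `benchmarkExitCode` are hypothetical helpers, and the tolerances are left to the caller.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every element of `got` is within
// atol + rtol * |ref| of the reference. A NaN in either input fails the check.
bool allClose(const std::vector<float> &got, const std::vector<float> &ref,
              float rtol, float atol) {
    if (got.size() != ref.size()) {
        return false;
    }
    for (std::size_t i = 0; i < got.size(); ++i) {
        const float diff = std::fabs(got[i] - ref[i]);
        // Written as a negated <= so that a NaN difference is rejected.
        if (!(diff <= atol + rtol * std::fabs(ref[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: map the correctness check to a process exit code so
// CI sees a failure instead of a successful run that reports wrong numbers.
int benchmarkExitCode(bool resultsMatch) {
    return resultsMatch ? 0 : 1;
}
```

Returning `benchmarkExitCode(allClose(output, reference, rtol, atol))` from the benchmark's entry point makes a mismatch fail the run, not just print a warning.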
No description provided.