Add FA PTO lit regression cases by zhangstevenunity · Pull Request #609 · hw-native-sys/PTOAS

zhangstevenunity · 2026-04-30T07:32:47Z

No description provided.

zhangstevenunity · 2026-04-30T07:34:24Z

/run a3 test/lit/pto/fa_perf.pto

reedhecre · 2026-04-30T07:35:04Z

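Received /run a3 test/lit/pto/fa_perf; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Specified case: test/lit/pto/fa_perf

The page refreshes automatically and shows the current stage, queue status, and recent results.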

reedhecre · 2026-04-30T07:37:24Z

A3 board test succeeded

Trigger: manual
Source commit: 10c08b6ca232
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_153506_manual_pr609.log
Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_153506_manual_pr609.tsv
Manual command: /run a3 test/lit/pto/fa_perf
Triggered by: zhangstevenunity
Specified case: test/lit/pto/fa_perf
Trigger comment: Add FA PTO lit regression cases #609 (comment)

HecreReed · 2026-04-30T07:39:27Z

/run a3 test/lit/pto/fa.pto

reedhecre · 2026-04-30T07:40:04Z

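Received /run a3 test/lit/pto/fa; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Specified case: test/lit/pto/fa

The page refreshes automatically and shows the current stage, queue status, and recent results.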

gemini-code-assist

Code Review

This pull request introduces new test files for the pto service. The review identified critical issues regarding incorrect pipe initialization parameters, specifically the dir_mask and tensor view shapes. Additionally, several opportunities for code cleanup were noted, such as removing redundant constant definitions and moving loop-invariant constants outside of loops.

gemini-code-assist · 2026-04-30T07:42:05Z

+    %qk_slot_desc = pto.make_tensor_view %21, shape = [%c128, %c256], strides = [%c256, %c1] : !pto.tensor_view<128x256xf32>
+    pto.aiv_initialize_pipe{id = 25, dir_mask = 1, slot_size = 131072} (gm_slot_tensor = %qk_slot_desc : !pto.tensor_view<128x256xf32>)
+    %pv_slot_desc = pto.make_tensor_view %22, shape = [%c128, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<128x128xf32>
+    pto.aiv_initialize_pipe{id = 27, dir_mask = 1, slot_size = 65536} (gm_slot_tensor = %pv_slot_desc : !pto.tensor_view<128x128xf32>)


The dir_mask for pipe 27 in vector_kernel appears to be incorrect. This kernel acts as a consumer for pipe 27 (as shown by tpop_from_aic), so the dir_mask should be 2 (consumer), not 1 (producer).

pto.aiv_initialize_pipe{id = 27, dir_mask = 2, slot_size = 65536} (gm_slot_tensor = %pv_slot_desc : !pto.tensor_view<128x128xf32>)

gemini-code-assist · 2026-04-30T07:42:06Z

+    %pv_slot_desc = pto.make_tensor_view %22, shape = [%c64, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<64x128xf32>
+    pto.aiv_initialize_pipe{id = 27, dir_mask = 1, slot_size = 65536} (gm_slot_tensor = %pv_slot_desc : !pto.tensor_view<64x128xf32>)


There appear to be two inconsistencies in the initialization of pipe 27:

The dir_mask should be 2 (consumer), not 1, because this kernel consumes from pipe 27 (see tpop_from_aic calls).

The gm_slot_tensor shape is 64x128xf32, which mismatches the producer's (cube_kernel) shape of 128x128xf32 for the same pipe. The global memory layout of a pipe slot should be consistent.

%pv_slot_desc = pto.make_tensor_view %22, shape = [%c128, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<128x128xf32>
pto.aiv_initialize_pipe{id = 27, dir_mask = 2, slot_size = 65536} (gm_slot_tensor = %pv_slot_desc : !pto.tensor_view<128x128xf32>)
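The shape mismatch can also be cross-checked with simple arithmetic. The sketch below assumes that slot_size is a byte count and that one slot holds exactly one tensor view; the thread does not state either, but the 128x256xf32 and 128x128xf32 views quoted above match their slot_size values under that assumption. The helper names are hypothetical and not part of PTOAS.

```python
from math import prod

# Element widths in bytes for the types that appear in this thread.
DTYPE_BYTES = {"f32": 4, "f16": 2}


def slot_bytes(shape, dtype):
    """Bytes needed to hold one tensor view with the given static shape."""
    return prod(shape) * DTYPE_BYTES[dtype]


def fills_slot(shape, dtype, slot_size):
    """True when the view occupies exactly slot_size bytes (assumed unit)."""
    return slot_bytes(shape, dtype) == slot_size


# Values copied from the quoted pto.aiv_initialize_pipe lines.
print(fills_slot((128, 256), "f32", 131072))  # pipe 25
print(fills_slot((128, 128), "f32", 65536))   # pipe 27, producer shape
print(fills_slot((64, 128), "f32", 65536))    # pipe 27, 64x128 variant
```

Under these assumptions the 64x128xf32 view covers only 32768 bytes, half of the declared slot_size of 65536, which is consistent with the inconsistency flagged above.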

gemini-code-assist · 2026-04-30T07:42:06Z

+      %c256_13 = arith.constant 256 : index
+      %c0_14 = arith.constant 0 : index
+      %51 = arith.addi %c256_13, %c0_14 : index
+      %52 = pto.partition_view %41, offsets = [%c0, %51], sizes = [%c128_0, %c128_1] : !pto.tensor_view<?x?xf16> -> !pto.partition_tensor_view<128x128xf16>


The constant %c256_13 is redefined inside the loop, and then used in a redundant addition with zero. You can simplify this by using the existing %c256 constant (defined at line 9) directly in the pto.partition_view operation.

%52 = pto.partition_view %41, offsets = [%c0, %c256], sizes = [%c128_0, %c128_1] : !pto.tensor_view<?x?xf16> -> !pto.partition_tensor_view<128x128xf16>

gemini-code-assist · 2026-04-30T07:42:06Z

+    %42 = pto.make_tensor_view %arg1, shape = [%c2048, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<?x?xf32>
+    scf.for %arg2 = %14 to %18 step %c1 {
+      %43 = arith.muli %arg2, %c128 : index
+      %c394752_i64 = arith.constant 394752 : i64


The constant %c394752_i64 is loop-invariant and can be defined once outside the scf.for loop at line 297. This applies to other similar constant definitions inside this loop as well (e.g., lines 318, 342, 349, etc.).
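A rough way to list every such constant is to scan the .pto text and track which lines sit inside an scf.for body. The following is a minimal sketch based only on brace counting, not on a real MLIR parser; the function name is hypothetical, and it assumes each scf.for opens its region with `{` on the same line.

```python
import re

# Matches lines of the form "%name = arith.constant ...".
CONST_RE = re.compile(r"^\s*(%[\w.]+)\s*=\s*arith\.constant\b")


def constants_inside_loops(mlir_text):
    """Return SSA names of arith.constant ops nested inside any scf.for body."""
    found = []
    loop_depths = []  # brace depth at which each open scf.for body started
    depth = 0
    for line in mlir_text.splitlines():
        match = CONST_RE.match(line)
        if loop_depths and match:
            found.append(match.group(1))
        opens_loop = "scf.for" in line
        for ch in line:
            if ch == "{":
                depth += 1
                if opens_loop:
                    loop_depths.append(depth)
                    opens_loop = False
            elif ch == "}":
                if loop_depths and loop_depths[-1] == depth:
                    loop_depths.pop()
                depth -= 1
    return found


sample = """\
%c1 = arith.constant 1 : index
scf.for %arg2 = %14 to %18 step %c1 {
  %43 = arith.muli %arg2, %c128 : index
  %c394752_i64 = arith.constant 394752 : i64
}
%c2 = arith.constant 2 : index
"""
print(constants_inside_loops(sample))
```

Each reported name is a candidate for hoisting above the loop; whether hoisting is worthwhile in a generated test input is a separate judgment.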

gemini-code-assist · 2026-04-30T07:42:06Z

+      %c256_13 = arith.constant 256 : index
+      %c0_14 = arith.constant 0 : index
+      %51 = arith.addi %c256_13, %c0_14 : index
+      %52 = pto.partition_view %41, offsets = [%c0, %51], sizes = [%c128_0, %c128_1] : !pto.tensor_view<?x?xf16> -> !pto.partition_tensor_view<128x128xf16>


The constant %c256_13 is redefined inside the loop, and then used in a redundant addition with zero. You can simplify this by using the existing %c256 constant (defined at line 9) directly in the pto.partition_view operation.

%52 = pto.partition_view %41, offsets = [%c0, %c256], sizes = [%c128_0, %c128_1] : !pto.tensor_view<?x?xf16> -> !pto.partition_tensor_view<128x128xf16>

gemini-code-assist · 2026-04-30T07:42:06Z

+    %42 = pto.make_tensor_view %arg1, shape = [%c2048, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<?x?xf32>
+    scf.for %arg2 = %14 to %18 step %c1 {
+      %43 = arith.muli %arg2, %c128 : index
+      %c394752_i64 = arith.constant 394752 : i64


The constant %c394752_i64 is loop-invariant and can be defined once outside the scf.for loop at line 340. This applies to other similar constant definitions inside this loop as well.

reedhecre · 2026-04-30T07:42:26Z

A3 board test succeeded

Trigger: manual
Source commit: 10c08b6ca232
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_154005_manual_pr609.log
Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_154005_manual_pr609.tsv
Manual command: /run a3 test/lit/pto/fa
Triggered by: HecreReed
Specified case: test/lit/pto/fa
Trigger comment: Add FA PTO lit regression cases #609 (comment)

reedhecre · 2026-04-30T07:44:38Z

Codex Review

This comment is updated automatically by the review bot.

PR: Add FA PTO lit regression cases #609
Author: zhangstevenunity
Base/Head: main / codex/add-fa-lit-tests
Head SHA: 9f12bae0e2ba
Trigger: new commits pushed to the PR
Generated At: 2026-05-07T11:50:41Z
Previous Head SHA: 5100115c10e7
Status: completed

Summary

PR #609 adds FA lit cases, but they only assert successful compilation and do not actually guard the FA preload/split lowering behavior the PR is trying to regress.

Findings

P2: The new FA lit cases are compile-smoke tests only and never assert the fixed lowering behavior (test/lit/pto/fa_perf.pto:1)

Both test/lit/pto/fa_perf.pto and test/lit/pto/fa_perf_smoke.pto run ptoas --enable-insert-sync and discard the output to /dev/null, so they only catch hard parse/assert failures. The branch history already had to fix FA-specific preload ordering and PV split-mode details, but those are lowering-contract bugs that can still produce compilable output. As written, a future regression that emits the wrong split axis / wrong pipe entry shape / missing inserted syncs will still pass these tests, so the new cases do not actually lock in the regression they were added to cover.
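To make the gap concrete, here is a minimal sketch of an ordered-substring check over generated output, similar in spirit to FileCheck's CHECK lines. It is written in Python only so that it is self-contained; the function name is hypothetical, and the sample line is copied from the TALLOC call in the fa_kernel.cpp compile log later in this thread. Which patterns actually define the lowering contract is for the PR author to decide.

```python
def first_missing(output, patterns):
    """Return the first pattern not found in order, or None if all match."""
    pos = 0
    for pattern in patterns:
        idx = output.find(pattern, pos)
        if idx < 0:
            return pattern
        pos = idx + len(pattern)
    return None


# One generated line, copied from the board-test log in this thread.
generated = (
    "TALLOC<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, "
    "GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, "
    "pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, "
    "TileSplitAxis::TILE_NO_SPLIT>(v40, v75);"
)

# Assumed contract: pipe direction and slot size, entry shape, split axis.
expected = [
    "TPipe<0, Direction::DIR_C2V, 131072",
    "pto::Shape<1, 1, 1, 128, 256>",
    "TileSplitAxis::TILE_NO_SPLIT",
]
print(first_missing(generated, expected))
```

A test that discards the output to /dev/null cannot fail on any of these patterns; a check of this kind fails as soon as the pipe entry shape or split axis changes.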

HecreReed · 2026-04-30T07:58:27Z

/run a3 ../lit/pto/fa.pto

reedhecre · 2026-04-30T07:59:03Z

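Received /run a3 ../lit/pto/fa; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Specified case: ../lit/pto/fa

The page refreshes automatically and shows the current stage, queue status, and recent results.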

reedhecre · 2026-04-30T08:01:25Z

A3 board test succeeded

Trigger: manual
Source commit: 10c08b6ca232
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_155905_manual_pr609.log
Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_155905_manual_pr609.tsv
Manual command: /run a3 ../lit/pto/fa
Triggered by: HecreReed
Specified case: ../lit/pto/fa
Trigger comment: Add FA PTO lit regression cases #609 (comment)

HecreReed · 2026-04-30T08:27:15Z

/run a3 test/lit/pto/fa_perf.pto

reedhecre · 2026-04-30T08:28:03Z

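Received /run a3 test/lit/pto/fa_perf.pto; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Direct PTO: test/lit/pto/fa_perf.pto

The page refreshes automatically and shows the current stage, queue status, and recent results.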

reedhecre · 2026-04-30T08:30:18Z

A3 board test failed

Trigger: manual
Source commit: 10c08b6ca232
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_162805_manual_pr609.log
Manual command: /run a3 test/lit/pto/fa_perf.pto
Triggered by: HecreReed
Direct PTO: test/lit/pto/fa_perf.pto
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: generate-direct-pto-fa_perf / exit=1

Log tail

/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":664:18): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":665:18): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":689:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":690:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":691:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":692:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":693:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":715:13): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_162805_manual_pr609/repo/test/lit/pto/fa_perf.pto":723:13): error: unexpected 'addr' operand: only supported when --pto-level=level3
===== END STAGE generate-direct-pto-fa_perf rc=1 @ 2026-04-30 16:30:17 =====
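
When triaging a log tail like this one, it can help to collapse the repeated diagnostics into a list of source positions. The sketch below assumes the `loc("path":line:col): error: message` layout seen above; the function name is hypothetical, and the sample path is shortened for illustration.

```python
import re

# Matches diagnostics of the form: loc("path":LINE:COL): error: MESSAGE
LOC_RE = re.compile(
    r'loc\("(?P<path>[^"]+)":(?P<line>\d+):(?P<col>\d+)\): error: (?P<msg>.*)'
)


def parse_errors(log_text):
    """Return (path, line, col, message) tuples for each error diagnostic."""
    return [
        (m["path"], int(m["line"]), int(m["col"]), m["msg"])
        for m in LOC_RE.finditer(log_text)
    ]


# Shortened sample; the real log uses absolute paths.
sample_log = """\
loc("repo/test/lit/pto/fa_perf.pto":665:18): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("repo/test/lit/pto/fa_perf.pto":689:21): error: unexpected 'addr' operand: only supported when --pto-level=level3
===== END STAGE generate-direct-pto-fa_perf rc=1 =====
"""

errors = parse_errors(sample_log)
print([(line, col) for _, line, col, _ in errors])
# True when every diagnostic points at the same missing flag.
print(all("--pto-level=level3" in msg for *_, msg in errors))
```

If every diagnostic names the same flag, as here, rerunning with that flag is the natural next step.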

HecreReed · 2026-04-30T08:30:52Z

/run a3 test/lit/pto/fa_perf.pto --pto-level=level3

reedhecre · 2026-04-30T08:31:04Z

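Received /run a3 test/lit/pto/fa_perf.pto --pto-level=level3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Direct PTO: test/lit/pto/fa_perf.pto
PTOAS arguments: --pto-level=level3

The page refreshes automatically and shows the current stage, queue status, and recent results.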

reedhecre · 2026-04-30T08:33:25Z

A3 board test failed

Trigger: manual
Source commit: 10c08b6ca232
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_163105_manual_pr609.log
Manual command: /run a3 test/lit/pto/fa_perf.pto --pto-level=level3
Triggered by: HecreReed
Direct PTO: test/lit/pto/fa_perf.pto
PTOAS arguments: --pto-level=level3
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: board-validation / exit=1

Log tail

odel: posix
InstalledDir: /usr/local/Ascend/cann-8.5.0/bin
[2026-04-30 16:33:23] ASCEND_HOME_PATH=/usr/local/Ascend/cann-8.5.0
[2026-04-30 16:33:23] Detected A3 board from simulator dir fallback: /usr/local/Ascend/cann-8.5.0/aarch64-linux/simulator/Ascend910B1/lib
[2026-04-30 16:33:23] SIM_SOC_VERSION=Ascend910A
[2026-04-30 16:33:23] PTOAS_BOARD_IS_A3=1
[2026-04-30 16:33:23] === NPU Device Check ===
uid=1038(zhongxuan) gid=1038(zhongxuan) groups=1038(zhongxuan),10(wheel)
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  0 Apr 20 21:14 /dev/davinci0
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  1 Apr 20 21:14 /dev/davinci1
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 10 Apr 20 21:14 /dev/davinci10
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 11 Apr 20 21:14 /dev/davinci11
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 12 Apr 20 21:14 /dev/davinci12
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 13 Apr 20 21:14 /dev/davinci13
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 14 Apr 20 21:14 /dev/davinci14
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 15 Apr 20 21:14 /dev/davinci15
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  2 Apr 20 21:14 /dev/davinci2
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  3 Apr 20 21:14 /dev/davinci3
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  4 Apr 20 21:14 /dev/davinci4
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  5 Apr 20 21:14 /dev/davinci5
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  6 Apr 20 21:14 /dev/davinci6
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  7 Apr 20 21:14 /dev/davinci7
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  8 Apr 20 21:14 /dev/davinci8
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  9 Apr 20 21:14 /dev/davinci9
crw-rw---- 1 HwHiAiUser HwHiAiUser 510,  0 Apr 20 21:14 /dev/davinci_manager
[2026-04-30 16:33:23] ERROR: /dev/davinciauto not found
===== END STAGE board-validation rc=1 @ 2026-04-30 16:33:23 =====

learning-chip · 2026-04-30T08:38:40Z

A3 board test succeeded

What is the driver program (C++ main entry) to run this on-board test?

I tried launching with torch-npu here in ir_ref/launch_kernel

But I got a runtime error (I am using the ptoas 0.36 release to generate the cpp):

Traceback (most recent call last):
  File "/workdir/pto-dsl/examples/aot/flash_attention/ir_ref/launch_kernel/./run.py", line 599, in <module>
    main()
  File "/workdir/pto-dsl/examples/aot/flash_attention/ir_ref/launch_kernel/./run.py", line 575, in main
    test_flash(default_lib, device, num_tiles=first_nt)
  File "/workdir/pto-dsl/examples/aot/flash_attention/ir_ref/launch_kernel/./run.py", line 377, in test_flash
    torch.npu.synchronize()
  File "/usr/local/python3.11.14/lib/python3.11/site-packages/torch_npu/npu/utils.py", line 72, in synchronize
    return torch_npu._C._npu_synchronize()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:575 NPU function error: AclrtSynchronizeDeviceWithTimeout, error code is 507015
[ERROR] 2026-04-30-08:36:53 (PID:14525, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999[PID: 14525] 2026-04-30-08:36:53.485.397 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 78, there is an exception of fftsplus aivector error, core id is 27, error code = 0, dump info: pc start: 0x1240000035a0, current: 0x124000002284, vec error info: 0x8600008021, mte error info: 0x1e06000084, ifu error info: 0x7fe30f8e00040, ccu error info: 0x40e0080004000097, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000080.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:347]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x6000084, fixp_error1 info: 0x1e, fsmId:0, tslot:6, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:360]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1493]
       rtDeviceSynchronizeWithTimeout execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:61]
       wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]

[W430 08:36:53.748261061 compiler_depend.ts:595] Warning: NPU warning, error code is 507015[Error]: 
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EE9999: Inner Error!
EE9999[PID: 14525] 2026-04-30-08:36:53.489.962 (EE9999):  rtDeviceSynchronizeWithTimeout execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:61]
        TraceBack (most recent call last):
       wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
 (function npuSynchronizeUsedDevices)
[W430 08:36:53.749083207 compiler_depend.ts:577] Warning: NPU warning, error code is 507015[Error]: 
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EE9999: Inner Error!
EE9999[PID: 14525] 2026-04-30-08:36:53.490.887 (EE9999):  rtDeviceSynchronizeWithTimeout execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:61]
        TraceBack (most recent call last):
       wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
 (function npuSynchronizeDevice)

In comparison, the manual C++ in cpp_ref/split_pipe runs fine.

My test environment is this Dockerfile as used by huawei-csl/pto-dsl#130

HecreReed · 2026-04-30T09:01:55Z

/run a3 test/lit/pto/fa_perf.pto --pto-level=level3

reedhecre · 2026-04-30T09:02:04Z

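Received /run a3 test/lit/pto/fa_perf.pto --pto-level=level3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Direct PTO: test/lit/pto/fa_perf.pto
PTOAS arguments: --pto-level=level3

The page refreshes automatically and shows the current stage, queue status, and recent results.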

HecreReed · 2026-04-30T09:02:38Z

A3 board test succeeded

It was not actually successful just now; the success report was a bug in the GitHub bot.

reedhecre · 2026-04-30T09:04:26Z

A3 board test failed

Trigger: manual
Source commit: 8624ed577953
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_170205_manual_pr609.log
Manual command: /run a3 test/lit/pto/fa_perf.pto --pto-level=level3
Triggered by: HecreReed
Direct PTO: test/lit/pto/fa_perf.pto
PTOAS arguments: --pto-level=level3
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: board-validation / exit=1

Log tail

odel: posix
InstalledDir: /usr/local/Ascend/cann-8.5.0/bin
[2026-04-30 17:04:25] ASCEND_HOME_PATH=/usr/local/Ascend/cann-8.5.0
[2026-04-30 17:04:25] Detected A3 board from simulator dir fallback: /usr/local/Ascend/cann-8.5.0/aarch64-linux/simulator/Ascend910B1/lib
[2026-04-30 17:04:25] SIM_SOC_VERSION=Ascend910A
[2026-04-30 17:04:25] PTOAS_BOARD_IS_A3=1
[2026-04-30 17:04:25] === NPU Device Check ===
uid=1038(zhongxuan) gid=1038(zhongxuan) groups=1038(zhongxuan),10(wheel)
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  0 Apr 20 21:14 /dev/davinci0
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  1 Apr 20 21:14 /dev/davinci1
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 10 Apr 20 21:14 /dev/davinci10
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 11 Apr 20 21:14 /dev/davinci11
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 12 Apr 20 21:14 /dev/davinci12
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 13 Apr 20 21:14 /dev/davinci13
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 14 Apr 20 21:14 /dev/davinci14
crw-rw---- 1 HwHiAiUser HwHiAiUser 509, 15 Apr 20 21:14 /dev/davinci15
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  2 Apr 20 21:14 /dev/davinci2
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  3 Apr 20 21:14 /dev/davinci3
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  4 Apr 20 21:14 /dev/davinci4
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  5 Apr 20 21:14 /dev/davinci5
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  6 Apr 20 21:14 /dev/davinci6
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  7 Apr 20 21:14 /dev/davinci7
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  8 Apr 20 21:14 /dev/davinci8
crw-rw---- 1 HwHiAiUser HwHiAiUser 509,  9 Apr 20 21:14 /dev/davinci9
crw-rw---- 1 HwHiAiUser HwHiAiUser 510,  0 Apr 20 21:14 /dev/davinci_manager
[2026-04-30 17:04:25] ERROR: /dev/davinciauto not found
===== END STAGE board-validation rc=1 @ 2026-04-30 17:04:25 =====

HecreReed · 2026-04-30T09:16:39Z

/run a3 test/lit/pto/fa.pto --pto-level=level3

reedhecre · 2026-04-30T09:17:04Z

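Received /run a3 test/lit/pto/fa.pto --pto-level=level3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Direct PTO: test/lit/pto/fa.pto
PTOAS arguments: --pto-level=level3

The page refreshes automatically and shows the current stage, queue status, and recent results.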

reedhecre · 2026-04-30T09:19:18Z

A3 board test failed

Trigger: manual
Source commit: 8624ed577953
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_171705_manual_pr609.log
Manual command: /run a3 test/lit/pto/fa.pto --pto-level=level3
Triggered by: HecreReed
Direct PTO: test/lit/pto/fa.pto
PTOAS arguments: --pto-level=level3
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: board-validation / exit=127

Log tail

026-04-30 17:19:15 =====
direct PTO generated: test/lit/pto/fa.pto -> test/samples/ManualPto/fa-pto.cpp testcase=fa
pto-isa vendor cache hit: repo=https://gitcode.com/cann/pto-isa.git requested_commit=662d7f2a916d6bbde3109ce4a16ed5c28f5d900a actual_commit=662d7f2a916d6bbde3109ce4a16ed5c28f5d900a

===== STAGE board-validation @ 2026-04-30 17:19:16 =====
task-submit cwd=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/payload
task-submit env-file=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/board-validation.env
task-submit run-script:
set -euo pipefail
cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/payload
export DEVICE_ID=${TASK_DEVICE:-auto}
bash ./test/npu_validation/scripts/run_remote_npu_validation.sh
task-submit wrapped-command: bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"
task-submit submit-cmd: /usr/local/bin/task-submit --device auto --max-time 0 --env-file /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/board-validation.env 'bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_171705_manual_pr609/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"'
task_20260430_171916_34230069245
task-submit task-id: task_20260430_171916_34230069245
Waiting for task execution: task_20260430_171916_34230069245 (Ctrl+C to terminate the task)
/bin/bash: line 1: npu-lock: command not found
=== Task failed (exit=127) ===
task-submit wait rc=127
completed (exit=127)
===== END STAGE board-validation rc=127 @ 2026-04-30 17:19:17 =====

HecreReed · 2026-04-30T09:25:19Z

/run a3 test/lit/pto/fa.pto --pto-level=level3

reedhecre · 2026-04-30T09:26:04Z

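Received /run a3 test/lit/pto/fa.pto --pto-level=level3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.
Direct PTO: test/lit/pto/fa.pto
PTOAS arguments: --pto-level=level3

The page refreshes automatically and shows the current stage, queue status, and recent results.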

reedhecre · 2026-04-30T09:28:26Z

A3 board test failed

Trigger: manual
Source commit: 8624ed577953
Result summary: OK 0 / FAIL 1 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260430_172606_manual_pr609.log
Manual command: /run a3 test/lit/pto/fa.pto --pto-level=level3
Triggered by: HecreReed
Direct PTO: test/lit/pto/fa.pto
PTOAS arguments: --pto-level=level3
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: board-validation / exit=1

Failed cases

fa (run, exit=2)

reedhecre · 2026-04-30T09:28:28Z

A3 board test failure details: PR #609

fa

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:201:5: error: use of undeclared identifier 'TALLOC'
    TALLOC<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v40, v75);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:239:5: error: use of undeclared identifier 'TALLOC'
    TALLOC<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v40, v86);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:279:7: error: use of undeclared identifier 'TALLOC'
      TALLOC<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v102);
      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:317:7: error: use of undeclared identifier 'TALLOC'
      TALLOC<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v40, v113);
      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:348:7: error: use of undeclared identifier 'TALLOC'
      TALLOC<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v123);
      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:386:7: error: use of undeclared identifier 'TALLOC'
      TALLOC<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v40, v134);
      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:419:5: error: use of undeclared identifier 'TALLOC'
    TALLOC<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v143);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:443:5: error: use of undeclared identifier 'TALLOC'
    TALLOC<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v149);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:576:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v62);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:620:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v70);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:635:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v46, v79);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:675:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v85);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:692:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v46, v92);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:730:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v98);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:750:7: error: no matching function for call to 'TFREE'
      TFREE<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v46, v105);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:788:7: error: no matching function for call to 'TFREE'
      TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v111);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:805:7: error: no matching function for call to 'TFREE'
      TFREE<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v46, v117);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:843:7: error: no matching function for call to 'TFREE'
      TFREE<TPipe<0, Direction::DIR_C2V, 131072, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 256>, pto::Stride<32768, 32768, 32768, 256, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v43, v123);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/npu_validation/ManualPto/fa/fa_kernel.cpp:863:5: error: no matching function for call to 'TFREE'
    TFREE<TPipe<2, Direction::DIR_C2V, 65536, 8, 8, true>, GlobalTensor<float, pto::Shape<1, 1, 1, 128, 128>, pto::Stride<16384, 16384, 16384, 128, 1>, pto::Layout::ND>, TileSplitAxis::TILE_NO_SPLIT>(v46, v129);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1728:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Split'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/payload/pto-isa/include/pto/common/pto_instr.hpp:1736:22: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'WaitEvents'
PTO_INST RecordEvent TFREE(Pipe &pipe, WaitEvents &... events)
                     ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
gmake[2]: *** [CMakeFiles/fa_kernel.dir/build.make:76: CMakeFiles/fa_kernel.dir/fa_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/fa_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-04-30 17:28:23] ERROR: testcase failed (exit 2): fa
[2026-04-30 17:28:23] === SUMMARY ===
[2026-04-30 17:28:23] OK=0 FAIL=1 SKIP=0
[2026-04-30 17:28:23] RESULTS_TSV=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260430_172606_manual_pr609/remote_npu_validation_results.tsv
[npu-lock] Released the lock on device 4
=== Task failed (exit=1) ===
task-submit wait rc=1
completed (exit=1)
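
One possible reading of the diagnostics above, offered as a sketch rather than a confirmed diagnosis: the generated call passes three explicit template arguments (pipe type, `GlobalTensor` type, split axis), while both `TFREE` candidates seem to expect either the split axis or a wait-event type in the second position. The stand-in below reproduces that shape. `MockPipe`, `MockGlobalTensor`, and `FreeSlot` are hypothetical names, and the two template headers are assumptions inferred only from the parameter names `Split` and `WaitEvents` in the compiler notes; the real declarations in `pto_instr.hpp` are not shown in the log.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are the real pto-isa types.
enum class TileSplitAxis { TILE_NO_SPLIT, TILE_UP_DOWN };
struct MockPipe {};
struct MockGlobalTensor {};

// Assumed shape of the candidate at pto_instr.hpp:1728:
// the second template parameter is a non-type split axis.
template <typename Pipe, TileSplitAxis Split, typename... WaitEvents>
int FreeSlot(Pipe &, WaitEvents &...) { return 1; }

// Assumed shape of the candidate at pto_instr.hpp:1736:
// everything after the pipe type is a wait-event type.
template <typename Pipe, typename... WaitEvents>
int FreeSlot(Pipe &, WaitEvents &...) { return 2; }

inline void CheckOverloads() {
    MockPipe pipe;
    int event = 0;

    // <Pipe, Split>: only the first overload accepts a non-type second argument.
    assert((FreeSlot<MockPipe, TileSplitAxis::TILE_NO_SPLIT>(pipe, event) == 1));

    // <Pipe>: Split cannot be deduced, so only the second overload is viable.
    assert((FreeSlot<MockPipe>(pipe, event) == 2));

    // <Pipe, MockGlobalTensor, Split> matches neither overload, mirroring the log:
    //   overload 1 rejects a type where 'Split' expects a value,
    //   overload 2 rejects a value where the 'WaitEvents' pack expects a type.
    // FreeSlot<MockPipe, MockGlobalTensor, TileSplitAxis::TILE_NO_SPLIT>(pipe, event);
}
```

If this reading holds, the two directions to check would be whether the emitted `TFREE` call should drop the `GlobalTensor` argument, or whether the board run picked up a pto-isa header whose `TFREE` does not yet accept it.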

zhangstevenunity · 2026-04-30T12:36:19Z

pto-isa-feature-subtile/tests/npu/a2a3/src/st/testcase/fa_ptoas_gm_pipe_smoke/main.cpp
/**
Copyright (c) 2026 Huawei Technologies Co., Ltd.
This program is free software, you can redistribute it and/or modify it under the terms and conditions of
CANN Open Software License Agreement Version 2.0 (the "License").
Please refer to the License for details. You may not use this file except in compliance with the License.
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
See LICENSE in the root of the software repository for the full text of the License.
*/

#include "acl/acl.h"
#include "runtime/rt.h"
#include "test_common.h"
#include <gtest/gtest.h>
#include <string>
#include <vector>

using namespace std;
using namespace PtoTestCommon;

template <int32_t tilingKey>
void LaunchFaPtoasGmPipe(uint8_t *ffts, uint8_t *q, uint8_t *kt, uint8_t *v, uint8_t *pFifo, uint8_t *out,
                         uint8_t *qkFifo, uint8_t *pvFifo, void *stream);

class FaPtoasGmPipeTest : public testing::Test {
protected:
    void SetUp() override {}
    void TearDown() override {}
};

// Golden data lives in "../<suite>.<test>" relative to the working directory.
static std::string GetGoldenDir()
{
    const testing::TestInfo *testInfo = testing::UnitTest::GetInstance()->current_test_info();
    return "../" + std::string(testInfo->test_suite_name()) + "." + testInfo->name();
}

TEST_F(FaPtoasGmPipeTest, case_half_128x4096)
{
    constexpr uint32_t s0 = 128;
    constexpr uint32_t s1 = 4096;
    constexpr uint32_t head = 128;
    constexpr uint32_t blocks = 1;
    // Per-block FIFO capacities, counted in elements (half or float).
    constexpr size_t blockPFifoHalf = 262144;
    constexpr size_t blockQkFifoFloats = 262144;
    constexpr size_t blockPvFifoFloats = 131072;

    // Buffer sizes in bytes.
    size_t qSize = static_cast<size_t>(s0) * head * sizeof(aclFloat16);
    size_t ktSize = static_cast<size_t>(s1) * head * sizeof(aclFloat16);
    size_t vSize = static_cast<size_t>(s1) * head * sizeof(aclFloat16);
    size_t outElems = static_cast<size_t>(s0) * head;
    size_t outSize = outElems * sizeof(float);
    size_t pFifoSize = static_cast<size_t>(blocks) * blockPFifoHalf * sizeof(aclFloat16);
    size_t qkFifoSize = static_cast<size_t>(blocks) * blockQkFifoFloats * sizeof(float);
    size_t pvFifoSize = static_cast<size_t>(blocks) * blockPvFifoFloats * sizeof(float);

    aclInit(nullptr);
    aclrtSetDevice(0);

    aclrtStream stream;
    aclrtCreateStream(&stream);

    // Host staging buffers.
    uint8_t *qHost = nullptr;
    uint8_t *ktHost = nullptr;
    uint8_t *vHost = nullptr;
    uint8_t *outHost = nullptr;
    aclrtMallocHost(reinterpret_cast<void **>(&qHost), qSize);
    aclrtMallocHost(reinterpret_cast<void **>(&ktHost), ktSize);
    aclrtMallocHost(reinterpret_cast<void **>(&vHost), vSize);
    aclrtMallocHost(reinterpret_cast<void **>(&outHost), outSize);

    // Device buffers: inputs, output, and the three GM FIFOs.
    uint8_t *qDevice = nullptr;
    uint8_t *ktDevice = nullptr;
    uint8_t *vDevice = nullptr;
    uint8_t *outDevice = nullptr;
    uint8_t *pFifoDevice = nullptr;
    uint8_t *qkFifoDevice = nullptr;
    uint8_t *pvFifoDevice = nullptr;
    aclrtMalloc(reinterpret_cast<void **>(&qDevice), qSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&ktDevice), ktSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&vDevice), vSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&outDevice), outSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&pFifoDevice), pFifoSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&qkFifoDevice), qkFifoSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(reinterpret_cast<void **>(&pvFifoDevice), pvFifoSize, ACL_MEM_MALLOC_HUGE_FIRST);

    ReadFile(GetGoldenDir() + "/q.bin", qSize, qHost, qSize);
    ReadFile(GetGoldenDir() + "/kt.bin", ktSize, ktHost, ktSize);
    ReadFile(GetGoldenDir() + "/v.bin", vSize, vHost, vSize);

    // Upload inputs and zero the output and FIFO buffers.
    aclrtMemcpy(qDevice, qSize, qHost, qSize, ACL_MEMCPY_HOST_TO_DEVICE);
    aclrtMemcpy(ktDevice, ktSize, ktHost, ktSize, ACL_MEMCPY_HOST_TO_DEVICE);
    aclrtMemcpy(vDevice, vSize, vHost, vSize, ACL_MEMCPY_HOST_TO_DEVICE);
    aclrtMemset(outDevice, outSize, 0, outSize);
    aclrtMemset(pFifoDevice, pFifoSize, 0, pFifoSize);
    aclrtMemset(qkFifoDevice, qkFifoSize, 0, qkFifoSize);
    aclrtMemset(pvFifoDevice, pvFifoSize, 0, pvFifoSize);

    uint64_t ffts = 0;
    uint32_t fftsLen = 0;
    rtGetC2cCtrlAddr(&ffts, &fftsLen);

    LaunchFaPtoasGmPipe<1>(reinterpret_cast<uint8_t *>(ffts), qDevice, ktDevice, vDevice, pFifoDevice, outDevice,
                           qkFifoDevice, pvFifoDevice, stream);
    aclrtSynchronizeStream(stream);

    aclrtMemcpy(outHost, outSize, outDevice, outSize, ACL_MEMCPY_DEVICE_TO_HOST);
    WriteFile(GetGoldenDir() + "/output.bin", outHost, outSize);

    // Compare the written output against the golden file.
    std::vector<float> golden(outElems);
    std::vector<float> actual(outElems);
    ReadFile(GetGoldenDir() + "/golden.bin", outSize, golden.data(), outSize);
    ReadFile(GetGoldenDir() + "/output.bin", outSize, actual.data(), outSize);

    bool ret = ResultCmp<float>(golden, actual, 0.01f);

    // Release everything before asserting.
    aclrtFree(qDevice);
    aclrtFree(ktDevice);
    aclrtFree(vDevice);
    aclrtFree(outDevice);
    aclrtFree(pFifoDevice);
    aclrtFree(qkFifoDevice);
    aclrtFree(pvFifoDevice);
    aclrtFreeHost(qHost);
    aclrtFreeHost(ktHost);
    aclrtFreeHost(vHost);
    aclrtFreeHost(outHost);
    aclrtDestroyStream(stream);
    aclrtResetDevice(0);
    aclFinalize();

    EXPECT_TRUE(ret);
}
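
As a cross-check of the sizes used by this test, the sketch below works out the byte counts at compile time and relates the two float FIFOs to the `slot_size` values from `pto.aiv_initialize_pipe` (131072 and 65536). It assumes `aclFloat16` occupies 2 bytes and `float` 4 bytes; `Half` and `SlotsIn` are illustrative names and do not come from the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for aclFloat16, assumed here to be a 2-byte type.
using Half = std::uint16_t;
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

constexpr std::size_t kS0 = 128;    // s0 in the test
constexpr std::size_t kS1 = 4096;   // s1 in the test
constexpr std::size_t kHead = 128;  // head in the test

constexpr std::size_t kQBytes = kS0 * kHead * sizeof(Half);
constexpr std::size_t kKtBytes = kS1 * kHead * sizeof(Half);
constexpr std::size_t kOutBytes = kS0 * kHead * sizeof(float);
constexpr std::size_t kQkFifoBytes = 262144 * sizeof(float);  // blockQkFifoFloats
constexpr std::size_t kPvFifoBytes = 131072 * sizeof(float);  // blockPvFifoFloats

// slot_size values (bytes) taken from the pto.aiv_initialize_pipe ops.
constexpr std::size_t kQkSlotBytes = 131072;  // 128 x 256 f32
constexpr std::size_t kPvSlotBytes = 65536;   // 128 x 128 f32

// Number of whole slots that fit in a FIFO buffer.
constexpr std::size_t SlotsIn(std::size_t fifoBytes, std::size_t slotBytes)
{
    return fifoBytes / slotBytes;
}

static_assert(kQBytes == 32768, "q: 128 x 128 half");
static_assert(kKtBytes == 1048576, "kt: 4096 x 128 half");
static_assert(kOutBytes == 65536, "out: 128 x 128 float");
static_assert(kQkSlotBytes == 128 * 256 * sizeof(float), "QK slot holds one 128 x 256 f32 tile");
static_assert(kPvSlotBytes == 128 * 128 * sizeof(float), "PV slot holds one 128 x 128 f32 tile");

inline void CheckFifoLayout()
{
    // Both float FIFOs hold exactly eight slots of their respective slot size.
    assert(SlotsIn(kQkFifoBytes, kQkSlotBytes) == 8);
    assert(SlotsIn(kPvFifoBytes, kPvSlotBytes) == 8);
}
```

Both FIFOs come out at eight slots, which lines up with the `8` that appears in the `TPipe<..., 8, 8, true>` arguments in the build log, although whether that template argument is in fact the slot count is an assumption here.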

zhangstevenunity · 2026-04-30T12:37:27Z

pto-isa-feature-subtile/tests/npu/a2a3/src/st/testcase/fa_ptoas_gm_pipe_smoke/fa_ptoas_gm_pipe_smoke_kernel.cpp
/**
Copyright (c) 2026 Huawei Technologies Co., Ltd.
This program is free software, you can redistribute it and/or modify it under the terms and conditions of
CANN Open Software License Agreement Version 2.0 (the "License").
Please refer to the License for details. You may not use this file except in compliance with the License.
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
See LICENSE in the root of the software repository for the full text of the License.
*/

#include "fa_perf_smoke_c220.inc"

template <int32_t tilingKey>
void LaunchFaPtoasGmPipe(uint8_t *ffts, uint8_t *q, uint8_t *kt, uint8_t *v, uint8_t *pFifo, uint8_t *out,
                         uint8_t *qkFifo, uint8_t *pvFifo, void *stream)
{
    if constexpr (tilingKey == 1) {
        call_both<<<1, nullptr, stream>>>(
            reinterpret_cast<int64_t *>(ffts), reinterpret_cast<half *>(q), reinterpret_cast<half *>(kt),
            reinterpret_cast<half *>(v), reinterpret_cast<half *>(pFifo), reinterpret_cast<float *>(out),
            reinterpret_cast<float *>(qkFifo), reinterpret_cast<float *>(pvFifo));
    }
}

template void LaunchFaPtoasGmPipe<1>(uint8_t *ffts, uint8_t *q, uint8_t *kt, uint8_t *v, uint8_t *pFifo,
                                     uint8_t *out, uint8_t *qkFifo, uint8_t *pvFifo, void *stream);

HecreReed · 2026-05-07T11:22:17Z

/run a5

reedhecre · 2026-05-07T11:26:59Z

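Received /run a5; the A5 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
Current status: the board tester is idle; this request will start in the current polling round.

The page refreshes automatically, so the current stage, queue status, and latest results can be checked there directly.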

reedhecre · 2026-05-07T11:31:29Z

A5 board test failed

Trigger: manual
Source commit: 1c38346a560b
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /root/ptoas-board-monitor-a5/logs/20260507_192705_manual_pr609.log
Manual command: /run a5
Triggered by: HecreReed
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: sample-build-and-test / exit=1

Log tail

erated: test_intercore_sync_a5_functional-pto.cpp
Sync(test_intercore_sync_a5_ptoisa_vec.py) OK   generated: test_intercore_sync_a5_ptoisa_vec-pto.cpp
Sync(test_intercore_sync_a5.py) OK   generated: test_intercore_sync_a5-pto.cpp
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) OK   generated: test_tmov_col_major_16x1_align_a5.cpp
Sync(test_tmov_col_major_16x1_align_a5.py) OK   generated: test_tmov_col_major_16x1_align_a5-pto.cpp
Sync(test_tmov_row_major_1x16_control_a5.pto) OK   generated: test_tmov_row_major_1x16_control_a5.cpp
Sync(test_tmov_row_major_1x16_control_a5.py) OK   generated: test_tmov_row_major_1x16_control_a5-pto.cpp
Sync(tmatmulk_autosync_a5.py) OK   generated: tmatmulk_autosync_a5-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL python failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=217  FAIL=2  SKIP=21
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-05-07 19:31:33 =====

HecreReed · 2026-05-07T11:33:46Z

/run a5

reedhecre · 2026-05-07T11:33:58Z

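Received /run a5; the A5 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
Current status: the board tester is idle; this request will start in the current polling round.

The page refreshes automatically, so the current stage, queue status, and latest results can be checked there directly.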

reedhecre · 2026-05-07T11:38:28Z

A5 board test failed

Trigger: manual
Source commit: 1c38346a560b
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /root/ptoas-board-monitor-a5/logs/20260507_193405_manual_pr609.log
Manual command: /run a5
Triggered by: HecreReed
Trigger comment: Add FA PTO lit regression cases #609 (comment)
Failed stage: sample-build-and-test / exit=1

Log tail

erated: test_intercore_sync_a5_functional-pto.cpp
Sync(test_intercore_sync_a5_ptoisa_vec.py) OK   generated: test_intercore_sync_a5_ptoisa_vec-pto.cpp
Sync(test_intercore_sync_a5.py) OK   generated: test_intercore_sync_a5-pto.cpp
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) OK   generated: test_tmov_col_major_16x1_align_a5.cpp
Sync(test_tmov_col_major_16x1_align_a5.py) OK   generated: test_tmov_col_major_16x1_align_a5-pto.cpp
Sync(test_tmov_row_major_1x16_control_a5.pto) OK   generated: test_tmov_row_major_1x16_control_a5.cpp
Sync(test_tmov_row_major_1x16_control_a5.py) OK   generated: test_tmov_row_major_1x16_control_a5-pto.cpp
Sync(tmatmulk_autosync_a5.py) OK   generated: tmatmulk_autosync_a5-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL python failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=217  FAIL=2  SKIP=21
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-05-07 19:38:33 =====
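
Both log tails report `FAIL=2`, yet every line they show is `OK` or `XFAIL`, so the failing cases must sit earlier in the full log. A small helper along the following lines could pull the status token out of each `Suite(case) STATUS ...` line so that the non-`OK` entries can be listed. The line shape is inferred from the excerpt above, and `StatusOf` and `TallyStatuses` are hypothetical names, not part of the board monitor.

```cpp
#include <map>
#include <sstream>
#include <string>

// Returns the status token ("OK", "FAIL", "XFAIL", ...) from a line shaped like
// "Suite(case.py) STATUS ...", or an empty string for any other line.
inline std::string StatusOf(const std::string &line)
{
    const std::string::size_type open = line.find('(');
    if (open == std::string::npos) {
        return "";
    }
    const std::string::size_type close = line.find(')', open);
    if (close == std::string::npos) {
        return "";
    }
    std::istringstream rest(line.substr(close + 1));
    std::string status;
    rest >> status;  // first whitespace-delimited token after ')'
    return status;
}

// Counts how often each status token occurs in a multi-line log excerpt.
inline std::map<std::string, int> TallyStatuses(const std::string &log)
{
    std::map<std::string, int> counts;
    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string status = StatusOf(line);
        if (!status.empty()) {
            ++counts[status];
        }
    }
    return counts;
}
```

Summary and separator lines such as `OK=217  FAIL=2  SKIP=21` contain no parentheses and are skipped, so the tally reflects only per-case lines.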

Add FA PTO lit regression cases

6e729da

zhangstevenunity marked this pull request as ready for review April 30, 2026 07:32

gemini-code-assist Bot reviewed Apr 30, 2026

Fix FA perf PV pipe split mode

45f82f6

Add FA perf smoke PTO lit case

87a07ef

Delete test/lit/pto/fa.pto

5100115

MirkoDeVita98 mentioned this pull request May 6, 2026

[Bug] FA lit regression test with S1_TILE=512 crashes at runtime, splitting to S1_TILE=256 works #629

Open

Align FA perf PTO preload schedule

9f12bae

		%pv_slot_desc = pto.make_tensor_view %22, shape = [%c64, %c128_0], strides = [%c128_0, %c1] : !pto.tensor_view<64x128xf32>
		pto.aiv_initialize_pipe{id = 27, dir_mask = 1, slot_size = 65536} (gm_slot_tensor = %pv_slot_desc : !pto.tensor_view<64x128xf32>)
