
fix: keep scalar bitwise src/dst in distinct storage #631

Open
HecreReed wants to merge 3 commits into hw-native-sys:main from HecreReed:codex/issue614-planmemory-distinct

Conversation

@HecreReed
Collaborator

Summary

  • teach PlanMemory to record semantic non-reuse pairs for pto.tands, pto.tors, and pto.txors
  • tighten the shared scalar-bitwise verifier helper so different SSA values with the same proven local tile storage are also rejected
  • add lit coverage for both the PlanMemory reuse regression and the verifier regression

Details

This follows direction 2 from issue #614: keep the ISA/frontend distinct-storage constraint and make PTOAS memory planning honor it.

PlanMemory already had semantic conflict tracking for scratch-vs-dst style cases, but it did not explicitly model the src/dst distinct-storage constraint of scalar bitwise ops. As a result, memory planning could co-locate those buffers whenever liveness alone made reuse look legal.
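
A minimal sketch of how a liveness-only planner can co-locate src and dst, and how a recorded non-reuse pair prevents it. It assumes half-open live ranges (a value's last use and the op's result definition fall on the same step) and first-fit offset assignment. `Buffer`, `recordNonReuse`, `plan`, and `dstOffsetForScalarBitwise` are hypothetical names, not PTOAS APIs, and the real PlanMemory pass may model liveness and conflicts differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a local buffer, not a PTOAS type: a half-open live
// range [def, lastUse) in program order, a size in bytes, and an offset that
// the planner assigns.
struct Buffer {
  std::string name;
  int def;
  int lastUse;
  int64_t size;
  int64_t offset = -1;
};

// Semantic non-reuse pairs, normalized as (lower index, higher index).
using ConflictSet = std::set<std::pair<std::size_t, std::size_t>>;

void recordNonReuse(ConflictSet &conflicts, std::size_t a, std::size_t b) {
  conflicts.insert({std::min(a, b), std::max(a, b)});
}

bool liveRangesOverlap(const Buffer &x, const Buffer &y) {
  return x.def < y.lastUse && y.def < x.lastUse;
}

// Two buffers may share bytes only if their live ranges are disjoint AND no
// semantic constraint forbids it.
bool mustNotShare(const std::vector<Buffer> &bufs, const ConflictSet &conflicts,
                  std::size_t a, std::size_t b) {
  return liveRangesOverlap(bufs[a], bufs[b]) ||
         conflicts.count({std::min(a, b), std::max(a, b)}) != 0;
}

// First-fit placement: bump each buffer past any already-placed buffer it must
// not share storage with, until no such address overlap remains.
void plan(std::vector<Buffer> &bufs, const ConflictSet &conflicts) {
  for (std::size_t i = 0; i < bufs.size(); ++i) {
    assert(bufs[i].size > 0 && "buffers must have a positive size");
    int64_t offset = 0;
    bool moved = true;
    while (moved) {
      moved = false;
      for (std::size_t j = 0; j < i; ++j) {
        if (!mustNotShare(bufs, conflicts, i, j))
          continue;
        const Buffer &other = bufs[j];
        if (offset < other.offset + other.size &&
            other.offset < offset + bufs[i].size) {
          offset = other.offset + other.size;
          moved = true;
        }
      }
    }
    bufs[i].offset = offset;
  }
}

// src is last read by a scalar bitwise op at step 2 and dst is defined by that
// same op, so with half-open ranges liveness alone says they may share.
int64_t dstOffsetForScalarBitwise(bool recordPair) {
  std::vector<Buffer> bufs = {{"src", 0, 2, 512}, {"dst", 2, 4, 512}};
  ConflictSet conflicts;
  if (recordPair)
    recordNonReuse(conflicts, 0, 1);  // e.g. the src/dst of a pto.txors
  plan(bufs, conflicts);
  return bufs[1].offset;
}
```

Under these assumptions, `dstOffsetForScalarBitwise(false)` places dst at offset 0 (on top of src), while `dstOffsetForScalarBitwise(true)` pushes it to offset 512.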

On the verifier side, the check rejected only src == dst, which missed aliasing forms such as zero-offset subviews of the same statically addressed local tile.
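
The aliasing case can be modeled by walking each operand back through subviews to its allocating tile and rejecting the op only when both operands resolve to the same statically known local address. This is a hypothetical, MLIR-free sketch: `ValueDef`, `traceStorage`, `isLocalSpace`, and `srcDstDistinct` are illustrative names, and the local-space and static-address checks merely mirror the ones visible in the reviewed lib/PTO/IR/PTO.cpp snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical stand-ins for IR values; these are not PTO dialect classes.
enum class MemSpace { Global, Local };

// A tile allocation; `addr` is set only when the tile is statically addressed.
struct AllocTile {
  MemSpace space;
  std::optional<int64_t> addr;
};

// A view into another value, starting `offset` bytes into it.
struct Subview {
  int source;
  int64_t offset;
};

// Defining "op" of each value, indexed by value id.
using ValueDef = std::variant<AllocTile, Subview>;

struct Storage {
  MemSpace space;
  int64_t addr;
  bool operator==(const Storage &o) const {
    return space == o.space && addr == o.addr;
  }
};

bool isLocalSpace(MemSpace s) { return s == MemSpace::Local; }

// Walk subviews back to the allocating tile. Returns nullopt unless the tile
// lives in local storage and has a static address.
std::optional<Storage> traceStorage(const std::vector<ValueDef> &defs, int v) {
  int64_t offset = 0;
  assert(v >= 0 && static_cast<std::size_t>(v) < defs.size());
  while (const auto *sv = std::get_if<Subview>(&defs[v])) {
    offset += sv->offset;
    v = sv->source;
    assert(v >= 0 && static_cast<std::size_t>(v) < defs.size());
  }
  const AllocTile &alloc = std::get<AllocTile>(defs[v]);
  if (!isLocalSpace(alloc.space) || !alloc.addr)
    return std::nullopt;
  return Storage{alloc.space, *alloc.addr + offset};
}

// Reject identical SSA values and distinct values proven to share storage;
// accept whenever the storage cannot be proven.
bool srcDstDistinct(const std::vector<ValueDef> &defs, int src, int dst) {
  if (src == dst)
    return false;
  std::optional<Storage> s = traceStorage(defs, src);
  std::optional<Storage> d = traceStorage(defs, dst);
  return !(s && d && *s == *d);
}
```

Operands whose storage cannot be proven (non-local space or no static address) are accepted, so the check stays conservative and only rejects aliasing it can prove.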

Validation

  • git diff --check
  • compiled changed translation units in a local validation build:
    • lib/PTO/IR/PTO.cpp.o
    • lib/PTO/Transforms/PTOPlanMemory.cpp.o
  • full local ptoas build is currently blocked by pre-existing unrelated -Werror failures in PTOToEmitC.cpp, PTOTypeDefs.cpp, and PTOSyncUtils.cpp


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces logic to ensure that scalar bitwise operations (TAndS, TOrS, TXorS) use distinct local storage for their source and destination operands. This is implemented by adding a static analysis helper to trace tile storage and updating the memory planning pass to record semantic conflicts between these operands. A review comment identifies a critical type mismatch in the IR verification logic where an attribute is passed to a function expecting a value, which would result in a compilation error.

Comment thread lib/PTO/IR/PTO.cpp
auto addressSpace = getPTOMemorySpaceEnum(allocTile.getResult().getType());
if (!isLocalStorageSpace(addressSpace) || !allocTile.getAddr())
  return std::nullopt;
auto addr = getConstantIntegerValue(allocTile.getAddr());


critical

The getConstantIntegerValue function expects a Value as an argument, but allocTile.getAddr() returns an IntegerAttr (attribute). This will cause a compilation error. Please use the appropriate way to retrieve the integer value from the attribute, for example, allocTile.getAddr().getInt().

@reedhecre

reedhecre commented May 7, 2026

Codex Review

This comment is automatically updated by the review bot.

  • PR: #631 fix: keep scalar bitwise src/dst in distinct storage
  • Author: HecreReed
  • Base/Head: main / codex/issue614-planmemory-distinct
  • Head SHA: 01c9fe0e4f3b
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-07T13:03:08Z
  • Previous Head SHA: 3930c67dfc72
  • Status: completed

Summary

PR #631 introduces a deterministic sample/CI regression by forcing the mgather/mscatter Python samples to emit A5-targeted modules.

Findings

  1. P1: Hard-coding `pto.target_arch = a5` breaks the default sample CI contract for mgather/mscatter (test/samples/Mgather/mgather.py:20)

Adding m.operation.attributes["pto.target_arch"] = "a5" here (and the parallel change in test/samples/Mscatter/mscatter.py) changes these generators from the default "A3 unless --pto-arch is passed" behavior to "always emit A5 modules". The default sample CI runs bash test/samples/runop.sh --enablebc all with no --pto-arch override (.github/workflows/ci.yml:235-237), and runop.sh still treats mgather/mscatter as A3 XFAILs when no arch flag is present (test/samples/runop.sh:177-184 and :382-385). Because ptoas auto-detects pto.target_arch from the textual module when the CLI arch is absent (tools/ptoas/ptoas.cpp:1003-1046), these samples now compile successfully instead of failing, so the harness reports "expected failure but succeeded" (test/samples/runop.sh:470-473). This is a deterministic CI regression in the default sample workflow.
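
The failure chain described above can be modeled as a small decision function. This is a hypothetical sketch, not code from runop.sh or ptoas.cpp; it assumes that an explicit CLI arch takes precedence over the module's `pto.target_arch`, that A3 is the fallback, and that mgather/mscatter only compile for A5.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the sample harness; not code from runop.sh or ptoas.cpp.
enum class Arch { A3, A5 };

// Assumed precedence: an explicit CLI arch wins, then the module's
// pto.target_arch attribute, then the A3 default.
Arch effectiveArch(std::optional<Arch> cliArch, std::optional<Arch> moduleArch) {
  if (cliArch)
    return *cliArch;
  if (moduleArch)
    return *moduleArch;
  return Arch::A3;
}

bool isA5OnlySample(const std::string &sample) {
  return sample == "mgather" || sample == "mscatter";
}

// Assumption: the A5-only samples fail to compile for A3.
bool compiles(const std::string &sample, Arch arch) {
  return !isA5OnlySample(sample) || arch == Arch::A5;
}

// The harness expects these samples to fail only when no arch flag is passed.
bool expectedToFail(const std::string &sample, std::optional<Arch> cliArch) {
  return !cliArch && isA5OnlySample(sample);
}

std::string classify(const std::string &sample, std::optional<Arch> cliArch,
                     std::optional<Arch> moduleArch) {
  assert(!sample.empty());
  bool ok = compiles(sample, effectiveArch(cliArch, moduleArch));
  if (expectedToFail(sample, cliArch))
    return ok ? "FAIL (expected failure but succeeded)" : "XFAIL";
  return ok ? "OK" : "FAIL";
}
```

In this model, `classify("mgather", std::nullopt, Arch::A5)` lands on the "expected failure but succeeded" outcome, whereas the same sample without the module attribute stays an XFAIL.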

@HecreReed HecreReed marked this pull request as ready for review May 7, 2026 07:07
@HecreReed
Collaborator Author

/run a5

@reedhecre

Received /run a5; the A5 board tester will process this request.

This page refreshes automatically; you can check the current stage, queue status, and latest results here.

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 161897e22d60
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260507_194105_manual_pr631.log
  • Manual command: /run a5
  • Triggered by: HecreReed
  • Triggering comment: fix: keep scalar bitwise src/dst in distinct storage #631 (comment)
  • Failed stage: sample-build-and-test / exit=1

Log tail

erated: test_intercore_sync_a5_functional-pto.cpp
Sync(test_intercore_sync_a5_ptoisa_vec.py) OK   generated: test_intercore_sync_a5_ptoisa_vec-pto.cpp
Sync(test_intercore_sync_a5.py) OK   generated: test_intercore_sync_a5-pto.cpp
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) OK   generated: test_tmov_col_major_16x1_align_a5.cpp
Sync(test_tmov_col_major_16x1_align_a5.py) OK   generated: test_tmov_col_major_16x1_align_a5-pto.cpp
Sync(test_tmov_row_major_1x16_control_a5.pto) OK   generated: test_tmov_row_major_1x16_control_a5.cpp
Sync(test_tmov_row_major_1x16_control_a5.py) OK   generated: test_tmov_row_major_1x16_control_a5-pto.cpp
Sync(tmatmulk_autosync_a5.py) OK   generated: tmatmulk_autosync_a5-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL python failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=217  FAIL=2  SKIP=21
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-05-07 19:45:33 =====

@HecreReed
Collaborator Author

/run a5

@reedhecre

Received /run a5; the A5 board tester will process this request.

This page refreshes automatically; you can check the current stage, queue status, and latest results here.

@reedhecre

A5 board test failed

Failed test cases

  • quant (run, exit=2)
  • quant_asym (run, exit=2)
  • partarg (run, exit=2)
  • mscatter (run, exit=1)
  • mgather (run, exit=1)
  • abs (run, exit=2)

@reedhecre

A5 board test failure details: PR #631

quant

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=9.0 at idx=206 (golden=-10, out=-1, dtype=int8)
[ERROR] compare failed
[2026-05-07 22:08:06] ERROR: testcase failed (exit 2): quant
quant_asym

stage=run info=exit=2

[ERROR] Mismatch: golden_v4.bin vs v4.bin, max diff=10.0 at idx=721 (golden=0, out=10, dtype=uint8)
[ERROR] compare failed
[2026-05-07 22:08:47] ERROR: testcase failed (exit 2): quant_asym
partarg

stage=run info=exit=2

/tmp/ptoas-board-monitor-a5/runs/20260507_201906_manual_pr631/npu_validation/Partarg/partarg/partarg_kernel.cpp:122:3: error: use of undeclared identifier 'TPARTARGMAX'
  TPARTARGMAX(v38, v34, v35, v39, v36, v37);
  ^
/tmp/ptoas-board-monitor-a5/runs/20260507_201906_manual_pr631/npu_validation/Partarg/partarg/partarg_kernel.cpp:123:3: error: use of undeclared identifier 'TPARTARGMIN'
  TPARTARGMIN(v38, v34, v35, v39, v36, v37);
  ^
2 errors generated.
gmake[2]: *** [CMakeFiles/partarg_kernel.dir/build.make:76: CMakeFiles/partarg_kernel.dir/partarg_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partarg_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-05-07 22:50:31] ERROR: testcase failed (exit 2): partarg
mscatter

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260507_201906_manual_pr631/npu_validation/Mscatter/mscatter/main.cpp:99)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1013085] 2026-05-07-22:55:25.903.120 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 27, there is an aivec error exception, core id is 0, error code = 334, dump info: pc start: 0x100040800000, current: 0x1000408000f0, sc error info: 0xffffffffffff, su error info: 0xe6f7d23d139c7bd7,0xcc3fd0e410009bfd, mte error info: 0x1fd3f5c60007eff1, vec error info: 0x408001e000390037, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(334) errorStr: The data returned by the BIU to the VEC is incorrect. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=61, report_stream_id=61, task_id=0, flip_num=0, fault kernel_name=_Z18mscatter_kernel_2dPiS_S_, fault kernel info ext=_Z18mscatter_kernel_2dPiS_S_, program id=0, hash=279618682955286547.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-07 22:56:01] ERROR: testcase failed (exit 1): mscatter
mgather

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260507_201906_manual_pr631/npu_validation/Mgather/mgather/main.cpp:99)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1018553] 2026-05-07-23:00:17.712.265 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 28, there is an aivec error exception, core id is 0, error code = 334, dump info: pc start: 0x100040800000, current: 0x100040800108, sc error info: 0xffffffffffff, su error info: 0xe6f7d23d139c7bd7,0xcc3fd0e410009bfd, mte error info: 0x1fd3f5c60007eff1, vec error info: 0x408001f000390033, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(334) errorStr: The data returned by the BIU to the VEC is incorrect. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=61, report_stream_id=61, task_id=0, flip_num=0, fault kernel_name=_Z17mgather_kernel_2dPiS_S_, fault kernel info ext=_Z17mgather_kernel_2dPiS_S_, program id=0, hash=14980436151442853146.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-07 23:00:53] ERROR: testcase failed (exit 1): mgather
abs

stage=run info=exit=2

[ERROR] Mismatch: golden_v2.bin vs v2.bin, max diff=nan at idx=14 (golden=nan, out=nan, dtype=float32)
[ERROR] compare failed
[2026-05-08 00:03:53] ERROR: testcase failed (exit 2): abs
[2026-05-08 00:03:53] === SUMMARY ===
[2026-05-08 00:03:53] OK=201 FAIL=6 SKIP=1
[2026-05-08 00:03:53] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260507_201906_manual_pr631/remote_npu_validation_results.tsv


Labels

None yet

Projects

None yet


2 participants