Fix loop-tail synthetic wait ordering #645
TaoTao-real wants to merge 2 commits into hw-native-sys:main
Code Review
This pull request addresses synchronization issues where event IDs could be incorrectly reused across outstanding sync pairs, specifically targeting issue #622. The primary change in SyncEventIdAllocation.cpp ensures that synthetic tail waits are inserted at the beginning of the pipeAfter list, allowing them to be emitted before post-loop sets. Additionally, the PR updates several regression tests and introduces a new test case to verify the fix for event ID overlap in mixed-width pipelines. I have no feedback to provide.
Codex Review
This comment is updated automatically by the review bot.
Summary
Found 1 P2 issue: once the loop-tail synthetic wait is switched to front insertion, it will
Findings
Change the reallocated loop-tail synthetic wait to |
Summary
LOOP_END anchor.
issue622 reproducer and adjust related lit checks to match the repaired ordering.
Motivation
QK_PRELOAD=4 FlashAttention DSL kernel builds but deadlocks under auto-sync #622.
UpdateBackwardMatchSync appended the synthetic tail wait to LOOP_END.pipeAfter. When the same anchor also carried a sunk post-loop local set reusing the same (srcPipe, dstPipe, eventId), the generated C++ could emit:
set(E0) set(E0) wait(E0) wait(E0) set(E0) wait(E0) set(E0) wait(E0)
Design
SyncEventIdAllocation::UpdateBackwardMatchSync, insert the synthetic tail wait with push_front instead of push_back on LOOP_END.pipeAfter.
push_back on LOOP_BEGIN.pipeBefore), which is already the conservative order for PRE syncs.
Testing
ptoas --pto-arch=a3 --enable-insert-sync test/lit/pto/issue428_cube_sync_regression.pto
ptoas --pto-arch=a3 --pto-level=level3 --enable-insert-sync test/lit/pto/issue564_k_loop_mte1_mte2_wait_regression.pto
ptoas --pto-arch=a3 --pto-level=level3 --enable-insert-sync test/lit/pto/issue622_v_mte2_eventid_overlap_reproducer.pto
issue622 changes from overlapping same-key use to serialized reuse at loop tail.
Risk / Rollback
LOOP_END anchor with sunk local sets.
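Ordering sketch

The push_front versus push_back choice described under Design can be illustrated with a small standalone model. This is only a sketch and not the code in SyncEventIdAllocation.cpp: SyncOp, PipeAfter, render, appendTailWait and prependTailWait are hypothetical names, and the anchor is assumed to hold a single sunk post-loop set that reuses event 0.

```cpp
#include <deque>
#include <string>

// Hypothetical, simplified stand-in for one sync op attached to an anchor.
struct SyncOp {
  enum Kind { Set, Wait } kind;
  int eventId;
};

// Hypothetical stand-in for the LOOP_END anchor's pipeAfter list.
// Ops are assumed to be emitted front to back.
using PipeAfter = std::deque<SyncOp>;

// Render the ops in emission order, e.g. "wait(E0) set(E0)".
std::string render(const PipeAfter &ops) {
  std::string out;
  for (const SyncOp &op : ops) {
    if (!out.empty())
      out += ' ';
    out += (op.kind == SyncOp::Set) ? "set(E" : "wait(E";
    out += std::to_string(op.eventId);
    out += ')';
  }
  return out;
}

// Old ordering (assumed model): the synthetic tail wait goes after any
// ops already on the anchor, so a sunk set with the same key fires first.
PipeAfter appendTailWait(PipeAfter ops, int eventId) {
  ops.push_back({SyncOp::Wait, eventId});
  return ops;
}

// Repaired ordering (assumed model): the synthetic tail wait goes first,
// so the in-loop set is consumed before the post-loop set reuses the id.
PipeAfter prependTailWait(PipeAfter ops, int eventId) {
  ops.push_front({SyncOp::Wait, eventId});
  return ops;
}
```

With an anchor that already contains set(E0), appendTailWait yields set(E0) wait(E0) at the loop tail, which stacks a second set on top of the still-outstanding in-loop set. prependTailWait yields wait(E0) set(E0), which drains the in-loop set before the id is reused.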