Skip to content

Fix redundant sync pruning by pipe pair#627

Merged
zhangstevenunity merged 1 commit intohw-native-sys:mainfrom
TaoTao-real:codex/fix-remove-redundant-pipepair
May 9, 2026
Merged

Fix redundant sync pruning by pipe pair#627
zhangstevenunity merged 1 commit intohw-native-sys:mainfrom
TaoTao-real:codex/fix-remove-redundant-pipepair

Conversation

@TaoTao-real
Copy link
Copy Markdown
Contributor

@TaoTao-real TaoTao-real commented May 6, 2026

Summary

  • Fix redundant sync pruning so it reasons by pipe pair semantics rather than requiring identical dependency roots.
  • Add a focused regression for redundant wait/set removal on the same pipe pair.
  • Refresh the issue564 regression expectation to match the corrected pruning behavior.

Motivation

  • Fixes the case where an inner complete sync pair should cover an outer sync on the same pipe pair even when their recorded dependency roots differ.
  • The old pruning logic was too conservative:
    • it required eventIdNum == 1
    • it required identical dependency roots via hasSameSyncDepRoots(...)
    • it required the same forEndIndex
  • That prevented removal of truly redundant syncs whose semantics are already fully covered by another sync pair on the same (srcPipe, dstPipe) path.

Design

  • Keep compensation syncs protected.
  • Remove the root-buffer equality requirement during redundant sync matching.
  • Relax matching so it is based on pipe-pair semantics:
    • same srcPipe
    • same dstPipe
    • compatible eventIdNum (relatedSync->eventIdNum > setFlag->eventIdNum still rejected)
  • Update the design note to document that wait redundancy pruning is based on pipe pair semantics instead of exact dep-root equality.
  • Add issue226_remove_redundant_pipe_pair.pto to guard the intended behavior.

Testing

  • Added regression: test/lit/pto/issue226_remove_redundant_pipe_pair.pto
  • Updated regression: test/lit/pto/issue564_k_loop_mte1_mte2_wait_regression.pto

Risk / Rollback

  • Risk: broader redundant-sync pruning could remove syncs that were previously kept conservatively.
  • Rollback: revert commit fdcd8cb.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the RemoveRedundantSync pass by relaxing the criteria for pruning redundant synchronization operations. Specifically, it removes the requirement for matching depRootBuffers and allows pruning of multi-buffer synchronizations based on pipe pair semantics, provided the inner synchronization's eventIdNum is not greater than the outer one. The changes include updates to the design documentation, header comments, and the implementation in RemoveRedundantSync.cpp, along with a new regression test and updates to existing test expectations. I have no feedback to provide.

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 6, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: Fix redundant sync pruning by pipe pair #627 Fix redundant sync pruning by pipe pair
  • Author: TaoTao-real
  • Base/Head: main / codex/fix-remove-redundant-pipepair
  • Head SHA: fdcd8cb472ea
  • Trigger: PR 有新提交
  • Generated At: 2026-05-08T08:54:42Z
  • Previous Head SHA: 0a20cb73ae2d
  • Status: completed

Summary

Broadened pipe-pair pruning can incorrectly remove loop-tagged sync pairs that still drive loop-specific event allocation/backward-match generation.

Findings

  1. P2 Loop-tagged sync pairs can now be pruned by ordinary same-pipe pairs lib/PTO/Transforms/InsertSync/RemoveRedundantSync.cpp:216

InsertBackForSync emits real sync pairs with forEndIndex before redundant-pruning runs, and SyncEventIdAllocation later treats those pairs specially: it extends their lifetime scope and generates backward-match/head-tail sync from them. After this PR, CanMatchedSync no longer requires relatedSync->GetForEndIndex() == setFlag->GetForEndIndex(), so a plain inner same-pipe pair can now mark one of those loop-tagged pairs uselessSync. Once that happens, AllocateEventId skips the pair entirely, so the loop-specific carry-sync path is lost. This is a correctness risk for loop-carried deps that share a pipe pair with an ordinary local dep.

@TaoTao-real TaoTao-real force-pushed the codex/fix-remove-redundant-pipepair branch 2 times, most recently from 78db797 to fdcd8cb Compare May 8, 2026 08:03
@TaoTao-real
Copy link
Copy Markdown
Contributor Author

I re-checked this review finding against the current PR head (fdcd8cb472ea) and I think this is not an actual blocker for the current implementation.

The concern would be valid if this PR allowed redundant-sync pruning to scan around loop back-edges. In that case, a loop-tagged forEndIndex pair could indeed be incorrectly covered by an ordinary same-pipe pair, and then SyncEventIdAllocation would skip the uselessSync pair before generating the loop head/tail backward-match syncs.

However, the current PR explicitly disables that path:

if (begin < end) {
  return CheckRepeatSync(begin, end, syncFinder, setFlag);
} else {
  // Back-edge pruning is intentionally disabled in correctness-first mode.
  (void)forEndIndex;
  return false;
}

So real loop-carried/back-edge pairs (setIndex >= waitIndex) are not pruned by RemoveRedundantSync in this PR.

I also validated this on a clean worktree checked out at fdcd8cb472ea:

  • Rebuilt ptoas locally from the PR head.
  • issue564_k_loop_mte1_mte2_wait_regression.pto passes FileCheck.
  • issue454_nested_loop_same_pipe_pair_regression.pto passes FileCheck.
  • Scanned all test/lit/pto/*.pto debug output for forward forEnd pairs (forEnd with setIndex < waitIndex); found 0 cases.

The remaining risky shape would be a forward loop-tagged pair like:

set_flag(P, Q, carry_forEnd);
...
set_flag(P, Q, local);
wait_flag(P, Q, local);
...
wait_flag(P, Q, carry_forEnd);

But current InsertSync generation does not appear to produce such forward forEnd pairs in the tested suite. The loop-carried pairs that matter for backward-match/head-tail generation are back-edge-shaped and are protected by the return false above.

So I believe the review warning is a false positive for this PR as currently written. If we later re-enable circular/back-edge redundant-sync pruning, this concern should be revisited with a dedicated regression.

@zhangstevenunity zhangstevenunity merged commit eeeb1f4 into hw-native-sys:main May 9, 2026
22 checks passed
@reedhecre
Copy link
Copy Markdown

A5 板测失败

  • 触发方式:merged
  • 源码提交:eeeb1f4854e9
  • 结果汇总:OK 0 / FAIL 0 / SKIP 0
  • 日志:/root/ptoas-board-monitor-a5/logs/20260509_115907_merged_pr627.log
  • 失败阶段:board-validation-qwen / exit=1

日志尾部

 DEVICE_ID=1
[2026-05-09 12:01:21] PTO_ISA_REPO=https://gitcode.com/cann/pto-isa.git
[2026-05-09 12:01:21] PTO_ISA_COMMIT=662d7f2a916d6bbde3109ce4a16ed5c28f5d900a
[2026-05-09 12:01:21] ROOT_DIR=/tmp/ptoas-board-monitor-a5/runs/20260509_115907_merged_pr627/payload
[2026-05-09 12:01:21] Sourcing /root/.bash_profile
[2026-05-09 12:01:22] Sourcing /root/.bashrc
[2026-05-09 12:01:23] Sourcing /usr/local/Ascend/cann/set_env.sh
[2026-05-09 12:01:23] === Tool Versions ===
root
localhost.localdomain
Linux localhost.localdomain 6.6.0 #2 SMP Mon Feb  2 09:24:17 CST 2026 aarch64 aarch64 aarch64 GNU/Linux
Python 3.13.9
cmake version 3.27.9

CMake suite maintained and supported by Kitware (kitware.com/cmake).
GNU Make 4.4.1
Built for aarch64-openEuler-linux-gnu
Copyright (C) 1988-2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
/usr/local/Ascend/cann-9.0.0/bin/bisheng
2026-03-07T14:16:12+08:00 clang version 15.0.5
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/Ascend/cann-9.0.0/bin
[2026-05-09 12:01:23] ASCEND_HOME_PATH=/usr/local/Ascend/cann-9.0.0
[2026-05-09 12:01:23] Detected A3 board from simulator dir fallback: /usr/local/Ascend/cann-9.0.0/aarch64-linux/simulator/Ascend910B1/lib
[2026-05-09 12:01:23] SIM_SOC_VERSION=Ascend950
[2026-05-09 12:01:23] PTOAS_BOARD_IS_A3=1
[2026-05-09 12:01:23] === NPU Device Check ===
uid=0(root) gid=0(root) groups=0(root),1001(HwHiAiUser)
crw-rw---- 1 HwHiAiUser HwHiAiUser 507, 0 May  9 11:48 /dev/davinci_manager
[2026-05-09 12:01:23] ERROR: /dev/davinci1 not found
===== END STAGE board-validation-qwen rc=1 @ 2026-05-09 12:01:23 =====

@reedhecre
Copy link
Copy Markdown

A3 板测完成(有跳过)

  • 触发方式:merged
  • 源码提交:eeeb1f4854e9
  • 结果汇总:OK 207 / FAIL 0 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260509_115205_merged_pr627.log
  • 结果 TSV:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260509_115205_merged_pr627.tsv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants