Skip to content

Enhance PlanMemory reuse liveness#608

Open
zhangstevenunity wants to merge 1 commit intomainfrom
codex/plan-memory-reuse-reserve
Open

Enhance PlanMemory reuse liveness#608
zhangstevenunity wants to merge 1 commit intomainfrom
codex/plan-memory-reuse-reserve

Conversation

@zhangstevenunity
Copy link
Copy Markdown
Collaborator

@zhangstevenunity zhangstevenunity commented Apr 30, 2026

Summary

  • split buffer liveness into multiple intervals so write-only overwrites can end the previous live range and allow safe reuse in later stages
  • allow loop-live buffers to delay generation until the first write-only overwrite only when the containing scf.for is statically known to execute at least once
  • add regression coverage for issue601 loop first-write reuse, right-buffer next-write reuse, and dynamic zero-trip loop conservatism

Validation

  • ninja -C build-codex-plan-memory tools/ptoas/ptoas -j8
  • focused llvm-lit for the three reuse/zero-trip tests: PASS
  • full llvm-lit -q build-codex-plan-memory/test/lit: PASS (165 tests)

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements automatic sizing and validation for reserve_buffer operations and improves memory liveness analysis to allow buffer reuse by delaying generation until the first write in loops. It also updates the memory planner to respect reserved capacities and supports multiple lifetime intervals per buffer. A performance concern was raised regarding the quadratic complexity of the function walk used to calculate buffer sizes.

Comment thread lib/PTO/Transforms/PTOPlanMemory.cpp Outdated
Comment on lines +250 to +317
computeAutoReserveBufferSizeBytes(func::FuncOp funcOp,
ReserveBufferOp reserveOp) {
std::optional<int64_t> fifoLocalSizeBytes;
bool failedToCompute = false;

auto updateFromFifo = [&](Operation *op, int64_t slotSizeBytes,
IntegerAttr localSlotNumAttr) -> LogicalResult {
if (!localSlotNumAttr)
return success();

int64_t currentSizeBytes = 0;
if (failed(computeFifoLocalBufferSizeBytes(
op, slotSizeBytes, localSlotNumAttr, currentSizeBytes)))
return failure();

// One reserve_buffer normally feeds one FIFO. If the IR shares it across
// multiple pipe init ops, reserve enough for the largest local buffer.
fifoLocalSizeBytes =
fifoLocalSizeBytes ? std::max(*fifoLocalSizeBytes, currentSizeBytes)
: currentSizeBytes;
return success();
};

WalkResult walkResult = funcOp.walk([&](Operation *op) -> WalkResult {
if (auto initOp = dyn_cast<InitializeL2G2LPipeOp>(op)) {
if (!isReserveBufferAddr(initOp.getLocalAddr(), reserveOp) &&
!isReserveBufferAddr(initOp.getPeerLocalAddr(), reserveOp))
return WalkResult::advance();
if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(),
initOp.getLocalSlotNumAttr()))) {
failedToCompute = true;
return WalkResult::interrupt();
}
return WalkResult::advance();
}

if (auto initOp = dyn_cast<AicInitializePipeOp>(op)) {
if (!isReserveBufferAddr(initOp.getC2vConsumerBuf(), reserveOp) &&
!isReserveBufferAddr(initOp.getV2cConsumerBuf(), reserveOp))
return WalkResult::advance();
if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(),
initOp.getLocalSlotNumAttr()))) {
failedToCompute = true;
return WalkResult::interrupt();
}
return WalkResult::advance();
}

if (auto initOp = dyn_cast<AivInitializePipeOp>(op)) {
if (!isReserveBufferAddr(initOp.getC2vConsumerBuf(), reserveOp) &&
!isReserveBufferAddr(initOp.getV2cConsumerBuf(), reserveOp))
return WalkResult::advance();
if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(),
initOp.getLocalSlotNumAttr()))) {
failedToCompute = true;
return WalkResult::interrupt();
}
return WalkResult::advance();
}

return WalkResult::advance();
});

(void)walkResult;
if (failedToCompute)
return failure();
return fifoLocalSizeBytes.value_or(reserveOp.getSize());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function computeAutoReserveBufferSizeBytes performs a full walk of the funcOp for every reserve_buffer operation. While the number of reserved buffers is typically small, this could lead to quadratic complexity relative to the number of operations in the function. Consider pre-collecting pipe initialization operations if performance becomes an issue in large kernels.

@reedhecre
Copy link
Copy Markdown

reedhecre commented Apr 30, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: Enhance PlanMemory reuse liveness #608 Enhance PlanMemory reuse liveness
  • Author: zhangstevenunity
  • Base/Head: main / codex/plan-memory-reuse-reserve
  • Head SHA: f0ad0d25cbb0
  • Trigger: PR 有新提交
  • Generated At: 2026-04-30T06:45:32Z
  • Previous Head SHA: d7f300fbde91
  • Status: completed

Summary

PR #608 的 overwrite-based reuse 对部分 alias/view 不安全,可能提前释放父 buffer 并造成错误结果。

Findings

  1. P1 view/subview 写入会被误判为整个 buffer 的重新定义 lib/PTO/Transforms/PTOPlanMemory.cpp:749

IsWriteOnlyDpsInitForAlias 只要看到 alias set 中某个值是 write-only 的 PTO_DpsInitOpInterface 就会把它当成“整个 buffer 被重定义”。但这个 pass 现有的 alias 集合已经包含了 memref.subview / reinterpret_cast / bind_tile 等非整块别名,因此写一个 slice/view 也会触发 CanDelayLoopEntryGenUntilFirstWriteCanKillBeforeNextOverwrite 和同 op 的 kill+gen,把父 buffer 的整块地址提前释放出去重用。对于后续还要读取未覆盖区域的场景,这会直接产生静默错误结果。新增测试只覆盖整块写,没有覆盖这个回归面。

@zhangstevenunity zhangstevenunity marked this pull request as ready for review April 30, 2026 06:07
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbcc7b796c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +773 to +775
CanDelayLoopEntryGenUntilFirstWrite(forOp, Buffer)) {
delayedLoopEntryGenBuffers[Buffer] = true;
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve loop-entry liveness for zero-trip loops

Delaying loop-entry generation here assumes the loop body executes at least once, but scf.for may legally have zero iterations (e.g., dynamic bounds where lb >= ub). In that case no in-loop overwrite occurs, yet this code marks the live-in buffer as delayable and skips generating it at loop entry, which can let PlanMemory reuse that storage too early and corrupt a value that is read after the loop. The optimization should only apply when non-empty execution is provable, or else keep the loop-entry lifetime.

Useful? React with 👍 / 👎.

@zhangstevenunity zhangstevenunity force-pushed the codex/plan-memory-reuse-reserve branch from fbcc7b7 to d7f300f Compare April 30, 2026 06:21
@zhangstevenunity zhangstevenunity changed the title Fix PlanMemory reserve sizing and reuse liveness Enhance PlanMemory reuse liveness Apr 30, 2026
@zhangstevenunity zhangstevenunity force-pushed the codex/plan-memory-reuse-reserve branch from d7f300f to f0ad0d2 Compare April 30, 2026 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants