Enhance PlanMemory reuse liveness#608
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements automatic sizing and validation for reserve_buffer operations and improves memory liveness analysis to allow buffer reuse by delaying generation until the first write in loops. It also updates the memory planner to respect reserved capacities and supports multiple lifetime intervals per buffer. A performance concern was raised regarding the quadratic complexity of the function walk used to calculate buffer sizes.
| computeAutoReserveBufferSizeBytes(func::FuncOp funcOp, | ||
| ReserveBufferOp reserveOp) { | ||
| std::optional<int64_t> fifoLocalSizeBytes; | ||
| bool failedToCompute = false; | ||
|
|
||
| auto updateFromFifo = [&](Operation *op, int64_t slotSizeBytes, | ||
| IntegerAttr localSlotNumAttr) -> LogicalResult { | ||
| if (!localSlotNumAttr) | ||
| return success(); | ||
|
|
||
| int64_t currentSizeBytes = 0; | ||
| if (failed(computeFifoLocalBufferSizeBytes( | ||
| op, slotSizeBytes, localSlotNumAttr, currentSizeBytes))) | ||
| return failure(); | ||
|
|
||
| // One reserve_buffer normally feeds one FIFO. If the IR shares it across | ||
| // multiple pipe init ops, reserve enough for the largest local buffer. | ||
| fifoLocalSizeBytes = | ||
| fifoLocalSizeBytes ? std::max(*fifoLocalSizeBytes, currentSizeBytes) | ||
| : currentSizeBytes; | ||
| return success(); | ||
| }; | ||
|
|
||
| WalkResult walkResult = funcOp.walk([&](Operation *op) -> WalkResult { | ||
| if (auto initOp = dyn_cast<InitializeL2G2LPipeOp>(op)) { | ||
| if (!isReserveBufferAddr(initOp.getLocalAddr(), reserveOp) && | ||
| !isReserveBufferAddr(initOp.getPeerLocalAddr(), reserveOp)) | ||
| return WalkResult::advance(); | ||
| if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(), | ||
| initOp.getLocalSlotNumAttr()))) { | ||
| failedToCompute = true; | ||
| return WalkResult::interrupt(); | ||
| } | ||
| return WalkResult::advance(); | ||
| } | ||
|
|
||
| if (auto initOp = dyn_cast<AicInitializePipeOp>(op)) { | ||
| if (!isReserveBufferAddr(initOp.getC2vConsumerBuf(), reserveOp) && | ||
| !isReserveBufferAddr(initOp.getV2cConsumerBuf(), reserveOp)) | ||
| return WalkResult::advance(); | ||
| if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(), | ||
| initOp.getLocalSlotNumAttr()))) { | ||
| failedToCompute = true; | ||
| return WalkResult::interrupt(); | ||
| } | ||
| return WalkResult::advance(); | ||
| } | ||
|
|
||
| if (auto initOp = dyn_cast<AivInitializePipeOp>(op)) { | ||
| if (!isReserveBufferAddr(initOp.getC2vConsumerBuf(), reserveOp) && | ||
| !isReserveBufferAddr(initOp.getV2cConsumerBuf(), reserveOp)) | ||
| return WalkResult::advance(); | ||
| if (failed(updateFromFifo(initOp.getOperation(), initOp.getSlotSize(), | ||
| initOp.getLocalSlotNumAttr()))) { | ||
| failedToCompute = true; | ||
| return WalkResult::interrupt(); | ||
| } | ||
| return WalkResult::advance(); | ||
| } | ||
|
|
||
| return WalkResult::advance(); | ||
| }); | ||
|
|
||
| (void)walkResult; | ||
| if (failedToCompute) | ||
| return failure(); | ||
| return fifoLocalSizeBytes.value_or(reserveOp.getSize()); | ||
| } |
There was a problem hiding this comment.
The function computeAutoReserveBufferSizeBytes performs a full walk of the funcOp for every reserve_buffer operation. While the number of reserved buffers is typically small, this could lead to quadratic complexity relative to the number of operations in the function. Consider pre-collecting pipe initialization operations if performance becomes an issue in large kernels.
Codex Review该评论由 review 机器人自动更新。
SummaryPR #608 的 overwrite-based reuse 对部分 alias/view 不安全,可能提前释放父 buffer 并造成错误结果。 Findings
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fbcc7b796c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| CanDelayLoopEntryGenUntilFirstWrite(forOp, Buffer)) { | ||
| delayedLoopEntryGenBuffers[Buffer] = true; | ||
| continue; |
There was a problem hiding this comment.
Preserve loop-entry liveness for zero-trip loops
Delaying loop-entry generation here assumes the loop body executes at least once, but scf.for may legally have zero iterations (e.g., dynamic bounds where lb >= ub). In that case no in-loop overwrite occurs, yet this code marks the live-in buffer as delayable and skips generating it at loop entry, which can let PlanMemory reuse that storage too early and corrupt a value that is read after the loop. The optimization should only apply when non-empty execution is provable, or else keep the loop-entry lifetime.
Useful? React with 👍 / 👎.
fbcc7b7 to
d7f300f
Compare
d7f300f to
f0ad0d2
Compare
Summary
scf.foris statically known to execute at least onceValidation
ninja -C build-codex-plan-memory tools/ptoas/ptoas -j8llvm-litfor the three reuse/zero-trip tests: PASSllvm-lit -q build-codex-plan-memory/test/lit: PASS (165 tests)