Add: cross-class @pl.jit.inline example #239

bumble0918 wants to merge 1 commit into hw-native-sys:main
Conversation
📝 Walkthrough

This PR adds a new example directory demonstrating cross-file, cross-class kernel inlining.

Changes: cross-class inline projection example
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Test Runner
    participant JIT as proj_residual<br/>@pl.jit
    participant LinearKern as linear<br/>@pl.jit.inline
    participant EltwiseKern as residual_add<br/>@pl.jit.inline
    participant Golden as golden reference
    User->>JIT: invoke proj_residual(x, w, hidden_states)
    JIT->>LinearKern: call linear(x, w, intermediate)
    LinearKern-->>JIT: intermediate = x @ w
    JIT->>EltwiseKern: call residual_add(intermediate, hidden_states, out)
    EltwiseKern-->>JIT: out = intermediate + hidden_states
    User->>Golden: compute x @ w + hidden_states
    Golden-->>User: golden result
    User->>User: validate JIT output matches golden
```
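The validation flow above can be sketched in plain NumPy. This is an illustrative stand-in, not the real kernels: `linear`, `residual_add`, and `proj_residual` below are ordinary NumPy functions mirroring the names in this PR, with the `@pl.jit` machinery omitted:

```python
import numpy as np

def linear(x, w):
    # Stand-in for the inlined Projections.linear kernel: y = x @ w
    return x.astype(np.float32) @ w.astype(np.float32)

def residual_add(intermediate, hidden_states):
    # Stand-in for the inlined Elementwise.residual_add kernel
    return intermediate + hidden_states

def proj_residual(x, w, hidden_states):
    # Composition performed by the @pl.jit entry function
    intermediate = linear(x, w)
    return residual_add(intermediate, hidden_states)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8), dtype=np.float32)
w = rng.standard_normal((8, 8), dtype=np.float32)
h = rng.standard_normal((4, 8), dtype=np.float32)

# Golden reference computed exactly as in the diagram
golden = x @ w + h
assert np.allclose(proj_residual(x, w, h), golden)
```

The test in `main.py` follows the same shape: run the composed kernel, compute the golden result directly, and compare.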
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/advanced/cross_class_proj/proj_lib.py`:
- Around line 49-50: The loop using pl.pipeline(0, HIDDEN // K_PROJ_CHUNK,
stage=2) silently drops any remainder of HIDDEN divided by K_PROJ_CHUNK; add a
fail-fast check before that loop to ensure HIDDEN % K_PROJ_CHUNK == 0 (or raise
a clear error) so the reduction over k0 = kb * K_PROJ_CHUNK doesn't skip tail
elements; locate the check near the use of HIDDEN, K_PROJ_CHUNK and pl.pipeline
in proj_lib.py and raise a ValueError (or assert) with a descriptive message if
the remainder is non-zero.
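The failure mode is easy to reproduce outside the `pl` runtime. The sketch below uses plain NumPy with a deliberately non-divisible `HIDDEN` (constant names follow the example; the loop mirrors the flagged `pl.pipeline` reduction):

```python
import numpy as np

HIDDEN = 10          # deliberately NOT a multiple of the chunk size
K_PROJ_CHUNK = 4

x = np.ones((2, HIDDEN), dtype=np.float32)
w = np.ones((HIDDEN, 3), dtype=np.float32)

# Tiled reduction with floor division: HIDDEN // K_PROJ_CHUNK = 2 chunks
# cover only 8 of the 10 reduction columns; the 2-column tail is skipped.
acc = np.zeros((2, 3), dtype=np.float32)
for kb in range(HIDDEN // K_PROJ_CHUNK):
    k0 = kb * K_PROJ_CHUNK
    acc += x[:, k0:k0 + K_PROJ_CHUNK] @ w[k0:k0 + K_PROJ_CHUNK, :]

assert not np.allclose(acc, x @ w)  # truncated: 8.0 per entry vs. 10.0

# The fail-fast guard proposed by the review catches this before the loop:
def check_divisible(hidden, chunk):
    if hidden % chunk != 0:
        raise ValueError("HIDDEN must be divisible by K_PROJ_CHUNK to avoid dropping K tail.")

try:
    check_divisible(HIDDEN, K_PROJ_CHUNK)
    caught = ""
except ValueError as e:
    caught = str(e)

assert "divisible" in caught
```

With the guard in place, a bad constant combination fails loudly at kernel construction instead of silently producing a partial reduction.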
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 13f45823-2fb4-42d9-af17-25e2e83d18a9
📒 Files selected for processing (5)
- examples/advanced/cross_class_proj/__init__.py
- examples/advanced/cross_class_proj/config.py
- examples/advanced/cross_class_proj/eltwise_lib.py
- examples/advanced/cross_class_proj/main.py
- examples/advanced/cross_class_proj/proj_lib.py
```python
for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
    k0 = kb * K_PROJ_CHUNK
```
Guard against silent K-tail truncation in the reduction loop.
At Line 49, floor-division (HIDDEN // K_PROJ_CHUNK) drops any remainder, which would silently skip part of K if constants change. Add a fail-fast check before the loop.
Proposed fix

```diff
 class Projections:
@@
     def linear(
         x: pl.Tensor[[BATCH, HIDDEN], pl.BF16],
         w: pl.Tensor[[HIDDEN, HIDDEN], pl.BF16],
         y: pl.Out[pl.Tensor[[BATCH, HIDDEN], pl.FP32]],
     ):
         """``y = x @ w`` — N parallel, K reduction pipelined inside each scope."""
+        if HIDDEN % K_PROJ_CHUNK != 0:
+            raise ValueError("HIDDEN must be divisible by K_PROJ_CHUNK to avoid dropping K tail.")
         for n0 in pl.parallel(0, HIDDEN, N_OUT_CHUNK):
             with pl.at(level=pl.Level.CORE_GROUP, name_hint="linear"):
```
Code Review
This pull request introduces a new advanced example demonstrating how to use @pl.jit.inline across different classes and files. It includes a configuration file, library modules for projection and elementwise operations, and a main script to run the JIT kernel. A review comment identified a redundant tensor allocation in the linear projection function that should be removed to optimize resource usage.
```python
acc = pl.create_tensor([BATCH, N_OUT_CHUNK], dtype=pl.FP32)
for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
```
The allocation of acc at line 48 is redundant because acc is immediately reassigned in the first iteration of the kb loop (when k0 == 0) at line 54. Removing this unnecessary pl.create_tensor call saves resources and simplifies the code.
```diff
-acc = pl.create_tensor([BATCH, N_OUT_CHUNK], dtype=pl.FP32)
 for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
```
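The difference between the two initialization styles can be illustrated in NumPy. This is a sketch under the assumption, stated in the comment, that the first loop iteration rebinds `acc`; names follow the example but the code is not the real kernel:

```python
import numpy as np

BATCH, N_OUT_CHUNK = 2, 3
HIDDEN, K_PROJ_CHUNK = 8, 4

x = np.arange(BATCH * HIDDEN, dtype=np.float32).reshape(BATCH, HIDDEN)
w = np.ones((HIDDEN, N_OUT_CHUNK), dtype=np.float32)

# Flagged style: pre-allocate acc, then rebind it on the first chunk.
acc = np.empty((BATCH, N_OUT_CHUNK), dtype=np.float32)  # this buffer is never read
for kb in range(HIDDEN // K_PROJ_CHUNK):
    k0 = kb * K_PROJ_CHUNK
    part = x[:, k0:k0 + K_PROJ_CHUNK] @ w[k0:k0 + K_PROJ_CHUNK, :]
    acc = part if k0 == 0 else acc + part  # k0 == 0 rebinds acc, discarding the allocation

assert np.allclose(acc, x @ w)  # dropping the pre-allocation changes nothing
```

Because the first iteration rebinds `acc`, the up-front buffer is dead; the suggested change simply deletes the `pl.create_tensor` call.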
Demonstrates splitting inlined kernels across files and classes while retaining @pl.jit dep auto-discovery via module-level aliases.

- proj_lib.py: tiled matmul helper (BF16 x BF16 -> FP32) on Projections
- eltwise_lib.py: residual-add helper on Elementwise
- main.py: entry function that inlines both helpers into a single kernel
- config.py: shared tiling constants (BATCH, HIDDEN, chunk sizes)
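The layout above can be mimicked in a single-file plain-Python sketch. Class and module roles follow the PR description, but the `@pl.jit` decorators are replaced by ordinary function calls, so this only illustrates the module-level aliasing pattern, not the real dep auto-discovery:

```python
import numpy as np

# config.py: shared tiling constants
BATCH, HIDDEN = 4, 8

# proj_lib.py: tiled matmul helper lives on a Projections class
class Projections:
    @staticmethod
    def linear(x, w):
        return x.astype(np.float32) @ w.astype(np.float32)

# eltwise_lib.py: residual-add helper lives on an Elementwise class
class Elementwise:
    @staticmethod
    def residual_add(y, hidden_states):
        return y + hidden_states

# main.py: module-level aliases expose both helpers by bare name,
# which is what lets the entry function reference them directly
linear = Projections.linear
residual_add = Elementwise.residual_add

def proj_residual(x, w, hidden_states):
    return residual_add(linear(x, w), hidden_states)

x = np.ones((BATCH, HIDDEN), dtype=np.float32)
w = np.eye(HIDDEN, dtype=np.float32)
h = np.full((BATCH, HIDDEN), 2.0, dtype=np.float32)
assert np.allclose(proj_residual(x, w, h), x + 2.0)
```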
Force-pushed: 288fe20 to a20d42f