
Add: cross-class @pl.jit.inline example#239

Open
bumble0918 wants to merge 1 commit into hw-native-sys:main from bumble0918:feature/2026-05-09

Conversation

@bumble0918
Contributor

Demonstrates splitting inlined kernels across files and classes while retaining @pl.jit dependency auto-discovery via module-level aliases.

  • proj_lib.py: tiled matmul helper (BF16 x BF16 -> FP32) on Projections
  • eltwise_lib.py: residual-add helper on Elementwise
  • main.py: entry function that inlines both helpers into a single kernel
  • config.py: shared tiling constants (BATCH, HIDDEN, chunk sizes)

@coderabbitai

coderabbitai Bot commented May 9, 2026

Review Change Stack

Warning

Rate limit exceeded

@bumble0918 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 8 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 80ad14cc-60cb-4dbc-9262-528320758f3e

📥 Commits

Reviewing files that changed from the base of the PR and between 288fe20 and a20d42f.

📒 Files selected for processing (5)
  • examples/advanced/cross_class_proj/__init__.py
  • examples/advanced/cross_class_proj/config.py
  • examples/advanced/cross_class_proj/eltwise_lib.py
  • examples/advanced/cross_class_proj/main.py
  • examples/advanced/cross_class_proj/proj_lib.py
📝 Walkthrough

Walkthrough

This PR adds a new example directory demonstrating cross-file, cross-class @pl.jit.inline usage. It includes five new files: package initialization, shared configuration constants, two kernel libraries (projection and elementwise addition), and a main orchestration script with test harness. The example chains inlinable kernels across module boundaries.

Changes

Cross-class inline projection example

| Layer / File(s) | Summary |
|---|---|
| **Package & Configuration**<br>`examples/advanced/cross_class_proj/__init__.py`, `examples/advanced/cross_class_proj/config.py` | Package docstring documents the cross-class inlining pattern. Configuration module exports `BATCH`, `HIDDEN`, and per-operation chunk-size constants (`N_OUT_CHUNK`, `K_PROJ_CHUNK`, `ADD_OUT_CHUNK`). |
| **Projection Kernel**<br>`examples/advanced/cross_class_proj/proj_lib.py` | `Projections` class implements `linear()` as a tiled, pipelined matrix multiplication using `pl.parallel` over output chunks and `pl.pipeline` over reduction chunks. Module-level `linear` alias enables bare-name invocation for dependency discovery. |
| **Elementwise Kernel**<br>`examples/advanced/cross_class_proj/eltwise_lib.py` | `Elementwise` class implements `residual_add()` as a tiled elementwise sum of FP32 and BF16-to-FP32 tensors using `pl.parallel` and `pl.at` for core placement. Module-level `residual_add` alias enables bare-name invocation. |
| **Orchestration & Testing**<br>`examples/advanced/cross_class_proj/main.py` | Main entry point defines the `proj_residual` JIT function that chains the `linear()` and `residual_add()` kernels. Includes `build_tensor_specs()` for randomized input generation, a `golden_proj_residual()` reference implementation, and CLI-driven test execution via `run_jit`. |
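The golden-reference step described above amounts to a dense FP32 computation. A minimal NumPy sketch follows; the shapes and constant values are illustrative stand-ins, not the PR's actual config:

```python
import numpy as np

BATCH, HIDDEN = 4, 8  # illustrative stand-ins for the shared config constants

def golden_proj_residual(x, w, hidden_states):
    # Reference semantics: upcast everything to FP32, matmul, then residual add.
    return (x.astype(np.float32) @ w.astype(np.float32)
            + hidden_states.astype(np.float32))

# Randomized inputs, in the spirit of build_tensor_specs().
rng = np.random.default_rng(0)
x = rng.standard_normal((BATCH, HIDDEN), dtype=np.float32)
w = rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32)
h = rng.standard_normal((BATCH, HIDDEN), dtype=np.float32)
golden = golden_proj_residual(x, w, h)  # shape (BATCH, HIDDEN), dtype float32
```

The JIT output would then be compared against `golden` with a float tolerance.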

Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Test Runner
    participant JIT as proj_residual<br/>@pl.jit
    participant LinearKern as linear<br/>@pl.jit.inline
    participant EltwiseKern as residual_add<br/>@pl.jit.inline
    participant Golden as golden reference
    User->>JIT: invoke proj_residual(x, w, hidden_states)
    JIT->>LinearKern: call linear(x, w, intermediate)
    LinearKern-->>JIT: intermediate = x @ w
    JIT->>EltwiseKern: call residual_add(intermediate, hidden_states, out)
    EltwiseKern-->>JIT: out = intermediate + hidden_states
    User->>Golden: compute x @ w + hidden_states
    Golden-->>User: golden result
    User->>User: validate JIT output matches golden
```
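The tiling scheme behind the diagram's `linear` step (parallel over N output chunks, pipelined K reduction) can be approximated in plain NumPy. The constants and sequential loops below are illustrative stand-ins for `pl.parallel`/`pl.pipeline`, not the kernel itself:

```python
import numpy as np

BATCH, HIDDEN = 4, 16
N_OUT_CHUNK, K_PROJ_CHUNK = 8, 4  # illustrative tiling constants

def linear_tiled(x, w):
    """Tiled y = x @ w: outer loop over N output chunks, inner K reduction."""
    y = np.zeros((BATCH, HIDDEN), dtype=np.float32)
    for n0 in range(0, HIDDEN, N_OUT_CHUNK):      # stand-in for pl.parallel
        acc = np.zeros((BATCH, N_OUT_CHUNK), dtype=np.float32)
        for kb in range(HIDDEN // K_PROJ_CHUNK):  # stand-in for pl.pipeline
            k0 = kb * K_PROJ_CHUNK
            # Accumulate one K-chunk's partial product into the N-chunk tile.
            acc += (x[:, k0:k0 + K_PROJ_CHUNK].astype(np.float32)
                    @ w[k0:k0 + K_PROJ_CHUNK, n0:n0 + N_OUT_CHUNK].astype(np.float32))
        y[:, n0:n0 + N_OUT_CHUNK] = acc
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((BATCH, HIDDEN)).astype(np.float32)
w = rng.standard_normal((HIDDEN, HIDDEN)).astype(np.float32)
assert np.allclose(linear_tiled(x, w), x @ w, atol=1e-4)
```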

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A Cross-Class Tale

Five files spring forth in harmony so true,
With kernels inlined across the module view,
Projections multiply and add they blend,
A pattern shown, from start unto the end! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a cross-class @pl.jit.inline example, which is the primary purpose of this PR. |
| Description check | ✅ Passed | The description is well-related to the changeset, providing specific details about the four files added (proj_lib.py, eltwise_lib.py, main.py, config.py) and explaining the core purpose of demonstrating inlined kernel splitting across files and classes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/advanced/cross_class_proj/proj_lib.py`:
- Around line 49-50: The loop using pl.pipeline(0, HIDDEN // K_PROJ_CHUNK,
stage=2) silently drops any remainder of HIDDEN divided by K_PROJ_CHUNK; add a
fail-fast check before that loop to ensure HIDDEN % K_PROJ_CHUNK == 0 (or raise
a clear error) so the reduction over k0 = kb * K_PROJ_CHUNK doesn't skip tail
elements; locate the check near the use of HIDDEN, K_PROJ_CHUNK and pl.pipeline
in proj_lib.py and raise a ValueError (or assert) with a descriptive message if
the remainder is non-zero.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13f45823-2fb4-42d9-af17-25e2e83d18a9

📥 Commits

Reviewing files that changed from the base of the PR and between 23fe87c and 288fe20.

📒 Files selected for processing (5)
  • examples/advanced/cross_class_proj/__init__.py
  • examples/advanced/cross_class_proj/config.py
  • examples/advanced/cross_class_proj/eltwise_lib.py
  • examples/advanced/cross_class_proj/main.py
  • examples/advanced/cross_class_proj/proj_lib.py

Comment on lines +49 to +50:

```python
for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
    k0 = kb * K_PROJ_CHUNK
```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against silent K-tail truncation in the reduction loop.

At Line 49, floor-division (HIDDEN // K_PROJ_CHUNK) drops any remainder, which would silently skip part of K if constants change. Add a fail-fast check before the loop.
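The hazard is easy to demonstrate with stand-in values; the constants below are hypothetical, chosen so the division does not come out even:

```python
def k_blocks(hidden, chunk):
    """Number of full reduction blocks; fail fast instead of truncating."""
    if hidden % chunk != 0:
        raise ValueError(
            f"hidden ({hidden}) must be divisible by chunk ({chunk})")
    return hidden // chunk

# Floor division alone would quietly drop the tail:
assert (100 // 32) * 32 == 96  # 4 of 100 reduction elements would be skipped

# The guard accepts clean splits and rejects ragged ones:
assert k_blocks(128, 32) == 4
```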

Proposed fix:

```diff
 class Projections:
@@
     def linear(
         x: pl.Tensor[[BATCH, HIDDEN], pl.BF16],
         w: pl.Tensor[[HIDDEN, HIDDEN], pl.BF16],
         y: pl.Out[pl.Tensor[[BATCH, HIDDEN], pl.FP32]],
     ):
         """``y = x @ w`` — N parallel, K reduction pipelined inside each scope."""
+        if HIDDEN % K_PROJ_CHUNK != 0:
+            raise ValueError("HIDDEN must be divisible by K_PROJ_CHUNK to avoid dropping K tail.")
         for n0 in pl.parallel(0, HIDDEN, N_OUT_CHUNK):
             with pl.at(level=pl.Level.CORE_GROUP, name_hint="linear"):
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/advanced/cross_class_proj/proj_lib.py` around lines 49 - 50, The
loop using pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2) silently drops any
remainder of HIDDEN divided by K_PROJ_CHUNK; add a fail-fast check before that
loop to ensure HIDDEN % K_PROJ_CHUNK == 0 (or raise a clear error) so the
reduction over k0 = kb * K_PROJ_CHUNK doesn't skip tail elements; locate the
check near the use of HIDDEN, K_PROJ_CHUNK and pl.pipeline in proj_lib.py and
raise a ValueError (or assert) with a descriptive message if the remainder is
non-zero.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new advanced example demonstrating how to use @pl.jit.inline across different classes and files. It includes a configuration file, library modules for projection and elementwise operations, and a main script to run the JIT kernel. A review comment identified a redundant tensor allocation in the linear projection function that should be removed to optimize resource usage.

Comment on lines +48 to +49:

```python
acc = pl.create_tensor([BATCH, N_OUT_CHUNK], dtype=pl.FP32)
for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
```

Severity: medium

The allocation of acc at line 48 is redundant because acc is immediately reassigned in the first iteration of the kb loop (when k0 == 0) at line 54. Removing this unnecessary pl.create_tensor call saves resources and simplifies the code.
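The dead-allocation pattern can be reproduced in plain NumPy; the shapes are illustrative, and `np.empty` stands in for `pl.create_tensor`:

```python
import numpy as np

BATCH, N_OUT_CHUNK, K_BLOCKS = 4, 8, 3  # illustrative stand-in constants
partials = [np.ones((BATCH, N_OUT_CHUNK), dtype=np.float32)
            for _ in range(K_BLOCKS)]

# Pattern flagged by the review: `acc` is allocated, then the name is rebound
# on the first loop iteration, so the initial allocation is never read.
acc = np.empty((BATCH, N_OUT_CHUNK), dtype=np.float32)  # dead allocation
for kb in range(K_BLOCKS):
    acc = partials[kb] if kb == 0 else acc + partials[kb]

# Equivalent without the dead allocation: initialize inside the loop.
acc2 = None
for kb in range(K_BLOCKS):
    acc2 = partials[kb] if kb == 0 else acc2 + partials[kb]

assert np.array_equal(acc, acc2)
```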

Suggested change:

```diff
-        acc = pl.create_tensor([BATCH, N_OUT_CHUNK], dtype=pl.FP32)
-        for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
+        for kb in pl.pipeline(0, HIDDEN // K_PROJ_CHUNK, stage=2):
```

@bumble0918 bumble0918 force-pushed the feature/2026-05-09 branch from 288fe20 to a20d42f Compare May 9, 2026 02:28
