
update roofline chart for qk_norm #123

Merged
Laurawly merged 3 commits into `main` from `qk_norm` on Mar 18, 2026

@Laurawly
Contributor

@meta-cla (bot) added the CLA Signed label on Mar 18, 2026. This label is managed by the Meta Open Source bot.

Copilot AI left a comment


Pull request overview

Updates the GB300 Q/K-norm benchmark documentation to include a roofline-style bandwidth view (Oink vs CuTeDSL) alongside the existing steady-state CUDA-graph replay latency tables.

Changes:

  • Adds a new pre-generated GB300 BF16 Q/K-norm roofline SVG plot (Oink vs CuTeDSL baseline).
  • Updates oink/README.md to explain the roofline (“useful-bandwidth”) view and embeds the new SVG.
  • Extends the README takeaways with an explicit roofline-based interpretation.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `oink/benchmarks/media/gb300_bf16_qk_norm_oink_vs_cutedsl_roofline.svg` | Adds the new roofline visualization asset for the GB300 Q/K-norm comparison. |
| `oink/README.md` | Documents and embeds the new roofline plot; clarifies interpretation vs latency medians. |


one read + one write of the fused `[M, N]` tensor. This is the physically
meaningful view for comparing against the measured practical GB300 BF16 stream
roof, whereas the steady-state CUDA-graph replay medians below are better read
as a latency view.
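
The "one read + one write" accounting in the README excerpt above can be sketched as a small calculation. This is a minimal illustration, not the benchmark's actual code: the function names `useful_bandwidth_gbps` and `roof_fraction` are hypothetical, the 2 bytes per element is the BF16 element size, and the shape, latency, and roof values in the usage lines are placeholders rather than measured GB300 figures.

```python
def useful_bandwidth_gbps(m, n, latency_us, bytes_per_elem=2):
    """Useful bandwidth in GB/s for one read + one write of an [M, N] tensor.

    bytes_per_elem defaults to 2 (BF16). Only the fused tensor traffic is
    counted; any other reads or writes are ignored by this simple model.
    """
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    useful_bytes = 2 * m * n * bytes_per_elem  # one read + one write
    seconds = latency_us * 1e-6
    return useful_bytes / seconds / 1e9


def roof_fraction(achieved_gbps, roof_gbps):
    """Fraction of a measured stream roof reached by the kernel."""
    if roof_gbps <= 0:
        raise ValueError("roof_gbps must be positive")
    return achieved_gbps / roof_gbps


# Placeholder inputs, chosen only to exercise the formula.
bw = useful_bandwidth_gbps(m=4096, n=4096, latency_us=10.0)
frac = roof_fraction(bw, roof_gbps=8000.0)  # roof value is a made-up placeholder
print(f"useful bandwidth: {bw:.1f} GB/s, fraction of roof: {frac:.2f}")
```

Dividing by a measured stream roof, rather than a datasheet peak, is what makes the result comparable to the "practical" roof the README refers to.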
@Laurawly merged commit 68608d6 into `main` on Mar 18, 2026
9 checks passed

4 participants