@noinline average_bulk_microphysics_tendencies to reduce register pressure#713

Open
petebachant wants to merge 2 commits into main from pb/perf

@petebachant
Member

This kernel is now the hottest in prog EDMF 1M AMIP by a long shot, and this change produces a ~10% speedup (see the kernel analysis notebook). Disclaimer: the explanatory comments were written by Claude; I don't yet have a deep understanding of what's going on here!
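
For readers unfamiliar with the annotation, here is a minimal sketch of what `@noinline` does in Julia. The function names below are hypothetical and are not taken from this PR's diff. The idea, as I understand it, is that keeping a large helper as a real call can stop its temporaries from being merged into the caller's body, which may lower register use per GPU thread.

```julia
# Hypothetical helper standing in for an expensive per-point computation.
# @noinline asks the compiler to keep this as a separate function call.
@noinline function expensive_tendency(x::Float64)
    return x * exp(-x) + sqrt(abs(x))
end

# Hypothetical caller: the loop body contains a call instead of the
# helper's full body, so the helper's temporaries are not folded in here.
function accumulate_tendencies(xs)
    total = 0.0
    for x in xs
        total += expensive_tendency(x)
    end
    return total
end

total = accumulate_tendencies([0.0, 1.0, 4.0])
```

Whether this helps depends on the kernel; on CPU the extra call overhead can make things slower, so it is worth benchmarking both targets.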

@petebachant petebachant requested a review from dennisYatunin May 8, 2026 00:06
@petebachant petebachant moved this to In review in Performance May 8, 2026
@codecov

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.02%. Comparing base (14ef3e6) to head (96c5bcf).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #713   +/-   ##
=======================================
  Coverage   92.02%   92.02%           
=======================================
  Files          54       54           
  Lines        2321     2321           
=======================================
  Hits         2136     2136           
  Misses        185      185           
| Components | Coverage Δ |
|---|---|
| src | 92.99% <100.00%> (ø) |
| ext | 69.47% <ø> (ø) |

Member

@dennisYatunin dennisYatunin left a comment

Claude's explanation is a bit suspicious given that the quadrature loop isn't being unrolled, but a 10% speedup sounds great! I'll think about how we can turn this into a simpler example for ClimaCore's compiler stress tests.

@petebachant petebachant self-assigned this May 11, 2026
@petebachant
Member Author

petebachant commented May 11, 2026

Hypothesis: If Claude is correct, the issue comes from the quadrature loop being unrolled, so if we de-unroll it, we may be able to get these benefits without the performance hit on CPU.

This might be possible by dropping the quadrature order from the type and moving it into a value instead. The field's type can be an Int, and its value is the number of quadrature loop iterations.
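
A rough sketch of the idea, using made-up types rather than the package's actual quadrature definitions. With the order as a type parameter, the loop bound is a compile-time constant and the compiler may unroll the loop. With the order stored in an `Int` field, the bound is only known at run time.

```julia
# Hypothetical types for illustration only.

# Order encoded in the type: N is known at compile time.
struct StaticQuadrature{N} end

# Order stored as a runtime value in an Int field.
struct DynamicQuadrature
    order::Int
end

n_points(::StaticQuadrature{N}) where {N} = N
n_points(q::DynamicQuadrature) = q.order

# Toy stand-in for a quadrature average: the mean of f over n evenly
# spaced midpoints in [0, 1]. Not a real quadrature rule.
function average(f, q)
    n = n_points(q)
    acc = 0.0
    for i in 1:n
        x = (i - 0.5) / n
        acc += f(x)
    end
    return acc / n
end

static_result = average(x -> x^2, StaticQuadrature{3}())
dynamic_result = average(x -> x^2, DynamicQuadrature(3))
```

Both calls compute the same number. The difference is only in what the compiler knows about the loop bound, which is the part that would need benchmarking on GPU and CPU.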

@trontrytel
Member

Is this PR something that should be merged or closed?

@petebachant
Member Author

I opened CliMA/ClimaAtmos.jl#4503 to retain the GPU performance gains, move the changes to Atmos, and avoid the 1.12 regression, but that one is a little uglier. Any preference from your end?
