@noinline average_bulk_microphysics_tendencies to reduce register pressure #713
petebachant wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #713   +/-   ##
=======================================
  Coverage   92.02%   92.02%
=======================================
  Files          54       54
  Lines        2321     2321
=======================================
  Hits         2136     2136
  Misses        185      185
|
dennisYatunin left a comment
Claude's explanation is a bit suspicious given that the quadrature loop isn't being unrolled, but a 10% speedup sounds great! I'll think about how we can turn this into a simpler example for ClimaCore's compiler stress tests.
|
Hypothesis: if Claude is correct, the issue comes from the quadrature loop being unrolled, so if we de-unroll it we may be able to get these benefits without the performance hit on CPU. This might be possible by moving the quadrature order out of the type and into a value: the field's type can be an integer, and its value is the number of quadrature loop iterations.
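A minimal sketch of the idea, assuming a simple midpoint rule. `StaticQuad`, `DynamicQuad`, and `integrate` are hypothetical names for illustration, not the actual quadrature types touched by this PR, and whether the compiler really unrolls the type-level version depends on the kernel.

```julia
# Hypothetical illustration; none of these names come from the PR.

# Order encoded in the type: N is a compile-time constant, so each N gets its
# own specialized method and the compiler may choose to unroll the loop.
struct StaticQuad{N} end

function integrate(f, ::StaticQuad{N}) where {N}
    acc = 0.0
    for i in 1:N
        x = (i - 0.5) / N      # midpoint of the i-th subinterval of [0, 1]
        acc += f(x) / N
    end
    return acc
end

# Order stored as a value: one compiled method serves every order, and the
# trip count is only known at run time.
struct DynamicQuad
    n::Int
end

function integrate(f, q::DynamicQuad)
    acc = 0.0
    for i in 1:q.n
        x = (i - 0.5) / q.n
        acc += f(x) / q.n
    end
    return acc
end

static_result = integrate(x -> x^2, StaticQuad{4}())
dynamic_result = integrate(x -> x^2, DynamicQuad(4))
```

Both versions compute the same sum; the difference is only in what the compiler knows about the loop bound when it generates code.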
|
Is this PR something that should be merged or closed?
|
I opened CliMA/ClimaAtmos.jl#4503 to retain the GPU performance gains, move the changes to Atmos, and avoid the 1.12 regression, but that one is a little uglier. Any preference from your end?
This kernel is now the hottest in prog EDMF 1M AMIP by a long shot, and this change produces a ~10% speedup (kernel analysis notebook). Disclaimer: the explanatory comments were written by Claude; I don't yet have a deep understanding of what's going on here!
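For readers unfamiliar with the macro, here is a minimal sketch of what marking a function `@noinline` looks like. `heavy_tendency` and `kernel_body!` are hypothetical stand-ins, not the code changed in this PR; the assumption is that keeping a large function as a real call stops its temporaries from being merged into the caller's body, which is one way inlining can raise register usage.

```julia
# Hypothetical stand-in for an expensive per-point computation.
# @noinline asks the compiler to keep this as a call instead of
# pasting its body into every caller.
@noinline function heavy_tendency(q::Float64, T::Float64)
    return q * exp(-T / 273.15)
end

# Hypothetical caller: with the callee kept out of line, this loop body
# stays small.
function kernel_body!(out, q, T)
    for i in eachindex(out, q, T)
        out[i] = heavy_tendency(q[i], T[i])
    end
    return out
end

q = [1.0e-3, 2.0e-3]
T = [273.15, 546.30]
out = kernel_body!(similar(q), q, T)
```

The results are identical with or without `@noinline`; only the generated code changes, so any speedup has to be confirmed by profiling the real kernel.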