Tweak MPMS materialization to handle some edge cases#624
Merged
inducer merged 3 commits intoinducer:mainfrom Jan 12, 2026
Merged
Tweak MPMS materialization to handle some edge cases#624inducer merged 3 commits intoinducer:mainfrom
inducer merged 3 commits intoinducer:mainfrom
Conversation
3e5b747 to
3326f49
Compare
Collaborator
Author
|
FWIW, here's the data from KS3D (with some slight tweaks to make it more like the actual prediction case) running serially on Tuolumne:
Turns out these tweaks don't really help reduce the timestep time. I'm guessing since they're reducing the flop count (as well as potentially increasing the number of memory accesses), they're just reducing arithmetic intensity? I'm tempted to abandon Fix 2 since it has such a big effect on compile time. Fix 1 might be worth keeping though... |
3326f49 to
5423002
Compare
Collaborator
Author
|
I removed fix 2 from this PR for now. It's still available at majosm/refine-mpms-materialization-part2. Otherwise, this looks ready for review (still best to look at it commit by commit). |
inducer
reviewed
Dec 15, 2025
1339d7a to
6b5356f
Compare
6b5356f to
4c38567
Compare
4c38567 to
7f273f9
Compare
… handle indexing with heavy reuse add more explanation for MPMS reuse tweak
7f273f9 to
b55b35b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds a couple of tweaks to the MPMS materialization scheme to try to handle the two major cases of arrays not being materialized when they should be (i.e., they contribute a lot of excess computation), as found when experimenting with mirgecom's gas-in-box with the flop counting from #623. The two cases are:
len(materialized_predecessors) < 2) that are used by a large number of materialized nodes. In mirgecom simulations, this occurs most frequently on powers of temperature, where the subexpression only depends on one materialized array (temperature) and has a single computational node (**), but due to a large amount of reuse (up to 30x) it contributes a lot of flops. I attempted to fix this by modifying the MPMS materialization criterion slightly (nsuccessors > 1 and len(materialized_predecessors) > 1changed tonsuccessors > 1 and nsuccessors + len(materialized_predecessors) >= 4). This may have the side effect of materializing some operations in the DAG that don't strictly need to be (Reshapes). Not sure yet how much this matters.Results for
gas-in-box:Results for
wave(which wasn't that bad to begin with):I'd still like to try this on the full prediction case to see if it behaves any differently from gas-in-box, and also to see how these changes affect timestep time.
Note 1: I moved the materialization code to a new module, because I was originally planning to add this as a separate materializer. I ultimately decided to tweak the existing one instead, but I left the new module. I figure
transform/__init__.pyis too crowded anyway, it could use some splitting up.Note 2: Due to the module change, it's likely best to read (and merge) this PR commit by commit.