Reduce cost of bounds checks in transforms #43
kornelski wants to merge 3 commits into FirefoxGraphics:main
b8d408f to 7585ddd
  struct ClutOnly {
      clut: Box<[f32]>,
-     grid_size: u16,
+     grid_size: u8,
What was the motivation for this? Avoiding the casts?
That's the natural size for it, as read from the input.
I hoped it would also explain to LLVM that grid_size.pow(4) can't overflow, but unfortunately that didn't have any effect.
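To make the overflow point concrete: a grid size stored as a u8 is at most 255, and 255^4 = 4,228,250,625 still fits in a u32, while a u16 grid size can exceed that range. The following is only a sketch under assumptions. The helper names `clut_len_4d` and `clut_len_4d_u16` are hypothetical and not from the patch, and using u32 for the result is an assumption about the arithmetic width.

```rust
/// Hypothetical helper: number of grid points in a 4-input CLUT.
/// With a u8 grid size the maximum is 255^4 = 4_228_250_625,
/// which is below 2^32, so the u32 power cannot overflow.
fn clut_len_4d(grid_size: u8) -> u32 {
    u32::from(grid_size).pow(4)
}

/// The same computation for a u16 grid size. Here the result can
/// exceed u32::MAX, so it uses checked_pow and reports overflow as None.
fn clut_len_4d_u16(grid_size: u16) -> Option<u32> {
    u32::from(grid_size).checked_pow(4)
}
```

For example, `clut_len_4d_u16(256)` is `None` because 256^4 equals 2^32, one past the largest u32. As the comment above notes, narrowing the field type did not by itself change what LLVM generated.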
Is this code showing up in profiles for you? It should only be used when building a lookup table during transform creation. The actual color transformation should be using a fast path.
The first 3 commits are queued for landing. https://bugzilla.mozilla.org/show_bug.cgi?id=1947889 The other 3 seem OK, but I'd like to understand the motivation a little better before landing them.
chunks_exact(3).nth() has a fast path, and needs 1 check for 3 pixels
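A minimal sketch of the idea in that commit message, not the patch's actual code: the function names `pixel_indexed` and `pixel_chunked` and the flat `&[f32]` layout with 3 values per pixel are assumptions made for illustration. The first version performs a separate bounds check for each component; the second obtains a 3-element chunk with a single range check via `chunks_exact(3).nth(i)`, after which the optimizer can typically drop the checks on the fixed indices because the chunk length is known to be 3.

```rust
/// Hypothetical baseline: fetch pixel `i` with one bounds check per component.
fn pixel_indexed(data: &[f32], i: usize) -> Option<[f32; 3]> {
    let base = i.checked_mul(3)?;
    Some([
        *data.get(base)?,
        *data.get(base.checked_add(1)?)?,
        *data.get(base.checked_add(2)?)?,
    ])
}

/// Hypothetical alternative: one check selects the whole 3-element chunk.
/// `chunks_exact(3)` only yields slices of length exactly 3, and `nth`
/// returns None when `i` is past the last complete chunk.
fn pixel_chunked(data: &[f32], i: usize) -> Option<[f32; 3]> {
    let px = data.chunks_exact(3).nth(i)?;
    Some([px[0], px[1], px[2]])
}
```

Both functions return the same result for any input; whether the second one actually compiles to fewer checks depends on the compiler version and surrounding code, so it is worth confirming in the generated assembly.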
Yes, on small images (<640px), building the lookup table takes more time than the transformation itself. Reducing the bounds checks also made the code a bit smaller.
I've landed the grid size patch. The other two commits cause: when running
I'll make it so that it runs tests in GitHub Actions when I get a chance.
According to the llvm-mca estimate for the 4x3 transform function:
before:
uOps Per Cycle: 2.70
IPC: 1.97
Block RThroughput: 159.3
after:
uOps Per Cycle: 3.48
IPC: 2.69
Block RThroughput: 279.8