GH-138245: Perform boolean guards by testing a single bit, rather than a full pointer comparison. #143810
int bit = get_test_bit_for_bools();
if (bit) {
    REPLACE_OP(this_instr,
        test_bit_set_in_true(bit) ?
Please check this once at context initialization and store the result in the context, then fetch the corresponding op to use from the context.
I thought about that, but decided that having simpler, stateless code was better than saving a few cycles out of the many thousands spent optimizing a trace.
These functions are only called 2 or 3 times per trace (on average) and are really small and fast.
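The two options discussed above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `distinguishing_bit`, `opt_context`, `opt_context_init`, and the example addresses are hypothetical names and values. The sketch assumes that a return value of 0 means "no usable bit", which is how the `if (bit)` check in the diff appears to treat it.

```c
#include <assert.h>
#include <stdint.h>

/* Stateless option: recompute the bit every time it is needed.
 * Returns the index of the lowest bit (above bit 0) at which the two
 * addresses differ, or 0 if there is no such bit. */
static int
distinguishing_bit(uintptr_t true_addr, uintptr_t false_addr)
{
    uintptr_t diff = true_addr ^ false_addr;
    for (int i = 1; i < (int)(sizeof(uintptr_t) * 8); i++) {
        if ((diff >> i) & 1) {
            return i;
        }
    }
    return 0;
}

/* Cached option: compute the bit once and keep it in a (hypothetical)
 * optimizer context, at the cost of one more field to initialize. */
typedef struct {
    int bool_test_bit;
} opt_context;

static void
opt_context_init(opt_context *ctx, uintptr_t true_addr, uintptr_t false_addr)
{
    assert(true_addr != false_addr);
    ctx->bool_test_bit = distinguishing_bit(true_addr, false_addr);
}
```

Both variants produce the same answer; the cached one only avoids re-running a short loop, which is the saving the reply above considers negligible at 2 or 3 calls per trace.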
Also do you have performance numbers? Even microbenchmarks are fine. I see a pretty big difference (around 10%) in Richards.
No. It didn't seem worth measuring as I didn't expect the impact to be above the noise.
I'm surprised it makes much difference.
Sorry, that was a fluke; I can't reproduce it anymore.
…er than a full pointer comparison. (pythonGH-143810)
This reduces the overhead of performing boolean guards in jitted code.
On AArch64, this reduces the size of the stencil from 5 to 2 instructions.
GUARD_IS_FALSE_POP_r10:
_GUARD_BIT_IS_SET_POP_4_r10:
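To illustrate why a single-bit test can stand in for a full pointer comparison, here is a minimal sketch. It assumes the guarded value is already known to be one of exactly two objects whose addresses differ at a known bit. The function names and addresses are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Baseline guard: compare the whole pointer against the "true" address. */
static int
is_true_by_pointer(uintptr_t value, uintptr_t true_addr)
{
    return value == true_addr;
}

/* Bit guard: only valid when `value` is known to be one of the two
 * boolean addresses and `bit` is a position where they differ. */
static int
is_true_by_bit(uintptr_t value, uintptr_t true_addr, int bit)
{
    assert(bit > 0 && bit < (int)(sizeof(uintptr_t) * 8));
    int bit_in_value = (int)((value >> bit) & 1);
    /* In a JIT this half would be resolved ahead of time by choosing a
     * "bit is set" or "bit is unset" variant of the guard. */
    int bit_in_true = (int)((true_addr >> bit) & 1);
    return bit_in_value == bit_in_true;
}
```

With example addresses `0x5630a0` (true) and `0x5630c0` (false), the two differ at bit 5, so testing that one bit gives the same result as the full comparison for either value. The bit guard needs no full-width address constant in the generated code, which is one plausible source of the shorter stencil.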