
ARM64 JIT: Optimize Group E register conversion #324

Merged

SChernykh merged 1 commit into tevador:master from cyrozap:a64-optimize-group-e-conversion on Apr 25, 2026

Conversation

@cyrozap (Contributor) commented on Apr 24, 2026

The AND and ORR sequence can be simplified to a single BIF instruction if we extend the Group E AND mask to also clear the lower 22 bits of each double.

This works because Group E registers are always loaded and converted from signed 32-bit integers. The int32-to-double conversion never sets the lower 22 bits of the resulting double: a 32-bit magnitude has at most 31 significant bits, the leading bit is implicit in the double's significand, so at most the top 30 of the 52 explicit mantissa bits can ever be set. Clearing those bits with the mask is therefore a no-op, and once the mask clears them we can treat the AND/OR sequence as a bit-select operation, where the AND mask selects between the bits of the OR mask and the bits to keep unchanged.

This change boosts performance by ~0.9% on an Apple M1 Pro, and likely more than that on systems with weaker OoO execution capabilities.

@SChernykh (Collaborator) commented:

Good find. I didn't know about this ARM64 instruction. I'll test it tomorrow.

@coffnix (Contributor) left a comment:


I compiled the RandomX library with the submitted AArch64 patches, including this Group E register conversion optimization (replacing the AND+ORR sequence with a single BIF instruction and adjusting the AND mask), and deployed it system-wide. I also built Monero with a custom patch that removes the in-tree RandomX (external/randomx) and forces linkage against the system-provided librandomx, and confirmed that monerod links against the patched /usr/lib64/librandomx.so (verified via lsof and /proc//maps).

I then performed runtime validation under real workload conditions. Local mining at full CPU utilization (start_mining with all available threads) showed a stable, expected hashrate (~1.7 kH/s on a 12-thread ARMv9.2A system) with no crashes, SIGILL, or memory faults. I further validated P2Pool integration: the RandomX hasher successfully allocated the full dataset (~2.5 GiB), updated the cache and dataset across multiple threads, and synchronized the sidechain without errors such as invalid shares, hashing failures, or rejected work.

This indicates that the modified AArch64 JIT path, the SIMD masking changes, and the BIF-based optimization behave correctly under concurrent load and introduce no observable consensus or stability issues in practical mining on a Linux system using an external RandomX library.

Thanks for your patch, @cyrozap!

@SChernykh (Collaborator) commented:

Tested it on my phone - everything compiled fine, hashes matched both on RandomX and RandomX v2.

@SChernykh SChernykh merged commit 9fbab82 into tevador:master Apr 25, 2026
18 checks passed
SChernykh added a commit to SChernykh/xmrig that referenced this pull request Apr 25, 2026
@cyrozap cyrozap deleted the a64-optimize-group-e-conversion branch April 26, 2026 02:05
SChernykh added a commit that referenced this pull request May 9, 2026
See #324

Co-authored-by: cyrozap <220973+cyrozap@users.noreply.github.com>


3 participants