ARM64 JIT: Optimize Group E register conversion #324
Conversation
The AND and ORR sequence can be simplified down to a single BIF instruction if we correct the Group E AND mask to additionally clear the lower 22 bits of each double. This is possible because Group E registers are always loaded and converted from signed 32-bit integers. The int32-to-double conversion process never sets the lower 22 bits of the resulting double, so it doesn't matter whether or not we clear them with the mask. And since we're able to clear those bits with the mask, we can treat the AND/OR process like a bit-select operation, where the AND mask is used to select between the bits in the OR mask and the bits to keep unchanged. This change boosts performance by ~0.9% on an Apple M1 Pro, and likely more than that on systems with weaker OoO execution capabilities.
Good find. I didn't know about this ARM64 instruction. I'll test it tomorrow.
coffnix
left a comment
I compiled the RandomX library with the submitted AArch64 patches, including the Group E register conversion optimization that replaces the AND+ORR sequence with a single BIF instruction and adjusts the AND mask, and deployed it system-wide. I also built Monero with a custom patch that removes the in-tree RandomX (external/randomx) and forces linkage against the system-provided librandomx, and confirmed that monerod links against the patched /usr/lib64/librandomx.so (verified via lsof and /proc//maps).

I then performed runtime validation under real workload conditions:

- Local mining at full CPU utilization (start_mining with all available threads) ran with a stable and expected hashrate (~1.7 kH/s on a 12-thread ARMv9.2A system) and no crashes, SIGILL, or memory faults.
- Integration with P2Pool also validated cleanly: the RandomX hasher allocated the full dataset (~2.5 GiB), updated the cache and dataset across multiple threads, and synchronized the sidechain without errors such as invalid shares, hashing failures, or rejected work.

This indicates that the modified JIT AArch64 path, the SIMD masking changes, and the BIF-based optimization behave correctly under concurrent load and do not introduce observable consensus or stability issues in practical mining scenarios on a Linux system using an external RandomX library.
thanks for your patch @cyrozap
Tested it on my phone - everything compiled fine, hashes matched both on RandomX and RandomX v2.
See #324 Co-authored-by: cyrozap <220973+cyrozap@users.noreply.github.com>