Skip to content

Operand/VReg encoding: widen VReg index by doubling Operand size.#257

Open
cfallin wants to merge 1 commit intobytecodealliance:mainfrom
cfallin:we-have-run-out-of-bits-and-it-is-time-to-order-more
Open

Operand/VReg encoding: widen VReg index by doubling Operand size.#257
cfallin wants to merge 1 commit intobytecodealliance:mainfrom
cfallin:we-have-run-out-of-bits-and-it-is-time-to-order-more

Conversation

@cfallin
Copy link
Member

@cfallin cfallin commented Feb 15, 2026

This addresses several longstanding issues we've had in Cranelift where very large inputs can cause the frontend or mid-end to run out of VRegs when generating or mutating IR.

For example, in a recent Wasmtime meeting we agreed that the implementation limits (e.g., 7654321 bytes for a single Wasm function body) are really minimum requirements for us to support as well as maximum bounds for producers to make use of; any input that conforms to those limits should be compilable (given enough CPU time and memory) by the Cranelift backend.

The main limiter on the number of virtual registers we could support in one function body was this crate, the register allocator, and its choice to pack an Operand into 32 bits. The operand bundles together a virtual register index with other dimensions, like def/use, early/late, and constraints (fixed reg, reg, stack, reuse-input, etc.). We got the other fields to fit into 11 bits, leaving 21 for the virtual register: 2^21, or 2M (2097152), virtual registers to use.

This was a performance optimization, as we had found that the memory traffic and cache pressure caused by a larger Operand were measurable and we wanted to avoid this overhead.

Nevertheless, as the limit continues to come up and the need is clear, it seems to be time to re-evaluate. This PR thus bumps the limit to 2^30 VRegs by expanding Operand to 64 bits. The Operand bitpacking itself allows for 2^32 VRegs, but VReg has to carry its register class around (2 bits) and encodes into a u32, so 30 bits it is. A billion VRegs should be enough for anybody. (Let's hope!)

Performance impact: I measured on spidermonkey-json.wasm and bz2.wasm with Sightglass+Wasmtime+Cranelift. I found 0-2% compile time regression on bz2 (mean ~1%), and 0-1% regression on SpiderMonkey (mean ~0.4%). That's not nothing, but it's worth it to solve this issue, IMHO.

This addresses several longstanding issues we've had in Cranelift where
very large inputs can cause the frontend or mid-end to run out of VRegs
when generating or mutating IR.

For example, in a recent Wasmtime meeting we agreed that the
implementation limits (e.g., 7654321 bytes for a single Wasm function
body) are really minimum requirements for us to support as well as
maximum bounds for producers to make use of; any input that conforms to
those limits should be compilable (given enough CPU time and memory) by
the Cranelift backend.

The main limiter on the number of virtual registers we could support in
one function body was this crate, the register allocator, and its choice
to pack an `Operand` into 32 bits. The operand bundles together a
virtual register index with other dimensions, like def/use, early/late,
and constraints (fixed reg, reg, stack, reuse-input, etc.). We got the
other fields to fit into 11 bits, leaving 21 for the virtual register:
2^21, or 2M (2097152), virtual registers to use.

This was a performance optimization, as we had found that the memory
traffic and cache pressure caused by a larger `Operand` were measurable
and we wanted to avoid this overhead.

Nevertheless, as the limit continues to come up and the need is clear,
it seems to be time to re-evaluate. This PR thus bumps the limit to 2^30
VRegs by expanding `Operand` to 64 bits. The `Operand` bitpacking itself
allows for 2^32 VRegs, but `VReg` has to carry its register class around
(2 bits) and encodes into a `u32`, so 30 bits it is. A billion VRegs
should be enough for anybody. (Let's hope!)

Performance impact: I measured on spidermonkey-json.wasm and bz2.wasm
with Sightglass+Wasmtime+Cranelift. I found 0-2% compile time regression
on bz2 (mean ~1%), and 0-1% regression on SpiderMonkey (mean ~0.4%).
That's not nothing, but it's worth it to solve this issue, IMHO.
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise I'm pretty unfamiliar with all of this, so I can give a review from a style/rust/etc perspective but I can't review much from a semantics perspective in a broader sense. It makes sense that this is widening the Operand type but I would basically just be rubber-stamping your own judgement on this in terms of whether it's the best way to achieve this and such (not that I have any better ideas myself). Basically it might be good to have @fitzgen look at this too as I think he's more familiar with the register allocator than I.

Review-wise though there's a lot of magic numbers that are updated here as things are shifted around, and I'd personally find it easier to give things symbolic names which are all derived from one another to make it more obvious what's being affected, but I understand that might be best done as a follow-up as opposed to here too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants