Operand/VReg encoding: widen VReg index by doubling Operand size. #257
Conversation
This addresses several longstanding issues we've had in Cranelift where very large inputs can cause the frontend or mid-end to run out of VRegs when generating or mutating IR. For example, in a recent Wasmtime meeting we agreed that the implementation limits (e.g., 7654321 bytes for a single Wasm function body) are really minimum requirements for us to support as well as maximum bounds for producers to make use of; any input that conforms to those limits should be compilable (given enough CPU time and memory) by the Cranelift backend.

The main limiter on the number of virtual registers we could support in one function body was this crate, the register allocator, and its choice to pack an `Operand` into 32 bits. The operand bundles together a virtual register index with other dimensions, like def/use, early/late, and constraints (fixed reg, reg, stack, reuse-input, etc.). We got the other fields to fit into 11 bits, leaving 21 for the virtual register: 2^21, or 2M (2097152), virtual registers to use.

This was a performance optimization, as we had found that the memory traffic and cache pressure caused by a larger `Operand` were measurable and we wanted to avoid this overhead. Nevertheless, as the limit continues to come up and the need is clear, it seems to be time to re-evaluate. This PR thus bumps the limit to 2^30 VRegs by expanding `Operand` to 64 bits. The `Operand` bitpacking itself allows for 2^32 VRegs, but `VReg` has to carry its register class around (2 bits) and encodes into a `u32`, so 30 bits it is. A billion VRegs should be enough for anybody. (Let's hope!)

Performance impact: I measured on spidermonkey-json.wasm and bz2.wasm with Sightglass+Wasmtime+Cranelift. I found 0-2% compile time regression on bz2 (mean ~1%), and 0-1% regression on SpiderMonkey (mean ~0.4%). That's not nothing, but it's worth it to solve this issue, IMHO.
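To make the encoding concrete, here is a minimal sketch of the widened packing in Rust. The field names, the exact bit positions, and the single `other_fields` catch-all are illustrative assumptions, not the actual regalloc2 layout; the real `Operand` splits those upper bits into separate def/use, position, and constraint fields.

```rust
// Minimal sketch of the widened packing; not the actual regalloc2 layout.

/// A VReg still encodes into a u32: 2 bits of register class + 30 bits of index.
#[derive(Clone, Copy)]
struct VReg(u32);

impl VReg {
    const INDEX_BITS: u32 = 30;
    const INDEX_MASK: u32 = (1 << Self::INDEX_BITS) - 1;

    fn new(index: u32, class: u32) -> Self {
        assert!(index <= Self::INDEX_MASK, "VReg index must fit in 30 bits");
        assert!(class < 4, "register class is 2 bits");
        VReg((class << Self::INDEX_BITS) | index)
    }

    fn index(self) -> u32 {
        self.0 & Self::INDEX_MASK
    }

    fn class(self) -> u32 {
        self.0 >> Self::INDEX_BITS
    }
}

/// With Operand widened to 64 bits, the full 32-bit VReg encoding fits in the
/// low half, and the ~11 bits of def/use, position, and constraint data sit in
/// the bits above it, with room to spare.
#[derive(Clone, Copy)]
struct Operand(u64);

impl Operand {
    fn new(vreg: VReg, other_fields: u64) -> Self {
        Operand((other_fields << 32) | vreg.0 as u64)
    }

    fn vreg(self) -> VReg {
        VReg(self.0 as u32)
    }
}

fn main() {
    // A VReg index well beyond the old 2^21 limit now round-trips cleanly.
    let v = VReg::new(5_000_000, 1);
    let op = Operand::new(v, 0b101);
    assert_eq!(op.vreg().index(), 5_000_000);
    assert_eq!(op.vreg().class(), 1);
}
```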
alexcrichton
left a comment
Code-wise I'm pretty unfamiliar with all of this, so I can give a review from a style/Rust/etc. perspective, but I can't review much from a broader semantics perspective. It makes sense that this widens the `Operand` type, but I would basically just be rubber-stamping your own judgement on whether it's the best way to achieve this (not that I have any better ideas myself). It might be good to have @fitzgen look at this too, as I think he's more familiar with the register allocator than I am.
Review-wise, though, there are a lot of magic numbers that get updated here as things are shifted around. I'd personally find it easier if things had symbolic names that are all derived from one another, to make it more obvious what's being affected, but I understand that might be best done as a follow-up rather than here.
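Concretely, the pattern I have in mind is something like the following (hypothetical field names and widths, not taken from the actual source), where each field's shift is derived from the previous field's shift and width:

```rust
// Hypothetical field widths/names purely to illustrate derived constants;
// not the actual regalloc2 encoding.
const VREG_INDEX_BITS: u32 = 30;
const CLASS_BITS: u32 = 2;
const VREG_SHIFT: u32 = 0;
const KIND_SHIFT: u32 = VREG_SHIFT + VREG_INDEX_BITS + CLASS_BITS;
const KIND_BITS: u32 = 1;
const POS_SHIFT: u32 = KIND_SHIFT + KIND_BITS;
const POS_BITS: u32 = 1;
const CONSTRAINT_SHIFT: u32 = POS_SHIFT + POS_BITS;
// ...and so on: widening one field only touches its own `*_BITS` constant,
// and every later shift follows automatically.

fn main() {
    // Sanity-check that the derived layout fits in a 64-bit Operand word.
    assert!(CONSTRAINT_SHIFT < 64);
    println!("constraint field starts at bit {CONSTRAINT_SHIFT}");
}
```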