Restore backbone-only .compile() override on forcefield regressors by timduignan · Pull Request #165 · orbital-materials/orb-models

timduignan · 2026-06-04T20:12:34Z

Summary

Without an override, model.compile(...) falls through to nn.Module.compile() which compiles self.__call__ (= forward). That has two problems:

DirectForcefieldRegressor: predict() calls self.model(batch) directly and never goes through __call__, so the compiled callable is created but never invoked. .compile() is silently a no-op for inference.
ConservativeForcefieldRegressor: predict() does flow through __call__, but compiling the full regressor pulls the energy autograd backward and (for OrbMol-v2) the Coulomb / PME path into the traced graph, which dynamo fragments badly.

This PR restores the small .compile() override on both classes so self.model (the GNS message-passing backbone, where ~all the FLOPs live) is the thing actually compiled; post-backbone work runs eager. This matches what the internal core repo's orbmolv2 branch already does, and was the behaviour before the recent porting commit that exposed this regression.

Numbers

orb_v3_direct_20_omat on a single H100, 3 trials each (variance ~1%):

variant	1k atoms	10k atoms
`compile=True` (pre-fix default)	9.23 ms	38.36 ms
`compile=False` (eager)	9.23 ms	38.32 ms
manual `model.model.compile(...)` (this fix)	7.07 ms	30.04 ms

So the pre-fix .compile() was a literal no-op in inference for direct models, and restoring the backbone compile recovers a 22-24% inference speedup. Conservative models also benefit by getting a clean single-fragment compile graph instead of the fragmented full-regressor trace.

Tests

Adds test_compile_engages_backbone to both tests/forcefield/test_direct_regressor.py and tests/forcefield/test_conservative.py. These assert that model.model._compiled_call_impl is not None after .compile(), so this can't silently re-break if someone removes the override again in a future refactor.

Other changes

README update: the "Full-model compilation" bullet wasn't accurate (we compile the backbone, not the full regressor). Reworded to reflect what's actually happening and the realistic speedup range.

Test plan

pytest tests/forcefield/test_direct_regressor.py::test_compile_engages_backbone tests/forcefield/test_conservative.py::test_compile_engages_backbone passes
Full tests/forcefield/test_{direct_regressor,conservative}.py passes in CI
Existing test_regressor_compile_matches_eager still passes (compiled output equivalent to eager)

🤖 Generated with Claude Code

Without an override, model.compile(...) falls through to nn.Module.compile() which compiles self.__call__ (= forward). For DirectForcefieldRegressor that has no effect at inference: predict() calls self.model(batch) directly and never goes through __call__, so the compiled callable is created but never invoked. For ConservativeForcefieldRegressor predict() does flow through __call__, but compiling the full regressor pulls the energy autograd backward and (for OrbMol-v2) the Coulomb / PME path into the traced graph, which dynamo fragments badly. Restore the override on both classes so that .compile() compiles self.model (the GNS message-passing backbone) where almost all the FLOPs live; post-backbone work runs eager. This matches what the internal core repo's orbmolv2 branch already does, and was the behaviour before the porting commit that exposed this regression. Verified on a single H100, orb_v3_direct_20_omat, 3 trials each (variance ~1%): variant 1k atoms 10k atoms compile=True (pre-fix default) 9.23 ms 38.36 ms compile=False (eager) 9.23 ms 38.32 ms manual model.model.compile() (this fix) 7.07 ms 30.04 ms i.e. the pre-fix .compile() was a no-op in inference for direct models, and the restored backbone compile recovers a 22-24% inference speedup. Adds tests/forcefield/test_{direct,conservative}.py::test_compile_engages_backbone that asserts the backbone has _compiled_call_impl set after .compile(), so this can't silently re-break. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The replacement bullet I added in the previous commit framed the restored backbone compile as a feature with a ~1.3x speedup, which is misleading. Older releases already had backbone compile working via the same override; v0.7.0 silently broke it (df8f4f0) and the previous commit just restores prior behaviour. There's no new speedup to announce, so the cleanest thing is to drop the bullet entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vsimkus · 2026-06-04T21:04:49Z

Aren't we overcorrecting here by removing the full compilation? Especially since it's passed our tests/evaluations for the conservative model. The direct regressor was indeed missing the compiled path via .predict(), but I've already put together a fix and tested it here: https://github.com/orbital-materials/orb/pull/3071

timduignan · 2026-06-05T13:48:28Z

Brilliant thanks yep sorry yeah this is wrong deleting

timduignan requested review from benrhodes26, jg8610, reactiv and vsimkus as code owners June 4, 2026 20:12

timduignan marked this pull request as draft June 4, 2026 20:20

timduignan closed this Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restore backbone-only .compile() override on forcefield regressors#165

Restore backbone-only .compile() override on forcefield regressors#165
timduignan wants to merge 2 commits into
orbital-materials:mainfrom
timduignan:restore-backbone-compile-override

timduignan commented Jun 4, 2026

Uh oh!

vsimkus commented Jun 4, 2026

Uh oh!

timduignan commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timduignan commented Jun 4, 2026

Summary

Numbers

Tests

Other changes

Test plan

Uh oh!

vsimkus commented Jun 4, 2026

Uh oh!

timduignan commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants