[mypyc] Enable incremental self-compilation #21369
VaggelisD wants to merge 2 commits into python:master
Six fixes on top of python#21299, all required to compile mypy itself or to install a separate=True wheel via pip.

- mypyc/build.py: pip invokes setup.py twice when building a wheel. On the second invocation mypy's incremental cache is fully warm, so we generate no new C source for any group; the resulting extensions ship without their entry points and import as stubs. Fix: when a group emits no C source, reuse the .c file from the previous pass.
- mypyc/codegen/{emit,emitfunc}.py: when code in one compiled group reads an attribute on an object whose class lives in another group, the generated cast depends on that other group's struct definitions. We weren't recording the dependency, so the C compiler couldn't see the layout and the build failed. Fix: register the dependency at the cast site.
- mypyc/codegen/emitmodule.py + mypyc/build.py: when mypy compiles itself, a generated shim file can share a basename with a runtime C file. The C compiler resolves the runtime include relative to the shim's directory and picks up the shim instead. Fix: emit those includes with the <> form so the search uses -I paths only. The `get_header_deps` regex was widened to match both quote styles (otherwise headers in <> form drop out of Extension.depends and incremental rebuilds miss layout changes).
- mypyc/lib-rt/misc_ops.c: each compiled module gets its own shared library next to it in the package tree. The runtime was computing the module's file path as if a single shared library sat above the whole package, which doubled the package prefix and broke submodule lookups. Fix: detect the per-module case and use only the module's leaf name.
- mypyc/irbuild/prepare.py: traits and builtin-derived classes don't get a real C constructor emitted. A clean build sidesteps that, but a fully cached rebuild was taking the direct-call path and producing C that referenced a constructor that doesn't exist. Fix: skip the registration the same way a clean build does.
- mypyc/build.py: on every build_ext, setuptools rewrites every compiled .so in the source tree even when nothing changed. On macOS this invalidates the OS signature cache, so every import on the next run pays a re-verification cost. Fix: skip the copy when source and destination already match; this takes a 1-line-edit rebuild from ~72s to ~6s.

setup.py also gets a MYPYC_SEPARATE env knob so CI can exercise the codegen path against mypy itself.
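For readers skimming the thread, the "skip the copy when source and destination already match" idea from the last fix can be sketched roughly as follows. This is only an illustration, not the code in this PR: `copy_if_changed` is a hypothetical helper name, and it assumes a byte-for-byte comparison is the matching criterion.

```python
import filecmp
import os
import shutil


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst unless dst already has identical contents.

    Returns True if a copy happened. Hypothetical helper, not the PR's code.
    """
    # shallow=False asks for a content comparison instead of trusting
    # os.stat() signatures (file type, size, mtime) alone.
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        return False
    # copy2 also carries over metadata such as the modification time.
    shutil.copy2(src, dst)
    return True
```

Leaving an unchanged destination file alone is what keeps its on-disk identity stable, which is the property the macOS signature cache depends on according to the description above.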
    # Trait/builtin-base classes have an ir.ctor FuncDecl
    # but no emitted CPyDef_<ctor>, so a cross-group direct
    # call would hit an undefined symbol. Mirror the same
    # skip in prepare_ext_class_def.
    if not ir.is_trait and not ir.builtin_base:
        mapper.func_to_decl[node.node] = ir.ctor
i can't find prepare_ext_class_def, i think the mentioned skip is actually in prepare_init_method.
would it be possible to make ClassIR.ctor optional instead? if we don't actually generate it anyway in this case, setting it to None explicitly could help with bugs like this one in the future.
Yeah, right now there are a couple of usages of "always present but maybe fake".
One concern I have with making it Optional, though, is that it might affect other consumers of ir.ctor. Is this better off as its own PR, or should we do it now?
it's fine to leave it for another PR.
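To make the suggestion in this thread concrete, here is a minimal sketch of what an optional constructor field could look like. The `ClassIR` and `FuncDecl` classes below are simplified stand-ins, not mypyc's real classes, and `register_ctor` is a hypothetical helper; the actual follow-up PR may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuncDecl:
    """Stand-in for mypyc's FuncDecl; only the name matters here."""

    name: str


@dataclass
class ClassIR:
    """Stand-in for mypyc's ClassIR, reduced to the fields in the diff."""

    name: str
    is_trait: bool = False
    builtin_base: Optional[str] = None
    # None means "no C constructor is emitted for this class".
    ctor: Optional[FuncDecl] = None


def register_ctor(func_to_decl: dict[str, FuncDecl], ir: ClassIR) -> None:
    # With an Optional ctor the check is on the value itself, so a caller
    # does not need to remember the is_trait / builtin_base conditions.
    if ir.ctor is not None:
        func_to_decl[ir.name] = ir.ctor
```

The trade-off raised above still applies: every existing consumer of `ir.ctor` would need a `None` check, which is why it was deferred to a separate PR.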
- mypyc/codegen/emitmodule.py + mypyc/build.py: drop the path reconstruction in mypyc_build; pull the file list straight from the IR cache's src_hashes when a group is fully cached. Covers multi_file mode and group_name=None for free.
- mypyc/build.py: drop deps from get_header_deps that don't exist under target_dir. The widened regex picks up system headers like <Python.h>; feeding non-existent paths into Extension.depends forces a full rebuild on every run.
- mypyc/lib-rt/misc_ops.c: split the two ternaries with INCREF side effects in CPyImport_SetModuleFile into if/else.
- mypyc/irbuild/prepare.py: fix stale comment reference (prepare_ext_class_def -> prepare_init_method).
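The header-dependency change can be illustrated with a small sketch: match both `#include "x.h"` and `#include <x.h>`, then keep only the headers that actually exist under the target directory so that system headers such as `<Python.h>` drop out. `header_deps` and `INCLUDE_RE` are hypothetical names, and the real `get_header_deps` in mypyc/build.py may have a different signature and regex.

```python
import os
import re

# Matches both `#include "foo.h"` and `#include <foo.h>`.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)', re.MULTILINE)


def header_deps(c_source: str, target_dir: str) -> list[str]:
    """Return sorted paths of included headers that exist under target_dir."""
    # Each match is a (quoted, angled) pair with exactly one non-empty entry.
    names = {quoted or angled for quoted, angled in INCLUDE_RE.findall(c_source)}
    return sorted(
        os.path.join(target_dir, name)
        for name in names
        # Skip anything not generated into target_dir (e.g. system headers).
        if os.path.exists(os.path.join(target_dir, name))
    )
```

Filtering on existence is a simple heuristic; it assumes every header worth tracking for rebuilds lives under the one target directory.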
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
    if ext._needs_stub:
        inplace_stub = self._get_equivalent_stub(ext, inplace_file)
        self._write_stub_file(inplace_stub, ext, compile=True)
is this safe to skip when skipping the object file copy?
i don't think the extensions we generate need stubs, but the patch technically could affect setuptools outside of mypycify. so maybe also remove the patch at the end of mypycify?
Six fixes on top of #21299, all required to self-compile mypy or to install a `separate=True` wheel via pip.

- mypyc/build.py: pip invokes `setup.py` twice when building a wheel. On the second invocation mypy's incremental cache is fully warm, so we generate no new C source for any group; the resulting extensions ship without their entry points and import as stubs.
- mypyc/codegen/{emit,emitfunc}.py: when code in one compiled group reads an attribute on an object whose class lives in another group, the generated cast depends on that other group's struct definitions. We weren't recording the dependency, so the C compiler couldn't see the layout and the build failed.
- mypyc/codegen/emitmodule.py: when mypy compiles itself, a generated shim file can share a basename with a runtime C file. The C compiler resolves the runtime include relative to the shim's directory and picks up the shim instead.
- mypyc/lib-rt/misc_ops.c: each compiled module gets its own shared library next to it in the package tree. The runtime was computing the module's file path as if a single shared library sat above the whole package, which doubled the package prefix and broke submodule lookups.
- mypyc/irbuild/prepare.py: traits and builtin-derived classes don't get a real C constructor emitted. A clean build sidesteps that, but a fully cached rebuild was taking the direct-call path and producing C that referenced a constructor that doesn't exist.
- mypyc/build.py: on every build_ext, setuptools rewrites every compiled .so in the source tree even when nothing changed. On macOS this invalidates the OS signature cache, so every import on the next run pays a re-verification cost. setuptools limitation though (relevant mypy issue ?)

I also added a `MYPYC_SEPARATE` env knob so CI can exercise the codegen path against mypy itself.

Benchmarks

Mypy self-compile on macOS, `MYPYC_OPT_LEVEL=0`, `-j 11`. Three scenarios: