Performance: cache ClassDef.ancestors() transitive walk#3048
Open
Pierre-Sassoulas wants to merge 4 commits into
Open
Performance: cache ClassDef.ancestors() transitive walk#3048Pierre-Sassoulas wants to merge 4 commits into
Pierre-Sassoulas wants to merge 4 commits into
Conversation
d62e261 to
4160de2
Compare
The recursive walk in ``ancestors(recurs=True)`` re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as a ``cached_property`` so each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST. ``context`` is intentionally not part of the key — the result is path-independent and the walk's own ``yielded`` set handles cycle prevention. Measured on pandas/core/frame.py (interleaved A/B, n=4): baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%) Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects). Closes #1115
The metaclass lookup walked the full MRO on every call, with each recursive call traversing the same ancestor chain. Cache the per-node result on the instance so recursive entries short-circuit. Cycle protection in cyclic class hierarchies is preserved via a ``_COMPUTING_METACLASS`` sentinel that re-entry treats as the cycle break the ``seen`` set provides in the slow path. Measured on synthetic FastAPI/SQLAlchemy/Pydantic target (5.94x ``ancestors`` recursion factor — matches rgant's #1115 profile shape): baseline 10.37s ± 0.09 -> patched 10.17s ± 0.06 (-1.9%) Reduces ``_find_metaclass`` body executions from ~40k to ~650 on the same run (cache hit rate >98%). Pandas (no metaclass-heavy code) shows no measurable change, as expected. References #1115
The transform visitor walks every node in every parsed module and, for each ``Call`` node, ran all 19 ``_builtin_filter_predicate`` partials in sequence — each repeating the same ``isinstance(node.func, Name)`` and ``node.func.name`` lookups. With ~21k Call nodes per pandas/frame run, that's 375k+ predicate calls duplicating identical work. Replace the 19 list entries with one dispatcher that does the ``isinstance``/name lookup once and routes to the right inference function via dict lookup. ``register_builtin_transform`` keeps the same signature so external callers (if any) are unaffected. Measured wall (interleaved A/B, n=3-4): pandas/frame.py: 20.10s +/- 0.21 -> 19.51s +/- 0.07 (-3.0%) deep-nested target: 9.82s +/- 0.10 -> 9.55s +/- 0.10 (-2.8%) References #1115
``clone()`` is called ~85k times per pandas/frame.py pylint run. The existing implementation went through ``InferenceContext()`` whose ``__init__`` runs conditional defaults for fields the clone immediately overwrites. With ``__slots__`` defined, ``__new__`` + direct slot assignment is strictly cheaper. Measured wall (interleaved A/B, n=4 on pandas/frame.py): 19.77s +/- 0.16 -> 19.58s +/- 0.12 (-0.93%) References #1115
4160de2 to
eba3439
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3048 +/- ##
==========================================
- Coverage 93.52% 93.52% -0.01%
==========================================
Files 92 92
Lines 11329 11386 +57
==========================================
+ Hits 10596 10649 +53
- Misses 733 737 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Type of Changes
Description
The recursive walk in
ancestors(recurs=True)re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as acached_propertyso each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST.contextis intentionally not part of the key: the result is path-independent and the walk's ownyieldedset handles cycle prevention.Measured on pandas/core/frame.py (interleaved A/B, n=4):
baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%)
Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects).
Closes #1115