Skip to content

Performance: cache ClassDef.ancestors() transitive walk#3048

Open
Pierre-Sassoulas wants to merge 4 commits into
mainfrom
perf/cache-classdef-ancestors
Open

Performance: cache ClassDef.ancestors() transitive walk#3048
Pierre-Sassoulas wants to merge 4 commits into
mainfrom
perf/cache-classdef-ancestors

Conversation

@Pierre-Sassoulas
Copy link
Copy Markdown
Member

Type of Changes

Type
✨ New feature

Description

The recursive walk in ancestors(recurs=True) re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as a cached_property so each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST.

context is intentionally not part of the key: the result is path-independent and the walk's own yielded set handles cycle prevention.

Measured on pandas/core/frame.py (interleaved A/B, n=4):
baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%)

Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects).

Closes #1115

@Pierre-Sassoulas Pierre-Sassoulas added this to the 4.2.0 milestone May 10, 2026
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch 3 times, most recently from d62e261 to 4160de2 Compare May 16, 2026 15:41
The recursive walk in ``ancestors(recurs=True)`` re-resolved shared base
classes on every call, amplifying cost on deep MRO chains. Cache the
materialized tuple as a ``cached_property`` so each ClassDef pays for
its ancestors once, and the cache dies with the instance when the
manager drops the AST.

``context`` is intentionally not part of the key — the result is
path-independent and the walk's own ``yielded`` set handles cycle
prevention.

Measured on pandas/core/frame.py (interleaved A/B, n=4):
  baseline 21.34s ± 0.18  ->  patched 20.48s ± 0.13   (-4.0%)

Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k
distinct ClassDefs). Larger speedups expected on codebases with
deeper MROs (SQLAlchemy/Pydantic-shaped projects).

Closes #1115
The metaclass lookup walked the full MRO on every call, with each
recursive call traversing the same ancestor chain. Cache the per-node
result on the instance so recursive entries short-circuit. Cycle
protection in cyclic class hierarchies is preserved via a
``_COMPUTING_METACLASS`` sentinel that re-entry treats as the cycle
break the ``seen`` set provides in the slow path.

Measured on synthetic FastAPI/SQLAlchemy/Pydantic target (5.94x
``ancestors`` recursion factor — matches rgant's #1115 profile shape):

  baseline 10.37s ± 0.09  ->  patched 10.17s ± 0.06   (-1.9%)

Reduces ``_find_metaclass`` body executions from ~40k to ~650 on the
same run (cache hit rate >98%). Pandas (no metaclass-heavy code)
shows no measurable change, as expected.

References #1115
The transform visitor walks every node in every parsed module and, for
each ``Call`` node, ran all 19 ``_builtin_filter_predicate`` partials
in sequence — each repeating the same ``isinstance(node.func, Name)``
and ``node.func.name`` lookups. With ~21k Call nodes per pandas/frame
run, that's 375k+ predicate calls duplicating identical work.

Replace the 19 list entries with one dispatcher that does the
``isinstance``/name lookup once and routes to the right inference
function via dict lookup. ``register_builtin_transform`` keeps the
same signature so external callers (if any) are unaffected.

Measured wall (interleaved A/B, n=3-4):
  pandas/frame.py:    20.10s +/- 0.21  ->  19.51s +/- 0.07   (-3.0%)
  deep-nested target:  9.82s +/- 0.10  ->   9.55s +/- 0.10   (-2.8%)

References #1115
``clone()`` is called ~85k times per pandas/frame.py pylint run. The
existing implementation went through ``InferenceContext()`` whose
``__init__`` runs conditional defaults for fields the clone immediately
overwrites. With ``__slots__`` defined, ``__new__`` + direct slot
assignment is strictly cheaper.

Measured wall (interleaved A/B, n=4 on pandas/frame.py):
  19.77s +/- 0.16  ->  19.58s +/- 0.12   (-0.93%)

References #1115
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch from 4160de2 to eba3439 Compare May 18, 2026 20:09
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 94.80519% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.52%. Comparing base (056f9ff) to head (eba3439).

Files with missing lines Patch % Lines
astroid/nodes/scoped_nodes/scoped_nodes.py 90.90% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3048      +/-   ##
==========================================
- Coverage   93.52%   93.52%   -0.01%     
==========================================
  Files          92       92              
  Lines       11329    11386      +57     
==========================================
+ Hits        10596    10649      +53     
- Misses        733      737       +4     
Flag Coverage Δ
linux 93.39% <94.80%> (-0.01%) ⬇️
pypy 93.52% <94.80%> (-0.01%) ⬇️
windows 93.50% <94.80%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
astroid/brain/brain_builtin_inference.py 92.87% <100.00%> (+0.18%) ⬆️
astroid/context.py 96.29% <100.00%> (-1.14%) ⬇️
astroid/nodes/scoped_nodes/scoped_nodes.py 93.31% <90.90%> (-0.02%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Astroid calls to ancestors are uncached and slow for templates and generics in ClassDef.ancestors

1 participant