Skip to content

gh-127958: Trace from RESUME in the JIT#145905

Merged
Fidget-Spinner merged 50 commits intopython:mainfrom
Fidget-Spinner:resume_tracing
Mar 16, 2026
Merged

gh-127958: Trace from RESUME in the JIT#145905
Fidget-Spinner merged 50 commits intopython:mainfrom
Fidget-Spinner:resume_tracing

Conversation

@Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Mar 13, 2026

This adds a CACHE entry to RESUME and a RESUME_CHECK_JIT opcode.

Performance numbers are rather promising:

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comprehensions benchmark is so much slower due to RESUME_CHECK_JIT being that much slower than RESUME_CHECK last I checked.

I think that will need to be addressed before we can merge this. I've one suggestion that might help.

@bedevere-app
Copy link

bedevere-app bot commented Mar 13, 2026

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@Fidget-Spinner
Copy link
Member Author

I think that will need to be addressed before we can merge this. I've one suggestion that might help.

From my investigations, this is one of the strangest slowdowns I have found so far:
RESUME_CHECK_JIT without the _JIT_ (ie, exact same code as RESUME_CHECK) is still 10-20% slower on bm_comprehensions.

Leaving RESUME_CHECK_JIT in the interpreter loop but only allowing specialization to RESUME_CHECK restores the original performance.

I was bout to think this is some compiler/CPU magic, but it does show up on AArch64 and on GCC/Clang as well. Luckily, I remembered RESUME is part of generator static magic, and I finally fixed the slowdown in this commit 5ccf8e9

@Fidget-Spinner
Copy link
Member Author

I have made the requested changes; please review again.

@bedevere-app
Copy link

bedevere-app bot commented Mar 15, 2026

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from markshannon March 15, 2026 10:06
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I've one comment on what was _QUICKEN_RESUME, but that's all.

Do you have benchmark numbers for this PR with the generator fix added?
I assume they will be better, just curious to see what they look like.

{ .op.code = INTERPRETER_EXIT, .op.arg = 0 }, /* reached on yield */
{ .op.code = RESUME, .op.arg = RESUME_OPARG_DEPTH1_MASK | RESUME_AT_FUNC_START }
{ .op.code = RESUME, .op.arg = RESUME_OPARG_DEPTH1_MASK | RESUME_AT_FUNC_START },
{ .op.code = CACHE, .op.arg = 0 } /* RESUME's CACHE */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE:
This change is fine, but the CACHE isn't strictly needed as the RESUME is only a marker not to instrument the prior instructions, it will never be executed.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Mar 16, 2026

Do you have benchmark numbers for this PR with the generator fix added? I assume they will be better, just curious to see what they look like.

I updated the PR's original description with it. The TLDR is that bm_generators is slower (expected, because we still can't optimize recursion well in the JIT, and now we're jitting more of bm_generators), but almost everything else that uses recursion is better.

macOS sees an across-the-board improvement, probably due to the good work you folks at ARM have done :).

@Fidget-Spinner
Copy link
Member Author

I think we only want to be jitting at the start of normal functions.

So this produced a really nice result. The slowdown on generators is now gone on my x86_64 machine, while keeping the speedup seen in logging that's also seen in the FT runners

Mean +- std dev: [gen_no_resume] 32.3 ms +- 0.2 ms -> [gen_resume] 31.5 ms +- 0.2 ms: 1.02x faster
logging_format: Mean +- std dev: [logging_no_resume] 6.58 us +- 0.10 us -> [logging_resume] 5.93 us +- 0.05 us: 1.11x faster
logging_silent: Mean +- std dev: [logging_no_resume] 81.8 ns +- 0.3 ns -> [logging_resume] 85.6 ns +- 1.1 ns: 1.05x slower
logging_simple: Mean +- std dev: [logging_no_resume] 5.86 us +- 0.06 us -> [logging_resume] 5.35 us +- 0.04 us: 1.10x faster

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
The code is clean and the performance impact is significant.

@Fidget-Spinner
Copy link
Member Author

Just to make sure that the latest commit indeed fixed generators, this branch without the fix sees a 30% slowdown in bm_generators.
Mean +- std dev: [gen_resume] 31.5 ms +- 0.2 ms -> [gen_resume_nofix] 41.2 ms +- 0.4 ms: 1.31x slower

So yes, let's do this.

@Fidget-Spinner Fidget-Spinner merged commit 3d0824a into python:main Mar 16, 2026
77 of 79 checks passed
@Fidget-Spinner Fidget-Spinner deleted the resume_tracing branch March 16, 2026 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants