gh-127958: Trace from RESUME in the JIT#145905
Conversation
This reverts commit 6b65f76.
markshannon
left a comment
There was a problem hiding this comment.
The comprehensions benchmark is so much slower due to RESUME_CHECK_JIT being that much slower than RESUME_CHECK last I checked.
I think that will need to be addressed before we can merge this. I've one suggestion that might help.
|
When you're done making the requested changes, leave the comment: |
From my investigations, this is one of the strangest slowdowns I have found so far: Leaving RESUME_CHECK_JIT in the interpreter loop but only allowing specialization to RESUME_CHECK restores the original performance. I was bout to think this is some compiler/CPU magic, but it does show up on AArch64 and on GCC/Clang as well. Luckily, I remembered RESUME is part of generator static magic, and I finally fixed the slowdown in this commit 5ccf8e9 |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request. |
markshannon
left a comment
There was a problem hiding this comment.
This looks good. I've one comment on what was _QUICKEN_RESUME, but that's all.
Do you have benchmark numbers for this PR with the generator fix added?
I assume they will be better, just curious to see what they look like.
| { .op.code = INTERPRETER_EXIT, .op.arg = 0 }, /* reached on yield */ | ||
| { .op.code = RESUME, .op.arg = RESUME_OPARG_DEPTH1_MASK | RESUME_AT_FUNC_START } | ||
| { .op.code = RESUME, .op.arg = RESUME_OPARG_DEPTH1_MASK | RESUME_AT_FUNC_START }, | ||
| { .op.code = CACHE, .op.arg = 0 } /* RESUME's CACHE */ |
There was a problem hiding this comment.
NOTE:
This change is fine, but the CACHE isn't strictly needed as the RESUME is only a marker not to instrument the prior instructions, it will never be executed.
I updated the PR's original description with it. The TLDR is that bm_generators is slower (expected, because we still can't optimize recursion well in the JIT, and now we're jitting more of bm_generators), but almost everything else that uses recursion is better. macOS sees an across-the-board improvement, probably due to the good work you folks at ARM have done :). |
So this produced a really nice result. The slowdown on generators is now gone on my x86_64 machine, while keeping the speedup seen in logging that's also seen in the FT runners |
markshannon
left a comment
There was a problem hiding this comment.
Looks good.
The code is clean and the performance impact is significant.
|
Just to make sure that the latest commit indeed fixed generators, this branch without the fix sees a 30% slowdown in bm_generators. So yes, let's do this. |
This adds a CACHE entry to RESUME and a RESUME_CHECK_JIT opcode.
Performance numbers are rather promising:
RESUMEin the JIT #127958