Compile / codegen panel re-audit (spec 1720)#97
Draft
tamnd wants to merge 22 commits into
…ship The varnames slot order was wrong for functions mixing *args/**kwargs with keyword-only params: declareArgs emitted vararg/varkw before the kwonly names, but the VM derives fast-local slots from argcount + kwonlyargcount, so binding landed on the wrong slots. Reorder declareArgs to match symtable_visit_arguments (posonly, args, kwonly, vararg, varkw) and flip the matching slot math in the three call-binding paths.

String constants are now interned the way CPython does in intern_constants: identifier-like strings in co_consts (recursing through tuples and frozensets) go through the global intern table, and sys.intern actually canonicalizes instead of returning its argument untouched.

optimize_lists_and_sets was missing the contains/iter arm, so a set display on the right of "in" stayed a runtime BUILD_SET instead of folding to a frozenset constant. Port the full pass: frozenset/tuple folding for for-loops and membership tests, plus the list-to-tuple fallback.

Also lands the _opcode.stack_effect port and the _testinternalcapi code-var helpers the panel's @cpython_only tests now exercise.
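The slot layout described above can be checked against CPython itself. This is a minimal sketch, assuming a CPython interpreter; the function `f` and its parameter names are made up for illustration.

```python
def f(a, b=0, *args, k, k2=1, **kwargs):
    return a, b, args, k, k2, kwargs

code = f.__code__

# Positional and keyword-only names come first, so
# argcount + kwonlyargcount is the index of the *args slot.
vararg_slot = code.co_argcount + code.co_kwonlyargcount
layout = code.co_varnames[: vararg_slot + 2]

print(layout)
print(code.co_varnames[vararg_slot], code.co_varnames[vararg_slot + 1])
```

The keyword-only names sit between the positional parameters and `args`/`kwargs`, which is the ordering declareArgs has to reproduce.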
Most of this is the dis module finally rendering exactly what CPython emits. The big one was NOT_TAKEN exception-table protection: when normalize_jumps appends a NOT_TAKEN on a forward conditional, CPython's basicblock_addop leaves i_except untouched, so the new instruction inherits whatever the array slot held. After NOP compaction that slot still carries a removed instruction's handler, so the fall-through stays protected; on a freshly grown slot it's zero and unprotected. Our addOp was zeroing Except with a full struct literal, so every NOT_TAKEN came out unprotected, splitting exception runs and shifting the label column by one. addOp now mirrors the C path and reuses the slot in place, which lines up with/without/distb/traceback disassembly with CPython.

Also here:
- _collections tuplegetter hands back an owned reference (Py_INCREF), matching tuplegetter_descr_get. Without it the VM over-decref'd namedtuple fields as temporaries and corrupted Positions tuples.
- pseudo-op metadata flags (JUMP, SETUP_*, POP_BLOCK, ...) so HasArg/HasJump and dis's argument-width math behave on the pseudo range.
- exctable offset scaling and the disassembly plumbing (marshal, code attrs, sys) needed to make the traceback-dis tests reachable.
- _testcapi.code_newempty (PyCode_NewEmpty) and vendored dis_module.py.

Remaining: test_disassemble_str needs PEP 649 deferred __annotate__ codegen, tracked separately.
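The slot-inheritance effect is easier to see in a toy model. The sketch below is not the gopy or CPython code: `Instr`, `Block` and the `"H1"` handler label are hypothetical stand-ins for the instruction array, `i_except`, and an exception handler. It only illustrates why reusing a slot in place and overwriting it with a fresh struct give different results after NOP compaction.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Instr:
    opcode: str = "NOP"
    handler: Optional[str] = None  # stands in for i_except


class Block:
    """Toy basic block: `slots` is the backing array, `used` the live length."""

    def __init__(self):
        self.slots = []
        self.used = 0

    def _claim_slot(self):
        if self.used == len(self.slots):
            self.slots.append(Instr())  # freshly grown slot: no handler
        self.used += 1
        return self.used - 1

    def add_op_in_place(self, opcode):
        # Writes only the opcode; the handler already in the slot survives.
        i = self._claim_slot()
        self.slots[i].opcode = opcode
        return self.slots[i]

    def add_op_clobber(self, opcode):
        # Replaces the whole slot, which resets the handler to None.
        i = self._claim_slot()
        self.slots[i] = Instr(opcode)
        return self.slots[i]

    def compact_nops(self):
        # Copy live instructions forward; slots past `used` keep stale data.
        dest = 0
        for src in range(self.used):
            if self.slots[src].opcode != "NOP":
                self.slots[dest] = replace(self.slots[src])
                dest += 1
        self.used = dest


def build(with_nop):
    ops = ["LOAD_FAST", "POP_JUMP_IF_FALSE"]
    if with_nop:
        ops.insert(1, "NOP")
    block = Block()
    for op in ops:
        block.add_op_in_place(op).handler = "H1"
    block.compact_nops()
    return block


print(build(True).add_op_in_place("NOT_TAKEN").handler)   # stale slot reused
print(build(False).add_op_in_place("NOT_TAKEN").handler)  # freshly grown slot
print(build(True).add_op_clobber("NOT_TAKEN").handler)    # struct-literal path
```

In this model the in-place path yields a protected NOT_TAKEN only when compaction left a stale slot behind, while the clobbering path never does.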
The format-spec parser was byte-indexed, so a multi-byte fill char like
' ' got misread; ParseSpec now walks runes. Width/precision in the
bytes and str % paths went through the wrong C-int conversion, so a huge
precision either panicked the slice allocator or skipped the overflow
that CPython raises. Float/complex precision past INT_MAX now raises
"precision too big", and get_integer overflow raises "Too many decimal
digits in format string".
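For reference, the CPython behavior being matched can be probed with plain `format()` calls. This is a small sketch, assuming a 64-bit CPython; the helper `try_format` is a hypothetical name, and the arrow fill character is just one example of a multi-byte fill.

```python
def try_format(value, spec):
    """Return the formatted text, or a short description of the ValueError."""
    try:
        return format(value, spec)
    except ValueError as exc:
        return f"ValueError: {exc}"


# The fill character is one code point but three bytes in UTF-8.
centered = format("x", "\u2192^5")

# A precision just past INT_MAX, and one with too many digits to parse.
huge_precision = try_format(1.0, f".{2**31}f")
too_many_digits = try_format(1.0, "." + "9" * 30 + "f")

print(centered)
print(huge_precision)
print(too_many_digits)
```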
The 'g' presentation type kept the trailing zeros from the
fixed-precision re-render instead of dropping them like dtoa mode 2, and
the 'z' (no-neg-0) flag decided the sign from the unrounded value so a
small negative that rounds to zero kept its minus. Both fixed in the
float renderer.
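The expected renderings can be read off CPython directly. A short sketch, assuming CPython; the 'z' flag only exists from Python 3.11 on, so that part is guarded, and the sample values are arbitrary.

```python
import sys

samples = [(0.5, ".3g"), (100.0, ".3g"), (1234.5, ".6g"), (0.00001, ".3g")]
# 'g' drops trailing zeros instead of padding to the requested precision.
rendered = [format(value, spec) for value, spec in samples]
print(rendered)

# Without 'z', a small negative that rounds to zero keeps its sign.
plain = format(-0.001, ".1f")
print(plain)

with_z = None
if sys.version_info >= (3, 11):
    # With 'z', the sign is decided after rounding.
    with_z = format(-0.001, "z.1f")
    print(with_z)
```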
Finally the "%s" % str / "{}".format(str) identity optimizations: the
formatter copied the argument through a Go string, losing object
identity. PyUnicode_Format and str.format now clear overallocation once
the format string is exhausted and write the exact str object straight
into the writer, so the readonly-alias fast path returns the argument
unchanged.
…ncode A reraiseError marks an exception so the RERAISE-executing frame skips its PyTraceBack_Here entry, matching CPython where RERAISE jumps straight to exception_unwind. But the marker was leaking past that frame: once the exception found no handler and propagated to the caller, the caller frame also skipped its traceback entry. An exception escaping exec()/eval() this way lost every caller frame above the reraising one, so test_code_module's test_context_tb saw the chained exception render with no traceback block. Unwrap the marker when the reraising frame propagates with no handler so the caller attaches its own entry, exactly as CPython's error label does. Also route a str source through the strict utf-8 codec in compile()/exec() the way _Py_SourceAsString does via PyUnicode_AsUTF8AndSize. A source carrying a lone surrogate now raises UnicodeEncodeError before the tokenizer runs instead of surfacing the lexer's Non-UTF-8 SyntaxError.
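Both behaviors are observable from Python. The sketch below assumes CPython semantics; `run_snippet`, `frames_after_reraise` and `surrogate_error` are hypothetical helper names, not part of the test suite.

```python
import traceback

SOURCE = "try:\n    1 / 0\nexcept ZeroDivisionError:\n    raise\n"


def run_snippet():
    # The bare `raise` runs inside the exec'd module frame.
    exec(compile(SOURCE, "<snippet>", "exec"), {})


def frames_after_reraise():
    try:
        run_snippet()
    except ZeroDivisionError as exc:
        return [fs.name for fs in traceback.extract_tb(exc.__traceback__)]
    return []


def surrogate_error():
    # The str source carries a lone surrogate, which strict utf-8 rejects.
    try:
        compile("x = '\udc80'", "<snippet>", "exec")
    except Exception as exc:
        return type(exc).__name__
    return None


print(frames_after_reraise())
print(surrogate_error())
```

The caller frame `run_snippet` has to appear above the exec'd `<module>` frame; losing it is the symptom the marker leak produced.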
A fully constant slice subscript now loads a slice constant and folds through NB_SUBSCR instead of emitting BUILD_SLICE, matching CPython 3.14. Slice constants flow the whole pipeline: codegen const pool, the cfg constant folder, the LOAD_CONST runtime wrap, and marshal's TYPE_SLICE so a .pyc round-trips. The two-element slice optimization (BINARY_SLICE / STORE_SLICE) covers non-constant slices without a step. The comprehension temporary-variable assignment idiom (for y in [f(x)]) drops its FOR_ITER loop, so nested comprehensions emit one fewer loop. %-format of a non-number through %f now raises 'must be real number, not str' the way PyFloat_AsDouble does, and *args / **kwargs parameters count toward the optimizer's parameter slots so they load via LOAD_FAST_BORROW rather than LOAD_FAST_CHECK.
test_peepholer's three remaining errors all came down to assigning frame.f_lineno, which gopy didn't support. Port the line-jump machinery from frameobject.c: model the eval stack across the whole code object (mark_stacks), pick the deepest compatible target on the requested line, bind newly-live locals to None with the RuntimeWarning, pop the excess stack, and move the instruction pointer. The trickier half was the eval loop. A line trace callback can relocate the instruction pointer mid-dispatch, so after the INSTRUMENTED_LINE event we resume at the new target instead of running the opcode the marker was hiding. A bare before/after InstrPtr comparison there also trips on the EXTENDED_ARG-prefix advance fetchExtended does, which sent async-comprehension frames off into the middle of an instruction and panicked, so the jump now carries its own explicit flag. test_peepholer is 130/130.
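A line jump of the kind this machinery supports looks like the following from the Python side. This is a minimal sketch, assuming CPython's `sys.settrace` semantics; `target` and `run_with_jump` are made-up names, and the offsets are relative to the `def` line.

```python
import sys


def target(out):
    out.append(1)
    out.append(2)  # skipped by the jump
    out.append(3)


def run_with_jump(func, from_offset, to_offset):
    first = func.__code__.co_firstlineno
    out = []

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None
        if event == "line" and frame.f_lineno == first + from_offset:
            # Assigning f_lineno inside a line event relocates execution.
            frame.f_lineno = first + to_offset
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(out)
    finally:
        sys.settrace(old)
    return out


result = run_with_jump(target, 2, 3)
print(result)
```

Because the tracer moves the instruction pointer from inside the line event, the eval loop has to resume at the new target rather than run the opcode it was about to dispatch.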
…ALL_FUNCTION_EX test_compile's TestExpressionStackSize asserts co_stacksize stays O(log n) for big dict displays and big calls. gopy was emitting a single BUILD_MAP/CALL over the whole argument list, so a 100-entry dict or a 100-arg call piled every operand onto the evaluation stack. Port codegen_dict/codegen_subdict and the codegen_call_helper_impl ex_call routing so a run past _PY_STACK_USE_GUIDELINE folds into the container incrementally (BUILD_MAP 0 + MAP_ADD, BUILD_LIST 0 + LIST_APPEND) and a large call goes through CALL_FUNCTION_EX. Also adds __debug__ to the builtins dict so test_debug_assignment can read it.
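To see what the test measures, one can compile a large display and a large call and read `co_stacksize`. A rough sketch, assuming CPython; `stack_sizes` is a hypothetical helper and the exact numbers vary by interpreter version, so none are asserted here.

```python
def stack_sizes(n=100):
    """Compile an n-entry dict display and an n-argument call."""
    dict_src = "{" + ", ".join(f"k{i}: v{i}" for i in range(n)) + "}"
    call_src = "f(" + ", ".join(f"a{i}" for i in range(n)) + ")"
    return {
        "dict": compile(dict_src, "<dict>", "eval").co_stacksize,
        "call": compile(call_src, "<call>", "eval").co_stacksize,
    }


sizes = stack_sizes()
print(sizes)
```

When the display is built incrementally and the call is routed through CALL_FUNCTION_EX, both values stay small instead of growing with `n`.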
…un dont_inherit through PyObject_IsTrue _PyAST_Validate raises TypeError for a handful of node checks (NamedExpr target, AnnAssign simple flag, TypeAlias name, invalid Constant type) and ValueError for the rest. Wrap the validator error so those cases surface the type CPython raises instead of a blanket ValueError. compile()'s dont_inherit argument goes through PyObject_IsTrue in CPython, so a misbehaving __bool__ propagates its own exception rather than a type error about the argument.
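The split between the two exception types can be reproduced with hand-built trees. A small sketch, assuming CPython's validator; `validation_error` is a hypothetical helper, and the two nodes are just one example from each group.

```python
import ast


def validation_error(body):
    """Compile a one-expression tree and report which error type surfaced."""
    tree = ast.fix_missing_locations(ast.Expression(body=body))
    try:
        compile(tree, "<ast>", "eval")
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


# A Constant holding a type object is one of the TypeError cases.
bad_constant = validation_error(ast.Constant(value=int))

# A Name in Store context where Load is required is a ValueError case.
bad_context = validation_error(ast.Name(id="x", ctx=ast.Store()))

print(bad_constant, bad_context)
```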
…aults compile()/eval()/exec() take any buffer object in CPython, so route a memoryview through _Py_SourceAsString and let BINARY_SLICE fall back to the generic subscript instead of erroring on a non-list/tuple/str container. The tokenizer also checks for embedded NUL bytes per line as it reads, so a NUL on an earlier line wins over a later non-UTF-8 byte. The str-input lexer was skipping the NUL scan entirely; add it, and order the two diagnostics by source line so a script with both reports the NUL.
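A quick probe of both paths, assuming CPython: `eval_buffer` and `nul_error` are hypothetical helper names. The exception type for an embedded NUL differs between CPython versions (ValueError in older releases, SyntaxError in newer ones), so the helper accepts either.

```python
def eval_buffer():
    # Any object supporting the buffer protocol is accepted as source.
    return eval(memoryview(b"1 + 2"))


def nul_error(source):
    try:
        compile(source, "<nul>", "exec")
    except (SyntaxError, ValueError) as exc:
        return type(exc).__name__
    return None


print(eval_buffer())
print(nul_error("a = 1\x00\n"))
```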
compile() handed an _ast tree built by hand (the way ast.parse output or test fixtures arrive) was dropping TypeAlias statements to Pass and discarding type_params on functions, classes, and aliases, so a generic definition lost its parameters and an invalid TypeAlias name never reached the validator. Convert TypeAlias and rebuild the TypeVar / TypeVarTuple / ParamSpec nodes from their _ast instances.
co_consts holds child code objects; comparing them with reflect.DeepEqual dragged in co_filename and other fields code_richcompare ignores, so an AST round-trip that only changed the filename argument compared unequal. Recurse into codeEqual for *Code consts (and into tuple consts that may nest them), matching _PyCode_ConstantKey.
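The comparison rule can be seen in CPython with two compiles that differ only in the filename argument. A minimal sketch, assuming CPython; the source text and filenames are arbitrary.

```python
SRC = "def outer():\n    def inner():\n        return 1\n    return inner\n"

a = compile(SRC, "a.py", "exec")
b = compile(SRC, "b.py", "exec")

# The nested code objects in co_consts carry different co_filename values,
# but code equality does not look at the filename.
same = a == b
filenames_differ = a.co_filename != b.co_filename

print(same, filenames_differ)
```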
…n codegen Port update_start_location_to_match_attr (codegen.c:3824) for LOAD_ATTR/ STORE_ATTR/DELETE_ATTR, method calls and augmented attribute stores, so the attribute opcode span starts at the attribute's end line. Locate a lambda's implicit RETURN_VALUE at the body, recompute the comprehension loop-close JUMP on the produced element (dict comps use the combined key/value span), and carry the whole-comprehension location through the async-for scaffolding so an async generator expression's implicit return reports the full span.
…ires visitBoolOp emitted the already-expanded COPY 1 + TO_BOOL + POP_JUMP_IF_X form inline. That hid the conditional jumps from the optimizer's pseudo-jump threading, so a chain like `v[0] and v[1] or v[2]` threaded its POP_JUMP_IF_FALSE into the OR-test's re-test block and evaluated v[0] twice (gh-124285). Emit the pseudo JUMP_IF_FALSE / JUMP_IF_TRUE exactly as codegen_boolop does and let convert_pseudo_conditional_jumps expand them after threading runs. Disassembly now matches CPython byte for byte and test_compound passes.
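The evaluation order that threading must preserve can be checked by recording subscript reads. A small sketch; `Recorder` and `evaluate` are hypothetical names used only for this illustration.

```python
class Recorder:
    """Sequence wrapper that logs every index it is asked for."""

    def __init__(self, items):
        self.items = items
        self.reads = []

    def __getitem__(self, index):
        self.reads.append(index)
        return self.items[index]


def evaluate(items):
    v = Recorder(items)
    result = v[0] and v[1] or v[2]
    return result, v.reads


print(evaluate([0, "b", "c"]))  # v[0] is falsy, so v[1] is skipped
print(evaluate([1, 0, "c"]))    # the and-chain is falsy, so v[2] is read
print(evaluate([1, "b", "c"]))  # the and-chain is truthy, so v[2] is skipped
```

Each operand should be read at most once, in left-to-right order.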
CPython routes every code object's co_consts, co_filename, co_linetable and co_exceptiontable through a per-compile const cache so two functions with identical constants or location tables share one object (issue #25843, bpo-42217). gopy materializes constants late, at the lift from compile.Code to objects.Code, so the merge runs there: InternCodeConstants walks the freshly lifted tree once and interns each piece through a shared cache keyed by exact type and value. The merged tuple is a fresh object that nothing else pins, so it gets torn down by the first transient Decref while a later co_consts read still points at it. Pin each cached table with an Incref, mirroring how SyncConstObjs pins the per-code consts tuple.
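Keying the cache by exact type and value matters because equal values of different types must not merge. The sketch below is a simplified model in Python, not InternCodeConstants or CPython's key function; `const_key` and `intern_const` are hypothetical names, and only a few constant types are handled.

```python
import math


def const_key(value):
    """Build a cache key that separates 1, 1.0, True and 0.0, -0.0."""
    if isinstance(value, float):
        # Include the sign so 0.0 and -0.0 stay distinct.
        return (float, value, math.copysign(1.0, value))
    if isinstance(value, tuple):
        return (tuple, tuple(const_key(v) for v in value))
    if isinstance(value, frozenset):
        return (frozenset, frozenset(const_key(v) for v in value))
    return (type(value), value)


def intern_const(cache, value):
    """Return the first object seen for this key, caching it if new."""
    return cache.setdefault(const_key(value), value)


cache = {}
first = (1, 2)
second = tuple([1, 2])  # equal value, distinct object

shared = intern_const(cache, first) is intern_const(cache, second)
kept_float = intern_const(cache, (1.0, 2.0))

print(shared, kept_float)
```

A real cache also has to keep each cached object alive for as long as any code object points at it, which is what the Incref in this commit does.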
The interactive grammar rule matches one statement_newline and stops, so source like "if x: pass\nelse: pass" past the first statement was silently dropped instead of raising. CPython runs bad_single_statement after a successful single-mode parse, scanning the raw source past the tokenizer cursor for a second statement. Wire that check into runParse for ModeSingle and pin the "multiple statements found while compiling a single statement" SyntaxError when it fires. CPython: Parser/pegen.c:754 bad_single_statement
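The error is easy to trigger from `compile()` in single mode. A minimal sketch, assuming CPython; `single_mode_error` is a hypothetical helper and the two-assignment source is just the simplest case.

```python
def single_mode_error(source):
    """Return the SyntaxError message from a single-mode compile, if any."""
    try:
        compile(source, "<single>", "single")
    except SyntaxError as exc:
        return exc.msg
    return None


print(single_mode_error("x = 1\n"))
print(single_mode_error("x = 1\ny = 2\n"))
```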
Three lineno-parity gaps against CPython codegen:
- visitExprStmt emitted its trailing POP_TOP at the expression's line; the reference compiler emits it with NO_LOCATION so the discarded value does not anchor a traceback line.
- visitIf emitted the else-skip jump at the if line; CPython emits the branch-around-else with NO_LOCATION too.
- duplicate_exits_without_lineno only duplicated scope-exit blocks. CPython also duplicates blocks holding an eval-breaker check (backward jump, call, resume) so each predecessor can lend its own line. Ported OPCODE_HAS_EVAL_BREAK and basicblock_has_eval_break.

Fixes test_lineno_after_implicit_return, test_lineno_procedure_call, test_column_offset_deduplication and the synthetic-jump lineno cases. CPython: Python/flowgraph.c:3543 is_exit_or_eval_check_without_lineno
The compiler emits a finally body twice: once on the normal fall-through and once on the exception path. Without suppression a SyntaxWarning inside the finally fired twice. CPython tracks c_disable_warning, bumping it while emitting the FINALLY_END copy so warnings there are issued only once. Added the counter, raise/restore it around the finally-end fblock, and short-circuit warnAt while it is set. Fixes test_compile_warning_in_finally. CPython: Python/compile.c:106 c_disable_warning
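The duplicate can be counted by recording warnings during a compile. A sketch, assuming CPython; `syntax_warnings` is a hypothetical helper, and the source uses an `is` comparison against a literal as a convenient SyntaxWarning trigger. The exact count may differ on interpreters without the suppression, so it is printed rather than fixed.

```python
import warnings

SOURCE = "try:\n    pass\nfinally:\n    1 is 1\n"


def syntax_warnings(source):
    """Compile `source` and return the SyntaxWarnings it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<finally>", "exec")
    return [w for w in caught if issubclass(w.category, SyntaxWarning)]


found = syntax_warnings(SOURCE)
print(len(found))
```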
code.co_filename, co_consts, co_linetable and co_exceptiontable returned the cached object borrowed. CPython hands back Py_NewRef of the cached value, so the caller owns a reference. Incref before returning to match, fixing a refcount under-count that left these attrs vulnerable to premature collection.
PEP 749 evaluates type-parameter bounds and defaults in a lazy annotation scope carrying a .format parameter and a prebuilt (1,) defaults tuple, guarded on the requested format. Emit that scaffolding so a TypeVar default folds the same way CPython does. Also refreshed two boolop unit tests and the if/else jump test: compileMod runs raw codegen, so boolop emits pseudo JUMP_IF_FALSE/JUMP_IF_TRUE and the else-skip is JUMP_NO_INTERRUPT before the CFG pass expands them. Fixes test_folding_type_param.
InitFrame sizes LocalsPlus in one make() before the cycle collector can see the frame, but a generator running on its own goroutine can trigger a collection that traverses a frame mid-init on another goroutine, reaching FrameFastLocal/FrameCellLocal/FrameFreeLocal before LocalsPlus reaches full width. Return nil for an index past the current length instead of indexing out of range. This guards the symptom; the underlying VM/GC visibility race is task #223.
…it MAKE_FUNCTION_ANNOTATE Module-level annotations were stashed together with the BUILD_SET that seeds __conditional_annotations__ and prepended at the very front of the graph, so the __annotate__ build landed ahead of RESUME instead of after it. Plant ANNOTATIONS_PLACEHOLDER right after the module RESUME the way codegen_enter_scope does, keep only the __annotate__ build in the stash, and have cfgFromSequence splice it in at the placeholder. The BUILD_SET / STORE __conditional_annotations__ pair now runs as a body prologue, which matches CPython's RESUME, __annotate__, BUILD_SET, body order. Function annotations were tagged with MAKE_FUNCTION_ANNOTATIONS (0x04) while actually pushing the __annotate__ callable, so SET_FUNCTION_ATTRIBUTE stored a callable under __annotations__ and foo.__annotate__ stayed None. Return MAKE_FUNCTION_ANNOTATE (0x10), emit the matching attribute op, and split the VM handler so 0x04 stores an annotations dict and 0x10 wires the __annotate__ callable (with the gh-137814 qualname fixup).
Re-running the 36-file Compile / codegen panel under the current build and driving each file to CPython parity, one at a time.
First file landed: test_dis.py, down from 9 failures + 8 errors to 2 failures (both the PEP 649 deferred-annotation case, tracked separately).
The headline fix is NOT_TAKEN exception-table protection. When normalize_jumps appends a NOT_TAKEN on a forward conditional, CPython's basicblock_addop never writes i_except, so the instruction inherits whatever the array slot already held. After NOP compaction that slot still carries a removed instruction's handler (protected fall-through); on a freshly grown slot it's zero (unprotected). Our addOp was clobbering Except with a full struct literal, so every NOT_TAKEN came out unprotected, which split exception runs and pushed the disassembly label column one space wide. addOp now reuses the slot in place like the C path, and with/without/distb/traceback disassembly all line up.
Also in this batch:
Still working through the rest of the panel (test_peepholer needs the _testinternalcapi compiler-pipeline helpers, test_compile has its own set). Will push as each file goes green.