Description
Crash report
What happened?
It's possible to segfault the interpreter by causing an allocation failure in ExternalEntityParserCreate.
Automated diagnosis:
Bug: NULL deref in clear_handlers when handlers is NULL (line 2490).
Impact: ExternalEntityParserCreate error paths call Py_DECREF(new_parser) → dealloc → clear_handlers → dereferences self->handlers[i] where handlers is NULL. Crash on any allocation failure during external entity parser creation.
Fix: Add if (self->handlers == NULL) return; at top of clear_handlers.
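The diagnosis above can be illustrated with a pure-Python analogue. This is only a toy model under stated assumptions: `ParserModel`, `HANDLER_COUNT`, and both method names are hypothetical and are not taken from Modules/pyexpat.c. It shows why a cleanup routine that runs on a partially constructed object needs a guard. In Python the unguarded path raises TypeError; in C the equivalent access is the NULL dereference reported here.

```python
HANDLER_COUNT = 3  # hypothetical size; the real handler table is larger


class ParserModel:
    """Toy stand-in for the C parser object; not the real pyexpat type."""

    def __init__(self):
        # Models the state after the object exists but before its handlers
        # array has been allocated: handlers is still "NULL".
        self.handlers = None

    def clear_handlers_unguarded(self):
        # Analogue of the current cleanup loop: indexes handlers unconditionally.
        for i in range(HANDLER_COUNT):
            self.handlers[i] = None  # fails when handlers is None

    def clear_handlers_guarded(self):
        # Analogue of the proposed guard: if (self->handlers == NULL) return;
        if self.handlers is None:
            return
        for i in range(HANDLER_COUNT):
            self.handlers[i] = None
```

With the guard in place, cleanup of a half-initialized object is a no-op, while a fully initialized object is still cleared as before.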
The reproducer below must be run with a non-ASan Python build. It uses RLIMIT_AS to force PyMem_Malloc(buffer_size) to fail in ExternalEntityParserCreate.
MRE:
import xml.parsers.expat
import resource

p = xml.parsers.expat.ParserCreate()
p.buffer_text = True
# Set buffer_size while we still have full memory
p.buffer_size = 2**31 - 1  # INT_MAX (~2GB); allocates buffer for parent
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def handler(context, base, system_id, public_id):
    # NOW restrict memory, right before ExternalEntityParserCreate
    # The child parser will try PyMem_Malloc(2GB) and fail
    limit = 512 * 1024 * 1024  # 512MB
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    ext = p.ExternalEntityParserCreate(context)
    ext.Parse("<x/>", True)
    return 1

p.ExternalEntityRefHandler = handler
xml_data = b"""<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY ext SYSTEM "external.xml">
]>
<root>&ext;</root>"""
p.Parse(xml_data, True)

Backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff73c5434 in clear_handlers (self=self@entry=0x7ffff72c0750, initial=initial@entry=0) at ./Modules/pyexpat.c:2504
2504 Py_CLEAR(self->handlers[i]);
#0 0x00007ffff73c5434 in clear_handlers (self=self@entry=0x7ffff72c0750, initial=initial@entry=0) at ./Modules/pyexpat.c:2504
#1 0x00007ffff73c96c4 in xmlparse_clear (op=op@entry=0x7ffff72c0750) at ./Modules/pyexpat.c:1555
#2 0x00007ffff73c94fa in xmlparse_dealloc (op=0x7ffff72c0750) at ./Modules/pyexpat.c:1569
#3 0x000055555573282f in _Py_Dealloc (op=op@entry=0x7ffff72c0750) at Objects/object.c:3274
#4 0x00007ffff73ca421 in Py_DECREF (lineno=1107, op=0x7ffff72c0750, filename=<optimized out>) at ./Include/refcount.h:403
#5 pyexpat_xmlparser_ExternalEntityParserCreate_impl (self=0x7ffff74aedd0, cls=0x555556015180, context=0x7ffff74f5d78 "ext", encoding=0x0) at ./Modules/pyexpat.c:1107
#6 pyexpat_xmlparser_ExternalEntityParserCreate (self=0x7ffff74aedd0, cls=0x555556015180, args=0x7fffffffa0a0, nargs=nargs@entry=1, kwnames=0x0) at ./Modules/clinic/pyexpat.c.h:313
#7 0x00005555556bbd2b in method_vectorcall_FASTCALL_KEYWORDS_METHOD (func=func@entry=0x7ffff74f8ad0, args=args@entry=0x7fffffffa098, nargsf=nargsf@entry=9223372036854775810,
kwnames=kwnames@entry=0x0) at Objects/descrobject.c:381
#8 0x00005555556a7ebe in _PyObject_VectorcallTstate (tstate=0x555555f41be0 <_PyRuntime+360640>, callable=0x7ffff74f8ad0, args=0x7fffffffa098, nargsf=9223372036854775810, kwnames=0x0)
at ./Include/internal/pycore_call.h:136
#9 0x0000555555844259 in _Py_VectorCallInstrumentation_StackRefSteal (callable=..., arguments=0x7ffff7fa7118, total_args=total_args@entry=2, kwnames=kwnames@entry=...,
call_instrumentation=false, frame=0x7ffff7fa7090, this_instr=0x7ffff75923ac, tstate=0x555555f41be0 <_PyRuntime+360640>) at Python/ceval.c:770
#10 0x000055555584eb68 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, frame=<optimized out>, frame@entry=0x7ffff7fa7090, throwflag=throwflag@entry=0)
at Python/generated_cases.c.h:1838
#11 0x0000555555843a8b in _PyEval_EvalFrame (tstate=0x555555f41be0 <_PyRuntime+360640>, frame=0x7ffff7fa7090, throwflag=0) at ./Include/internal/pycore_ceval.h:118
#12 _PyEval_Vector (tstate=0x555555f41be0 <_PyRuntime+360640>, func=0x7ffff74d68d0, locals=0x0, args=0x7ffff74f1c00, argcount=4, kwnames=0x0) at Python/ceval.c:2133
#13 0x00007ffff73c71cf in call_with_frame (lineno=769, func=0x7ffff72c0750, args=0x7ffff74f1be0, self=0x7ffff74aedd0, funcname=<optimized out>) at ./Modules/pyexpat.c:305
#14 my_ExternalEntityRefHandler (parser=0x5555560e56f8, context=0x5555560f0854 "ext", base=0x0, systemId=0x5555560eff88 "external.xml", publicId=0x0) at ./Modules/pyexpat.c:757
#15 0x00007ffff73d88bb in doContent (parser=0x5555560e56f8, startTagLevel=0, enc=0x7ffff73fcb28 <utf8_encoding>,
s=0x5555560e6244 "&ext;</root>", '\315' <repeats 187 times>, <incomplete sequence \315>..., end=end@entry=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>...,
nextPtr=0x5555560e5728, haveMore=0 '\000', account=XML_ACCOUNT_DIRECT) at ./Modules/expat/xmlparse.c:3458
#16 0x00007ffff73d36f3 in contentProcessor (parser=0x5555560e56f8, start=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>...,
end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., endPtr=<optimized out>) at ./Modules/expat/xmlparse.c:3179
#17 doProlog (parser=parser@entry=0x5555560e56f8, enc=0x7ffff73fcb28 <utf8_encoding>, s=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>...,
s@entry=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>..., end=end@entry=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., tok=<optimized out>,
next=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728, haveMore=0 '\000', allowClosingDoctype=1 '\001',
account=XML_ACCOUNT_DIRECT) at ./Modules/expat/xmlparse.c:5486
#18 0x00007ffff73d03e2 in prologProcessor (parser=0x5555560e56f8,
s=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>...,
end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:5189
#19 prologInitProcessor (parser=0x5555560e56f8,
s=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>...,
end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:4991
#20 0x00007ffff73cf843 in callProcessor (parser=0x5555560e56f8,
start=start@entry=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>..., end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., endPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:1293
#21 0x00007ffff73cf6b0 in PyExpat_XML_ParseBuffer (parser=parser@entry=0x5555560e56f8, len=<optimized out>, isFinal=<optimized out>) at ./Modules/expat/xmlparse.c:2494
#22 0x00007ffff73cf1ab in PyExpat_XML_Parse (parser=<optimized out>,
s=s@entry=0x7ffff74a89c0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", len=<optimized out>, isFinal=isFinal@entry=1)
at ./Modules/expat/xmlparse.c:2448
#23 0x00007ffff73c98c4 in pyexpat_xmlparser_Parse_impl (self=0x7ffff74aedd0, cls=0x555556015180, data=0x7ffff74a89a0, isfinal=1) at ./Modules/pyexpat.c:892
#24 pyexpat_xmlparser_Parse (self=0x7ffff74aedd0, cls=0x555556015180, args=0x7fffffffb620, nargs=nargs@entry=2, kwnames=0x0) at ./Modules/clinic/pyexpat.c.h:109
#25 0x00005555556bbd2b in method_vectorcall_FASTCALL_KEYWORDS_METHOD (func=func@entry=0x7ffff74f8890, args=args@entry=0x7fffffffb618, nargsf=nargsf@entry=9223372036854775811,
kwnames=kwnames@entry=0x0) at Objects/descrobject.c:381
#26 0x00005555556a7ebe in _PyObject_VectorcallTstate (tstate=0x555555f41be0 <_PyRuntime+360640>, callable=0x7ffff74f8890, args=0x7fffffffb618, nargsf=9223372036854775811, kwnames=0x0)
at ./Include/internal/pycore_call.h:136
#27 0x0000555555844259 in _Py_VectorCallInstrumentation_StackRefSteal (callable=..., arguments=0x7ffff7fa7078, total_args=total_args@entry=3, kwnames=kwnames@entry=...,
call_instrumentation=false, frame=0x7ffff7fa7020, this_instr=0x7ffff756da5a, tstate=0x555555f41be0 <_PyRuntime+360640>) at Python/ceval.c:770
#28 0x000055555584eb68 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, frame=<optimized out>, frame@entry=0x7ffff7fa7020, throwflag=throwflag@entry=0)
at Python/generated_cases.c.h:1838
#29 0x0000555555843a8b in _PyEval_EvalFrame (tstate=0x555555f41be0 <_PyRuntime+360640>, frame=0x7ffff7fa7020, throwflag=0) at ./Include/internal/pycore_ceval.h:118
#30 _PyEval_Vector (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, func=func@entry=0x7ffff7466690, locals=locals@entry=0x7ffff746a450, args=args@entry=0x0,
argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:2133
#31 0x000055555584381e in PyEval_EvalCode (co=co@entry=0x7ffff756d8a0, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450) at Python/ceval.c:681
#32 0x0000555555a5981e in run_eval_code_obj (tstate=0x555555f41be0 <_PyRuntime+360640>, co=co@entry=0x7ffff756d8a0, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450)
at Python/pythonrun.c:1368
#33 0x0000555555a5936b in run_mod (mod=mod@entry=0x5555560ee258, filename=filename@entry=0x7ffff74aed40, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450,
flags=0x7fffffffc930, arena=arena@entry=0x7ffff74dace0, interactive_src=0x0, generate_new_source=0) at Python/pythonrun.c:1471
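To check for the crash without attaching a debugger, a reproducer can be run in a child interpreter and its exit status inspected. The sketch below is one possible harness, not part of the original report: the helper name `run_and_classify` and the 60-second timeout are assumptions. It relies on the POSIX behaviour that `subprocess` reports a negative return code when the child is killed by a signal.

```python
import signal
import subprocess
import sys


def run_and_classify(source: str, timeout: float = 60.0) -> str:
    """Run `source` in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number
        # that terminated the child process.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"
```

Passing the MRE source to this helper on an affected build would be expected to report a SIGSEGV. After a fix, the child should instead exit through a normal Python exception path.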
Found using cpython-review-toolkit with Claude Opus 4.6, using the /cpython-review-toolkit:explore Modules/pyexpat.c all deep command.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.15.0a7+ (heads/main:40095d526bd, Mar 16 2026, 00:42:57) [Clang 21.1.2 (2ubuntu6)]