NULL dereference in pyexpat clear_handlers #146079

@devdanzin

Crash report

What happened?

It's possible to segfault the interpreter by causing an allocation failure in ExternalEntityParserCreate.

Automated diagnosis:

Bug: NULL deref in clear_handlers when handlers is NULL (line 2490).
Impact: ExternalEntityParserCreate error paths call Py_DECREF(new_parser) → dealloc → clear_handlers → dereferences self->handlers[i] where handlers is NULL. Crash on any allocation failure during external entity parser creation.
Fix: Add if (self->handlers == NULL) return; at top of clear_handlers.

Note: the MRE below must be run with a non-ASan Python build. It uses RLIMIT_AS to force PyMem_Malloc(buffer_size) to fail in ExternalEntityParserCreate.
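
The RLIMIT_AS technique can be isolated into a small helper. The following is only a sketch, assuming Linux; the name `address_space_limit` is hypothetical and is not part of the report or of CPython. It lowers the soft limit for the duration of a block and restores it afterwards, which the MRE itself does not do:

```python
import contextlib
import resource


@contextlib.contextmanager
def address_space_limit(soft_limit):
    """Temporarily set the soft RLIMIT_AS, restoring the old value on exit."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard))
    try:
        yield
    finally:
        # Raising the soft limit back is permitted because it stays at or
        # below the hard limit, which this helper never changes.
        resource.setrlimit(resource.RLIMIT_AS, (old_soft, hard))
```

Wrapping only the `ExternalEntityParserCreate` call in `with address_space_limit(512 * 1024 * 1024):` would keep the restriction scoped to the failing allocation, which may be convenient if the reproducer is later turned into a regression test.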

MRE:

import xml.parsers.expat
import resource

p = xml.parsers.expat.ParserCreate()
p.buffer_text = True

# Set buffer_size while we still have full memory
p.buffer_size = 2**31 - 1  # INT_MAX (~2GB) — allocates buffer for parent
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def handler(context, base, system_id, public_id):
    # NOW restrict memory — right before ExternalEntityParserCreate
    # The child parser will try PyMem_Malloc(2GB) and fail
    limit = 512 * 1024 * 1024  # 512MB
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    ext = p.ExternalEntityParserCreate(context)
    ext.Parse("<x/>", True)
    return 1

p.ExternalEntityRefHandler = handler

xml_data = b"""<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY ext SYSTEM "external.xml">
]>
<root>&ext;</root>"""

p.Parse(xml_data, True)
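
Because the MRE both lowers the process memory limit and is expected to kill the interpreter, it can be convenient to run it in a child process. The following is a minimal sketch, assuming a POSIX system; the helper names `describe_exit` and `run_reproducer` are hypothetical:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a short verdict string."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_reproducer(path, timeout=60.0):
    """Run the script at `path` in a child interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        timeout=timeout,
    )
    return describe_exit(proc.returncode)
```

With the MRE saved under a filename of your choosing, say `repro.py` (an assumed name), `print(run_reproducer("repro.py"))` should report a signal on an affected build. On a build with the proposed guard, the child would presumably exit normally with a Python-level error instead.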

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff73c5434 in clear_handlers (self=self@entry=0x7ffff72c0750, initial=initial@entry=0) at ./Modules/pyexpat.c:2504
2504                Py_CLEAR(self->handlers[i]);

#0  0x00007ffff73c5434 in clear_handlers (self=self@entry=0x7ffff72c0750, initial=initial@entry=0) at ./Modules/pyexpat.c:2504
#1  0x00007ffff73c96c4 in xmlparse_clear (op=op@entry=0x7ffff72c0750) at ./Modules/pyexpat.c:1555
#2  0x00007ffff73c94fa in xmlparse_dealloc (op=0x7ffff72c0750) at ./Modules/pyexpat.c:1569
#3  0x000055555573282f in _Py_Dealloc (op=op@entry=0x7ffff72c0750) at Objects/object.c:3274
#4  0x00007ffff73ca421 in Py_DECREF (lineno=1107, op=0x7ffff72c0750, filename=<optimized out>) at ./Include/refcount.h:403
#5  pyexpat_xmlparser_ExternalEntityParserCreate_impl (self=0x7ffff74aedd0, cls=0x555556015180, context=0x7ffff74f5d78 "ext", encoding=0x0) at ./Modules/pyexpat.c:1107
#6  pyexpat_xmlparser_ExternalEntityParserCreate (self=0x7ffff74aedd0, cls=0x555556015180, args=0x7fffffffa0a0, nargs=nargs@entry=1, kwnames=0x0) at ./Modules/clinic/pyexpat.c.h:313
#7  0x00005555556bbd2b in method_vectorcall_FASTCALL_KEYWORDS_METHOD (func=func@entry=0x7ffff74f8ad0, args=args@entry=0x7fffffffa098, nargsf=nargsf@entry=9223372036854775810,
    kwnames=kwnames@entry=0x0) at Objects/descrobject.c:381
#8  0x00005555556a7ebe in _PyObject_VectorcallTstate (tstate=0x555555f41be0 <_PyRuntime+360640>, callable=0x7ffff74f8ad0, args=0x7fffffffa098, nargsf=9223372036854775810, kwnames=0x0)
    at ./Include/internal/pycore_call.h:136
#9  0x0000555555844259 in _Py_VectorCallInstrumentation_StackRefSteal (callable=..., arguments=0x7ffff7fa7118, total_args=total_args@entry=2, kwnames=kwnames@entry=...,
    call_instrumentation=false, frame=0x7ffff7fa7090, this_instr=0x7ffff75923ac, tstate=0x555555f41be0 <_PyRuntime+360640>) at Python/ceval.c:770
#10 0x000055555584eb68 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, frame=<optimized out>, frame@entry=0x7ffff7fa7090, throwflag=throwflag@entry=0)
    at Python/generated_cases.c.h:1838
#11 0x0000555555843a8b in _PyEval_EvalFrame (tstate=0x555555f41be0 <_PyRuntime+360640>, frame=0x7ffff7fa7090, throwflag=0) at ./Include/internal/pycore_ceval.h:118
#12 _PyEval_Vector (tstate=0x555555f41be0 <_PyRuntime+360640>, func=0x7ffff74d68d0, locals=0x0, args=0x7ffff74f1c00, argcount=4, kwnames=0x0) at Python/ceval.c:2133
#13 0x00007ffff73c71cf in call_with_frame (lineno=769, func=0x7ffff72c0750, args=0x7ffff74f1be0, self=0x7ffff74aedd0, funcname=<optimized out>) at ./Modules/pyexpat.c:305
#14 my_ExternalEntityRefHandler (parser=0x5555560e56f8, context=0x5555560f0854 "ext", base=0x0, systemId=0x5555560eff88 "external.xml", publicId=0x0) at ./Modules/pyexpat.c:757
#15 0x00007ffff73d88bb in doContent (parser=0x5555560e56f8, startTagLevel=0, enc=0x7ffff73fcb28 <utf8_encoding>,
    s=0x5555560e6244 "&ext;</root>", '\315' <repeats 187 times>, <incomplete sequence \315>..., end=end@entry=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>...,
    nextPtr=0x5555560e5728, haveMore=0 '\000', account=XML_ACCOUNT_DIRECT) at ./Modules/expat/xmlparse.c:3458
#16 0x00007ffff73d36f3 in contentProcessor (parser=0x5555560e56f8, start=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>...,
    end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., endPtr=<optimized out>) at ./Modules/expat/xmlparse.c:3179
#17 doProlog (parser=parser@entry=0x5555560e56f8, enc=0x7ffff73fcb28 <utf8_encoding>, s=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>...,
    s@entry=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>..., end=end@entry=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., tok=<optimized out>,
    next=0x5555560e623e "<root>&ext;</root>", '\315' <repeats 181 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728, haveMore=0 '\000', allowClosingDoctype=1 '\001',
    account=XML_ACCOUNT_DIRECT) at ./Modules/expat/xmlparse.c:5486
#18 0x00007ffff73d03e2 in prologProcessor (parser=0x5555560e56f8,
    s=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>...,
    end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:5189
#19 prologInitProcessor (parser=0x5555560e56f8,
    s=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>...,
    end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., nextPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:4991
#20 0x00007ffff73cf843 in callProcessor (parser=0x5555560e56f8,
    start=start@entry=0x5555560e61f0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", '\315' <repeats 103 times>, <incomplete sequence \315>..., end=0x5555560e6250 '\315' <repeats 199 times>, <incomplete sequence \315>..., endPtr=0x5555560e5728) at ./Modules/expat/xmlparse.c:1293
#21 0x00007ffff73cf6b0 in PyExpat_XML_ParseBuffer (parser=parser@entry=0x5555560e56f8, len=<optimized out>, isFinal=<optimized out>) at ./Modules/expat/xmlparse.c:2494
#22 0x00007ffff73cf1ab in PyExpat_XML_Parse (parser=<optimized out>,
    s=s@entry=0x7ffff74a89c0 "<?xml version=\"1.0\"?>\n<!DOCTYPE test [\n<!ENTITY ext SYSTEM \"external.xml\">\n]>\n<root>&ext;</root>", len=<optimized out>, isFinal=isFinal@entry=1)
    at ./Modules/expat/xmlparse.c:2448
#23 0x00007ffff73c98c4 in pyexpat_xmlparser_Parse_impl (self=0x7ffff74aedd0, cls=0x555556015180, data=0x7ffff74a89a0, isfinal=1) at ./Modules/pyexpat.c:892
#24 pyexpat_xmlparser_Parse (self=0x7ffff74aedd0, cls=0x555556015180, args=0x7fffffffb620, nargs=nargs@entry=2, kwnames=0x0) at ./Modules/clinic/pyexpat.c.h:109
#25 0x00005555556bbd2b in method_vectorcall_FASTCALL_KEYWORDS_METHOD (func=func@entry=0x7ffff74f8890, args=args@entry=0x7fffffffb618, nargsf=nargsf@entry=9223372036854775811,
    kwnames=kwnames@entry=0x0) at Objects/descrobject.c:381
#26 0x00005555556a7ebe in _PyObject_VectorcallTstate (tstate=0x555555f41be0 <_PyRuntime+360640>, callable=0x7ffff74f8890, args=0x7fffffffb618, nargsf=9223372036854775811, kwnames=0x0)
    at ./Include/internal/pycore_call.h:136
#27 0x0000555555844259 in _Py_VectorCallInstrumentation_StackRefSteal (callable=..., arguments=0x7ffff7fa7078, total_args=total_args@entry=3, kwnames=kwnames@entry=...,
    call_instrumentation=false, frame=0x7ffff7fa7020, this_instr=0x7ffff756da5a, tstate=0x555555f41be0 <_PyRuntime+360640>) at Python/ceval.c:770
#28 0x000055555584eb68 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, frame=<optimized out>, frame@entry=0x7ffff7fa7020, throwflag=throwflag@entry=0)
    at Python/generated_cases.c.h:1838
#29 0x0000555555843a8b in _PyEval_EvalFrame (tstate=0x555555f41be0 <_PyRuntime+360640>, frame=0x7ffff7fa7020, throwflag=0) at ./Include/internal/pycore_ceval.h:118
#30 _PyEval_Vector (tstate=tstate@entry=0x555555f41be0 <_PyRuntime+360640>, func=func@entry=0x7ffff7466690, locals=locals@entry=0x7ffff746a450, args=args@entry=0x0,
    argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:2133
#31 0x000055555584381e in PyEval_EvalCode (co=co@entry=0x7ffff756d8a0, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450) at Python/ceval.c:681
#32 0x0000555555a5981e in run_eval_code_obj (tstate=0x555555f41be0 <_PyRuntime+360640>, co=co@entry=0x7ffff756d8a0, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450)
    at Python/pythonrun.c:1368
#33 0x0000555555a5936b in run_mod (mod=mod@entry=0x5555560ee258, filename=filename@entry=0x7ffff74aed40, globals=globals@entry=0x7ffff746a450, locals=locals@entry=0x7ffff746a450,
    flags=0x7fffffffc930, arena=arena@entry=0x7ffff74dace0, interactive_src=0x0, generate_new_source=0) at Python/pythonrun.c:1471

Found using cpython-review-toolkit with Claude Opus 4.6, using the /cpython-review-toolkit:explore Modules/pyexpat.c all deep command.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.15.0a7+ (heads/main:40095d526bd, Mar 16 2026, 00:42:57) [Clang 21.1.2 (2ubuntu6)]

Metadata

    Labels

    extension-modules: C modules in the Modules dir
    topic-XML
    type-crash: A hard crash of the interpreter, possibly with a core dump

