lxml juno patches #1
Open
igormarkoff wants to merge 9 commits into lxml-6.0.2+juno
Bump ``__version__`` to ``6.0.2+juno`` in ``src/lxml/__init__.py``.
PEP 440 local version segment marks the fork-patched build so that
``importlib.metadata.metadata("lxml")["version"]`` and
``lxml.__version__`` agree on the suffix.
Replace the prior fork's custom ``tools.setup_common``-based build loop in ``setup.py`` with the standard upstream ``setup(...)`` call plus an ``IOSBuildExt(_build_ext)`` cmdclass gated on ``IOS_BUILD_PLATFORM``. Without the env var set, the cmdclass is a transparent passthrough to upstream setuptools, so host ``pip install`` and CI smoke builds keep working unchanged. With it set, the extension build picks up iOS-specific compile/link flags (``-arch arm64 -isysroot <iOS-SDK> -m<plat>-version-min=16.0 -Wl,-undefined,dynamic_lookup``) and honours an optional ``IOS_PYTHON_INCLUDE``, which prepends the iOS Python framework's ``Headers/`` to the include path.

Companion changes:

- ``pyproject.toml``: declare ``build-backend = "setuptools.build_meta"`` so PEP 517 frontends (``pip wheel``, ``python -m build``) drive the build correctly.
- ``versioninfo.py``: anchor ``get_base_dir()`` on ``__file__`` rather than ``sys.argv[0]``. Under PEP 517 backends ``sys.argv[0]`` is the pyproject_hooks subprocess script, not the project's ``setup.py``, so the legacy heuristic resolves to the wrong directory and breaks callers such as ``version()``, which reads ``src/lxml/__init__.py`` via ``open()``.
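A minimal sketch of the ``versioninfo.py`` point, assuming ``get_base_dir()`` only needs the directory that contains the file. The two helper names and both paths below are hypothetical; they take the path as an argument so the difference between the two anchors is visible without a real build:

```python
import os


def base_dir_from_argv(argv0):
    """Legacy heuristic: trust whatever script path is in sys.argv[0]."""
    return os.path.abspath(os.path.dirname(argv0))


def base_dir_from_file(module_file):
    """Anchored form: derive the directory from the module's own __file__."""
    return os.path.dirname(os.path.abspath(module_file))


# Hypothetical locations, for illustration only.
project = os.path.join(os.sep, "src", "lxml-checkout")
hook_script = os.path.join(os.sep, "venv", "hooks", "_in_process.py")

# Under a PEP 517 frontend, argv[0] points at the hook script, not setup.py.
legacy = base_dir_from_argv(hook_script)
anchored = base_dir_from_file(os.path.join(project, "versioninfo.py"))
```

Here ``anchored`` is the project directory regardless of which process imported the module, while ``legacy`` follows the hook script.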
``src/lxml/etree.pyx`` was calling ``xmlparser.xmlCleanupParser()`` immediately before ``xmlparser.xmlInitParser()`` at module-import time. libxml2's ``xmlCleanupParser`` destroys *process-global* state (catalogs, encoding handlers, registered IO callbacks, schema type tables, error handlers), and the docs reserve it for controlled process / interpreter teardown when no libxml2 objects can still be alive.

Calling it on every lxml import means that, in any embedding that runs multiple Python interpreters in the same process, a second interpreter's lxml import wipes the first interpreter's still-live registrations. The first interpreter's parsers / XInclude / DTD operations then run against half-initialised state on subsequent calls.

Drop the call. ``xmlInitParser`` is internally idempotent, so a fresh import is safe without any preceding cleanup.
``src/lxml/xslt.pxi`` was calling ``xslt.xsltUninit()`` immediately before ``xslt.xsltSetLoaderFunc(NULL)`` at module-import time. ``xsltUninit`` only flips libxslt's initialisation-once flag — it does not clear extension/element/style registries. Used at module import without a follow-up ``xsltCleanupGlobals()``, the next sub-interpreter's lxml import re-runs ``xsltInit()``'s built-in registrations on top of the already-populated tables. ``xsltSetLoaderFunc(NULL)`` alone is the right scoped operation for the loader reset; full table cleanup belongs in a controlled teardown path (``xsltCleanupGlobals``), not at import. Drops the call and the matching ``cdef void xsltUninit() nogil`` declaration from ``src/lxml/includes/xslt.pxd`` (the only caller goes away).
When ``lxml.__version__`` carries a PEP 440 local version segment (e.g. ``6.0.2+juno`` for an embedding's fork-patched build), upstream's split-by-dot unpacker produces ``(6, 0, '2+juno', 0)``: a tuple whose third element is a string, breaking ``isinstance(LXML_VERSION[2], int)`` checks (and the ``test_etree.ETreeOnlyTestCase.test_version`` self-test). Strip the ``+xyz`` segment before splitting so ``LXML_VERSION`` stays ``(int, int, int, int)`` regardless of whether a local-version suffix is present.
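The effect can be sketched as follows. This is a simplified stand-in rather than upstream's unpacker: the function names are hypothetical, and pre-release tags are not modelled at all.

```python
def _to_tuple(dotted):
    """Split on dots, pad to four components, convert to int where possible."""
    parts = (dotted.split(".") + ["0"] * 4)[:4]
    result = []
    for part in parts:
        try:
            result.append(int(part))
        except ValueError:
            result.append(part)  # non-numeric component is kept as a string
    return tuple(result)


def naive_unpack(version):
    """Split-by-dot only: a local segment leaks into the third element."""
    return _to_tuple(version)


def unpack_without_local(version):
    """Drop the PEP 440 local segment ("+...") before splitting."""
    public = version.split("+", 1)[0]
    return _to_tuple(public)


before = naive_unpack("6.0.2+juno")
after = unpack_without_local("6.0.2+juno")
```

With these definitions ``before`` is ``(6, 0, '2+juno', 0)`` and ``after`` is ``(6, 0, 2, 0)``; a version without a suffix is unaffected by the strip.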
``parser.pxi``'s ``super(_ParseError, self).__init__(...)`` and ``etree.pyx``'s ``super(_Error, self).__init__(...)`` cached the class object in a process-static ``cdef object _ParseError = ParseError`` / ``cdef object _Error = Error``. Cython emits these cdef-level objects as file-scope ``static PyObject *`` outside ``__pyx_mstate_global``, so a second concurrent interpreter's import overwrites the first's. After the overwrite, the first interpreter raising e.g. ``XMLSyntaxError`` (which inherits from ``ParseError``) trips ``TypeError: super(type, obj): obj (instance of XMLSyntaxError) is not an instance or subtype of type (ParseError).``

Replace with name-lookup forms that resolve via the importing module's per-interpreter ``__dict__``:

- ``parser.pxi`` (``ParseError`` is a plain Python class) → ``super().__init__(message)``. Cython emits the ``__class__`` cell that the no-arg form needs.
- ``etree.pyx`` (``LxmlError`` is a ``cdef class``, so no ``__class__`` cell is available) → ``super(Error, self).__init__(message)``.

Crucially, both forms preserve the cooperative super chain that ultimately reaches ``SyntaxError.__init__`` and populates ``self.msg`` for SyntaxError-derived subclasses; bypassing the chain (e.g. by calling ``Error.__init__`` directly) leaves ``self.msg`` unset and ``str(exception)`` shows ``"None …"``.

Drop the now-orphaned ``cdef object _ParseError = ParseError`` and ``cdef object _Error = Error`` definitions.
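The difference between keeping and bypassing the cooperative chain can be reproduced in plain Python. The classes below are simplified stand-ins that borrow lxml's names; they are ordinary Python classes, not the Cython ones, and the two ``Bypassing*`` classes are hypothetical, added only to show the failure mode.

```python
class Error(Exception):
    pass


class LxmlError(Error):
    def __init__(self, message):
        # Explicit two-arg form: Error is looked up by name at call time,
        # and the call continues along the instance's MRO after Error.
        super(Error, self).__init__(message)


class LxmlSyntaxError(LxmlError, SyntaxError):
    pass


class ParseError(LxmlSyntaxError):
    def __init__(self, message):
        # No-arg form: relies on the __class__ cell of the enclosing class.
        super().__init__(message)


class BypassingError(Error):  # hypothetical
    def __init__(self, message):
        # Direct call: resolves through Error's own bases only, so
        # SyntaxError.__init__ is never reached for mixed-in subclasses.
        Error.__init__(self, message)


class BypassingSyntaxError(BypassingError, SyntaxError):  # hypothetical
    pass


ok = ParseError("boom")
bypassed = BypassingSyntaxError("boom")
```

In this sketch ``ok.msg`` is ``"boom"``, while ``bypassed.msg`` is ``None`` because ``SyntaxError.__init__`` never ran.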
Three independent fixes that all surface when the test suite runs
against the installed wheel rather than the in-repo source tree
(``pip install lxml && pytest <site-packages>/lxml/tests`` style),
which is the deploy shape any embedding will see.
1. ``tests/common_imports.py``: ``DOC_DIR`` source change + a
``make_doctest`` graceful skip. The legacy ``DOC_DIR`` walked
four ``dirname`` levels up from ``__file__`` and resolved to a
non-existent path under wheel installs (lxml's ``doc/`` only
ships in the source tree). Allow callers to override via
``LXML_DOC_DIR`` (or ``SITE_PACKAGES_DIR``) for deployers that
ship the doc tree, fall back to the legacy walk otherwise, and
make ``make_doctest`` return an empty TestSuite when the file
isn't on disk — instead of letting ``DocFileSuite`` raise
``FileNotFoundError`` at collection time and torch the
surrounding ``test_suite`` (~12 such cascades observed before).
2. ``html/tests/test_feedparser_data.py``: add ``__test__ = False``
to ``FeedTestCase``. The class's ``__init__`` requires a
``filename`` arg; pytest's auto-discovery instantiates it as
``FeedTestCase('runTest')``, assigning the method name to
``self.filename``, and downstream ``open('runTest')`` then
raises ``FileNotFoundError``. The surrounding ``test_suite()``
constructs instances with proper file paths.
3. ``tests/test_etree.py``: route the
``test_python3_problem_filebased_*`` tests through
``tempfile.NamedTemporaryFile`` instead of
``open('test.xml', 'w+b')``. ``tests/test.xml`` is the bundled
fixture used by ``test_parse_file``, ``test_xinclude``,
``test_dtd_*`` and ~15 other tests; on platforms where the
resource bundle is writable (notably the iOS Simulator),
overwriting it on cycle 1 corrupted every subsequent test that
read it (``b'<a><b></b></a>' != b'<some_ns_id:some_head_elem
...>'``-style mismatches).
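Fix 1 can be sketched roughly as below. This is a simplified illustration, not the code in ``tests/common_imports.py``: the ``resolve_doc_dir`` helper and the ``make_doctest`` signature are assumptions, and ``SITE_PACKAGES_DIR`` is left out because the description does not say how it is combined with the doc path.

```python
import doctest
import os
import unittest


def resolve_doc_dir(legacy_dir, environ=None):
    """Prefer an explicit LXML_DOC_DIR override, else the legacy location."""
    environ = os.environ if environ is None else environ
    return environ.get("LXML_DOC_DIR") or legacy_dir


def make_doctest(path):
    """Build a doctest suite, or an empty suite when the file is absent."""
    if not os.path.isfile(path):
        # Nothing to collect: return an empty suite instead of letting
        # DocFileSuite fail while the surrounding test_suite is assembled.
        return unittest.TestSuite()
    return doctest.DocFileSuite(path, module_relative=False)


missing = make_doctest(os.path.join(os.sep, "no", "such", "doc", "api.txt"))
```

An empty ``unittest.TestSuite`` composes cleanly with ``addTests``, so callers that aggregate many doctest files need no special casing for the missing ones.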
Python 3.13 added a guard in ``code.__set__`` that rejects ``func.__code__`` assignment when the new code object's ``co_freevars`` length differs from the function's existing closure cells. ``_RestoreChecker.install_clone()`` swaps a code object that may not satisfy this guard, raising ``ValueError: <name>() requires a code object with N free vars, not M`` and torching the collection of every doctest that opted in via ``temp_install`` (typically the ``html/tests/test_*.txt`` files).

Wrap the swap in ``try / except ValueError`` and fall back to no swap when the guard rejects it. The override-via-``_temp_call_super_check_output`` mechanism stays in place, so the default-strict comparison runs for doctests that wanted lxml's HTML-aware comparison. That is a soft regression versus the swap success path, but better than crashing every dependent doctest.

Some HTML-aware doctests may pass under strict comparison anyway when the expected output happens to match exactly; the rest are clear follow-ups for a proper rewrite of ``temp_install`` (subclass + bound-method shadow rather than ``__code__`` replacement).
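The guarded swap can be illustrated in isolation. The helper and the three small functions below are hypothetical and unrelated to ``_RestoreChecker``; they only show that a free-variable mismatch raises ``ValueError`` and that catching it leaves the original function intact.

```python
def swap_code_or_keep(func, replacement):
    """Try to give func the replacement's code object; report success."""
    try:
        func.__code__ = replacement.__code__
    except ValueError:
        # Free-variable count mismatch: leave func exactly as it was.
        return False
    return True


def plain():
    return "original"


def other_plain():
    return "swapped"


def make_closure():
    captured = "closure"

    def inner():
        return captured  # one free variable, so the code needs one cell

    return inner


rejected = swap_code_or_keep(plain, make_closure())
result_after_rejection = plain()
accepted = swap_code_or_keep(plain, other_plain)
result_after_swap = plain()
```

The fallback keeps the process running, at the cost that the function keeps its original behaviour whenever the swap is rejected.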
Juno patches for the xml library