From 9de2962d534f4501b2d62fb317b77e828bd08e14 Mon Sep 17 00:00:00 2001
From: Emma Smith
Date: Tue, 6 Jan 2026 16:45:49 -0800
Subject: [PATCH 1/3] PEP 819: Add information about duplicate keys and integer parsing

Update the PEP to clarify that integers and floats should be serialized to
strings, and specify that when there are duplicate keys the second key wins in
a JSON object.
---
 peps/pep-0819.rst | 60 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/peps/pep-0819.rst b/peps/pep-0819.rst
index 339d8cecbdb..d259c1459bb 100644
--- a/peps/pep-0819.rst
+++ b/peps/pep-0819.rst
@@ -235,6 +235,40 @@ JSON schema for wheel metadata has been produced. This schema will be updated
 with each revision to the wheel metadata specification. The schema is available
 in :ref:`0819-wheel-json-schema`.
 
+Handling of Integer and Float Values in JSON Package Metadata
+-------------------------------------------------------------
+
+While no core metadata or wheel metadata values are currently encoded as
+integers or floats, when decoding a JSON file, integer and float values should
+be decoded as strings for both core metadata and wheel metadata. This is to
+avoid compatibility issues due to differences in precision and representation
+of integers and floats between languages and parsers. This also mitigates a
+security risk with integer parsing denial of service attacks based on
+`CVE-2020-10735 `__.
+
+If a future field of core metadata or wheel metadata needs to be encoded as an
+integer or float, the field MUST be decoded lazily after loading the JSON
+document. This minimizes the risks of denial of service attacks by minimizing
+the integer parsing allowed during the deserialization process.
+
+If using the Python :mod:`!json` module, parsing integers and floats as strings
+can be accomplished by setting the ``parse_int`` and ``parse_float``
+keyword arguments to :func:`json.load` or :func:`json.loads` to :class:`str`.
+
+Handling of Duplicate Keys in JSON Package Metadata
+---------------------------------------------------
+
+JSON does not define semantics for duplicate keys in a JSON document. However,
+different parsers treat duplicate keys differently. Tools SHOULD NOT generate
+duplicate keys in JSON package metadata. However, it is likely duplicate keys
+may be generated anyway, so tools consuming JSON package metadata should handle
+duplicate keys gracefully. In the interest of compatibility and matching the
+behavior of the Python :mod:`!json` module, if duplicate keys are encountered,
+the second duplicate key should be used as the data for that key. This matches
+the behavior of many JSON parsers such as those in Python, Rust, Go, and the
+ECMAScript Standard. Tools MAY warn about duplicate keys in JSON package
+metadata.
+
 Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files
 ------------------------------------------------------------------
 
@@ -272,25 +306,13 @@ or ``WHEEL`` files.
 Security Implications
 =====================
 
-One attack vector with JSON encoded core metadata is if the JSON payload is
-designed to consume excessive memory or CPU resources in a denial of service
-(DoS) attack. While this attack is not likely to affect users whom can cancel
-resource-intensive interactive operations, it may be an issue for package
-indexes.
-
-There are several mitigations that can be made to prevent this:
-
-#. The length of the JSON payload can be restricted to a reasonable size.
-#. The reader may use a :class:`~json.JSONDecoder` to omit parsing :class:`int`
-   and :class:`float` values to avoid quadratic number parsing time complexity
-   attacks.
-#. I plan to contribute a change to :class:`~json.JSONDecoder` in Python
-   3.15+ that will allow it to be configured to restrict the nesting of JSON
-   payloads to a reasonable depth. Core metadata currently has a maximum depth
-   of 2 to encode mapping and list fields.
-
-With these mitigations in place, concerns about denial of service attacks with
-JSON encoded core metadata are minimal.
+JSON encoded core metadata and wheel metadata have the potential for a denial
+of service attack due to the quadratic parsing time complexity of parsing of
+integers. This PEP mitigates this risk by requiring that integers and floats be
+parsed as strings, and only lazily parsed into integers or floats after the
+initial deserialization of the JSON document. With these mitigations in place,
+concerns about denial of service attacks with JSON encoded package metadata are
+considered minimal.
 
 
 Reference Implementation

From f0f830b394ef51cbbf1b8e7a10e0c063c2f3c73c Mon Sep 17 00:00:00 2001
From: Emma Smith
Date: Tue, 6 Jan 2026 17:01:46 -0800
Subject: [PATCH 2/3] Also make the JSON files required

---
 peps/pep-0819.rst | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0819.rst b/peps/pep-0819.rst
index d259c1459bb..b93370f8f98 100644
--- a/peps/pep-0819.rst
+++ b/peps/pep-0819.rst
@@ -118,8 +118,8 @@ Specification
 JSON Format Core Metadata File
 ------------------------------
 
-A new optional but recommended file ``METADATA.json`` shall be introduced as a
-metadata file for Python distribution packages. If generated, the ``METADATA.json`` file
+A new required file ``METADATA.json`` shall be introduced as a
+metadata file for Python distribution packages. The ``METADATA.json`` file
 MUST be placed in the same directory as the current email formatted ``METADATA``
 or ``PKG-INFO`` file.
 
@@ -200,8 +200,8 @@ encoded core metadata file MUST be served at
 JSON Format Wheel Metadata File
 -------------------------------
 
-A new optional but recommended file ``WHEEL.json`` shall be introduced as a
-JSON encoded version of the ``WHEEL`` file. If generated, the ``WHEEL.json``
+A new required file ``WHEEL.json`` shall be introduced as a
+JSON encoded version of the ``WHEEL`` file. The ``WHEEL.json``
 file MUST be placed in the same directory as the current key-value formatted
 ``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of the
 ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. The wheel file
@@ -348,6 +348,15 @@ format, JSON has been chosen for a few reasons:
 #. JSON is fast to parse and emit.
 #. JSON schemas are JSON native and commonly used.
 
+Make the JSON Package Metadata Files Optional
+---------------------------------------------
+
+A future major revision of the wheel format specification may make the
+``METADATA.json`` and ``WHEEL.json`` files the default. Therefore, tools should
+begin generating and consuming JSON package metadata files to ensure tools are
+prepared for the future transition to the JSON package metadata files being
+the default.
+
 
 Open Issues
 ===========

From 07fa915af32a24db48ca60b2e560feb29c1bfb23 Mon Sep 17 00:00:00 2001
From: Emma Smith
Date: Wed, 7 Jan 2026 11:44:08 -0800
Subject: [PATCH 3/3] Discuss int values in security section

---
 peps/pep-0819.rst | 41 ++++++++++++++---------------------------
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/peps/pep-0819.rst b/peps/pep-0819.rst
index b93370f8f98..c3f7be1a1a0 100644
--- a/peps/pep-0819.rst
+++ b/peps/pep-0819.rst
@@ -235,26 +235,6 @@ JSON schema for wheel metadata has been produced. This schema will be updated
 with each revision to the wheel metadata specification. The schema is available
 in :ref:`0819-wheel-json-schema`.
 
-Handling of Integer and Float Values in JSON Package Metadata
--------------------------------------------------------------
-
-While no core metadata or wheel metadata values are currently encoded as
-integers or floats, when decoding a JSON file, integer and float values should
-be decoded as strings for both core metadata and wheel metadata. This is to
-avoid compatibility issues due to differences in precision and representation
-of integers and floats between languages and parsers. This also mitigates a
-security risk with integer parsing denial of service attacks based on
-`CVE-2020-10735 `__.
-
-If a future field of core metadata or wheel metadata needs to be encoded as an
-integer or float, the field MUST be decoded lazily after loading the JSON
-document. This minimizes the risks of denial of service attacks by minimizing
-the integer parsing allowed during the deserialization process.
-
-If using the Python :mod:`!json` module, parsing integers and floats as strings
-can be accomplished by setting the ``parse_int`` and ``parse_float``
-keyword arguments to :func:`json.load` or :func:`json.loads` to :class:`str`.
-
 Handling of Duplicate Keys in JSON Package Metadata
 ---------------------------------------------------
 
@@ -286,13 +286,20 @@ or ``WHEEL`` files.
 Security Implications
 =====================
 
-JSON encoded core metadata and wheel metadata have the potential for a denial
-of service attack due to the quadratic parsing time complexity of parsing of
-integers. This PEP mitigates this risk by requiring that integers and floats be
-parsed as strings, and only lazily parsed into integers or floats after the
-initial deserialization of the JSON document. With these mitigations in place,
-concerns about denial of service attacks with JSON encoded package metadata are
-considered minimal.
+Maliciously crafted JSON encoded metadata files have the potential to cause a
+denial of service attack due to the quadratic parsing time complexity of
+reading integer strings as reported in
+`CVE-2020-10735 `__. No
+package metadata fields are currently encoded as integers, so this risk can be
+mitigated by decoding integer values as strings when parsing JSON package
+metadata.
+
+If using the Python :mod:`!json` module, parsing integers as strings
+can be accomplished by setting the ``parse_int`` keyword argument to
+:func:`json.load` or :func:`json.loads` to :class:`str`.
+
+With this mitigation in place, concerns about denial of service attacks with
+JSON encoded package metadata are considered minimal.
 
 
 Reference Implementation
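
Note on the series as a whole (not part of the patches): a minimal sketch of how a consumer might combine the two behaviors the final text describes, namely decoding integers as strings via ``parse_int`` and letting the second duplicate key win while optionally warning. The function names ``load_package_metadata`` and ``warn_on_duplicate_keys`` and the sample field names are hypothetical and do not come from the PEP; only ``json.loads``, ``parse_int``, and ``object_pairs_hook`` are real standard library API.

```python
import json
import warnings


def warn_on_duplicate_keys(pairs):
    """object_pairs_hook (hypothetical helper): warn on repeated keys."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            warnings.warn(f"duplicate key in JSON package metadata: {key!r}")
        seen.add(key)
    # dict() keeps the last value for a repeated key, which is also what
    # the json module does by default.
    return dict(pairs)


def load_package_metadata(text):
    """Decode JSON metadata text without eagerly parsing integers."""
    return json.loads(
        text,
        parse_int=str,  # integer literals are returned as their digit strings
        object_pairs_hook=warn_on_duplicate_keys,
    )


if __name__ == "__main__":
    # Hypothetical payload: a repeated "name" key and one integer literal.
    sample = '{"name": "first", "name": "second", "count": 12}'
    print(load_package_metadata(sample))
```

Under these assumptions the later ``name`` value is kept and ``count`` comes back as a string, so any conversion to ``int`` happens only where a caller explicitly asks for it.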