fix(data): rewrite bundled Skeleton.hwpx id to fit in signed int32 #37

Merged
airmang merged 1 commit into airmang:main from seonghoony:fix/skeleton-id-range
Apr 27, 2026

@seonghoony

Closes #35.

Root cause

src/hwpx/data/Skeleton.hwpx is the seed used by HwpxDocument.new(). Its Contents/section0.xml shipped with

<hp:p id="3121190098" ...>

3121190098 = 0xBA0994D2, with bit 31 set. Every newly-created document inherits this out-of-range id at its first paragraph, and the _allocate_*_id helpers (max + 1) propagate the high value into subsequent ids whenever the skeleton paragraph is the current maximum.
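
To make the sign problem concrete, the short check below reinterprets the id as a signed 32-bit integer, which is presumably what a consumer storing ids in an int32 ends up doing. The helper name `as_signed_int32` is hypothetical and not part of the library.

```python
import struct

SKELETON_ID = 3121190098  # value shipped in Contents/section0.xml


def as_signed_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed int32 (hypothetical helper)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(hex(SKELETON_ID))              # 0xba0994d2
print(SKELETON_ID >= 2**31)          # True: bit 31 is set
print(as_signed_int32(SKELETON_ID))  # -1173777198
```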

Fix

Replace the lone out-of-range id with 0. The other ids inside the archive (Contents/header.xml, settings.xml, etc.) were already in range, so a single replacement is enough.

-<hp:p id="3121190098" paraPrIDRef="0" styleIDRef="0" pageBreak="0" ...
+<hp:p id="0"          paraPrIDRef="0" styleIDRef="0" pageBreak="0" ...
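
The effect of this replacement on id allocation can be illustrated with a simplified max + 1 allocator. This is only a sketch of the strategy described under Root cause, not the library's actual `_allocate_*_id` code; `allocate_next_id` is a hypothetical stand-in.

```python
from typing import Iterable

INT32_MAX = 2**31 - 1


def allocate_next_id(existing_ids: Iterable[int]) -> int:
    """Hypothetical max + 1 allocator; returns 0 when no ids exist yet."""
    return max(existing_ids, default=-1) + 1


# Before the fix: the skeleton paragraph is the current maximum.
before = allocate_next_id([3121190098])
# After the fix: the skeleton paragraph has id 0.
after = allocate_next_id([0])

print(before, before > INT32_MAX)  # 3121190099 True
print(after, after > INT32_MAX)    # 1 False
```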

The repacked archive keeps the original file order and stores mimetype first and uncompressed, matching the mimetype-first container convention already used by the existing template.
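
A repack along these lines could be done with the standard `zipfile` module. The sketch below is an assumption about how such a rewrite might look, not the script actually used for this PR; `repack_with_patch` is a hypothetical helper, and the demo archive (including its mimetype string) is a placeholder standing in for Skeleton.hwpx.

```python
import io
import zipfile


def repack_with_patch(src: bytes, member: str, old: bytes, new: bytes) -> bytes:
    """Copy a zip archive, replacing `old` with `new` inside `member`.

    Hypothetical helper: entries keep their original order, and the
    `mimetype` entry is written uncompressed (ZIP_STORED).
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, zipfile.ZipFile(out, "w") as zout:
        for info in zin.infolist():  # infolist() preserves archive order
            data = zin.read(info.filename)
            if info.filename == member:
                data = data.replace(old, new)
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info, data, compress_type=method)
    return out.getvalue()


# Demo on a tiny in-memory archive (placeholder content, not the real template).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", "application/x-placeholder", compress_type=zipfile.ZIP_STORED)
    zf.writestr("Contents/section0.xml", '<hp:p id="3121190098"/>',
                compress_type=zipfile.ZIP_DEFLATED)

patched = repack_with_patch(
    demo.getvalue(), "Contents/section0.xml", b'id="3121190098"', b'id="0"'
)
```

Copying each `ZipInfo` from the source archive keeps entry names and timestamps, while `compress_type` is set explicitly per entry so that `mimetype` stays uncompressed.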

Regression test

tests/test_skeleton_template_ids.py (new):

import re
import zipfile
from importlib.resources import files


def test_skeleton_template_ids_fit_in_signed_int32() -> None:
    skeleton = files("hwpx.data") / "Skeleton.hwpx"
    # Close both the resource handle and the archive when done.
    with skeleton.open("rb") as fh, zipfile.ZipFile(fh) as zf:
        for name in zf.namelist():
            # Only the XML parts are scanned for id attributes.
            if not name.endswith(".xml"):
                continue
            data = zf.read(name).decode("utf-8")
            # Match numeric id="..." attributes; negatives are captured so they fail too.
            for m in re.finditer(r'\sid="(-?\d+)"', data):
                value = int(m.group(1))
                assert 0 <= value < 2**31, (
                    f"{name} contains id={value} (0x{value:x}); "
                    "Skeleton.hwpx values must stay in [0, 2^31)"
                )

Without the fix, this test fails on the seed file:

AssertionError: Contents/section0.xml contains id=3121190098 (0xba0994d2);
Skeleton.hwpx values must stay in [0, 2^31)

With the fix:

$ pytest tests/test_skeleton_template_ids.py
tests/test_skeleton_template_ids.py .                                    [100%]
============================== 1 passed in 0.04s ===============================

Wider regression check

$ pytest tests/ -k 'new or skeleton or template or empty'
82 passed, 168 deselected in 0.72s

HwpxDocument.new() and the existing template/regression tests all continue to pass.

Notes

The bundled skeleton template that seeds HwpxDocument.new() shipped with
<hp:p id="3121190098"> in Contents/section0.xml. That value is >= 2^31,
the same out-of-range pattern that downstream consumers misinterpret
as a negative signed int. Replacing it with id="0" keeps the template
within [0, 2^31) without changing any other content; the rest of the
archive (header.xml, settings.xml, etc.) was already in range.

Adds a regression test that scans every <name>.xml inside Skeleton.hwpx
for id attributes and asserts they all fit in signed int32.
@airmang airmang merged commit 84d8ae6 into airmang:main Apr 27, 2026
0 of 3 checks passed

Development

Successfully merging this pull request may close these issues.

Skeleton.hwpx ships with <hp:p id="3121190098"> (out of signed int32 range)
