Project-specific instructions. Inherits all rules from ~/.claude/CLAUDE.md (global).
- What it is: PyO3 Python bindings for the edwardkim/rhwp Rust HWP/HWPX parser & renderer
- Names: PyPI `rhwp-python` / import `rhwp` / extension `rhwp._rhwp`
- Core delivery: Rust core consumed via git submodule at `external/rhwp`, pinned to a specific upstream commit (tracked in `CHANGELOG.md` + `.gitmodules`)
- License: MIT, dual copyright (Edward Kim for rhwp core, DanMeon for bindings). Both LICENSE files are bundled in the wheel (`license-files = ["LICENSE", "external/rhwp/LICENSE"]`)
- Status: unofficial community package. The `rhwp` name on PyPI is intentionally left for the upstream maintainer
- After any Rust change (`src/*.rs`): run `uv run maturin develop --release` before `pytest` (without it, tests run against the stale binary), and `cargo clippy --all-targets -- -D warnings` for lint
- `external/rhwp/` is upstream-owned. Never edit it locally; file an issue / PR against edwardkim/rhwp instead
- PyO3 `#[pyclass(unsendable)]`: `_Document` is bound to its creation thread (upstream `DocumentCore` holds `RefCell` fields, which are `!Sync`). Same-thread worker pattern (parse + consume + return primitives inside one thread) works; `asyncio.to_thread(rhwp.parse, path)` does NOT: the Future resolves on the main thread and the first attribute access panics with `_rhwp::document::PyDocument is unsendable, but sent to another thread`
- GIL release via `py.detach`: apply selectively, not blanket:
  - Release for ≥1 ms CPU/IO-bound work that touches only Rust-side data (parse, render, decode, compress, file read). Current sites: `_Document::from_bytes` / `render_pdf()` / `export_pdf()`. When adding new methods of this shape, follow the same pattern
  - Don't release for trivial getters, short attribute access, or hot paths that frequently call back into Python: the `detach` / `attach` round-trip cost exceeds the gain and may slow things down
  - When unsure, measure with the `benches/bench_gil.py` pattern (wall-clock comparison with vs without `py.detach`) before committing
- `abi3-py310` feature: one wheel covers 3.10–3.13+. Don't bind to Python version-specific C API
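
The same-thread worker pattern from the `unsendable` note can be sketched as below. This is a minimal illustration, not the rhwp API: `ThreadBoundDoc` is a hypothetical stand-in that mimics an unsendable object by raising `RuntimeError` when touched from a thread other than its creator (the real `_Document` panics instead), and `page_count` is an assumed attribute name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadBoundDoc:
    """Hypothetical stand-in for an unsendable PyO3 object (not the rhwp API)."""

    def __init__(self, data: bytes) -> None:
        self._owner = threading.get_ident()  # remember the creating thread
        self._data = data

    @property
    def page_count(self) -> int:  # assumed attribute, for illustration only
        if threading.get_ident() != self._owner:
            raise RuntimeError("unsendable, but sent to another thread")
        return len(self._data)


def worker(data: bytes) -> int:
    # Parse, consume, and return a primitive inside ONE thread.
    doc = ThreadBoundDoc(data)
    return doc.page_count  # a plain int crosses the thread boundary safely


def leaky_worker(data: bytes) -> ThreadBoundDoc:
    # Anti-pattern: the thread-bound object itself escapes its creation thread.
    return ThreadBoundDoc(data)


with ThreadPoolExecutor(max_workers=1) as pool:
    safe_result = pool.submit(worker, b"abc").result()
    leaked = pool.submit(leaky_worker, b"abc").result()

try:
    leaked.page_count  # accessed from the main thread, not the creator
    leak_failed = False
except RuntimeError:
    leak_failed = True
```

Only `worker` keeps the object confined to one thread; `leaky_worker` reproduces the failure mode of handing the object back to the caller.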
- Python-surface APIs for I/O and integrations are async-first: when adding LangChain (or future RAG framework) loaders, implement `aload` / `alazy_load` / async counterparts alongside sync versions
- Forbidden pattern: `asyncio.to_thread(rhwp.parse, path)`. `_Document` is unsendable (see Rust+Python hybrid build note above), so the returned Document panics on main-thread access. `async fn` in `#[pymethods]` is also incompatible (PyO3 requires `Send + 'static` futures)
- Supported async pattern: `aparse(path)` uses stdlib `asyncio.to_thread` to offload the file read to a thread pool, then calls `Document.from_bytes(data)` on the event-loop thread. Document never crosses a thread boundary. No external dependency: Python `asyncio` lacks native async file I/O, so all async file libs (`aiofiles` etc.) wrap thread pools anyway; stdlib achieves the same effect with zero install footprint
- Document instance-level async methods (`doc.ato_ir()` etc.) are NOT provided: they would require thread offload, which `unsendable` forbids. For async code, `await rhwp.aparse(path)` once, then call sync methods on the Document directly (these are fast, in-memory, GIL-holding operations)
- If upstream rhwp ever replaces its `RefCell` caches with thread-safe synchronization, revisit this: `unsendable` could then be dropped, enabling true `async fn` pymethods
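
A minimal sketch of the supported async pattern, assuming nothing about the real `aparse` beyond what is stated above. `aparse_sketch` and `fake_from_bytes` are hypothetical names; the stand-in constructor records which thread built the object so the "never crosses a thread boundary" property can be observed without rhwp installed.

```python
import asyncio
import tempfile
import threading
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


async def aparse_sketch(path: str, from_bytes: Callable[[bytes], T]) -> T:
    # Only the blocking file read is offloaded to the default thread pool.
    data = await asyncio.to_thread(Path(path).read_bytes)
    # Construction runs back on the event-loop thread, so the resulting
    # object is created and used on the same thread.
    return from_bytes(data)


def fake_from_bytes(data: bytes) -> Tuple[int, int]:
    # Hypothetical stand-in for Document.from_bytes: (constructing thread, size).
    return (threading.get_ident(), len(data))


async def demo() -> bool:
    payload = b"not a real HWP file"
    with tempfile.NamedTemporaryFile(suffix=".hwp", delete=False) as f:
        f.write(payload)
    try:
        built_on, size = await aparse_sketch(f.name, fake_from_bytes)
    finally:
        Path(f.name).unlink()
    return built_on == threading.get_ident() and size == len(payload)


constructed_on_loop_thread = asyncio.run(demo())
```

In real code the second argument would presumably be `Document.from_bytes`; it is a parameter here only to keep the sketch self-contained.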
- Real HWP fixtures live in the submodule: `external/rhwp/samples/aift.hwp` (HWP5), `table-vpos-01.hwpx` (HWPX). `tests/conftest.py` + `benches/bench_gil.py` reference this path
- When changing one path, change both
- Markers: `slow` (PDF render), `langchain` (extras required). Default run: `pytest -m "not slow"`
- Extras-gated test files use module-level `pytest.importorskip` so the whole file counts as 1 skip when the extra is missing. Current gated files: `test_langchain_loader.py` + `test_langchain_loader_ir.py` (`langchain-core`), `test_ir_schema_export.py` (`jsonschema`), `test_cli.py` (`typer`), `test_mcp_server.py` (`fastmcp`), `test_render_png.py` (`Pillow`) → CI's `test-without-extras` job validates exactly 6 skipped (see `.github/workflows/ci.yml`). When adding a new extras-gated file, bump the count in both AGENTS.md and ci.yml
- `tests/type_check_errors.py` holds exactly 4 intentional pyright errors, and CI validates that too. When editing, preserve the count; don't fix them
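
The relationship between the gated-file list and the expected skip count can be sketched as below. How `ci.yml` actually performs the check is not shown here, so `skipped_count` is a hypothetical helper and the sample summary line is made up for illustration.

```python
import re


def skipped_count(summary_line: str) -> int:
    """Extract the 'N skipped' figure from a pytest summary line (0 if absent)."""
    match = re.search(r"(\d+) skipped", summary_line)
    return int(match.group(1)) if match else 0


# Gated files as listed above; one module-level importorskip per file
# means one skip per file when its extra is missing.
GATED_FILES = [
    "test_langchain_loader.py",
    "test_langchain_loader_ir.py",
    "test_ir_schema_export.py",
    "test_cli.py",
    "test_mcp_server.py",
    "test_render_png.py",
]

# Made-up sample of a pytest final summary line, not real CI output.
sample_summary = "===== 120 passed, 6 skipped in 3.21s ====="
counts_agree = skipped_count(sample_summary) == len(GATED_FILES)
```

Adding a seventh gated file without bumping the expected count would make a check of this shape fail, which is the point of the rule.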
- Single-branch trunk model: feature branches off `main` → PR to `main`. No `develop` / `staging`
- Branch naming: MINOR = `feature/vX.Y.0` (long-lived, isolates external contract changes across stages). PATCH = `<type>/<topic>` (short-lived, merges directly to main, tag only `vX.Y.Z`) where `<type>` follows Conventional Commits (`fix` / `chore` / `refactor` / `docs` / `build` / `ci` / `perf` / `test` / `revert`)
- Commit subject: lowercase `type: description` (seed commit: `init: 프로젝트 초기화`, i.e. "project initialization")
- PR body follows `.github/pull_request_template.md`: Summary / Why / Related Issues
- Full contributor flow (fork, pre-submit checks, rhwp-core changes): CONTRIBUTING_EN.md (Korean: CONTRIBUTING.md)
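
The branch naming rules above can be expressed as a small checker. This is a sketch, not a tool that exists in the repo: `classify_branch` is a hypothetical helper, and the allowed character set for `<topic>` is an assumption since the policy does not specify one.

```python
import re
from typing import Optional

# Types taken from the Conventional Commits list above.
PATCH_TYPES = ("fix", "chore", "refactor", "docs", "build", "ci", "perf", "test", "revert")

# MINOR branches always end in ".0".
MINOR_RE = re.compile(r"feature/v\d+\.\d+\.0")
# Topic charset (lowercase, digits, '.', '_', '-') is an assumption.
PATCH_RE = re.compile(rf"(?:{'|'.join(PATCH_TYPES)})/[a-z0-9][a-z0-9._-]*")


def classify_branch(name: str) -> Optional[str]:
    """Return 'minor', 'patch', or None if the name matches neither scheme."""
    if MINOR_RE.fullmatch(name):
        return "minor"
    if PATCH_RE.fullmatch(name):
        return "patch"
    return None
```

For example, `classify_branch("feature/v0.3.0")` yields `"minor"`, while `feature/v0.3.1` matches neither scheme because `feature` is not a PATCH type and the version does not end in `.0`.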
- Git tags `vX.Y.Z`, SemVer, MINOR-sized increments
- Cargo.toml is the version source of truth via `dynamic = ["version"]` in pyproject.toml. Always bump Cargo.toml before tagging: `publish.yml`'s `verify-version` aborts on mismatch
- No breaking changes across Phase boundaries (Phase 1 → 2 must keep existing APIs)
- Release trigger: GitHub Release `published` event fires `publish.yml`. Draft releases don't trigger
- Every release records the `external/rhwp` submodule commit hash in CHANGELOG. The git submodule itself (visible via `git ls-tree <tag> external/rhwp`) is the authoritative pin per release
- Integration-only runtime deps (LangChain, typer, jsonschema) belong in `[project.optional-dependencies]`, never `[project] dependencies`. This keeps the core wheel dependency-free
Authoritative policy is docs/CONVENTIONS.md — read it before any docs work. Active spec index SSOT is docs/roadmap/README.md.
Hard rules (auto-applied without further instruction):
- Every per-version spec / ADR / impl-log / verification report carries a YAML frontmatter block as the first lines: `status: <Active | Draft | Frozen | Superseded>` + `ga: vX.Y.Z` or `target: vX.Y.Z` + `last_updated: YYYY-MM-DD`. Living docs (README, CHANGELOG, AGENTS.md, CLAUDE.md, CONVENTIONS itself) skip the frontmatter.
- Frozen spec body is immutable: typo / broken-link fixes only. Decision changes go to a new spec; the old one's frontmatter flips to `status: Superseded`, `superseded_by: <new spec>` (single-block edit). Exception: Living-policy schema migration (see CONVENTIONS § Frozen 면제 조항, the "Frozen exemption clause").
- Spec ↔ spec direct cross-links are forbidden even within the same `vX.Y.Z/` directory. Use `roadmap/README.md` as the bridge. Exception: pair files `<topic>.md` ↔ `<topic>-research.md` (the spec ↔ ADR pair) link directly.
- New version `vX.Y.Z`: invoke the `/new-spec <version> <topic>` Claude Code skill (auto-scaffolds spec + paired ADR + README index row). Manual: create `docs/roadmap/vX.Y.Z/<topic>.md` + `docs/design/vX.Y.Z/<topic>-research.md` (frontmatter `status: Draft`, `target: vX.Y.Z`), then add a row to the active-spec index in `roadmap/README.md`. On GA: flip `status: Draft → Frozen`, swap `target` → `ga`, write `implementation/vX.Y.Z/...` (Frozen on creation), refresh README index.
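
The frontmatter rule above can be checked mechanically. The sketch below is a hypothetical validator, not something docs/CONVENTIONS.md is known to ship: it parses only flat `key: value` pairs (no YAML library), and it assumes that exactly one of `ga` / `target` must be present.

```python
import re
from datetime import date
from typing import Dict, List

ALLOWED_STATUS = {"Active", "Draft", "Frozen", "Superseded"}
VERSION_RE = re.compile(r"v\d+\.\d+\.\d+")


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing fence never found


def frontmatter_problems(text: str) -> List[str]:
    """Return a list of rule violations; an empty list means the block passes."""
    fm = read_frontmatter(text)
    if not fm:
        return ["missing frontmatter block"]
    problems = []
    if fm.get("status") not in ALLOWED_STATUS:
        problems.append("bad status")
    # Assumption: exactly one of ga/target, formatted as vX.Y.Z.
    versions = [fm[k] for k in ("ga", "target") if k in fm]
    if len(versions) != 1 or not VERSION_RE.fullmatch(versions[0]):
        problems.append("need exactly one of ga/target as vX.Y.Z")
    try:
        date.fromisoformat(fm.get("last_updated", ""))
    except ValueError:
        problems.append("bad last_updated")
    return problems
```

A Draft spec with `status: Draft`, `target: v0.4.0`, and an ISO `last_updated` date produces an empty problem list; living docs would simply be excluded from the check.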
- No secrets required. PyPI publish uses Trusted Publisher (OIDC), so there is no API token to manage
- `secrets.GITHUB_TOKEN` is injected automatically; don't try to "register" it
- Workflow permissions stay minimal. `publish.yml` declares `id-token: write` at the job level only