perf: avoid per-call ctx allocation in next_no_xml_space#58
Conversation
`next_no_xml_space` is only entered when the source document has no `xml:space` attribute anywhere, in which case the per-node `ctx` (xml:space inheritance stack) is always `Bool[false]` and is never mutated. Allocating a fresh `[false]` on every call therefore costs ~32 B × N nodes for no semantic benefit. Reuse the parent's `ctx` instead. On a 47 MiB representative KML file (5,411 Placemarks, ~1 M XML nodes traversed during a `DataFrame` extraction in a downstream consumer), this drops cumulative tracked allocations by ~60 MiB without any behavior or API change. Wall-clock parsing time is unchanged. next_xml_space is unaffected: that path mutates ctx via push/pop when descending into elements with `xml:space`, so each Raw still needs its own copy.
|
FYI there's a bit of a rewrite in progress: |
|
Indeed, I had not seen it, sorry. I will wait for its completion to see if the issue I was addressing in this PR and PR #59 are still there. |
|
After benchmarking against v0.4 (#54): the context-sharing fix I proposed here for Confirmed in FastKML.jl 3-way benchmarks: v0.4 eager is faster than I'll close this PR once #54 merges. Keeping it open until then in case there's a v0.3.x maintenance line where this fix should still be merged independently. |
Summary
next_no_xml_spaceis the fast path ofRaw's document-order traversal — it's selected bynext(::Raw)when the underlying byte buffer contains noxml:spaceattribute anywhere. In that situation the per-Rawctx(xml:space inheritance stack) field is alwaysBool[false]and is never mutated, but the function nevertheless allocates a fresh[false]on every call:For documents with thousands of XML nodes that allocation dominates the cumulative memory profile of any consumer that walks the tree. The change here just reuses the parent's
ctxreference instead:Why this is safe
next_no_xml_spaceis selected only wheno.has_xml_space === false, in which casectxdescends from the root'sBool[false](line 73) and is never mutated — the only function that writes toctxisnext_xml_space, on the disjointhas_xml_space === truebranch. Insidenext_no_xml_spaceitself,ctxis read once and passed to theRaw(...)constructor; nopush!/pop!/setindex!.A module-level
const _DEFAULT_CTX = Bool[false]would be an equally allocation-free alternative if you'd prefer not to rely on the propagation chain — happy to switch.Why it matters
Measured on a downstream consumer (FastKML.jl extracting a
DataFramefrom a 47 MiB sample KML, walking ~1 MRawnodes during the traversal), this allocation site was contributing ~60 MiB of cumulative tracked allocation. Removing it cuts the consumer's end-to-end memory by about a sixth and lops ~7% off wall-clock time. The same pattern would apply to any package usingXML.LazyNodeorXML.Rawto walk a large document.(See also #59 for a complementary additive change adding
next!/prev!for in-place traversal — independent and stackable.)Verification
xml:spacecases — those exercise thenext_xml_spacepath that's deliberately unchanged here.Diff size
One real line changed:
Plus a short comment explaining the rationale, so future readers don't re-allocate it back without understanding the constraint.