Skip to content

perf: avoid per-call ctx allocation in next_no_xml_space#58

Open
mathieu17g wants to merge 1 commit into
JuliaComputing:mainfrom
mathieu17g:perf-share-default-ctx
Open

perf: avoid per-call ctx allocation in next_no_xml_space#58
mathieu17g wants to merge 1 commit into
JuliaComputing:mainfrom
mathieu17g:perf-share-default-ctx

Conversation

@mathieu17g

@mathieu17g mathieu17g commented Apr 30, 2026

Copy link
Copy Markdown

Summary

next_no_xml_space is the fast path of Raw's document-order traversal — it's selected by next(::Raw) when the underlying byte buffer contains no xml:space attribute anywhere. In that situation the per-Raw ctx (xml:space inheritance stack) field is always Bool[false] and is never mutated, but the function nevertheless allocates a fresh [false] on every call:

function next_no_xml_space(o::Raw)
    ...
    ctx = [false]                                  # ← fresh allocation per call
    ...
    return Raw(type, depth, i, j - i, data, ctx, has_xml_space)
end

For documents with thousands of XML nodes that allocation dominates the cumulative memory profile of any consumer that walks the tree. The change here just reuses the parent's ctx reference instead:

ctx = o.ctx

Why this is safe

next_no_xml_space is selected only when o.has_xml_space === false, in which case ctx descends from the root's Bool[false] (line 73) and is never mutated — the only function that writes to ctx is next_xml_space, on the disjoint has_xml_space === true branch. Inside next_no_xml_space itself, ctx is read once and passed to the Raw(...) constructor; no push! / pop! / setindex!.

A module-level const _DEFAULT_CTX = Bool[false] would be an equally allocation-free alternative if you'd prefer not to rely on the propagation chain — happy to switch.

Why it matters

Measured on a downstream consumer (FastKML.jl extracting a DataFrame from a 47 MiB sample KML, walking ~1 M Raw nodes during the traversal), this allocation site was contributing ~60 MiB of cumulative tracked allocation. Removing it cuts the consumer's end-to-end memory by about a sixth and lops ~7% off wall-clock time. The same pattern would apply to any package using XML.LazyNode or XML.Raw to walk a large document.

(See also #59 for a complementary additive change adding next! / prev! for in-place traversal — independent and stackable.)

Verification

  • The full XML.jl test suite passes locally on this branch (Julia 1.12), including all 71 xml:space cases — those exercise the next_xml_space path that's deliberately unchanged here.
  • No public API change, no behavioral change observable to existing callers.

Diff size

One real line changed:

-    ctx = [false]
+    ctx = o.ctx

Plus a short comment explaining the rationale, so future readers don't re-allocate it back without understanding the constraint.

`next_no_xml_space` is only entered when the source document has no
`xml:space` attribute anywhere, in which case the per-node `ctx`
(xml:space inheritance stack) is always `Bool[false]` and is never
mutated. Allocating a fresh `[false]` on every call therefore costs
~32 B × N nodes for no semantic benefit. Reuse the parent's `ctx`
instead.

On a 47 MiB representative KML file (5,411 Placemarks, ~1 M XML nodes
traversed during a `DataFrame` extraction in a downstream consumer),
this drops cumulative tracked allocations by ~60 MiB without any
behavior or API change. Wall-clock parsing time is unchanged.

next_xml_space is unaffected: that path mutates ctx via push/pop when
descending into elements with `xml:space`, so each Raw still needs its
own copy.
@joshday

joshday commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

FYI there's a bit of a rewrite in progress:

#54

@mathieu17g

Copy link
Copy Markdown
Author

Indeed, I had not seen it, sorry.

I will wait for its completion to see if the issue I was addressing in this PR and PR #59 are still there.

@mathieu17g

Copy link
Copy Markdown
Author

After benchmarking against v0.4 (#54): the context-sharing fix I proposed here for next_no_xml_space is naturally subsumed by v0.4's tokenizer refactor — there's no separate xml_space buffer pool left to share once tokenization moves to the new Tokenizer + TokenizerState model.

Confirmed in FastKML.jl 3-way benchmarks: v0.4 eager is faster than v0.3.8 + #58 + #59 eager on all 4 reference files (×2–2.6 speedup; 37–69% memory reduction), so this PR's contribution under v0.3.8 no longer applies on the v0.4 path.

I'll close this PR once #54 merges. Keeping it open until then in case there's a v0.3.x maintenance line where this fix should still be merged independently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants