perf: avoid per-call ctx allocation in next_no_xml_space by mathieu17g · Pull Request #58 · JuliaComputing/XML.jl

mathieu17g · 2026-04-30T09:47:33Z

Summary

next_no_xml_space is the fast path of Raw's document-order traversal — it's selected by next(::Raw) when the underlying byte buffer contains no xml:space attribute anywhere. In that situation the per-Raw ctx (xml:space inheritance stack) field is always Bool[false] and is never mutated, but the function nevertheless allocates a fresh [false] on every call:

function next_no_xml_space(o::Raw)
    ...
    ctx = [false]                                  # ← fresh allocation per call
    ...
    return Raw(type, depth, i, j - i, data, ctx, has_xml_space)
end

For documents with thousands of XML nodes that allocation dominates the cumulative memory profile of any consumer that walks the tree. The change here just reuses the parent's ctx reference instead:

ctx = o.ctx

Why this is safe

next_no_xml_space is selected only when o.has_xml_space === false, in which case ctx descends from the root's Bool[false] (line 73) and is never mutated — the only function that writes to ctx is next_xml_space, on the disjoint has_xml_space === true branch. Inside next_no_xml_space itself, ctx is read once and passed to the Raw(...) constructor; no push! / pop! / setindex!.

A module-level const _DEFAULT_CTX = Bool[false] would be an equally allocation-free alternative if you'd prefer not to rely on the propagation chain — happy to switch.

Why it matters

Measured on a downstream consumer (FastKML.jl extracting a DataFrame from a 47 MiB sample KML, walking ~1 M Raw nodes during the traversal), this allocation site was contributing ~60 MiB of cumulative tracked allocation. Removing it cuts the consumer's end-to-end memory by about a sixth and lops ~7% off wall-clock time. The same pattern would apply to any package using XML.LazyNode or XML.Raw to walk a large document.

(See also #59 for a complementary additive change adding next! / prev! for in-place traversal — independent and stackable.)

Verification

The full XML.jl test suite passes locally on this branch (Julia 1.12), including all 71 xml:space cases — those exercise the next_xml_space path that's deliberately unchanged here.
No public API change, no behavioral change observable to existing callers.

Diff size

One real line changed:

-    ctx = [false]
+    ctx = o.ctx

Plus a short comment explaining the rationale, so future readers don't re-allocate it back without understanding the constraint.

`next_no_xml_space` is only entered when the source document has no `xml:space` attribute anywhere, in which case the per-node `ctx` (xml:space inheritance stack) is always `Bool[false]` and is never mutated. Allocating a fresh `[false]` on every call therefore costs ~32 B × N nodes for no semantic benefit. Reuse the parent's `ctx` instead. On a 47 MiB representative KML file (5,411 Placemarks, ~1 M XML nodes traversed during a `DataFrame` extraction in a downstream consumer), this drops cumulative tracked allocations by ~60 MiB without any behavior or API change. Wall-clock parsing time is unchanged. next_xml_space is unaffected: that path mutates ctx via push/pop when descending into elements with `xml:space`, so each Raw still needs its own copy.

joshday · 2026-04-30T16:45:01Z

FYI there's a bit of a rewrite in progress:

#54

mathieu17g · 2026-04-30T20:58:50Z

Indeed, I had not seen it, sorry.

I will wait for its completion to see if the issue I was addressing in this PR and PR #59 are still there.

mathieu17g · 2026-05-20T22:25:01Z

After benchmarking against v0.4 (#54): the context-sharing fix I proposed here for next_no_xml_space is naturally subsumed by v0.4's tokenizer refactor — there's no separate xml_space buffer pool left to share once tokenization moves to the new Tokenizer + TokenizerState model.

Confirmed in FastKML.jl 3-way benchmarks: v0.4 eager is faster than v0.3.8 + #58 + #59 eager on all 4 reference files (×2–2.6 speedup; 37–69% memory reduction), so this PR's contribution under v0.3.8 no longer applies on the v0.4 path.

I'll close this PR once #54 merges. Keeping it open until then in case there's a v0.3.x maintenance line where this fix should still be merged independently.

mathieu17g mentioned this pull request Apr 30, 2026

feat: add next! and prev! for in-place LazyNode traversal #59

Open

This was referenced May 20, 2026

A StAX-style streaming primitive for v0.4 — recovering FastKML's lazy walk class without the LazyNode-as-cursor hack #61

Open

WIP XML.jl v0.4: Rewrite of internals, streaming tokenizer, XPath support, and bug fixes #54

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: avoid per-call ctx allocation in next_no_xml_space#58

perf: avoid per-call ctx allocation in next_no_xml_space#58
mathieu17g wants to merge 1 commit into
JuliaComputing:mainfrom
mathieu17g:perf-share-default-ctx

mathieu17g commented Apr 30, 2026 •

edited

Loading

Uh oh!

joshday commented Apr 30, 2026

Uh oh!

mathieu17g commented Apr 30, 2026

Uh oh!

mathieu17g commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mathieu17g commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this is safe

Why it matters

Verification

Diff size

Uh oh!

joshday commented Apr 30, 2026

Uh oh!

mathieu17g commented Apr 30, 2026

Uh oh!

mathieu17g commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mathieu17g commented Apr 30, 2026 •

edited

Loading