feat: add `next!` and `prev!` for in-place LazyNode traversal by mathieu17g · Pull Request #59 · JuliaComputing/XML.jl

mathieu17g · 2026-04-30T11:52:23Z

Summary

`next(::LazyNode)` allocates a fresh `LazyNode` wrapper on every call. Consumers walking large documents (e.g. extracting every `Placemark` from a 50 MiB KML) can churn through ~1 M wrappers per traversal (~38 MiB cumulative). This PR adds `next!(o)` / `prev!(o)`, which mutate `o` in place and return it, or return `nothing` at the document boundary. They are functionally equivalent to `o = next(o)` but allocate nothing per step.
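To make the difference concrete without depending on XML.jl internals, here is a minimal sketch using a toy cursor type. `ToyNode`, `toy_next`, `toy_next!`, and the two `count_*` helpers are hypothetical names invented for illustration; they are not the PR's implementation and only mirror the contract described above (return the same object, or `nothing` at the boundary).

```julia
# Hypothetical stand-in for a lazy node: a cursor into a flat token vector.
# This is NOT XML.jl's LazyNode; every name here is illustrative only.
mutable struct ToyNode
    tokens::Vector{String}
    pos::Int
end

# Allocating step, in the spirit of `next`: builds a fresh wrapper on each call.
toy_next(o::ToyNode) = o.pos < length(o.tokens) ? ToyNode(o.tokens, o.pos + 1) : nothing

# In-place step, in the spirit of `next!`: mutates `o` and returns it,
# or returns `nothing` (leaving `o` untouched) at the boundary.
function toy_next!(o::ToyNode)
    o.pos < length(o.tokens) || return nothing
    o.pos += 1
    return o
end

# Walk with the allocating step: `o` is rebound to a new wrapper every iteration.
function count_allocating(tokens)
    o = ToyNode(tokens, 1)
    n = 1
    while (o = toy_next(o)) !== nothing
        n += 1
    end
    return n
end

# Walk with the in-place step: a single wrapper is reused for the whole walk.
function count_inplace(tokens)
    o = ToyNode(tokens, 1)
    n = 1
    while toy_next!(o) !== nothing
        n += 1
    end
    return n
end
```

Both walks visit the same positions; only the number of wrapper objects created differs. Wrapping each call in `@allocated` is one way to compare them on a given Julia version.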

Why this is safe

Strictly additive: `next` / `prev` are unchanged and callers opt in. The aliasing trade-off is documented inline: `o` is the same object across calls, so a retained reference silently tracks the new position. The docstring points readers who need a snapshot to `LazyNode(o.raw)`.
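The aliasing hazard can be illustrated with a toy cursor. This is a sketch under stated assumptions: `ToyNode`, `toy_next!`, and `current` are hypothetical names, and copying the toy's fields stands in for the `LazyNode(o.raw)` snapshot mentioned in the docstring.

```julia
# Hypothetical toy cursor, not XML.jl's LazyNode.
mutable struct ToyNode
    tokens::Vector{String}
    pos::Int
end

# In-place advance: mutate and return `o`, or `nothing` at the boundary.
function toy_next!(o::ToyNode)
    o.pos < length(o.tokens) || return nothing
    o.pos += 1
    return o
end

# Token the cursor currently points at.
current(o::ToyNode) = o.tokens[o.pos]

o = ToyNode(["kml", "Document", "Placemark"], 1)

kept = o                          # alias: the very same object as `o`
snap = ToyNode(o.tokens, o.pos)   # snapshot: an independent wrapper at the same position

toy_next!(o)

println(current(kept))  # "Document": the alias moved along with `o`
println(current(snap))  # "kml": the snapshot still sees the old position
```

The snapshot costs one allocation, so it is only worth taking for the nodes a caller actually needs to keep.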

Why it matters

Measured on FastKML.jl extracting a DataFrame from a 47 MiB sample KML (~1 M Raw nodes traversed): the per-step allocation site at next(::LazyNode) was contributing ~38 MiB; switching the consumer's traversal loop to next! drops that to zero with no functional change. Independent of (and stackable with) the next_no_xml_space ctx fix in #58.

Verification

Full test suite passes (Julia 1.12), including a new LazyNode next! / prev! testset covering: functional equivalence with next, identity (next!(o) === o), memoization-field reset on advance, nothing at the document boundary, and prev! symmetry.
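For readers who want a feel for what those checks look like, here is a sketch of an analogous testset against a toy cursor. It is not the PR's actual testset: `ToyNode`, its `memo` field, `label`, `toy_next`, `toy_next!`, and `toy_prev!` are all hypothetical stand-ins.

```julia
using Test

# Hypothetical toy cursor with one memoized field; not XML.jl's LazyNode.
mutable struct ToyNode
    tokens::Vector{String}
    pos::Int
    memo::Union{Nothing,String}   # lazily computed, must be reset on every move
end
ToyNode(tokens, pos) = ToyNode(tokens, pos, nothing)

# Memoized accessor: computes once per position, then reuses the cached value.
function label(o::ToyNode)
    o.memo === nothing && (o.memo = uppercase(o.tokens[o.pos]))
    return o.memo
end

# Allocating step used as the reference behaviour.
toy_next(o::ToyNode) = o.pos < length(o.tokens) ? ToyNode(o.tokens, o.pos + 1) : nothing

# Shared in-place move: returns `nothing` at either boundary, else mutates `o`.
function toy_move!(o::ToyNode, step::Int)
    p = o.pos + step
    1 <= p <= length(o.tokens) || return nothing
    o.pos = p
    o.memo = nothing              # drop the value cached for the old position
    return o
end
toy_next!(o::ToyNode) = toy_move!(o, 1)
toy_prev!(o::ToyNode) = toy_move!(o, -1)

@testset "toy next! / prev!" begin
    o = ToyNode(["a", "b", "c"], 1)
    @test label(o) == "A"               # populate the memo before moving
    fresh = toy_next(o)                 # reference result from the allocating step
    @test toy_next!(o) === o            # identity
    @test o.memo === nothing            # memoization field reset on advance
    @test label(o) == label(fresh)      # functional equivalence with toy_next
    @test toy_next!(o) === o            # now at the last token
    @test toy_next!(o) === nothing      # nothing at the document boundary
    @test toy_prev!(o) === o            # prev! symmetry
    @test o.pos == 2
end
```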

`next(o::LazyNode)` allocates a fresh `LazyNode` on every call, which is fine for occasional use but adds up sharply when a downstream package walks a large document — e.g. extracting all `Placemark` elements from a 50 MiB KML can allocate ~1 M `LazyNode` wrappers in the iterator alone (~38 MiB cumulative on a single benchmark run). Add a strictly-additive in-place pair, `next!(o)` / `prev!(o)`, that mutates `o` to point at the next/previous node and returns `o` (or `nothing` at the document boundary). Exported alongside `next` / `prev`. The aliasing trade-off is documented in the docstring: callers must not retain references to a previous position unless they explicitly snapshot with `LazyNode(o.raw)`. The existing `next` / `prev` methods are unchanged; this is purely opt-in API surface for hot paths.

mathieu17g · 2026-05-20T22:25:55Z

Following up on this PR after benchmarking against v0.4 (#54): the next!/prev! mutation pattern I introduced here remains the only single-wrapper lazy walk primitive observed in any tested XML.jl configuration (one LazyNode allocated upfront and mutated across the walk, rather than one per child).

On a 100k-Placemark synthetic walk (all rows measured today on the same host, Julia 1.12.6, Darwin aarch64):

| Configuration | Total allocs | Time |
| --- | --- | --- |
| `v0.3.8 + #58 + #59` `next!()` DFS (this PR) | 1.9M | 57 ms |
| `v0.4` `eachchildnode` | 15M | 351 ms |
| `v0.4` raw `Tokenizer` + recursive `LazyNode` | 12M | 293 ms |

For a typed-DOM consumer (one LazyNode view per yielded child, the default FastKML lazy pattern), the per-child wrapper allocation is the dominant cost of the v0.4 lazy walk (~64% of the total in the synthetic benchmark, and visible in real-workload profiles on the 4 reference files). Exposing the raw Tokenizer publicly recovers ~17% of the v0.4 cost (the iterator wrappers) but does not address that dominant 64% share.
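As a quick sanity check, the ~17% figure appears consistent with the time column of the table above. The arithmetic below uses only the table's numbers; the variable names are mine. The 64% share comes from profiling and cannot be derived from the table.

```julia
# Times in milliseconds, copied from the benchmark table.
t_next_bang = 57    # v0.3.8 + #58 + #59, next!() DFS
t_eachchild = 351   # v0.4 eachchildnode
t_tokenizer = 293   # v0.4 raw Tokenizer + recursive LazyNode

# Share of the v0.4 eachchildnode time saved by switching to the raw Tokenizer.
recovered = (t_eachchild - t_tokenizer) / t_eachchild

# How many times faster the next!() walk is than v0.4 eachchildnode.
speedup = t_eachchild / t_next_bang

println(round(100 * recovered; digits = 1), " %")  # 16.5 %
println(round(speedup; digits = 1), "x")           # 6.2x
```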

I've opened design issue #61 laying out a SOTA-informed two-layer StAX design to recover this performance class under v0.4's immutable design. The proposed cursor-based StAX layer (with a new CursorNode type) is essentially this PR's mutation pattern ported to a dedicated streaming primitive alongside v0.4's immutable LazyNode — I'm keeping this PR open as a reference for that discussion, including the prior thinking in this thread on the aliasing contract.

mathieu17g mentioned this pull request Apr 30, 2026

perf: avoid per-call ctx allocation in next_no_xml_space #58

Open

This was referenced May 20, 2026

A StAX-style streaming primitive for v0.4 — recovering FastKML's lazy walk class without the LazyNode-as-cursor hack #61

Open

WIP XML.jl v0.4: Rewrite of internals, streaming tokenizer, XPath support, and bug fixes #54

Open

mathieu17g wants to merge 1 commit into JuliaComputing:main from mathieu17g:feature-next-bang
