Hi @joshday — FastKML.jl is the fork of KML.jl that you suggested in joshday/KML.jl#14. Still early-stage and unpublished, but I've hit a v0.4 design question I'd like your thinking on before settling on a FastKML XML.jl upgrade path.
The question is, in one sentence: should XML.jl v0.4 offer an explicit StAX-style (pull-mode) streaming primitive, separate from LazyNode? The body of this issue lays out why I'm asking. Quick terminology refresher upfront, since the answer depends on it:
SAX (Simple API for XML, 1998) is push-mode: the parser drives, the consumer registers handlers, and the parser calls them.
StAX (Streaming API for XML, 2004) is pull-mode: the consumer drives, holding a cursor and calling next() to advance. Most XML parsers (Rust quick-xml, libxml2's xmlTextReader, lxml's iterparse, Go's encoding/xml, EzXML.jl's StreamReader) seem to have converged on this model. Two granularities show up in practice: iterator-based StAX at the token level (the consumer iterates raw tokens: StartTag, Text, EndTag, ...) and cursor-based StAX at the event level (the consumer advances a cursor over higher-level events). I'll use these two names below.
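To make the push/pull distinction concrete, here is a minimal sketch over a toy, pre-parsed event list. Everything in it (EVENTS, push_parse, PullCursor, next_event!) is a hypothetical name made up for illustration, not an XML.jl or FastKML API.

```julia
# Toy, pre-parsed events (hypothetical; a real parser produces these from bytes).
const EVENTS = [(:open, "name"), (:text, "Paris"), (:close, "name")]

# Push mode (SAX-style): the parser owns the loop and calls the consumer's handler.
function push_parse(handler, events)
    for ev in events
        handler(ev)
    end
end

# Pull mode (StAX-style): the consumer owns the loop and asks for the next event.
mutable struct PullCursor
    events::Vector{Tuple{Symbol,String}}
    pos::Int
end

# Advance the cursor and return the next event, or `nothing` at end of stream.
function next_event!(c::PullCursor)
    c.pos >= length(c.events) && return nothing
    c.pos += 1
    return c.events[c.pos]
end

# Consumer written against the push API: logic lives in a handler.
function texts_via_push(events)
    out = String[]
    push_parse(events) do ev
        ev[1] === :text && push!(out, ev[2])
    end
    return out
end

# Consumer written against the pull API: logic lives in the consumer's own loop.
function texts_via_pull(events)
    out = String[]
    cursor = PullCursor(events, 0)
    while true
        ev = next_event!(cursor)
        ev === nothing && break
        ev[1] === :text && push!(out, ev[2])
    end
    return out
end
```

Both consumers extract the same text; the difference is only who owns the loop, which is what decides whether the consumer can stop early, skip ahead, or keep its own state in local variables.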
The pattern FastKML adopted under v0.3.8 is StAX-style (the consumer body iterates events). The primitive it needs from XML.jl is StAX-style too.
What FastKML's lazy mode actually is
FastKML's DataFrame extraction code path on v0.3.8 (used by DataFrame(file; layer=k) and PlacemarkTable), the dominant consumer of the lazy reader, is, under the hood, a StAX-style streaming parser: a pull-mode cursor walking the token stream forward, with the consumer body deciding what to do at each event. I built it on LazyNode + next(), later swapping next() for the in-place next!() mutation from PR #59, with a macro layer (@for_each_immediate_child) that tracks depth and exposes a single mutating wrapper under the name child in consumer bodies. The depth-tracked cursor sweeps the token stream forward; the consumer extracts current-position state into a flat output (a DataFrame row), never building a node stack.
(Other uses of LazyKMLFile — direct navigation, partial reads, user-defined queries — exploit LazyNode as a normal DOM primitive; the "hack" framing below applies specifically to the cursor pattern used by the tabular-extraction path, not to LazyNode's role in general.)
Honest framing for this extraction code path: it's a bit of a hack. I'm repurposing a DOM-shaped primitive (LazyNode) as a StAX-style cursor — the right model on the wrong primitive (a DOM node). The extraction originally used XML.next(::LazyNode) (allocating); PR #59's next!() mutation made it zero-wrapper-allocation, and with that applied the perf is strong across the board: FastKML beats ArchGDAL in time on 4 of 4 reference KML files and is the most memory-efficient configuration on 3 of 4. But "performant hack" is still a hack — the design is asking for an explicit StAX primitive, not a mutable LazyNode.
What works well in v0.4 for FastKML.jl tabular extraction
The eager path improved substantially across the four reference KML files declared in benchmark/benchmark_kml_parsers.jl (KMZ2 / KML4 / KMZ5 / KMZ6 in the tables below — naming follows file extension; full URLs + content profile in the Reproduction section):
| File (placemarks) | eager v0.3.8 + #58 + #59 | eager v0.4.0 | speedup |
|---|---|---|---|
| KMZ2 enzone (5.4k) | 412 ms / 363 MiB | 192 ms / 229 MiB | ×2.15 |
| KML4 WRS-2 (28.5k) | 1101 ms / 1629 MiB | 534 ms / 597 MiB | ×2.06 |
| KMZ5 qfaults (114k) | 4834 ms / 4814 MiB | 1836 ms / 2239 MiB | ×2.63 |
| KMZ6 nat_frs (163k) | 4259 ms / 6597 MiB | 1924 ms / 2065 MiB | ×2.21 |
×2–2.6 wall-clock and 37–69% memory reduction on the eager path. Real and welcome.
Where the door closed
v0.4 removed the linear-traversal API on LazyNode — both XML.next / XML.prev (which existed since v0.3.x, allocating a new LazyNode per advance, used to walk a document linearly) and XML.next! / XML.prev! (which would be added by PR #59 as in-place-mutation optimizations of the same mechanism). All four are absent in v0.4, replaced with eachchildnode(::LazyNode) and children(::LazyNode) — DOM-style iteration primitives that don't expose linear advance. That's the actual door closure for the StAX-on-LazyNode hack. The broader immutability of v0.4's LazyNode is a separate design choice for the DOM use case; even if mutability were preserved, the linear-traversal API would still need re-introduction.
Same four files, lazy mode:
| File | lazy v0.3.8 + #58 + #59 | lazy v0.4.0 | slowdown |
|---|---|---|---|
| KMZ2 | 204 ms / 188 MiB | 395 ms / 405 MiB | ×1.94 |
| KML4 | 261 ms / 480 MiB | 688 ms / 1707 MiB | ×2.64 |
| KMZ5 | 2357 ms / 1855 MiB | 3188 ms / 6004 MiB | ×1.35 |
| KMZ6 | 1247 ms / 2320 MiB | 2862 ms / 8099 MiB | ×2.30 |
On KML4 and KMZ6 (the deeply-structured files), v0.3.8 + #58 + #59 lazy is also faster than v0.4 eager — so a strict v0.4 migration loses the previously optimal path on those profiles, even after picking the best v0.4 path.
The question, sharpened
Streaming consumers — FastKML, and potentially others (anything that walks an XML document once into a flat tabular output, à la XLSX.jl-style extraction) — lose the v0.3.8+#59 performance class without an equivalent v0.4-native primitive. The question is what to do about it.
Three placements are conceivable: (a) inside XML.jl as a first-class streaming primitive, (b) in a separate package (XMLStreaming.jl?), (c) per-consumer re-implementation. Each package author paying again for the same pattern feels like the right cue that this belongs at the XML.jl layer.
This isn't a request to retrofit mutability onto LazyNode (which would conflict with v0.4's deliberate immutable design). It's about whether the streaming use case warrants its own first-class primitive — most naturally a StAX-style pull cursor, conceptually separate from the DOM API.
Design space — informed by a SOTA survey
With Claude's help, I surveyed nine streaming XML parsers, both SAX-style (push) and StAX-style (pull): Java SAX, Java StAX, Expat, libxml2's xmlTextReader, lxml's iterparse, Rust quick-xml, Rust xmlparser, Go's encoding/xml, and existing Julia options. The survey, and what it implies for a v0.4-idiomatic implementation, is written up in a companion design research file. 8 of 11 surveyed parsers are StAX-style (pull-mode); XML parsers have largely converged on consumer-driven cursors as the default. The synthesis points to a two-layer StAX design:
Foundation layer (iterator-based StAX) — public Tokenizer / TokenizerState / Token (already exist internally in v0.4). Pull-mode at the token level (consumer iterates one token at a time via Base.iterate):
Direct Julia analog of Rust xmlparser's Tokenizer / quick-xml::Tokenizer. Useful as-is for consumers that don't need event reassembly.
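As an illustration of token-level pull iteration through Base.iterate, here is a minimal sketch of a toy tokenizer. ToyTokenizer and ToyToken are hypothetical stand-ins written for this example; they are not XML.jl's internal Tokenizer / Token, and the toy only distinguishes opening tags, closing tags, and text.

```julia
# Hypothetical toy token: a kind plus a view into the source string.
struct ToyToken
    kind::Symbol              # :open, :close, or :text
    raw::SubString{String}
end

# Hypothetical toy tokenizer wrapping the source text.
struct ToyTokenizer
    data::String
end

Base.IteratorSize(::Type{ToyTokenizer}) = Base.SizeUnknown()
Base.eltype(::Type{ToyTokenizer}) = ToyToken

# The iteration state is the index of the next unread character.
function Base.iterate(t::ToyTokenizer, i::Int = 1)
    i > ncodeunits(t.data) && return nothing
    if t.data[i] == '<'
        # Tag token: runs up to and including the closing '>'.
        j = findnext('>', t.data, i)
        j === nothing && error("unterminated tag")
        raw = SubString(t.data, i, j)
        kind = startswith(raw, "</") ? :close : :open
        return ToyToken(kind, raw), j + 1
    else
        # Text token: runs up to (not including) the next '<', or to the end.
        j = findnext('<', t.data, i)
        stop = j === nothing ? lastindex(t.data) : prevind(t.data, j)
        return ToyToken(:text, SubString(t.data, i, stop)), nextind(t.data, stop)
    end
end

# The consumer pulls one token at a time with an ordinary `for` loop.
function token_kinds(xml::String)
    kinds = Symbol[]
    for tok in ToyTokenizer(xml)
        push!(kinds, tok.kind)
    end
    return kinds
end
```

Because the tokenizer is an ordinary Julia iterator, consumers get `for` loops, early `break`, and the standard iterator utilities for free; that is the sense in which this layer matches Julia's iterator idiom.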
Making it allocation-free in the hot path would take three upstream-internal refactors:
Bitstype-ify Token (replace raw::SubString{S} with offset::Int + length::Int). Required for Julia's calling convention to pass Token values in registers across function boundaries.
Split iterate(::Tokenizer, ::TokenizerState) into per-mode @inline helpers. The current single body with a mode-driven if/elseif chain exceeds Julia's inliner budget, so SROA can't scalarize the returned Tuple across the iterate boundary.
Use a TOKEN_EOF sentinel instead of Union{Nothing, …} return. Directly inspired by quick-xml::Event::Eof (a unit variant of the Event enum, not an Option::None) — both close the "non-bitstype Union return forces heap allocation" trap.
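The first and third refactors can be sketched on toy types. The names below (SubstringToken, OffsetToken, KIND_TEXT, KIND_EOF, token_text, next_token) are assumptions made up for illustration; only TOKEN_EOF and the offset / length field names come from the proposal above, and none of this is XML.jl's actual Token layout. The sketch shows only the type-level properties (isbitstype, a single concrete return type), not measured allocation behavior.

```julia
# Token shaped like the current description: the SubString field holds a
# reference to its parent String, so the struct is not a bits type.
struct SubstringToken
    kind::UInt8
    raw::SubString{String}
end

# Hypothetical bits-type alternative: plain integers into the source buffer.
struct OffsetToken
    kind::UInt8
    offset::Int   # number of code units before the token
    length::Int   # token length in code units
end

const KIND_TEXT = 0x01
const KIND_EOF = 0xff

# Sentinel value used instead of returning `nothing` at end of input.
const TOKEN_EOF = OffsetToken(KIND_EOF, 0, 0)

# Recover the token's text on demand from the source string.
function token_text(src::String, t::OffsetToken)
    return SubString(src, t.offset + 1, thisind(src, t.offset + t.length))
end

# Toy tokenizer step: treats the remaining input as one text token.
# It always returns an OffsetToken, never a Union with Nothing.
function next_token(src::String, pos::Int)
    pos >= ncodeunits(src) && return TOKEN_EOF
    return OffsetToken(KIND_TEXT, pos, ncodeunits(src) - pos)
end
```

With this shape, a consumer tests `tok.kind == KIND_EOF` (or `tok === TOKEN_EOF`) to stop, and materializes a SubString via token_text only for the tokens it actually reads.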
Streaming primitive on top (cursor-based StAX) — an explicit CursorNode (mutable by design, named to signal its role as a pull-mode cursor), walked via next!() / prev!() (forward + backward linear advance, matching the v0.3.x next / prev API but on a new type), with depth-tracked traversal macros. Pull-mode at the event level:
Direct Julia analog of quick-xml::Reader<'a> / Java StAX's XMLStreamReader / libxml2's xmlTextReader / EzXML.jl's StreamReader — all StAX-style pull cursors.
The existing @for_each_immediate_child macro on FastKML's wip-xml-next-bang-adoption branch is the model for the macro layer — it already implements StAX-style iteration via depth tracking through next!().
The aliasing contract ("a cursor mutates as you advance it") becomes a natural property of the type rather than a footgun on a value-like LazyNode.
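A minimal sketch of the cursor idea, assuming a pre-tokenized toy stream: ToyCursor, kind, value, and immediate_child_names! are hypothetical names for illustration, and next! here is a local toy function, not the one from PR #59. It shows linear advance by mutation, depth tracking to pick out immediate children, and the aliasing behavior.

```julia
# Hypothetical toy cursor over a pre-tokenized stream; not XML.jl's API.
mutable struct ToyCursor
    kinds::Vector{Symbol}    # :open, :close, or :text per event
    vals::Vector{String}     # tag name or text content per event
    pos::Int                 # index of the current event (0 = before the first)
    depth::Int               # number of open elements after the current event
end
ToyCursor(kinds, vals) = ToyCursor(kinds, vals, 0, 0)

# Advance in place; returns the same cursor object, or `nothing` at the end.
function next!(c::ToyCursor)
    c.pos >= length(c.kinds) && return nothing
    c.pos += 1
    k = c.kinds[c.pos]
    k === :open && (c.depth += 1)
    k === :close && (c.depth -= 1)
    return c
end

kind(c::ToyCursor) = c.kinds[c.pos]
value(c::ToyCursor) = c.vals[c.pos]

# Collect the names of the immediate children of the element the cursor is on.
# The cursor ends up on the parent's closing event.
function immediate_child_names!(c::ToyCursor)
    parent_depth = c.depth
    names = String[]
    while next!(c) !== nothing && c.depth >= parent_depth
        if kind(c) === :open && c.depth == parent_depth + 1
            push!(names, value(c))   # copy data out; do not keep the cursor
        end
    end
    return names
end

# Toy stream for <Placemark><name>Paris</name><Point><coordinates>2,48</coordinates></Point></Placemark>
const DEMO_KINDS = [:open, :open, :text, :close, :open, :open, :text, :close, :close, :close]
const DEMO_VALS = ["Placemark", "name", "Paris", "name", "Point",
                   "coordinates", "2,48", "coordinates", "Point", "Placemark"]
```

Note the aliasing: `alias = c; next!(c)` moves alias too, because both names refer to the same mutable object. On a type that is explicitly a cursor, that is the expected contract, and consumers copy out whatever they need before advancing.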
Cross-package use case: XLSX.jl follows the same pattern. @TimG1964 flagged the removal of next / prev on PR #54 back in March 2026 because XLSX.jl's sheetrow / tablerow iterators rely on these exported functions for forward + backward token traversal. FastKML and XLSX.jl share the same shape of need: a linear-advance API (currently next / prev, ideally next! / prev! for the zero-alloc class once PR #59 is integrated) on a streaming-suitable primitive.
In short, the cursor-based StAX layer re-introduces the v0.3.x next / prev linear-traversal API (with PR #59's !-suffixed mutating variants for the zero-alloc class) on a new mutable type (CursorNode) instead of on LazyNode — preserving v0.4's DOM-side immutability while restoring the streaming-side mutating cursor.
A callback-style walker (walk_children(node) do child …) is the SAX-style (push) alternative — same conceptual layer, parser drives instead of consumer drives. It can be implemented as a thin wrapper on top of CursorNode (for_each(f, c::CursorNode)) for consumers who prefer the do-block syntax, but not as the primary public API — most XML parsers' default is StAX, and it matches Julia's iterator idiom directly.
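To illustrate how thin such a wrapper can be, here is a sketch of a push-style for_each layered on a pull cursor. MiniCursor, current, and upper_all are hypothetical toy names, and next! is again a local toy function; only the for_each(f, cursor) shape comes from the paragraph above.

```julia
# Hypothetical minimal pull cursor over a list of items.
mutable struct MiniCursor
    items::Vector{String}
    pos::Int
end
MiniCursor(items) = MiniCursor(items, 0)

# Pull primitive: advance in place, or return `nothing` at the end.
function next!(c::MiniCursor)
    c.pos >= length(c.items) && return nothing
    c.pos += 1
    return c
end

current(c::MiniCursor) = c.items[c.pos]

# Push-style wrapper: the wrapper drives the cursor and calls `f` at each step.
function for_each(f, c::MiniCursor)
    while next!(c) !== nothing
        f(c)
    end
    return nothing
end

# Consumer using do-block syntax on top of the pull cursor.
function upper_all(items)
    out = String[]
    for_each(MiniCursor(items)) do c
        push!(out, uppercase(current(c)))
    end
    return out
end
```

The wrapper is a few lines on top of the pull primitive, whereas recovering a pull cursor from a callback-only API is much harder, which is one argument for making the cursor the primary layer.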
A poolable LazyChildIterator would address only the wrapper allocation share and not the dominant per-event cost; insufficient on its own.
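Reproduction
All measurements are reproducible from:
Benchmark results: benchmark/results_eager_vs_lazy_3way_2026-05-11.md; reference files declared in benchmark/benchmark_kml_parsers.jl.
Root-cause analysis of the per-iterate Tuple allocation on Julia 1.12.6: the SROA effectiveness diagnosis (inlining boundary, non-isbits operand types) surfaced via @code_typed, @code_llvm, and isbitstype. It documents the technical basis for the three foundation-layer refactors proposed above (bitstype Token, iterate split, TOKEN_EOF sentinel): benchmark/rootcause_iterate_tuple_allocation_2026-05-11.md.
Companion design research: notes/upstream_issues/streaming_parser_research.md.
Versions tested:
v0.3.8: Pkg.add(name="XML", version="0.3.8")
v0.3.8 + PRs #58 + #59: combined branch mathieu17g/XML.jl@dev-combined
v0.4.0: JuliaComputing/XML.jl@main, SHA e7e21a7 (PR #54 head as of 2026-05-11)
Julia 1.12.6, Darwin aarch64. @benchmark budget: 10 s per (file, technique) for real workloads.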
Related
PR #58 (ctx-share in next_no_xml_space) — naturally subsumed by v0.4's refactor; commented to that effect on the PR.
PR #59 (next! / prev! for LazyNode) — historically the mechanism that made the StAX-on-LazyNode hack performant under v0.3.8. The discussion thread there has prior thinking on the aliasing contract any cursor-style API needs to document.
PR #54 — the v0.4 work-in-progress this issue is responding to. A short pointer comment has been added there linking back here.
@TimG1964's comment on PR #54 (2026-03-08) — earlier signal on the same concern, from the XLSX.jl side. Flagged the removal of next / prev as a challenge because XLSX.jl's sheetrow / tablerow iterators rely on those exported functions. This issue follows up with FastKML's data and a design space, on a use case isomorphic to XLSX.jl's.
Thanks for considering — happy to refine the benchmark or prototype a specific direction (CursorNode + macro layer, callback-style walker, or bitstype-ified Token) if one of these reads as worth pursuing.