A StAX-style streaming primitive for v0.4 — recovering FastKML's lazy walk class without the LazyNode-as-cursor hack #61

@mathieu17g

Hi @joshday — FastKML.jl is the fork of KML.jl that you suggested in joshday/KML.jl#14. Still early-stage and unpublished, but I've hit a v0.4 design question I'd like your thinking on before settling on a FastKML XML.jl upgrade path.

The question is, in one sentence: should XML.jl v0.4 offer an explicit StAX-style (pull-mode) streaming primitive, separate from LazyNode? The body of this issue lays out why I'm asking. Quick terminology refresher upfront, since the answer depends on it:

  • SAX (Simple API for XML, 1998) is push-mode: the parser drives, the consumer registers handlers, and the parser calls them.
  • StAX (Streaming API for XML, 2004) is pull-mode: the consumer drives, holding a cursor and calling next() to advance. Most XML parsers (Rust quick-xml, libxml2's xmlTextReader, lxml's iterparse, Go's encoding/xml, EzXML.jl's StreamReader) seem to have converged on this model. Two granularities show up in practice: iterator-based StAX at the token level (the consumer iterates raw tokens: StartTag, Text, EndTag, ...) and cursor-based StAX at the event level (the consumer advances a cursor over higher-level events). I'll use these two names below.
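
To make the push/pull distinction concrete, here is a minimal sketch over a toy event list. The names ToyEvent, parse_push, PullCursor and next_event! are made up for illustration and are not XML.jl API; the only point is who owns the loop.

```julia
# Toy event stream standing in for a parser's output.
# All names here are hypothetical; none are XML.jl API.
const ToyEvent = Tuple{Symbol,String}
const EVENTS = ToyEvent[(:start, "Placemark"), (:text, "A"), (:end, "Placemark")]

# Push mode (SAX-style): the parser owns the loop and calls the handler.
function parse_push(handler, events::Vector{ToyEvent})
    for ev in events
        handler(ev)
    end
    return nothing
end

# Pull mode (StAX-style): the consumer owns the loop and asks for each event.
mutable struct PullCursor
    events::Vector{ToyEvent}
    pos::Int
end

# Advance the cursor; return the next event, or `nothing` at end of stream.
function next_event!(c::PullCursor)
    c.pos >= length(c.events) && return nothing
    c.pos += 1
    return c.events[c.pos]
end

# Push consumer: the handler is called for every event the parser emits.
function collect_pushed(events)
    seen = String[]
    parse_push(ev -> push!(seen, ev[2]), events)
    return seen
end

# Pull consumer: stops at the first text event, because it drives the loop.
function collect_until_text(events)
    seen = String[]
    cur = PullCursor(events, 0)
    ev = next_event!(cur)
    while ev !== nothing && ev[1] !== :text
        push!(seen, ev[2])
        ev = next_event!(cur)
    end
    return seen
end

pushed = collect_pushed(EVENTS)
pulled = collect_until_text(EVENTS)
```

In this toy version the push consumer collects all three payloads, while the pull consumer stops after the first one simply by not asking for more.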

The pattern FastKML adopted under v0.3.8 is StAX-style (the consumer body iterates events). The primitive it needs from XML.jl is StAX-style too.

What FastKML's lazy mode actually is

FastKML's DataFrame extraction code path on v0.3.8 (used by DataFrame(file; layer=k) and PlacemarkTable), the dominant consumer of the lazy reader, is under the hood a StAX-style streaming parser: a pull-mode cursor walking the token stream forward, with the consumer body deciding what to do at each event. I built it on top of LazyNode + next(), later replaced by the in-place next!() mutation from PR #59, with a macro layer (@for_each_immediate_child) that tracks depth and exposes a single mutating wrapper under a child name in consumer bodies. The depth-tracked cursor sweeps the token stream forward; the consumer extracts current-position state into a flat output (a DataFrame row), never building a node stack.
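
As a rough illustration of that pattern, the sketch below sweeps a hand-written token list once, tracks depth with a counter, and emits flat rows. It assumes a hypothetical (kind, payload) token shape and fixed depths (Document at 1, Placemark at 2, fields at 3); FastKML's real path goes through LazyNode and the macro layer instead.

```julia
# Hypothetical flat token stream; FastKML reads these from XML.jl instead.
const Tok = Tuple{Symbol,String}   # (kind, payload)
const TOKENS = Tok[
    (:open, "Document"),
    (:open, "Placemark"),
    (:open, "name"), (:text, "A"), (:close, "name"),
    (:close, "Placemark"),
    (:open, "Placemark"),
    (:open, "name"), (:text, "B"), (:close, "name"),
    (:open, "description"), (:text, "second"), (:close, "description"),
    (:close, "Placemark"),
    (:close, "Document"),
]

# One forward sweep: track depth with a counter, remember only the field
# currently being read, and emit a flat row when a Placemark closes.
# No tree and no node stack are built.
function extract_rows(tokens::Vector{Tok})
    rows = NamedTuple{(:name, :description),Tuple{String,String}}[]
    depth = 0
    field = ""                      # immediate child of Placemark being read
    name = ""
    description = ""
    for (kind, payload) in tokens
        if kind === :open
            depth += 1
            depth == 3 && (field = payload)
        elseif kind === :text
            if depth == 3
                field == "name" && (name = payload)
                field == "description" && (description = payload)
            end
        elseif kind === :close
            if depth == 2 && payload == "Placemark"
                push!(rows, (name = name, description = description))
                name = ""
                description = ""
            end
            depth -= 1
        end
    end
    return rows
end

rows = extract_rows(TOKENS)
```

The state carried between tokens is a depth counter and a few strings, which is what keeps this kind of consumer independent of document size.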

(Other uses of LazyKMLFile — direct navigation, partial reads, user-defined queries — exploit LazyNode as a normal DOM primitive; the "hack" framing below applies specifically to the cursor pattern used by the tabular-extraction path, not to LazyNode's role in general.)

Honest framing for this extraction code path: it's a bit of a hack. I'm repurposing a DOM-shaped primitive (LazyNode) as a StAX-style cursor — the right model on the wrong primitive (a DOM node). The extraction originally used XML.next(::LazyNode) (allocating); PR #59's next!() mutation made it zero-wrapper-allocation, and with that applied the perf is strong across the board: FastKML beats ArchGDAL in time on 4 of 4 reference KML files and is the most memory-efficient configuration on 3 of 4. But "performant hack" is still a hack — the design is asking for an explicit StAX primitive, not a mutable LazyNode.

What works well in v0.4 for FastKML.jl tabular extraction

The eager path improved substantially across the four reference KML files declared in benchmark/benchmark_kml_parsers.jl (KMZ2 / KML4 / KMZ5 / KMZ6 in the tables below — naming follows file extension; full URLs + content profile in the Reproduction section):

| File (placemarks) | eager v0.3.8 + #58 + #59 | eager v0.4.0 | speedup (time) |
|---|---|---|---|
| KMZ2 enzone (5.4k) | 412 ms / 363 MiB | 192 ms / 229 MiB | ×2.15 |
| KML4 WRS-2 (28.5k) | 1101 ms / 1629 MiB | 534 ms / 597 MiB | ×2.06 |
| KMZ5 qfaults (114k) | 4834 ms / 4814 MiB | 1836 ms / 2239 MiB | ×2.63 |
| KMZ6 nat_frs (163k) | 4259 ms / 6597 MiB | 1924 ms / 2065 MiB | ×2.21 |

×2–2.6 wall-clock and 37–69% memory reduction on the eager path. Real and welcome.

Where the door closed

v0.4 removed the linear-traversal API on LazyNode — both XML.next / XML.prev (which existed since v0.3.x, allocating a new LazyNode per advance, used to walk a document linearly) and XML.next! / XML.prev! (which would be added by PR #59 as in-place-mutation optimizations of the same mechanism). All four are absent in v0.4, replaced with eachchildnode(::LazyNode) and children(::LazyNode) — DOM-style iteration primitives that don't expose linear advance. That's the actual door closure for the StAX-on-LazyNode hack. The broader immutability of v0.4's LazyNode is a separate design choice for the DOM use case; even if mutability were preserved, the linear-traversal API would still need re-introduction.

Same four files, lazy mode:

| File | lazy v0.3.8 + #58 + #59 | lazy v0.4.0 | slowdown (time) |
|---|---|---|---|
| KMZ2 | 204 ms / 188 MiB | 395 ms / 405 MiB | ×1.94 |
| KML4 | 261 ms / 480 MiB | 688 ms / 1707 MiB | ×2.64 |
| KMZ5 | 2357 ms / 1855 MiB | 3188 ms / 6004 MiB | ×1.35 |
| KMZ6 | 1247 ms / 2320 MiB | 2862 ms / 8099 MiB | ×2.30 |

On KML4 and KMZ6 (the deeply-structured files), v0.3.8 + #58 + #59 lazy is also faster than v0.4 eager — so a strict v0.4 migration loses the previously optimal path on those profiles, even after picking the best v0.4 path.
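
The comparison can be checked with a few lines of arithmetic on the wall-clock numbers quoted in the two tables. This is only a worked calculation on those figures; the field names are made up for the sketch.

```julia
# Wall-clock times in ms, copied from the eager and lazy tables in this issue.
times = (
    KMZ2 = (lazy_v038 = 204,  eager_v04 = 192,  lazy_v04 = 395),
    KML4 = (lazy_v038 = 261,  eager_v04 = 534,  lazy_v04 = 688),
    KMZ5 = (lazy_v038 = 2357, eager_v04 = 1836, lazy_v04 = 3188),
    KMZ6 = (lazy_v038 = 1247, eager_v04 = 1924, lazy_v04 = 2862),
)

# Ratio > 1 means even the best v0.4 path is slower than v0.3.8 lazy.
best_v04_ratio(t) = round(min(t.eager_v04, t.lazy_v04) / t.lazy_v038; digits = 2)

# `map` over a NamedTuple keeps the file names as keys.
ratios = map(best_v04_ratio, times)
```

KML4 and KMZ6 come out above 1 (roughly 2.05 and 1.54), while KMZ2 and KMZ5 come out below 1, which is the split described above.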

The question, sharpened

Streaming consumers — FastKML, and potentially others (anything that walks an XML document once into a flat tabular output, à la XLSX.jl-style extraction) — lose the v0.3.8+#59 performance class without an equivalent v0.4-native primitive. The question is what to do about it.

Three placements are conceivable: (a) inside XML.jl as a first-class streaming primitive, (b) in a separate package (XMLStreaming.jl?), (c) per-consumer re-implementation. Each package author paying again for the same pattern feels like the right cue that this belongs at the XML.jl layer.

This isn't a request to retrofit mutability onto LazyNode (which would conflict with v0.4's deliberate immutable design). It's about whether the streaming use case warrants its own first-class primitive — most naturally a StAX-style pull cursor, conceptually separate from the DOM API.

Design space — informed by a SOTA survey

With Claude's help, I surveyed eleven streaming XML parsers, both SAX-style (push) and StAX-style (pull): Java SAX, Java StAX, Expat, libxml2's xmlTextReader, lxml's iterparse, Rust quick-xml, Rust xmlparser, Go's encoding/xml, and existing Julia options. A companion design research file covers the survey and what it implies for a v0.4-idiomatic implementation. 8 of 11 surveyed parsers are StAX-style (pull-mode); XML parsers have largely converged on consumer-driven cursors as the default. The synthesis points to a two-layer StAX design:

Foundation layer (iterator-based StAX) — public Tokenizer / TokenizerState / Token (already exist internally in v0.4). Pull-mode at the token level (consumer iterates one token at a time via Base.iterate):

  • Direct Julia analog of Rust xmlparser's Tokenizer / quick-xml::Tokenizer. Useful as-is for consumers that don't need event reassembly.
  • To make it allocation-free in the hot path, three upstream-internal refactors:
    • Bitstype-ify Token (replace raw::SubString{S} with offset::Int + length::Int). Required for Julia's calling convention to pass Token values in registers across function boundaries.
    • Split iterate(::Tokenizer, ::TokenizerState) into per-mode @inline helpers. The current single body with a mode-driven if/elseif chain exceeds Julia's inliner budget, so SROA can't scalarize the returned Tuple across the iterate boundary.
    • Use a TOKEN_EOF sentinel instead of Union{Nothing, …} return. Directly inspired by quick-xml::Event::Eof (a unit variant of the Event enum, not an Option::None) — both close the "non-bitstype Union return forces heap allocation" trap.
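
A minimal sketch of the first and third refactors, assuming a simplified token with a kind, an offset, and a length. TokenKind, SubStringToken, OffsetToken, token_text and next_token are illustrative names, not XML.jl's internals, and the toy scanner splits on spaces rather than tokenizing XML.

```julia
# Hypothetical token kinds; XML.jl's internal Token may differ.
@enum TokenKind::UInt8 TK_EOF TK_OPEN TK_TEXT TK_CLOSE

# Reference-carrying layout: SubString{String} holds a reference to its parent.
struct SubStringToken
    kind::TokenKind
    raw::SubString{String}
end

# Bitstype layout: the token stores where the text is, not the text itself.
struct OffsetToken
    kind::TokenKind
    offset::Int   # 1-based byte index into the source
    length::Int   # byte length
end

# Sentinel returned at end of input instead of `nothing`.
const TOKEN_EOF = OffsetToken(TK_EOF, 0, 0)
is_eof(t::OffsetToken) = t.kind === TK_EOF

# Materialize the text only when the consumer asks for it.
token_text(src::String, t::OffsetToken) =
    SubString(src, t.offset, t.offset + t.length - 1)

# Toy scanner over space-separated ASCII words, just to exercise the
# return type: always a Tuple{OffsetToken,Int}, never a Union with Nothing.
function next_token(src::String, pos::Int)
    n = ncodeunits(src)
    while pos <= n && codeunit(src, pos) == UInt8(' ')
        pos += 1
    end
    pos > n && return (TOKEN_EOF, pos)
    stop = pos
    while stop < n && codeunit(src, stop + 1) != UInt8(' ')
        stop += 1
    end
    return (OffsetToken(TK_TEXT, pos, stop - pos + 1), stop + 1)
end

# Consumer loop: compares against the sentinel instead of checking `nothing`.
function words(src::String)
    out = String[]
    tok, pos = next_token(src, 1)
    while !is_eof(tok)
        push!(out, String(token_text(src, tok)))
        tok, pos = next_token(src, pos)
    end
    return out
end

scanned = words("alpha  beta gamma")
```

Here isbitstype(OffsetToken) is true while isbitstype(SubStringToken) is false. Whether that alone removes the allocation in XML.jl's actual iterate would still need measuring.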

Streaming primitive on top (cursor-based StAX) — an explicit CursorNode (mutable by design, named to signal its role as a pull-mode cursor), walked via next!() / prev!() (forward + backward linear advance, matching the v0.3.x next / prev API but on a new type), with depth-tracked traversal macros. Pull-mode at the event level:

  • Direct Julia analog of quick-xml::Reader<'a> / Java StAX's XMLStreamReader / libxml2's xmlTextReader / EzXML.jl's StreamReader — all StAX-style pull cursors.
  • The existing @for_each_immediate_child macro on FastKML's wip-xml-next-bang-adoption branch is the model for the macro layer — it already implements StAX-style iteration via depth tracking through next!().
  • The aliasing contract ("a cursor mutates as you advance it") becomes a natural property of the type rather than a footgun on a value-like LazyNode.
  • Cross-package use case: XLSX.jl follows the same pattern. @TimG1964 flagged the removal of next / prev on PR #54 back in March 2026 (comment) because XLSX.jl's sheetrow / tablerow iterators rely on these exported functions for forward + backward token traversal. FastKML and XLSX.jl share the same shape of need: a linear-advance API (currently next / prev, ideally next! / prev! for the zero-alloc class once PR #59, "feat: add next! and prev! for in-place LazyNode traversal", is integrated) on a streaming-suitable primitive.

In short, the cursor-based StAX layer re-introduces the v0.3.x next / prev linear-traversal API (with PR #59's !-suffixed mutating variants for the zero-alloc class) on a new mutable type (CursorNode) instead of on LazyNode — preserving v0.4's DOM-side immutability while restoring the streaming-side mutating cursor.
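
To illustrate the shape (not a proposed implementation), here is a minimal mutable cursor over a pre-built event vector. CursorEvent, current, and the bodies of next! / prev! are assumptions for this sketch; a real CursorNode would sit on the tokenizer rather than on a Vector.

```julia
# Hypothetical sketch; `CursorNode`, `next!`, and `prev!` here are local
# definitions, not XML.jl exports.
struct CursorEvent
    kind::Symbol      # :open, :text, or :close
    payload::String
end

mutable struct CursorNode
    events::Vector{CursorEvent}
    pos::Int          # 0 means "before the first event"
end
CursorNode(events::Vector{CursorEvent}) = CursorNode(events, 0)

# Immutable value at the current position; safe to keep after advancing.
current(c::CursorNode) = c.events[c.pos]

# Move forward in place; return the same cursor, or `nothing` at the end.
function next!(c::CursorNode)
    c.pos >= length(c.events) && return nothing
    c.pos += 1
    return c
end

# Move backward in place; return the same cursor, or `nothing` at the start.
function prev!(c::CursorNode)
    c.pos <= 1 && return nothing
    c.pos -= 1
    return c
end

# Aliasing contract: what `next!` returns is the cursor itself, so it
# changes when the cursor advances. Only `current(c)` is a stable snapshot.
function aliasing_demo()
    evs = [
        CursorEvent(:open, "Placemark"),
        CursorEvent(:text, "A"),
        CursorEvent(:close, "Placemark"),
    ]
    c = CursorNode(evs)
    first_seen = next!(c)      # same object as `c`, not a copy
    snapshot = current(c)      # event at position 1
    next!(c)                   # cursor now at position 2
    return (
        aliased = first_seen === c,
        cursor_payload = current(first_seen).payload,
        snapshot_payload = snapshot.payload,
        back_payload = current(prev!(c)).payload,
    )
end

demo = aliasing_demo()
```

After the second next!, first_seen reports the payload at position 2 because it is the cursor, while snapshot still holds the event from position 1. On a type named as a cursor, that behaviour is the expected one.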

A callback-style walker (walk_children(node) do child …) is the SAX-style (push) alternative: same conceptual layer, but the parser drives instead of the consumer. It can be implemented as a thin wrapper on top of CursorNode (for_each(f, c::CursorNode)) for consumers who prefer the do-block syntax, but not as the primary public API: StAX is the default in most XML parsers, and a pull cursor matches Julia's iterator idiom directly.
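
A minimal sketch of that layering, assuming a toy pull cursor: the do-block walker is a short loop over the pull primitive with a depth counter. WalkEvent, MiniCursor, advance! and for_each_child are hypothetical names, not XML.jl or FastKML API.

```julia
# Hypothetical names throughout; this is not XML.jl or FastKML API.
struct WalkEvent
    kind::Symbol      # :open, :text, or :close
    payload::String
end

mutable struct MiniCursor
    events::Vector{WalkEvent}
    pos::Int
end

# Pull primitive: advance in place, `false` at end of stream.
function advance!(c::MiniCursor)
    c.pos >= length(c.events) && return false
    c.pos += 1
    return true
end

# Push-style wrapper: assumes the cursor sits on an :open event and calls `f`
# once per immediate child element, leaving the cursor on the matching :close.
function for_each_child(f, c::MiniCursor)
    depth = 0
    while advance!(c)
        ev = c.events[c.pos]
        if ev.kind === :open
            depth += 1
            depth == 1 && f(ev.payload)
        elseif ev.kind === :close
            depth == 0 && return nothing   # this :close ends the parent
            depth -= 1
        end
    end
    return nothing
end

# Consumer using do-block syntax on top of the pull cursor.
function child_names(events)
    c = MiniCursor(events, 1)              # positioned on the root :open
    out = String[]
    for_each_child(c) do name
        push!(out, name)
    end
    return out
end

walk_events = [
    WalkEvent(:open, "Folder"),
    WalkEvent(:open, "Placemark"), WalkEvent(:open, "name"),
    WalkEvent(:text, "A"), WalkEvent(:close, "name"),
    WalkEvent(:close, "Placemark"),
    WalkEvent(:open, "Placemark"), WalkEvent(:close, "Placemark"),
    WalkEvent(:close, "Folder"),
]
found = child_names(walk_events)
```

The nested name element is skipped because it opens at depth 2, so only the two Placemark children reach the callback. Going the other way, building a pull cursor on top of a push API, generally needs buffering or a task, which is one reason to treat the cursor as the base layer.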

A poolable LazyChildIterator would address only the wrapper allocation share and not the dominant per-event cost; insufficient on its own.

Reproduction

All measurements are reproducible from:

  • FastKML real workloads (per-file profile, ArchGDAL content equivalence, per-file tokenization decomposition): benchmark/results_eager_vs_lazy_3way_2026-05-11.md; reference files declared in benchmark/benchmark_kml_parsers.jl. Test files:
    • KMZ2 — enzone2022.kmz (NY DEC environmental zones, 5 411 Polygon Placemarks wrapped in <MultiGeometry>, ~46 MiB)
    • KML4 — WRS-2_bound_world_0.kml (USGS WRS-2 Landsat tile boundaries, 28 557 Polygon Placemarks, single flat top-level layer)
    • KMZ5 — qfaults.kmz (USGS Quaternary Faults, 114 037 LineString Placemarks across 8 thematic top-level Folders, ~420 MiB decompressed)
    • KMZ6 — national_frs.kmz (EPA Facility Registry, 163 426 Point Placemarks, 3 top-level layers / 19 182 nested Folders)
  • Root-cause analysis of the per-iterate Tuple allocation on Julia 1.12.6 — the SROA effectiveness diagnosis (inlining boundary
  • SOTA streaming parser survey (SAX + StAX models) + Julia transposition + implementation sketch: notes/upstream_issues/streaming_parser_research.md.

Versions tested:

Julia 1.12.6, Darwin aarch64. @benchmark budget: 10 s per (file, technique) for real workloads.

Related

  • PR #58 (ctx-share in next_no_xml_space) — naturally subsumed by v0.4's refactor; commented to that effect on the PR.
  • PR #59 (next! / prev! for LazyNode) — historically the mechanism that made the StAX-on-LazyNode hack performant under v0.3.8. The discussion thread there has prior thinking on the aliasing contract any cursor-style API needs to document.
  • PR #54 — the v0.4 work-in-progress this issue is responding to. A short pointer comment has been added there linking back here.
  • @TimG1964's comment on PR #54 (2026-03-08) — earlier signal on the same concern, from the XLSX.jl side. Flagged the removal of next / prev as a challenge because XLSX.jl's sheetrow / tablerow iterators rely on those exported functions. This issue follows up with FastKML's data and a design space, on a use case isomorphic to XLSX.jl's.

Thanks for considering — happy to refine the benchmark or prototype a specific direction (CursorNode + macro layer, callback-style walker, or bitstype-ified Token) if one of these reads as worth pursuing.
