A pure Swift XML parsing and encoding library that prioritizes correctness, explicit semantics, and Codable integration. The project is still evolving toward broader XML 1.0 coverage by layering strict structural checks on top of a transparent pipeline.
This library is intended for developers who need explicit, inspectable XML parsing and type-safe decoding with predictable semantics. It is well suited for infrastructure code, data transformation pipelines, and applications that value correctness and transparency over convenience magic.
It is not intended for workloads that require full DTD validation, XML Schema support, comprehensive XPath/XQuery engines, or legacy XML feature completeness.
- Favor explicit behavior over implicit magic; each layer exposes the data it actually observes.
- Treat type-safe decoding as the primary API, with DOM construction and serialization available for other workflows.
- Keep the tokenizer, pull parser, internal concrete intermediate representation builder, DOM, and coding layers clearly separated so that responsibilities remain auditable.
- Implement features incrementally while being guided by the XML 1.0 (Fifth Edition) specification.
- Diagnostics are structured and grammar-aware, designed to be human-readable without leaking implementation details.
- Tokenizer – `XMLTokenizer` performs best-effort lexical scanning of raw input into tokens (start/end tags, attributes, text, comments, CDATA, processing instructions). It recognizes but does not surface DOCTYPE and XML declarations as public tokens.
- Pull Parser – `XMLPullParser` wraps the tokenizer with `.startDocument`/`.endDocument` framing, expands empty-element tags, and streams `XMLEvent` values without performing structural validation.
- Concrete Intermediate Representation (Internal) – An internal construction phase folds the event stream into a lossless, immutable concrete intermediate representation. This phase enforces XML well-formedness rules (single root element, balanced start/end tags, and classification of whitespace-only prolog/epilog as non-semantic trivia) or fragment rules depending on mode. This representation is an implementation detail and is not exposed as public API.
- DOM Layer – `DOMBuilder` converts the immutable tree into a mutable DOM with parent/owner invariants, preserving comments, CDATA, processing instructions, attribute ordering, and source-formatting trivia as metadata.
- Codable Decoder / Encoder – `XMLDecoder` walks DOM nodes to expose keyed, unkeyed, and single-value containers that reflect the DOM hierarchy exactly. It only decodes scalar values from textual content; interpreting element names or schema constructs is deliberately left to user code. `XMLEncoder` builds DOM trees from `Encodable` values, and `XMLWriter` serializes DOM documents or fragments back to XML text.
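To make the structural-validation phase concrete, here is a minimal, self-contained sketch of folding a flat event stream into an immutable tree while checking balanced tags and, in document mode, a single root. Every name in it (`SketchEvent`, `SketchNode`, `SketchError`, `SketchMode`, `buildTree`) is hypothetical and invented for illustration; the library's internal representation is not public and may work quite differently. For brevity the sketch drops whitespace-only text outside the root, whereas the real phase is described as keeping it as trivia.

```swift
// Hypothetical event type; the library's XMLEvent may have a different shape.
enum SketchEvent {
    case startElement(String)
    case endElement(String)
    case text(String)
}

// Immutable tree produced by the fold.
enum SketchNode: Equatable {
    case element(name: String, children: [SketchNode])
    case text(String)
}

enum SketchError: Error, Equatable {
    case mismatchedEndTag(expected: String, found: String)
    case strayEndTag(String)
    case unclosedElement(String)
    case textOutsideRoot
    case rootCount(Int)
}

enum SketchMode { case document, fragment }

func buildTree(from events: [SketchEvent], mode: SketchMode) throws -> [SketchNode] {
    // Each stack frame is an open element plus the children collected so far.
    var stack: [(name: String, children: [SketchNode])] = []
    var topLevel: [SketchNode] = []

    // Adds a finished node to the innermost open element, or to the top level.
    func attach(_ node: SketchNode) {
        if stack.isEmpty {
            topLevel.append(node)
        } else {
            stack[stack.count - 1].children.append(node)
        }
    }

    for event in events {
        switch event {
        case .startElement(let name):
            stack.append((name: name, children: []))
        case .endElement(let name):
            guard let open = stack.popLast() else {
                throw SketchError.strayEndTag(name)
            }
            guard open.name == name else {
                throw SketchError.mismatchedEndTag(expected: open.name, found: name)
            }
            attach(.element(name: open.name, children: open.children))
        case .text(let value):
            if stack.isEmpty && mode == .document {
                // Outside the root, only whitespace is tolerated (dropped here).
                guard value.allSatisfy({ $0.isWhitespace }) else {
                    throw SketchError.textOutsideRoot
                }
            } else {
                attach(.text(value))
            }
        }
    }

    if let open = stack.last {
        throw SketchError.unclosedElement(open.name)
    }
    if mode == .document {
        // Document mode requires exactly one root element.
        guard topLevel.count == 1 else {
            throw SketchError.rootCount(topLevel.count)
        }
    }
    return topLevel
}
```

In this sketch, fragment mode differs only in skipping the single-root and top-level-text checks; nesting must still balance in both modes.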
This package supports Swift Package Manager. The current release version is 0.1.0.
If you manage dependencies in Package.swift, add swift-xml to your package dependencies:
```swift
dependencies: [
    .package(url: "https://github.com/zijievv/swift-xml.git", from: "0.1.0")
]
```

Then add the dependency to the appropriate target:

```swift
.target(
    name: "YourTarget",
    dependencies: [
        .product(name: "XML", package: "swift-xml")
    ]
)
```

If your project is not managed by Swift Package Manager, you can still integrate swift-xml using Xcode:
- Open your project in Xcode.
- Choose File > Add Package Dependencies…
- Enter the repository URL: https://github.com/zijievv/swift-xml
- When prompted, select version 0.1.0 (or a compatible range).
- Add the package to your project.
- In your app target’s General > Frameworks, Libraries, and Embedded Content, ensure the `XML` product is linked.
Xcode will manage fetching, building, and linking the package automatically, even if your project itself is not SPM-based.
- Parsing of well-formed XML documents and fragments, including elements, attributes, text, comments, CDATA sections, and processing instructions.
- Structural validation in document mode: exactly one root element, matching start/end tags, only whitespace text permitted outside the root, and rejection of unclosed or stray tags.
- Fragment mode parsing that permits multiple top-level nodes while still enforcing balanced nesting.
- DOM construction with stable node identity, parent/owner propagation, and `textContent` aggregation.
- Codable decoding of element/attribute data via keyed, unkeyed, and single-value containers that directly mirror DOM structure, plus encoding of tree-like structures and attributes (via `XMLCodingKey.isAttribute`).
- Deterministic, unformatted serialization of DOM documents and fragments with basic escaping.
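As an illustration of what "basic escaping" during serialization usually involves, the sketch below escapes the markup-significant characters in text and, optionally, the double quote for attribute values. The function name `escapeXML` and the exact set of characters are assumptions made for this example; the set `XMLWriter` actually escapes is not specified here.

```swift
// Hypothetical helper, not part of the library's API.
// Escapes characters that would otherwise be read as markup.
func escapeXML(_ input: String, inAttribute: Bool = false) -> String {
    var output = ""
    output.reserveCapacity(input.count)
    for character in input {
        switch character {
        case "&": output += "&amp;"
        case "<": output += "&lt;"
        case ">": output += "&gt;"
        // Quotes only need escaping inside double-quoted attribute values.
        case "\"" where inAttribute: output += "&quot;"
        default: output.append(character)
        }
    }
    return output
}
```

Escaping `&` in the same single pass as the other characters avoids double-escaping the entities the function itself produces.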
- Schema agnostic by design: `XMLDecoder` never interprets element names such as `<dict>`, `<key>`, `<array>`, `<true>`, or domain-specific tags.
- Provides only structural traversal: keyed, unkeyed, and single-value containers are derived solely from the DOM tree.
- Scalar decoding reads text or CDATA content (including concatenated descendant text for elements) into `String`, `Bool`, and numeric primitives. Booleans are parsed from textual tokens (for example `true`/`false`, and other textual representations currently supported by the decoder) rather than from element or attribute names.
- Any higher-level schema semantics (plist pairing, custom collection shapes, sentinel elements, etc.) must be implemented manually in `Decodable` types by walking the containers produced by the decoder.
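The scalar rule above can be pictured with a small standalone model: text is gathered from a node and its descendants, and only that text is parsed. The types and functions below (`SketchContent`, `textContent(of:)`, `decodeBool`, `decodeInt`) are hypothetical and do not use the library's DOM. The sketch assumes comments contribute no text, performs no whitespace trimming, and accepts only `true` and `false`, since the other boolean tokens the decoder supports are not listed here.

```swift
// Hypothetical node model for illustration only.
enum SketchContent {
    case text(String)
    case cdata(String)
    case comment(String)
    case element(name: String, children: [SketchContent])
}

// Concatenates text and CDATA from a node and all of its descendants.
func textContent(of node: SketchContent) -> String {
    switch node {
    case .text(let value), .cdata(let value):
        return value
    case .comment:
        return ""  // assumption: comments contribute no text
    case .element(_, let children):
        return children.map(textContent(of:)).joined()
    }
}

// Parses a boolean from textual content; the element name is never consulted.
func decodeBool(from node: SketchContent) -> Bool? {
    switch textContent(of: node) {
    case "true": return true
    case "false": return false
    default: return nil
    }
}

// Parses an integer from the aggregated text.
func decodeInt(from node: SketchContent) -> Int? {
    Int(textContent(of: node))
}
```

Under this model an element named `true` whose text is `false` decodes as `false`, which is the schema-agnostic behavior described above: a plist-style `<true/>` sentinel would have to be handled in user code.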
Public APIs that are currently inert or unsupported:
- `XMLDecoder.keyDecodingStrategy` is stored but not applied.
- Namespace handling is structural/DOM-level only; higher-level decoding remains prefix-agnostic.
- Namespace-aware lookup and decoding/encoding. Namespace URI resolution and default namespace handling are implemented at the DOM layer, but higher-level APIs remain prefix-agnostic.
- Entity and character reference expansion; sequences such as `&amp;` remain literal text.
- DOCTYPE exposure, DTD validation, attribute defaulting, ID/IDREF typing, or parameter entity processing.
- XML declaration parsing beyond duplicate detection, encoding detection, character-set validation, or CR/LF normalization.
- Writer-side features such as XML declaration emission, DOCTYPE serialization, pretty printing, or validation of CDATA edge cases.
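Because references currently stay literal, callers who need expanded text may have to post-process decoded strings themselves. The following is one possible user-side sketch, not library functionality: `expandReferences(in:)` is a hypothetical helper that expands the five predefined XML entities and decimal or hexadecimal character references in a single pass, and leaves anything it does not recognize untouched. It does not check that a referenced code point is a legal XML character.

```swift
// Hypothetical user-side helper; the library does not provide this.
func expandReferences(in input: String) -> String {
    let predefined: [String: Character] = [
        "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "apos": "'",
    ]

    // Resolves the text between "&" and ";" to a character, if recognized.
    func resolve(_ name: Substring) -> Character? {
        if let known = predefined[String(name)] { return known }
        guard name.first == "#" else { return nil }
        let afterHash = name.dropFirst()
        let isHex = afterHash.first == "x"
        let digits = isHex ? afterHash.dropFirst() : afterHash
        // Reject empty bodies and sign characters before numeric parsing.
        guard !digits.isEmpty, digits.allSatisfy({ $0.isHexDigit }) else { return nil }
        let parsed = UInt32(digits, radix: isHex ? 16 : 10)
        guard let codePoint = parsed, let scalar = Unicode.Scalar(codePoint) else {
            return nil
        }
        return Character(scalar)
    }

    var output = ""
    var rest = Substring(input)
    while let ampersand = rest.firstIndex(of: "&") {
        output.append(contentsOf: rest[..<ampersand])
        let afterAmpersand = rest.index(after: ampersand)
        guard let semicolon = rest[afterAmpersand...].firstIndex(of: ";") else {
            // No terminator anywhere: keep the remainder as literal text.
            output.append(contentsOf: rest[ampersand...])
            return output
        }
        if let replacement = resolve(rest[afterAmpersand..<semicolon]) {
            output.append(replacement)
            rest = rest[rest.index(after: semicolon)...]
        } else {
            // Unrecognized: keep "&" literally and continue scanning after it.
            output.append("&")
            rest = rest[afterAmpersand...]
        }
    }
    output.append(contentsOf: rest)
    return output
}
```

Replacements are never rescanned, so `&amp;lt;` becomes `&lt;` rather than `<`. Note that this kind of post-processing is lossy if some of the text was meant to contain literal ampersand sequences.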
This library currently implements only a subset of XML 1.0 well-formedness rules. Detailed subsystem-by-subsystem coverage, including partial and missing features, is documented in XML-1.0-Compliance.md, which serves as the authoritative reference.
- Namespace-aware lookup and Codable integration so decoding/encoding can differentiate elements and attributes by namespace URI rather than prefix.
- Intended entity and character reference handling, including expansion of `&name;` and numeric references prior to tree construction.
- Namespace-aware decoder and encoder behavior (attribute vs element disambiguation, key decoding strategy wiring, mixed-content detection).
- Optional, non-core query or dynamic access helpers built on top of the DOM for convenience use cases beyond Codable.
- XML declaration parsing plus writer improvements for emitting declarations, DOCTYPEs, and additional escaping safeguards.
- Full DTD or XML Schema validation; structural well-formedness is the extent of current checking.
- External entity resolution or network/file loading for DTDs and entity bodies.
- Broad XPath/XQuery engines or other complex query languages on top of the DOM.
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.