swift-xml

A pure Swift XML parsing and encoding library that prioritizes correctness, explicit semantics, and Codable integration. The project is still evolving toward broader XML 1.0 coverage by layering strict structural checks on top of a transparent pipeline.

Who This Is For (and Not For)

This library is intended for developers who need explicit, inspectable XML parsing and type-safe decoding with predictable semantics. It is well suited for infrastructure code, data transformation pipelines, and applications that value correctness and transparency over convenience magic.

It is not intended for workloads that require full DTD validation, XML Schema support, comprehensive XPath/XQuery engines, or legacy XML feature completeness.

Design Philosophy

  • Favor explicit behavior over implicit magic; each layer exposes the data it actually observes.
  • Treat type-safe decoding as the primary API, with DOM construction and serialization available for other workflows.
  • Keep the tokenizer, pull parser, internal concrete intermediate representation builder, DOM, and coding layers clearly separated so that responsibilities remain auditable.
  • Implement features incrementally while being guided by the XML 1.0 (Fifth Edition) specification.
  • Diagnostics are structured and grammar-aware, designed to be human-readable without leaking implementation details.

Architecture Overview

  1. Tokenizer – XMLTokenizer performs best-effort lexical scanning of raw input into tokens (start/end tags, attributes, text, comments, CDATA, processing instructions). It recognizes but does not surface DOCTYPE and XML declarations as public tokens.
  2. Pull Parser – XMLPullParser wraps the tokenizer with .startDocument/.endDocument framing, expands empty-element tags, and streams XMLEvent values without performing structural validation.
  3. Concrete Intermediate Representation (Internal) – An internal construction phase folds the event stream into a lossless, immutable concrete intermediate representation. This phase enforces XML well-formedness rules (single root element, balanced start/end tags, and classification of whitespace-only prolog/epilog as non-semantic trivia) or fragment rules depending on mode. This representation is an implementation detail and is not exposed as public API.
  4. DOM Layer – DOMBuilder converts the immutable tree into a mutable DOM with parent/owner invariants, preserving comments, CDATA, processing instructions, attribute ordering, and source-formatting trivia as metadata.
  5. Codable Decoder / Encoder – XMLDecoder walks DOM nodes to expose keyed, unkeyed, and single-value containers that reflect the DOM hierarchy exactly. It only decodes scalar values from textual content; interpreting element names or schema constructs is deliberately left to user code. XMLEncoder builds DOM trees from Encodable values, and XMLWriter serializes DOM documents or fragments back to XML text.
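
The structural checks described in step 3 can be pictured with a small, self-contained sketch. It does not use the library: `SketchEvent`, `SketchError`, and `validate` are hypothetical names invented for illustration, and the real construction phase is internal and likely differs. The sketch only shows the general idea of folding an event stream over a stack of open elements, with the single-root and whitespace-only rules applied in document mode and relaxed in fragment mode.

```swift
// Hypothetical event type for this sketch; it is not the library's XMLEvent.
enum SketchEvent: Equatable {
    case startElement(String)
    case endElement(String)
    case text(String)
}

// Hypothetical error type for this sketch.
enum SketchError: Error, Equatable {
    case mismatchedEndTag(expected: String?, found: String)
    case unclosedElement(String)
    case multipleRoots
    case missingRoot
    case textOutsideRoot
}

/// Enforces balanced nesting in both modes. The single-root rule and the
/// whitespace-only rule outside the root apply only when `fragment` is false.
func validate(_ events: [SketchEvent], fragment: Bool = false) throws {
    var open: [String] = []   // stack of currently open element names
    var topLevelCount = 0     // number of top-level elements seen so far

    for event in events {
        switch event {
        case .startElement(let name):
            if open.isEmpty {
                topLevelCount += 1
                if !fragment && topLevelCount > 1 { throw SketchError.multipleRoots }
            }
            open.append(name)
        case .endElement(let name):
            guard let top = open.popLast() else {
                // End tag with nothing open: a stray tag.
                throw SketchError.mismatchedEndTag(expected: nil, found: name)
            }
            if top != name {
                throw SketchError.mismatchedEndTag(expected: top, found: name)
            }
        case .text(let value):
            // XML whitespace is limited to space, tab, LF, and CR.
            let isWhitespace = value.unicodeScalars.allSatisfy {
                $0 == " " || $0 == "\t" || $0 == "\n" || $0 == "\r"
            }
            if open.isEmpty && !fragment && !isWhitespace {
                throw SketchError.textOutsideRoot
            }
        }
    }
    if let unclosed = open.last { throw SketchError.unclosedElement(unclosed) }
    if !fragment && topLevelCount == 0 { throw SketchError.missingRoot }
}
```

Under these assumptions, a stream with two top-level elements fails in document mode and passes with `fragment: true`, while a mismatched end tag fails in both.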

Installation

Swift Package Manager

This package supports Swift Package Manager. The current release version is 0.1.0.

If you manage dependencies in Package.swift, add swift-xml to your package dependencies:

dependencies: [
    .package(url: "https://github.com/zijievv/swift-xml.git", from: "0.1.0")
]

Then add the dependency to the appropriate target:

.target(
    name: "YourTarget",
    dependencies: [
        .product(name: "XML", package: "swift-xml")
    ]
)
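
For orientation, the two fragments above fit together in a manifest roughly as follows. This is a minimal sketch: the package name `YourPackage` and the tools version on the first line are placeholders chosen for illustration, not requirements stated by this project, so adjust them to whatever your toolchain and the package actually require.

```swift
// swift-tools-version:5.9
// The tools version above is an assumption; use the one your project needs.
import PackageDescription

let package = Package(
    name: "YourPackage", // placeholder name
    dependencies: [
        .package(url: "https://github.com/zijievv/swift-xml.git", from: "0.1.0")
    ],
    targets: [
        .target(
            name: "YourTarget",
            dependencies: [
                .product(name: "XML", package: "swift-xml")
            ]
        )
    ]
)
```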

Xcode

If your project is not managed by Swift Package Manager, you can still integrate swift-xml using Xcode:

  1. Open your project in Xcode.
  2. Choose File > Add Package Dependencies…
  3. Enter the repository URL: https://github.com/zijievv/swift-xml
  4. When prompted, select version 0.1.0 (or a compatible range).
  5. Add the package to your project.
  6. In your app target’s General > Frameworks, Libraries, and Embedded Content, ensure the XML product is linked.

Xcode will manage fetching, building, and linking the package automatically, even if your project itself is not SPM-based.

What Is Supported Today

  • Parsing of well-formed XML documents and fragments, including elements, attributes, text, comments, CDATA sections, and processing instructions.
  • Structural validation in document mode: exactly one root element, matching start/end tags, only whitespace-only text permitted outside the root, and rejection of unclosed or stray tags.
  • Fragment mode parsing that permits multiple top-level nodes while still enforcing balanced nesting.
  • DOM construction with stable node identity, parent/owner propagation, and textContent aggregation.
  • Codable decoding of element/attribute data via keyed, unkeyed, and single-value containers that directly mirror DOM structure, plus encoding of tree-like structures and attributes (via XMLCodingKey.isAttribute).
  • Deterministic, unformatted serialization of DOM documents and fragments with basic escaping.
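
To make "basic escaping" concrete, here is a self-contained sketch of the kind of escaping a serializer typically applies to text and attribute values. It is not the library's XMLWriter: the function name `escapeXML` and the exact set of escaped characters are assumptions made for illustration, and the writer's real behavior may differ.

```swift
/// Escapes characters that cannot appear literally in XML text content.
/// With `forAttribute` set, double quotes are escaped as well, which assumes
/// the attribute value is written inside double quotes.
func escapeXML(_ value: String, forAttribute: Bool = false) -> String {
    var result = ""
    result.reserveCapacity(value.utf8.count)
    for character in value {
        switch character {
        case "&": result += "&amp;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        case "\"" where forAttribute: result += "&quot;"
        default: result.append(character)
        }
    }
    return result
}
```

Because each character is mapped independently and in order, the output is deterministic for a given input, which is the property the serialization bullet above describes.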

XMLDecoder Responsibilities

  • Schema-agnostic by design: XMLDecoder never interprets element names such as <dict>, <key>, <array>, <true>, or domain-specific tags.
  • Provides only structural traversal: keyed, unkeyed, and single-value containers are derived solely from the DOM tree.
  • Scalar decoding reads text or CDATA content (including concatenated descendant text for elements) into String, Bool, and numeric primitives. Booleans are parsed from textual tokens (for example true / false, and other textual representations currently supported by the decoder) rather than from element or attribute names.
  • Any higher-level schema semantics (plist pairing, custom collection shapes, sentinel elements, etc.) must be implemented manually in Decodable types by walking the containers produced by the decoder.
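
The split between structural traversal and schema semantics can be illustrated without the library. In the sketch below, `SketchElement` is a hypothetical stand-in for a DOM element, and `parseBool`, `plistBool`, and `plistPairs` are invented helper names. The accepted boolean tokens are an assumption; the set the decoder actually supports is not listed here. The point is only that text-based scalar parsing is generic, while plist-style meaning (a `<key>` followed by its value, or `<true/>` as a boolean) is something user code has to impose.

```swift
import Foundation

// Hypothetical stand-in for a DOM element; the library's DOM types are not used.
struct SketchElement {
    var name: String
    var text: String = ""                 // direct text or CDATA content
    var children: [SketchElement] = []

    /// Simplified text aggregation: direct text first, then each child's text.
    var textContent: String {
        children.reduce(text) { $0 + $1.textContent }
    }
}

/// Generic scalar parsing from textual content. The accepted tokens are an
/// assumption for this sketch.
func parseBool(_ raw: String) -> Bool? {
    switch raw.trimmingCharacters(in: .whitespacesAndNewlines) {
    case "true", "1": return true
    case "false", "0": return false
    default: return nil
    }
}

/// Schema-specific rule supplied by user code: plist encodes booleans in the
/// element name, which a schema-agnostic decoder does not interpret.
func plistBool(_ element: SketchElement) -> Bool? {
    switch element.name {
    case "true": return true
    case "false": return false
    default: return nil
    }
}

/// Schema-specific rule supplied by user code: pair each <key> child with
/// the element that immediately follows it.
func plistPairs(in dict: SketchElement) -> [(key: String, value: SketchElement)] {
    var result: [(key: String, value: SketchElement)] = []
    var index = 0
    while index + 1 < dict.children.count {
        let keyNode = dict.children[index]
        guard keyNode.name == "key" else { index += 1; continue }
        result.append((key: keyNode.textContent, value: dict.children[index + 1]))
        index += 2
    }
    return result
}
```

In a real Decodable type the same pairing logic would walk the containers the decoder hands out rather than a hand-built tree.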

Public APIs and features that are currently inert or unsupported:

  • XMLDecoder.keyDecodingStrategy is stored but not applied.

  • Namespace-aware lookup and decoding/encoding. Namespace URI resolution and default namespace handling are implemented at the DOM layer only; higher-level APIs, including the decoder, remain prefix-agnostic.

  • Entity and character reference expansion; sequences such as &amp; remain literal text.

  • DOCTYPE exposure, DTD validation, attribute defaulting, ID/IDREF typing, or parameter entity processing.

  • XML declaration parsing (beyond duplicate detection), encoding detection, character-set validation, or CR/LF normalization.

  • Writer-side features such as XML declaration emission, DOCTYPE serialization, pretty printing, or validation of CDATA edge cases.
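
Since references such as &amp; currently remain literal text, callers who need expanded values may want to post-process decoded strings themselves. The following is a hedged, self-contained sketch of one way to do that; `expandReferences` is a hypothetical helper, not part of the library. It handles only the five predefined entities and numeric character references, and it should not be applied to text that originated in a CDATA section, where such sequences are meant literally.

```swift
/// Expands the five predefined XML entities and numeric character references
/// in a single pass. References it cannot resolve are left untouched.
func expandReferences(in text: String) -> String {
    let predefined: [String: Character] = [
        "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "apos": "'"
    ]

    // Resolves the name between "&" and ";" to a character, if possible.
    func resolve(_ name: Substring) -> Character? {
        if let known = predefined[String(name)] { return known }
        guard name.hasPrefix("#") else { return nil }
        var digits = name.dropFirst()
        var radix = 10
        if digits.hasPrefix("x") {
            digits = digits.dropFirst()
            radix = 16
        }
        // Reject signs and other non-digit characters before parsing.
        guard !digits.isEmpty, digits.allSatisfy({ $0.isHexDigit }),
              let value = UInt32(digits, radix: radix),
              let scalar = Unicode.Scalar(value) else { return nil }
        return Character(scalar)
    }

    var result = ""
    var rest = Substring(text)
    while let amp = rest.firstIndex(of: "&") {
        result.append(contentsOf: rest[..<amp])
        rest = rest[amp...]                       // rest now starts at "&"
        let nameStart = rest.index(after: rest.startIndex)
        if let semi = rest[nameStart...].firstIndex(of: ";"),
           let replacement = resolve(rest[nameStart..<semi]) {
            result.append(replacement)
            rest = rest[rest.index(after: semi)...]
        } else {
            result.append("&")                    // keep the ampersand as-is
            rest = rest[nameStart...]
        }
    }
    result.append(contentsOf: rest)
    return result
}
```

The single pass matters: a replacement is never rescanned, so an escaped ampersand followed by `lt;` yields the literal text of an `lt` reference rather than a `<` character.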

XML 1.0 Compliance Note

This library currently implements only a subset of XML 1.0 well-formedness rules. Detailed subsystem-by-subsystem coverage, including partial and missing features, is documented in XML-1.0-Compliance.md, which serves as the authoritative reference.

Roadmap / TODO (Non-Binding)

  1. Namespace-aware lookup and Codable integration so decoding/encoding can differentiate elements and attributes by namespace URI rather than prefix.
  2. Entity and character reference handling, including expansion of &name; and numeric references prior to tree construction.
  3. Namespace-aware decoder and encoder behavior (attribute vs element disambiguation, key decoding strategy wiring, mixed-content detection).
  4. Optional, non-core query or dynamic access helpers built on top of the DOM for convenience use cases beyond Codable.
  5. XML declaration parsing plus writer improvements for emitting declarations, DOCTYPEs, and additional escaping safeguards.

Non-Goals

  • Full DTD or XML Schema validation; structural well-formedness is the extent of current checking.
  • External entity resolution or network/file loading for DTDs and entity bodies.
  • Broad XPath/XQuery engines or other complex query languages on top of the DOM.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.