feat: SPARQL builder spike — design exploration and recommendation #4137

Draft
BalduinLandolt wants to merge 18 commits into main from feature/sparql-builder-spike

@BalduinLandolt

Contributor

What this is

A design spike (not production-ready) exploring how to replace dsp-api's three
fragmented SPARQL-generation patterns — Twirl templates, hand-built RDF4J SparqlBuilder
code, and raw string concatenation — with one small, safe, composable query-building library.

The branch was rebased onto current main (it had fallen ~143 commits behind) and its
documentation was consolidated from ~5,200 lines to ~970.

What's here

  • Prototype module modules/sparql-builder/ — the recommended API (Fragment monoid +
    sparql"..." interpolator + typed Iri/Variable/Literal), wired into build.sbt.
    Compiles under Scala 3.3.7; carries 59 passing tests including an injection-safety spec.
  • Consolidated docs under docs/sparql-builder-approaches/:
    • README.md — index
    • decision.md — design space, comparison matrix, recommendation, Phase 2 next steps
    • recommended-approach.md — the chosen API with two worked benchmarks + design notes
    • alternatives-considered.md — the rejected approaches and why each lost
    • reference-sparql.md — the 6 benchmark queries
    • ../sparql-inventory.md — inventory of existing generation sites (migration input)

Recommendation

An interpolated-template API backed by RDF4J escaping — reads like raw SPARQL,
prevents injection by construction, adds no new dependency, and was the only prototype to
cover all six benchmark queries.
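
To make the "prevents injection by construction" claim concrete, here is a minimal sketch of the general shape, not the prototype's actual code. The type names (Iri, Variable, Literal, Fragment, SparqlValue) follow this PR; the validation rules, the hand-written escaping, and the choice to make Fragment itself a SparqlValue are assumptions for illustration only. The key point is that the interpolator accepts typed values only, so a plain String cannot be spliced into a query.

```scala
// Hypothetical sketch: names follow the PR description, bodies are assumptions.
sealed trait SparqlValue {
  def render: String
}

// An IRI rendered in angle brackets. Characters excluded by SPARQL's IRIREF rule are rejected.
final case class Iri(value: String) extends SparqlValue {
  require(
    !value.exists(c => c <= ' ' || "<>\"{}|^`\\".contains(c)),
    s"illegal character in IRI: $value"
  )
  def render: String = s"<$value>"
}

// A variable; the allowed names are deliberately narrower than the SPARQL grammar.
final case class Variable(name: String) extends SparqlValue {
  require(name.matches("[A-Za-z_][A-Za-z0-9_]*"), s"illegal variable name: $name")
  def render: String = s"?$name"
}

// A plain string literal. Escaping here is hand-written (assumption);
// the PR recommends delegating to RDF4J's escaping instead.
final case class Literal(lexical: String) extends SparqlValue {
  def render: String = {
    val sb = new StringBuilder("\"")
    lexical.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }
}

// A rendered piece of SPARQL; ++ and empty form a monoid.
final case class Fragment(render: String) extends SparqlValue {
  def ++(that: Fragment): Fragment = Fragment(render + that.render)
}

object Fragment {
  val empty: Fragment = Fragment("")
}

// The interpolator only accepts SparqlValue arguments, so passing a raw String does not compile.
extension (sc: StringContext) {
  def sparql(args: SparqlValue*): Fragment = {
    val parts = sc.parts.iterator
    val sb    = new StringBuilder(parts.next())
    args.foreach { arg =>
      sb.append(arg.render)
      sb.append(parts.next())
    }
    Fragment(sb.toString)
  }
}

val subject = Iri("http://example.org/thing")
val label   = Literal("say \"hi\"")
val query   = sparql"SELECT ?p WHERE { $subject ?p $label }"
```

In this sketch, `query.render` yields the SELECT text with the IRI wrapped in angle brackets and the inner quotes of the literal backslash-escaped, while an IRI containing `>` or whitespace fails at construction time.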

Status / scope

  • Phase 1 (design exploration): complete. The API is chosen and prototyped.
  • Phase 2 (productionisation): not started. The builder is not yet wired into any real
    query code; the core types still use custom escaping rather than the recommended RDF4J
    escaping. See decision.md → "Next steps (Phase 2)".

Opened as a draft for review of the direction before committing to Phase 2.

🤖 Generated with Claude Code

BalduinLandolt and others added 17 commits June 2, 2026 09:13
Categorize all ~64 SPARQL generation sites in the codebase by pattern
(RDF4J builder, hybrid string interpolation, Twirl templates, graph
management), query type, and complexity. This inventory supports the
SPARQL builder library spike (Phase 1, Step 1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…olator)

Add modules/sparql-builder/ sbt subproject with Doobie-inspired Fragment
type and sparql"..." string interpolator. Includes:
- Core types: Iri, Variable, Literal (sealed trait hierarchy)
- Fragment type with monoid composition (++)
- sparql"..." interpolator with type-safe value rendering
- Combinators: optional, union, graph, filterNotExists, minus, bind, values
- Query builders: SELECT, CONSTRUCT, ASK, UPDATE, INSERT DATA
- Tests demonstrating all benchmark queries from the spike plan
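
As a rough illustration of how monoid composition and the listed combinators could fit together, the sketch below builds conditional and repeated patterns from small fragments. The combinator names (optional, union, filterNotExists) come from this commit message; their signatures, the `when` helper, and the example functions are assumptions. The bare `Fragment(...)` constructor stands in for already-rendered text here, which in the prototype would come from the interpolator.

```scala
// Hypothetical sketch: combinator names are from the commit message, signatures are assumed.
final case class Fragment(render: String) {
  // Monoid append: plain concatenation, with Fragment.empty as the identity.
  def ++(that: Fragment): Fragment = Fragment(render + that.render)
}

object Fragment {
  val empty: Fragment = Fragment("")

  def optional(inner: Fragment): Fragment =
    Fragment("OPTIONAL { " + inner.render + " }")

  def filterNotExists(inner: Fragment): Fragment =
    Fragment("FILTER NOT EXISTS { " + inner.render + " }")

  def union(branches: Fragment*): Fragment =
    Fragment(branches.map(b => "{ " + b.render + " }").mkString(" UNION "))

  // Hypothetical helper: include a fragment only when a condition holds.
  def when(condition: Boolean)(fragment: => Fragment): Fragment =
    if (condition) fragment else empty
}

// Conditional pattern: the OPTIONAL block appears only if requested.
def whereClause(includeLabel: Boolean): Fragment =
  Fragment("?s a ?type . ") ++
    Fragment.when(includeLabel)(Fragment.optional(Fragment("?s rdfs:label ?label .")))

// Iteration: one UNION branch per class name (raw strings are used only to keep the sketch short).
def anyOf(classNames: Seq[String]): Fragment =
  Fragment.union(classNames.map(c => Fragment("?s a " + c))*)
```

Because the identity element exists, optional parts can be dropped without special-casing the surrounding query text.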

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ApproachBSpec demonstrating an AST-based alternative to the Fragment
interpolator. Shows the same benchmark queries (IsNodeUsedQuery, simple
SELECT, conditional patterns, iteration) implemented with typed AST nodes
(TriplePattern, GraphPattern enum). Demonstrates that both approaches can
coexist — AST for structure, Fragment for flexibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define "injection-safe by construction" with tests for SPARQL injection
prevention (string escaping, IRI wrapping, variable safety), Lucene
injection prevention, and Fragment.raw escape hatch documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compare Fragment+interpolator (Approach A) vs AST case classes (Approach B)
against 6 benchmark queries. Recommend Approach A as foundation with option
to add AST nodes later. Includes injection safety model, comparison matrix,
and migration considerations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- C: Fluent immutable builder (no interpolator, method chaining)
- D: String interpolator + RDF4J escaping (implementation strategy swap)
- E: Thin Scala 3 wrapper over Jena ARQ QueryBuilder (mutable Java API)
- F: Template + bind via Jena ParameterizedSparqlString

Findings: RDF4J escaping matches custom escaping but also handles \f, \b,
and single quotes. Jena PSS is weakest for conditional/iteration patterns
(requires string concat of template). Jena QueryBuilder produces validated
Query AST but mutable API clashes with FP style.
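
For reference, the escape sequences at issue are the eight ECHAR forms of the SPARQL 1.1 grammar: `\t`, `\b`, `\n`, `\r`, `\f`, `\"`, `\'`, and `\\`. The function below is a hand-written sketch that covers all of them. It is neither RDF4J's implementation nor the spike's custom code, and the name `escapeSparqlString` is hypothetical.

```scala
// Hypothetical helper covering every ECHAR escape in the SPARQL 1.1 grammar.
// This is an illustration only, not RDF4J's code.
def escapeSparqlString(raw: String): String = {
  val sb = new StringBuilder(raw.length + 8)
  raw.foreach {
    case '\t' => sb.append("\\t")
    case '\b' => sb.append("\\b")
    case '\n' => sb.append("\\n")
    case '\r' => sb.append("\\r")
    case '\f' => sb.append("\\f")
    case '"'  => sb.append("\\\"")
    case '\'' => sb.append("\\'")
    case '\\' => sb.append("\\\\")
    case c    => sb.append(c)
  }
  sb.toString
}

// A hostile value that tries to close the literal and append extra triples.
val hostile  = "x\" . ?s ?p ?o . #"
val rendered = "\"" + escapeSparqlString(hostile) + "\""
```

After escaping, the only unescaped double quotes in `rendered` are the two delimiters, so the hostile value stays inside the literal.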

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update comparison matrix to include approaches C (fluent builder),
D (RDF4J escaping), E (Jena wrapper), F (template+bind). Key finding:
RDF4J escaping covers more edge cases than custom code. Jena approaches
fight Scala idioms. Recommend A's API + D's escaping strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create 6 markdown documents in docs/sparql-builder-approaches/ for
side-by-side comparison of SPARQL builder API designs:

- reference-sparql.md: fixed parameter values and target SPARQL for all
  6 benchmark queries
- approach-a-interpolator.md: sparql"..." string interpolator + Fragment
- approach-b-ast.md: AST case classes + typed rendering
- approach-c-fluent-builder.md: fluent immutable builder with triple()
- approach-e-jena-wrapper.md: thin wrapper over Jena ARQ QueryBuilder
- approach-f-template-bind.md: Jena ParameterizedSparqlString

Each document shows all 6 benchmarks with the approach's API alongside
the plain SPARQL for comparison. Approach D is merged into A's document
as a note on escaping strategy (identical API surface).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New approaches identified during user design review of the original 5:

- approach-g-scala-template-bind.md: immutable Scala template+bind API
  (inspired by F but idiomatic Scala, not Jena)
- approach-h-hybrid-interpolator-template.md: sparql"..." interpolator
  for entire multi-line queries (needs feasibility check)
- approach-c-variant-consequent-fluent.md: fluent triple chaining with
  .and(), .andOptional(), .andAll() (extends C's builder concept)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Approach A: add design review feedback section (prefer tp() over
  sparql"...", triple().optional(), bulk prefixes, safety note)
- Approach C: rename FluentSelect/Ask/Update to Select/Ask/Update,
  add design review feedback section
- Move eliminated approaches (B, E, F) to eliminated/ subdirectory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Confirmed feasibility: existing sparql"..." interpolator already
  supports multi-line templates via sparql"""..."""
- Added Prefix type (extends SparqlValue, renders as "name: <ns>")
  enabling sparql"PREFIX $kb" syntax
- Updated all benchmark examples to use PREFIX $kb instead of raw
  string interpolation in prefix declarations
- Removed feasibility concern notes, replaced with confirmation
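
A small sketch of how a Prefix value could slot into a multi-line template, assuming a much reduced version of the module's types. Only the rendering rule ("name: <ns>") and the `sparql"PREFIX $kb"` usage come from this commit message. The field names, the compact interpolator, and the example namespace and class are illustrative assumptions.

```scala
// Hypothetical sketch: only the "name: <ns>" rendering is taken from the commit message.
sealed trait SparqlValue {
  def render: String
}

// Renders as "name: <namespace>", so it can follow the PREFIX keyword directly.
final case class Prefix(name: String, namespace: String) extends SparqlValue {
  def render: String = name + ": <" + namespace + ">"
}

final case class Fragment(render: String)

// Compact interpolator: interleave the literal parts with the rendered arguments.
extension (sc: StringContext) {
  def sparql(args: SparqlValue*): Fragment =
    Fragment(sc.parts.zipAll(args.map(_.render), "", "").map { case (p, a) => p + a }.mkString)
}

// Illustrative prefix; the multi-line template uses triple quotes.
val kb = Prefix("knora-base", "http://www.knora.org/ontology/knora-base#")

val query = sparql"""PREFIX $kb
SELECT ?s WHERE { ?s a knora-base:Resource }"""
```

The first line of `query.render` is then the full PREFIX declaration, with no raw string interpolation involved.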

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update the unified Interpolated Template approach document with decisions
from the ongoing design review walkthrough:

- Rename Iri.trusted → Iri.unsafeFrom (matching codebase convention)
- Introduce Prefix type with unsafeFrom/unsafeIri for deriving IRIs
- Design Literal API: type-safe constructors (bool, int, instant) and
  string-based escaped/unescaped pairs (stringEscaped, typedEscaped)
- Add LanguageTag opaque type for BCP 47 language tags
- Add sp"..." as short alias for sparql"..." interpolator
- Add builder middle-ground variant (multi-line WHERE fragment) to
  benchmarks 1-3
- Add noted.md tracking open items from the review walkthrough
- Move eliminated approaches to eliminated/ subdirectory
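
The Literal constructors and the LanguageTag opaque type could look roughly like the sketch below. The constructor names bool, int, instant, and stringEscaped appear in this commit message; their signatures, the reading that "Escaped" means the method performs the escaping, the `langString` constructor, and the simplified tag check are all assumptions. The opaque type sits inside an enclosing object because a top-level opaque alias is not transparent inside its companion object.

```scala
import java.time.Instant

// Hypothetical sketch: constructor names follow the commit message, signatures are assumed.
object SparqlTypes {

  // A LanguageTag is a String at runtime but can only be created through `from`.
  opaque type LanguageTag = String

  object LanguageTag {
    // Simplified well-formedness check; full BCP 47 validation is more involved.
    private val wellFormed = "[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*".r

    def from(tag: String): Either[String, LanguageTag] =
      if (wellFormed.matches(tag)) Right(tag) else Left("not a language tag: " + tag)
  }

  extension (tag: LanguageTag) {
    def value: String = tag
  }

  // The private constructor forces every Literal through one of the factories below.
  final class Literal private (val render: String)

  object Literal {
    private val xsd = "http://www.w3.org/2001/XMLSchema#"

    // Backslash must be replaced first so later escapes are not doubled.
    private def escape(s: String): String =
      s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "\\r")

    def bool(b: Boolean): Literal = new Literal(b.toString)

    def int(i: Int): Literal = new Literal(i.toString)

    // Instant.toString is ISO-8601, which is a valid xsd:dateTime lexical form.
    def instant(t: Instant): Literal =
      new Literal("\"" + t.toString + "\"^^<" + xsd + "dateTime>")

    def stringEscaped(s: String): Literal =
      new Literal("\"" + escape(s) + "\"")

    // Hypothetical constructor showing where a LanguageTag would be consumed.
    def langString(s: String, lang: LanguageTag): Literal =
      new Literal("\"" + escape(s) + "\"@" + lang.value)
  }
}
```

Callers would import `SparqlTypes.*` and obtain a tag with `LanguageTag.from("de-CH")`, handling the `Left` case before building a language-tagged literal.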

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The knora-base link/non-link property duality is a major source of
complexity that must be considered in every code path touching
properties or values. Documents the convention, its impact, and
key code locations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Benchmark 3: add domain context explaining link value property convention
- Benchmark 6: apply Iri.unsafeFrom, prefix-derived IRIs,
  Literal.stringEscaped/typedEscaped, add builder multi-line variant
- Update noted.md to reflect all benchmarks now reviewed
- Mark Fragment.raw known issue as resolved via PropertyPath/jenaTextQuery

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 13 verbose approach documents (~5,200 lines) with 5 focused docs
(~970 lines): a README index, a decision summary, the recommended approach
(trimmed to two worked benchmarks and consolidated design notes), and a
single alternatives-considered document.

The previous docs re-rendered the same six benchmark queries across nine
separate approach files, and the recommended approach showed each benchmark
in three styles. The new set keeps the full reasoning while removing the
repetition. Eliminated/* and the per-approach showcases are removed (history
preserved in git); reference-sparql.md and sparql-inventory.md are retained.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@BalduinLandolt BalduinLandolt self-assigned this Jun 2, 2026
Keep the discarded approaches only in docs/sparql-builder-approaches/alternatives-considered.md;
remove all code that prototyped them.

- Delete the spec files for the rejected approaches: AST case classes, fluent
  builder, the Jena ARQ QueryBuilder wrapper, and Jena ParameterizedSparqlString.
- Drop the jena-arq and jena-querybuilder dependencies, which existed solely to
  enable the two Jena-based approaches.
- Rename the retained specs to drop the now-orphaned "Approach A/D" lettering:
  ApproachASpec -> SparqlInterpolatorSpec, ApproachDSpec -> Rdf4jEscapingSpec.

The module compiles and its 32 remaining tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>