
feat(context): add --xsd-anyuri-as-iri flag for IRI type coercion#3

Open
jdsika wants to merge 1 commit into main from fix/uri-context-type


@jdsika jdsika commented Mar 26, 2026

Summary

Add an --xsd-anyuri-as-iri flag to the JSON-LD context generator that emits "@type": "@id" for xsd:anyURI slots instead of the default "@type": "xsd:anyURI".

Problem

JSON-LD processors treat xsd:anyURI values as plain strings. When a LinkML slot has range uri (mapped to xsd:anyURI), the generated context marks it as a string type. Downstream RDF consumers then cannot distinguish IRIs from string literals.

Solution

When --xsd-anyuri-as-iri is set, the context generator checks whether a slot's range resolves to xsd:anyURI and, if so, emits "@type": "@id" to ensure the value is interpreted as an IRI reference during JSON-LD expansion.
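
A minimal sketch of the coercion decision described above. The helper name `term_definition`, its parameters, and the use of plain strings for type URIs are assumptions made for illustration; the generator's actual internals are not shown in this description and may differ. The sketch also assumes that plain strings get no `@type` entry.

```python
from typing import Dict, Optional

XSD_ANYURI = "xsd:anyURI"
XSD_STRING = "xsd:string"


def term_definition(
    slot_uri: str,
    range_type_uri: Optional[str],
    xsd_anyuri_as_iri: bool = False,
) -> Dict[str, str]:
    """Build a JSON-LD term definition for one slot (hypothetical helper)."""
    term = {"@id": slot_uri}

    # Assumption: untyped and plain-string ranges get no coercion entry.
    if range_type_uri is None or range_type_uri == XSD_STRING:
        return term

    if xsd_anyuri_as_iri and range_type_uri == XSD_ANYURI:
        # Opt-in: the value is expanded as an IRI node, not a typed literal.
        term["@type"] = "@id"
    else:
        # Default: keep the XSD datatype, so the value stays a literal.
        term["@type"] = range_type_uri
    return term


default_term = term_definition("ex:homepage", XSD_ANYURI)
iri_term = term_definition("ex:homepage", XSD_ANYURI, xsd_anyuri_as_iri=True)
```

With the flag off, `default_term` carries `"@type": "xsd:anyURI"`; with it on, `iri_term` carries `"@type": "@id"`. Other datatypes are unaffected either way.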

Testing

Existing tests pass. The flag defaults to False, preserving current behaviour.

@jdsika force-pushed the fix/uri-context-type branch 4 times, most recently from 5e372a8 to d279be0, on April 2, 2026 10:14

jdsika commented Apr 2, 2026

🔍 Adversarial Review — PR #3

Summary

Well-structured feature with clean opt-in design. The JSON-LD context generator changes are solid and well-tested. However, three issues remain: the OWL generator changes introduce a cross-generator inconsistency for curie types; the transform_class_slot_expression path produces a None expression that falls through to an owl:allValuesFrom owl:Thing restriction (intentional?); and several edge-case types lack test coverage.


🐛 Bugs & Issues

1. Cross-generator inconsistency for curie type ranges

The OWL generator uses is_uri_range(sv, range) which checks CURIE_TYPES = {"uriorcurie", "curie"} and URI_TYPES = {"uri"}. The curie type maps to xsd:string (not xsd:anyURI). With the flag enabled:

| Generator | curie handling | Result |
|---|---|---|
| OWL | is_uri_range() returns True | owl:ObjectProperty (IRI node) |
| JSON-LD | xsd:string not in URI_RANGES_WITH_XSD | No @type (string) |
| SHACL | xsd:string not in _NON_LITERAL_TYPE_URIS | sh:nodeKind sh:Literal |

All three generators disagree on curie when the flag is set. The OWL generator says IRI, SHACL says Literal, JSON-LD says string. The flag name --xsd-anyuri-as-iri implies xsd:anyURI only, but the OWL generator catches more than that via is_uri_range().

Fix: Either (a) use a URI-check consistent with the JSON-LD context generator (checking the resolved type_uri against XSD.anyURI rather than calling is_uri_range()), or (b) extend URI_RANGES_WITH_XSD to also include XSD.string for curie types (but this would be semantically wrong). Option (a) is cleaner — it keeps all generators aligned on the same xsd:anyURI criterion.
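
To make option (a) concrete, here is a hedged sketch contrasting a name-based check with a check on the resolved XSD datatype. The two name sets are the ones quoted above. The `TYPE_TO_XSD` table and both function names are stand-ins invented for this illustration, not the generators' actual code, and the real is_uri_range() also walks type ancestry, which this sketch omits.

```python
# Name sets as quoted in this review.
CURIE_TYPES = {"uriorcurie", "curie"}
URI_TYPES = {"uri"}

# Assumed mapping from LinkML type name to XSD datatype, per the discussion above.
TYPE_TO_XSD = {
    "uri": "xsd:anyURI",
    "uriorcurie": "xsd:anyURI",
    "curie": "xsd:string",
    "string": "xsd:string",
}


def matches_by_type_name(type_name: str) -> bool:
    """Stand-in for a name-based check: true for uri, uriorcurie and curie."""
    return type_name in (CURIE_TYPES | URI_TYPES)


def resolves_to_xsd_anyuri(type_name: str) -> bool:
    """Stand-in for option (a): compare the resolved datatype with xsd:anyURI."""
    return TYPE_TO_XSD.get(type_name) == "xsd:anyURI"


# Type names on which the two checks disagree.
disagreements = sorted(
    name
    for name in TYPE_TO_XSD
    if matches_by_type_name(name) != resolves_to_xsd_anyuri(name)
)
```

Under these assumptions the two checks disagree only on `curie`, which is exactly the type the three generators currently treat differently.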

2. transform_class_slot_expression returns None for URI ranges, stored under node_owltypes[None]

When the flag is set and range is URI-like, no owl_exprs are appended. Then:

this_expr = self._intersection_of(owl_exprs=[], owl_types={OWL.Thing})

_boolean_expression returns None when len(exprs) == 0 (line ~1374). This means:

  • self.node_owltypes[None].update({OWL.Thing}) — stores type info under a None key
  • The None propagates up to transform_class_definition where if not x: falls through to x = OWL.Thing (line ~633)
  • This produces owl:allValuesFrom owl:Thing — a vacuous restriction

This is technically valid OWL but: (a) the None key in node_owltypes can collide with other None returns from unrelated slots, contaminating their type inference; (b) the vacuous owl:Thing restriction adds noise to the ontology. Is this intentional, or should a URI-range slot with the flag simply emit no restriction at all?
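
The collision risk in (a) can be reproduced in isolation. This sketch assumes node_owltypes behaves like a `defaultdict(set)` keyed by expression node, which the `.update(...)` call above suggests but does not confirm; the node identifiers and type strings are placeholders.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def simulate_none_key_collision() -> Dict[Optional[str], Set[str]]:
    # Assumption: a defaultdict(set) keyed by expression node.
    node_owltypes: Dict[Optional[str], Set[str]] = defaultdict(set)

    # URI-range slot with the flag set: no expression was built, so the key is None.
    uri_slot_expr = None
    node_owltypes[uri_slot_expr].update({"owl:Thing"})

    # An unrelated slot whose expression also came back as None.
    other_slot_expr = None
    node_owltypes[other_slot_expr].update({"rdfs:Literal"})

    # A slot with a real expression node keeps its own, separate entry.
    node_owltypes["_:restriction1"].update({"owl:Thing"})

    return dict(node_owltypes)


result = simulate_none_key_collision()
```

Both None-returning slots end up sharing a single entry that mixes their type sets, while the slot with a real node is unaffected. Returning early before the node_owltypes update would avoid the shared key.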


⚠️ Concerns

1. Semantic validity of treating xsd:anyURI as IRI

W3C RDF 1.1 §3.3 defines xsd:anyURI as a datatype (literal). Overriding this is a pragmatic choice — JSON-LD processors treat xsd:anyURI values as plain strings, and SHACL already emits sh:nodeKind sh:IRI. The PR docstrings justify this well, but this is a deliberate semantic override that downstream consumers may not expect. Consider adding a note to the generated artifacts (e.g., OWL comment annotation) when the flag is active.

2. No rdfs:range for URI-range ObjectProperties

When the flag is set, URI-range slots become owl:ObjectProperty with no rdfs:range restriction. This is by design per the docstring, but it means OWL reasoners cannot infer anything about the values of these properties. Class-range ObjectProperties get rdfs:range ex:ClassName, but URI-range ObjectProperties get nothing. For consistency, consider emitting rdfs:range owl:Thing explicitly.

3. _range_is_datatype returning False may have side effects

The modified _range_is_datatype returns False for URI ranges when the flag is set. This prevents _range_uri from being called for these slots in some paths. Verify that all downstream code that checks _range_is_datatype handles the False return correctly when the slot's range is still technically a type (not a class or enum).

4. PR title scope mismatch

Title says feat(context): but the PR also modifies the OWL generator significantly (+47/-9 lines). Consider feat(context,owl): or feat(generators):.


🧪 Test Coverage Assessment

The three tests are well-structured and cover the primary happy paths. Gaps:

| Missing Test | Risk | Priority |
|---|---|---|
| uriorcurie range with flag | uriorcurie also maps to xsd:anyURI; should verify same behavior as uri | High |
| curie range with flag | Exposes the cross-generator inconsistency bug above | High |
| SHACL consistency test | Verify SHACL still emits sh:IRI and doesn't conflict with OWL output | Medium |
| CLI integration test | Verify --xsd-anyuri-as-iri flag is wired correctly through Click | Medium |
| _range_is_datatype unit test | Verify False return for URI ranges doesn't break downstream | Medium |
| OWL output for any_of with URI + class | JSON-LD has test_xsd_anyuri_as_iri_with_any_of but OWL doesn't | Medium |
| Inherited/custom URI types | is_uri_range checks ancestors; verify a custom type inheriting from uri works | Low |
| type_objects=True interaction | OWL generator with both type_objects=True and xsd_anyuri_as_iri=True | Low |

📋 Fix Plan

  1. [Bug Fix] Align OWL generator's URI-range check with JSON-LD context generator — either check the resolved type_uri against XSD.anyURI instead of calling is_uri_range(), or guard is_uri_range() with an additional type_uri check to exclude curie (which maps to xsd:string)
  2. [Bug Fix] Handle the None return from _intersection_of when owl_exprs is empty — consider returning early from transform_class_slot_expression with a sentinel instead of letting None pollute node_owltypes
  3. [Test] Add uriorcurie range test for both JSON-LD and OWL generators
  4. [Test] Add curie range test that documents the expected cross-generator behavior
  5. [Test] Add SHACL consistency assertion (e.g., verify sh:nodeKind sh:IRI is emitted for uri range regardless of flag)
  6. [Docs] Update PR title scope to reflect OWL changes

✅ What's Good

  • Clean opt-in design — defaults to False, zero behavioral change for existing users
  • Excellent docstrings with W3C spec references and cross-generator alignment notes
  • _literal_coercion_for_ranges change is surgical — correctly handles any_of mixed branches
  • is_uri_range reuse from common/subproperty — shares the SHACL generator's type-ancestry logic instead of reimplementing
  • Test for any_of mixed branches — this is a non-obvious edge case and the test catches it well
  • Both generators get CLI flags — consistent UX across gen-jsonld-context and gen-owl

… consistency

JSON-LD processors treat xsd:anyURI as an opaque string literal,
so range:uri/uriorcurie slots get xsd:anyURI coercion instead of
proper IRI node semantics (@type:@id, owl:ObjectProperty, sh:IRI).

Add an opt-in --xsd-anyuri-as-iri flag that promotes xsd:anyURI ranges
to IRI semantics across all three generators:

  - JSON-LD context: @type: xsd:anyURI → @type: @id
  - OWL: DatatypeProperty → ObjectProperty (no rdfs:range restriction)
  - SHACL: sh:datatype xsd:anyURI → sh:nodeKind sh:IRI

The flag only affects types whose XSD mapping is xsd:anyURI (uri and
uriorcurie). The curie type (xsd:string) is correctly excluded via
is_xsd_anyuri_range() to maintain cross-generator consistency.

Standards basis:
  - OWL 2 §5.3-5.4 (ObjectProperty vs DatatypeProperty)
  - SHACL §4.8.1 (sh:nodeKind sh:IRI)
  - JSON-LD 1.1 §4.2.2 (type coercion with @id)
  - RDF 1.1 §3.2-3.3 (IRIs as first-class nodes, not string literals)

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
@jdsika force-pushed the fix/uri-context-type branch from 9e4b11f to acc2df3, on April 2, 2026 15:15