fix(generators): add --normalize-prefixes flag for well-known prefix names#4
fix(generators): add --normalize-prefixes flag for well-known prefix names#4
Conversation
42e007b to
71db97f
Compare
60f62c4 to
1b36dc0
Compare
🔍 Adversarial Review — PR #4SummarySolid feature with thorough test coverage, but contains a confirmed race condition between Phase 1 and Phase 2 of 🐛 Bugs & Issues1. Phase 2 overwrites Phase 1 normalization for HTTP/HTTPS variants (Confirmed) When a schema declares
Reproduced locally with rdflib 7.x: from rdflib import Graph, Namespace, URIRef
g = Graph()
g.bind('schema', Namespace('http://schema.org/')) # metamodel
g.namespace_manager.bind('sdo', URIRef('https://schema.org/')) # user
# Phase 1
g.bind('schema', Namespace('https://schema.org/'), override=True, replace=True)
# → schema: https://schema.org/ ✅, schema1: http://schema.org/ (orphan)
# Phase 2
g.bind('schema', Namespace('http://schema.org/'), override=True, replace=True)
# → schema: http://schema.org/ ❌ (HTTPS binding silently lost!)Fix: Phase 2 should check whether # In Phase 2, before rebinding:
current_ns_for_std = dict(graph.namespaces()).get(std_pfx)
if current_ns_for_std and str(current_ns_for_std) != ns_str:
continue # std_pfx already claimed by Phase 1 for a different URI2. The test asserts
|
| PR #1 | PR #4 | |
|---|---|---|
| Approach | Dynamic: {str(ns): str(pfx) for pfx, ns in RdfGraph().namespaces()} |
Static: MappingProxyType({...}) with ~30 entries |
| rdflib dependency | Changes with rdflib version | Frozen, version-independent |
| HTTP/HTTPS variants | Only what rdflib provides | Explicitly includes both |
PR #4's static approach is explicitly designed to solve a problem with PR #1's dynamic approach (rdflib defaults changed between 6.x and 7.x). These will conflict on merge. Recommendation: merge PR #4 first, or rebase PR #1 to use PR #4's static map.
4. Cross-PR conflict — flag forwarding in jsonldgen.py
PR #4 forwards both normalize_prefixes AND deterministic to the embedded ContextGenerator:
for flag in ("normalize_prefixes", "deterministic"):
if hasattr(self, flag):
context_kwargs.setdefault(flag, getattr(self, flag))PR #1 adds deterministic handling in a different location (post-serialization in end_schema). PR #4's forwarding of deterministic creates a dependency on PR #1's field existing. The hasattr guard prevents a crash, but semantically PR #4 shouldn't forward a flag it doesn't define. Either:
- Remove
"deterministic"from the forwarding loop (PR fix(generators): add --normalize-prefixes flag for well-known prefix names #4 doesn't own this flag), or - Document the intentional coupling with PR feat(generators): add --deterministic flag with hybrid RDFC-1.0 + rdflib serialization #1 and establish merge order
5. PR title says fix: but commit says feat: — which is it?
The commit message uses feat(generators): while the PR title uses fix(generators):. This is a new opt-in flag, which is a feature — feat: is correct per conventional commits.
6. Static map maintenance burden
The _WELL_KNOWN_PREFIX_MAP needs manual updates when rdflib adds new curated defaults. The test_matches_rdflib_defaults test catches additions but not removals. Consider adding a reverse check: entries in the static map that are NOT in rdflib's defaults could indicate stale entries.
🧪 Test Coverage Assessment
Strong coverage (18 tests across 7 classes):
- ✅ OWL, SHACL, JSON-LD context normalization (sdo→schema, dce→dc)
- ✅ HTTP/HTTPS schema.org variants
- ✅ Collision detection with warnings (caplog)
- ✅ Cross-generator consistency
- ✅ Static map integrity and copy safety
- ✅ JSONLDGenerator flag forwarding
- ✅ Flag-off backward compatibility
Gaps:
- ❌ No test for
normalize_prefixes+exclude_importsinteraction - ❌ No test for empty schema (zero prefixes) — is
_prefix_remap.clear()safe? - ❌ No test that explicitly sets up both HTTP and HTTPS runtime bindings to exercise Phase 2 overwrite (the scenario from Bug feat(generators): add --deterministic flag with hybrid RDFC-1.0 + rdflib serialization #1)
- ❌ No test for
add_prefixbeing called before__post_init__completes (defensive, but_prefix_remapaccess would crash) - ❌ CURIE remapping test (
test_normalize_prefixes_curie_remapping) only checks thatPerson @idstarts withschema:— doesn't verifyfull_nameslot's@idusesschema:nameinstead ofsdo:name
📋 Fix Plan
- Fix Phase 2 overwrite bug — add a graph-binding check before rebinding in Phase 2 to prevent overwriting Phase 1's normalization
- Add regression test — explicitly set up a graph with both
schema: http://schema.org/andsdo: https://schema.org/, run normalize, and assert the HTTPS binding survives - Remove
"deterministic"from forwarding loop — or gate it behindhasattronly (already done) but add a comment explaining the cross-PR dependency - Fix PR title — change
fix:tofeat:for consistency with the commit - Add empty-schema test — verify no crash when schema has zero prefixes
- Add
exclude_importsinteraction test — verify normalization works when imports are excluded - Strengthen CURIE remapping test — assert that slot URIs (not just class URIs) are remapped in the context
✅ What's Good
- Excellent documentation: the 3-case handling in
visit_schemais well-commented with case numbers - Defensive
_prefix_remapinitialization: correctly placed beforesuper().__post_init__()with clear comment explaining why - Collision detection: warning on skip is the right behavior — no silent data loss
MappingProxyTypefor the static map: prevents accidental mutationwell_known_prefix_map()returns a copy: callers can't corrupt the canonical map- Phase 2 concept is sound: cleaning up runtime-injected orphan bindings is the right idea, it just needs the overwrite guard
- Cross-generator consistency test: ensures OWL, SHACL, and JSON-LD agree — excellent
- Backward compatibility: flag defaults to off, existing behavior preserved
… names Add an opt-in --normalize-prefixes flag to OWL, SHACL, and JSON-LD Context generators that normalises non-standard prefix aliases to well-known names from a static prefix map (derived from rdflib 7.x defaults, cross-checked against prefix.cc consensus). Key design decisions: - Static frozen map (MappingProxyType) instead of runtime Graph().namespaces() lookup eliminates rdflib version dependency - Both http://schema.org/ and https://schema.org/ map to 'schema' - Shared normalize_graph_prefixes() helper used by OWL and SHACL - Two-phase graph normalisation: Phase 1 normalises schema-declared prefixes, Phase 2 cleans up runtime-injected bindings - Collision detection: skip with warning when standard prefix name is already user-declared for a different namespace - Phase 2 guard prevents overwriting HTTPS bindings with HTTP variants The flag defaults to off, preserving existing behaviour. Tests cover OWL, SHACL, and context generators with sdo->schema, dce->dc, http/https edge case, custom prefix preservation, flag-off backward compatibility, cross-generator consistency, prefix collision detection, schema1 regression prevention, Phase 2 HTTPS guard, empty schema edge case, and static map integrity. Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
d9fb738 to
dd0ce0c
Compare
Summary
Add an opt-in
--normalize-prefixesflag to all generators that normalises non-standard prefix aliases to well-known names from a static, version-independent prefix map (derived from rdflib 7.x defaults, cross-checked against prefix.cc consensus).Motivation
Schemas often declare custom prefix aliases for well-known namespaces (e.g.
sdoforhttps://schema.org/,dceforhttp://purl.org/dc/elements/1.1/). When generated artifacts are consumed by tools that expect conventional prefix names, this causes friction. This flag allows opt-in normalization without changing default behavior.Changes
generator.py— Static prefix map and shared helper_WELL_KNOWN_PREFIX_MAP: Frozendictliteral of ~30 well-known namespace-to-prefix mappings. Eliminates rdflib version dependency (defaults changed between 6.x and 7.x). Includes bothhttp://schema.org/andhttps://schema.org/→schema(linkml-runtime uses HTTP, W3C prefers HTTPS).well_known_prefix_map(): Returns a copy of the static map (no longer createsGraph()at runtime).normalize_graph_prefixes(graph, schema_prefixes): Shared helper that rebinds non-standard aliases in an rdflib Graph to their standard names. Usesgraph.bind(..., override=True, replace=True)for clean rebinding.normalize_prefixesfield onGeneratorbase class + CLI flag.owlgen.py— Explicit post-bind normalisationoverride=True, the default), then callsnormalize_graph_prefixes()when flag is set.override=Falseapproach which made all 29 rdflib defaults sticky (not just well-known ones).shaclgen.py— Same explicit approachnormalize_graph_prefixes().jsonldcontextgen.py— Three-case remap logic_prefix_remapin__post_init__()(replaces defensivegetattr)._build_element_id().Test Coverage
New
test_normalize_prefixes.pywith 18 tests across 5 test classes:TestOwlNormalizePrefixesTestShaclNormalizePrefixesTestContextNormalizePrefixesTestWellKnownPrefixMapTestCrossGeneratorConsistencyPlus 3 existing context-gen tests retained unchanged.
Backward Compatibility
The flag defaults to off — existing behavior is fully preserved. No snapshot updates required.