Skip to content

feat(generators): add --exclude-external-imports flag#2

Open
jdsika wants to merge 6 commits intomainfrom
fix/contextgen-skip-external-imports
Open

feat(generators): add --exclude-external-imports flag#2
jdsika wants to merge 6 commits intomainfrom
fix/contextgen-skip-external-imports

Conversation

@jdsika
Copy link
Copy Markdown

@jdsika jdsika commented Mar 26, 2026

Summary

Add a --exclude-external-imports flag to the JSON-LD context generator that suppresses terms from URL-based external vocabulary imports, keeping only locally defined classes and slots.

Problem

When a schema imports an external vocabulary via URL (e.g. W3C Verifiable Credentials), all imported terms leak into the generated JSON-LD context — even with --no-mergeimports. This is problematic when the external vocabulary defines @protected terms in its own context (JSON-LD 1.1 §4.1.11): redefining them locally violates the protection semantics.

Solution

When --exclude-external-imports is set, the context generator identifies classes and slots from HTTP(S)-based imports via SchemaView.schema_map and skips them during visit_class / visit_slot. Local file imports and linkml standard imports are preserved.

Changes

  • jsonldcontextgen.py: Add exclude_external_imports field, _collect_external_elements() static helper, and filtering in visit_class() / visit_slot(). Add CLI option.
  • test_jsonldcontextgen.py: Add parametrized test covering both merge modes, verifying local elements preserved and external elements excluded.

Backward Compatibility

The flag defaults to off — existing behavior is fully preserved.

@jdsika jdsika force-pushed the fix/contextgen-skip-external-imports branch 5 times, most recently from 5df92ed to ff5ed11 Compare April 2, 2026 09:35
@jdsika
Copy link
Copy Markdown
Author

jdsika commented Apr 2, 2026

🔍 Adversarial Review — PR #2

Summary

Solid feature with a clear, well-documented use-case (JSON-LD 1.1 @protected term conflicts). The implementation is clean and follows existing patterns (exclude_imports). However, there are test coverage gaps and a few design concerns worth discussing before merge.

Verdict: Approve with suggestions — no blocking bugs found, but the test suite should be strengthened.


🐛 Bugs & Issues

After thorough analysis, no hard bugs were identified. Several initially suspicious areas were verified safe:

  1. linkml:types false-positive — NOT a bug. The linkml: CURIE prefix is resolved via default_import_map to a local file path. In schema_map, the key is stored as 'linkml:types' (the literal CURIE string), not as a URL. The startswith("http://") check correctly skips it.

  2. Enum/Type leak-through — NOT a bug (by design). The ContextGenerator does not emit standalone @context entries for enums or types. Enums only influence slot definitions (via SKOS @context sub-objects), and types only affect @type coercion of referencing slots. Since external slots referencing those enums/types are already filtered out, no external enum/type artifacts appear in the output.

  3. None safety on _external_classes/_external_slots — safe. Python's short-circuit and prevents evaluation of cls.name in self._external_classes when self.exclude_external_imports is False.

  4. Transitive external imports — handled correctly. imports_closure() flattens the full transitive graph into schema_map, so _collect_external_elements captures elements from deeply nested URL-based imports.


⚠️ Concerns

  1. Both flags simultaneously (exclude_imports + exclude_external_imports). Setting both is semantically redundant — exclude_imports is strictly more aggressive (local-only) and dominates the check order (line 233 returns False before line 235 is reached). Consider either:

    • Adding a click.UsageError if both are set, or
    • Documenting the precedence explicitly in the docstring/--help.
  2. External prefixes still emitted. When exclude_external_imports=True, the @context will still contain prefix declarations from external schemas (e.g. "ext": "https://example.org/external-vocab/"). This is arguably correct (prefixes are needed for URI expansion of any remaining references), but it may surprise users who expect a "clean" local-only context. Worth a sentence in the docstring.

  3. URI scheme heuristic is narrow. Only http:// and https:// are checked. Imports using urn:, did:, or other URI schemes would slip through. This is probably fine for the 99% case, but the docstring should note this limitation explicitly — currently it says "URL-based" which is correct, but "URI-based" would be the user's mental model.

  4. No other generators support this flag. The OWL and SHACL generators don't have exclude_external_imports. If someone uses the same schema with multiple generators, only the JSON-LD context will exclude external terms. This is fine as a first step (the JSON-LD @protected use-case is specific), but consider documenting that this is context-generator-specific.

  5. _collect_external_elements calls sv.imports_closure() as a side-effect. This is safe (it's idempotent and already called during __post_init__ setup), but annotating the method docstring with this fact would improve clarity.


🧪 Test Coverage Assessment

What's tested well:

  • Core functionality: local terms present, external terms absent ✅
  • Both mergeimports modes via parametrization ✅
  • Import map resolution for URL-to-local mapping ✅

Gaps:

Gap Risk Recommendation
No test that linkml:types slots (e.g. string, date) survive filtering Medium Add assertion that standard linkml types used by local slots still produce correct @type coercions
mergeimports=False parametrization doesn't verify "no effect" claim Low The test asserts external terms are absent, but doesn't verify the output is identical with and without the flag. Add a comparison: output_with_flag == output_without_flag when mergeimports=False
No test combining both exclude_imports=True and exclude_external_imports=True Low Verify no crash and that exclude_imports dominates
No CLI invocation test Low Add a CliRunner test that passes --exclude-external-imports and verifies it reaches the generator
No test for empty external schema Low Edge case: external schema with no classes/slots should produce empty sets, not errors
No test for local file import preservation Medium Add a second local-file import (not URL-based) and verify its elements ARE included
No test for transitive external imports Low External schema A imports external schema B — verify B's elements are also excluded

📋 Fix Plan

Before merge (recommended):

  1. Add a local-file-import preservation test. The current test only has one URL import. Add a local file import alongside it and assert its elements are included. This is the core differentiator from exclude_imports and should be tested explicitly.

  2. Add mergeimports=False equivalence assertion. When mergeimports=False, generate output both with and without exclude_external_imports and assert they're identical. This directly tests the docstring's "no effect" claim.

  3. Add linkml:types survival assertion. In the existing test, assert that a local slot with range: string still gets correct @type coercion (e.g. "xsd:string"), proving standard imports aren't filtered.

Post-merge / follow-up (nice-to-have):

  1. Expand the docstring to note that http(s):// is the detection heuristic (not arbitrary URIs), and that external prefixes are intentionally preserved.

  2. Consider a click.UsageError or warning when both --exclude-imports and --exclude-external-imports are set.

  3. Add CLI roundtrip test via CliRunner.


✅ What's Good

  • Clean design — follows the established exclude_imports pattern exactly, making the code predictable and maintainable.
  • Correct schema_map traversal — the static method is well-structured, idempotent-safe, and handles the transitive closure correctly.
  • Excellent docstring — clearly explains the @protected use-case, the interaction with mergeimports=False, and the scope of the filtering.
  • Backward compatible — defaults to False, zero behavior change for existing users.
  • CLI integration — proper boolean flag pair with --no- prefix for explicit opt-out.

Add a dedicated --exclude-external-imports / --no-exclude-external-imports
CLI flag to control whether external vocabulary terms are included in
generated artifacts when --no-mergeimports is set.

Previously external terms leaked into JSON-LD contexts even with
--no-mergeimports. The new flag explicitly suppresses terms whose
class_uri or slot_uri belong to an imported (external) schema.

Tests cover linkml:types built-in import preservation, local file
import preservation, and interaction with mergeimports=False.

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
@jdsika jdsika force-pushed the fix/contextgen-skip-external-imports branch from 8553afa to d1adeeb Compare April 2, 2026 15:24
jdsika and others added 5 commits April 2, 2026 22:46
The docstring incorrectly stated that the flag has no effect when
mergeimports=False. In reality, external vocabulary elements can still
leak into the context via the schema_map even in that mode, and the
flag actively catches them — as verified by the existing test
test_exclude_external_imports_works_with_mergeimports_false.

Replace the inaccurate note with a correct statement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants