From 317981f48640385b7be8ca20c218c5e5659393f3 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 13:51:14 +0200 Subject: [PATCH 01/20] docs(specs): add design for issue #99 consistency-to-lint consolidation Captures the brainstormed design before implementation begins: * Two-PR rollout: PR1 adds 6 new lint rules and reconciles 3 partial- overlap rules; PR2 removes consistency_service, its routes, the worker task, and the frontend Consistency tab. * Settled rule semantics: orphan-class keeps the loose definition, undefined-parent renames to dangling-ref and expands to cover rdfs:domain and rdfs:range, duplicate-label becomes case-insensitive + same-type + all entity types. * Level placements per the issue body (orphan-individual and deprecated-parent at L2; the other four at L4). * Out of scope: discussion #87's rdflib-vs-SQL question, the duplicate detection pipeline, the cross-references endpoint. Two reconciliation choices remain pending damienriehl's input on the issue thread; this spec reflects the working decisions and will be revised if those decisions change. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- ...onsolidate-consistency-into-lint-design.md | 143 ++++++++++++++++++ 1 file changed, 143 insertions(+) create mode 100644 docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md diff --git a/docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md b/docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md new file mode 100644 index 00000000..e94e1bf9 --- /dev/null +++ b/docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md @@ -0,0 +1,143 @@ +# Issue #99 — Consolidate Consistency Checks into the Lint Rule System + +**Status:** Design approved (2026-05-03), pending @damienriehl input on `orphan-class` and `dangling-ref` reconciliation choices (logged in [issue #99 comment](https://github.com/CatholicOS/ontokit-api/issues/99#issuecomment-4364982879)). + +**Issue:** [#99](https://github.com/CatholicOS/ontokit-api/issues/99) — milestone v0.4.0 +**Related:** [Discussion #87](https://github.com/CatholicOS/ontokit-api/discussions/87) (rdflib-vs-SQL — out of scope here) +**Prerequisite:** [PR #94](https://github.com/CatholicOS/ontokit-api/pull/94) per-project lint config — merged ✓ + +## Problem + +`ontokit/services/consistency_service.py` (12 rules, ephemeral Redis cache, all-or-nothing runner) and `ontokit/services/linter.py` (19 rules, progressive levels, per-project config, persisted in PostgreSQL, entity-type-aware) both check ontology quality with significant overlap: + +- 3 fully redundant rules (`cycle-detect`, `missing-label`, `missing-comment`) +- 3 partially overlapping rules with diverging semantics (`orphan-class`, `undefined-parent` ↔ `dangling-ref`, `duplicate-label`) +- 6 rules unique to consistency that don't exist in lint + +The lint system is the more capable of the two and is where new work should land. 
Consistency_service should go away; its remaining behaviors should move into the linter. + +## Scope + +In: backend rule consolidation, route + worker + service deletion, frontend Consistency-tab removal. + +Out (explicitly): the broader rdflib-graph-vs-SQL question raised in discussion #87, the duplicate-detection (`pg_trgm`) pipeline, the cross-references endpoint. + +## Approach + +Two PRs, in order: + +1. **PR1 — `feat(lint): consolidate consistency rules into linter`.** Adds 6 new rules and reconciles 3 partial-overlap rules in `linter.py`. `consistency_service.py` and its routes are left untouched, so both pipelines run side by side and we can validate parity in production before tearing the old one down. +2. **PR2 — `chore: remove consistency_service in favor of linter`.** Hard-removes the service, its routes, the worker task, and the frontend Consistency tab. Depends on PR1 being merged. + +## PR1 — Rule Consolidation + +### 6 new rules added to `LINT_RULES` + +| `rule_id` | Name | Severity | Scope | Level | Notes | +|-----------|------|----------|-------|-------|-------| +| `unused-property` | Unused Property | warning | `["property"]` | L4 (Quality) | Declared property never used as predicate (excluding own declaration triple) | +| `orphan-individual` | Orphan Individual | warning | `["individual"]` | L2 (Consistency) | Individual's `rdf:type` target is not declared as `owl:Class` in this ontology; one finding per (individual, undeclared-type) pair | +| `empty-domain` | Empty Domain | info | `["property"]` | L4 (Quality) | `owl:ObjectProperty` or `owl:DatatypeProperty` with no `rdfs:domain` | +| `empty-range` | Empty Range | info | `["property"]` | L4 (Quality) | Same for `rdfs:range` | +| `deprecated-parent` | Deprecated Parent | warning | `["class"]` | L2 (Consistency) | Class `subClassOf` a class with `owl:deprecated true` | +| `multi-root` | Multiple Root Classes | info | `[]` (ontology-scope) | L4 (Quality) | Fires once if >5 root classes; uses 
`subject_iri=None`, `subject_type="other"` | + +Severities are taken verbatim from the originals in `consistency_service.py` so canary parity holds. Level placements match the issue body's recommendations: `orphan-individual` and `deprecated-parent` are TBox-correctness issues (the ontology is meaningfully wrong) so they belong at L2; the other four are quality/style concerns that don't break the ontology, so they stay at L4. + +`LINT_LEVEL_DEFINITIONS` descriptions get updated: +- L2 description gains "deprecated parent classes, orphan individuals" +- L4 description gains "unused properties, empty domain/range, multi-root warnings" + +### 3 partial-overlap rules — reconciliation + +**`orphan-class`** — no code change. Keep the existing lint behavior (no parent + no children). The stricter "no instances" variant from `consistency_service.py` simply disappears with PR2's removal. Stays at L2. + +> Rationale: "instance" here means individuals declared with `rdf:type Class`, and a class without individuals isn't necessarily orphan-worthy. TBox-only ontologies, abstract classes, and taxonomies declared before being populated all legitimately have classes with no individuals. Flagging those would produce false positives. + +**`undefined-parent` → `dangling-ref`** — rename and expand. + +- Rename `rule_id` from `undefined-parent` to `dangling-ref` in `LINT_RULES`, `LINT_RULES_MAP`, all level sets, and all callsites. +- `name` becomes "Dangling Reference"; description becomes "Reference to a URI not defined in the ontology (in `subClassOf`, `rdfs:domain`, or `rdfs:range`)". +- The check scans all three predicates. The well-known-namespace + `owl:imports`-derived skiplist is ported over from `consistency_service._check_dangling_ref`. +- `details` payload gains a `predicate` field so the UI can show *which* axis triggered the dangling reference. +- Stays at L1 (Critical) — domain/range dangling refs are equally fatal to reasoning as subclass dangling refs. 
+- **Historical data**: existing `LintIssue` rows with `rule_id="undefined-parent"` are left in place (they are snapshots of past runs and lint results are regenerable). Documenting this in the PR description is sufficient. UI filters by `rule_id` will not find the old name; users can re-run lint to refresh. + +**`duplicate-label`** — broaden semantics. + +- `scope`: `["class"]` → `_ALL` (now applies to classes, properties, and individuals). +- Matching key: `label_lower` → `(entity_type, label_lower, lang)` — group case-insensitively, per entity type, per language. +- A finding is emitted for each member of any group of size ≥ 2; `details.duplicates` lists the other IRIs in the group; `subject_type` is set to the entity type of the duplicate. +- Stays at L3 (Labels). + +### Files touched in PR1 + +- `ontokit/services/linter.py` — add 6 rule definitions to `LINT_RULES`, add 6 check methods to `OntologyLinter`, rename `undefined-parent` → `dangling-ref`, update `_check_duplicate_label` matching, update `LINT_LEVELS` membership and `LINT_LEVEL_DEFINITIONS` descriptions. +- `tests/unit/test_linter.py` — add per-rule test classes for new rules, extend `duplicate-label` and `dangling-ref` tests, update level-membership assertions. + +`consistency_service.py`, `ontokit/api/routes/quality.py`, `ontokit/worker.py`, and the frontend are NOT touched in PR1. + +## PR2 — Removal and Cutover + +### Backend deletions + +- **Files:** `ontokit/services/consistency_service.py`, `tests/unit/test_consistency_service.py`. The `tests/unit/test_quality_worker.py` file stays — only the `run_consistency_check_task` test class (line 59 onward) is deleted; `run_duplicate_detection_task` tests remain. 
+- **Routes in `ontokit/api/routes/quality.py`:** + - `POST /{project_id}/quality/check` → `trigger_consistency_check` (lines 64–109) + - `GET /{project_id}/quality/jobs/{job_id}` → `get_quality_job_result` (lines 112–165) — currently consistency-only despite the generic-looking path + - `GET /{project_id}/quality/issues` → `get_consistency_issues` (lines 167–204) + - **Kept**: `/entities/{iri}/references` (cross-refs, unrelated), all `/quality/duplicates/*` routes, the `/quality/ws` WebSocket — only its docstring drops the "Consistency check starts / completes / fails" line; the WS itself is a generic pubsub forwarder. +- **Worker (`ontokit/worker.py`):** delete `_parse_and_run_consistency_check` (line 65), `run_consistency_check_task` (line 643), and the `func(run_consistency_check_task, timeout=900)` entry from the ARQ task list (line 1287). +- **Schemas (`ontokit/schemas/quality.py`):** delete `ConsistencyCheckResult` and `ConsistencyIssue` (used exclusively by the consistency pipeline; verified by `grep`). +- **Route tests:** the consistency-route assertions in `tests/unit/test_quality_routes.py` (line 114 area) are deleted; duplicate-detection route tests stay. + +### Frontend deletions (`ontokit-web/`) + +- `components/editor/HealthCheckPanel.tsx`: remove the Consistency tab entirely. +- `lib/api/quality.ts`: remove the consistency-check client methods (precise names verified at implementation time). +- `lib/ontology/qualityTypes.ts`: remove `ConsistencyCheckResult` / `ConsistencyIssue` types (keep duplicate-detection types). +- `__tests__/lib/api/quality.test.ts`: remove tests for the deleted client methods. +- `__tests__/components/editor/HealthCheckPanel.test.tsx`: drop tab-switching tests for the Consistency tab. + +### External API + +No external consumers of the `/quality/check`, `/quality/jobs/{job_id}`, or `/quality/issues` endpoints are known, so all three are hard-removed in PR2 (no `410 Gone` shim, no alias-redirect). 
+ +### Cache cleanup + +Existing `consistency_check:*` Redis keys have a TTL and will expire naturally. No explicit eviction needed. + +## Testing Strategy + +### PR1 + +- **Per-rule unit tests** for each of the 6 new rules in `tests/unit/test_linter.py`, each with at least: + - A "should flag" case (minimal in-memory rdflib graph that triggers the rule) + - A "should not flag" case (minimal graph that doesn't) + - Where applicable, a `details`-shape assertion (e.g., `details.predicate` for `dangling-ref`, `details.duplicates` for `duplicate-label`, `details.root_count` for `multi-root`). +- **Reconciled rules:** + - `dangling-ref`: rename existing `undefined-parent` tests (lines 224, 242, 286) to use the new `rule_id`; add tests for `rdfs:domain` and `rdfs:range` dangling refs; assert `details.predicate` is set per axis. + - `duplicate-label`: extend existing test (line 192) with property and individual cases; add a same-label-different-type test that asserts NO finding (same-type constraint); add a case-insensitivity test. + - `orphan-class`: no test changes (behavior unchanged). +- **Level coverage:** add `test_lint_levels_include_new_rules` that asserts each new `rule_id` is in the expected level set, and `dangling-ref` is in L1 (replacing `undefined-parent`). +- **No separate parity-vs-`consistency_service` test harness** — each new rule's own unit tests effectively replicate what the corresponding `consistency_service.py` test covers. + +### PR2 + +- Delete `tests/unit/test_consistency_service.py` whole file. +- Delete consistency-route tests in `tests/unit/test_quality_routes.py` (the `run_consistency_check_task` assertion area, plus the consistency-endpoint tests around it). +- Delete the `run_consistency_check_task` test class in `tests/unit/test_quality_worker.py`; keep `run_duplicate_detection_task` tests. 
+- Frontend: update `__tests__/components/editor/HealthCheckPanel.test.tsx` to drop Consistency-tab navigation tests; update `__tests__/lib/api/quality.test.ts` to remove tests for the deleted client methods. + +### Regression + +Full `pytest tests/ -k linter` must stay green after PR1. Same for the frontend's Vitest suite after PR2. + +## Risks and Open Questions + +- **Existing baselines containing `undefined-parent` issues.** When the rule is renamed, those rows will not appear under the new `dangling-ref` filter. Mitigation: lint runs are user-triggered and regenerable — re-running lint produces fresh issues with the new `rule_id`. Documenting the rename in the PR description and changelog is sufficient. +- **The two unresolved reconciliation choices** (logged in the issue comment, awaiting @damienriehl input): + - `orphan-class`: keep loose definition vs adopt strict "no instances". + - `undefined-parent` → `dangling-ref`: rename + expand vs keep separate rules. + This document reflects the working decision (keep loose `orphan-class`, rename + expand to `dangling-ref`); it will be revised if those decisions change. +- **`multi-root`'s ontology-scope shape.** The lint schema's `subject_iri` is `str | None` and `subject_type` already accepts `"other"`, so this fits without schema changes. If a future use case wants to distinguish ontology-scope findings from class-scope findings in the UI, a dedicated `subject_type="ontology"` value can be added then. From 8c366af908e780fc4c2b958a5d46f46eb6aab9f7 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 13:58:17 +0200 Subject: [PATCH 02/20] docs(plans): add PR1 implementation plan for issue #99 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 12-task TDD plan executing the design doc's PR1 scope: 6 new rules, 3 reconciled rules, level description refresh, level-membership coverage test, full regression, PR open. 
Each task is bite-sized (write failing test → run → implement → run → commit) with concrete code, exact paths, and specific shell commands. Co-Authored-By: Claude Opus 4.7 (1M context) --- ...-05-03-issue-99-pr1-consistency-to-lint.md | 1601 +++++++++++++++++ 1 file changed, 1601 insertions(+) create mode 100644 docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md diff --git a/docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md b/docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md new file mode 100644 index 00000000..0b1ae745 --- /dev/null +++ b/docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md @@ -0,0 +1,1601 @@ +# Issue #99 PR1 — Consolidate Consistency Rules into Linter (Implementation Plan) + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add 6 new rules to the linter and reconcile 3 partial-overlap rules so the lint system covers everything `consistency_service.py` checks, while leaving `consistency_service.py` and its routes in place for canary parity. + +**Architecture:** All work lands in `ontokit/services/linter.py` (rule definitions + `OntologyLinter` check methods + level membership) and `tests/unit/test_linter.py`. The linter's existing dispatch convention turns a `rule_id` like `dangling-ref` into a method named `_check_dangling_ref` via `f"_check_{rule_id.replace('-', '_')}"` (linter.py:299). Each rule is a small `async` method that returns `list[LintResult]`. Helpers `_determine_entity_type`, `_get_local_name`, and `_get_label` are reused; `is_deprecated` is imported from `ontokit.services.rdf_utils`. + +**Tech Stack:** Python 3.11+, RDFLib 7.1+, pytest with `asyncio_mode="auto"`, ruff (line length 100), mypy strict. 
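The dispatch convention named under Architecture can be illustrated with a minimal sketch. `MiniLinter` and `check_method_name` are hypothetical names used only for illustration, and skipping a rule that has no matching method is an assumption: the plan does not say how `OntologyLinter` handles that case.

```python
import asyncio


def check_method_name(rule_id: str) -> str:
    """Map a rule_id such as 'dangling-ref' to '_check_dangling_ref'."""
    return f"_check_{rule_id.replace('-', '_')}"


class MiniLinter:
    """Hypothetical stand-in for OntologyLinter; not the real class."""

    def __init__(self, enabled_rules: set[str]) -> None:
        self.enabled_rules = enabled_rules

    async def _check_dangling_ref(self, graph: object) -> list[str]:
        # A real check would inspect the graph; this one returns a fixed marker.
        return ["dangling-ref"]

    async def lint(self, graph: object) -> list[str]:
        results: list[str] = []
        for rule_id in sorted(self.enabled_rules):
            method = getattr(self, check_method_name(rule_id), None)
            if method is None:
                # Assumption: a rule with no matching method is skipped.
                continue
            results.extend(await method(graph))
        return results
```

Under this convention, adding a rule amounts to one `LintRuleInfo` entry plus one `_check_*` method whose name is derived from the `rule_id`, which is the shape each task in this plan follows.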
+ +**Spec:** `docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md` + +**Branch:** `feat/issue-99-consolidate-consistency` (already created, spec already committed as `317981f`). + +--- + +## File Structure + +| File | Role | Change shape | +|------|------|--------------| +| `ontokit/services/linter.py` | Rule definitions, level sets, `OntologyLinter` check methods | Add 6 `LintRuleInfo` entries; add 6 `_check_*` methods; rename one method; modify two methods; update level membership and level descriptions | +| `tests/unit/test_linter.py` | Per-rule unit tests | Add ~14 new tests; rename ~3 existing; update one level-membership assertion | +| `ontokit/services/rdf_utils.py` | `is_deprecated` helper | Import only — no change to file | + +`consistency_service.py`, route files, worker, schemas, and frontend are NOT touched in PR1. + +--- + +## Task 1: Add `unused-property` rule (warning, L4) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# unused-property +# --------------------------------------------------------------------------- + + +async def test_unused_property_flags_property_with_no_usage() -> None: + """An ObjectProperty declared but never used as a predicate is flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + # No (?, EX.knows, ?) triples anywhere. 
+ + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "unused-property") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].subject_type == "property" + + +async def test_unused_property_does_not_flag_used_property() -> None: + """A property used as a predicate at least once is not flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.Alice, EX.knows, EX.Bob)) + + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "unused-property") == [] + + +async def test_unused_property_covers_datatype_and_annotation_properties() -> None: + """DatatypeProperty and AnnotationProperty are also covered.""" + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.note, RDF.type, OWL.AnnotationProperty)) + + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + flagged_iris = {r.subject_iri for r in _results_with_rule(issues, "unused-property")} + assert flagged_iris == {str(EX.age), str(EX.note)} +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k unused_property -v --no-cov +``` + +Expected: 3 failures, each because `OntologyLinter` produces no `unused-property` results (the rule does not exist yet). 
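The tests in Step 1 call `_results_with_rule`, which lives in the existing test module and is not shown in this plan. A helper with that role presumably filters results by `rule_id`; the sketch below is an assumption about its shape, with `FakeResult` and `results_with_rule` as hypothetical names rather than the project's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for LintResult with only the fields used here."""

    rule_id: str
    subject_iri: Optional[str] = None


def results_with_rule(results: list[FakeResult], rule_id: str) -> list[FakeResult]:
    """Return the results whose rule_id matches, preserving input order."""
    return [r for r in results if r.rule_id == rule_id]
```

Filtering by `rule_id` keeps each test's assertions independent of any other rules that happen to be enabled in the same run.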
+ +- [ ] **Step 3: Add the `LintRuleInfo` entry and level membership** + +In `ontokit/services/linter.py`, in the `LINT_RULES` list (currently ends ~line 184), append a new entry just before the closing `]`: + +```python + LintRuleInfo( + rule_id="unused-property", + name="Unused Property", + description="Property is declared but never used as a predicate in any triple", + severity=LintIssueType.WARNING.value, + scope=["property"], + ), +``` + +Then add `"unused-property"` to the L4 set at `_LEVEL_4_RULES` (~line 204): + +```python +_LEVEL_4_RULES: set[str] = _LEVEL_3_RULES | { + "missing-comment", + "label-per-language", + "redundant-regional-label", + "unused-property", +} +``` + +- [ ] **Step 4: Implement the check method** + +Add this method to `OntologyLinter` immediately after `_check_redundant_regional_label` (look for the comment `# Static helpers` or the `_determine_entity_type` static method around line 1297; insert just before that line): + +```python + async def _check_unused_property(self, graph: Graph) -> list[LintResult]: + """Find declared properties that are never used as a predicate.""" + issues: list[LintResult] = [] + property_types = ( + OWL.ObjectProperty, + OWL.DatatypeProperty, + OWL.AnnotationProperty, + RDF.Property, + ) + seen: set[URIRef] = set() + for prop_type in property_types: + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef) or prop in seen: + continue + seen.add(prop) + # `subjects(prop, None)` yields the subject of every triple whose + # predicate is `prop`. The declaration triple has `rdf:type` as its + # predicate, so it never shows up here; the `s != prop` guard only + # discounts self-referential triples of the form `(prop, prop, ?)`. 
+ used = any(s != prop for s in graph.subjects(prop, None)) + if not used: + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="unused-property", + message="Property is declared but never used as a predicate", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k unused_property -v --no-cov +``` + +Expected: 3 PASSED. + +- [ ] **Step 6: Run ruff + mypy** + +```bash +.venv/bin/ruff check ontokit/services/linter.py tests/unit/test_linter.py +.venv/bin/ruff format --check ontokit/services/linter.py tests/unit/test_linter.py +.venv/bin/mypy ontokit/services/linter.py +``` + +Expected: all clean. + +- [ ] **Step 7: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add unused-property rule (#99) + +Flags properties (ObjectProperty / DatatypeProperty / AnnotationProperty / +rdf:Property) declared in the ontology but never used as a predicate in +any triple. Mirrors consistency_service._check_unused_property; lives at +L4 (Quality)." 
+``` + +--- + +## Task 2: Add `orphan-individual` rule (warning, L2) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# orphan-individual +# --------------------------------------------------------------------------- + + +async def test_orphan_individual_flags_undeclared_type() -> None: + """Individual whose rdf:type target is not declared as owl:Class is flagged.""" + g = Graph() + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) # EX.Person is NOT declared as a class + + linter = OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "orphan-individual") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.Alice) + assert matches[0].subject_type == "individual" + assert matches[0].details is not None + assert matches[0].details["undeclared_type"] == str(EX.Person) + + +async def test_orphan_individual_does_not_flag_declared_type() -> None: + """Individual whose rdf:type target is a declared class is not flagged.""" + g = Graph() + g.add((EX.Person, RDF.type, OWL.Class)) + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) + + linter = OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "orphan-individual") == [] + + +async def test_orphan_individual_emits_one_finding_per_undeclared_type() -> None: + """An individual with two undeclared types yields two findings.""" + g = Graph() + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) + g.add((EX.Alice, RDF.type, EX.Employee)) + + linter = 
OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "orphan-individual") + flagged = {(m.subject_iri, m.details["undeclared_type"]) for m in matches if m.details} + assert flagged == { + (str(EX.Alice), str(EX.Person)), + (str(EX.Alice), str(EX.Employee)), + } +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k orphan_individual -v --no-cov +``` + +Expected: 3 failures. + +- [ ] **Step 3: Add the `LintRuleInfo` entry and level membership** + +In `ontokit/services/linter.py`, append to `LINT_RULES`: + +```python + LintRuleInfo( + rule_id="orphan-individual", + name="Orphan Individual", + description="Individual's rdf:type target is not declared as owl:Class in this ontology", + severity=LintIssueType.WARNING.value, + scope=["individual"], + ), +``` + +Then add `"orphan-individual"` to `_LEVEL_2_RULES`: + +```python +_LEVEL_2_RULES: set[str] = _LEVEL_1_RULES | { + "orphan-class", + "duplicate-triple", + "disjoint-violation", + "missing-type-declaration", + "orphan-individual", +} +``` + +- [ ] **Step 4: Implement the check method** + +Add this method to `OntologyLinter` (insert before `_determine_entity_type`): + +```python + async def _check_orphan_individual(self, graph: Graph) -> list[LintResult]: + """Flag individuals whose rdf:type target is not declared as owl:Class.""" + issues: list[LintResult] = [] + declared_classes = { + c + for c in graph.subjects(RDF.type, OWL.Class) + if isinstance(c, URIRef) + } + # owl:Thing is implicitly a class even if not declared. 
+ declared_classes.add(OWL.Thing) + + for ind in graph.subjects(RDF.type, OWL.NamedIndividual): + if not isinstance(ind, URIRef): + continue + for type_target in graph.objects(ind, RDF.type): + if not isinstance(type_target, URIRef): + continue + if type_target == OWL.NamedIndividual: + continue + if type_target in declared_classes: + continue + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="orphan-individual", + message=f"Individual's type {type_target} is not declared as owl:Class", + subject_iri=str(ind), + subject_type="individual", + details={ + "local_name": self._get_local_name(ind), + "undeclared_type": str(type_target), + "undeclared_type_local": self._get_local_name(type_target), + }, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k orphan_individual -v --no-cov +``` + +Expected: 3 PASSED. + +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add orphan-individual rule (#99) + +Flags individuals whose rdf:type target is not declared as owl:Class +in this ontology. Mirrors consistency_service._check_orphan_individual; +lives at L2 (Consistency)." 
+``` + +--- + +## Task 3: Add `empty-domain` rule (info, L4) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# empty-domain +# --------------------------------------------------------------------------- + + +async def test_empty_domain_flags_object_property_without_domain() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-domain") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].subject_type == "property" + + +async def test_empty_domain_flags_datatype_property_without_domain() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-domain") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.age) + + +async def test_empty_domain_does_not_flag_property_with_domain() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.domain, EX.Person)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "empty-domain") == [] + + +async def test_empty_domain_does_not_flag_annotation_property() -> None: + """AnnotationProperty is excluded from the empty-domain check.""" + g = Graph() + g.add((EX.note, RDF.type, OWL.AnnotationProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, 
"empty-domain") == [] +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k empty_domain -v --no-cov +``` + +Expected: 4 failures. + +- [ ] **Step 3: Add the `LintRuleInfo` entry and level membership** + +Append to `LINT_RULES`: + +```python + LintRuleInfo( + rule_id="empty-domain", + name="Empty Domain", + description="ObjectProperty or DatatypeProperty has no rdfs:domain", + severity=LintIssueType.INFO.value, + scope=["property"], + ), +``` + +Add `"empty-domain"` to `_LEVEL_4_RULES`. + +- [ ] **Step 4: Implement the check method** + +Add to `OntologyLinter` before `_determine_entity_type`: + +```python + async def _check_empty_domain(self, graph: Graph) -> list[LintResult]: + """Flag ObjectProperty/DatatypeProperty declarations with no rdfs:domain.""" + issues: list[LintResult] = [] + for prop_type in (OWL.ObjectProperty, OWL.DatatypeProperty): + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef): + continue + if any(graph.objects(prop, RDFS.domain)): + continue + issues.append( + LintResult( + issue_type=LintIssueType.INFO.value, + rule_id="empty-domain", + message="Property has no rdfs:domain", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k empty_domain -v --no-cov +``` + +Expected: 4 PASSED. + +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add empty-domain rule (#99) + +Flags ObjectProperty and DatatypeProperty declarations with no +rdfs:domain. AnnotationProperty is intentionally excluded (annotations +are by convention domain-agnostic). L4 (Quality)." 
+``` + +--- + +## Task 4: Add `empty-range` rule (info, L4) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# empty-range +# --------------------------------------------------------------------------- + + +async def test_empty_range_flags_object_property_without_range() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + + linter = OntologyLinter(enabled_rules={"empty-range"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-range") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri == str(EX.knows) + + +async def test_empty_range_does_not_flag_property_with_range() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.age, RDFS.range, XSD.integer)) + + linter = OntologyLinter(enabled_rules={"empty-range"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "empty-range") == [] +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k empty_range -v --no-cov +``` + +Expected: 2 failures. + +- [ ] **Step 3: Add the `LintRuleInfo` entry and level membership** + +Append to `LINT_RULES`: + +```python + LintRuleInfo( + rule_id="empty-range", + name="Empty Range", + description="ObjectProperty or DatatypeProperty has no rdfs:range", + severity=LintIssueType.INFO.value, + scope=["property"], + ), +``` + +Add `"empty-range"` to `_LEVEL_4_RULES`. 
+ +- [ ] **Step 4: Implement the check method** + +```python + async def _check_empty_range(self, graph: Graph) -> list[LintResult]: + """Flag ObjectProperty/DatatypeProperty declarations with no rdfs:range.""" + issues: list[LintResult] = [] + for prop_type in (OWL.ObjectProperty, OWL.DatatypeProperty): + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef): + continue + if any(graph.objects(prop, RDFS.range)): + continue + issues.append( + LintResult( + issue_type=LintIssueType.INFO.value, + rule_id="empty-range", + message="Property has no rdfs:range", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k empty_range -v --no-cov +``` + +Expected: 2 PASSED. + +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add empty-range rule (#99) + +Flags ObjectProperty and DatatypeProperty declarations with no +rdfs:range. AnnotationProperty intentionally excluded. L4 (Quality)." 
+``` + +--- + +## Task 5: Add `deprecated-parent` rule (warning, L2) + +**Files:** +- Modify: `ontokit/services/linter.py` (also adds an `is_deprecated` import) +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# deprecated-parent +# --------------------------------------------------------------------------- + + +async def test_deprecated_parent_flags_subclass_of_deprecated_class() -> None: + g = Graph() + g.add((EX.OldThing, RDF.type, OWL.Class)) + g.add((EX.OldThing, OWL.deprecated, Literal(True))) + g.add((EX.NewThing, RDF.type, OWL.Class)) + g.add((EX.NewThing, RDFS.subClassOf, EX.OldThing)) + + linter = OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "deprecated-parent") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.NewThing) + assert matches[0].subject_type == "class" + assert matches[0].details is not None + assert matches[0].details["deprecated_parent"] == str(EX.OldThing) + + +async def test_deprecated_parent_does_not_flag_non_deprecated_parent() -> None: + g = Graph() + g.add((EX.Animal, RDF.type, OWL.Class)) + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.Animal)) + + linter = OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "deprecated-parent") == [] + + +async def test_deprecated_parent_recognizes_string_true() -> None: + """is_deprecated accepts case-insensitive 'true' / '1' literals.""" + g = Graph() + g.add((EX.OldThing, RDF.type, OWL.Class)) + g.add((EX.OldThing, OWL.deprecated, Literal("true"))) + g.add((EX.NewThing, RDF.type, OWL.Class)) + g.add((EX.NewThing, RDFS.subClassOf, EX.OldThing)) + + linter = 
OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + assert len(_results_with_rule(issues, "deprecated-parent")) == 1 +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k deprecated_parent -v --no-cov +``` + +Expected: 3 failures. + +- [ ] **Step 3: Add the import, `LintRuleInfo` entry, and level membership** + +In `ontokit/services/linter.py`, add the import near the other internal imports (line 14 area): + +```python +from ontokit.models.lint import LintIssueType +from ontokit.services.rdf_utils import is_deprecated +``` + +Append to `LINT_RULES`: + +```python + LintRuleInfo( + rule_id="deprecated-parent", + name="Deprecated Parent", + description="Class subclasses a class marked owl:deprecated", + severity=LintIssueType.WARNING.value, + scope=["class"], + ), +``` + +Add `"deprecated-parent"` to `_LEVEL_2_RULES`. + +- [ ] **Step 4: Implement the check method** + +```python + async def _check_deprecated_parent(self, graph: Graph) -> list[LintResult]: + """Flag classes that subclass an owl:deprecated class.""" + issues: list[LintResult] = [] + for cls in graph.subjects(RDF.type, OWL.Class): + if not isinstance(cls, URIRef): + continue + for parent in graph.objects(cls, RDFS.subClassOf): + if not isinstance(parent, URIRef): + continue + if not is_deprecated(graph, parent): + continue + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="deprecated-parent", + message=f"Parent class {parent} is deprecated", + subject_iri=str(cls), + subject_type="class", + details={ + "local_name": self._get_local_name(cls), + "deprecated_parent": str(parent), + "deprecated_parent_local": self._get_local_name(parent), + }, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k deprecated_parent -v --no-cov +``` + +Expected: 3 PASSED. 
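The third test in Step 1 depends on `is_deprecated` treating boolean and string literals alike. The real helper lives in `ontokit/services/rdf_utils.py` and is not reproduced in this plan. What follows is only a stdlib sketch of that kind of normalization; the name `looks_deprecated` and the accepted set (`true` / `1`, case-insensitive) are assumptions taken from the test docstring, not the helper's actual code.

```python
def looks_deprecated(value: object) -> bool:
    """Hypothetical stand-in for a deprecated-flag check; not the real is_deprecated."""
    # A native boolean is taken at face value.
    if isinstance(value, bool):
        return value
    # Assumed rule: other values count when their string form reads
    # "true" or "1", ignoring case and surrounding whitespace.
    return str(value).strip().lower() in {"true", "1"}


checks = {
    "bool_true": looks_deprecated(True),
    "string_true": looks_deprecated("TRUE"),
    "string_one": looks_deprecated("1"),
    "string_false": looks_deprecated("false"),
}
```

If the real helper accepts a different set of spellings, the string-literal test should be adjusted to match it rather than this sketch.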
+ +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add deprecated-parent rule (#99) + +Flags classes that subclass an owl:deprecated class. Reuses the +shared is_deprecated helper from rdf_utils which accepts both boolean +and string literal forms. L2 (Consistency)." +``` + +--- + +## Task 6: Add `multi-root` rule (info, L4) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# multi-root +# --------------------------------------------------------------------------- + + +async def test_multi_root_does_not_fire_below_threshold() -> None: + """Five or fewer root classes does NOT fire.""" + g = Graph() + for i in range(5): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "multi-root") == [] + + +async def test_multi_root_fires_above_threshold() -> None: + """Six root classes triggers a single ontology-scope finding.""" + g = Graph() + for i in range(6): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "multi-root") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri is None + assert matches[0].subject_type == "other" + assert matches[0].details is not None + assert matches[0].details["root_count"] == 6 + + +async def test_multi_root_excludes_classes_with_explicit_parent() -> None: + """Classes with a non-owl:Thing parent don't count as roots.""" + g = Graph() + g.add((EX.Animal, RDF.type, OWL.Class)) 
+ # EX.Animal plus the 5 generated classes below are roots: 6 in total. + for i in range(5): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.Animal)) + # EX.Dog has an explicit parent, so it is not a root. Six roots exceeds + # the threshold, so the rule fires; verify the count excludes EX.Dog. + + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "multi-root") + assert len(matches) == 1 + assert matches[0].details is not None + assert matches[0].details["root_count"] == 6 + assert str(EX.Dog) not in matches[0].details["root_iris"] +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k multi_root -v --no-cov +``` + +Expected: 3 failures. + +- [ ] **Step 3: Add the `LintRuleInfo` entry and level membership** + +Append to `LINT_RULES`: + +```python + LintRuleInfo( + rule_id="multi-root", + name="Multiple Root Classes", + description="Ontology has more than 5 root classes (classes with no parent except owl:Thing)", + severity=LintIssueType.INFO.value, + scope=[], + ), +``` + +Add `"multi-root"` to `_LEVEL_4_RULES`. 
+ +- [ ] **Step 4: Implement the check method** + +```python + async def _check_multi_root(self, graph: Graph) -> list[LintResult]: + """Fire once if the ontology has more than 5 root classes.""" + root_iris: list[str] = [] + for cls in graph.subjects(RDF.type, OWL.Class): + if not isinstance(cls, URIRef) or cls == OWL.Thing: + continue + has_real_parent = any( + isinstance(p, URIRef) and p != OWL.Thing + for p in graph.objects(cls, RDFS.subClassOf) + ) + if not has_real_parent: + root_iris.append(str(cls)) + + if len(root_iris) <= 5: + return [] + + return [ + LintResult( + issue_type=LintIssueType.INFO.value, + rule_id="multi-root", + message=f"Ontology has {len(root_iris)} root classes (classes with no parent)", + subject_iri=None, + subject_type="other", + details={ + "root_count": len(root_iris), + # Cap at 20 to keep the payload small even on huge ontologies. + "root_iris": sorted(root_iris)[:20], + }, + ) + ] +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k multi_root -v --no-cov +``` + +Expected: 3 PASSED. + +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): add multi-root rule (#99) + +Fires once when an ontology has more than 5 root classes (classes with +no parent except owl:Thing). Ontology-scope finding: subject_iri=None, +subject_type='other'. L4 (Quality)." 
+``` + +--- + +## Task 7: Rename `undefined-parent` → `dangling-ref` (no behavior change) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Update the existing tests to expect the new rule_id** + +In `tests/unit/test_linter.py`, find the five callsites referencing `"undefined-parent"`: +- Line 224 (`enabled_rules={"undefined-parent"}`) +- Line 227 (`_results_with_rule(issues, "undefined-parent")`) +- Line 242 (`enabled_rules={"undefined-parent"}`) +- Line 245 (`_results_with_rule(issues, "undefined-parent")`) +- Line 286 (inside an `enabled_rules={...}` set literal) + +Replace every occurrence of the string `"undefined-parent"` in this file with `"dangling-ref"`. + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k undefined_parent -v --no-cov +.venv/bin/pytest tests/unit/test_linter.py -k dangling_ref -v --no-cov +``` + +Expected: the tests now collect under `dangling_ref` and FAIL because the rule_id `dangling-ref` does not exist yet (linter still emits `undefined-parent`). + +- [ ] **Step 3: Rename the rule in `linter.py`** + +In `ontokit/services/linter.py`: + +a. In `LINT_RULES` (~line 70), change the existing entry to: + +```python + LintRuleInfo( + rule_id="dangling-ref", + name="Dangling Reference", + description="Reference to a URI not defined in the ontology (in subClassOf, rdfs:domain, or rdfs:range)", + severity=LintIssueType.ERROR.value, + scope=["class", "property"], + ), +``` + +b. In `_LEVEL_1_RULES` (~line 190), replace `"undefined-parent"` with `"dangling-ref"`: + +```python +_LEVEL_1_RULES: set[str] = {"dangling-ref", "circular-hierarchy", "undefined-prefix"} +``` + +c. Rename the method `_check_undefined_parent` to `_check_dangling_ref` (line 395). Inside the method, change the `rule_id="undefined-parent"` literal to `rule_id="dangling-ref"`. **Do not yet expand it to cover domain/range** — that's Task 8. + +d. 
Update the L1 description in `LINT_LEVEL_DEFINITIONS` (~line 233) to reference dangling references instead of undefined parents: + +```python + 1: LintLevelDefinition( + "Critical", + "Dangling references, circular hierarchies, undefined prefixes", + LINT_LEVELS[1], + ), +``` + +- [ ] **Step 4: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k dangling_ref -v --no-cov +``` + +Expected: the renamed tests PASS. + +- [ ] **Step 5: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "refactor(lint): rename undefined-parent to dangling-ref (#99) + +Pure rename. Behavior unchanged — the rule still only checks +subClassOf targets. Predicate-axis expansion comes in the next +commit. Existing LintIssue rows with rule_id='undefined-parent' are +left in place; they're snapshots of past runs and lint results are +regenerable." +``` + +--- + +## Task 8: Expand `dangling-ref` to cover `rdfs:domain` and `rdfs:range` + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# dangling-ref (domain/range expansion) +# --------------------------------------------------------------------------- + + +async def test_dangling_ref_flags_undefined_domain() -> None: + """Property whose rdfs:domain points to an undeclared URI is flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.domain, EX.UndeclaredClass)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].details is not None + assert matches[0].details["predicate"] == 
str(RDFS.domain) + assert matches[0].details["dangling_target"] == str(EX.UndeclaredClass) + + +async def test_dangling_ref_flags_undefined_range() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.age, RDFS.range, EX.UndeclaredDatatype)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.age) + assert matches[0].details is not None + assert matches[0].details["predicate"] == str(RDFS.range) + + +async def test_dangling_ref_subclassof_includes_predicate_detail() -> None: + """The existing subClassOf path now also reports details.predicate.""" + g = Graph() + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.UndeclaredAnimal)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].details is not None + assert matches[0].details["predicate"] == str(RDFS.subClassOf) + + +async def test_dangling_ref_skips_well_known_namespaces() -> None: + """References into rdf/rdfs/owl/xsd/skos/dcterms must not be flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.range, XSD.string)) + g.add((EX.related, RDF.type, OWL.ObjectProperty)) + g.add((EX.related, RDFS.range, SKOS.Concept)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "dangling-ref") == [] + + +async def test_dangling_ref_skips_imported_namespaces() -> None: + """References into namespaces declared via owl:imports must not be flagged.""" + g = Graph() + imported_ns = URIRef("http://other.org/onto") + g.add((URIRef("http://example.org/myonto"), OWL.imports, imported_ns)) + g.add((EX.knows, RDF.type, 
OWL.ObjectProperty)) + g.add((EX.knows, RDFS.range, URIRef("http://other.org/onto/Person"))) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "dangling-ref") == [] +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k dangling_ref -v --no-cov +``` + +Expected: 3 of the 5 new tests fail (`flags_undefined_domain`, `flags_undefined_range`, `subclassof_includes_predicate_detail`) because the rule still only inspects subClassOf and has no `details.predicate` field. The two `skips_*` tests should already pass, since the current rule never looks at rdfs:range; they guard the exclusions added in Step 3. Existing dangling-ref tests still pass. + +- [ ] **Step 3: Replace `_check_dangling_ref` with the expanded version** + +In `ontokit/services/linter.py`, replace the entire `_check_dangling_ref` method (lines roughly 395–434) with: + +```python + async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: + """Find references to URIs that aren't declared in this ontology. + + Scans rdfs:subClassOf, rdfs:domain, and rdfs:range. References into + well-known vocabularies (rdf/rdfs/owl/xsd/skos/dcterms/dc) and into + namespaces brought in via owl:imports are not flagged. + """ + issues: list[LintResult] = [] + + # A URI is "known" if it appears as a subject of any rdf:type triple + # OR as a subject of any triple at all (covers blank-node-free uses). 
+ declared_subjects: set[URIRef] = { + s for s in graph.subjects(RDF.type, None) if isinstance(s, URIRef) + } + all_subjects: set[URIRef] = { + s for s in graph.subjects() if isinstance(s, URIRef) + } + known: set[URIRef] = declared_subjects | all_subjects | {OWL.Thing} + + well_known_ns = { + str(RDF), + str(RDFS), + str(OWL), + str(XSD), + str(SKOS), + str(DC), + str(DCTERMS), + } + imported_ns: set[str] = set() + for _ontology, _pred, imported in graph.triples((None, OWL.imports, None)): + if isinstance(imported, URIRef): + imp_str = str(imported) + if not imp_str.endswith(("/", "#")): + imp_str += "/" + imported_ns.add(imp_str) + external_ns = well_known_ns | imported_ns + + # (subject_iri, predicate, target) keyed reporting to deduplicate + # when the same triple would be reported by multiple iterations. + reported: set[tuple[str, str, str]] = set() + + for predicate in (RDFS.subClassOf, RDFS.domain, RDFS.range): + for subj, _p, obj in graph.triples((None, predicate, None)): + if not isinstance(obj, URIRef) or not isinstance(subj, URIRef): + continue + if obj == OWL.Thing or obj in known: + continue + obj_str = str(obj) + if any(obj_str.startswith(ns) for ns in external_ns): + continue + key = (str(subj), str(predicate), obj_str) + if key in reported: + continue + reported.add(key) + issues.append( + LintResult( + issue_type=LintIssueType.ERROR.value, + rule_id="dangling-ref", + message=f"References undeclared entity {obj}", + subject_iri=str(subj), + subject_type=self._determine_entity_type(graph, subj), + details={ + "local_name": self._get_local_name(subj), + "predicate": str(predicate), + "dangling_target": obj_str, + "dangling_target_local": self._get_local_name(obj), + }, + ) + ) + return issues +``` + +- [ ] **Step 4: Run all dangling-ref tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k dangling_ref -v --no-cov +``` + +Expected: every test passes (the renamed ones from Task 7 plus the 5 new ones). 
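The import handling in Step 3 turns each `owl:imports` target into a string prefix and then relies on `str.startswith`. A stdlib-only sketch of just that normalization (the names `import_prefix` and `is_external` are illustrative and not part of `linter.py`) makes one consequence visible: an import IRI without a trailing separator gets `/` appended, so terms minted with `#` under the same IRI would not match the prefix.

```python
from __future__ import annotations


def import_prefix(imported_iri: str) -> str:
    """Mirror the plan's normalization: make sure the prefix ends in '/' or '#'."""
    if imported_iri.endswith(("/", "#")):
        return imported_iri
    return imported_iri + "/"


def is_external(target_iri: str, prefixes: set[str]) -> bool:
    """True when the target IRI falls under any known external prefix."""
    return any(target_iri.startswith(prefix) for prefix in prefixes)


prefixes = {import_prefix("http://other.org/onto")}

# A slash-style term under the imported IRI is treated as external.
slash_term_skipped = is_external("http://other.org/onto/Person", prefixes)
# A hash-style term under the same IRI is not covered by the "/" prefix.
hash_term_skipped = is_external("http://other.org/onto#Person", prefixes)
```

If imported ontologies in practice use hash namespaces, the prefix set may need both variants; the tests in Step 1 only exercise the slash form.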
+ +- [ ] **Step 5: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): expand dangling-ref to cover domain and range (#99) + +The rule now inspects rdfs:subClassOf, rdfs:domain, and rdfs:range +targets uniformly. Each finding carries details.predicate so the UI +can show which axis triggered the dangling reference. References into +well-known namespaces (rdf/rdfs/owl/xsd/skos/dc/dcterms) and into +namespaces declared via owl:imports are skipped, mirroring +consistency_service._check_dangling_ref." +``` + +--- + +## Task 9: Broaden `duplicate-label` semantics (case-insensitive + same-type + all entity types) + +**Files:** +- Modify: `ontokit/services/linter.py` +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the failing tests** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# duplicate-label (broader semantics) +# --------------------------------------------------------------------------- + + +async def test_duplicate_label_case_insensitive_within_classes() -> None: + """Existing behavior preserved: classes with same label (any case) flagged.""" + g = Graph() + g.add((EX.A, RDF.type, OWL.Class)) + g.add((EX.A, RDFS.label, Literal("Animal", lang="en"))) + g.add((EX.B, RDF.type, OWL.Class)) + g.add((EX.B, RDFS.label, Literal("ANIMAL", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.A), str(EX.B)} + + +async def test_duplicate_label_flags_property_duplicates() -> None: + """Two ObjectProperties sharing a label (case-insensitive) are flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.label, Literal("knows", lang="en"))) + g.add((EX.acquaintedWith, RDF.type, 
OWL.ObjectProperty)) + g.add((EX.acquaintedWith, RDFS.label, Literal("Knows", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.knows), str(EX.acquaintedWith)} + for m in matches: + assert m.subject_type == "property" + + +async def test_duplicate_label_flags_individual_duplicates() -> None: + g = Graph() + g.add((EX.Person, RDF.type, OWL.Class)) + g.add((EX.alice1, RDF.type, EX.Person)) + g.add((EX.alice1, RDFS.label, Literal("Alice", lang="en"))) + g.add((EX.alice2, RDF.type, EX.Person)) + g.add((EX.alice2, RDFS.label, Literal("alice", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.alice1), str(EX.alice2)} + + +async def test_duplicate_label_does_not_flag_across_entity_types() -> None: + """A class and a property sharing a label are NOT cross-flagged.""" + g = Graph() + g.add((EX.Knows, RDF.type, OWL.Class)) + g.add((EX.Knows, RDFS.label, Literal("knows", lang="en"))) + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.label, Literal("Knows", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "duplicate-label") == [] + + +async def test_duplicate_label_separates_languages() -> None: + """Same label string in different languages is not a duplicate.""" + g = Graph() + g.add((EX.A, RDF.type, OWL.Class)) + g.add((EX.A, RDFS.label, Literal("Hund", lang="de"))) + g.add((EX.B, RDF.type, OWL.Class)) + g.add((EX.B, RDFS.label, Literal("Hund", lang="en"))) # English happens to coincide + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, 
PROJECT_ID) + + assert _results_with_rule(issues, "duplicate-label") == [] +``` + +- [ ] **Step 2: Update the existing duplicate-label tests if needed** + +Look at the existing `test_duplicate_label` test (around line 192). Currently both classes share a label and are flagged. Verify this still passes after the implementation change — no test edits expected, but if the existing test asserted `subject_type == "class"` exactly, that should still hold. + +- [ ] **Step 3: Run the tests to verify they fail** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k duplicate_label -v --no-cov +``` + +Expected: four of the 5 new tests fail, because the current implementation misses non-class entity types (`flags_property_duplicates`, `flags_individual_duplicates`), cross-flags across types (`does_not_flag_across_entity_types`), or ignores language tags (`separates_languages`). `case_insensitive_within_classes` covers existing behavior and should already pass. + +- [ ] **Step 4: Replace `_check_duplicate_label` with the broadened version** + +In `ontokit/services/linter.py`, first check the rule's `LintRuleInfo.scope` (lines 91-95). If it is already `_ALL`, no scope change is needed; if it is `["class"]`, change it to `_ALL`. The entry should then read: + +```python + LintRuleInfo( + rule_id="duplicate-label", + name="Duplicate Label", + description="Multiple resources of the same entity type share the same label (case-insensitive, per language)", + severity=LintIssueType.WARNING.value, + scope=_ALL, + ), +``` + +Replace the entire `_check_duplicate_label` method (line 524 area): + +```python + async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]: + """Find resources of the same entity type sharing a label (case-insensitive, per language).""" + issues: list[LintResult] = [] + + # Group by (entity_type, label_lower, lang) → list of resource IRIs. + # Skip resources whose entity_type is "other" — we only group concrete + # types that the schema knows how to navigate. 
+ groups: dict[tuple[str, str, str | None], list[str]] = defaultdict(list) + original_label_for: dict[str, str] = {} + + for subject in self._uri_subjects: + etype = self._determine_entity_type(graph, subject) + if etype == "other": + continue + for label in graph.objects(subject, RDFS.label): + if not isinstance(label, RDFLiteral): + continue + label_str = str(label).strip() + if not label_str: + continue + key = (etype, label_str.lower(), label.language) + groups[key].append(str(subject)) + original_label_for.setdefault(str(subject), label_str) + + reported_iris: set[str] = set() + for (_etype, _lower, lang), iris in groups.items(): + if len(iris) < 2: + continue + for iri in iris: + if iri in reported_iris: + continue + reported_iris.add(iri) + others = [o for o in iris if o != iri] + lang_str = f"@{lang}" if lang else "" + shown_label = original_label_for[iri] + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="duplicate-label", + message=( + f'Label "{shown_label}"{lang_str} is shared with ' + f"{len(others)} other resource(s) of the same type" + ), + subject_iri=iri, + subject_type=self._determine_entity_type(graph, URIRef(iri)), + details={ + "local_name": self._get_local_name(URIRef(iri)), + "label": shown_label, + "language": lang, + "duplicate_iris": others[:5], + "total_duplicates": len(others), + }, + ) + ) + return issues +``` + +- [ ] **Step 5: Run the tests to verify they pass** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k duplicate_label -v --no-cov +``` + +Expected: all duplicate-label tests pass (existing + 5 new). 
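The grouping key is what separates the broadened rule from the class-only version. A stdlib sketch, using plain tuples in place of rdflib terms (the `rows` data and the `duplicate_groups` name are made up for illustration), groups by `(entity_type, lowercased label, language)` and keeps only groups with two or more members.

```python
from __future__ import annotations

from collections import defaultdict


def duplicate_groups(
    rows: list[tuple[str, str, str, str | None]],
) -> dict[tuple[str, str, str | None], list[str]]:
    """Group (iri, entity_type, label, lang) rows and return only the colliding groups."""
    groups: dict[tuple[str, str, str | None], list[str]] = defaultdict(list)
    for iri, etype, label, lang in rows:
        text = label.strip()
        if not text:
            continue  # empty labels never count as duplicates
        groups[(etype, text.lower(), lang)].append(iri)
    return {key: iris for key, iris in groups.items() if len(iris) >= 2}


rows = [
    ("ex:A", "class", "Animal", "en"),
    ("ex:B", "class", "ANIMAL", "en"),        # same type, same language, differs only by case
    ("ex:knows", "property", "animal", "en"),  # different entity type
    ("ex:C", "class", "Animal", "de"),         # different language
]
result = duplicate_groups(rows)
```

Only `ex:A` and `ex:B` end up in a shared group here; the property and the German-tagged class land in groups of their own and are dropped.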
+ +- [ ] **Step 6: Commit** + +```bash +git add ontokit/services/linter.py tests/unit/test_linter.py +git commit -m "feat(lint): broaden duplicate-label to all entity types, same-type only (#99) + +Matching is now case-insensitive, per language, and grouped by entity +type — so a class and a property sharing 'knows' is no longer a false +positive, but two ObjectProperties with the same label still are. +Scope expanded from class-only to all entity types. Mirrors the per- +type semantics from consistency_service._check_duplicate_label while +keeping the lint rule's case-insensitive matching." +``` + +--- + +## Task 10: Update `LINT_LEVEL_DEFINITIONS` descriptions + +**Files:** +- Modify: `ontokit/services/linter.py` + +- [ ] **Step 1: Edit the L2 and L4 descriptions** + +In `ontokit/services/linter.py`, find `LINT_LEVEL_DEFINITIONS` (around line 230) and update: + +```python +LINT_LEVEL_DEFINITIONS: dict[int, LintLevelDefinition] = { + 1: LintLevelDefinition( + "Critical", + "Dangling references, circular hierarchies, undefined prefixes", + LINT_LEVELS[1], + ), + 2: LintLevelDefinition( + "Consistency", + ( + "Orphan classes, duplicate triples, disjointness violations, " + "orphan individuals, deprecated parent classes" + ), + LINT_LEVELS[2], + ), + 3: LintLevelDefinition( + "Labels", + "Missing, empty, and duplicate label checks", + LINT_LEVELS[3], + ), + 4: LintLevelDefinition( + "Quality", + ( + "Comments, per-language label checks, redundant regional variants, " + "unused properties, empty domain/range, multi-root warnings" + ), + LINT_LEVELS[4], + ), + 5: LintLevelDefinition( + "All", + "All available rules including domain/range and cardinality", + LINT_LEVELS[5], + ), +} +``` + +(Task 7 already updated L1's description, so it's listed here only for reference; do not edit it again if it's already correct.) 
+ +- [ ] **Step 2: Verify ruff + mypy clean** + +```bash +.venv/bin/ruff check ontokit/services/linter.py +.venv/bin/ruff format --check ontokit/services/linter.py +.venv/bin/mypy ontokit/services/linter.py +``` + +Expected: all clean. + +- [ ] **Step 3: Commit** + +```bash +git add ontokit/services/linter.py +git commit -m "feat(lint): refresh L2 and L4 descriptions for new rules (#99) + +L2 now mentions orphan-individual and deprecated-parent. L4 now +mentions unused-property, empty-domain/range, and multi-root." +``` + +--- + +## Task 11: Add level-membership coverage test + +**Files:** +- Test: `tests/unit/test_linter.py` + +- [ ] **Step 1: Write the test** + +Append at the end of `tests/unit/test_linter.py`: + +```python +# --------------------------------------------------------------------------- +# Level membership for new and renamed rules (#99) +# --------------------------------------------------------------------------- + + +def test_lint_levels_include_new_and_renamed_rules() -> None: + """Each rule introduced or renamed in #99 is in the expected lint level.""" + from ontokit.services.linter import LINT_LEVELS + + # L1 — dangling-ref replaces undefined-parent. + assert "dangling-ref" in LINT_LEVELS[1] + assert "undefined-parent" not in LINT_LEVELS[1] + assert "undefined-parent" not in LINT_LEVELS[5] + + # L2 — orphan-individual and deprecated-parent join existing consistency rules. + assert "orphan-individual" in LINT_LEVELS[2] + assert "deprecated-parent" in LINT_LEVELS[2] + + # L4 — quality-style additions. + assert "unused-property" in LINT_LEVELS[4] + assert "empty-domain" in LINT_LEVELS[4] + assert "empty-range" in LINT_LEVELS[4] + assert "multi-root" in LINT_LEVELS[4] +``` + +- [ ] **Step 2: Run the test to verify it passes** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -k lint_levels_include_new -v --no-cov +``` + +Expected: PASS (all level memberships were established by Tasks 1–7). 
+ +- [ ] **Step 3: Commit** + +```bash +git add tests/unit/test_linter.py +git commit -m "test(lint): assert level membership for new/renamed rules (#99) + +Locks in the level placements from the design doc so a future +accidental edit to LINT_LEVELS gets caught at test time." +``` + +--- + +## Task 12: Final regression — full linter test suite + open the PR + +**Files:** none + +- [ ] **Step 1: Run the full linter test suite** + +```bash +.venv/bin/pytest tests/unit/test_linter.py -v --no-cov +``` + +Expected: every test passes. Verify the test count grew by approximately the number of new tests added across Tasks 1–11. + +- [ ] **Step 2: Run the full backend test suite** + +```bash +.venv/bin/pytest tests/ -q --no-cov +``` + +Expected: green. Any failure outside `tests/unit/test_linter.py` indicates a regression — investigate before opening the PR. + +- [ ] **Step 3: Final ruff + mypy + format pass** + +```bash +.venv/bin/ruff check ontokit/services/linter.py tests/unit/test_linter.py +.venv/bin/ruff format --check ontokit/services/linter.py tests/unit/test_linter.py +.venv/bin/mypy ontokit/services/linter.py +``` + +Expected: all clean. + +- [ ] **Step 4: Push the branch** + +```bash +git push -u origin feat/issue-99-consolidate-consistency +``` + +- [ ] **Step 5: Open the PR** + +```bash +gh pr create --base dev \ + --title "feat(lint): consolidate consistency rules into linter (#99 part 1)" \ + --body "$(cat <<'EOF' +## Summary + +PR1 of two for issue #99. Adds 6 new rules to the linter and reconciles 3 partial-overlap rules so the lint system covers everything `consistency_service.py` checks. `consistency_service.py`, its routes, the worker task, and the frontend Consistency tab are all left in place; PR2 will remove them once we have confidence the lint pipeline produces equivalent findings. 
+ +### New rules + +| `rule_id` | Severity | Level | Notes | +|-----------|----------|-------|-------| +| `unused-property` | warning | L4 | Property declared but never used as predicate | +| `orphan-individual` | warning | L2 | Individual's `rdf:type` not declared as `owl:Class` | +| `empty-domain` | info | L4 | Object/Datatype property with no `rdfs:domain` | +| `empty-range` | info | L4 | Same for `rdfs:range` | +| `deprecated-parent` | warning | L2 | Class subclasses an `owl:deprecated` class | +| `multi-root` | info | L4 | Fires once if >5 root classes; ontology-scope finding | + +### Reconciled rules + +- **`undefined-parent` → `dangling-ref`** (rename + expand). Now scans `rdfs:subClassOf`, `rdfs:domain`, and `rdfs:range`. Each finding carries `details.predicate`. Stays at L1 (Critical). Existing `LintIssue` rows with the old `rule_id` are left in place; lint runs are user-triggered and regenerable. +- **`duplicate-label`** (broaden). Scope expanded from class-only to all entity types; matching key is now `(entity_type, label_lower, lang)` so cross-type collisions and language overlaps are no longer false positives. Stays at L3. +- **`orphan-class`** — no code change. The "no instances" variant from consistency simply disappears with PR2. + +### Out of scope + +- Removal of `consistency_service.py` and its routes (PR2) +- Discussion #87's broader rdflib-vs-SQL question +- Frontend Consistency tab (PR2) + +## Design + +`docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md` + +## Test plan + +- [x] Per-rule unit tests for every new and reconciled rule +- [x] Level-membership coverage test +- [x] Full `pytest tests/` regression green +- [x] ruff + mypy clean + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +--- + +## Self-Review Checklist (run before declaring the plan done) + +1. 
**Spec coverage:** + - 6 new rules → Tasks 1–6 ✓ + - `orphan-class` no change → noted in PR description, no task needed ✓ + - `undefined-parent` → `dangling-ref` rename → Task 7 ✓ + - `dangling-ref` domain/range expansion → Task 8 ✓ + - `duplicate-label` broadening → Task 9 ✓ + - `LINT_LEVEL_DEFINITIONS` description updates → Task 10 ✓ + - Level-membership coverage test → Task 11 ✓ + - Full regression + PR open → Task 12 ✓ + +2. **Placeholder scan:** every step contains either concrete code, a concrete command, or a concrete file edit. No "TBD" / "implement appropriate" / "similar to above" placeholders. + +3. **Type consistency:** every test uses `OntologyLinter(enabled_rules={...})` and `_results_with_rule(issues, rule_id)`; every rule's `LintRuleInfo` matches the method name (kebab → snake) and the `LintResult.rule_id` literal. `is_deprecated` import added in Task 5 is used in the same task; `defaultdict` is already imported in `linter.py` at line 4 (used by Task 9). + +4. **Out-of-scope creep check:** no task touches `consistency_service.py`, `quality.py`, `worker.py`, schemas, or the frontend. From f7e978960e56a25beee270683b3b73c31f7b8c15 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 14:05:22 +0200 Subject: [PATCH 03/20] feat(lint): add unused-property rule (#99) Flags properties (ObjectProperty / DatatypeProperty / AnnotationProperty / rdf:Property) declared in the ontology but never used as a predicate in any triple. Mirrors consistency_service._check_unused_property; lives at L4 (Quality). 
--- ontokit/services/linter.py | 41 +++++++++++++++++++++++++++++++++ tests/unit/test_linter.py | 46 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 0eb2c3fb..83085153 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -181,6 +181,13 @@ class LintRuleInfo: severity=LintIssueType.WARNING.value, scope=_ALL, ), + LintRuleInfo( + rule_id="unused-property", + name="Unused Property", + description="Property is declared but never used as a predicate in any triple", + severity=LintIssueType.WARNING.value, + scope=["property"], + ), ] # Map rule IDs to their info @@ -205,6 +212,7 @@ class LintRuleInfo: "missing-comment", "label-per-language", "redundant-regional-label", + "unused-property", } _LEVEL_5_RULES: set[str] = {r.rule_id for r in LINT_RULES} @@ -1294,6 +1302,39 @@ async def _check_missing_type_declaration(self, graph: Graph) -> list[LintResult return issues + async def _check_unused_property(self, graph: Graph) -> list[LintResult]: + """Find declared properties that are never used as a predicate.""" + issues: list[LintResult] = [] + property_types = ( + OWL.ObjectProperty, + OWL.DatatypeProperty, + OWL.AnnotationProperty, + RDF.Property, + ) + seen: set[URIRef] = set() + for prop_type in property_types: + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef) or prop in seen: + continue + seen.add(prop) + # `subjects(prop, None)` returns subjects of triples whose + # predicate is `prop`. Excluding `prop` itself is necessary + # because the rdf:type triple has the property as subject and + # would otherwise count as self-usage. 
+ used = any(s != prop for s in graph.subjects(prop, None)) + if not used: + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="unused-property", + message="Property is declared but never used as a predicate", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + return issues + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 921f230e..27335079 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -912,3 +912,49 @@ async def test_undefined_prefix_subject_type_reflects_entity() -> None: ] assert matches, "expected an undefined-prefix issue for the bad class IRI" assert matches[0].subject_type == "class" + + +# --------------------------------------------------------------------------- +# unused-property +# --------------------------------------------------------------------------- + + +async def test_unused_property_flags_property_with_no_usage() -> None: + """An ObjectProperty declared but never used as a predicate is flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + # No (?, EX.knows, ?) triples anywhere. 
+ + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "unused-property") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].subject_type == "property" + + +async def test_unused_property_does_not_flag_used_property() -> None: + """A property used as a predicate at least once is not flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.Alice, EX.knows, EX.Bob)) + + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "unused-property") == [] + + +async def test_unused_property_covers_datatype_and_annotation_properties() -> None: + """DatatypeProperty and AnnotationProperty are also covered.""" + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.note, RDF.type, OWL.AnnotationProperty)) + + linter = OntologyLinter(enabled_rules={"unused-property"}) + issues = await linter.lint(g, PROJECT_ID) + + flagged_iris = {r.subject_iri for r in _results_with_rule(issues, "unused-property")} + assert flagged_iris == {str(EX.age), str(EX.note)} From 06cf62b7b4ee32ac1486bdae068c9af0d4897f79 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 14:39:36 +0200 Subject: [PATCH 04/20] fix(lint): clarify unused-property comment + number test section Address code review on Task 1 (#99): * The s != prop guard protects against a degenerate (prop, prop, X) triple, not against the rdf:type declaration as the previous comment claimed. graph.subjects(prop, None) queries triples where prop is the predicate, so the rdf:type triple was never in scope. * Number the test section header to match the file's convention. 
Co-Authored-By: Claude Sonnet 4.6 --- ontokit/services/linter.py | 8 ++++---- tests/unit/test_linter.py | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 83085153..5a7f6759 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -1317,10 +1317,10 @@ async def _check_unused_property(self, graph: Graph) -> list[LintResult]: if not isinstance(prop, URIRef) or prop in seen: continue seen.add(prop) - # `subjects(prop, None)` returns subjects of triples whose - # predicate is `prop`. Excluding `prop` itself is necessary - # because the rdf:type triple has the property as subject and - # would otherwise count as self-usage. + # `graph.subjects(prop, None)` returns subjects of triples where `prop` + # is the predicate. We exclude the property itself as a subject to avoid + # treating a self-referential triple like (prop, prop, X) as evidence + # that prop is "used" in any meaningful sense. used = any(s != prop for s in graph.subjects(prop, None)) if not used: issues.append( diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 27335079..59ec2ade 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -915,7 +915,7 @@ async def test_undefined_prefix_subject_type_reflects_entity() -> None: # --------------------------------------------------------------------------- -# unused-property +# 24. unused-property # --------------------------------------------------------------------------- From 5cde4b937d86104d8391bd740d68e5f7eaeb4dd4 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 14:45:59 +0200 Subject: [PATCH 05/20] feat(lint): add orphan-individual rule (#99) Flags individuals whose rdf:type target is not declared as owl:Class in this ontology. One finding per (individual, undeclared-type) pair. Mirrors consistency_service._check_orphan_individual; lives at L2 (Consistency). 
--- ontokit/services/linter.py | 41 +++++++++++++++++++++++++++++ tests/unit/test_linter.py | 54 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 95 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 5a7f6759..a1320752 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -188,6 +188,13 @@ class LintRuleInfo: severity=LintIssueType.WARNING.value, scope=["property"], ), + LintRuleInfo( + rule_id="orphan-individual", + name="Orphan Individual", + description="Individual's rdf:type target is not declared as owl:Class in this ontology", + severity=LintIssueType.WARNING.value, + scope=["individual"], + ), ] # Map rule IDs to their info @@ -200,6 +207,7 @@ class LintRuleInfo: "duplicate-triple", "disjoint-violation", "missing-type-declaration", + "orphan-individual", } _LEVEL_3_RULES: set[str] = _LEVEL_2_RULES | { "missing-label", @@ -1335,6 +1343,39 @@ async def _check_unused_property(self, graph: Graph) -> list[LintResult]: ) return issues + async def _check_orphan_individual(self, graph: Graph) -> list[LintResult]: + """Flag individuals whose rdf:type target is not declared as owl:Class.""" + issues: list[LintResult] = [] + declared_classes = {c for c in graph.subjects(RDF.type, OWL.Class) if isinstance(c, URIRef)} + # owl:Thing is implicitly a class even if not declared. 
+ declared_classes.add(OWL.Thing) + + for ind in graph.subjects(RDF.type, OWL.NamedIndividual): + if not isinstance(ind, URIRef): + continue + for type_target in graph.objects(ind, RDF.type): + if not isinstance(type_target, URIRef): + continue + if type_target == OWL.NamedIndividual: + continue + if type_target in declared_classes: + continue + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="orphan-individual", + message=f"Individual's type {type_target} is not declared as owl:Class", + subject_iri=str(ind), + subject_type="individual", + details={ + "local_name": self._get_local_name(ind), + "undeclared_type": str(type_target), + "undeclared_type_local": self._get_local_name(type_target), + }, + ) + ) + return issues + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 59ec2ade..02785bc9 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -958,3 +958,57 @@ async def test_unused_property_covers_datatype_and_annotation_properties() -> No flagged_iris = {r.subject_iri for r in _results_with_rule(issues, "unused-property")} assert flagged_iris == {str(EX.age), str(EX.note)} + + +# --------------------------------------------------------------------------- +# 25. 
orphan-individual +# --------------------------------------------------------------------------- + + +async def test_orphan_individual_flags_undeclared_type() -> None: + """Individual whose rdf:type target is not declared as owl:Class is flagged.""" + g = Graph() + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) # EX.Person is NOT declared as a class + + linter = OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "orphan-individual") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.Alice) + assert matches[0].subject_type == "individual" + assert matches[0].details is not None + assert matches[0].details["undeclared_type"] == str(EX.Person) + + +async def test_orphan_individual_does_not_flag_declared_type() -> None: + """Individual whose rdf:type target is a declared class is not flagged.""" + g = Graph() + g.add((EX.Person, RDF.type, OWL.Class)) + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) + + linter = OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "orphan-individual") == [] + + +async def test_orphan_individual_emits_one_finding_per_undeclared_type() -> None: + """An individual with two undeclared types yields two findings.""" + g = Graph() + g.add((EX.Alice, RDF.type, OWL.NamedIndividual)) + g.add((EX.Alice, RDF.type, EX.Person)) + g.add((EX.Alice, RDF.type, EX.Employee)) + + linter = OntologyLinter(enabled_rules={"orphan-individual"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "orphan-individual") + flagged = {(m.subject_iri, m.details["undeclared_type"]) for m in matches if m.details} + assert flagged == { + (str(EX.Alice), str(EX.Person)), + (str(EX.Alice), str(EX.Employee)), + } From 
ae82bb5aaa75ce9b467fad5ac446d5277cc8eb00 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 14:51:07 +0200 Subject: [PATCH 06/20] feat(lint): add empty-domain rule (#99) Flags ObjectProperty and DatatypeProperty declarations with no rdfs:domain. AnnotationProperty is intentionally excluded (annotations are by convention domain-agnostic). L4 (Quality). --- ontokit/services/linter.py | 29 +++++++++++++++++++++ tests/unit/test_linter.py | 53 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index a1320752..b70f3d28 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -195,6 +195,13 @@ class LintRuleInfo: severity=LintIssueType.WARNING.value, scope=["individual"], ), + LintRuleInfo( + rule_id="empty-domain", + name="Empty Domain", + description="ObjectProperty or DatatypeProperty has no rdfs:domain", + severity=LintIssueType.INFO.value, + scope=["property"], + ), ] # Map rule IDs to their info @@ -221,6 +228,7 @@ class LintRuleInfo: "label-per-language", "redundant-regional-label", "unused-property", + "empty-domain", } _LEVEL_5_RULES: set[str] = {r.rule_id for r in LINT_RULES} @@ -1376,6 +1384,27 @@ async def _check_orphan_individual(self, graph: Graph) -> list[LintResult]: ) return issues + async def _check_empty_domain(self, graph: Graph) -> list[LintResult]: + """Flag ObjectProperty/DatatypeProperty declarations with no rdfs:domain.""" + issues: list[LintResult] = [] + for prop_type in (OWL.ObjectProperty, OWL.DatatypeProperty): + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef): + continue + if any(graph.objects(prop, RDFS.domain)): + continue + issues.append( + LintResult( + issue_type=LintIssueType.INFO.value, + rule_id="empty-domain", + message="Property has no rdfs:domain", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + 
return issues + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 02785bc9..ab42b079 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1012,3 +1012,56 @@ async def test_orphan_individual_emits_one_finding_per_undeclared_type() -> None (str(EX.Alice), str(EX.Person)), (str(EX.Alice), str(EX.Employee)), } + + +# --------------------------------------------------------------------------- +# 26. empty-domain +# --------------------------------------------------------------------------- + + +async def test_empty_domain_flags_object_property_without_domain() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-domain") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].subject_type == "property" + + +async def test_empty_domain_flags_datatype_property_without_domain() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-domain") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.age) + + +async def test_empty_domain_does_not_flag_property_with_domain() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.domain, EX.Person)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "empty-domain") == [] + + +async def test_empty_domain_does_not_flag_annotation_property() -> None: + """AnnotationProperty is excluded 
from the empty-domain check.""" + g = Graph() + g.add((EX.note, RDF.type, OWL.AnnotationProperty)) + + linter = OntologyLinter(enabled_rules={"empty-domain"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "empty-domain") == [] From 69b3b8d3a1c27bac26cccda10a534c9bb7a27339 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 14:56:04 +0200 Subject: [PATCH 07/20] feat(lint): add empty-range rule (#99) Flags ObjectProperty and DatatypeProperty declarations with no rdfs:range. AnnotationProperty intentionally excluded. L4 (Quality). --- ontokit/services/linter.py | 29 +++++++++++++++++++++++++++++ tests/unit/test_linter.py | 29 +++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index b70f3d28..120463ac 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -202,6 +202,13 @@ class LintRuleInfo: severity=LintIssueType.INFO.value, scope=["property"], ), + LintRuleInfo( + rule_id="empty-range", + name="Empty Range", + description="ObjectProperty or DatatypeProperty has no rdfs:range", + severity=LintIssueType.INFO.value, + scope=["property"], + ), ] # Map rule IDs to their info @@ -229,6 +236,7 @@ class LintRuleInfo: "redundant-regional-label", "unused-property", "empty-domain", + "empty-range", } _LEVEL_5_RULES: set[str] = {r.rule_id for r in LINT_RULES} @@ -1405,6 +1413,27 @@ async def _check_empty_domain(self, graph: Graph) -> list[LintResult]: ) return issues + async def _check_empty_range(self, graph: Graph) -> list[LintResult]: + """Flag ObjectProperty/DatatypeProperty declarations with no rdfs:range.""" + issues: list[LintResult] = [] + for prop_type in (OWL.ObjectProperty, OWL.DatatypeProperty): + for prop in graph.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef): + continue + if any(graph.objects(prop, RDFS.range)): + continue + issues.append( + LintResult( + 
issue_type=LintIssueType.INFO.value, + rule_id="empty-range", + message="Property has no rdfs:range", + subject_iri=str(prop), + subject_type="property", + details={"local_name": self._get_local_name(prop)}, + ) + ) + return issues + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index ab42b079..e7587fe4 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1065,3 +1065,32 @@ async def test_empty_domain_does_not_flag_annotation_property() -> None: issues = await linter.lint(g, PROJECT_ID) assert _results_with_rule(issues, "empty-domain") == [] + + +# --------------------------------------------------------------------------- +# 27. empty-range +# --------------------------------------------------------------------------- + + +async def test_empty_range_flags_object_property_without_range() -> None: + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + + linter = OntologyLinter(enabled_rules={"empty-range"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "empty-range") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri == str(EX.knows) + + +async def test_empty_range_does_not_flag_property_with_range() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.age, RDFS.range, XSD.integer)) + + linter = OntologyLinter(enabled_rules={"empty-range"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "empty-range") == [] From dfe8778952bc80a8ad9d0e2e8e7a5c4f81e16c2b Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:03:51 +0200 Subject: [PATCH 08/20] feat(lint): add deprecated-parent rule (#99) Flags classes that subclass an owl:deprecated class. 
Reuses the shared is_deprecated helper from rdf_utils which accepts both boolean and string literal forms. L2 (Consistency). Co-Authored-By: Claude Sonnet 4.6 --- ontokit/services/linter.py | 36 +++++++++++++++++++++++++++ tests/unit/test_linter.py | 50 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 120463ac..ad7b4cde 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -12,6 +12,7 @@ from rdflib.namespace import OWL, RDF, RDFS, SKOS, XSD from ontokit.models.lint import LintIssueType +from ontokit.services.rdf_utils import is_deprecated DC = Namespace("http://purl.org/dc/elements/1.1/") DCTERMS = Namespace("http://purl.org/dc/terms/") @@ -209,6 +210,13 @@ class LintRuleInfo: severity=LintIssueType.INFO.value, scope=["property"], ), + LintRuleInfo( + rule_id="deprecated-parent", + name="Deprecated Parent", + description="Class subclasses a class marked owl:deprecated", + severity=LintIssueType.WARNING.value, + scope=["class"], + ), ] # Map rule IDs to their info @@ -222,6 +230,7 @@ class LintRuleInfo: "disjoint-violation", "missing-type-declaration", "orphan-individual", + "deprecated-parent", } _LEVEL_3_RULES: set[str] = _LEVEL_2_RULES | { "missing-label", @@ -1434,6 +1443,33 @@ async def _check_empty_range(self, graph: Graph) -> list[LintResult]: ) return issues + async def _check_deprecated_parent(self, graph: Graph) -> list[LintResult]: + """Flag classes that subclass an owl:deprecated class.""" + issues: list[LintResult] = [] + for cls in graph.subjects(RDF.type, OWL.Class): + if not isinstance(cls, URIRef): + continue + for parent in graph.objects(cls, RDFS.subClassOf): + if not isinstance(parent, URIRef): + continue + if not is_deprecated(graph, parent): + continue + issues.append( + LintResult( + issue_type=LintIssueType.WARNING.value, + rule_id="deprecated-parent", + message=f"Parent class {parent} is deprecated", + 
subject_iri=str(cls), + subject_type="class", + details={ + "local_name": self._get_local_name(cls), + "deprecated_parent": str(parent), + "deprecated_parent_local": self._get_local_name(parent), + }, + ) + ) + return issues + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index e7587fe4..6b93a99b 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1094,3 +1094,53 @@ async def test_empty_range_does_not_flag_property_with_range() -> None: issues = await linter.lint(g, PROJECT_ID) assert _results_with_rule(issues, "empty-range") == [] + + +# --------------------------------------------------------------------------- +# 28. deprecated-parent +# --------------------------------------------------------------------------- + + +async def test_deprecated_parent_flags_subclass_of_deprecated_class() -> None: + g = Graph() + g.add((EX.OldThing, RDF.type, OWL.Class)) + g.add((EX.OldThing, OWL.deprecated, Literal(True))) + g.add((EX.NewThing, RDF.type, OWL.Class)) + g.add((EX.NewThing, RDFS.subClassOf, EX.OldThing)) + + linter = OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "deprecated-parent") + assert len(matches) == 1 + assert matches[0].issue_type == "warning" + assert matches[0].subject_iri == str(EX.NewThing) + assert matches[0].subject_type == "class" + assert matches[0].details is not None + assert matches[0].details["deprecated_parent"] == str(EX.OldThing) + + +async def test_deprecated_parent_does_not_flag_non_deprecated_parent() -> None: + g = Graph() + g.add((EX.Animal, RDF.type, OWL.Class)) + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.Animal)) + + linter = OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + 
assert _results_with_rule(issues, "deprecated-parent") == [] + + +async def test_deprecated_parent_recognizes_string_true() -> None: + """is_deprecated accepts case-insensitive 'true' / '1' literals.""" + g = Graph() + g.add((EX.OldThing, RDF.type, OWL.Class)) + g.add((EX.OldThing, OWL.deprecated, Literal("true"))) + g.add((EX.NewThing, RDF.type, OWL.Class)) + g.add((EX.NewThing, RDFS.subClassOf, EX.OldThing)) + + linter = OntologyLinter(enabled_rules={"deprecated-parent"}) + issues = await linter.lint(g, PROJECT_ID) + + assert len(_results_with_rule(issues, "deprecated-parent")) == 1 From 83b2ae35e122336333e63e8d59165171d369d38f Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:10:39 +0200 Subject: [PATCH 09/20] feat(lint): add multi-root rule (#99) Fires once when an ontology has more than 5 root classes (classes with no parent except owl:Thing). Ontology-scope finding: subject_iri=None, subject_type='other'. L4 (Quality). Co-Authored-By: Claude Sonnet 4.6 --- ontokit/services/linter.py | 39 ++++++++++++++++++++++++++ tests/unit/test_linter.py | 57 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 96 insertions(+) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index ad7b4cde..821e8de1 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -217,6 +217,13 @@ class LintRuleInfo: severity=LintIssueType.WARNING.value, scope=["class"], ), + LintRuleInfo( + rule_id="multi-root", + name="Multiple Root Classes", + description="Ontology has more than 5 root classes (classes with no parent except owl:Thing)", + severity=LintIssueType.INFO.value, + scope=[], + ), ] # Map rule IDs to their info @@ -246,6 +253,7 @@ class LintRuleInfo: "unused-property", "empty-domain", "empty-range", + "multi-root", } _LEVEL_5_RULES: set[str] = {r.rule_id for r in LINT_RULES} @@ -1470,6 +1478,37 @@ async def _check_deprecated_parent(self, graph: Graph) -> list[LintResult]: ) return issues + async def 
_check_multi_root(self, graph: Graph) -> list[LintResult]: + """Fire once if the ontology has more than 5 root classes.""" + root_iris: list[str] = [] + for cls in graph.subjects(RDF.type, OWL.Class): + if not isinstance(cls, URIRef) or cls == OWL.Thing: + continue + has_real_parent = any( + isinstance(p, URIRef) and p != OWL.Thing + for p in graph.objects(cls, RDFS.subClassOf) + ) + if not has_real_parent: + root_iris.append(str(cls)) + + if len(root_iris) <= 5: + return [] + + return [ + LintResult( + issue_type=LintIssueType.INFO.value, + rule_id="multi-root", + message=f"Ontology has {len(root_iris)} root classes (classes with no parent)", + subject_iri=None, + subject_type="other", + details={ + "root_count": len(root_iris), + # Cap at 20 to keep the payload small even on huge ontologies. + "root_iris": sorted(root_iris)[:20], + }, + ) + ] + @staticmethod def _determine_entity_type(graph: Graph, uri: URIRef) -> str: """Return 'class', 'property', 'individual', or 'other' for a URI.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 6b93a99b..1a7142e3 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1144,3 +1144,60 @@ async def test_deprecated_parent_recognizes_string_true() -> None: issues = await linter.lint(g, PROJECT_ID) assert len(_results_with_rule(issues, "deprecated-parent")) == 1 + + +# --------------------------------------------------------------------------- +# 29. 
multi-root +# --------------------------------------------------------------------------- + + +async def test_multi_root_does_not_fire_below_threshold() -> None: + """Five or fewer root classes does NOT fire.""" + g = Graph() + for i in range(5): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "multi-root") == [] + + +async def test_multi_root_fires_above_threshold() -> None: + """Six root classes triggers a single ontology-scope finding.""" + g = Graph() + for i in range(6): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "multi-root") + assert len(matches) == 1 + assert matches[0].issue_type == "info" + assert matches[0].subject_iri is None + assert matches[0].subject_type == "other" + assert matches[0].details is not None + assert matches[0].details["root_count"] == 6 + + +async def test_multi_root_excludes_classes_with_explicit_parent() -> None: + """Classes with a non-owl:Thing parent don't count as roots.""" + g = Graph() + g.add((EX.Animal, RDF.type, OWL.Class)) + # 5 roots + 1 non-root subclass = still 5 numeric roots... PLUS Animal = 6 roots total. + for i in range(5): + g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class)) + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.Animal)) + # EX.Animal itself is a root, so we have 6 roots when including it + # → fires. Verify the count excludes EX.Dog. 
+ + linter = OntologyLinter(enabled_rules={"multi-root"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "multi-root") + assert len(matches) == 1 + assert matches[0].details is not None + assert matches[0].details["root_count"] == 6 + assert str(EX.Dog) not in matches[0].details["root_iris"] From 49b5d5d9689efa3bda2755d5c3aba4b69df24199 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:22:34 +0200 Subject: [PATCH 10/20] refactor(lint): rename undefined-parent to dangling-ref (#99) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pure rename. Behavior unchanged — the rule still only checks subClassOf targets. Predicate-axis expansion comes in the next commit. Existing LintIssue rows with rule_id='undefined-parent' are left in place; they're snapshots of past runs and lint results are regenerable. --- ontokit/services/linter.py | 16 ++++++++-------- tests/unit/test_lint_config.py | 2 +- tests/unit/test_linter.py | 10 +++++----- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 821e8de1..b6c14be9 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -68,11 +68,11 @@ class LintRuleInfo: scope=["class"], ), LintRuleInfo( - rule_id="undefined-parent", - name="Undefined Parent", - description="Class references a parent that is not defined in the ontology", + rule_id="dangling-ref", + name="Dangling Reference", + description="Reference to a URI not defined in the ontology (in subClassOf, rdfs:domain, or rdfs:range)", severity=LintIssueType.ERROR.value, - scope=["class"], + scope=["class", "property"], ), LintRuleInfo( rule_id="circular-hierarchy", @@ -230,7 +230,7 @@ class LintRuleInfo: LINT_RULES_MAP: dict[str, LintRuleInfo] = {rule.rule_id: rule for rule in LINT_RULES} # Progressive lint levels — each level cumulatively includes the previous -_LEVEL_1_RULES: set[str] = 
{"undefined-parent", "circular-hierarchy", "undefined-prefix"} +_LEVEL_1_RULES: set[str] = {"dangling-ref", "circular-hierarchy", "undefined-prefix"} _LEVEL_2_RULES: set[str] = _LEVEL_1_RULES | { "orphan-class", "duplicate-triple", @@ -279,7 +279,7 @@ class LintLevelDefinition(NamedTuple): LINT_LEVEL_DEFINITIONS: dict[int, LintLevelDefinition] = { 1: LintLevelDefinition( "Critical", - "Undefined parents, circular hierarchies, undefined prefixes", + "Dangling references, circular hierarchies, undefined prefixes", LINT_LEVELS[1], ), 2: LintLevelDefinition( @@ -441,7 +441,7 @@ async def _check_orphan_class(self, graph: Graph) -> list[LintResult]: return issues - async def _check_undefined_parent(self, graph: Graph) -> list[LintResult]: + async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: """Find classes that reference undefined parent classes.""" issues = [] @@ -467,7 +467,7 @@ async def _check_undefined_parent(self, graph: Graph) -> list[LintResult]: issues.append( LintResult( issue_type=LintIssueType.ERROR.value, - rule_id="undefined-parent", + rule_id="dangling-ref", message="References undefined parent class", subject_iri=str(class_uri), subject_type="class", diff --git a/tests/unit/test_lint_config.py b/tests/unit/test_lint_config.py index 18c700d4..39682dc6 100644 --- a/tests/unit/test_lint_config.py +++ b/tests/unit/test_lint_config.py @@ -42,7 +42,7 @@ class TestLintLevels: def test_level_1_critical_rules(self) -> None: """Level 1 contains only critical structural rules.""" rules = get_rules_for_level(1) - assert rules == {"undefined-parent", "circular-hierarchy", "undefined-prefix"} + assert rules == {"dangling-ref", "circular-hierarchy", "undefined-prefix"} def test_level_2_includes_level_1(self) -> None: """Level 2 is a superset of level 1.""" diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 1a7142e3..a8d5acb1 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -221,10 +221,10 @@ async def 
test_undefined_parent() -> None: # Parent is NOT declared as an owl:Class in the graph g.add((EX.Child, RDFS.subClassOf, EX.Phantom)) - linter = OntologyLinter(enabled_rules={"undefined-parent"}) + linter = OntologyLinter(enabled_rules={"dangling-ref"}) issues = await linter.lint(g, PROJECT_ID) - matches = _results_with_rule(issues, "undefined-parent") + matches = _results_with_rule(issues, "dangling-ref") assert len(matches) == 1 assert matches[0].issue_type == "error" assert matches[0].subject_iri == str(EX.Child) @@ -239,10 +239,10 @@ async def test_no_undefined_parent_when_defined() -> None: g.add((EX.Child, RDF.type, OWL.Class)) g.add((EX.Child, RDFS.subClassOf, EX.Parent)) - linter = OntologyLinter(enabled_rules={"undefined-parent"}) + linter = OntologyLinter(enabled_rules={"dangling-ref"}) issues = await linter.lint(g, PROJECT_ID) - matches = _results_with_rule(issues, "undefined-parent") + matches = _results_with_rule(issues, "dangling-ref") assert len(matches) == 0 @@ -283,7 +283,7 @@ async def test_lint_all_rules() -> None: "circular-hierarchy", "empty-label", "duplicate-label", - "undefined-parent", + "dangling-ref", ): assert _results_with_rule(issues, rule_id) == [], f"Unexpected issue for rule '{rule_id}'" From 3ad4f3ad7af8b9cba7aab17891646ce6e8f4fadf Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:33:20 +0200 Subject: [PATCH 11/20] fix(lint): refresh stale comment + docstring after dangling-ref rename MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address code review on Task 7 (#99): * test_linter.py:279 — comment said "undefined-parent issues expected" * linter.py:_check_dangling_ref — docstring still described the old rule. Both updated to refer to dangling-ref; the docstring also notes that domain/range coverage arrives in the next commit. 
--- ontokit/services/linter.py | 6 +++++- tests/unit/test_linter.py | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index b6c14be9..5269b05a 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -442,7 +442,11 @@ async def _check_orphan_class(self, graph: Graph) -> list[LintResult]: return issues async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: - """Find classes that reference undefined parent classes.""" + """Find references to URIs not defined in the ontology. + + Currently checks subClassOf targets; domain and range coverage is + added in the next commit. + """ issues = [] # Build set of all defined classes diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index a8d5acb1..176e7b27 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -276,7 +276,7 @@ async def test_lint_all_rules() -> None: issues = await linter.lint(g, PROJECT_ID) # No missing-label, missing-comment, orphan, circular, empty, duplicate, - # or undefined-parent issues expected + # or dangling-ref issues expected for rule_id in ( "missing-label", "missing-comment", From 9865910ea2ac79b2f3022612f77f01672b090bf4 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:38:30 +0200 Subject: [PATCH 12/20] feat(lint): expand dangling-ref to cover domain and range (#99) The rule now inspects rdfs:subClassOf, rdfs:domain, and rdfs:range targets uniformly. Each finding carries details.predicate so the UI can show which axis triggered the dangling reference. References into well-known namespaces (rdf/rdfs/owl/xsd/skos/dc/dcterms) and into namespaces declared via owl:imports are skipped, mirroring consistency_service._check_dangling_ref. The details payload is also reshaped: undefined_parent / undefined_parent_local are renamed to dangling_target / dangling_target_local since the rule no longer applies only to parent classes. 
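For illustration, the skip logic described above can be sketched without rdflib. This is a simplified stand-in, not the code in linter.py: plain strings replace URIRef objects, and the names external_namespaces and is_dangling are hypothetical. As in this commit, an import IRI without a trailing separator is treated as a slash-style namespace.

```python
# Illustrative sketch only: plain strings stand in for rdflib URIRef objects,
# and the helper names below are hypothetical, not part of linter.py.
WELL_KNOWN_NS = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # rdf
    "http://www.w3.org/2000/01/rdf-schema#",        # rdfs
    "http://www.w3.org/2002/07/owl#",               # owl
    "http://www.w3.org/2001/XMLSchema#",            # xsd
    "http://www.w3.org/2004/02/skos/core#",         # skos
    "http://purl.org/dc/elements/1.1/",             # dc
    "http://purl.org/dc/terms/",                    # dcterms
}


def external_namespaces(imported_iris):
    """Return the prefixes whose terms should never be reported as dangling."""
    prefixes = set(WELL_KNOWN_NS)
    for iri in imported_iris:
        # An import IRI that already ends in "/" or "#" is used as-is;
        # otherwise it is assumed to be a slash-style namespace.
        prefixes.add(iri if iri.endswith(("/", "#")) else iri + "/")
    return prefixes


def is_dangling(target, known_subjects, prefixes):
    """A target dangles when it is neither a known subject nor external."""
    if target in known_subjects:
        return False
    return not any(target.startswith(ns) for ns in prefixes)
```

With an import of http://other.org/onto, a range of http://other.org/onto/Person is treated as external, while an undeclared http://example.org/Phantom is still reported.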
--- ontokit/services/linter.py | 94 ++++++++++++++++++++++++-------------- tests/unit/test_linter.py | 82 ++++++++++++++++++++++++++++++++- 2 files changed, 140 insertions(+), 36 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 5269b05a..c295a8b9 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -442,48 +442,72 @@ async def _check_orphan_class(self, graph: Graph) -> list[LintResult]: return issues async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: - """Find references to URIs not defined in the ontology. + """Find references to URIs that aren't declared in this ontology. - Currently checks subClassOf targets; domain and range coverage is - added in the next commit. + Scans rdfs:subClassOf, rdfs:domain, and rdfs:range. References into + well-known vocabularies (rdf/rdfs/owl/xsd/skos/dc/dcterms) and into + namespaces brought in via owl:imports are not flagged. """ - issues = [] + issues: list[LintResult] = [] - # Build set of all defined classes - defined_classes = { - str(c) for c in graph.subjects(RDF.type, OWL.Class) if isinstance(c, URIRef) + # A URI is "known" if it appears as a subject of any rdf:type triple + # OR as a subject of any triple at all (covers blank-node-free uses). 
+ declared_subjects: set[URIRef] = { + s for s in graph.subjects(RDF.type, None) if isinstance(s, URIRef) } - # Add owl:Thing as it's always implicitly defined - defined_classes.add(str(OWL.Thing)) - - for class_uri in graph.subjects(RDF.type, OWL.Class): - if not isinstance(class_uri, URIRef): - continue + all_subjects: set[URIRef] = {s for s in graph.subjects() if isinstance(s, URIRef)} + known: set[URIRef] = declared_subjects | all_subjects | {OWL.Thing} - # Check each parent - for parent_uri in graph.objects(class_uri, RDFS.subClassOf): - if not isinstance(parent_uri, URIRef): + well_known_ns = { + str(RDF), + str(RDFS), + str(OWL), + str(XSD), + str(SKOS), + str(DC), + str(DCTERMS), + } + imported_ns: set[str] = set() + for _ontology, _pred, imported in graph.triples((None, OWL.imports, None)): + if isinstance(imported, URIRef): + imp_str = str(imported) + if not imp_str.endswith(("/", "#")): + imp_str += "/" + imported_ns.add(imp_str) + external_ns = well_known_ns | imported_ns + + # (subject_iri, predicate, target) keyed reporting to deduplicate + # when the same triple would be reported by multiple iterations. 
+ reported: set[tuple[str, str, str]] = set() + + for predicate in (RDFS.subClassOf, RDFS.domain, RDFS.range): + for subj, _p, obj in graph.triples((None, predicate, None)): + if not isinstance(obj, URIRef) or not isinstance(subj, URIRef): continue - - parent_str = str(parent_uri) - if parent_str not in defined_classes: - label = self._get_label(graph, class_uri) - issues.append( - LintResult( - issue_type=LintIssueType.ERROR.value, - rule_id="dangling-ref", - message="References undefined parent class", - subject_iri=str(class_uri), - subject_type="class", - details={ - "local_name": self._get_local_name(class_uri), - "label": label, - "undefined_parent": parent_str, - "undefined_parent_local": self._get_local_name(parent_uri), - }, - ) + if obj == OWL.Thing or obj in known: + continue + obj_str = str(obj) + if any(obj_str.startswith(ns) for ns in external_ns): + continue + key = (str(subj), str(predicate), obj_str) + if key in reported: + continue + reported.add(key) + issues.append( + LintResult( + issue_type=LintIssueType.ERROR.value, + rule_id="dangling-ref", + message=f"References undeclared entity {obj}", + subject_iri=str(subj), + subject_type=self._determine_entity_type(graph, subj), + details={ + "local_name": self._get_local_name(subj), + "predicate": str(predicate), + "dangling_target": obj_str, + "dangling_target_local": self._get_local_name(obj), + }, ) - + ) return issues async def _check_circular_hierarchy(self, graph: Graph) -> list[LintResult]: diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 176e7b27..59a83bed 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -229,7 +229,7 @@ async def test_undefined_parent() -> None: assert matches[0].issue_type == "error" assert matches[0].subject_iri == str(EX.Child) assert matches[0].details is not None - assert matches[0].details["undefined_parent"] == str(EX.Phantom) + assert matches[0].details["dangling_target"] == str(EX.Phantom) async def 
test_no_undefined_parent_when_defined() -> None: @@ -1201,3 +1201,83 @@ async def test_multi_root_excludes_classes_with_explicit_parent() -> None: assert matches[0].details is not None assert matches[0].details["root_count"] == 6 assert str(EX.Dog) not in matches[0].details["root_iris"] + + +# --------------------------------------------------------------------------- +# 30. dangling-ref (domain/range expansion) +# --------------------------------------------------------------------------- + + +async def test_dangling_ref_flags_undefined_domain() -> None: + """Property whose rdfs:domain points to an undeclared URI is flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.domain, EX.UndeclaredClass)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.knows) + assert matches[0].details is not None + assert matches[0].details["predicate"] == str(RDFS.domain) + assert matches[0].details["dangling_target"] == str(EX.UndeclaredClass) + + +async def test_dangling_ref_flags_undefined_range() -> None: + g = Graph() + g.add((EX.age, RDF.type, OWL.DatatypeProperty)) + g.add((EX.age, RDFS.range, EX.UndeclaredDatatype)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].subject_iri == str(EX.age) + assert matches[0].details is not None + assert matches[0].details["predicate"] == str(RDFS.range) + + +async def test_dangling_ref_subclassof_includes_predicate_detail() -> None: + """The existing subClassOf path now also reports details.predicate.""" + g = Graph() + g.add((EX.Dog, RDF.type, OWL.Class)) + g.add((EX.Dog, RDFS.subClassOf, EX.UndeclaredAnimal)) + + linter = 
OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "dangling-ref") + assert len(matches) == 1 + assert matches[0].details is not None + assert matches[0].details["predicate"] == str(RDFS.subClassOf) + + +async def test_dangling_ref_skips_well_known_namespaces() -> None: + """References into rdf/rdfs/owl/xsd/skos/dcterms must not be flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.range, XSD.string)) + g.add((EX.related, RDF.type, OWL.ObjectProperty)) + g.add((EX.related, RDFS.range, SKOS.Concept)) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "dangling-ref") == [] + + +async def test_dangling_ref_skips_imported_namespaces() -> None: + """References into namespaces declared via owl:imports must not be flagged.""" + g = Graph() + imported_ns = URIRef("http://other.org/onto") + g.add((URIRef("http://example.org/myonto"), OWL.imports, imported_ns)) + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.range, URIRef("http://other.org/onto/Person"))) + + linter = OntologyLinter(enabled_rules={"dangling-ref"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "dangling-ref") == [] From e2e5adbe700ce64dd60742609774460e3c17cb12 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:47:14 +0200 Subject: [PATCH 13/20] fix(lint): tidy _check_dangling_ref after review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address code review on Task 8 (#99): * Drop the redundant declared_subjects intermediate — graph.subjects() already returns a superset, so the union was a no-op. * Drop the now-redundant 'obj == OWL.Thing or' guard since OWL.Thing is already in the known set. 
* Test docstring listed 6 of 7 well-known namespaces; add the missing 'dc' so the doc matches the implementation. Co-Authored-By: Claude Sonnet 4.6 --- ontokit/services/linter.py | 11 +++-------- tests/unit/test_linter.py | 2 +- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index c295a8b9..d1f50e7f 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -450,13 +450,8 @@ async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: """ issues: list[LintResult] = [] - # A URI is "known" if it appears as a subject of any rdf:type triple - # OR as a subject of any triple at all (covers blank-node-free uses). - declared_subjects: set[URIRef] = { - s for s in graph.subjects(RDF.type, None) if isinstance(s, URIRef) - } - all_subjects: set[URIRef] = {s for s in graph.subjects() if isinstance(s, URIRef)} - known: set[URIRef] = declared_subjects | all_subjects | {OWL.Thing} + # A URI is "known" if it appears as a subject of any triple in this graph. 
+ known: set[URIRef] = {s for s in graph.subjects() if isinstance(s, URIRef)} | {OWL.Thing} well_known_ns = { str(RDF), @@ -484,7 +479,7 @@ async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]: for subj, _p, obj in graph.triples((None, predicate, None)): if not isinstance(obj, URIRef) or not isinstance(subj, URIRef): continue - if obj == OWL.Thing or obj in known: + if obj in known: continue obj_str = str(obj) if any(obj_str.startswith(ns) for ns in external_ns): diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index 59a83bed..a93c2fc4 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1256,7 +1256,7 @@ async def test_dangling_ref_subclassof_includes_predicate_detail() -> None: async def test_dangling_ref_skips_well_known_namespaces() -> None: - """References into rdf/rdfs/owl/xsd/skos/dcterms must not be flagged.""" + """References into rdf/rdfs/owl/xsd/skos/dc/dcterms must not be flagged.""" g = Graph() g.add((EX.knows, RDF.type, OWL.ObjectProperty)) g.add((EX.knows, RDFS.range, XSD.string)) From ef91669a9a08125a4447b42678f3ebcb7f1bdc3b Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 15:55:13 +0200 Subject: [PATCH 14/20] feat(lint): broaden duplicate-label to all entity types, same-type only (#99) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Matching is now case-insensitive, per language, and grouped by entity type, so a class and a property sharing 'knows' no longer produce a false positive, while two ObjectProperties with the same label are still flagged. Scope expanded from class-only to all entity types. Mirrors the per-type semantics from consistency_service._check_duplicate_label while keeping the lint rule's case-insensitive matching. 
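The grouping rule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in linter.py: the function name duplicate_label_groups and the (iri, entity_type, label, language) tuple shape are hypothetical stand-ins for walking rdfs:label triples in a graph.

```python
from collections import defaultdict


def duplicate_label_groups(labelled):
    """Group IRIs by (entity type, lowercased label, language).

    `labelled` is an iterable of (iri, entity_type, label, language) tuples,
    a hypothetical stand-in for rdfs:label triples plus a type lookup.
    """
    groups = defaultdict(list)
    for iri, etype, label, lang in labelled:
        text = label.strip()
        if etype == "other" or not text:
            continue  # untyped resources and empty labels are ignored
        groups[(etype, text.lower(), lang)].append(iri)
    # Only keys shared by two or more resources count as duplicates.
    return {key: iris for key, iris in groups.items() if len(iris) > 1}
```

Because the entity type and the language are part of the key, a class and a property labelled 'knows' land in different groups, as do 'Hund'@de and 'Hund'@en.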
--- ontokit/services/linter.py | 84 ++++++++++++++++++++++---------------- tests/unit/test_linter.py | 80 ++++++++++++++++++++++++++++++++++++ 2 files changed, 128 insertions(+), 36 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index d1f50e7f..9f31907d 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -91,7 +91,7 @@ class LintRuleInfo: LintRuleInfo( rule_id="duplicate-label", name="Duplicate Label", - description="Multiple resources share the same label, which may cause confusion", + description="Multiple resources of the same entity type share the same label (case-insensitive, per language)", severity=LintIssueType.WARNING.value, scope=_ALL, ), @@ -594,47 +594,59 @@ async def _check_empty_label(self, graph: Graph) -> list[LintResult]: return issues async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]: - """Find resources that share the same label.""" - issues = [] + """Find resources of the same entity type sharing a label (case-insensitive, per language).""" + issues: list[LintResult] = [] - # Build map of label → list of resource IRIs - label_to_resources: dict[str, list[str]] = defaultdict(list) + # Group by (entity_type, label_lower, lang) → list of resource IRIs. + # Skip resources whose entity_type is "other" — we only group concrete + # types that the schema knows how to navigate. 
+ groups: dict[tuple[str, str, str | None], list[str]] = defaultdict(list) + original_label_for: dict[str, str] = {} for subject in self._uri_subjects: + etype = self._determine_entity_type(graph, subject) + if etype == "other": + continue for label in graph.objects(subject, RDFS.label): - if isinstance(label, RDFLiteral): - label_str = str(label).strip().lower() - if label_str: # Skip empty labels - label_to_resources[label_str].append(str(subject)) + if not isinstance(label, RDFLiteral): + continue + label_str = str(label).strip() + if not label_str: + continue + key = (etype, label_str.lower(), label.language) + groups[key].append(str(subject)) + original_label_for.setdefault(str(subject), label_str) - # Report duplicates reported_iris: set[str] = set() - for _label_str, resource_iris in label_to_resources.items(): - if len(resource_iris) > 1: - for resource_iri in resource_iris: - if resource_iri not in reported_iris: - reported_iris.add(resource_iri) - # Get original (non-lowercased) label - original_label = self._get_label(graph, URIRef(resource_iri)) - other_resources = [c for c in resource_iris if c != resource_iri] - issues.append( - LintResult( - issue_type=LintIssueType.WARNING.value, - rule_id="duplicate-label", - message=f"Label '{original_label}' is shared with {len(other_resources)} other resource(s)", - subject_iri=resource_iri, - subject_type=self._determine_entity_type( - graph, URIRef(resource_iri) - ), - details={ - "local_name": self._get_local_name(URIRef(resource_iri)), - "label": original_label, - "duplicate_iris": other_resources[:5], # Limit to 5 - "total_duplicates": len(other_resources), - }, - ) - ) - + for (_etype, _lower, lang), iris in groups.items(): + if len(iris) < 2: + continue + for iri in iris: + if iri in reported_iris: + continue + reported_iris.add(iri) + others = [o for o in iris if o != iri] + lang_str = f"@{lang}" if lang else "" + shown_label = original_label_for[iri] + issues.append( + LintResult( + 
issue_type=LintIssueType.WARNING.value, + rule_id="duplicate-label", + message=( + f'Label "{shown_label}"{lang_str} is shared with ' + f"{len(others)} other resource(s) of the same type" + ), + subject_iri=iri, + subject_type=self._determine_entity_type(graph, URIRef(iri)), + details={ + "local_name": self._get_local_name(URIRef(iri)), + "label": shown_label, + "language": lang, + "duplicate_iris": others[:5], + "total_duplicates": len(others), + }, + ) + ) return issues async def _check_label_per_language(self, graph: Graph) -> list[LintResult]: diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index a93c2fc4..e4c5f331 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1281,3 +1281,83 @@ async def test_dangling_ref_skips_imported_namespaces() -> None: issues = await linter.lint(g, PROJECT_ID) assert _results_with_rule(issues, "dangling-ref") == [] + + +# --------------------------------------------------------------------------- +# 31. duplicate-label (broader semantics) +# --------------------------------------------------------------------------- + + +async def test_duplicate_label_case_insensitive_within_classes() -> None: + """Existing behavior preserved: classes with same label (any case) flagged.""" + g = Graph() + g.add((EX.A, RDF.type, OWL.Class)) + g.add((EX.A, RDFS.label, Literal("Animal", lang="en"))) + g.add((EX.B, RDF.type, OWL.Class)) + g.add((EX.B, RDFS.label, Literal("ANIMAL", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.A), str(EX.B)} + + +async def test_duplicate_label_flags_property_duplicates() -> None: + """Two ObjectProperties sharing a label (case-insensitive) are flagged.""" + g = Graph() + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.label, Literal("knows", lang="en"))) + 
g.add((EX.acquaintedWith, RDF.type, OWL.ObjectProperty)) + g.add((EX.acquaintedWith, RDFS.label, Literal("Knows", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.knows), str(EX.acquaintedWith)} + for m in matches: + assert m.subject_type == "property" + + +async def test_duplicate_label_flags_individual_duplicates() -> None: + g = Graph() + g.add((EX.Person, RDF.type, OWL.Class)) + g.add((EX.alice1, RDF.type, EX.Person)) + g.add((EX.alice1, RDFS.label, Literal("Alice", lang="en"))) + g.add((EX.alice2, RDF.type, EX.Person)) + g.add((EX.alice2, RDFS.label, Literal("alice", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + matches = _results_with_rule(issues, "duplicate-label") + assert {m.subject_iri for m in matches} == {str(EX.alice1), str(EX.alice2)} + + +async def test_duplicate_label_does_not_flag_across_entity_types() -> None: + """A class and a property sharing a label are NOT cross-flagged.""" + g = Graph() + g.add((EX.Knows, RDF.type, OWL.Class)) + g.add((EX.Knows, RDFS.label, Literal("knows", lang="en"))) + g.add((EX.knows, RDF.type, OWL.ObjectProperty)) + g.add((EX.knows, RDFS.label, Literal("Knows", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, PROJECT_ID) + + assert _results_with_rule(issues, "duplicate-label") == [] + + +async def test_duplicate_label_separates_languages() -> None: + """Same label string in different languages is not a duplicate.""" + g = Graph() + g.add((EX.A, RDF.type, OWL.Class)) + g.add((EX.A, RDFS.label, Literal("Hund", lang="de"))) + g.add((EX.B, RDF.type, OWL.Class)) + g.add((EX.B, RDFS.label, Literal("Hund", lang="en"))) + + linter = OntologyLinter(enabled_rules={"duplicate-label"}) + issues = await linter.lint(g, 
PROJECT_ID) + + assert _results_with_rule(issues, "duplicate-label") == [] From 995c361560e45ebfa901acc07d85f7c94783c6ac Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 16:18:17 +0200 Subject: [PATCH 15/20] fix(lint): tidy _check_duplicate_label after review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address code review on Task 9 (#99): * Dedup IRIs in the per-group list — a resource whose two labels fold to the same key (e.g. 'Apple' and 'apple') was being inserted twice and inflating total_duplicates / duplicate_iris. * Normalise language tags with .lower() in the grouping key, matching _check_label_per_language and BCP 47's case-insensitive semantics. * Use the etype from the group key for subject_type instead of re-invoking _determine_entity_type per emit. * Tests: add explicit subject_type assertions in the class and individual duplicate tests so the per-type emission is verified. Co-Authored-By: Claude Sonnet 4.6 --- ontokit/services/linter.py | 13 ++++++++----- tests/unit/test_linter.py | 4 ++++ 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 9f31907d..7435dd9c 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -613,12 +613,15 @@ async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]: label_str = str(label).strip() if not label_str: continue - key = (etype, label_str.lower(), label.language) - groups[key].append(str(subject)) - original_label_for.setdefault(str(subject), label_str) + lang_key = label.language.lower() if label.language else None + key = (etype, label_str.lower(), lang_key) + subj_iri = str(subject) + if subj_iri not in groups[key]: + groups[key].append(subj_iri) + original_label_for.setdefault(subj_iri, label_str) reported_iris: set[str] = set() - for (_etype, _lower, lang), iris in groups.items(): + for (etype, _lower, lang), iris in groups.items(): if 
len(iris) < 2: continue for iri in iris: @@ -637,7 +640,7 @@ async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]: f"{len(others)} other resource(s) of the same type" ), subject_iri=iri, - subject_type=self._determine_entity_type(graph, URIRef(iri)), + subject_type=etype, details={ "local_name": self._get_local_name(URIRef(iri)), "label": shown_label, diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index e4c5f331..be359373 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1301,6 +1301,8 @@ async def test_duplicate_label_case_insensitive_within_classes() -> None: matches = _results_with_rule(issues, "duplicate-label") assert {m.subject_iri for m in matches} == {str(EX.A), str(EX.B)} + for m in matches: + assert m.subject_type == "class" async def test_duplicate_label_flags_property_duplicates() -> None: @@ -1333,6 +1335,8 @@ async def test_duplicate_label_flags_individual_duplicates() -> None: matches = _results_with_rule(issues, "duplicate-label") assert {m.subject_iri for m in matches} == {str(EX.alice1), str(EX.alice2)} + for m in matches: + assert m.subject_type == "individual" async def test_duplicate_label_does_not_flag_across_entity_types() -> None: From 729c8373bfcd22378fab781bdb5d9882d91a4aa4 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 16:20:31 +0200 Subject: [PATCH 16/20] feat(lint): refresh L2 and L4 descriptions for new rules (#99) L2 now mentions orphan-individual and deprecated-parent. L4 now mentions unused-property, empty-domain/range, and multi-root. 
--- ontokit/services/linter.py | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py index 7435dd9c..af81acef 100644 --- a/ontokit/services/linter.py +++ b/ontokit/services/linter.py @@ -284,7 +284,10 @@ class LintLevelDefinition(NamedTuple): ), 2: LintLevelDefinition( "Consistency", - "Orphan classes, duplicate triples, and disjointness violations", + ( + "Orphan classes, duplicate triples, disjointness violations, " + "orphan individuals, deprecated parent classes" + ), LINT_LEVELS[2], ), 3: LintLevelDefinition( @@ -294,7 +297,10 @@ class LintLevelDefinition(NamedTuple): ), 4: LintLevelDefinition( "Quality", - "Comments, per-language label checks, and redundant regional variants", + ( + "Comments, per-language label checks, redundant regional variants, " + "unused properties, empty domain/range, multi-root warnings" + ), LINT_LEVELS[4], ), 5: LintLevelDefinition( From c1f03aea950bfbc9a6e3b80b00d268065a36bc86 Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 16:25:19 +0200 Subject: [PATCH 17/20] test(lint): assert level membership for new/renamed rules (#99) Locks in the level placements from the design doc so a future accidental edit to LINT_LEVELS gets caught at test time. Co-Authored-By: Claude Sonnet 4.6 --- tests/unit/test_linter.py | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py index be359373..3a461c3f 100644 --- a/tests/unit/test_linter.py +++ b/tests/unit/test_linter.py @@ -1365,3 +1365,28 @@ async def test_duplicate_label_separates_languages() -> None: issues = await linter.lint(g, PROJECT_ID) assert _results_with_rule(issues, "duplicate-label") == [] + + +# --------------------------------------------------------------------------- +# 32. 
Level membership for new and renamed rules (#99) +# --------------------------------------------------------------------------- + + +def test_lint_levels_include_new_and_renamed_rules() -> None: + """Each rule introduced or renamed in #99 is in the expected lint level.""" + from ontokit.services.linter import LINT_LEVELS + + # L1 — dangling-ref replaces undefined-parent. + assert "dangling-ref" in LINT_LEVELS[1] + assert "undefined-parent" not in LINT_LEVELS[1] + assert "undefined-parent" not in LINT_LEVELS[5] + + # L2 — orphan-individual and deprecated-parent join existing consistency rules. + assert "orphan-individual" in LINT_LEVELS[2] + assert "deprecated-parent" in LINT_LEVELS[2] + + # L4 — quality-style additions. + assert "unused-property" in LINT_LEVELS[4] + assert "empty-domain" in LINT_LEVELS[4] + assert "empty-range" in LINT_LEVELS[4] + assert "multi-root" in LINT_LEVELS[4] From 905291396153b504ec9e55fa1ff3c13ecdc4cfba Mon Sep 17 00:00:00 2001 From: "John R. D'Orazio" Date: Sun, 3 May 2026 18:04:35 +0200 Subject: [PATCH 18/20] fix(lint): polish PR after review (#99) Address CodeRabbit + cumulative-review feedback in one pass: * _check_orphan_individual now iterates all subjects with rdf:type and filters via _determine_entity_type, catching implicit individuals declared with just rdf:type ex:Person (no owl:NamedIndividual marker). * RDFS.Class coverage extended across deprecated-parent, orphan-individual, and multi-root via a new _class_subjects helper; previously only owl:Class subjects were considered. * _check_duplicate_label keys original_label_for by group key instead of subject IRI so a subject with two labels in different groups reports each group's actual label rather than the first-seen one. * owl:imports namespace handling now considers both the slash and the hash form when the imported IRI has no trailing separator, so hash-style imported terms are no longer flagged as dangling. * Stale test names from the undefined-parent rename are updated. 
* duplicate-label gets a details-shape assertion.

Co-Authored-By: Claude Sonnet 4.6
---
 ontokit/services/linter.py |  51 +++++++++++++----
 tests/unit/test_linter.py  | 112 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/ontokit/services/linter.py b/ontokit/services/linter.py
index af81acef..31bad12d 100644
--- a/ontokit/services/linter.py
+++ b/ontokit/services/linter.py
@@ -472,9 +472,14 @@ async def _check_dangling_ref(self, graph: Graph) -> list[LintResult]:
         for _ontology, _pred, imported in graph.triples((None, OWL.imports, None)):
             if isinstance(imported, URIRef):
                 imp_str = str(imported)
-                if not imp_str.endswith(("/", "#")):
-                    imp_str += "/"
-                imported_ns.add(imp_str)
+                # Both slash and hash forms are valid namespace shapes; if the
+                # owl:imports IRI already ends in one, keep it as-is, otherwise
+                # add both variants so terms can be matched either way.
+                if imp_str.endswith(("/", "#")):
+                    imported_ns.add(imp_str)
+                else:
+                    imported_ns.add(imp_str + "/")
+                    imported_ns.add(imp_str + "#")
         external_ns = well_known_ns | imported_ns

         # (subject_iri, predicate, target) keyed reporting to deduplicate
@@ -607,7 +612,10 @@ async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]:
         # Skip resources whose entity_type is "other" — we only group concrete
         # types that the schema knows how to navigate.
         groups: dict[tuple[str, str, str | None], list[str]] = defaultdict(list)
-        original_label_for: dict[str, str] = {}
+        # Track one canonical display label per group key (first-seen wins) so the
+        # message reports the casing that goes with the matched group rather than
+        # whichever label happened to be iterated first for the subject.
+        original_label_for_group: dict[tuple[str, str, str | None], str] = {}

         for subject in self._uri_subjects:
             etype = self._determine_entity_type(graph, subject)
@@ -624,19 +632,19 @@ async def _check_duplicate_label(self, graph: Graph) -> list[LintResult]:
                 subj_iri = str(subject)
                 if subj_iri not in groups[key]:
                     groups[key].append(subj_iri)
-                    original_label_for.setdefault(subj_iri, label_str)
+                    original_label_for_group.setdefault(key, label_str)

         reported_iris: set[str] = set()
         for (etype, _lower, lang), iris in groups.items():
             if len(iris) < 2:
                 continue
+            shown_label = original_label_for_group[(etype, _lower, lang)]
             for iri in iris:
                 if iri in reported_iris:
                     continue
                 reported_iris.add(iri)
                 others = [o for o in iris if o != iri]
                 lang_str = f"@{lang}" if lang else ""
-                shown_label = original_label_for[iri]
                 issues.append(
                     LintResult(
                         issue_type=LintIssueType.WARNING.value,
@@ -1421,14 +1429,23 @@ async def _check_unused_property(self, graph: Graph) -> list[LintResult]:
         return issues

     async def _check_orphan_individual(self, graph: Graph) -> list[LintResult]:
-        """Flag individuals whose rdf:type target is not declared as owl:Class."""
+        """Flag individuals whose rdf:type target is not declared as owl:Class or rdfs:Class."""
         issues: list[LintResult] = []
-        declared_classes = {c for c in graph.subjects(RDF.type, OWL.Class) if isinstance(c, URIRef)}
+        declared_classes = self._class_subjects(graph)
         # owl:Thing is implicitly a class even if not declared.
         declared_classes.add(OWL.Thing)

-        for ind in graph.subjects(RDF.type, OWL.NamedIndividual):
-            if not isinstance(ind, URIRef):
+        # Dedup subjects since graph.subjects(RDF.type, None) may yield the
+        # same subject multiple times when it has multiple rdf:type values.
+        seen_individuals: set[URIRef] = set()
+        for ind in graph.subjects(RDF.type, None):
+            if not isinstance(ind, URIRef) or ind in seen_individuals:
+                continue
+            seen_individuals.add(ind)
+            # Skip subjects that are themselves classes, properties, or untyped —
+            # _determine_entity_type returns "individual" only for resources that
+            # have an rdf:type but aren't declared as a class or property.
+            if self._determine_entity_type(graph, ind) != "individual":
                 continue
             for type_target in graph.objects(ind, RDF.type):
                 if not isinstance(type_target, URIRef):
@@ -1498,7 +1515,7 @@ async def _check_empty_range(self, graph: Graph) -> list[LintResult]:
     async def _check_deprecated_parent(self, graph: Graph) -> list[LintResult]:
         """Flag classes that subclass an owl:deprecated class."""
         issues: list[LintResult] = []
-        for cls in graph.subjects(RDF.type, OWL.Class):
+        for cls in self._class_subjects(graph):
             if not isinstance(cls, URIRef):
                 continue
             for parent in graph.objects(cls, RDFS.subClassOf):
@@ -1525,7 +1542,7 @@ async def _check_deprecated_parent(self, graph: Graph) -> list[LintResult]:
     async def _check_multi_root(self, graph: Graph) -> list[LintResult]:
         """Fire once if the ontology has more than 5 root classes."""
         root_iris: list[str] = []
-        for cls in graph.subjects(RDF.type, OWL.Class):
+        for cls in self._class_subjects(graph):
             if not isinstance(cls, URIRef) or cls == OWL.Thing:
                 continue
             has_real_parent = any(
@@ -1553,6 +1570,16 @@ async def _check_multi_root(self, graph: Graph) -> list[LintResult]:
             )
         ]

+    @staticmethod
+    def _class_subjects(graph: Graph) -> set[URIRef]:
+        """Return all URIRef subjects declared as owl:Class or rdfs:Class."""
+        classes: set[URIRef] = set()
+        for cls_type in (OWL.Class, RDFS.Class):
+            for cls in graph.subjects(RDF.type, cls_type):
+                if isinstance(cls, URIRef):
+                    classes.add(cls)
+        return classes
+
     @staticmethod
     def _determine_entity_type(graph: Graph, uri: URIRef) -> str:
         """Return 'class', 'property', 'individual', or 'other' for a URI."""
diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py
index 3a461c3f..6e742e14 100644
--- a/tests/unit/test_linter.py
+++ b/tests/unit/test_linter.py
@@ -208,13 +208,22 @@ async def test_duplicate_label() -> None:
     for m in matches:
         assert m.issue_type == "warning"

+    # Verify the details payload shape (#99 review feedback)
+    for m in matches:
+        assert m.details is not None
+        assert m.details["label"] == "Thing"
+        assert m.details["language"] == "en"
+        assert m.details["total_duplicates"] == 1
+        assert isinstance(m.details["duplicate_iris"], list)
+        assert len(m.details["duplicate_iris"]) == 1
+

 # ---------------------------------------------------------------------------
-# 8. test_undefined_parent
+# 8. dangling-ref (subClassOf path)
 # ---------------------------------------------------------------------------


-async def test_undefined_parent() -> None:
+async def test_dangling_ref_flags_undeclared_subclass_target() -> None:
     """A class referencing a parent not defined in the ontology generates an error."""
     g = Graph()
     g.add((EX.Child, RDF.type, OWL.Class))
@@ -232,7 +241,7 @@ async def test_undefined_parent() -> None:
     assert matches[0].details["dangling_target"] == str(EX.Phantom)


-async def test_no_undefined_parent_when_defined() -> None:
+async def test_dangling_ref_does_not_flag_declared_subclass_target() -> None:
     """A parent that IS defined as owl:Class should not trigger the rule."""
     g = Graph()
     g.add((EX.Parent, RDF.type, OWL.Class))
@@ -1390,3 +1399,100 @@ def test_lint_levels_include_new_and_renamed_rules() -> None:
     assert "empty-domain" in LINT_LEVELS[4]
     assert "empty-range" in LINT_LEVELS[4]
     assert "multi-root" in LINT_LEVELS[4]
+
+
+# ---------------------------------------------------------------------------
+# 33. Review-feedback regressions (#99)
+# ---------------------------------------------------------------------------
+
+
+async def test_orphan_individual_catches_implicit_individuals() -> None:
+    """An individual lacking owl:NamedIndividual but typed as a class should still be flagged."""
+    g = Graph()
+    # No owl:NamedIndividual declaration on EX.Alice.
+    g.add((EX.Alice, RDF.type, EX.UndeclaredPerson))
+
+    linter = OntologyLinter(enabled_rules={"orphan-individual"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    matches = _results_with_rule(issues, "orphan-individual")
+    assert len(matches) == 1
+    assert matches[0].subject_iri == str(EX.Alice)
+    assert matches[0].details is not None
+    assert matches[0].details["undeclared_type"] == str(EX.UndeclaredPerson)
+
+
+async def test_orphan_individual_does_not_flag_classes_themselves() -> None:
+    """A subject typed as owl:Class is not an individual and must not be flagged."""
+    g = Graph()
+    g.add((EX.Person, RDF.type, OWL.Class))
+
+    linter = OntologyLinter(enabled_rules={"orphan-individual"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "orphan-individual") == []
+
+
+async def test_deprecated_parent_recognizes_rdfs_class() -> None:
+    """A class declared via rdfs:Class with a deprecated rdfs:Class parent should be flagged."""
+    g = Graph()
+    g.add((EX.OldThing, RDF.type, RDFS.Class))
+    g.add((EX.OldThing, OWL.deprecated, Literal(True)))
+    g.add((EX.NewThing, RDF.type, RDFS.Class))
+    g.add((EX.NewThing, RDFS.subClassOf, EX.OldThing))
+
+    linter = OntologyLinter(enabled_rules={"deprecated-parent"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    matches = _results_with_rule(issues, "deprecated-parent")
+    assert len(matches) == 1
+    assert matches[0].subject_iri == str(EX.NewThing)
+
+
+async def test_orphan_individual_recognizes_rdfs_class() -> None:
+    """When a class is declared via rdfs:Class, individuals typed as it should NOT be orphaned."""
+    g = Graph()
+    g.add((EX.Person, RDF.type, RDFS.Class))
+    g.add((EX.Alice, RDF.type, EX.Person))
+
+    linter = OntologyLinter(enabled_rules={"orphan-individual"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "orphan-individual") == []
+
+
+async def test_duplicate_label_uses_group_specific_label_casing() -> None:
+    """A subject with two labels that fall in different groups should report each group's actual label."""
+    g = Graph()
+    g.add((EX.A, RDF.type, OWL.Class))
+    g.add((EX.A, RDFS.label, Literal("Apple", lang="en")))
+    g.add((EX.A, RDFS.label, Literal("Banana", lang="en")))
+    g.add((EX.B, RDF.type, OWL.Class))
+    g.add((EX.B, RDFS.label, Literal("apple", lang="en")))
+
+    linter = OntologyLinter(enabled_rules={"duplicate-label"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    matches = _results_with_rule(issues, "duplicate-label")
+    # Both A and B share the "apple" group; "Banana" group has size 1, no finding.
+    assert {m.subject_iri for m in matches} == {str(EX.A), str(EX.B)}
+    # Both findings must report the "apple" group's label, not whichever label
+    # of A was seen first.
+    for m in matches:
+        assert m.details is not None
+        assert m.details["label"].lower() == "apple"
+
+
+async def test_dangling_ref_skips_imported_hash_namespaces() -> None:
+    """References into namespaces declared via owl:imports using # form must not be flagged."""
+    g = Graph()
+    imported_ns = URIRef("http://other.org/onto")
+    g.add((URIRef("http://example.org/myonto"), OWL.imports, imported_ns))
+    g.add((EX.knows, RDF.type, OWL.ObjectProperty))
+    # The imported ontology uses hash IRIs; the linter must accept this form.
+    g.add((EX.knows, RDFS.range, URIRef("http://other.org/onto#Person")))
+
+    linter = OntologyLinter(enabled_rules={"dangling-ref"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "dangling-ref") == []

From fd2897b0fa304104f8ec662e5ce58e094fed1a74 Mon Sep 17 00:00:00 2001
From: "John R. D'Orazio"
Date: Sun, 3 May 2026 18:21:52 +0200
Subject: [PATCH 19/20] test(lint): cover defensive guards in new rules (#99)

Lift PR patch coverage above the codecov target by adding targeted tests
for the blank-node and edge-case guards in the new check methods. Each
new test exercises one of the previously-uncovered 'continue' branches:
blank-node iterations in unused-property, orphan-individual,
empty-domain, empty-range, deprecated-parent, and multi-root; plus the
no-type / non-literal / empty-label / multi-group guards in
duplicate-label.
---
 tests/unit/test_linter.py | 156 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py
index 6e742e14..dffa8e42 100644
--- a/tests/unit/test_linter.py
+++ b/tests/unit/test_linter.py
@@ -1496,3 +1496,159 @@ async def test_dangling_ref_skips_imported_hash_namespaces() -> None:
     issues = await linter.lint(g, PROJECT_ID)

     assert _results_with_rule(issues, "dangling-ref") == []
+
+
+# ---------------------------------------------------------------------------
+# 34. Defensive-guard coverage (#99)
+# ---------------------------------------------------------------------------
+
+
+async def test_unused_property_skips_blank_node_property() -> None:
+    """A blank-node typed as a property must not crash the rule."""
+    g = Graph()
+    g.add((BNode(), RDF.type, OWL.ObjectProperty))
+    g.add((EX.knows, RDF.type, OWL.ObjectProperty))  # ensures rule reaches the URIRef path
+    g.add((EX.Alice, EX.knows, EX.Bob))
+
+    linter = OntologyLinter(enabled_rules={"unused-property"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    # Used property is not flagged; blank-node property is silently skipped.
+    assert _results_with_rule(issues, "unused-property") == []
+
+
+async def test_orphan_individual_skips_blank_node_type_target() -> None:
+    """Individuals with blank-node type targets do not crash the rule."""
+    g = Graph()
+    g.add((EX.Alice, RDF.type, OWL.NamedIndividual))
+    g.add((EX.Alice, RDF.type, BNode()))  # blank-node restriction-style type target
+
+    linter = OntologyLinter(enabled_rules={"orphan-individual"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "orphan-individual") == []
+
+
+async def test_empty_domain_skips_blank_node_property() -> None:
+    g = Graph()
+    g.add((BNode(), RDF.type, OWL.ObjectProperty))
+    g.add((EX.named, RDF.type, OWL.ObjectProperty))
+    g.add((EX.named, RDFS.domain, EX.Person))
+
+    linter = OntologyLinter(enabled_rules={"empty-domain"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "empty-domain") == []
+
+
+async def test_empty_range_skips_blank_node_property() -> None:
+    g = Graph()
+    g.add((BNode(), RDF.type, OWL.ObjectProperty))
+    g.add((EX.named, RDF.type, OWL.ObjectProperty))
+    g.add((EX.named, RDFS.range, EX.Person))
+
+    linter = OntologyLinter(enabled_rules={"empty-range"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "empty-range") == []
+
+
+async def test_deprecated_parent_skips_blank_nodes() -> None:
+    """Both blank-node classes and blank-node parents must be skipped silently."""
+    g = Graph()
+    # Blank-node class (e.g., owl:Restriction) — should not be iterated.
+    g.add((BNode(), RDF.type, OWL.Class))
+    # A real class with a blank-node parent (e.g., from owl:Restriction) — should be skipped.
+    g.add((EX.Foo, RDF.type, OWL.Class))
+    g.add((EX.Foo, RDFS.subClassOf, BNode()))
+
+    linter = OntologyLinter(enabled_rules={"deprecated-parent"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "deprecated-parent") == []
+
+
+async def test_multi_root_skips_blank_node_classes() -> None:
+    """Blank-node classes don't count as roots."""
+    g = Graph()
+    # 5 real roots (under the threshold)…
+    for i in range(5):
+        g.add((URIRef(f"http://example.org/Root{i}"), RDF.type, OWL.Class))
+    # …plus 5 blank-node classes; if the guard worked, none of these count.
+    for _ in range(5):
+        g.add((BNode(), RDF.type, OWL.Class))
+
+    linter = OntologyLinter(enabled_rules={"multi-root"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "multi-root") == []
+
+
+async def test_duplicate_label_skips_subject_with_no_type() -> None:
+    """A subject that has rdfs:label but no rdf:type is treated as 'other' and skipped."""
+    g = Graph()
+    # No rdf:type for EX.Anon, just a label.
+    g.add((EX.Anon, RDFS.label, Literal("Foo", lang="en")))
+    g.add((EX.AnonTwo, RDFS.label, Literal("Foo", lang="en")))
+
+    linter = OntologyLinter(enabled_rules={"duplicate-label"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    # Both subjects resolve to entity type "other" and are silently dropped.
+    assert _results_with_rule(issues, "duplicate-label") == []
+
+
+async def test_duplicate_label_skips_non_literal_label_objects() -> None:
+    """rdfs:label values that aren't literals (e.g., URIRefs) must be ignored, not crash."""
+    g = Graph()
+    g.add((EX.A, RDF.type, OWL.Class))
+    g.add((EX.A, RDFS.label, EX.SomeIRI))  # non-literal — skipped
+    g.add((EX.A, RDFS.label, Literal("Real", lang="en")))
+    g.add((EX.B, RDF.type, OWL.Class))
+    g.add((EX.B, RDFS.label, Literal("real", lang="en")))
+
+    linter = OntologyLinter(enabled_rules={"duplicate-label"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    # The literal labels still match case-insensitively; the IRI label is ignored.
+    matches = _results_with_rule(issues, "duplicate-label")
+    assert {m.subject_iri for m in matches} == {str(EX.A), str(EX.B)}
+
+
+async def test_duplicate_label_skips_empty_and_whitespace_labels() -> None:
+    """Empty or whitespace-only labels must not be grouped (would otherwise spuriously match)."""
+    g = Graph()
+    g.add((EX.A, RDF.type, OWL.Class))
+    g.add((EX.A, RDFS.label, Literal("", lang="en")))
+    g.add((EX.B, RDF.type, OWL.Class))
+    g.add((EX.B, RDFS.label, Literal(" ", lang="en")))
+
+    linter = OntologyLinter(enabled_rules={"duplicate-label"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    assert _results_with_rule(issues, "duplicate-label") == []
+
+
+async def test_duplicate_label_subject_in_two_duplicate_groups_reported_once() -> None:
+    """When the same subject is in two duplicate groups, it should be reported only once."""
+    # A has both "Apple"@en and "Banana"@en. B duplicates "apple". C duplicates "banana".
+    # Both groups fire; A appears in both. The reported_iris guard ensures A only
+    # gets one finding, not two.
+    g = Graph()
+    g.add((EX.A, RDF.type, OWL.Class))
+    g.add((EX.A, RDFS.label, Literal("Apple", lang="en")))
+    g.add((EX.A, RDFS.label, Literal("Banana", lang="en")))
+    g.add((EX.B, RDF.type, OWL.Class))
+    g.add((EX.B, RDFS.label, Literal("apple", lang="en")))
+    g.add((EX.C, RDF.type, OWL.Class))
+    g.add((EX.C, RDFS.label, Literal("banana", lang="en")))
+
+    linter = OntologyLinter(enabled_rules={"duplicate-label"})
+    issues = await linter.lint(g, PROJECT_ID)
+
+    matches = _results_with_rule(issues, "duplicate-label")
+    flagged_iris = [m.subject_iri for m in matches]
+    # Each subject appears at most once even though A is in two groups.
+    assert flagged_iris.count(str(EX.A)) == 1
+    # All three subjects flagged.
+    assert set(flagged_iris) == {str(EX.A), str(EX.B), str(EX.C)}

From 8b2dbb402314887924f9f7a74dd5c49c5a807cdd Mon Sep 17 00:00:00 2001
From: "John R. D'Orazio"
Date: Sun, 3 May 2026 20:45:44 +0200
Subject: [PATCH 20/20] test(lint): tighten dangling-ref coverage after review (#99)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two tiny CodeRabbit-flagged gaps in the dangling-ref tests:

* test_dangling_ref_flags_undefined_range only asserted
  details.predicate, not the dangling_target IRI. Add the target
  assertion so a regression in either field is caught.
* test_dangling_ref_skips_well_known_namespaces only exercised XSD and
  SKOS — the implementation also skips DC and DCTERMS. Add a DC.title
  range and a DCTERMS.creator range so all 7 well-known namespaces are
  represented in the test.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/unit/test_linter.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test_linter.py b/tests/unit/test_linter.py
index dffa8e42..62dccfc0 100644
--- a/tests/unit/test_linter.py
+++ b/tests/unit/test_linter.py
@@ -3,7 +3,7 @@
 from uuid import uuid4

 from rdflib import BNode, Graph, Literal, Namespace, URIRef
-from rdflib.namespace import OWL, RDF, RDFS, SKOS, XSD
+from rdflib.namespace import DC, DCTERMS, OWL, RDF, RDFS, SKOS, XSD

 from ontokit.services.linter import (
     LINT_RULES,
@@ -1247,6 +1247,7 @@ async def test_dangling_ref_flags_undefined_range() -> None:
     assert matches[0].subject_iri == str(EX.age)
     assert matches[0].details is not None
     assert matches[0].details["predicate"] == str(RDFS.range)
+    assert matches[0].details["dangling_target"] == str(EX.UndeclaredDatatype)


 async def test_dangling_ref_subclassof_includes_predicate_detail() -> None:
@@ -1271,6 +1272,12 @@ async def test_dangling_ref_skips_well_known_namespaces() -> None:
     g.add((EX.knows, RDFS.range, XSD.string))
     g.add((EX.related, RDF.type, OWL.ObjectProperty))
     g.add((EX.related, RDFS.range, SKOS.Concept))
+    # Also exercise the DC and DCTERMS skiplist entries so all 7 well-known
+    # namespaces are covered by this test, not just XSD and SKOS.
+    g.add((EX.titledBy, RDF.type, OWL.AnnotationProperty))
+    g.add((EX.titledBy, RDFS.range, DC.title))
+    g.add((EX.createdBy, RDF.type, OWL.AnnotationProperty))
+    g.add((EX.createdBy, RDFS.range, DCTERMS.creator))

     linter = OntologyLinter(enabled_rules={"dangling-ref"})
     issues = await linter.lint(g, PROJECT_ID)
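
Reviewer note (illustrative only, not part of the patch): the dangling-ref tests in this series rely on a namespace skiplist built from well-known vocabularies plus `owl:imports` targets. The sketch below shows that idea in isolation, assuming plain strings in place of rdflib terms. The function names `expand_import_namespaces` and `is_external_target` are hypothetical and do not exist in `ontokit/services/linter.py`.

```python
def expand_import_namespaces(import_iris):
    """Turn owl:imports IRIs into namespace prefixes (hypothetical helper).

    An IRI that already ends in "/" or "#" is kept as-is; otherwise both
    the slash and the hash form are added, since either may be in use.
    """
    namespaces = set()
    for iri in import_iris:
        if iri.endswith(("/", "#")):
            namespaces.add(iri)
        else:
            namespaces.add(iri + "/")
            namespaces.add(iri + "#")
    return namespaces


def is_external_target(target_iri, namespaces):
    """Return True when the target falls under any known external namespace."""
    return any(target_iri.startswith(ns) for ns in namespaces)


if __name__ == "__main__":
    ns = expand_import_namespaces(["http://other.org/onto"])
    # A hash-style term from the imported ontology is treated as external.
    print(is_external_target("http://other.org/onto#Person", ns))
    # A term from an unrelated namespace is not.
    print(is_external_target("http://example.org/Phantom", ns))
```

Adding both variants trades a slightly larger prefix set for not having to guess which separator the imported ontology uses.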