
Updated Analysis: D4D ↔ FAIRSCAPE Alignment with Pydantic Integration and SKOS Mappings #131

@realmarcin


Summary

An updated analysis of D4D ↔ FAIRSCAPE alignment, based on the current state of the repository: FAIRSCAPE Pydantic model integration, SKOS semantic alignment, and updated slot URI coverage.

This updates and extends the analysis from issue #130 with significant new developments.

Major Updates Since Issue #130

1. FAIRSCAPE Pydantic Models Integration ✅

Status: Fully integrated as a git submodule

Location: fairscape_models/ (from https://github.com/fairscape/fairscape_models)

Converter Implementation: src/fairscape_integration/d4d_to_fairscape.py

  • Converts D4D YAML → FAIRSCAPE RO-Crate with Pydantic validation
  • Tested and working (VOICE D4D → FAIRSCAPE RO-Crate validated ✓)
  • Returns a (ROCrateV1_2, validation_result) tuple
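For readers who have not opened d4d_to_fairscape.py, the general shape of the conversion can be sketched with the standard library alone. This is a hypothetical illustration, not the module's code: the function name `d4d_to_rocrate_sketch`, the small key subset (taken from the SKOS exact-match examples in this issue) and the `REQUIRED_ROOT_FIELDS` check are assumptions, and a plain field check stands in for the Pydantic validation the real converter performs. It mirrors only the (crate, validation_result) return shape.

```python
from typing import Any

# Exact matches from the SKOS alignment examples (D4D slot -> schema.org term).
D4D_TO_SCHEMA = {
    "title": "name",
    "description": "description",
    "doi": "identifier",
    "license": "license",
}

# Hypothetical minimum for the root entity; the FAIRSCAPE Pydantic models
# define their own required fields.
REQUIRED_ROOT_FIELDS = ("name", "description", "license")


def d4d_to_rocrate_sketch(d4d: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Map a loaded D4D record to an RO-Crate dict and report missing fields."""
    root: dict[str, Any] = {"@id": "./", "@type": "Dataset"}
    for d4d_key, schema_term in D4D_TO_SCHEMA.items():
        if d4d_key in d4d:
            root[schema_term] = d4d[d4d_key]

    missing = [field for field in REQUIRED_ROOT_FIELDS if field not in root]
    validation_result = {"valid": not missing, "missing": missing}

    crate = {
        # Standard RO-Crate 1.2 context; FAIRSCAPE crates use an extended @context.
        "@context": "https://w3id.org/ro/crate/1.2/context",
        "@graph": [
            {  # metadata descriptor, pointing at the root dataset
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
                "about": {"@id": "./"},
            },
            root,
        ],
    }
    return crate, validation_result
```

For example, `d4d_to_rocrate_sketch({"title": "VOICE", "description": "Example description"})` returns a crate plus `{'valid': False, 'missing': ['license']}`, because no license was supplied.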

Available Models:

  • ROCrateV1_2 - Top-level RO-Crate container
  • ROCrateMetadataElem - Root dataset entity
  • ROCrateMetadataFileElem - Metadata descriptor
  • Dataset, Software, Computation, Annotation, Experiment - Entity types
  • IdentifierValue, PropertyValue - Supporting types
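The three container models line up with the standard RO-Crate layout: a metadata descriptor entity with @id `ro-crate-metadata.json` whose `about` points at the root dataset, plus any further entities in `@graph`. A minimal stdlib sketch of pulling those pieces apart from a plain JSON dict; the helper `split_crate` is hypothetical, and the correspondence to ROCrateMetadataFileElem, ROCrateMetadataElem and ROCrateV1_2 is assumed from the class names rather than checked against fairscape_models.

```python
from typing import Any

DESCRIPTOR_ID = "ro-crate-metadata.json"


def split_crate(
    crate: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any], list[dict[str, Any]]]:
    """Split an RO-Crate @graph into (descriptor, root dataset, other entities)."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    descriptor = by_id.get(DESCRIPTOR_ID)
    if descriptor is None:
        raise ValueError("no metadata descriptor in @graph")
    # The descriptor's "about" names the root entity, whatever its @id is.
    root_id = descriptor["about"]["@id"]
    root = by_id.get(root_id)
    if root is None:
        raise ValueError(f"descriptor points at missing root {root_id!r}")
    others = [e for e in crate["@graph"] if e["@id"] not in (DESCRIPTOR_ID, root_id)]
    return descriptor, root, others
```

Following `about` instead of hard-coding `./` keeps the helper usable for crates whose root entity has a different identifier.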

Key Benefit: Runtime validation and type safety for RO-Crate generation

2. SKOS Semantic Alignment ✅

File: src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl

Total Mappings: 88 D4D properties mapped to RO-Crate/FAIRSCAPE

Mapping Quality:

  • skos:exactMatch: 52 (59.1%) - Direct 1:1 mappings
  • skos:closeMatch: 20 (22.7%) - Near-equivalent concepts; value transformation required
  • skos:relatedMatch: 10 (11.4%) - Related concepts, complex mapping
  • skos:narrowMatch: 5 (5.7%) - D4D term narrower than RO-Crate
  • skos:broadMatch: 1 (1.1%) - D4D term broader than RO-Crate
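The percentages follow directly from the counts, so recomputing them is a quick consistency check. The same arithmetic shows that exact and close matches together cover 72 of the 88 mappings (81.8%).

```python
# Counts per SKOS mapping predicate, as reported above.
counts = {
    "skos:exactMatch": 52,
    "skos:closeMatch": 20,
    "skos:relatedMatch": 10,
    "skos:narrowMatch": 5,
    "skos:broadMatch": 1,
}

total = sum(counts.values())
shares = {pred: round(100 * n / total, 1) for pred, n in counts.items()}

# Share of mappings that are direct or need only a value transformation.
direct_share = round(
    100 * (counts["skos:exactMatch"] + counts["skos:closeMatch"]) / total, 1
)

for pred, pct in shares.items():
    print(f"{pred:18} {counts[pred]:3d}  {pct:5.1f}%")
print(f"exact + close: {direct_share}%")
```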

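Target Vocabularies (approximate; the four counts sum to about 105, more than the 88 mappings, so they are not mutually exclusive):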

  • schema.org: ~43 property mappings
  • RAI (Responsible AI): ~30 property mappings
  • EVI (FAIRSCAPE Evidence): ~9 property mappings
  • D4D-specific: ~23 property mappings

Example Mappings:

```turtle
# Exact matches
d4d:title skos:exactMatch schema:name .
d4d:description skos:exactMatch schema:description .
d4d:doi skos:exactMatch schema:identifier .
d4d:license skos:exactMatch schema:license .

# Close matches (transformation required)
d4d:creators skos:closeMatch schema:author .  # String to Person object
d4d:created_by skos:closeMatch schema:creator .

# Related matches (complex mapping)
d4d:instances skos:relatedMatch schema:variableMeasured .
d4d:subpopulations skos:relatedMatch schema:variableMeasured .
```
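Tallies like the ones above can be reproduced from the Turtle source. The sketch below is a deliberately naive, stdlib-only parser: it assumes the alignment is written as one-line `prefix:local skos:predicate prefix:local .` statements like these examples. Anything richer (multi-line statements, annotations on the mappings) would need a real RDF parser such as rdflib. The function name `tally_mappings` is hypothetical.

```python
import re
from collections import Counter

# One-line "prefix:local skos:predicate prefix:local ." statements only.
TRIPLE = re.compile(r"^\s*(\S+:\S+)\s+(skos:\w+)\s+(\S+:\S+)\s*\.")


def tally_mappings(ttl_text: str) -> tuple[Counter, Counter]:
    """Count mappings per SKOS predicate and per target-vocabulary prefix."""
    by_predicate: Counter = Counter()
    by_target: Counter = Counter()
    for line in ttl_text.splitlines():
        # Drop trailing comments; safe here because prefixed names contain no '#'.
        match = TRIPLE.match(line.split("#", 1)[0])
        if match:
            _subject, predicate, target = match.groups()
            by_predicate[predicate] += 1
            by_target[target.split(":", 1)[0]] += 1
    return by_predicate, by_target


EXAMPLES = """
# Exact matches
d4d:title skos:exactMatch schema:name .
d4d:description skos:exactMatch schema:description .
d4d:doi skos:exactMatch schema:identifier .
d4d:license skos:exactMatch schema:license .
d4d:creators skos:closeMatch schema:author .  # String to Person object
d4d:created_by skos:closeMatch schema:creator .
d4d:instances skos:relatedMatch schema:variableMeasured .
d4d:subpopulations skos:relatedMatch schema:variableMeasured .
"""

if __name__ == "__main__":
    print(tally_mappings(EXAMPLES))
```

Run against the full d4d_rocrate_skos_alignment.ttl, the two counters should reproduce the Mapping Quality and Target Vocabularies figures, provided the file follows the one-line style assumed here.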

3. Reference Implementation

File: data/ro-crate/profiles/fairscape/full-ro-crate-metadata.json

Source: CM4AI (Cell Maps for AI) January 2026 data release

Content: Real-world FAIRSCAPE RO-Crate metadata (19.1 TB dataset, 647 entities)

Validation: ✅ Validates against FAIRSCAPE Pydantic models

Key Patterns:

  • FAIRSCAPE @context: {"@vocab": "https://schema.org/", "evi": "...", "rai": "...", "d4d": "..."}
  • EVI properties: evi:datasetCount, evi:computationCount, evi:formats
  • RAI properties: rai:dataUseCases, rai:dataBiases, rai:dataLimitations
  • PropertyValue pattern for custom metadata
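These patterns can be made concrete with a small builder for a FAIRSCAPE-style root entity. Everything here is illustrative. The evi/rai/d4d namespace IRIs are elided in this issue, so the sketch uses obvious placeholders. The helper names `property_value` and `fairscape_root` are hypothetical. Attaching PropertyValue entries through `additionalProperty` is also an assumption and should be checked against full-ro-crate-metadata.json.

```python
import json
from typing import Any

# Placeholders: the evi/rai/d4d namespace IRIs are elided ("...") above;
# copy the real values from full-ro-crate-metadata.json before use.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "evi": "<EVI namespace IRI>",
    "rai": "<RAI namespace IRI>",
    "d4d": "<D4D namespace IRI>",
}


def property_value(name: str, value: Any) -> dict[str, Any]:
    """Wrap a custom metadata field in a schema.org PropertyValue."""
    return {"@type": "PropertyValue", "name": name, "value": value}


def fairscape_root(name: str, formats: list[str], limitations: str,
                   extra: dict[str, Any]) -> dict[str, Any]:
    """Build an illustrative root Dataset mixing schema.org, EVI and RAI terms."""
    return {
        "@id": "./",
        "@type": "Dataset",
        "name": name,  # unprefixed, so it resolves to schema:name via @vocab
        "evi:formats": formats,
        "rai:dataLimitations": limitations,
        # Assumption: custom fields are attached via additionalProperty.
        "additionalProperty": [property_value(k, v) for k, v in extra.items()],
    }


if __name__ == "__main__":
    root = fairscape_root("Example dataset", ["csv"], "Example limitation",
                          {"release": "2026-01"})
    print(json.dumps({"@context": CONTEXT, "@graph": [root]}, indent=2))
```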

Updated Slot URI Coverage Analysis

Current State (Full Merged Schema)

File: src/data_sheets_schema/schema/data_sheets_schema_all.yaml

Overall Coverage: 31/33 slots (93.9%)

Vocabulary Usage (percentages of all 33 slots):

  • dcterms: (Dublin Core) — 20 mappings (60.6%)
  • dcat: (Data Catalog) — 8 mappings (24.2%)
  • schema: (Schema.org) — 2 mappings (6.1%)
  • prov: (Provenance) — 1 mapping (3.0%)

Unmapped Slots (2):

  • dialect
  • resources
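The coverage figures can be recomputed from the schema's `slots:` section. The stdlib-only sketch below assumes the YAML has already been loaded into a dict (for example with PyYAML, omitted here). The helper name is hypothetical, as are the sample slot names other than dialect and resources. A slot written as a bare `name:` entry loads as None, and the helper treats it as unmapped.

```python
from collections import Counter
from typing import Any, Optional


def slot_uri_coverage(slots: dict[str, Optional[dict[str, Any]]]) -> dict[str, Any]:
    """Summarise slot_uri coverage for a loaded LinkML `slots:` mapping."""
    # A bare `name:` entry in YAML loads as None; treat it as an empty definition.
    uris = {name: (spec or {}).get("slot_uri") for name, spec in slots.items()}
    unmapped = sorted(name for name, uri in uris.items() if not uri)
    by_prefix = Counter(uri.split(":", 1)[0] for uri in uris.values() if uri)
    mapped = len(slots) - len(unmapped)
    return {
        "mapped": mapped,
        "total": len(slots),
        "coverage_pct": round(100 * mapped / len(slots), 1) if slots else 0.0,
        "by_prefix": dict(by_prefix),
        "unmapped": unmapped,
    }
```

Applied to the `slots:` block of data_sheets_schema_all.yaml, the result should match the 31/33 figure and the per-prefix counts above if those numbers are current.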

Comparison with Issue #130

| Metric | Issue #130 | Current (2026-03-19) |
|---|---|---|
| Total slots analyzed | ~414 domain attributes | 33 top-level slots |
| Mapped slots | ~116 (28%) | 31 (93.9%) |
| Primary vocabulary | dcterms (70 mappings) | dcterms (20 mappings) |
| SKOS alignment | None | 88 mappings |
| FAIRSCAPE integration | None | ✅ Complete |

Note: The two columns measure different scopes (all class attributes in issue #130, top-level reusable slots here), so the counts and percentages are not directly comparable. The drop in totals also suggests significant schema consolidation/restructuring since issue #130.

D4D ↔ FAIRSCAPE Alignment at URI Level

Core Metadata (✅ Aligned)

Both D4D and FAIRSCAPE use Schema.org for core metadata:

| Concept | D4D slot_uri | FAIRSCAPE JSON-LD | Alignment |
|---|---|---|---|
| Title | schema:name | schema:name | ✅ Exact |
| Description | schema:description | schema:description | ✅ Exact |
| Identifier | schema:identifier | schema:identifier | ✅ Exact |
| License | schema:license | schema:license | ✅ Exact |
| URL | schema:url | schema:url | ✅ Exact |

Provenance (⚠️ Vocabulary Tension)

D4D prefers Dublin Core, FAIRSCAPE uses Schema.org:

| Concept | D4D slot_uri | FAIRSCAPE JSON-LD | Alignment |
|---|---|---|---|
| Created date | dcterms:created | schema:dateCreated | ⚠️ Same concept, different vocab |
| Modified date | dcterms:modified | schema:dateModified | ⚠️ Same concept, different vocab |
| Creator | dcterms:creator | schema:author | ⚠️ Same concept, different vocab |
| Download URL | dcat:downloadURL | schema:contentUrl | ⚠️ Same concept, different vocab |

Impact: SKOS alignment captures these equivalences as skos:closeMatch with transformation notes.
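When emitting FAIRSCAPE JSON-LD from D4D data, the provenance rows amount to a key rename. For creators there is also a value transformation (string to Person, as noted for d4d:creators in the SKOS examples). The sketch below assumes the input record is keyed by slot_uri CURIEs. The names `to_fairscape_terms` and `as_person` are hypothetical, and keys outside the table pass through untouched.

```python
from typing import Any

# D4D slot_uri -> schema.org term (unprefixed, resolved via FAIRSCAPE's @vocab),
# following the provenance table above.
DC_TO_SCHEMA = {
    "dcterms:created": "dateCreated",
    "dcterms:modified": "dateModified",
    "dcterms:creator": "author",
    "dcat:downloadURL": "contentUrl",
}


def as_person(value: Any) -> Any:
    """Wrap a bare creator string in a schema.org Person; leave objects alone."""
    if isinstance(value, str):
        return {"@type": "Person", "name": value}
    return value


def to_fairscape_terms(record: dict[str, Any]) -> dict[str, Any]:
    """Rename D4D slot_uri keys to schema.org terms; unknown keys pass through."""
    out: dict[str, Any] = {}
    for key, value in record.items():
        term = DC_TO_SCHEMA.get(key, key)
        if term == "author":
            values = value if isinstance(value, list) else [value]
            value = [as_person(v) for v in values]
        out[term] = value
    return out
```

Keeping the table as data rather than code makes it easy to regenerate from the closeMatch rows of the SKOS alignment.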

Extended Metadata (✅ Complementary)

FAIRSCAPE extends Schema.org with custom namespaces:

EVI (Evidence) namespace:

  • evi:datasetCount, evi:computationCount, evi:softwareCount
  • evi:formats, evi:entitiesWithSummaryStats
  • evi:md5, evi:sha256

RAI (Responsible AI) namespace:

  • rai:dataUseCases, rai:dataBiases, rai:dataLimitations
  • rai:dataCollection, rai:prohibitedUses
  • rai:ethicalReview, rai:personalSensitiveInformation

D4D coverage: many of these concepts are mapped via the SKOS alignment (88 mappings in total).
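The EVI count properties summarise what a crate contains. One plausible way to derive them is to count non-root entities by @type. The sketch does exactly that, but the @type-to-property correspondence, the assumption that types appear as compact names like "Dataset", and the root @id "./" are all assumptions, not FAIRSCAPE's documented rules. `evi_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Any

# Assumed @type -> EVI summary property correspondence.
COUNT_PROPERTIES = {
    "Dataset": "evi:datasetCount",
    "Computation": "evi:computationCount",
    "Software": "evi:softwareCount",
}

SKIP_IDS = {"./", "ro-crate-metadata.json"}  # root and descriptor are not counted


def evi_counts(graph: list[dict[str, Any]]) -> dict[str, int]:
    """Count non-root entities by @type and report them under EVI-style keys."""
    seen: Counter = Counter()
    for entity in graph:
        if entity.get("@id") in SKIP_IDS:
            continue
        types = entity.get("@type", [])
        # @type may be a single string or a list of strings.
        for t in [types] if isinstance(types, str) else types:
            if t in COUNT_PROPERTIES:
                seen[COUNT_PROPERTIES[t]] += 1
    return {prop: seen[prop] for prop in COUNT_PROPERTIES.values()}
```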

Interface Mapping Coverage

File: data/ro-crate_mapping/d4d_rocrate_interface_mapping.tsv

Total Mappings: 129 field-level mappings

Updated with FAIRSCAPE Patterns:

  • EVI property examples from CM4AI dataset
  • FAIRSCAPE @context structure documented
  • Target updated from @type='ROCrate' to @type='Dataset'
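A TSV like this can be loaded with the standard csv module. The column names of d4d_rocrate_interface_mapping.tsv are not listed in this issue, so `d4d_field` and `rocrate_field` below are hypothetical defaults to override. The helper groups targets per D4D field because one field may map to several RO-Crate properties.

```python
import csv
from collections.abc import Iterable


def load_interface_mapping(
    lines: Iterable[str],
    source_col: str = "d4d_field",      # hypothetical column name
    target_col: str = "rocrate_field",  # hypothetical column name
) -> dict[str, list[str]]:
    """Group RO-Crate target fields by D4D field from tab-separated lines."""
    mapping: dict[str, list[str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        source = (row.get(source_col) or "").strip()
        target = (row.get(target_col) or "").strip()
        if source and target:  # skip rows where either side is empty (gaps)
            mapping.setdefault(source, []).append(target)
    return mapping


# Usage against the real file (column names to be checked):
# with open("data/ro-crate_mapping/d4d_rocrate_interface_mapping.tsv", newline="") as fh:
#     mapping = load_interface_mapping(fh, source_col="...", target_col="...")
```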

Remaining Gaps and Recommendations

1. Dublin Core ↔ Schema.org Harmonization

Current State: D4D uses dcterms: for 20 of its 33 top-level slots (about 61%, or about 65% of the 31 mapped slots), while FAIRSCAPE uses schema:

Options:

  • A) Add exact_mappings in D4D schema pointing dcterms slots to schema.org equivalents
  • B) Migrate D4D to prefer schema.org for consistency with FAIRSCAPE
  • C) Maintain both with formal SKOS equivalences (current approach ✅)

Recommendation: Continue with option C (SKOS alignment). It captures the equivalence formally without forcing schema changes.

2. Unmapped Slots

Add a slot_uri for the remaining slots:

  • dialect → schema:encodingFormat or a custom D4D property
  • resources → schema:hasPart or dcat:distribution

3. SSSOM Export

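Generate SSSOM (Simple Standard for Sharing Ontology Mappings) from the SKOS alignment for better interoperability. ROBOT's convert command is not suitable for this step: it writes ontology serializations (OWL/XML, Turtle, OBO, OBO Graphs JSON and similar), not SSSOM TSV, so `robot convert --input d4d_rocrate_skos_alignment.ttl --output d4d_rocrate_sssom_mapping.tsv` will not produce the intended file. The export needs an SSSOM-aware tool (for example sssom-py) or a small conversion script.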
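A small conversion script could look like the following stdlib sketch. It writes the four columns SSSOM requires for each mapping (subject_id, predicate_id, object_id, mapping_justification) plus an embedded metadata header of `#`-prefixed YAML lines. Several parts are assumptions. It reuses the naive one-line triple pattern, so it only handles alignment statements in that style. It labels every row `semapv:ManualMappingCuration`. The mapping-set ID, license and d4d prefix IRI are placeholders for the caller to supply. The output could then be checked with an SSSOM toolkit such as sssom-py.

```python
import csv
import io
import re

# Naive pattern for one-line "prefix:local skos:predicate prefix:local ." triples.
TRIPLE = re.compile(r"^\s*(\S+:\S+)\s+(skos:\w+)\s+(\S+:\S+)\s*\.")

COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def skos_to_sssom(
    ttl_text: str,
    curie_map: dict[str, str],
    mapping_set_id: str,
    license_iri: str,
    justification: str = "semapv:ManualMappingCuration",
) -> str:
    """Render one-line SKOS mapping triples as SSSOM TSV with embedded metadata."""
    out = io.StringIO()
    # SSSOM TSV carries its mapping-set metadata as YAML lines prefixed with '#'.
    out.write(f"#mapping_set_id: {mapping_set_id}\n")
    out.write(f"#license: {license_iri}\n")
    out.write("#curie_map:\n")
    for prefix, iri in sorted(curie_map.items()):
        out.write(f"#  {prefix}: {iri}\n")

    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for line in ttl_text.splitlines():
        match = TRIPLE.match(line.split("#", 1)[0])  # ignore trailing comments
        if match:
            writer.writerow([*match.groups(), justification])
    return out.getvalue()
```

A call would pass a curie_map covering every prefix used (d4d, schema, skos, semapv), with the d4d IRI copied from the alignment file rather than guessed.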

4. Extended Namespace Mappings

Consider adding slot_uri mappings to:

  • DUO (Data Use Ontology) - for consent and data use restrictions
  • OBI (Ontology for Biomedical Investigations) - for assay and protocol metadata
  • IAO (Information Artifact Ontology) - for information content entities

5. Bidirectional Transformation

Current converter: D4D → FAIRSCAPE ✅

TODO: Implement FAIRSCAPE → D4D converter using:

  • fairscape_models/conversion/mapping/d4d_to_rocrate.py (existing mapping)
  • SKOS alignment for semantic guidance
  • Handle vocabulary translations (schema → dcterms)
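A reverse converter cannot simply flip every SKOS mapping. In the examples above, d4d:instances and d4d:subpopulations both relatedMatch schema:variableMeasured, so the inverse would be ambiguous. A conservative starting point, sketched here with hypothetical names, inverts only exact and close matches and reports any target claimed by more than one D4D slot instead of picking one arbitrarily.

```python
from collections import defaultdict

# Predicates treated as safe to invert mechanically; related/narrow/broad
# matches need hand-written rules in a reverse converter.
INVERTIBLE = {"skos:exactMatch", "skos:closeMatch"}


def build_reverse_map(
    mappings: list[tuple[str, str, str]],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Invert (d4d_slot, predicate, target) triples into target -> d4d_slot.

    Targets claimed by more than one D4D slot are returned as conflicts
    rather than resolved arbitrarily.
    """
    candidates: defaultdict[str, set[str]] = defaultdict(set)
    for d4d_slot, predicate, target in mappings:
        if predicate in INVERTIBLE:
            candidates[target].add(d4d_slot)
    reverse = {t: next(iter(s)) for t, s in candidates.items() if len(s) == 1}
    conflicts = {t: sorted(s) for t, s in candidates.items() if len(s) > 1}
    return reverse, conflicts
```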

Files and Documentation

Key Files Created/Updated

  1. FAIRSCAPE Integration:

    • fairscape_models/ (git submodule)
    • src/fairscape_integration/__init__.py
    • src/fairscape_integration/d4d_to_fairscape.py
  2. Semantic Alignment:

    • src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl
  3. Reference Implementation:

    • data/ro-crate/profiles/fairscape/full-ro-crate-metadata.json
  4. Examples:

    • data/ro-crate/examples/voice_fairscape_test.json
  5. Documentation:

    • notes/FAIRSCAPE_JSON_PYDANTIC_RELATIONSHIP.md
    • data/ro-crate/profiles/D4D/d4d-profile-spec.md (updated)
    • data/ro-crate/profiles/D4D/README.md (updated)

Deprecated Files

Moved to data/ro-crate/DEPRECATED/:

  • Custom RO-Crate examples (replaced by FAIRSCAPE Pydantic generation)
  • D4D profile v1 (replaced by FAIRSCAPE-aligned approach)

Related Work

Next Steps

  1. ✅ FAIRSCAPE Pydantic integration (complete)
  2. ✅ SKOS semantic alignment (complete)
  3. ✅ D4D → FAIRSCAPE converter (complete)
  4. 🔄 Generate FAIRSCAPE RO-Crates for all 4 projects (in progress - VOICE done)
  5. 🔄 SSSOM export from SKOS alignment
  6. 📋 Implement FAIRSCAPE → D4D reverse converter
  7. 📋 Add slot_uri for remaining unmapped slots (dialect, resources)
  8. 📋 Consider schema.org migration or dual vocabulary support
