Summary
Updated analysis of D4D ↔ FAIRSCAPE alignment based on the current state of the repository with FAIRSCAPE Pydantic models integration, SKOS semantic alignment, and updated slot URI coverage.
This updates and extends the analysis from issue #130 with significant new developments.
Major Updates Since Issue #130
1. FAIRSCAPE Pydantic Models Integration ✅
Status: Fully integrated as git submodule
Location: fairscape_models/ (from https://github.com/fairscape/fairscape_models)
Converter Implementation: src/fairscape_integration/d4d_to_fairscape.py
- Converts D4D YAML → FAIRSCAPE RO-Crate with Pydantic validation
- Tested and working (VOICE D4D → FAIRSCAPE RO-Crate validated ✓)
- Returns a (ROCrateV1_2, validation_result) tuple
Available Models:
ROCrateV1_2 - Top-level RO-Crate container
ROCrateMetadataElem - Root dataset entity
ROCrateMetadataFileElem - Metadata descriptor
Dataset, Software, Computation, Annotation, Experiment - Entity types
IdentifierValue, PropertyValue - Supporting types
Key Benefit: Runtime validation and type safety for RO-Crate generation
2. SKOS Semantic Alignment ✅
File: src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl
Total Mappings: 88 D4D properties mapped to RO-Crate/FAIRSCAPE
Mapping Quality:
skos:exactMatch: 52 (59.1%) - Direct 1:1 mappings
skos:closeMatch: 20 (22.7%) - Semantic equivalence with transformation
skos:relatedMatch: 10 (11.4%) - Related concepts, complex mapping
skos:narrowMatch: 5 (5.7%) - D4D term narrower than RO-Crate
skos:broadMatch: 1 (1.1%) - D4D term broader than RO-Crate
Target Vocabularies:
- schema.org: ~43 property mappings
- RAI (Responsible AI): ~30 property mappings
- EVI (FAIRSCAPE Evidence): ~9 property mappings
- D4D-specific: ~23 property mappings
Example Mappings:
# Exact matches
d4d:title skos:exactMatch schema:name .
d4d:description skos:exactMatch schema:description .
d4d:doi skos:exactMatch schema:identifier .
d4d:license skos:exactMatch schema:license .
# Close matches (transformation required)
d4d:creators skos:closeMatch schema:author . # String to Person object
d4d:created_by skos:closeMatch schema:creator .
# Related matches (complex mapping)
d4d:instances skos:relatedMatch schema:variableMeasured .
d4d:subpopulations skos:relatedMatch schema:variableMeasured .
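The difference between an exact and a close match can be illustrated with a minimal Python sketch. It assumes a D4D record is available as a plain dictionary keyed by slot name. The function names `apply_alignment` and `creators_to_authors` are hypothetical and do not come from the repository's converter.

```python
from typing import Any

# exactMatch pairs copied from the example mappings above
# (D4D slot name -> schema.org property name).
EXACT_MATCH = {
    "title": "name",
    "description": "description",
    "doi": "identifier",
    "license": "license",
}


def creators_to_authors(creators: list[str]) -> list[dict[str, str]]:
    """closeMatch example: wrap plain creator strings in Person objects."""
    return [{"@type": "Person", "name": name} for name in creators]


def apply_alignment(d4d_record: dict[str, Any]) -> dict[str, Any]:
    """Rename exactMatch keys and transform the closeMatch `creators` slot."""
    out: dict[str, Any] = {}
    for key, value in d4d_record.items():
        if key in EXACT_MATCH:
            out[EXACT_MATCH[key]] = value  # 1:1 rename, value unchanged
        elif key == "creators":
            out["author"] = creators_to_authors(value)  # value reshaped
        # relatedMatch slots need case-by-case handling and are skipped here.
    return out


example = apply_alignment(
    {"title": "Example dataset", "doi": "10.0000/example", "creators": ["A. Researcher"]}
)
print(example)
```

Exact matches only rename the key, while the close match also changes the shape of the value, which is why those mappings carry transformation notes.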
3. Reference Implementation
File: data/ro-crate/profiles/fairscape/full-ro-crate-metadata.json
Source: CM4AI (Cell Maps for AI) January 2026 data release
Content: Real-world FAIRSCAPE RO-Crate metadata (19.1 TB dataset, 647 entities)
Validation: ✅ Validates against FAIRSCAPE Pydantic models
Key Patterns:
- FAIRSCAPE @context: {"@vocab": "https://schema.org/", "evi": "...", "rai": "...", "d4d": "..."}
- EVI properties: evi:datasetCount, evi:computationCount, evi:formats
- RAI properties: rai:dataUseCases, rai:dataBiases, rai:dataLimitations
- PropertyValue pattern for custom metadata
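The patterns listed above could fit together roughly as in the following sketch. All values are placeholders, and the namespace IRIs stay as "..." because they are elided in the source. Attaching the PropertyValue through schema.org's `additionalProperty` is an assumption about how the pattern is applied, not something confirmed by the reference file.

```python
import json

# Namespace IRIs are elided ("...") in the source, so they remain placeholders.
context = {"@vocab": "https://schema.org/", "evi": "...", "rai": "...", "d4d": "..."}

root_dataset = {
    "@id": "./",
    "@type": "Dataset",
    "name": "Example release",                         # placeholder value
    "evi:datasetCount": 3,                             # placeholder value
    "evi:formats": ["csv", "h5ad"],                    # placeholder values
    "rai:dataLimitations": "Illustrative text only.",  # placeholder value
    # PropertyValue pattern for metadata without a dedicated property
    # (use of additionalProperty is an assumption).
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "example_custom_field", "value": "example"},
    ],
}

crate = {"@context": context, "@graph": [root_dataset]}
print(json.dumps(crate, indent=2))
```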
Updated Slot URI Coverage Analysis
Current State (Full Merged Schema)
File: src/data_sheets_schema/schema/data_sheets_schema_all.yaml
Overall Coverage: 31/33 slots (93.9%)
Vocabulary Usage:
dcterms: (Dublin Core) — 20 mappings (60.6%)
dcat: (Data Catalog) — 8 mappings (24.2%)
schema: (Schema.org) — 2 mappings (6.1%)
prov: (Provenance) — 1 mapping (3.0%)
Unmapped Slots (2): dialect, resources
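Coverage figures of this kind can be recomputed with a short script. The sketch below assumes the `slots` section of the merged schema has already been loaded into a dictionary. The miniature schema and its slot names (other than dialect and resources) are hypothetical stand-ins, not the real file contents.

```python
from collections import Counter

# Hypothetical miniature of a merged schema's `slots` section.
slots = {
    "title": {"slot_uri": "schema:name"},
    "created_on": {"slot_uri": "dcterms:created"},
    "download_url": {"slot_uri": "dcat:downloadURL"},
    "dialect": {},    # no slot_uri
    "resources": {},  # no slot_uri
}


def slot_uri_coverage(slots):
    """Return mapped slots, unmapped slot names, per-prefix counts, and percent mapped."""
    mapped = {name: s["slot_uri"] for name, s in slots.items() if s.get("slot_uri")}
    unmapped = sorted(set(slots) - set(mapped))
    by_prefix = Counter(uri.split(":", 1)[0] for uri in mapped.values())
    pct = 100 * len(mapped) / len(slots) if slots else 0.0
    return mapped, unmapped, by_prefix, pct


mapped, unmapped, by_prefix, pct = slot_uri_coverage(slots)
print(f"{len(mapped)}/{len(slots)} slots ({pct:.1f}%)", dict(by_prefix), unmapped)
```

Applied to the counts reported above, the same arithmetic gives 31/33 = 93.9%.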
Comparison with Issue #130
| Metric | Issue #130 | Current (2026-03-19) |
| --- | --- | --- |
| Total slots analyzed | ~414 domain attributes | 33 top-level slots |
| Mapped slots | ~116 (28%) | 31 (93.9%) |
| Primary vocabulary | dcterms (70 mappings) | dcterms (20 mappings) |
| SKOS alignment | None | 88 mappings |
| FAIRSCAPE integration | None | ✅ Complete |
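Note: The difference in totals (~414 domain attributes in issue #130 versus 33 top-level slots now) suggests significant schema consolidation or restructuring since issue #130. The current analysis covers top-level reusable slots rather than all class attributes, so the two coverage percentages are not directly comparable.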
D4D ↔ FAIRSCAPE Alignment at URI Level
Core Metadata (✅ Aligned)
Both D4D and FAIRSCAPE use Schema.org for core metadata:
| Concept | D4D slot_uri | FAIRSCAPE JSON-LD | Alignment |
| --- | --- | --- | --- |
| Title | schema:name | schema:name | ✅ Exact |
| Description | schema:description | schema:description | ✅ Exact |
| Identifier | schema:identifier | schema:identifier | ✅ Exact |
| License | schema:license | schema:license | ✅ Exact |
| URL | schema:url | schema:url | ✅ Exact |
Provenance (⚠️ Vocabulary Tension)
D4D prefers Dublin Core, FAIRSCAPE uses Schema.org:
| Concept | D4D slot_uri | FAIRSCAPE JSON-LD | Alignment |
| --- | --- | --- | --- |
| Created date | dcterms:created | schema:dateCreated | ⚠️ Same concept, different vocab |
| Modified date | dcterms:modified | schema:dateModified | ⚠️ Same concept, different vocab |
| Creator | dcterms:creator | schema:author | ⚠️ Same concept, different vocab |
| Download URL | dcat:downloadURL | schema:contentUrl | ⚠️ Same concept, different vocab |
Impact: SKOS alignment captures these equivalences as skos:closeMatch with transformation notes.
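A key-level translation between the two vocabularies might look like the sketch below. It assumes records are flat dictionaries keyed by CURIE. The helper `translate_keys` is hypothetical and only renames keys, so any value-level transformation implied by a close match would still have to be handled separately.

```python
# Pairs taken from the provenance table above:
# D4D slot_uri -> FAIRSCAPE JSON-LD term.
DCTERMS_TO_SCHEMA = {
    "dcterms:created": "schema:dateCreated",
    "dcterms:modified": "schema:dateModified",
    "dcterms:creator": "schema:author",
    "dcat:downloadURL": "schema:contentUrl",
}
# The table is one-to-one, so it can be inverted for the reverse direction.
SCHEMA_TO_DCTERMS = {v: k for k, v in DCTERMS_TO_SCHEMA.items()}


def translate_keys(record, table):
    """Rename keys found in `table`; leave all other keys untouched."""
    return {table.get(key, key): value for key, value in record.items()}


record = {"dcterms:created": "2026-01-15", "schema:name": "Example"}
forward = translate_keys(record, DCTERMS_TO_SCHEMA)
back = translate_keys(forward, SCHEMA_TO_DCTERMS)
print(forward)
print(back == record)
```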
Extended Metadata (✅ Complementary)
FAIRSCAPE extends Schema.org with custom namespaces:
EVI (Evidence) namespace:
evi:datasetCount, evi:computationCount, evi:softwareCount
evi:formats, evi:entitiesWithSummaryStats
evi:md5, evi:sha256
RAI (Responsible AI) namespace:
rai:dataUseCases, rai:dataBiases, rai:dataLimitations
rai:dataCollection, rai:prohibitedUses
rai:ethicalReview, rai:personalSensitiveInformation
D4D coverage: Many of these concepts mapped via SKOS alignment (88 total mappings)
Interface Mapping Coverage
File: data/ro-crate_mapping/d4d_rocrate_interface_mapping.tsv
Total Mappings: 129 field-level mappings
Updated with FAIRSCAPE Patterns:
- EVI property examples from CM4AI dataset
- FAIRSCAPE @context structure documented
- Target updated from @type='ROCrate' to @type='Dataset'
Remaining Gaps and Recommendations
1. Dublin Core ↔ Schema.org Harmonization
Current State: D4D uses dcterms: (60% of mappings), FAIRSCAPE uses schema:
Options:
- A) Add exact_mappings in D4D schema pointing dcterms slots to schema.org equivalents
- B) Migrate D4D to prefer schema.org for consistency with FAIRSCAPE
- C) Maintain both with formal SKOS equivalences (current approach ✅)
Recommendation: Continue with option C (SKOS alignment) - captures the equivalence formally without forcing schema changes.
2. Unmapped Slots
Add slot_uri for remaining slots:
dialect → schema:encodingFormat or custom D4D property
resources → schema:hasPart or dcat:distribution
3. SSSOM Export
Generate SSSOM (Simple Standard for Sharing Ontology Mappings) from SKOS alignment for better interoperability:
# Convert SKOS TTL → SSSOM TSV
robot convert --input d4d_rocrate_skos_alignment.ttl \
--output d4d_rocrate_sssom_mapping.tsv
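If the chosen tool does not emit SSSOM TSV directly, the core table can also be produced with a few lines of Python. The sketch below assumes the alignment file contains simple one-triple-per-line statements like the example mappings, and uses a regular expression rather than a full Turtle parser. The `mapping_justification` value `semapv:ManualMappingCuration` is an assumption about how the mappings were produced. The SSSOM metadata header (for example the CURIE map) is omitted.

```python
import csv
import io
import re

# Matches lines such as: d4d:title skos:exactMatch schema:name .
TRIPLE = re.compile(r"^\s*(\S+)\s+(skos:\w+Match)\s+(\S+)\s*\.")


def skos_lines_to_sssom_tsv(ttl_text: str) -> str:
    """Turn simple SKOS mapping lines into a minimal SSSOM-style TSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", "predicate_id", "object_id", "mapping_justification"])
    for line in ttl_text.splitlines():
        match = TRIPLE.match(line)
        if match:  # comment lines and prefix declarations do not match
            writer.writerow([*match.groups(), "semapv:ManualMappingCuration"])
    return buf.getvalue()


sample = """\
# Exact matches
d4d:title skos:exactMatch schema:name .
d4d:creators skos:closeMatch schema:author . # String to Person object
"""
tsv = skos_lines_to_sssom_tsv(sample)
print(tsv)
```

A real export would more likely go through an RDF parser, since Turtle allows multi-line statements and predicate lists that this line-based approach does not handle.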
4. Extended Namespace Mappings
Consider adding slot_uri mappings to:
- DUO (Data Use Ontology) - for consent and data use restrictions
- OBI (Ontology for Biomedical Investigations) - for assay and protocol metadata
- IAO (Information Artifact Ontology) - for information content entities
5. Bidirectional Transformation
Current converter: D4D → FAIRSCAPE ✅
TODO: Implement FAIRSCAPE → D4D converter using:
- fairscape_models/conversion/mapping/d4d_to_rocrate.py (existing mapping)
- SKOS alignment for semantic guidance
- Handle vocabulary translations (schema → dcterms)
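As a starting point for the reverse direction, the sketch below locates the root dataset of an RO-Crate through the metadata descriptor's `about` reference and maps a few schema.org properties back to D4D slot names. The function names and the `SCHEMA_TO_D4D` table are hypothetical. The table is just the inverse of the exactMatch examples, and mapping `identifier` back to `doi` assumes the identifier actually is a DOI.

```python
# Inverse of the exactMatch examples (schema.org property -> D4D slot name).
SCHEMA_TO_D4D = {
    "name": "title",
    "description": "description",
    "identifier": "doi",  # assumes the identifier is a DOI
    "license": "license",
}


def find_root_dataset(crate):
    """Follow the metadata descriptor's `about` link to the root dataset entity."""
    by_id = {entity.get("@id"): entity for entity in crate.get("@graph", [])}
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id")
    if root_id is None:
        return None
    return by_id.get(root_id)


def rocrate_to_d4d(crate):
    """Map the root dataset's known properties back to D4D slot names."""
    root = find_root_dataset(crate)
    if root is None:
        raise ValueError("no root dataset found in @graph")
    return {SCHEMA_TO_D4D[k]: v for k, v in root.items() if k in SCHEMA_TO_D4D}


crate = {
    "@context": {"@vocab": "https://schema.org/"},
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "Example", "license": "CC-BY-4.0"},
    ],
}
d4d = rocrate_to_d4d(crate)
print(d4d)
```

Properties without an entry in the table are dropped here. A fuller converter would need the SKOS alignment to decide how close and related matches are handled.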
Files and Documentation
Key Files Created/Updated
- FAIRSCAPE Integration:
  - fairscape_models/ (git submodule)
  - src/fairscape_integration/__init__.py
  - src/fairscape_integration/d4d_to_fairscape.py
- Semantic Alignment:
  - src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl
- Reference Implementation:
  - data/ro-crate/profiles/fairscape/full-ro-crate-metadata.json
- Examples:
  - data/ro-crate/examples/voice_fairscape_test.json
- Documentation:
  - notes/FAIRSCAPE_JSON_PYDANTIC_RELATIONSHIP.md
  - data/ro-crate/profiles/D4D/d4d-profile-spec.md (updated)
  - data/ro-crate/profiles/D4D/README.md (updated)
Deprecated Files
Moved to data/ro-crate/DEPRECATED/:
- Custom RO-Crate examples (replaced by FAIRSCAPE Pydantic generation)
- D4D profile v1 (replaced by FAIRSCAPE-aligned approach)
Related Work
Next Steps
- ✅ FAIRSCAPE Pydantic integration (complete)
- ✅ SKOS semantic alignment (complete)
- ✅ D4D → FAIRSCAPE converter (complete)
- 🔄 Generate FAIRSCAPE RO-Crates for all 4 projects (in progress - VOICE done)
- 🔄 SSSOM export from SKOS alignment
- 📋 Implement FAIRSCAPE → D4D reverse converter
- 📋 Add slot_uri for remaining unmapped slots (dialect, resources)
- 📋 Consider schema.org migration or dual vocabulary support