import-nanopub-chain: follow the FORRT backbone, not just refersToNanopub #8
annefou wants to merge 1 commit into
…opub

The constellation importer walked only the curated KnowledgePixels npa:refersToNanopub graph, which in practice links just CiTO <-> Outcome, so a BFS from the CiTO stopped after 2 nodes and missed the Study, Claim, AIDA and Quote. Those steps are connected by domain predicates the network graph does not index:

    Outcome --isOutcomeOf--> Study --targetsClaim--> Claim
    Claim --asAidaStatement--> <purl.org/aida/...> --(asserted by)--> AIDA
    AIDA --related--> Quote

Add backbone_neighbours(): read every nanopub a node points at from its TriG, resolve the Claim->AIDA hop via a new aida-sentence-nanopub.rq query, and keep only targets that are themselves FORRT chain steps (chain_step_kind) so value-lists/templates/papers are dropped and never crawled. walk() now merges these with the refersToNanopub neighbours (edge relation 'backbone').

Verified against a published 6-step chain: entering from the CiTO now returns all six steps (Quote, AIDA, Claim, Study, Outcome, CiTO) instead of two.
Closing: wrong layer and wrong approach. The canonical constellation is … This PR instead added hand-coded link-walking and heuristic step-type classification to the legacy …
Problem
The constellation importer (`scripts/import-nanopub-chain.py`) discovered neighbours only through the curated KnowledgePixels `npa:refersToNanopub` graph. For real FORRT chains that graph materialises essentially one edge, CiTO ↔ Outcome, so a BFS entering from the CiTO stopped after 2 nodes and never reached the Study, Claim, AIDA or Quote.

Those steps are linked by domain predicates the network graph doesn't index:

- `Outcome --isOutcomeOf--> Study --targetsClaim--> Claim`
- `Claim --asAidaStatement--> <purl.org/aida/...> --(asserted by)--> AIDA`
- `AIDA --related--> Quote`
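To make the stall concrete, here is a minimal sketch on a toy graph. The node labels stand in for nanopub URIs, and `REFERS_TO`, `BACKBONE` and `bfs` are illustrative names rather than the importer's own code. It assumes the network graph holds only the CiTO ↔ Outcome edge and that the domain predicates point in the directions listed above.

```python
from collections import deque

# Edges the curated network graph indexes (toy labels, not real URIs).
REFERS_TO = {
    "CiTO": {"Outcome"},
    "Outcome": {"CiTO"},
}

# Edges carried by domain predicates inside each nanopub's own TriG.
BACKBONE = {
    "Outcome": {"Study"},  # isOutcomeOf
    "Study": {"Claim"},    # targetsClaim
    "Claim": {"AIDA"},     # asAidaStatement, then the nanopub asserting it
    "AIDA": {"Quote"},     # related
}


def bfs(start, *edge_maps):
    """Breadth-first walk over the union of the given edge maps."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbours = set()
        for edges in edge_maps:
            neighbours |= edges.get(node, set())
        for nxt in sorted(neighbours):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(bfs("CiTO", REFERS_TO))            # ['CiTO', 'Outcome']
print(bfs("CiTO", REFERS_TO, BACKBONE))  # all six steps
```

Walking `REFERS_TO` alone never leaves the CiTO/Outcome pair. Adding the backbone edges lets the same walk continue through Study, Claim and AIDA to the Quote.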
Fix
- `backbone_neighbours()`: reads every nanopub a node points at straight from its TriG, plus resolves the Claim→AIDA hop via a new `aida-sentence-nanopub.rq` query, then keeps only targets that are themselves FORRT chain steps (`chain_step_kind()`). Value-lists, templates, papers and other noise are dropped and never crawled, so this stays robust without hard-coding a predicate list.
- `walk()` merges these with the existing `refersToNanopub` neighbours (new edge relation `backbone`).

Verified
Against a published 6-step chain (white-shark geolocation), entering from the CiTO:
- Before: `Imported 2 nanopubs` (CiTO + Outcome).
- After: `Imported 6 nanopubs, 8 edges`: Quote, AIDA, Claim, Study, Outcome, CiTO, each classified by template.

No new dependencies (`rdflib` already required). This also flows downstream to the per-replication repos and the GRID4EARTH benchmark template on their next sync.
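
The filtering idea described under Fix (collect everything a node points at, then keep only what classifies as a chain step) can be sketched roughly as below. This is not the PR's implementation. The function name, the `NP_PREFIX` value, the example URIs and the dictionary-backed classifier are all assumptions made for illustration. Quads are passed in as plain tuples so the sketch runs without dependencies; in the real script they would presumably come from parsing the node's TriG with `rdflib`. The Claim→AIDA hop via the SPARQL query is not covered here.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

NP_PREFIX = "https://w3id.org/np/"  # assumed common prefix of nanopub URIs

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


def backbone_neighbours_sketch(
    node_uri: str,
    quads: Iterable[Quad],
    step_kind: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return {target_uri: kind} for chain steps that node_uri points at."""
    found: Dict[str, str] = {}
    for _subj, _pred, obj, _graph in quads:
        # Only nanopub-looking objects are candidates; skip self-references.
        if not obj.startswith(NP_PREFIX) or obj.startswith(node_uri):
            continue
        kind = step_kind(obj)  # None means "not a chain step"
        if kind is not None:
            found[obj] = kind
    return found


# Hypothetical URIs, for illustration only.
STUDY = NP_PREFIX + "RA-example-study"
CLAIM = NP_PREFIX + "RA-example-claim"
TEMPLATE = NP_PREFIX + "RA-example-template"

quads = [
    (STUDY + "/subject", "ex:targetsClaim", CLAIM, STUDY + "/assertion"),
    (STUDY, "ex:wasCreatedFromTemplate", TEMPLATE, STUDY + "/pubinfo"),
    (STUDY + "/subject", "ex:describes", "https://doi.org/10.0000/example", STUDY + "/assertion"),
]

# Stand-in classifier: only the claim is a known chain step.
kinds = {CLAIM: "Claim"}
result = backbone_neighbours_sketch(STUDY, quads, kinds.get)
print(result)
```

Because the predicate is ignored and the decision rests entirely on how the target classifies, no predicate list has to be maintained. Here the template and the paper DOI are dropped and only the claim is returned.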