I would like to be able to supply a fully resolved schema document through the dictionary_url setting of the Gen3 deployment config, one that looks like this:
{
"node1": {...},
"node2": {...},
}
where by "fully resolved" I mean it contains no $ref expressions, need not contain any magic keys like _definitions.yaml or _terms.yaml, and all top-level keys correspond to data nodes.
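As a rough illustration of that definition, a check might look like the sketch below. The helper names `contains_ref` and `is_fully_resolved` are hypothetical and not part of dictionaryutils, and treating every top-level key that starts with an underscore as a "magic" key is my assumption. The sketch also takes the strict reading and rejects magic keys outright, even though a resolved document merely does not need them.

```python
from typing import Any


def contains_ref(value: Any) -> bool:
    """Return True if a "$ref" key appears anywhere in the nested structure."""
    if isinstance(value, dict):
        return "$ref" in value or any(contains_ref(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_ref(item) for item in value)
    return False


def is_fully_resolved(schema: dict) -> bool:
    """Hypothetical check: no magic keys, only node schemas, no $ref left."""
    # Assumption: "magic" keys are the ones starting with an underscore,
    # such as _definitions.yaml or _terms.yaml.
    no_magic_keys = not any(key.startswith("_") for key in schema)
    # Every top-level value should be a node schema, i.e. a mapping.
    nodes_ok = all(isinstance(node, dict) for node in schema.values())
    return no_magic_keys and nodes_ok and not contains_ref(schema)


# Illustrative documents only; the node contents are made up.
resolved = {
    "node1": {"properties": {"name": {"type": "string"}}},
    "node2": {"properties": {"size": {"type": "integer"}}},
}
unresolved = {
    "node1.yaml": {"properties": {"id": {"$ref": "_definitions.yaml#/UUID"}}},
    "_definitions.yaml": {"UUID": {"type": "string"}},
}
```

Here `is_fully_resolved(resolved)` holds, while `unresolved` fails on both the magic key and the remaining `$ref`.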
I've been experimenting with dictionaryutils, and it seems that although it produces a fully resolved schema of this kind, it cannot be initialized from one. Being able to do so would be very useful, because such a document can easily be generated by a number of data modelling systems that know nothing about Gen3, e.g. LinkML or Hackolade.
Currently, if I model in those frameworks, I have to painstakingly convert my generic JSON Schema document back into a bespoke input format
{
"node1.yaml": {...},
"node2.yaml": {...},
"_definitions.yaml": {...},
"_terms.yaml": {...},
}
just so that it passes DataDictionary.__init__, only for it to be resolved again on the way out in DataDictionary.schema.
To be clear, this is not about the Gen3 data model itself (e.g. internal custom structures like links), but only about which format is considered valid as input for the dictionary.
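To make the round trip concrete, the conversion I currently have to do amounts to something like the sketch below. The function name `to_bespoke_format` is hypothetical, and I am assuming that empty placeholder entries are enough when nothing references them; I have not verified how DataDictionary treats empty definitions or terms.

```python
def to_bespoke_format(resolved: dict) -> dict:
    """Wrap a fully resolved schema in the file-keyed layout shown above."""
    # Each node becomes a pseudo-file entry named "<node>.yaml".
    bespoke = {f"{name}.yaml": node for name, node in resolved.items()}
    # Assumption: the magic entries can be empty placeholders, since a
    # fully resolved schema contains no $ref that would point into them.
    bespoke["_definitions.yaml"] = {}
    bespoke["_terms.yaml"] = {}
    return bespoke


# Illustrative input only; the node contents are made up.
wrapped = to_bespoke_format({
    "node1": {"properties": {"name": {"type": "string"}}},
    "node2": {"properties": {"size": {"type": "integer"}}},
})
```

The wrapping carries no information that the resolved document lacks, which is why accepting the resolved form directly seems reasonable to me.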
I have noted this here in dictionaryutils, though I guess that for it to be operationally useful there would also need to be an alternative configuration point in the deployment config, something like dictionary_resolved_url, so that datamodelutils, peregrine, etc. would know what they are starting with.
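A consuming service could then branch on which setting is present. The sketch below is only one possible shape: `dictionary_resolved_url` is the name proposed above and does not exist today, `select_dictionary_source` is a hypothetical helper, the example URLs are placeholders, and giving the resolved URL precedence is my assumption.

```python
from typing import Tuple


def select_dictionary_source(config: dict) -> Tuple[str, bool]:
    """Return (url, is_resolved) for the dictionary a service should load."""
    # Proposed setting: points at a fully resolved schema document.
    resolved_url = config.get("dictionary_resolved_url")
    if resolved_url:
        return resolved_url, True
    # Existing setting: points at the bespoke file-keyed format.
    return config["dictionary_url"], False


# Placeholder URLs for illustration only.
legacy = select_dictionary_source(
    {"dictionary_url": "https://example.org/schema.json"}
)
proposed = select_dictionary_source(
    {
        "dictionary_url": "https://example.org/schema.json",
        "dictionary_resolved_url": "https://example.org/resolved.json",
    }
)
```

The boolean tells the caller whether the resolution step can be skipped for that document.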
I am happy to contribute some time to the work required, if it is considered worthwhile.