feat: dataset provenance form with view/edit/validation #534
First comment: wipes out existing datasets in web, although public seems fine. Or perhaps I'm meant to migrate? It's not clear how to do that, however.
First-pass attempt, I haven't tried everything yet.
- Existing datasets go missing, see comment above.
- Overall, the form is pretty long, so I suspect it will be easier to submit as a document. I haven't tried this yet.
- Because it's so long, it should be a little more forgiving; in particular, a mis-click outside of the modal should not close the modal and wipe all content.
- Does it need to be a modal?
- The "person" and "organization" roles are all together ("Site" doesn't make much sense as a role for a person).
- It wasn't immediately clear to me that "Primary Contact" had to be a single entity... I guess that's on me for not reading the word "primary", although possibly I was put off by "add name", which looks like you can add more contacts.
- "add Ontology resource" is bound to be filled in incorrectly without more controls; even in day-to-day bento operations you can find data with an incorrect iri_prefix. This is more of a bento bookkeeping comment than a review of this particular PR. It refuses ontology additions without resources, so I'm not sure if this will discourage ontology use rather than encourage it. I'm not sure how to make this easier.
- Validation errors produce the message "Zod validation error", which is a little odd sounding.
- Are stakeholders really required? What if the primary contact is an organization and also the stakeholder?
- After all that, it actually failed to create a dataset, since katsu seems to require `release_date` but the form doesn't, so it failed (and I lost everything I typed in...)
gsfk
left a comment
Coming along!
- can't actually create a dataset, even when the form passes validation, although that now looks like a katsu error. It produces `400 Bad Request` and this message: "Error validating discovery scope: project-dataset pair does not exist." which seems like a nonsensical error when trying to create a dataset
- some fields are required dynamically (license url is required only if some other license fields are filled in) but are never marked as required in the form
- I'm able to fill in keywords using both string and ontology even though the form says it should be one or the other
- submission by file seems to work well, although can't test because of first issue

less important stuff:
- validation errors could show which section of the form they're from (currently they show internal form codes like `root` or `taxa.0`)
- validation errors are sometimes incomplete: if I add ontologies without a resource and a license without a url, it only shows the license error. If I correct the license, the ontology error appears
gsfk
left a comment
Validation errors are missing... now only shows errors on the current tab, if any.
If you don't have errors on the current tab it just blocks submission without telling you anything.
gsfk
left a comment
working as expected, pending migration away from "_v2"
Re-testing after katsu updates:
- adding database Linked Field set is broken
- if you try to edit database info more than once, the information is often stale and behaves weirdly
- editing sometimes modifies the wrong dataset (I had dataset "A" and dataset "B".... editing the name of A changed B instead and left A untouched)
Form data still looks stale: if I click through multiple datasets in turn, I get information for the previous dataset I clicked rather than the current one.
Also having trouble with dataset creation features; lots of stuff not related to provenance is not working. Will this magically get fixed if we stop naming stuff "v2"?
- can't ingest data into a new dataset (katsu says `error encountered while ingesting: dataset does not exist`). But ingestion does work for the v2 datasets that were created by the migration process.
- data summaries are broken (the modal when you click on "Clinical Data", "Experiments" or "Variants"... values are all zero or "N/A")
does it do anything with the GeoJSON or just store it as a string?
Nothing for now, it is just a possible input for a field in provenance
bento_web search not working for new v2 datasets
Show Zod validation errors below the form after JSON import instead of logging to console only.
- Use Array.isArray before .map() in handleFinish to prevent TypeError when imported JSON has non-array values for publications/keywords/links
- Strip non-array values for Form.List fields in prepareInitialValues to prevent Ant Design console warnings on import
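The two guards described in this commit could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's code: `LIST_FIELDS`, `mapIfArray`, and `stripNonArrayFields` are hypothetical names, and the set of list fields is taken from the commit message only.

```typescript
// Hypothetical list of form fields rendered as lists (assumed from the commit message).
const LIST_FIELDS = ["publications", "keywords", "links"] as const;

// Map over a value only when it really is an array; otherwise return undefined
// instead of throwing "value.map is not a function".
function mapIfArray<T, U>(value: unknown, fn: (item: T) => U): U[] | undefined {
  return Array.isArray(value) ? (value as T[]).map((item) => fn(item)) : undefined;
}

// Return a copy of the imported values with non-array list fields removed,
// so list widgets never receive a string, number, or plain object.
function stripNonArrayFields(values: Record<string, unknown>): Record<string, unknown> {
  const cleaned: Record<string, unknown> = { ...values };
  for (const field of LIST_FIELDS) {
    if (field in cleaned && !Array.isArray(cleaned[field])) {
      delete cleaned[field];
    }
  }
  return cleaned;
}
```

Stripping on the way in and guarding on the way out are independent checks, so either one still protects the form if the other is bypassed.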
…ting bad values
Validate imported JSON before setting form values; strip top-level fields with Zod errors and surface them in a UI alert rather than letting invalid values reach the form.
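A minimal sketch of the strip-and-report step, assuming each validation issue exposes a `path` array and a `message` (Zod issues carry both). The `Issue` and `StripResult` types and the `stripInvalidTopLevelFields` helper are hypothetical and only illustrate the idea; the PR's actual implementation may differ.

```typescript
// Minimal issue shape assumed for this sketch; only `path` and `message` are used.
interface Issue {
  path: (string | number)[];
  message: string;
}

interface StripResult {
  values: Record<string, unknown>; // imported values minus the invalid fields
  errors: string[]; // human-readable messages for a UI alert
}

// Remove every top-level field that has at least one issue beneath it and
// collect the messages so they can be shown to the user.
function stripInvalidTopLevelFields(
  imported: Record<string, unknown>,
  issues: Issue[],
): StripResult {
  const values: Record<string, unknown> = { ...imported };
  const errors: string[] = [];
  for (const issue of issues) {
    // Issues with an empty path apply to the whole document.
    const location = issue.path.length > 0 ? issue.path.join(".") : "root";
    errors.push(`${location}: ${issue.message}`);
    const topLevel = issue.path[0];
    if (typeof topLevel === "string") {
      delete values[topLevel];
    }
  }
  return { values, errors };
}
```

Dropping the whole top-level field, rather than only the offending nested value, keeps the form from receiving a partially valid structure.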
…ovenance
# Conflicts:
# package-lock.json
z.url() uses WHATWG parser which accepts any scheme, allowing typos like htttp:// to pass. Refine with regex to enforce http/https only.
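The behaviour described here can be reproduced without Zod, using the global `URL` constructor (the WHATWG parser in browsers and Node). The sketch below is illustrative only: `HTTP_URL_PATTERN`, `parsesAsUrl`, and `isHttpUrl` are hypothetical names, and the exact regex used in the PR is not shown in this conversation.

```typescript
// Assumed pattern: the value must start with http:// or https:// (case-insensitive).
const HTTP_URL_PATTERN = /^https?:\/\//i;

// True when the WHATWG parser accepts the value, whatever its scheme.
function parsesAsUrl(value: string): boolean {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// True only for parseable URLs whose scheme is http or https.
function isHttpUrl(value: string): boolean {
  return parsesAsUrl(value) && HTTP_URL_PATTERN.test(value);
}
```

With these helpers, `htttp://example.org` parses as a URL with an unknown scheme but is rejected by `isHttpUrl`, which is the typo case the commit targets.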
Backend returns spatial_coverage as a JSON object; textarea displayed [object Object]. Stringify in prepareInitialValues to match the string form that handleFinish expects to parse back on submit.
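A sketch of the round trip this commit describes, assuming the field may hold either a JSON object (such as GeoJSON) or free text. The helper names `spatialCoverageToText` and `spatialCoverageFromText` are hypothetical; in the PR the logic lives in `prepareInitialValues` and `handleFinish`.

```typescript
// Convert a backend value into the string a textarea can display.
function spatialCoverageToText(value: unknown): string | undefined {
  if (value === undefined || value === null) return undefined;
  if (typeof value === "string") return value;
  // Objects would otherwise render as "[object Object]".
  return JSON.stringify(value, null, 2);
}

// Convert textarea contents back for submission: parsed JSON when the text is
// valid JSON, otherwise the raw text (assumed here to be a free-text description).
function spatialCoverageFromText(text: string | undefined): unknown {
  if (text === undefined || text.trim() === "") return undefined;
  try {
    return JSON.parse(text);
  } catch {
    return text;
  }
}
```

Keeping both directions next to each other makes it easier to confirm that what the form displays is exactly what submit can parse back.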
- Add COMMON_ONTOLOGIES and COMMON_KEYWORD_PRESETS constants sourced from bento_lib Python definitions (EFO, HP, MONDO, NCBITaxon, NCIT, OBI, SNOMED, SO, UBERON; species, assays, specimens, anatomy terms)
- Quick-add dropdown on ontology resources section
- Quick-add grouped dropdown on taxonomy section (Species/Assays/Specimens/Sequence types/Anatomy)
- Add ontology keyword/taxonomy buttons for blank ontology entries
- Auto-fill matching resource when ontology ID is typed in keyword or taxonomy fields
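The auto-fill step might work along these lines: take the prefix of a typed term ID such as `HP:0001250` and look it up among the resource presets. Everything in this sketch is an assumption for illustration: the `OntologyResource` shape (a subset of Phenopackets-style resource fields), the two-entry `RESOURCE_PRESETS` list, and the `findResourceForTermId` helper are not taken from the PR.

```typescript
// Subset of resource fields assumed for this sketch.
interface OntologyResource {
  id: string;
  name: string;
  namespace_prefix: string;
  iri_prefix: string;
}

// Two illustrative presets; the constant in the PR covers more ontologies.
const RESOURCE_PRESETS: OntologyResource[] = [
  {
    id: "hp",
    name: "Human Phenotype Ontology",
    namespace_prefix: "HP",
    iri_prefix: "http://purl.obolibrary.org/obo/HP_",
  },
  {
    id: "mondo",
    name: "Mondo Disease Ontology",
    namespace_prefix: "MONDO",
    iri_prefix: "http://purl.obolibrary.org/obo/MONDO_",
  },
];

// Find the preset whose namespace prefix matches the part of the term ID
// before the first colon; return undefined when there is no usable prefix.
function findResourceForTermId(
  termId: string,
  presets: OntologyResource[] = RESOURCE_PRESETS,
): OntologyResource | undefined {
  const separator = termId.indexOf(":");
  if (separator <= 0) return undefined;
  const prefix = termId.slice(0, separator);
  return presets.find((resource) => resource.namespace_prefix === prefix);
}
```

Returning `undefined` for unknown prefixes leaves the resource field for the user to fill in manually rather than guessing.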
- Add 10 new roles: Collaborating/Principal Laboratory, Research Group, Hosting Institution, Distributor, Editor, Translator, Data Collector, Data Manager, Contact Person
- Filter taxa quick-add presets to Species category only
- COMMON_KEYWORD_PRESETS -> COMMON_ONTOLOGY_PRESETS
- COMMON_ONTOLOGIES -> COMMON_ONTOLOGY_RESOURCE_PRESETS
- KeywordOntologyPreset -> OntologyPreset
- Add TODO to fetch COMMON_ONTOLOGY_RESOURCE_PRESETS from an endpoint
gsfk
left a comment
Preliminary approval since I'm out of issues for this latest iteration. No comments on code.
It's now possible to have ontology resource mismatches between the dataset info and the data itself, since each can have a different version, but:
- that's now easy to edit for the dataset info
- it's not clear that it even matters
For actually using the ontologies in phenopacket view in bento_public, it seems odd that we don't simply use the resources that are in the phenopacket itself. But this is a separate issue.

Summary
- `DatasetForm` with tabbed UI covering all dataset metadata fields (core info, contacts, links, classification, study details, publications/funding, PCGL info)
- … `project.datasets` references to `project.datasets_v2` and removes the deprecated `datasets` field from the `Project` type

Pre-requisite