
feat: dataset provenance form with view/edit/validation #534

Merged
SanjeevLakhwani merged 90 commits into master from feat/new-dataset-provenance
Jun 1, 2026

Conversation

Contributor

@SanjeevLakhwani SanjeevLakhwani commented Mar 23, 2026

Summary

  • Adds a new DatasetForm with tabbed UI covering all dataset metadata fields (core info, contacts, links, classification, study details, publications/funding, PCGL info)
  • Adds a View Provenance button on the Dataset card (visible to all users) that opens a read-only modal of all dataset metadata
  • Migrates all project.datasets references to project.datasets_v2 and removes the deprecated datasets field from the Project type

Pre-requisite

@SanjeevLakhwani SanjeevLakhwani changed the title from "feat: dataset provenance form with view/edit/validation" to "WIP-Web: feat: dataset provenance form with view/edit/validation" Apr 1, 2026
@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk April 8, 2026 13:30
Member

gsfk commented Apr 9, 2026

First comment: wipes out existing datasets in web, although public seems fine.

Or perhaps I'm meant to migrate? It's not clear how to do that, however.

Member

@gsfk gsfk left a comment

First-pass attempt, I haven't tried everything yet.

  • Existing datasets go missing, see comment above.

  • Overall, the form is pretty long, so I suspect it will be easier to submit as a document. I haven't tried this yet.

  • Because it's so long it should be a little more forgiving, in particular a mis-click outside of the modal should not close the modal and wipe all content.

  • Does it need to be a modal?

  • The "person" and "organization" roles are all together ("Site" doesn't make much sense as a role for a person)

  • it wasn't immediately clear to me that "Primary Contact" had to be a single entity... I guess that's on me for not reading the word "primary", although possibly I was put off by "add name" which looks like you can add more contacts.

  • "add Ontology resource" is bound to be filled in incorrectly without more controls, even in day-to-day bento operations you can find data with incorrect iri_prefix. This is more of a bento bookkeeping comment than a review of this particular PR. It refuses ontology additions without resources, so not sure if this will discourage ontology use rather than encourage it. I'm not sure how to make this easier.

  • Validation errors produce the message "Zod validation error" which is a little odd sounding

  • Are stakeholders really required? What if the primary contact is an organization and also the stakeholder?

  • after all that, it actually failed to create a dataset, since katsu seems to require release_date but the form doesn't, so it failed (and I lost everything I typed in...)

@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk April 22, 2026 17:38
Member

@gsfk gsfk left a comment

Coming along!

  • can't actually create a dataset, even when form passes validation, although that now looks like a katsu error. It produces 400 Bad Request and this message: "Error validating discovery scope: project-dataset pair does not exist." which seems like a nonsensical error when trying to create a dataset
  • some fields are required dynamically (license url is required only if some other license fields are filled in) but are never marked as required in the form
  • I'm able to fill in keywords using both string and ontology even though the form says it should be one or the other
  • submission by file seems to work well, although can't test because of first issue
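
The two form-rule issues above (a license URL that is required only when other license fields are filled in, and keywords that must be either a string or an ontology term) can be sketched as plain validation functions. This is a minimal illustration, not the PR's actual schema: the field names (`label`, `type`, `url`, `text`, `ontologyId`) and the `Issue` shape are assumptions.

```typescript
// Hypothetical shapes; field names are illustrative, not taken from the PR's schema.
interface LicenseInput {
  label?: string;
  type?: string;
  url?: string;
}

interface KeywordInput {
  text?: string; // free-text keyword
  ontologyId?: string; // e.g. "HP:0001250"
}

interface Issue {
  path: string;
  message: string;
}

const filled = (v: string | undefined): boolean => v !== undefined && v.trim() !== "";

// The URL becomes required only once some other license field has been filled in.
// The same predicate can drive the "required" marker shown in the form.
function isLicenseUrlRequired(license: LicenseInput): boolean {
  return filled(license.label) || filled(license.type);
}

function validateLicense(license: LicenseInput): Issue[] {
  if (isLicenseUrlRequired(license) && !filled(license.url)) {
    return [{ path: "license.url", message: "License URL is required when other license fields are set." }];
  }
  return [];
}

// Each keyword must be exactly one of: free text, or an ontology term.
function validateKeyword(keyword: KeywordInput, index: number): Issue[] {
  const hasText = filled(keyword.text);
  const hasOntology = filled(keyword.ontologyId);
  if (hasText === hasOntology) {
    return [{ path: `keywords.${index}`, message: "Provide exactly one of a text keyword or an ontology term." }];
  }
  return [];
}
```

Using one predicate for both the validation and the visual "required" marker keeps the two from drifting apart.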

less important stuff:

  • validation errors could show which section of the form they're from (currently they show internal form codes like root or taxa.0)
  • validation errors are sometimes incomplete: if I add ontologies without a resource and a license without a url, it only shows the license error. If I correct the license, the ontology error appears
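
Both points can be illustrated with a small sketch: run every validator before reporting, and translate internal paths such as `root` or `taxa.0` into a section label. The `SECTION_BY_FIELD` mapping and the `Issue` shape below are hypothetical and only loosely follow the tab names in the PR summary.

```typescript
interface Issue {
  path: string; // internal form path, e.g. "root" or "taxa.0"
  message: string;
}

// Hypothetical mapping from top-level form field to the tab that holds it.
const SECTION_BY_FIELD: Record<string, string> = {
  title: "Core Info",
  license: "Core Info",
  taxa: "Classification",
  keywords: "Classification",
  resources: "Classification",
  primary_contact: "Contacts",
};

// Run every validator and concatenate the results, so one failure does not hide another.
function collectIssues(validators: Array<() => Issue[]>): Issue[] {
  const all: Issue[] = [];
  for (const run of validators) {
    all.push(...run());
  }
  return all;
}

// Turn an internal path into a message that names the form section.
function describeIssue(issue: Issue): string {
  const [field, ...rest] = issue.path.split(".");
  const section = SECTION_BY_FIELD[field];
  if (section === undefined) {
    return `Form: ${issue.message}`; // e.g. path "root"
  }
  // A zero-based list index such as "taxa.0" is shown as "taxa #1".
  const index = rest.length > 0 && /^\d+$/.test(rest[0]) ? ` #${Number(rest[0]) + 1}` : "";
  return `${section} > ${field}${index}: ${issue.message}`;
}
```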

@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk April 29, 2026 14:48
Member

@gsfk gsfk left a comment

Validation errors are missing... now only shows errors on the current tab, if any.

If you don't have errors on the current tab it just blocks submission without telling you anything.
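
One way to avoid a silent block is to decide, after a failed submit, which tab to bring into view. The sketch below assumes each validation issue has already been tagged with the key of the tab that owns the field; the `TabIssue` shape and the tab keys are hypothetical.

```typescript
// Hypothetical: every issue carries the key of the tab that owns the offending field.
interface TabIssue {
  tab: string;
  message: string;
}

// Count issues per tab, e.g. to render a badge on each tab header.
function countIssuesByTab(issues: TabIssue[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const issue of issues) {
    counts.set(issue.tab, (counts.get(issue.tab) ?? 0) + 1);
  }
  return counts;
}

// Stay on the active tab if it has issues; otherwise jump to the first tab
// (in display order) that does, so the user always sees why submission failed.
function tabToShow(activeTab: string, tabOrder: string[], issues: TabIssue[]): string {
  const counts = countIssuesByTab(issues);
  if (counts.has(activeTab)) return activeTab;
  return tabOrder.find((tab) => counts.has(tab)) ?? activeTab;
}
```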

@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk April 29, 2026 19:44
Member

@gsfk gsfk left a comment

working as expected, pending migration away from "_v2"

@gsfk gsfk self-requested a review May 7, 2026 18:52
Member

@gsfk gsfk left a comment

Re-testing after katsu updates:

  • adding database Linked Field set is broken
  • if you try to edit database info more than once, the information is often stale and behaves weirdly
  • editing sometimes modifies the wrong dataset (I had dataset "A" and dataset "B".... editing the name of A changed B instead and left A untouched)
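
The cause of the wrong-dataset edit is not stated in this thread. One common source of this class of bug is addressing list items by position, or reusing form state across items, instead of by a stable identifier. A minimal sketch of an identifier-keyed update, with a hypothetical `Dataset` shape:

```typescript
// Hypothetical, trimmed-down dataset shape.
interface Dataset {
  identifier: string;
  title: string;
}

// Return a new array with the patch applied only to the dataset whose
// identifier matches, regardless of where it sits in the list.
function updateDataset(
  datasets: readonly Dataset[],
  identifier: string,
  patch: Partial<Omit<Dataset, "identifier">>,
): Dataset[] {
  return datasets.map((d) => (d.identifier === identifier ? { ...d, ...patch } : d));
}
```

Because the lookup uses `identifier`, reordering or refetching the list between opening the form and saving it cannot redirect the edit to a neighbouring dataset.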

@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk May 8, 2026 10:49
Member

@gsfk gsfk left a comment

Form data still looks stale, if I click through multiple datasets in turn, I get information for the previous dataset I clicked rather than the current one.

Also having trouble with dataset creation features, lots of stuff not related to provenance is not working. Will this magically fix if we stop naming stuff "v2"?

  • can't ingest data into a new dataset (katsu says "error encountered while ingesting: dataset does not exist"). But ingestion does work for the v2 datasets that were created by the migration process.
  • data summaries are broken (in the modal that opens when you click on "Clinical Data", "Experiments" or "Variants", values are all zero or "N/A")

@SanjeevLakhwani SanjeevLakhwani requested a review from gsfk May 11, 2026 17:30
Member

gsfk commented May 12, 2026

does it do anything with the GeoJSON or just store it as a string?

@SanjeevLakhwani
Contributor Author

> does it do anything with the GeoJSON or just store it as a string?

Nothing for now, it is just a possible input for a field in provenance

Member

gsfk commented May 14, 2026

bento_web search not working for new v2 datasets

Show Zod validation errors below the form after JSON import instead of
logging to console only.
- Use Array.isArray before .map() in handleFinish to prevent TypeError
  when imported JSON has non-array values for publications/keywords/links
- Strip non-array values for Form.List fields in prepareInitialValues
  to prevent Ant Design console warnings on import
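
The guard described in this commit message can be sketched roughly as follows. The field names come from the message itself; the helper names `stripNonArrayListFields` and `mapIfArray` are hypothetical and are not the PR's functions.

```typescript
// Fields rendered as repeatable lists; names follow the commit message.
const LIST_FIELDS = ["publications", "keywords", "links"] as const;

type FormValues = Record<string, unknown>;

// Drop any list field whose imported value is present but not an array,
// so a list component never receives a string or object.
function stripNonArrayListFields(values: FormValues): FormValues {
  const cleaned: FormValues = { ...values };
  for (const field of LIST_FIELDS) {
    if (field in cleaned && !Array.isArray(cleaned[field])) {
      delete cleaned[field];
    }
  }
  return cleaned;
}

// Map over a value only when it is an array; otherwise yield an empty list
// instead of throwing "x.map is not a function".
function mapIfArray<T>(value: unknown, fn: (item: unknown) => T): T[] {
  return Array.isArray(value) ? value.map((item) => fn(item)) : [];
}
```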
…ting bad values

Validate imported JSON before setting form values; strip top-level fields
with Zod errors and surface them in a UI alert rather than letting invalid
values reach the form.
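
A rough sketch of that strip-and-report step, independent of any validation library. The `Issue` shape is loosely modelled on the path/message pairs that schema validators such as Zod report, and `stripInvalidTopLevelFields` is a hypothetical name.

```typescript
// Loosely modelled on validator issues: a path into the value plus a message.
interface Issue {
  path: Array<string | number>;
  message: string;
}

interface ImportResult {
  values: Record<string, unknown>; // safe to hand to the form
  rejected: Issue[]; // to show in a UI alert
}

// Remove every top-level field that has at least one issue anywhere beneath it,
// and return the issues alongside the surviving values.
function stripInvalidTopLevelFields(imported: Record<string, unknown>, issues: Issue[]): ImportResult {
  const badFields = new Set<string>();
  for (const issue of issues) {
    if (issue.path.length > 0) badFields.add(String(issue.path[0]));
  }
  const values: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(imported)) {
    if (!badFields.has(key)) values[key] = value;
  }
  return { values, rejected: issues };
}
```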
Comment thread src/components/datasets/DatasetForm/tabs/ContactsTab.tsx Outdated
Comment thread src/components/datasets/DatasetForm/tabs/CoreInfoTab.tsx Outdated
Comment thread src/components/datasets/DatasetForm/tabs/PcglInfoTab.tsx Outdated
Comment thread src/modules/datasets/actions.js Outdated
Comment thread src/components/datasets/DatasetFormModal.js
Comment thread src/components/datasets/DatasetForm/tabs/LinksMediaTab.tsx Outdated
Comment thread src/components/datasets/DatasetForm/fields/PublicationVenueFields.tsx Outdated
Comment thread src/components/datasets/DatasetForm/tabs/CoreInfoTab.tsx Outdated
@davidlougheed davidlougheed changed the title from "WIP-Web: feat: dataset provenance form with view/edit/validation" to "feat: dataset provenance form with view/edit/validation" May 27, 2026
Comment thread src/components/datasets/DatasetForm/tabs/ClassificationTab.tsx Outdated
Comment thread src/components/datasets/DatasetForm/tabs/ClassificationTab.tsx Outdated
Comment thread src/components/datasets/DatasetForm/tabs/ClassificationTab.tsx Outdated
Member

@gsfk gsfk left a comment

  • shouldn't there be a field for data access codes?
  • geoJson serializes weirdly:
Image

z.url() uses WHATWG parser which accepts any scheme, allowing typos like
htttp:// to pass. Refine with regex to enforce http/https only.
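
The commit refines the schema with a regex. As an illustration of the same intent, the sketch below checks the parsed protocol instead; `isHttpUrl` is a hypothetical helper, not code from the PR.

```typescript
// Accept only absolute http(s) URLs. The WHATWG parser on its own would also
// accept a mistyped scheme such as "htttp://", because any scheme is valid to it.
function isHttpUrl(value: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    return false; // not parseable as an absolute URL
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```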
Backend returns spatial_coverage as a JSON object; textarea displayed
[object Object]. Stringify in prepareInitialValues to match the string
form that handleFinish expects to parse back on submit.
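
The round trip described here can be sketched as a pair of helpers. The function names are hypothetical, and the assumption that non-JSON text is kept as a plain string is mine, not something the thread confirms.

```typescript
type SpatialCoverage = string | Record<string, unknown> | null | undefined;

// Backend value -> text for a textarea. Objects are stringified so the field
// never shows "[object Object]"; strings pass through unchanged.
function spatialCoverageToText(value: SpatialCoverage): string {
  if (value === null || value === undefined) return "";
  return typeof value === "string" ? value : JSON.stringify(value, null, 2);
}

// Textarea text -> value to submit. Text that parses to a JSON object is sent
// as an object; anything else is kept as a string (assumption).
function textToSpatialCoverage(text: string): string | Record<string, unknown> | undefined {
  const trimmed = text.trim();
  if (trimmed === "") return undefined;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
  } catch {
    // Not JSON: fall through and keep the text as-is.
  }
  return trimmed;
}
```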
- Add COMMON_ONTOLOGIES and COMMON_KEYWORD_PRESETS constants sourced
  from bento_lib Python definitions (EFO, HP, MONDO, NCBITaxon, NCIT,
  OBI, SNOMED, SO, UBERON; species, assays, specimens, anatomy terms)
- Quick-add dropdown on ontology resources section
- Quick-add grouped dropdown on taxonomy section (Species/Assays/
  Specimens/Sequence types/Anatomy)
- Add ontology keyword/taxonomy buttons for blank ontology entries
- Auto-fill matching resource when ontology ID is typed in keyword or
  taxonomy fields
- Add 10 new roles: Collaborating/Principal Laboratory, Research Group,
  Hosting Institution, Distributor, Editor, Translator, Data Collector,
  Data Manager, Contact Person
- Filter taxa quick-add presets to Species category only
- COMMON_KEYWORD_PRESETS -> COMMON_ONTOLOGY_PRESETS
- COMMON_ONTOLOGIES -> COMMON_ONTOLOGY_RESOURCE_PRESETS
- KeywordOntologyPreset -> OntologyPreset
- Add TODO to fetch COMMON_ONTOLOGY_RESOURCE_PRESETS from an endpoint
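
The auto-fill step (matching a resource when an ontology ID is typed) can be sketched as a prefix lookup. The `OntologyResource` shape and the three-entry preset list below are hypothetical stand-ins; the real presets live in the constants this commit series adds.

```typescript
// Hypothetical, trimmed-down resource shape.
interface OntologyResource {
  id: string; // e.g. "hp"
  namespace_prefix: string; // e.g. "HP"
  name: string;
}

// Illustrative subset only; not the PR's COMMON_ONTOLOGY_RESOURCE_PRESETS.
const RESOURCE_PRESETS: OntologyResource[] = [
  { id: "hp", namespace_prefix: "HP", name: "Human Phenotype Ontology" },
  { id: "mondo", namespace_prefix: "MONDO", name: "Mondo Disease Ontology" },
  { id: "ncbitaxon", namespace_prefix: "NCBITaxon", name: "NCBI organismal classification" },
];

// Extract the prefix from a CURIE-style ID such as "HP:0001250".
function curiePrefix(termId: string): string | undefined {
  const idx = termId.indexOf(":");
  return idx > 0 ? termId.slice(0, idx) : undefined;
}

// Find the preset resource whose namespace prefix matches the typed term ID.
function findResourceForTerm(
  termId: string,
  presets: readonly OntologyResource[] = RESOURCE_PRESETS,
): OntologyResource | undefined {
  const prefix = curiePrefix(termId.trim());
  if (prefix === undefined) return undefined;
  return presets.find((r) => r.namespace_prefix.toLowerCase() === prefix.toLowerCase());
}
```

Returning `undefined` for an unknown prefix leaves the resource field for the user to fill in rather than guessing.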
Member

@gsfk gsfk left a comment

Preliminary approval since I'm out of issues for this latest iteration. No comments on code.

It's now possible to have ontology resource mismatches between the dataset info, and the data itself, by each having different versions, but:

  • that's now easy to edit for the dataset info
  • it's not clear that it even matters

For actually using the ontologies in phenopacket view in bento_public, it seems odd that we don't simply use the resources that are in the phenopacket itself. But this is a separate issue.

Member

@davidlougheed davidlougheed left a comment

lgtm

@SanjeevLakhwani SanjeevLakhwani merged commit 3be0334 into master Jun 1, 2026
3 checks passed