Skip to content

refactor: unify dataset endpoints and remove legacy Dataset model#704

Open
SanjeevLakhwani wants to merge 12 commits into
developfrom
refactor/unify-datasets-v2-endpoint
Open

refactor: unify dataset endpoints and remove legacy Dataset model#704
SanjeevLakhwani wants to merge 12 commits into
developfrom
refactor/unify-datasets-v2-endpoint

Conversation

@SanjeevLakhwani
Copy link
Copy Markdown
Collaborator

@SanjeevLakhwani SanjeevLakhwani commented Jun 4, 2026

Summary

Redmine tickets: 2833, 2826

  • Consolidated four duplicate dataset endpoint bases (/datasets, /datasets_v2, /api/datasets, /api/datasets_v2) into a single /api/datasets
  • Removed the legacy Dataset model, its serializer, and three management commands (create_dataset, list_project_datasets, migrate_datasets_to_v2)
  • Renamed model DatasetV2Dataset and DatasetV2TranslationDatasetTranslation (migration 0015 renames DB tables); DatasetV2SerializerDatasetSerializer, DatasetV2ViewSetDatasetViewSet
  • project.datasets reverse relation (renamed from dv2) and ProjectSerializer.datasets now expose Dataset directly — the datasets_v2 key is gone
  • Migration 0013 made a no-op (one-time data migration already applied to all deployments); migration 0014 drops the Dataset table and renames the FK related_name
  • Fixed missing mark_authz_done in DatasetViewSet.data_types action that caused 403 on all requests to that endpoint
  • Fixed pg_trgm extension not created on fresh DBs (migration patients/0012 now runs CreateExtension('pg_trgm') before first GIN index)
  • Fixed non-deterministic cbio export test (ordered Biosample queryset by id)

Four bases (/datasets, /datasets_v2, /api/datasets, /api/datasets_v2)
all pointed at the same ViewSet with inconsistent sub-path coverage.
Unified to /api/datasets only.

- Add counts, data_types, data_type_detail, translations,
  translation_detail as @action methods on DatasetV2ViewSet so
  the DRF router picks them up under /api/datasets/<id>/...
- Remove /datasets and /datasets_v2 root-level path() entries
- Remove duplicate router.register for datasets_v2
- Delete now-unreachable dataset_summary, dataset_v2_summary,
  dataset_v2_data_type_summary, dataset_v2_data_type view functions
  and their orphaned imports
- Update test reverse() names to DRF router-generated names
Drop the Dataset model, its management commands, and all references.
DatasetV2 is now the sole dataset representation:
- project.datasets (renamed from dv2) exposes DatasetV2 via FK reverse relation
- ProjectSerializer.datasets uses DatasetV2Serializer (was datasets_v2)
- Migration 0013 made no-op (data already migrated in all deployments)
- Migration 0014 drops Dataset table, renames related_name via AlterField

Also adds mark_authz_done to DatasetV2ViewSet.data_types action paths
(missing call caused 403 on valid and not-found requests).
@SanjeevLakhwani SanjeevLakhwani marked this pull request as draft June 4, 2026 15:21
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 93.97590% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.88%. Comparing base (fd60221) to head (05f499c).

Files with missing lines Patch % Lines
chord_metadata_service/chord/api_views.py 92.24% 9 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #704      +/-   ##
===========================================
+ Coverage    93.33%   94.88%   +1.55%     
===========================================
  Files          139      138       -1     
  Lines         6060     5848     -212     
  Branches       597      556      -41     
===========================================
- Hits          5656     5549     -107     
+ Misses         346      245     -101     
+ Partials        58       54       -4     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…atasetTranslation

Add migration 0015 with RenameModel ops (renames DB tables chord_datasetv2 →
chord_dataset and chord_datasetv2translation → chord_datasettranslation).

Update all non-migration Python references across the codebase, including
the Resource.get_scope_filters() reverse-accessor update (datasetv2_set →
dataset_set, datasetv2__ → dataset__).
…ataset, VALID_DATASET_V2_PRIMARY_CONTACT → VALID_DATASET_PRIMARY_CONTACT)
Remove duplicate test_dataset_summary / test_dataset_summary_not_found
methods that collided after v2 rename; fix over-indented continuation
line in test_api.py.
Copy link
Copy Markdown
Member

@gsfk gsfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working for me, and all stray v2s are gone.


@async_api_view(["GET"])
@permission_classes([BentoAllowAny])
async def dataset_summary(request: DrfRequest, identifier: str):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer if these legacy chord endpoints are left as-is. they should be cleaned up in coming versions anyway.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want dead code? These endpoints are already covered by DatasetViewSet. Consolidating them was requested on the initial PR, I apologize for not doing it properly on the first pr. The URL routes are also removed in urls.py, so reverting the functions would leave unreachable dead code.


@async_api_view(["GET"])
@permission_classes([BentoAllowAny])
async def dataset_v2_summary(request: DrfRequest, identifier: str):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem


@api_view(["GET"])
@permission_classes([BentoAllowAny])
async def dataset_v2_data_type_summary(request: DrfRequest, identifier: str):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer if these legacy chord endpoints are left as-is. they should be cleaned up in coming versions anyway.


@api_view(["GET", "DELETE"])
@permission_classes([BentoDeferToHandler])
async def dataset_v2_data_type(request: DrfRequest, identifier: str, data_type: str):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem

]

operations = [
django.contrib.postgres.operations.CreateExtension('pg_trgm'),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: why is an old migration being modified?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants