BDMS-609: Add support for legacy SiteNotes handling#550
Conversation
- Added "Site Notes (legacy)" term with associated category `note_type` and definition.
- Added `site_notes` property to `Thing` model and schema to handle "Site Notes (legacy)". - Ensured alignment with existing note retrieval and schema structure.
- Insert legacy `SiteNotes` in the `Notes` table during transfer process. - Updated stats to track the number of `notes_created`.
There was a problem hiding this comment.
Pull request overview
Adds end-to-end support for exposing legacy WaterLevels.SiteNotes in the new system by migrating them into the polymorphic Notes table and surfacing them on Thing API responses.
Changes:
- Added new lexicon term
Site Notes (legacy)under thenote_typecategory. - Updated
WaterLevelTransfererto createNotesrecords from unique, non-empty legacySiteNotesperPointID. - Exposed these notes via
Thing.site_notesandBaseThingResponse.site_notes.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
transfers/waterlevels_transfer.py |
Creates Notes rows during WaterLevels migration from legacy SiteNotes. |
schemas/thing.py |
Adds site_notes to the base Thing response schema for API exposure. |
db/thing.py |
Adds Thing.site_notes property backed by polymorphic notes filtering. |
core/lexicon.json |
Registers Site Notes (legacy) as a valid note_type term. |
transfers/waterlevels_transfer.py
Outdated
| # Site Notes (legacy) | ||
| site_notes = { | ||
| prep["row"].SiteNotes | ||
| for prep in prepared_rows | ||
| if hasattr(prep["row"], "SiteNotes") | ||
| and prep["row"].SiteNotes | ||
| and str(prep["row"].SiteNotes).strip() | ||
| } |
There was a problem hiding this comment.
site_notes is deduped before normalization, but content is later strip()ed when creating Notes. If the legacy source contains the same note with different leading/trailing whitespace (e.g., "note" vs " note "), the set will treat them as distinct and you can end up inserting duplicate Notes with identical content. Consider normalizing first (cast to str, strip, and then dedupe on the normalized string).
There was a problem hiding this comment.
Resolved. Implemented a multi-key deduplication strategy (content + date) to balance historical preservation with redundancy reduction (see commit e6a8f21).
| # Site Notes (legacy) | ||
| site_notes = { | ||
| prep["row"].SiteNotes | ||
| for prep in prepared_rows | ||
| if hasattr(prep["row"], "SiteNotes") | ||
| and prep["row"].SiteNotes | ||
| and str(prep["row"].SiteNotes).strip() | ||
| } | ||
| for note_content in site_notes: | ||
| session.add( | ||
| Notes( | ||
| target_table="thing", | ||
| target_id=thing_id, | ||
| note_type="Site Notes (legacy)", | ||
| content=str(note_content).strip(), | ||
| release_status="public", | ||
| ) | ||
| ) | ||
| stats["notes_created"] += 1 |
There was a problem hiding this comment.
New behavior migrates legacy SiteNotes into the polymorphic Notes table, but there are existing unit tests for WaterLevelTransferer (e.g., in tests/test_transfer_legacy_dates.py) and none appear to cover this new note-migration path. Adding tests for: (1) skipping None/blank/whitespace-only notes and (2) deduping unique notes per PointID (including whitespace variants) would prevent regressions.
| monitoring_frequencies: list[MonitoringFrequencyResponse] = [] | ||
| general_notes: list[NoteResponse] = [] | ||
| sampling_procedure_notes: list[NoteResponse] = [] | ||
| site_notes: list[NoteResponse] = [] |
There was a problem hiding this comment.
site_notes was added to the response schema with an empty-list default, but there are existing API/BDD checks for other note arrays (e.g., general_notes, sampling_procedure_notes) and none validate that site_notes is always present and is [] when empty (per Issue #549 acceptance criteria). Please add/extend an API test to assert the field is included and defaults to an empty list when no notes exist.
| site_notes: list[NoteResponse] = [] | |
| site_notes: list[NoteResponse] = Field(default_factory=list) |
- Track legacy `SiteNotes` by content and date to preserve context for duplicates. - Updated insertion to prepend the date to note content for uniqueness.
Why
This PR addresses the following problem / context:
SiteNotesfrom theWaterLevelstable were previously inaccessible in the new database and API.How
Implementation summary - the following was changed / added / removed:
Site Notes (legacy)to thenote_typecategory in core/lexicon.json.transfers/waterlevels_transfer.pyto migrate unique, non-emptySiteNotesinto the polymorphicNotestable, linked toThingrecords.NaN, empty, or whitespace-only strings from the legacy source.site_notesproperty to theThingdatabase model indb/thing.pyfor easy retrieval of these legacy records.site_notesin theBaseThingResponseschema inschemas/thing.py, ensuring it is available across all relevant API endpoints (e.g.,/things/wells).Notes
Any special considerations, workarounds, or follow-up work to note?
[]forsite_notesif no records exist, ensuring consistent behavior for downstream clients.WaterLevelssource are currently being mapped to this type; other legacy tables (likeSurfaceWaterData) still retain their originalSiteNotes`columns for historical reference until specifically mapped.SiteNotesfromWaterLevelsCSV toNotestable for Field Form Support #549