Skip to content

BDMS-609: Add support for legacy SiteNotes handling#550

Merged
jirhiker merged 4 commits intostagingfrom
kas-migrate-legacy-sitenotes-field
Feb 27, 2026
Merged

BDMS-609: Add support for legacy SiteNotes handling#550
jirhiker merged 4 commits intostagingfrom
kas-migrate-legacy-sitenotes-field

Conversation

@ksmuczynski
Copy link
Copy Markdown
Contributor

@ksmuczynski ksmuczynski commented Feb 26, 2026

Why

This PR addresses the following problem / context:

  • Legacy SiteNotes from the WaterLevels table were previously inaccessible in the new database and API.
  • This data is critical for field technicians and must be visible during field form generation.
  • To ensure data quality, blank or missing legacy records needed to be filtered out during the migration process.

How

Implementation summary - the following was changed / added / removed:

  • Added Site Notes (legacy) to the note_type category in core/lexicon.json.
  • Updated transfers/waterlevels_transfer.py to migrate unique, non-empty SiteNotes into the polymorphic Notes table, linked to Thing records.
  • Enhanced migration logic to explicitly skip NaN, empty, or whitespace-only strings from the legacy source.
  • Implemented a multi-key deduplication strategy (date + content) to balance historical preservation with redundancy reduction.
  • Added a site_notes property to the Thing database model in db/thing.py for easy retrieval of these legacy records.
  • Exposed site_notes in the BaseThingResponse schema in schemas/thing.py, ensuring it is available across all relevant API endpoints (e.g., /things/wells).

Notes

Any special considerations, workarounds, or follow-up work to note?

  • The API is designed to return an empty list [] for site_notes if no records exist, ensuring consistent behavior for downstream clients.
  • Date prefixing ensures that even if note content is identical, the system treats observations from different visits as unique historical events.
  • Only notes from the WaterLevels source are currently being mapped to this type; other legacy tables (like SurfaceWaterData) still retain their original SiteNotes `columns for historical reference until specifically mapped.
  • Resolves Issue Migrate Legacy SiteNotes from WaterLevels CSV to Notes table for Field Form Support #549

- Added "Site Notes (legacy)" term with associated category `note_type` and definition.
- Added `site_notes` property to `Thing` model and schema to handle "Site Notes (legacy)".
- Ensured alignment with existing note retrieval and schema structure.
- Insert legacy `SiteNotes` in the `Notes` table during transfer process.
- Updated stats to track the number of `notes_created`.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end support for exposing legacy WaterLevels.SiteNotes in the new system by migrating them into the polymorphic Notes table and surfacing them on Thing API responses.

Changes:

  • Added new lexicon term Site Notes (legacy) under the note_type category.
  • Updated WaterLevelTransferer to create Notes records from unique, non-empty legacy SiteNotes per PointID.
  • Exposed these notes via Thing.site_notes and BaseThingResponse.site_notes.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
transfers/waterlevels_transfer.py Creates Notes rows during WaterLevels migration from legacy SiteNotes.
schemas/thing.py Adds site_notes to the base Thing response schema for API exposure.
db/thing.py Adds Thing.site_notes property backed by polymorphic notes filtering.
core/lexicon.json Registers Site Notes (legacy) as a valid note_type term.

Comment on lines +401 to +408
# Site Notes (legacy)
site_notes = {
prep["row"].SiteNotes
for prep in prepared_rows
if hasattr(prep["row"], "SiteNotes")
and prep["row"].SiteNotes
and str(prep["row"].SiteNotes).strip()
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site_notes is deduped before normalization, but content is later strip()ed when creating Notes. If the legacy source contains the same note with different leading/trailing whitespace (e.g., "note" vs " note "), the set will treat them as distinct and you can end up inserting duplicate Notes with identical content. Consider normalizing first (cast to str, strip, and then dedupe on the normalized string).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. Implemented a multi-key deduplication strategy (content + date) to balance historical preservation with redundancy reduction (see commit e6a8f21).

Comment on lines +401 to +419
# Site Notes (legacy)
site_notes = {
prep["row"].SiteNotes
for prep in prepared_rows
if hasattr(prep["row"], "SiteNotes")
and prep["row"].SiteNotes
and str(prep["row"].SiteNotes).strip()
}
for note_content in site_notes:
session.add(
Notes(
target_table="thing",
target_id=thing_id,
note_type="Site Notes (legacy)",
content=str(note_content).strip(),
release_status="public",
)
)
stats["notes_created"] += 1
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior migrates legacy SiteNotes into the polymorphic Notes table, but there are existing unit tests for WaterLevelTransferer (e.g., in tests/test_transfer_legacy_dates.py) and none appear to cover this new note-migration path. Adding tests for: (1) skipping None/blank/whitespace-only notes and (2) deduping unique notes per PointID (including whitespace variants) would prevent regressions.

Copilot uses AI. Check for mistakes.
monitoring_frequencies: list[MonitoringFrequencyResponse] = []
general_notes: list[NoteResponse] = []
sampling_procedure_notes: list[NoteResponse] = []
site_notes: list[NoteResponse] = []
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site_notes was added to the response schema with an empty-list default, but there are existing API/BDD checks for other note arrays (e.g., general_notes, sampling_procedure_notes) and none validate that site_notes is always present and is [] when empty (per Issue #549 acceptance criteria). Please add/extend an API test to assert the field is included and defaults to an empty list when no notes exist.

Suggested change
site_notes: list[NoteResponse] = []
site_notes: list[NoteResponse] = Field(default_factory=list)

Copilot uses AI. Check for mistakes.
- Track legacy `SiteNotes` by content and date to preserve context for duplicates.
- Updated insertion to prepend the date to note content for uniqueness.
@jirhiker jirhiker merged commit 700dbdd into staging Feb 27, 2026
8 checks passed
@TylerAdamMartinez TylerAdamMartinez deleted the kas-migrate-legacy-sitenotes-field branch March 12, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants