
Kelsey Smuczynski edited this page Mar 27, 2026 · 2 revisions

Legacy Transfer Pipeline

Purpose

The transfers/ package migrates legacy NM_Aquifer and related source data into the current PostgreSQL/PostGIS schema. Despite the name, it is not a one-off historical artifact; it remains an active, operational part of the repository.


Main Entry Point

python -m transfers.transfer

Orchestration Model

transfers/transfer.py runs the pipeline in phases:

  1. Optional schema reset and rebuild
  2. Foundational transfers (parallel)
  3. Well transfer
  4. Non-well location-type transfers (parallel)
  5. Large parallel transfer group for independent domains
  6. Sequential chemistry and sensor-dependent stages
  7. Location cleanup

A separate path that transfers only continuous water levels is controlled by environment flags (see CONTINUOUS_WATER_LEVELS below).


Environment Toggles

The orchestrator reads many TRANSFER_* environment variables:

| Category | Variables |
| --- | --- |
| Data domains | TRANSFER_WELL_SCREENS, TRANSFER_SENSORS, TRANSFER_CONTACTS, TRANSFER_PERMISSIONS, TRANSFER_WATER_LEVELS, TRANSFER_CHEMISTRY_*, TRANSFER_NGWMN, TRANSFER_SURFACE_WATER, TRANSFER_WEATHER, TRANSFER_NON_WELL_THING_TYPES |
| Behavior | DROP_AND_REBUILD_DB, ERASE_AND_REBUILD, CLEANUP_LOCATIONS, CONTINUOUS_WATER_LEVELS |
| Performance | TRANSFER_LIMIT, TRANSFER_TEST_POINTIDS, TRANSFER_PARALLEL_WELLS, TRANSFER_WORKERS |

Note that .env.example does not list every toggle the code currently honors; the orchestrator source in transfers/transfer.py is the authoritative list.
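A minimal sketch of how such toggles are typically read follows. The helper names (env_flag, env_int) and the accepted truthy strings are assumptions for illustration; the exact parsing in transfers/transfer.py may differ.

```python
import os

def env_flag(name, default=False):
    # Treat common truthy strings as True; anything else as False.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

def env_int(name, default=None):
    # Numeric toggles such as TRANSFER_LIMIT or TRANSFER_WORKERS.
    raw = os.environ.get(name)
    return int(raw) if raw not in (None, "") else default

os.environ["TRANSFER_LIMIT"] = "500"   # e.g. cap rows per table for a test run
limit = env_int("TRANSFER_LIMIT")
```

Centralizing the parsing in helpers like these keeps truthiness rules consistent across the many TRANSFER_* variables.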


Schema Reset Behavior

If DROP_AND_REBUILD_DB=true is set, the transfer flow runs these steps in order:

  1. Recreates the public schema
  2. Recreates the PostGIS extension
  3. Runs Alembic migrations
  4. Syncs full-text-search triggers
  5. Initializes lexicon data
  6. Initializes parameter data
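Because each step depends on the one before it (migrations need the schema, triggers and seed data need the migrated tables), the sequence must run strictly in order. The sketch below makes that ordering explicit; the step names are descriptive placeholders, and the real implementation issues DDL, Alembic, and seeding calls rather than invoking a passed-in callable.

```python
def run_schema_reset(execute):
    # `execute` stands in for the real per-step work (DDL, Alembic,
    # trigger sync, seed loading); each step must complete before the next.
    steps = [
        "recreate_public_schema",
        "recreate_postgis_extension",
        "run_alembic_migrations",
        "sync_fts_triggers",
        "init_lexicon_data",
        "init_parameter_data",
    ]
    for step in steps:
        execute(step)
    return steps
```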

Outputs

  • Logs: transfers/logs/
  • Metrics: transfers/metrics/
  • Optional upload of logs and metrics to a GCS bucket

Spatial Transformation

During import, coordinates are automatically converted from UTM (NAD83 / SRID 26913) to WGS84 (SRID 4326). Legacy contact records are also normalized via OwnerKey mapping and canonicalization.
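The same reprojection can be reproduced with pyproj, assuming it is installed; whether the pipeline itself uses pyproj or PostGIS's ST_Transform is not stated here, so treat this as an equivalent sketch, not the pipeline's code.

```python
from pyproj import Transformer

# NAD83 / UTM zone 13N (EPSG:26913) -> WGS84 lon/lat (EPSG:4326).
# always_xy=True keeps axis order as (easting, northing) -> (lon, lat).
to_wgs84 = Transformer.from_crs("EPSG:26913", "EPSG:4326", always_xy=True)

# Easting 500000 m sits exactly on zone 13's central meridian (105°W),
# so the result lands near lon -105 in central New Mexico.
lon, lat = to_wgs84.transform(500_000, 3_900_000)
```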


Performance Guidance

  • Avoid ORM-heavy bulk object creation for high-volume tables
  • Prefer SQLAlchemy Core inserts for large row counts
  • Keep data migrations idempotent and safe to re-run

These rules matter because many transfer tables contain very large row counts.
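The bulk-insert and idempotency rules can be illustrated with the pattern below. It uses stdlib sqlite3 so the sketch is self-contained; the real pipeline would use SQLAlchemy Core against PostgreSQL, where INSERT OR IGNORE becomes ON CONFLICT DO NOTHING. The table and row values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_levels ("
    "point_id TEXT, ts TEXT, value REAL, PRIMARY KEY (point_id, ts))"
)

rows = [("MG-030", "2020-01-01", 12.3), ("MG-030", "2020-02-01", 12.1)]

# One bulk statement per batch instead of per-row ORM objects, and a
# conflict-ignoring insert so re-running the migration adds no duplicates.
for _ in range(2):  # simulate running the transfer twice
    conn.executemany("INSERT OR IGNORE INTO water_levels VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM water_levels").fetchone()[0]
```

Running the load twice still leaves exactly two rows, which is the idempotency property the guidance asks for.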


Operational Caveats

  • The transfer script guards against accidentally targeting the ocotilloapi_test database
  • It does not guard against every other wrong database selection; always inspect .env before running
  • Neither a safe staging target nor source credentials are documented in-repo; confirm the target database with the team before running
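A guard of the kind described can be sketched as follows. The blocklist contents and function name are hypothetical; only the protection of ocotilloapi_test is stated in this page.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the real check lives inside the transfer script.
PROTECTED_DBS = {"ocotilloapi_test"}

def check_target(db_url):
    # Extract the database name from a postgresql:// URL and refuse
    # to proceed if it is a protected database.
    dbname = urlsplit(db_url).path.lstrip("/")
    if dbname in PROTECTED_DBS:
        raise SystemExit(f"refusing to run transfer against {dbname}")
    return dbname
```

A check like this is cheap insurance, but it only catches names it knows about, which is why inspecting .env before each run remains necessary.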
