# Legacy Transfer Pipeline
The transfers/ package migrates legacy NM_Aquifer and related source data into the current PostgreSQL/PostGIS schema. This is not a one-off historical artifact — it remains an active operational part of the repository.
Running `python -m transfers.transfer` (the entry point at `transfers/transfer.py`) executes the pipeline in phases:
- Optional schema reset and rebuild
- Foundational transfers (parallel)
- Well transfer
- Non-well location-type transfers (parallel)
- Large parallel transfer group for independent domains
- Sequential chemistry and sensor-dependent stages
- Location cleanup
A separate continuous-water-levels-only path is controlled by environment flags.
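The phase ordering above (parallel groups interleaved with sequential, dependency-bound stages) can be sketched with the standard library. The phase functions below are hypothetical stand-ins, not the repository's real transfer functions:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical phase functions standing in for real transfer steps.
def transfer_contacts():
    return "contacts"

def transfer_units():
    return "units"

def transfer_wells():
    return "wells"

def run_parallel(*phases):
    """Run independent transfer phases concurrently and collect results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(phase) for phase in phases]
        return [f.result() for f in futures]

def run_pipeline():
    results = []
    # Foundational transfers have no interdependencies, so they run in parallel...
    results += run_parallel(transfer_contacts, transfer_units)
    # ...while the well transfer runs alone, since later stages depend on it.
    results.append(transfer_wells())
    return results
```

The key design point is that only phases with no mutual dependencies share a parallel group; anything downstream of wells waits for the sequential step.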
The orchestrator reads many TRANSFER_* environment variables:
| Category | Variables |
|---|---|
| Data domains | `TRANSFER_WELL_SCREENS`, `TRANSFER_SENSORS`, `TRANSFER_CONTACTS`, `TRANSFER_PERMISSIONS`, `TRANSFER_WATER_LEVELS`, `TRANSFER_CHEMISTRY_*`, `TRANSFER_NGWMN`, `TRANSFER_SURFACE_WATER`, `TRANSFER_WEATHER`, `TRANSFER_NON_WELL_THING_TYPES` |
| Behavior | `DROP_AND_REBUILD_DB`, `ERASE_AND_REBUILD`, `CLEANUP_LOCATIONS`, `CONTINUOUS_WATER_LEVELS` |
| Performance | `TRANSFER_LIMIT`, `TRANSFER_TEST_POINTIDS`, `TRANSFER_PARALLEL_WELLS`, `TRANSFER_WORKERS` |
Note that `.env.example` does not list every toggle the code currently honors.
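Boolean environment toggles like these are typically parsed with a small helper. A minimal sketch, assuming the orchestrator treats common truthy strings as "on" (the real parsing rules may differ):

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a TRANSFER_*-style environment variable as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY

# Example: gate a phase on a toggle, defaulting to "off" when unset.
os.environ["TRANSFER_SENSORS"] = "true"
run_sensors = env_flag("TRANSFER_SENSORS")
```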
If `DROP_AND_REBUILD_DB=true`, the transfer flow:
- Recreates the `public` schema
- Recreates PostGIS
- Runs Alembic migrations
- Syncs full-text-search triggers
- Initializes lexicon data
- Initializes parameter data

Pipeline outputs:
- Logs: `transfers/logs/`
- Metrics: `transfers/metrics/`
- Optional upload of metrics and logs to a GCS bucket
During import, coordinates are automatically converted from UTM (NAD83 / SRID 26913) to WGS84 (SRID 4326). Legacy contact records are also normalized via OwnerKey mapping and canonicalization.
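The UTM-to-WGS84 step can be illustrated with `pyproj` (assuming that library is available; the repository's actual conversion code may use PostGIS or another tool). EPSG:26913 is NAD83 / UTM zone 13N, which covers New Mexico:

```python
from pyproj import Transformer

# NAD83 / UTM zone 13N (EPSG:26913) -> WGS84 lon/lat (EPSG:4326).
# always_xy=True keeps (easting, northing) in, (lon, lat) out.
to_wgs84 = Transformer.from_crs("EPSG:26913", "EPSG:4326", always_xy=True)

# A point on the zone's central meridian (easting 500000 m) in New Mexico.
lon, lat = to_wgs84.transform(500000.0, 3900000.0)
```

A point at easting 500000 sits exactly on the zone's central meridian (105° W), which makes the conversion easy to sanity-check.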
- Avoid ORM-heavy bulk object creation for high-volume tables
- Prefer SQLAlchemy Core inserts for large row counts
- Keep data migrations idempotent and safe to re-run
These rules matter because many transfer tables contain very large row counts.
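The ORM-vs-Core rule can be demonstrated with a minimal sketch. SQLite stands in for PostgreSQL here, and the table definition is illustrative, not the repository's actual schema:

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table,
    create_engine, func, insert, select,
)

# In-memory SQLite stands in for PostgreSQL; names are illustrative.
engine = create_engine("sqlite://")
metadata = MetaData()
water_levels = Table(
    "water_levels",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("point_id", String),
    Column("depth", Integer),
)
metadata.create_all(engine)

rows = [{"point_id": f"WL-{i:05d}", "depth": i % 300} for i in range(10_000)]

with engine.begin() as conn:
    # One executemany-style Core insert instead of 10,000 ORM objects:
    # no identity map, no per-object flush overhead.
    conn.execute(insert(water_levels), rows)

with engine.connect() as conn:
    count = conn.execute(
        select(func.count()).select_from(water_levels)
    ).scalar_one()
```

Passing a list of dicts to a single Core `insert()` lets the driver batch the statement, which is where the speedup over per-object ORM inserts comes from on high-volume tables.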
- The transfer script protects against accidentally targeting `ocotilloapi_test`
- It does not fully protect against every wrong local database selection; always inspect `.env` before running
- The transfer script does not document a safe staging target or source credentials in-repo; confirm the target DB with the team before running
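One way to harden against a wrong target is an explicit guard on the database name before any transfer runs. This is a hypothetical sketch, not the script's actual protection logic:

```python
from urllib.parse import urlparse

# Hypothetical guard; the real script's protections differ in detail.
PROTECTED_DBS = {"ocotilloapi_test"}

def check_target(database_url: str, expected_db: str) -> None:
    """Refuse to run unless the URL's database matches the expected
    name and is not on the protected list."""
    db = urlparse(database_url).path.lstrip("/")
    if db in PROTECTED_DBS:
        raise RuntimeError(f"refusing to target protected database {db!r}")
    if db != expected_db:
        raise RuntimeError(f"target {db!r} != expected {expected_db!r}")

# Passes: the target matches what the operator expects.
check_target("postgresql://user@localhost/transfer_dev", "transfer_dev")
```

Requiring the operator to state the expected database name turns a silent misconfiguration in `.env` into a hard failure.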