This repo pushes millions of legacy rows through SQLAlchemy. When Codex or any other agent has to work on these transfers, keep the following rules in mind to avoid hour-long runs:
- Do not call `session.bulk_save_objects` for high-frequency tables (e.g., transducer observations, water levels, chemistry results). It still instantiates every mapped class and kills throughput.
- Instead, build plain dictionaries/tuples and call `session.execute(insert(Model), data)` so SQLAlchemy can batch the insert as an `executemany`. For bulk ORM UPDATE/DELETE statements, pass `execution_options={"synchronize_session": False}` to skip per-row session synchronization.
- If validation is required (Pydantic models, bound schemas), validate first and dump to dicts before the Core insert.
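The bullets above can be sketched as follows. `Obs` is a hypothetical model standing in for a high-frequency table, and an in-memory SQLite engine stands in for the repo's Postgres so the snippet is self-contained:

```python
from sqlalchemy import Column, Float, Integer, create_engine, insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Obs(Base):
    # Hypothetical stand-in for a high-frequency table such as
    # transducer observations; the real repo's models will differ.
    __tablename__ = "obs"
    id = Column(Integer, primary_key=True)
    value = Column(Float, nullable=False)

engine = create_engine("sqlite://")  # in-memory stand-in for Postgres
Base.metadata.create_all(engine)

# Plain dicts, no ORM instances: SQLAlchemy sends this as one
# batched executemany instead of flushing mapped objects one by one.
data = [{"value": float(i)} for i in range(10_000)]

with Session(engine) as session:
    session.execute(insert(Obs), data)
    session.commit()
```

If the rows come out of Pydantic models, call `model.model_dump()` on each one first and feed the resulting dicts to the same Core insert.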
- Activate the repo virtualenv before testing: run `source .venv/bin/activate` from the project root so all dependencies (sqlalchemy, fastapi, etc.) are available.
- Load environment variables from `.env` so pytest sees the same DB creds the app uses. For quick shells: `set -a; source .env; set +a`, or use `ENV_FILE=.env pytest ...` with `python-dotenv` installed.
- Many tests expect a running Postgres bound to the vars in `.env`; confirm the `POSTGRES_*` values point to the right instance before running destructive suites.
- When done, run `deactivate` to exit the venv and avoid polluting other shells.
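The `set -a` trick above exports every variable the sourced file defines. A minimal demonstration, using a throwaway env file so the snippet runs anywhere (the repo's real creds live in `.env` at the project root):

```shell
# Throwaway stand-in for the repo's .env file.
cat > /tmp/demo.env <<'VARS'
POSTGRES_HOST=localhost
POSTGRES_DB=legacy
VARS

set -a              # auto-export everything sourced below
source /tmp/demo.env
set +a

# The vars are now visible to child processes such as pytest.
echo "$POSTGRES_HOST/$POSTGRES_DB"
```

Without `set -a`, `source` only sets shell-local variables, and a child `pytest` process would never see them.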
- Data migrations should be safe to re-run without creating duplicate rows or corrupting data.
- Use upserts or duplicate checks and update source fields only after successful inserts.
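An idempotent load per the two bullets above can be sketched with `ON CONFLICT DO NOTHING` keyed on a natural key, so re-running the migration cannot create duplicates. Table and column names are hypothetical; SQLite's insert dialect is used so the sketch is self-contained, while on Postgres you would import `insert` from `sqlalchemy.dialects.postgresql` instead:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.dialects.sqlite import insert  # stand-in dialect for the demo
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class WaterLevel(Base):
    # Hypothetical target table; source_id is the natural key
    # carried over from the legacy system.
    __tablename__ = "water_levels"
    id = Column(Integer, primary_key=True)
    source_id = Column(String, unique=True, nullable=False)
    value = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

rows = [{"source_id": "A-1", "value": 10}, {"source_id": "A-2", "value": 20}]

def load(session):
    # ON CONFLICT on the unique natural key makes re-runs a no-op
    # for rows that already landed.
    stmt = insert(WaterLevel).values(rows).on_conflict_do_nothing(
        index_elements=["source_id"]
    )
    session.execute(stmt)
    session.commit()

with Session(engine) as session:
    load(session)
    load(session)  # safe re-run: no duplicate rows
```

Only after the insert commits should the migration flip any "migrated" flag on the source rows, so a crash mid-run leaves them eligible for the next pass.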
- After completing any code modification, do a cleanup and code analysis pass adjusted to the size and risk of the change.
- Check for obvious regressions, dead code, inconsistent config/docs/tests, and adjacent issues introduced by the change.
- Fix any concrete issues you find in that pass instead of stopping at implementation.
- After code cleanup, run `black` on the touched Python files and `flake8` on the same files before wrapping up.
- Run targeted validation for the modified area after cleanup; use broader validation when the change affects shared boot, deploy, or database paths.
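One way to limit the format/lint pass to touched files is to feed `git diff` output to both tools. This is a sketch: it assumes a git checkout, and the `command -v` guards plus `|| true` let it degrade gracefully when the tools are missing or flake8 reports violations:

```shell
# List modified Python files; empty outside a git repo or on a clean tree.
touched=$(git diff --name-only HEAD -- '*.py' 2>/dev/null || true)

for f in $touched; do
    command -v black  >/dev/null && black "$f"   || true
    command -v flake8 >/dev/null && flake8 "$f"  || true
done

echo "checked: ${touched:-nothing to check}"
```

For broader changes, drop the file filter and run both tools on the affected package directories instead.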
Following this playbook keeps ETL runs measured in seconds/minutes instead of hours.