Skip to content

Latest commit

 

History

History
584 lines (477 loc) · 15.6 KB

File metadata and controls

584 lines (477 loc) · 15.6 KB

Database Overview

This document describes the MySQL schema used by the Climate Analog Database & API.
It covers the purpose of each table group, the data flow (ingest → temp → final → distances), and how to recreate the schema locally.


Contents


High-Level Architecture

erDiagram
  STATES {
    string StateCode PK
    string StateAbbr
    string StateName
  }

  COUNTIES {
    int CountyID PK
    string CountyCode
    string CountyName
    string StateCode FK
    float Latitude
    float Longitude
  }

  PHYSICALDISTANCES {
    int TargetCountyId PK
    int AnalogCountyId PK
    float PhysicalDistance
  }

  %% ------------------ FINAL FACT TABLES ------------------
  MONTHLY_PRECIP_TargetState {
    int CountyID PK
    int Year PK
    string Month PK
    float Precipitation
  }

  MONTHLY_TEMP_TargetState {
    int CountyID PK
    int Year PK
    string Month PK
    float Temperature
  }

  SEASONAL_PRECIP_TargetState {
    int CountyID PK
    int Year PK
    string Season PK
    float Precipitation
  }

  SEASONAL_TEMP_TargetState {
    int CountyID PK
    int Year PK
    string Season PK
    float Temperature
  }

  YEARLY_PRECIP_TargetState {
    int CountyID PK
    int Year PK
    float Precipitation
  }

  YEARLY_TEMP_TargetState {
    int CountyID PK
    int Year PK
    float Temperature
  }

  %% ------------------ NORMS TABLES ------------------
  MONTHLY_PRECIP_NORMS {
    int CountyID PK
    string Month PK
    float NormPrecipitation
    float StdDevPrecipitation
  }

  MONTHLY_TEMP_NORMS {
    int CountyID PK
    string Month PK
    float NormTemperature
    float StdDevTemperature
  }

  SEASONAL_PRECIP_NORMS {
    int CountyID PK
    string Season PK
    float NormPrecipitation
    float StdDevPrecipitation
  }

  SEASONAL_TEMP_NORMS {
    int CountyID PK
    string Season PK
    float NormTemperature
    float StdDevTemperature
  }

  YEARLY_PRECIP_NORMS {
    int CountyID PK
    float NormPrecipitation
    float StdDevPrecipitation
  }

  YEARLY_TEMP_NORMS {
    int CountyID PK
    float NormTemperature
    float StdDevTemperature
  }

  %% ------------------ DISTANCES (FINAL) ------------------
  MONTHLY_PRECIP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Month PK
    float Distance
    int AnalogRank
  }

  MONTHLY_TEMP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Month PK
    float Distance
    int AnalogRank
  }

  MONTHLY_COMBINED_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Month PK
    float Distance
    int AnalogRank
  }

  SEASONAL_PRECIP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Season PK
    float Distance
    int AnalogRank
  }

  SEASONAL_TEMP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Season PK
    float Distance
    int AnalogRank
  }

  SEASONAL_COMBINED_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    string Season PK
    float Distance
    int AnalogRank
  }

  YEARLY_PRECIP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    float Distance
    int AnalogRank
  }

  YEARLY_TEMP_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    float Distance
    int AnalogRank
  }

  YEARLY_COMBINED_DIST {
    int TargetCountyID PK
    int AnalogCountyID PK
    int Year PK
    float Distance
    int AnalogRank
  }

  %% ------------------ TEMP STAGING (no FKs by design) ------------------
  TargetState_MNTH_PRECIP_TEMP {
    int CountyID
    int Year
    string Month
    float Precipitation
  }

  TargetState_MNTH_TEMP_TEMP {
    int CountyID
    int Year
    string Month
    float Temperature
  }

  TargetState_SEAS_PRECIP_TEMP {
    int CountyID
    int Year
    string Season
    float Precipitation
  }

  TargetState_SEAS_TEMP_TEMP {
    int CountyID
    int Year
    string Season
    float Temperature
  }

  TargetState_YR_PRECIP_TEMP {
    int CountyID
    int Year
    float Precipitation
  }

  TargetState_YR_TEMP_TEMP {
    int CountyID
    int Year
    float Temperature
  }

  MNTH_PRECIP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    string Month
    float Distance
  }

  MNTH_TEMP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    string Month
    float Distance
  }

  SEAS_PRECIP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    string Season
    float Distance
  }

  SEAS_TEMP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    string Season
    float Distance
  }

  YR_PRECIP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    float Distance
  }

  YR_TEMP_DIST_TEMP {
    int TargetCountyID
    int AnalogCountyID
    int Year
    float Distance
  }

  %% ------------------ RELATIONSHIPS ------------------
  STATES ||--o{ COUNTIES : has

  COUNTIES ||--o{ MONTHLY_PRECIP_TargetState : has
  COUNTIES ||--o{ MONTHLY_TEMP_TargetState   : has
  COUNTIES ||--o{ SEASONAL_PRECIP_TargetState: has
  COUNTIES ||--o{ SEASONAL_TEMP_TargetState  : has
  COUNTIES ||--o{ YEARLY_PRECIP_TargetState  : has
  COUNTIES ||--o{ YEARLY_TEMP_TargetState    : has

  COUNTIES ||--o{ MONTHLY_PRECIP_NORMS : norms_for
  COUNTIES ||--o{ MONTHLY_TEMP_NORMS   : norms_for
  COUNTIES ||--o{ SEASONAL_PRECIP_NORMS: norms_for
  COUNTIES ||--o{ SEASONAL_TEMP_NORMS  : norms_for
  COUNTIES ||--o{ YEARLY_PRECIP_NORMS  : norms_for
  COUNTIES ||--o{ YEARLY_TEMP_NORMS    : norms_for

  COUNTIES ||--o{ PHYSICALDISTANCES : as_Target
  COUNTIES ||--o{ PHYSICALDISTANCES : as_Analog

  COUNTIES ||--o{ MONTHLY_PRECIP_DIST   : as_Target
  COUNTIES ||--o{ MONTHLY_TEMP_DIST     : as_Target
  COUNTIES ||--o{ MONTHLY_COMBINED_DIST : as_Target
  COUNTIES ||--o{ SEASONAL_PRECIP_DIST  : as_Target
  COUNTIES ||--o{ SEASONAL_TEMP_DIST    : as_Target
  COUNTIES ||--o{ SEASONAL_COMBINED_DIST: as_Target
  COUNTIES ||--o{ YEARLY_PRECIP_DIST    : as_Target
  COUNTIES ||--o{ YEARLY_TEMP_DIST      : as_Target
  COUNTIES ||--o{ YEARLY_COMBINED_DIST  : as_Target

  COUNTIES ||--o{ MONTHLY_PRECIP_DIST   : as_Analog
  COUNTIES ||--o{ MONTHLY_TEMP_DIST     : as_Analog
  COUNTIES ||--o{ MONTHLY_COMBINED_DIST : as_Analog
  COUNTIES ||--o{ SEASONAL_PRECIP_DIST  : as_Analog
  COUNTIES ||--o{ SEASONAL_TEMP_DIST    : as_Analog
  COUNTIES ||--o{ SEASONAL_COMBINED_DIST: as_Analog
  COUNTIES ||--o{ YEARLY_PRECIP_DIST    : as_Analog
  COUNTIES ||--o{ YEARLY_TEMP_DIST      : as_Analog
  COUNTIES ||--o{ YEARLY_COMBINED_DIST  : as_Analog
Loading
  • DB Engine: MySQL
  • Primary Entities: Counties (with lat/long), States
  • Facts: Precipitation & Temperature for Monthly, Seasonal, and Yearly time scales (TargetState-focused final tables).
  • Norms: Means & std dev computed over 1991–2020 for monthly/seasonal/yearly, per county.
  • Distances: Euclidean distances computed for precip, temp, and combined across monthly/seasonal/yearly.
  • TEMP Tables: Used to ingest and transform before copying into final fact tables (keeps ingestion idempotent).

Data Flow (Ingest → Norms → Distances → Queries)

  1. Ingest NOAA files via /addallcountydata:

    • Parse fixed-width lines.
    • Look up CountyID from States/Counties.
    • Write to TargetStateCounty*_*_TEMP tables (monthly/seasonal/yearly + precip/temp).
  2. Compute Norms (1991–2020):

    • Accumulate totals & std dev for monthly, seasonal, yearly windows.
    • Insert into *_norms tables.
  3. Copy TEMP → Final:

    • TargetStateCountyMonthlyPrecip_TEMPmonthly_precipitation_data_TargetState
    • TargetStateCountyMonthlyTemp_TEMPmonthly_temperature_data_TargetState
    • (same for seasonal/yearly)
  4. Compute Distances:

    • Fill *_distances_TEMP then write/merge into *_distances (and *_combined_distances).
  5. Query Layer:

    • Stored procedures fetch top analogs by time-scale and year/season/month.

Table Groups

Reference Tables

  • States
    Canonical list of states.
    Used to resolve state codes during ingest.

  • Counties
    Canonical list of counties (CountyID, CountyName, StateCode, Latitude, Longitude).
    Used to resolve CountyID and provide spatial context.

  • PhysicalDistances (optional / auxiliary)
    If present, stores geodesic/physical distances between counties.


TEMP Staging Tables (Write-Once Per Run)

Used during ingestion to stage values that are eventually copied into final fact tables. These are dropped & recreated each run.

  • TargetStateCountyMonthlyPrecip_TEMP
  • TargetStateCountyMonthlyTemp_TEMP
  • TargetStateCountySeasonalPrecip_TEMP
  • TargetStateCountySeasonalTemp_TEMP
  • TargetStateCountyYearlyPrecip_TEMP
  • TargetStateCountyYearlyTemp_TEMP

Why TEMP?

  • Keeps partial runs isolated
  • Allows “insert ignore / replace” semantics safely
  • Enables efficient bulk copy into final fact tables

Final Fact Tables (TargetState data)

Hold the canonical time-series after TEMP copy:

  • Monthly

    • monthly_precipitation_data_TargetState
      • (CountyID, Year, Month, Precipitation)
    • monthly_temperature_data_TargetState
      • (CountyID, Year, Month, Temperature)
  • Seasonal (meteorological: DJF=winter, MAM, JJA, SON)

    • seasonal_precipitation_data_TargetState
      • (CountyID, Year, Season, Precipitation)
    • seasonal_temperature_data_TargetState
      • (CountyID, Year, Season, Temperature)
  • Yearly

    • yearly_precipitation_data_TargetState
      • (CountyID, Year, Precipitation)
    • yearly_temperature_data_TargetState
      • (CountyID, Year, Temperature)

Season boundaries (meteorological):

  • winter: Dec (prev yr) + Jan + Feb
  • spring: Mar + Apr + May
  • summer: Jun + Jul + Aug
  • fall: Sep + Oct + Nov

Norms Tables (1991–2020)

Per-county mean/stddev for each time-scale (used to contextualize anomalies):

  • monthly_precipitation_norms
  • monthly_temperature_norms
  • seasonal_precipitation_norms
  • seasonal_temperature_norms
  • yearly_precipitation_norms
  • yearly_temperature_norms

Window: 1991–2020 inclusive, matching NOAA climate normals in code.
Values: mean and standard deviation by month/season/year.


Distance Tables (Euclidean)

Distances computed between a target county and analog counties for each time-scale.

  • Monthly

    • monthly_precipitation_distances
    • monthly_temperature_distances
    • monthly_combined_distances
    • TEMP variants during compute:
      • monthly_precipitation_distances_TEMP
      • monthly_temperature_distances_TEMP
  • Seasonal

    • seasonal_precipitation_distances
    • seasonal_temperature_distances
    • seasonal_combined_distances
    • TEMP variants:
      • seasonal_precipitation_distances_TEMP
      • seasonal_temperature_distances_TEMP
  • Yearly

    • yearly_precipitation_distances
    • yearly_temperature_distances
    • yearly_combined_distances
    • TEMP variants:
      • yearly_precipitation_distances_TEMP
      • yearly_temperature_distances_TEMP

Combined distances use both precip and temp (the exact formula is encapsulated in stored procedures used by the backend).


Stored Procedures

See full signatures and bodies in docs/PROCEDURES.md.
Below is a functional grouping that matches the code:

  • Inserts / Norms

    • InsertMonthlyPrecipitationTargetState, InsertMonthlyTemperatureTargetState
    • InsertSeasonalPrecipitationTargetState, InsertSeasonalTemperatureTargetState
    • InsertYearlyPrecipitationTargetState, InsertYearlyTemperatureTargetState
    • InsertMonthlyPrecipitationNorms, InsertMonthlyTemperatureNorms
    • InsertSeasonalPrecipitationNorms, InsertSeasonalTemperatureNorms
    • InsertYearlyPrecipitationNorms, InsertYearlyTemperatureNorms
  • Distance Calculations

    • CalculateMonthlyPrecipitationDistances
    • CalculateMonthlyTemperatureDistances
    • CalculateAllMonthlyCombinedDistances
    • CalculateSeasonalPrecipitationDistances
    • CalculateSeasonalTemperatureDistances
    • CalculateAllSeasonalCombinedDistances
    • CalculateYearlyPrecipitationDistances
    • CalculateYearlyTemperatureDistances
    • CalculateYearlyCombinedDistances
  • Lookup / Helpers

    • GetCountyIDByCodeAndState
    • InsertCounty, InsertState
  • Top Analogs (Query API)

    • By Year (all years / specific year)
      • GetAllTopPrecipAnalogsForCountyByYear
      • GetAllTopTempAnalogsForCountyByYear
      • GetAllTopCombinedAnalogsForCountyByYear
      • GetTopPrecipAnalogsForCountyByYear
      • GetTopTempAnalogsForCountyByYear
      • GetTopCombinedAnalogsForCountyByYear
    • By Season
      • GetAllTopPrecipAnalogsForCountyBySeason
      • GetAllTopTempAnalogsForCountyBySeason
      • GetAllTopCombinedAnalogsForCountyBySeason
      • GetPrecipAnalogsForCountyByYearAndSeason
      • GetTempAnalogsForCountyByYearAndSeason
      • GetCombinedAnalogsForCountyByYearAndSeason
    • By Month
      • GetAllTopPrecipAnalogsForCountyByMonth
      • GetAllTopTempAnalogsForCountyByMonth
      • GetAllTopCombinedAnalogsForCountyByMonth
      • GetPrecipAnalogsForCountyByYearAndMonth
      • GetTempAnalogsForCountyByYearAndMonth
      • GetCombinedAnalogsForCountyByYearAndMonth

Conventions

  • Months stored as zero-padded strings: '01''12'.
  • Seasons stored as lowercase strings: 'winter' | 'spring' | 'summer' | 'fall'.
  • Invalid NOAA values: -9.99 / -99.90 are ignored in aggregates.
  • TEMP tables are dropped and recreated during each ingest run.
  • Target State focus: Final fact tables are suffixed with _TargetState and store TargetState county time series.

Recreating Schema Locally

Use the dump (schema-only + routines) and restore:

Create dump (from remote):

mysqldump -u <user> -h <host> -P <port> -p \
  --no-data --routines --no-tablespaces \
  climate-change-app > docs/db-dump.sql

Restore into local DB (example):

# Create (if needed)
mysql -u root -p -e "CREATE DATABASE climate_change_app_local;"
# Import
mysql -u root -p climate_change_app_local < docs/db-dump.sql

If you share this repo, include docs/db-dump.sql so others can bootstrap easily.


Operational Notes

Idempotent Ingest: Ingestion uses TEMP tables + INSERT IGNORE/REPLACE to avoid duplicate final writes. Latest Month Guard: Code checks the latest inserted (Year, Month) to skip old data. Cron Updates: src/cron/syncData.js checks NOAA for newer files and hits /addallcountydata if needed. Backups: Regularly export schema (and optionally data) for disaster recovery. Performance: Distance computations can be heavy — they are batched by time-scale and use TEMP tables internally.


See Also

TABLES.md — detailed table columns with types & keys

PROCEDURES.md — stored procedure signatures & bodies