Problem
build_pairs_row() in packages/core/src/mostlyright/_internal/_pairs.py incorrectly classifies Open-Meteo (OM) forecast records as IEM MOS records when the OM records carry a derived issued_at field.
The current code splits the forecast list using issued_at presence as the sole discriminator:
# Line ~264-265
iem_records = [r for r in forecasts if r.get("issued_at")]
om_records = [r for r in forecasts if not r.get("issued_at")]
This logic was correct when Open-Meteo records truly had no issued_at, but Phase 20+ OM records now carry a derived issued_at (e.g. to support cycle-math in the Open-Meteo fetcher). As a result, any OM record with a populated issued_at is routed into iem_records, passed to _select_best_run() and _aggregate_fcst_temps_iem(), and treated as IEM MOS data.
Reproduction
- Fetch forecasts for a station where both IEM MOS and Open-Meteo data are available (e.g. forecast_sources=["iem_mos", "open_meteo"]).
- Ensure the Open-Meteo records include a derived issued_at (this is the default behavior in Phase 20+ for training mode / previous-runs caching).
- Call build_pairs_row() with the combined forecast list.
- Observe that OM records with issued_at are missing from om_records and incorrectly appear in iem_records.
Minimal conceptual trigger:
forecasts = [
{"source": "iem.archive", "model": "NBS", "issued_at": "2026-06-04T12:00:00Z", "valid_at": "...", "temperature_f": 72},
{"source": "open_meteo.previous_runs", "model": "ecmwf_ifs04", "issued_at": "2026-06-04T06:00:00Z", "valid_at": "...", "temperature_c": 22},
]
# iem_records = both rows; om_records = []
# The OM row gets fed to _aggregate_fcst_temps_iem() which looks for temperature_f (None)
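The trigger can also be run in isolation. The following is a minimal sketch that copies only the two list comprehensions from the issue; it does not import mostlyright or call build_pairs_row(), and the misrouted variable is a hypothetical name added for illustration.

```python
# Standalone demo of the issued_at-based split; no mostlyright import needed.
forecasts = [
    {"source": "iem.archive", "model": "NBS",
     "issued_at": "2026-06-04T12:00:00Z", "valid_at": "...", "temperature_f": 72},
    {"source": "open_meteo.previous_runs", "model": "ecmwf_ifs04",
     "issued_at": "2026-06-04T06:00:00Z", "valid_at": "...", "temperature_c": 22},
]

# Same comprehensions as the current split quoted in this issue.
iem_records = [r for r in forecasts if r.get("issued_at")]
om_records = [r for r in forecasts if not r.get("issued_at")]

# Hypothetical helper list: OM rows that ended up in the IEM bucket.
misrouted = [r for r in iem_records if r["source"].startswith("open_meteo")]

print(len(iem_records), len(om_records), len(misrouted))
# An IEM-style temperature_f lookup on the misrouted OM row finds nothing.
print(misrouted[0].get("temperature_f"))
```

Both rows land in iem_records, om_records is empty, and the OM row has no temperature_f to aggregate.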
Root Cause
The discriminator assumes:
issued_at present → IEM MOS
issued_at absent → Open-Meteo
This assumption is violated by Phase 20+ Open-Meteo records, which have source values such as open_meteo.previous_runs, open_meteo.single_run, open_meteo.live, or open_meteo.seamless, and carry a derived issued_at.
The correct discriminator is the source field, which is authoritative per the schema definitions in packages/core/src/mostlyright/core/schemas/forecast.py and packages/core/src/mostlyright/_internal/specs/forecast_series.json.
Suggested Fix
Replace the issued_at-based split with a source-based split:
# Before
iem_records = [r for r in forecasts if r.get("issued_at")]
om_records = [r for r in forecasts if not r.get("issued_at")]
# After
iem_records = [r for r in forecasts if not r.get("source", "").startswith("open_meteo")]
om_records = [r for r in forecasts if r.get("source", "").startswith("open_meteo")]
This aligns with the schema contract where IEM records have source="iem.archive" and OM records have source values prefixed with open_meteo.
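A slightly more defensive variant of the source-based split is sketched below. It is an illustration only: split_by_source and OM_PREFIX are hypothetical names, not existing code in _pairs.py. The sketch assumes that a record whose source is missing or None should fall into the IEM bucket, which matches the behavior of the suggested comprehensions for a missing key but avoids an AttributeError when the key is present with a None value.

```python
from typing import Any

# Prefix taken from the source values listed in this issue; the constant name is hypothetical.
OM_PREFIX = "open_meteo"


def split_by_source(
    forecasts: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition forecast records into (iem_records, om_records) by source prefix."""
    iem_records: list[dict[str, Any]] = []
    om_records: list[dict[str, Any]] = []
    for record in forecasts:
        # `or ""` covers both a missing key and an explicit None value.
        source = record.get("source") or ""
        if source.startswith(OM_PREFIX):
            om_records.append(record)
        else:
            iem_records.append(record)
    return iem_records, om_records


sample = [
    {"source": "iem.archive", "issued_at": "2026-06-04T12:00:00Z", "temperature_f": 72},
    {"source": "open_meteo.previous_runs", "issued_at": "2026-06-04T06:00:00Z", "temperature_c": 22},
    {"source": None, "issued_at": "2026-06-04T00:00:00Z"},
]
iem, om = split_by_source(sample)
print([r["source"] for r in iem])
print([r["source"] for r in om])
```

A single pass keeps the two buckets mutually exclusive by construction, so a record can never be counted in both or dropped from both.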
Impact
- Data misclassification: Open-Meteo forecasts with derived issued_at are processed through IEM MOS aggregation paths, causing temperature_f lookups to fail (OM stores temperature_c), resulting in silently null forecast temperatures.
- Training data quality: If IEM MOS is the preferred source, the fallback to OM is skipped entirely when OM records are misclassified as IEM, potentially yielding null forecasts for dates where valid OM data exists.
- Affects Phase 20+ workflows that combine or cache both forecast sources.
Files Affected
packages/core/src/mostlyright/_internal/_pairs.py — build_pairs_row() function
Severity
Medium-High — silently corrupts forecast data for multi-source callers in Phase 20+.