fix(pairs): discriminate OM/IEM forecasts by source, not issued_at (#67) by helloiamvu · Pull Request #69 · mostlyrightmd/mostlyright-sdk

helloiamvu · 2026-06-06T13:02:15Z

Summary

Fixes #67. build_pairs_row() split forecast records by issued_at presence, but Phase 20+ Open-Meteo rows carry a derived issued_at, so they were misrouted into the IEM MOS aggregation path — silently nulling forecast temps and polluting IEM run-selection when both sources are combined.

Discriminate by the authoritative source field instead (open_meteo* → Open-Meteo, else IEM), matching the contract research._fetch_open_meteo_range already documents ("discriminates via row.get('source')").

Changes

_is_open_meteo_record(): source prefixed open_meteo OR legacy no-source/no-issued_at shape (real IEM always carries issued_at, so a record missing both can only be legacy OM — keeps backward-compat).
OM aggregation falls back to a pre-converted temperature_f (the shape the research wrapper emits) so source-routed rows don't aggregate to null.
OM branch now preserves pop_6hr_pct / qpf_6hr_in and fcst_issued_at provenance that previously came via the IEM branch.
Leakage guard: OM rows from a run issued after market close are excluded from temp/POP/QPF aggregation (not just the timestamp) — mirrors _select_best_run's cutoff. The SDK's temporal-safety invariant.

Tests (TDD)

8 new regression tests in test_pairs.py covering: derived-issued_at classification, IEM-still-preferred, research-wrapper temperature_f shape, pop/qpf survival, zero-POP, legacy source-less shape, issued_at provenance, and after-close leakage exclusion. Parity gate (tests/test_parity.py) unchanged.

Review

Codex (gpt-5.5, medium) — 4 iterations: 1 P1 (leakage) + 3 P2 (precip regression, legacy shape, provenance) all fixed; final review clean.

TS Parity

Python-internal forecast-join discriminator fix; no public API surface change (same research() columns). The TS twin's buildPairsRow equivalent should get a parity ticket per CROSS-SDK-SYNC.md to apply the same source-based split — tracked separately.

Phase 20+ Open-Meteo rows carry a derived issued_at, so the old issued_at-presence split in build_pairs_row() misrouted them into the IEM MOS aggregation path — silently nulling forecast temps and polluting IEM run-selection when both sources are combined. Discriminate by the authoritative source field instead (open_meteo* -> Open-Meteo, else IEM), matching the contract research._fetch_open_meteo_range already documents. Also teach _aggregate_fcst_temps_openmeteo to fall back to a pre-converted temperature_f so source-discriminated rows from the research() wrapper (which emits temperature_f, not temperature_c) aggregate correctly instead of returning null. TDD: 3 new regression tests in test_pairs.py cover the derived-issued_at classification, IEM-still-preferred, and the research-wrapper temperature_f shape.

… split (#67)

Codex review caught a regression: research._fetch_open_meteo_range emits OM rows carrying pop_6hr_pct / qpf_6hr_in (not precipitation_probability_pct and no QPF read). Pre-#67 those rows flowed through the IEM branch, which set both fields; after source-routing they hit the OM branch, which only read precipitation_probability_pct and never set QPF — silently nulling precip columns for research(forecast_source="open_meteo"). OM branch now accepts the pop_6hr_pct alias (explicit None-checks keep a valid 0.0) and sums qpf_6hr_in over the window, matching IEM semantics. +2 regression tests.

…odex P2) Codex flagged that the pure source-prefix split regressed the previously documented Open-Meteo shape (no source, no issued_at, temperature_c) to the IEM branch -> null temps. Extract _is_open_meteo_record(): source prefixed open_meteo OR (no source AND no issued_at). Real IEM rows always carry an issued_at, so a record missing both can only be legacy OM. +1 regression test.

… codex P2) Codex flagged leakage-safety provenance loss: Phase 20+ OM rows carry a derived issued_at; pre-#67 the IEM branch set fcst_issued via _select_best_run, but after source routing the OM branch left fcst_issued_at None for forecast_source="open_meteo". Restore it as the most-recent OM issued_at at-or-before market close — never leaking a run issued after settlement. +2 regression tests.

…P1 leakage) Codex P1: my provenance fix filtered issued_at only for the displayed timestamp; the OM aggregation still summed window rows issued AFTER market close, leaking their temp/POP/QPF into the training pair (lookahead). Pre-#67 _select_best_run applied this cutoff; the OM branch did not. Now filter window_om by issued_at <= market_close before aggregating temps/POP/QPF/ issued_at. Legacy source-less rows (no issued_at) are kept — nothing to leak. Strengthened test asserts the after-close temp is excluded, not just hidden.

github-actions · 2026-06-06T13:02:33Z

Parity ticket gate: PASSED

parity-ticket-check: PR does not touch parity-trigger surface; gate skipped.

See CROSS-SDK-SYNC.md §2 for the workflow.

…h() docs (#52) Bump all three packages 1.5.2 -> 1.5.3. - #67 (#69): build_pairs_row() now discriminates OM/IEM forecasts by source instead of issued_at presence, fixing silent null temps + IEM run-selection pollution for Phase 20+ multi-source callers; closes a lookahead-leakage path for after-close OM runs. Parity gate unaffected. - #52 (#70): research() docstring clarifies daily-row return granularity.

helloiamvu added 6 commits June 6, 2026 14:37

docs(pairs): update module docstring to reflect source-based forecast…

9ca5b48

… split (#67)

helloiamvu requested a review from Tarabcak June 6, 2026 13:02

helloiamvu merged commit 237c437 into merged-vision Jun 6, 2026
17 checks passed

This was referenced Jun 6, 2026

release(v1.5.3): Open-Meteo forecast-join correctness (#67) + research() docs (#52) #71

Merged

bug: build_pairs_row() misclassifies OM records as IEM when derived issued_at is present #67

Closed

helloiamvu deleted the fix/issue-67-om-source-discriminator branch June 18, 2026 07:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pairs): discriminate OM/IEM forecasts by source, not issued_at (#67)#69

fix(pairs): discriminate OM/IEM forecasts by source, not issued_at (#67)#69
helloiamvu merged 6 commits into
merged-visionfrom
fix/issue-67-om-source-discriminator

helloiamvu commented Jun 6, 2026

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

helloiamvu commented Jun 6, 2026

Summary

Changes

Tests (TDD)

Review

TS Parity

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant