Problem
The Open-Meteo forecast path (research(..., forecast_source="open_meteo") and the standalone fetch_open_meteo()) is shaped in a way that makes users hit Open-Meteo's free-tier rate limits quickly at backtesting scale. Three compounding issues, the first being dominant.
Open-Meteo's free tier is 600 calls/min, 5,000/hr, 10,000/day, and critically it bills by weighted call cost, not request count: a request counts as more than one call when it exceeds 10 variables or 14 days, and the two multiply — weight ≈ max(vars/10, 1) × max(days/14, 1) × locations (Open-Meteo pricing; their examples: 15 vars × 14 days = 1.5 calls, 15 vars × 28 days = 3.0 calls).
Every call we make requests 18 hourly variables (_OM_VARIABLES_TO_FETCH, _open_meteo.py:75) over the full from_date..to_date with no chunking (research.py:1404). So the weighted cost of a single research() forecast call is 1.8 × max(days/14, 1):
| Window | Weighted cost of 1 call | Calls until 600/min ceiling |
| --- | --- | --- |
| 7–14 days | 1.8 | ~330 |
| 30 days | ~3.9 | ~150 |
| 90 days | ~11.6 | ~50 |
| 1 year | ~47 | ~13 |
| 2 years | ~94 | ~6 |
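The table rows can be reproduced from the weight formula. This is a minimal sketch of the formula as quoted above, not Open-Meteo's actual accounting; `weighted_cost` and `FREE_TIER_PER_MINUTE` are hypothetical names, and windows are assumed to be counted in inclusive calendar days.

```python
def weighted_cost(n_vars: int, n_days: int, n_locations: int = 1) -> float:
    """Approximate weighted call cost of one request (formula quoted above)."""
    return max(n_vars / 10, 1.0) * max(n_days / 14, 1.0) * n_locations


FREE_TIER_PER_MINUTE = 600  # free-tier per-minute budget cited in this issue

# 18 variables is what the current fetch path requests on every call.
for label, days in [("14 days", 14), ("30 days", 30), ("90 days", 90),
                    ("1 year", 365), ("2 years", 730)]:
    cost = weighted_cost(18, days)
    print(f"{label:>8}: weight {cost:5.1f}, "
          f"~{FREE_TIER_PER_MINUTE / cost:.0f} calls before the 600/min ceiling")
```

Under the same formula, a request with 10 or fewer variables and 14 or fewer days costs exactly 1.0.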
1. The forecast cache exists but is never wired in — every run re-fetches (dominant amplifier)
read_forecast_cache / write_forecast_cache / forecast_cache_path (cache.py:542, :571) were built in Phase 20 (OM-06) but are referenced only in test_cache_forecasts.py — there is no production caller. _fetch_open_meteo_range (research.py:1384) calls the network directly with no cache read/write.
Previous-runs / single-runs / seamless data is immutable (historical forecast cycles never change), yet a quant iterating on a model re-fetches identical data on every run. This turns one legitimate fetch into 5–50.
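The read-through pattern this implies could look roughly like the sketch below. It does not use the real `read_forecast_cache` / `write_forecast_cache` signatures, which are not shown in this issue; `cached_fetch`, the JSON file layout, and the cache key are all hypothetical stand-ins to illustrate the control flow.

```python
import json
from pathlib import Path
from typing import Callable

# Modes whose data never changes once published (per this issue); "live" is excluded.
IMMUTABLE_MODES = {"previous_runs", "single_run", "seamless"}


def cached_fetch(cache_dir: Path, station: str, model: str, mode: str,
                 from_date: str, to_date: str,
                 fetch: Callable[[], list]) -> list:
    """Return cached rows when available; otherwise fetch once and store them."""
    if mode not in IMMUTABLE_MODES:
        # Rolling data (e.g. "live") must always hit the network.
        return fetch()

    # Hypothetical cache key: one JSON file per (station, model, mode, window).
    path = cache_dir / f"{station}_{model}_{mode}_{from_date}_{to_date}.json"
    if path.exists():
        return json.loads(path.read_text())

    rows = fetch()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return rows
```

With a wrapper of this shape around the network call, a second backtest run over the same stations and window would cost zero weighted calls.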
2. The politeness throttle counts requests, not weight — false safety
_OM_POLITE_DELAY_S = 0.2 (_open_meteo.py:63) caps a single worker at ~300 req/min, nominally under 600. But because each request is weighted, a 1-year window weighs ~47, so the 600/min budget is exhausted ~13 stations into a loop — roughly 2.6 seconds in — and the 0.2s sleep does nothing to prevent it. The delay also lives inside fetch_open_meteo, so a user threading their own station loop loses even count-based bounding.
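A weight-aware limiter is one way to replace the fixed sleep. The sketch below is an assumption about how this could be done, not existing SDK code: `WeightedRateLimiter` is a hypothetical class that tracks weighted cost in a sliding 60-second window and blocks when the next request would exceed the budget. The clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from collections import deque


class WeightedRateLimiter:
    """Sliding-window limiter that budgets weighted calls, not request count."""

    def __init__(self, budget: float = 600.0, window_s: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.budget = budget
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._spent = deque()  # (timestamp, weight) pairs, oldest first

    def acquire(self, weight: float) -> None:
        """Block until `weight` fits in the current window, then record it."""
        if weight > self.budget:
            raise ValueError("single request exceeds the whole window budget")
        while True:
            now = self._clock()
            # Drop entries that have aged out of the window.
            while self._spent and now - self._spent[0][0] >= self.window_s:
                self._spent.popleft()
            used = sum(w for _, w in self._spent)
            if used + weight <= self.budget:
                self._spent.append((now, weight))
                return
            # Over budget: wait until the oldest entry leaves the window.
            self._sleep(self.window_s - (now - self._spent[0][0]))
```

If the limiter were shared across threads it would need a lock around `acquire`; that is omitted here. A single shared instance would also cover the case where users run their own station loop, since the bound no longer lives inside one fetch call.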
3. No client-side chunking + 18-variable over-fetch
The fetcher's own docstring warns "14-day Open-Meteo per-call cap; longer windows must chunk client-side" (_open_meteo.py:32), but the caller chunks nothing — long windows become single unbounded-weight calls. And the research() pairs join only consumes temp / precip-probability / precip, yet we always request 18 variables — paying ~1.8× weight on data that is then discarded.
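Client-side chunking is straightforward to sketch. `chunk_window` below is a hypothetical helper, not an existing SDK function; it assumes both window bounds are inclusive calendar dates.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def chunk_window(start: date, end: date,
                 max_days: int = 14) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs of at most `max_days` days."""
    if end < start:
        raise ValueError("end must not be before start")
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=max_days - 1), end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)


# A 1-year window becomes 27 requests, each within the 14-day cap.
for chunk_start, chunk_end in chunk_window(date(2025, 1, 1), date(2025, 12, 31)):
    print(chunk_start, chunk_end)
```

Chunking alone does not lower the total weight (27 chunks × 1.8 ≈ 48.6 for a year, versus ~47 unchunked); what it does is bound the per-call weight so pacing becomes predictable. Combined with trimming the request to 10 or fewer variables, each chunk would cost 1.0 under the formula above.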
Secondary: 429 backoff is shallow/linear (max(Retry-After, 0.2×(attempt+1)), 3 retries → ~1.2s total absent a Retry-After header; _open_meteo.py:553). It honors Retry-After (good) but gives up fast otherwise. No apikey / base-URL plumbing, so a user who needs headroom can't move to a paid tier — and the free tier is non-commercial-only, a ToS flag for Kalshi traders.
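For comparison, an exponential schedule that still honors `Retry-After` might look like the sketch below. `retry_delay` and `call_with_backoff` are hypothetical names, and `send` is an assumed callable returning `(status, retry_after_seconds_or_None, body)`; none of this reflects the current `_open_meteo.py` code.

```python
import time
from typing import Callable, Optional, Tuple


def retry_delay(attempt: int, retry_after: Optional[float],
                base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Exponential delay (1, 2, 4, 8, ... s), never shorter than Retry-After."""
    exponential = min(base_s * 2 ** attempt, cap_s)
    return max(retry_after or 0.0, exponential)


def call_with_backoff(send: Callable[[], Tuple[int, Optional[float], object]],
                      max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep):
    """Call `send` until it stops returning 429 or retries are exhausted."""
    for attempt in range(max_retries + 1):
        status, retry_after, body = send()
        if status != 429:
            return body
        if attempt < max_retries:
            sleep(retry_delay(attempt, retry_after))
    raise RuntimeError("still rate limited after retries")
```

With four retries and no `Retry-After` header this waits 1 + 2 + 4 + 8 = 15s in total, versus ~1.2s today. Adding random jitter to each delay would be a reasonable extension when many workers retry at once.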
Worst case
Backtesting 1 model × 60 US stations over a 1-year window = 60 × 47 ≈ 2,800 weighted calls per run. With no caching, a second run within the same hour already exceeds the 5,000/hr limit, and the fourth run of the day exceeds the 10,000/day budget. In a fast loop the 600/min ceiling trips after ~13 stations (~2.6s), so the 0.2s politeness delay never gets a chance to help: the per-minute lockout, not the daily budget, is what users will hit first.
Reproduction (conceptual)
    import mostlyright as mr

    # 60-station, 1-year backtest, single model
    stations = [...]  # 60 ICAO codes
    for s in stations:
        df = mr.research(s, "2025-01-01", "2025-12-31",
                         forecast_source="open_meteo", forecast_model="gfs_global")
    # 429s from *-api.open-meteo.com begin ~13 stations in;
    # re-running the loop re-fetches everything (no forecast cache).
Suggested fix (by impact)
- Wire the forecast cache into _fetch_open_meteo_range: highest impact, lowest risk; the read/write/path functions already exist and are tested. Cache previous_runs / single_run / seamless; never cache live (rolling cycle, already flagged in write_forecast_cache).
- Throttle by weight, not request count: estimate max(vars/10,1) × max(days/14,1) per call and pace against the 600/min budget, and/or chunk windows to ≤14 days so per-call weight stays ~1.8.
- Trim variables on the research() path to what the pairs join uses (~3); keep the full 18 only for the standalone fetch_open_meteo() DataFrame API, ideally behind a variables= param.
- Deeper 429 backoff (exponential 1→2→4→8s) + optional apikey / base-URL override for paid/commercial tiers.
Parity note (TS twin)
Only the Python side was audited. The TS twin (weather-ts) almost certainly mirrors items 1–3 and should get a parity ticket or be fixed in the same phase (cf. the IEM-MOS perf parity in #57/#58).
Related
Filed from a source audit of the Open-Meteo fetch path (mostlyright-sdk @ v1.5.2, 16d62de).