Summary
A user asked whether forecast_nwp can expose five additional fields for HRRR/GFS, "if they're available in the source GRIBs":
cloud_cover_pct
visibility_m
ceiling_m
pressure_pa_mslp
pressure_pa_surface
Empirical answer:
pressure_pa_mslp and pressure_pa_surface already ship for HRRR + GFS today (see Already implemented). If the user is seeing NaN, that is almost certainly the GFS ambiguity bug below — not a missing field.
visibility_m and ceiling_m are cleanly available in both HRRR and GFS GRIBs — each maps to a unique (variable, level) .idx record. Low-risk to add.
cloud_cover_pct is available but blocked on GFS by an existing record-disambiguation limitation: on GFS the (TCDC, entire atmosphere) key returns two .idx records (instantaneous + time-averaged), which trips the GribIntegrityError ambiguity guard in _extract_records. HRRR is clean (single record).
- ⚠️ Bonus finding — pre-existing latent bug: the already-shipped
precip_mm_1h field hits the same ambiguity guard on GFS at the default fxx (two identical APCP:surface records). This strongly implies a default forecast_nwp(station, "gfs") call raises GribIntegrityError today. See Related latent bug.
Net: the feature is feasible. The real work is deterministic record disambiguation when (variable, level) is not unique — which simultaneously unblocks GFS cloud_cover_pct and fixes the existing GFS precip defect.
How this was investigated (methodology)
All findings are from real .idx files fetched from the AWS BDP mirrors the SDK actually uses, decoded with the SDK's own parser (parse_idx + compute_byte_end + filter_records) and the production per-model VARIABLE_MAPs. Sample cycle: 2026-06-01 12Z.
- HRRR:
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20260601/conus/hrrr.t12z.wrfsfcf01.grib2.idx (170 records)
- GFS:
https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20260601/12/atmos/gfs.t12z.pgrb2.0p25.f001.idx (+ f000, f006 cross-checks)
Not verified locally: end-to-end cfgrib decode of the three new records — the [nwp] extra (cfgrib/xarray/sklearn) is not installed in the investigation environment. The .idx availability and the disambiguation logic are fully verified; the cfgrib short-name decode is the one remaining empirical gap (see Open questions).
Empirical availability — real .idx records
GRIB mapping is {canonical_column: (idx_variable, idx_level)}, matched exactly by filter_records. Record numbers are from the 2026-06-01 12Z cycle.
| Requested column |
(variable, level) |
HRRR sfcf01 |
GFS f001 (default fxx=1) |
Native units |
cloud_cover_pct |
(TCDC, entire atmosphere) |
✅ 1 rec (#116) |
⚠️ 2 recs (#636 1 hour fcst + #637 0-1 hour ave fcst) |
percent |
visibility_m |
(VIS, surface) |
✅ 1 rec (#5) |
✅ 1 rec (#10) |
metres |
ceiling_m |
(HGT, cloud ceiling) |
✅ 1 rec (#117) |
✅ 1 rec (#638) |
gpm ≈ m |
pressure_pa_surface |
(PRES, surface) |
✅ already mapped (#62) |
✅ already mapped (#561) |
Pa |
pressure_pa_mslp |
(MSLMA / PRMSL, mean sea level) |
✅ already mapped (#41) |
✅ already mapped (#1) |
Pa |
Replaying the production grouping + ambiguity guard against the real .idx (SDK's own code):
===== HRRR sfcf01 (model=hrrr) =====
cloud_cover_pct ('TCDC', 'entire atmosphere') -> 1 rec ['1 hour fcst']
visibility_m ('VIS', 'surface') -> 1 rec ['1 hour fcst']
ceiling_m ('HGT', 'cloud ceiling') -> 1 rec ['1 hour fcst']
===== GFS f001 (DEFAULT fxx=1) (model=gfs) =====
precip_mm_1h ('APCP', 'surface') -> 2 rec ['0-1 hour acc fcst','0-1 hour acc fcst'] <-- AMBIGUOUS -> GribIntegrityError
cloud_cover_pct ('TCDC', 'entire atmosphere') -> 2 rec ['1 hour fcst','0-1 hour ave fcst'] <-- AMBIGUOUS -> GribIntegrityError
visibility_m ('VIS', 'surface') -> 1 rec ['1 hour fcst']
ceiling_m ('HGT', 'cloud ceiling') -> 1 rec ['1 hour fcst']
===== GFS f000 (analysis) (model=gfs) =====
...all keys -> 1 rec (APCP absent at analysis hour) # masks the bug when fxx=0
HRRR (TCDC, entire atmosphere) is genuinely a single "total cloud cover" message; GFS publishes both an instantaneous and a window-averaged TCDC at the same level. (HGT, cloud ceiling) is distinct from HGT:cloud base / HGT:cloud top on both models, so exact-level matching already isolates the ceiling cleanly.
Already implemented (2 of 5)
pressure_pa_surface and pressure_pa_mslp are already in the schema and both model maps — no work needed:
Action: confirm with the user they're on a build that includes these, and that any NaN they observe on GFS isn't the precip/cloud ambiguity bug aborting the whole fetch.
⚠️ Related latent bug — GFS precip ambiguity (pre-existing)
precip_mm_1h → (APCP, surface) returns two records on a real GFS f001 .idx (#596 and #597, both 0-1 hour acc fcst — a well-known GFS quirk where APCP is emitted twice). filter_records keeps both (distinct record_no), then the _extract_records guard raises GribIntegrityError.
Because fxx defaults to 1 for GFS (forecast_nwp.py#L651), this is the default path. It's invisible today because:
- Live tests are skipped in CI (per the testing playbook), and
- The only "ambiguous" unit test (
test_forecast_nwp.py:417) exercises _cfgrib_variable_name, not the duplicate-.idx-record grouping path.
The cloud-cover disambiguation work below fixes this in the same stroke. Recommend confirming with one live GFS call and splitting into its own bug if the team prefers a separate fix track.
Proposed implementation
1. The core decision: disambiguate non-unique (variable, level) records
This is the only non-mechanical part. Three options:
- Option A (recommended): prefer the instantaneous record, then lowest
record_no. Among records sharing (variable, level), drop window-aggregated periods (forecast_period matching ave/acc/max/min window patterns) in favor of the plain N hour fcst / anl record; break remaining ties by lowest record_no. Fixes GFS cloud_cover_pct (picks #636 inst over #637 ave) and GFS precip_mm_1h (both windows identical → first by record_no = #596). Minimal map churn; keeps the loud-fail guard for genuinely unexpected duplicates.
- Option B: extend the map value to
(variable, level, forecast_period_matcher). Explicit and self-documenting, but verbose across 24 model modules and still needs a record_no tiebreak for APCP's identical twins.
- Option C: silently take the first
record_no on any collision. Simplest, but picks inst-vs-ave by luck of file order with no stated intent — least safe; rejected.
Whichever is chosen, preserve the existing GribIntegrityError for collisions that the rule cannot resolve, so true upstream layout changes still fail loud (per the RESEARCH "loud-fail" anti-pattern note).
2. Mechanical changes (per new field)
- Schema — add three nullable
float64 ColumnSpecs to NwpForecastSchema: cloud_cover_pct (units="percent"), visibility_m (units="m"), ceiling_m (units="m"). Additive + nullable → backward compatible within schema.forecast_nwp.v1. Regenerate exported JSON schema + EXPORT_MANIFEST and update test_schemas_codegen.py.
- Per-model maps — add the three entries to
hrrr.py and gfs.py:
cloud_cover_pct: ("TCDC", "entire atmosphere"), visibility_m: ("VIS", "surface"), ceiling_m: ("HGT", "cloud ceiling").
- cfgrib name table — add entries to
_GRIB_VAR_TO_CFGRIB_NAME (likely vis, tcc, and a gh/ceil for HGT-at-ceiling — confirm via decode). The single-message fallback at forecast_nwp.py#L1006 covers an unknown mapping, but explicit entries are preferred.
- Row assembly — add the three columns to
nullable_numeric_cols (forecast_nwp.py#L931) and to _empty_dataframe (forecast_nwp.py#L1035) so all-missing columns still round-trip as float64.
- QC rules — add predicates to
rules_nwp.py RULES_NWP_NCEP: cloud_cover_pct ∈ [0,100], visibility_m ≥ 0, ceiling_m ≥ 0. Mind the "no ceiling" encoding (below).
- Docs / CHANGELOG — update
docs/nwp-forecasts.md field table + CHANGELOG.
3. Tests (TDD — mandatory per CLAUDE.md)
- Unit: a
.idx fixture containing the real two-record GFS TCDC:entire atmosphere and APCP:surface cases; assert the disambiguation picks the instantaneous record and that forecast_nwp(..., "gfs") no longer raises.
- Unit: HRRR single-record path returns the value unchanged (no regression).
@pytest.mark.live: HRRR + GFS fetch of all three fields against a real cycle, asserting plausible physical ranges and confirming cfgrib decode.
Caveats to handle
ceiling_m "no ceiling" (clear sky). HGT:cloud ceiling carries a GRIB missing value when there is no ceiling. Confirm cfgrib yields NaN (preferred — and QC-safe) vs a sentinel like 20000/-5000 that would need masking. This is the most likely source of a surprising column.
visibility_m cap. Model VIS saturates (e.g. ~24135 m). Document; do not clip (native-units convention).
cloud_cover_pct semantics. TCDC:entire atmosphere is column-integrated total cloud — not the METAR/observation sky-cover the AWC/IEM adapters report. Document the difference so quants don't naively join them.
- NBM not verified. The sampled NBM
.idx path did not resolve this pass, and NBM is a statistical blend with a different inventory (its map already omits pressure_pa_surface). Confirm NBM availability separately before extending nbm.py; ECMWF/MSC/HAFS families are out of scope here.
Open questions / decisions
- Naming:
ceiling_m vs cloud_ceiling_m. The observation adapters already use cloud_ceiling_m (docs/adapters/iem.md:42). cloud_cover_pct and visibility_m already match canonical names in schema.forecast.station.v1 (Open-Meteo) and the observation adapters. Recommend cloud_ceiling_m for cross-source join consistency, unless we deliberately keep the user's requested ceiling_m. Needs a call.
- Disambiguation: Option A vs B (above).
- Scope of the GFS precip fix — bundle with this issue or split into its own bug?
- cfgrib short-names for the three records — confirm via one decode run before finalizing the name table.
Acceptance criteria
TS Parity (per CLAUDE.md dual-SDK rule)
This changes the public forecast_nwp column surface, so a TS-parity note is required:
- TS equivalent: identical public API — the TS SDK's NWP forecast result gains the same three columns with the same names/units. The GRIB byte-range +
.idx mechanics differ in implementation but the column contract is shared.
- Ship vs parity ticket: recommend a parity ticket tracked per
CROSS-SDK-SYNC.md rather than same-PR, since the disambiguation logic must be re-implemented and tested independently in TS.
- TS-specific constraints: none beyond existing NWP parity scope (no new deps; same schema id).
Investigated empirically against live AWS BDP .idx data (cycle 2026-06-01 12Z) using the SDK's own parser. The cfgrib end-to-end decode is the single unverified step (the [nwp] extra was not installed in the investigation environment).
Summary
A user asked whether
forecast_nwpcan expose five additional fields for HRRR/GFS, "if they're available in the source GRIBs":Empirical answer:
pressure_pa_mslpandpressure_pa_surfacealready ship for HRRR + GFS today (see Already implemented). If the user is seeingNaN, that is almost certainly the GFS ambiguity bug below — not a missing field.visibility_mandceiling_mare cleanly available in both HRRR and GFS GRIBs — each maps to a unique(variable, level).idxrecord. Low-risk to add.cloud_cover_pctis available but blocked on GFS by an existing record-disambiguation limitation: on GFS the(TCDC, entire atmosphere)key returns two.idxrecords (instantaneous + time-averaged), which trips theGribIntegrityErrorambiguity guard in_extract_records. HRRR is clean (single record).precip_mm_1hfield hits the same ambiguity guard on GFS at the defaultfxx(two identicalAPCP:surfacerecords). This strongly implies a defaultforecast_nwp(station, "gfs")call raisesGribIntegrityErrortoday. See Related latent bug.Net: the feature is feasible. The real work is deterministic record disambiguation when
(variable, level)is not unique — which simultaneously unblocks GFScloud_cover_pctand fixes the existing GFS precip defect.How this was investigated (methodology)
All findings are from real
.idxfiles fetched from the AWS BDP mirrors the SDK actually uses, decoded with the SDK's own parser (parse_idx+compute_byte_end+filter_records) and the production per-modelVARIABLE_MAPs. Sample cycle: 2026-06-01 12Z.https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20260601/conus/hrrr.t12z.wrfsfcf01.grib2.idx(170 records)https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20260601/12/atmos/gfs.t12z.pgrb2.0p25.f001.idx(+ f000, f006 cross-checks)Empirical availability — real
.idxrecordsGRIB mapping is
{canonical_column: (idx_variable, idx_level)}, matched exactly byfilter_records. Record numbers are from the 2026-06-01 12Z cycle.(variable, level)fxx=1)cloud_cover_pct(TCDC, entire atmosphere)1 hour fcst+ #6370-1 hour ave fcst)visibility_m(VIS, surface)ceiling_m(HGT, cloud ceiling)pressure_pa_surface(PRES, surface)pressure_pa_mslp(MSLMA / PRMSL, mean sea level)Replaying the production grouping + ambiguity guard against the real
.idx(SDK's own code):HRRR
(TCDC, entire atmosphere)is genuinely a single "total cloud cover" message; GFS publishes both an instantaneous and a window-averaged TCDC at the same level.(HGT, cloud ceiling)is distinct fromHGT:cloud base/HGT:cloud topon both models, so exact-level matching already isolates the ceiling cleanly.Already implemented (2 of 5)
pressure_pa_surfaceandpressure_pa_mslpare already in the schema and both model maps — no work needed:forecast_nwp.py#L166-L167hrrr.py#L30-L31—PRES:surface,MSLMA:mean sea levelgfs.py#L24-L25—PRES:surface,PRMSL:mean sea levelAction: confirm with the user they're on a build that includes these, and that any
NaNthey observe on GFS isn't the precip/cloud ambiguity bug aborting the whole fetch.precip_mm_1h → (APCP, surface)returns two records on a real GFSf001.idx(#596 and #597, both0-1 hour acc fcst— a well-known GFS quirk where APCP is emitted twice).filter_recordskeeps both (distinctrecord_no), then the_extract_recordsguard raisesGribIntegrityError.Because
fxxdefaults to1for GFS (forecast_nwp.py#L651), this is the default path. It's invisible today because:test_forecast_nwp.py:417) exercises_cfgrib_variable_name, not the duplicate-.idx-record grouping path.The cloud-cover disambiguation work below fixes this in the same stroke. Recommend confirming with one live GFS call and splitting into its own bug if the team prefers a separate fix track.
Proposed implementation
1. The core decision: disambiguate non-unique
(variable, level)recordsThis is the only non-mechanical part. Three options:
record_no. Among records sharing(variable, level), drop window-aggregated periods (forecast_periodmatchingave/acc/max/minwindow patterns) in favor of the plainN hour fcst/anlrecord; break remaining ties by lowestrecord_no. Fixes GFScloud_cover_pct(picks #636 inst over #637 ave) and GFSprecip_mm_1h(both windows identical → first byrecord_no= #596). Minimal map churn; keeps the loud-fail guard for genuinely unexpected duplicates.(variable, level, forecast_period_matcher). Explicit and self-documenting, but verbose across 24 model modules and still needs arecord_notiebreak for APCP's identical twins.record_noon any collision. Simplest, but picks inst-vs-ave by luck of file order with no stated intent — least safe; rejected.Whichever is chosen, preserve the existing
GribIntegrityErrorfor collisions that the rule cannot resolve, so true upstream layout changes still fail loud (per the RESEARCH "loud-fail" anti-pattern note).2. Mechanical changes (per new field)
float64ColumnSpecs toNwpForecastSchema:cloud_cover_pct(units="percent"),visibility_m(units="m"),ceiling_m(units="m"). Additive + nullable → backward compatible withinschema.forecast_nwp.v1. Regenerate exported JSON schema +EXPORT_MANIFESTand updatetest_schemas_codegen.py.hrrr.pyandgfs.py:cloud_cover_pct: ("TCDC", "entire atmosphere"),visibility_m: ("VIS", "surface"),ceiling_m: ("HGT", "cloud ceiling")._GRIB_VAR_TO_CFGRIB_NAME(likelyvis,tcc, and agh/ceilfor HGT-at-ceiling — confirm via decode). The single-message fallback atforecast_nwp.py#L1006covers an unknown mapping, but explicit entries are preferred.nullable_numeric_cols(forecast_nwp.py#L931) and to_empty_dataframe(forecast_nwp.py#L1035) so all-missing columns still round-trip asfloat64.rules_nwp.pyRULES_NWP_NCEP:cloud_cover_pct ∈ [0,100],visibility_m ≥ 0,ceiling_m ≥ 0. Mind the "no ceiling" encoding (below).docs/nwp-forecasts.mdfield table + CHANGELOG.3. Tests (TDD — mandatory per CLAUDE.md)
.idxfixture containing the real two-record GFSTCDC:entire atmosphereandAPCP:surfacecases; assert the disambiguation picks the instantaneous record and thatforecast_nwp(..., "gfs")no longer raises.@pytest.mark.live: HRRR + GFS fetch of all three fields against a real cycle, asserting plausible physical ranges and confirming cfgrib decode.Caveats to handle
ceiling_m"no ceiling" (clear sky).HGT:cloud ceilingcarries a GRIB missing value when there is no ceiling. Confirm cfgrib yieldsNaN(preferred — and QC-safe) vs a sentinel like20000/-5000that would need masking. This is the most likely source of a surprising column.visibility_mcap. Model VIS saturates (e.g. ~24135 m). Document; do not clip (native-units convention).cloud_cover_pctsemantics.TCDC:entire atmosphereis column-integrated total cloud — not the METAR/observation sky-cover the AWC/IEM adapters report. Document the difference so quants don't naively join them..idxpath did not resolve this pass, and NBM is a statistical blend with a different inventory (its map already omitspressure_pa_surface). Confirm NBM availability separately before extendingnbm.py; ECMWF/MSC/HAFS families are out of scope here.Open questions / decisions
ceiling_mvscloud_ceiling_m. The observation adapters already usecloud_ceiling_m(docs/adapters/iem.md:42).cloud_cover_pctandvisibility_malready match canonical names inschema.forecast.station.v1(Open-Meteo) and the observation adapters. Recommendcloud_ceiling_mfor cross-source join consistency, unless we deliberately keep the user's requestedceiling_m. Needs a call.Acceptance criteria
forecast_nwp(station, "hrrr")andforecast_nwp(station, "gfs")return non-nullcloud_cover_pct,visibility_m,ceiling_mfor an in-domain station at defaultfxx.GribIntegrityError(cloud and precip disambiguated).schema.forecast_nwp.v1; all-missing columns arefloat64.NaN/clean.TS Parity (per CLAUDE.md dual-SDK rule)
This changes the public
forecast_nwpcolumn surface, so a TS-parity note is required:.idxmechanics differ in implementation but the column contract is shared.CROSS-SDK-SYNC.mdrather than same-PR, since the disambiguation logic must be re-implemented and tested independently in TS.Investigated empirically against live AWS BDP
.idxdata (cycle 2026-06-01 12Z) using the SDK's own parser. The cfgrib end-to-end decode is the single unverified step (the[nwp]extra was not installed in the investigation environment).