Skip to content

feat(weather): expose cloud_cover_pct / visibility_m / ceiling_m in forecast_nwp (HRRR/GFS); pressure_pa_* already ship #63

@helloiamvu

Description

@helloiamvu

Summary

A user asked whether forecast_nwp can expose five additional fields for HRRR/GFS, "if they're available in the source GRIBs":

cloud_cover_pct
visibility_m
ceiling_m
pressure_pa_mslp
pressure_pa_surface

Empirical answer:

  • pressure_pa_mslp and pressure_pa_surface already ship for HRRR + GFS today (see Already implemented). If the user is seeing NaN, that is almost certainly the GFS ambiguity bug below — not a missing field.
  • visibility_m and ceiling_m are cleanly available in both HRRR and GFS GRIBs — each maps to a unique (variable, level) .idx record. Low-risk to add.
  • cloud_cover_pct is available but blocked on GFS by an existing record-disambiguation limitation: on GFS the (TCDC, entire atmosphere) key returns two .idx records (instantaneous + time-averaged), which trips the GribIntegrityError ambiguity guard in _extract_records. HRRR is clean (single record).
  • ⚠️ Bonus finding — pre-existing latent bug: the already-shipped precip_mm_1h field hits the same ambiguity guard on GFS at the default fxx (two identical APCP:surface records). This strongly implies a default forecast_nwp(station, "gfs") call raises GribIntegrityError today. See Related latent bug.

Net: the feature is feasible. The real work is deterministic record disambiguation when (variable, level) is not unique — which simultaneously unblocks GFS cloud_cover_pct and fixes the existing GFS precip defect.


How this was investigated (methodology)

All findings are from real .idx files fetched from the AWS BDP mirrors the SDK actually uses, decoded with the SDK's own parser (parse_idx + compute_byte_end + filter_records) and the production per-model VARIABLE_MAPs. Sample cycle: 2026-06-01 12Z.

  • HRRR: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20260601/conus/hrrr.t12z.wrfsfcf01.grib2.idx (170 records)
  • GFS: https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20260601/12/atmos/gfs.t12z.pgrb2.0p25.f001.idx (+ f000, f006 cross-checks)

Not verified locally: end-to-end cfgrib decode of the three new records — the [nwp] extra (cfgrib/xarray/sklearn) is not installed in the investigation environment. The .idx availability and the disambiguation logic are fully verified; the cfgrib short-name decode is the one remaining empirical gap (see Open questions).


Empirical availability — real .idx records

GRIB mapping is {canonical_column: (idx_variable, idx_level)}, matched exactly by filter_records. Record numbers are from the 2026-06-01 12Z cycle.

Requested column (variable, level) HRRR sfcf01 GFS f001 (default fxx=1) Native units
cloud_cover_pct (TCDC, entire atmosphere) ✅ 1 rec (#116) ⚠️ 2 recs (#636 1 hour fcst + #637 0-1 hour ave fcst) percent
visibility_m (VIS, surface) ✅ 1 rec (#5) ✅ 1 rec (#10) metres
ceiling_m (HGT, cloud ceiling) ✅ 1 rec (#117) ✅ 1 rec (#638) gpm ≈ m
pressure_pa_surface (PRES, surface) ✅ already mapped (#62) ✅ already mapped (#561) Pa
pressure_pa_mslp (MSLMA / PRMSL, mean sea level) ✅ already mapped (#41) ✅ already mapped (#1) Pa

Replaying the production grouping + ambiguity guard against the real .idx (SDK's own code):

===== HRRR sfcf01 (model=hrrr) =====
  cloud_cover_pct   ('TCDC', 'entire atmosphere')  -> 1 rec  ['1 hour fcst']
  visibility_m      ('VIS', 'surface')             -> 1 rec  ['1 hour fcst']
  ceiling_m         ('HGT', 'cloud ceiling')       -> 1 rec  ['1 hour fcst']

===== GFS f001 (DEFAULT fxx=1) (model=gfs) =====
  precip_mm_1h      ('APCP', 'surface')            -> 2 rec  ['0-1 hour acc fcst','0-1 hour acc fcst']  <-- AMBIGUOUS -> GribIntegrityError
  cloud_cover_pct   ('TCDC', 'entire atmosphere')  -> 2 rec  ['1 hour fcst','0-1 hour ave fcst']        <-- AMBIGUOUS -> GribIntegrityError
  visibility_m      ('VIS', 'surface')             -> 1 rec  ['1 hour fcst']
  ceiling_m         ('HGT', 'cloud ceiling')       -> 1 rec  ['1 hour fcst']

===== GFS f000 (analysis) (model=gfs) =====
  ...all keys -> 1 rec (APCP absent at analysis hour)   # masks the bug when fxx=0

HRRR (TCDC, entire atmosphere) is genuinely a single "total cloud cover" message; GFS publishes both an instantaneous and a window-averaged TCDC at the same level. (HGT, cloud ceiling) is distinct from HGT:cloud base / HGT:cloud top on both models, so exact-level matching already isolates the ceiling cleanly.

Already implemented (2 of 5)

pressure_pa_surface and pressure_pa_mslp are already in the schema and both model maps — no work needed:

Action: confirm with the user they're on a build that includes these, and that any NaN they observe on GFS isn't the precip/cloud ambiguity bug aborting the whole fetch.

⚠️ Related latent bug — GFS precip ambiguity (pre-existing)

precip_mm_1h → (APCP, surface) returns two records on a real GFS f001 .idx (#596 and #597, both 0-1 hour acc fcst — a well-known GFS quirk where APCP is emitted twice). filter_records keeps both (distinct record_no), then the _extract_records guard raises GribIntegrityError.

Because fxx defaults to 1 for GFS (forecast_nwp.py#L651), this is the default path. It's invisible today because:

  • Live tests are skipped in CI (per the testing playbook), and
  • The only "ambiguous" unit test (test_forecast_nwp.py:417) exercises _cfgrib_variable_name, not the duplicate-.idx-record grouping path.

The cloud-cover disambiguation work below fixes this in the same stroke. Recommend confirming with one live GFS call and splitting into its own bug if the team prefers a separate fix track.


Proposed implementation

1. The core decision: disambiguate non-unique (variable, level) records

This is the only non-mechanical part. Three options:

  • Option A (recommended): prefer the instantaneous record, then lowest record_no. Among records sharing (variable, level), drop window-aggregated periods (forecast_period matching ave/acc/max/min window patterns) in favor of the plain N hour fcst / anl record; break remaining ties by lowest record_no. Fixes GFS cloud_cover_pct (picks #636 inst over #637 ave) and GFS precip_mm_1h (both windows identical → first by record_no = #596). Minimal map churn; keeps the loud-fail guard for genuinely unexpected duplicates.
  • Option B: extend the map value to (variable, level, forecast_period_matcher). Explicit and self-documenting, but verbose across 24 model modules and still needs a record_no tiebreak for APCP's identical twins.
  • Option C: silently take the first record_no on any collision. Simplest, but picks inst-vs-ave by luck of file order with no stated intent — least safe; rejected.

Whichever is chosen, preserve the existing GribIntegrityError for collisions that the rule cannot resolve, so true upstream layout changes still fail loud (per the RESEARCH "loud-fail" anti-pattern note).

2. Mechanical changes (per new field)

  1. Schema — add three nullable float64 ColumnSpecs to NwpForecastSchema: cloud_cover_pct (units="percent"), visibility_m (units="m"), ceiling_m (units="m"). Additive + nullable → backward compatible within schema.forecast_nwp.v1. Regenerate exported JSON schema + EXPORT_MANIFEST and update test_schemas_codegen.py.
  2. Per-model maps — add the three entries to hrrr.py and gfs.py:
    cloud_cover_pct: ("TCDC", "entire atmosphere"), visibility_m: ("VIS", "surface"), ceiling_m: ("HGT", "cloud ceiling").
  3. cfgrib name table — add entries to _GRIB_VAR_TO_CFGRIB_NAME (likely vis, tcc, and a gh/ceil for HGT-at-ceiling — confirm via decode). The single-message fallback at forecast_nwp.py#L1006 covers an unknown mapping, but explicit entries are preferred.
  4. Row assembly — add the three columns to nullable_numeric_cols (forecast_nwp.py#L931) and to _empty_dataframe (forecast_nwp.py#L1035) so all-missing columns still round-trip as float64.
  5. QC rules — add predicates to rules_nwp.py RULES_NWP_NCEP: cloud_cover_pct ∈ [0,100], visibility_m ≥ 0, ceiling_m ≥ 0. Mind the "no ceiling" encoding (below).
  6. Docs / CHANGELOG — update docs/nwp-forecasts.md field table + CHANGELOG.

3. Tests (TDD — mandatory per CLAUDE.md)

  • Unit: a .idx fixture containing the real two-record GFS TCDC:entire atmosphere and APCP:surface cases; assert the disambiguation picks the instantaneous record and that forecast_nwp(..., "gfs") no longer raises.
  • Unit: HRRR single-record path returns the value unchanged (no regression).
  • @pytest.mark.live: HRRR + GFS fetch of all three fields against a real cycle, asserting plausible physical ranges and confirming cfgrib decode.

Caveats to handle

  • ceiling_m "no ceiling" (clear sky). HGT:cloud ceiling carries a GRIB missing value when there is no ceiling. Confirm cfgrib yields NaN (preferred — and QC-safe) vs a sentinel like 20000/-5000 that would need masking. This is the most likely source of a surprising column.
  • visibility_m cap. Model VIS saturates (e.g. ~24135 m). Document; do not clip (native-units convention).
  • cloud_cover_pct semantics. TCDC:entire atmosphere is column-integrated total cloud — not the METAR/observation sky-cover the AWC/IEM adapters report. Document the difference so quants don't naively join them.
  • NBM not verified. The sampled NBM .idx path did not resolve this pass, and NBM is a statistical blend with a different inventory (its map already omits pressure_pa_surface). Confirm NBM availability separately before extending nbm.py; ECMWF/MSC/HAFS families are out of scope here.

Open questions / decisions

  1. Naming: ceiling_m vs cloud_ceiling_m. The observation adapters already use cloud_ceiling_m (docs/adapters/iem.md:42). cloud_cover_pct and visibility_m already match canonical names in schema.forecast.station.v1 (Open-Meteo) and the observation adapters. Recommend cloud_ceiling_m for cross-source join consistency, unless we deliberately keep the user's requested ceiling_m. Needs a call.
  2. Disambiguation: Option A vs B (above).
  3. Scope of the GFS precip fix — bundle with this issue or split into its own bug?
  4. cfgrib short-names for the three records — confirm via one decode run before finalizing the name table.

Acceptance criteria

  • forecast_nwp(station, "hrrr") and forecast_nwp(station, "gfs") return non-null cloud_cover_pct, visibility_m, ceiling_m for an in-domain station at default fxx.
  • GFS default call no longer raises GribIntegrityError (cloud and precip disambiguated).
  • New columns validate against schema.forecast_nwp.v1; all-missing columns are float64.
  • QC predicates cover the three fields; "no ceiling" maps to NaN/clean.
  • Unit + live tests added; docs + CHANGELOG updated.

TS Parity (per CLAUDE.md dual-SDK rule)

This changes the public forecast_nwp column surface, so a TS-parity note is required:

  • TS equivalent: identical public API — the TS SDK's NWP forecast result gains the same three columns with the same names/units. The GRIB byte-range + .idx mechanics differ in implementation but the column contract is shared.
  • Ship vs parity ticket: recommend a parity ticket tracked per CROSS-SDK-SYNC.md rather than same-PR, since the disambiguation logic must be re-implemented and tested independently in TS.
  • TS-specific constraints: none beyond existing NWP parity scope (no new deps; same schema id).

Investigated empirically against live AWS BDP .idx data (cycle 2026-06-01 12Z) using the SDK's own parser. The cfgrib end-to-end decode is the single unverified step (the [nwp] extra was not installed in the investigation environment).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions