feat(weather): expose cloud_cover_pct / visibility_m / ceiling_m in forecast_nwp (HRRR/GFS); pressure_pa_* already ship

## Summary

A user asked whether `forecast_nwp` can expose five additional fields for **HRRR/GFS**, "if they're available in the source GRIBs":

```
cloud_cover_pct
visibility_m
ceiling_m
pressure_pa_mslp
pressure_pa_surface
```

**Empirical answer:**

- **`pressure_pa_mslp` and `pressure_pa_surface` already ship** for HRRR + GFS today (see [Already implemented](#already-implemented)). If the user is seeing `NaN`, that is almost certainly the GFS ambiguity bug below — not a missing field.
- **`visibility_m` and `ceiling_m` are cleanly available** in both HRRR and GFS GRIBs — each maps to a unique `(variable, level)` `.idx` record. Low-risk to add.
- **`cloud_cover_pct` is available but blocked on GFS** by an existing record-disambiguation limitation: on GFS the `(TCDC, entire atmosphere)` key returns **two** `.idx` records (instantaneous + time-averaged), which trips the `GribIntegrityError` ambiguity guard in [`_extract_records`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L416). HRRR is clean (single record).
- ⚠️ **Bonus finding — pre-existing latent bug:** the *already-shipped* `precip_mm_1h` field hits the **same** ambiguity guard on GFS at the default `fxx` (two identical `APCP:surface` records). This strongly implies a default `forecast_nwp(station, "gfs")` call raises `GribIntegrityError` today. See [Related latent bug](#related-latent-bug-gfs-precip).

Net: the feature is feasible. The real work is **deterministic record disambiguation when `(variable, level)` is not unique** — which simultaneously unblocks GFS `cloud_cover_pct` *and* fixes the existing GFS precip defect.

---

## How this was investigated (methodology)

All findings are from **real `.idx` files** fetched from the AWS BDP mirrors the SDK actually uses, decoded with the **SDK's own parser** (`parse_idx` + `compute_byte_end` + `filter_records`) and the production per-model `VARIABLE_MAP`s. Sample cycle: **2026-06-01 12Z**.

- HRRR: `https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20260601/conus/hrrr.t12z.wrfsfcf01.grib2.idx` (170 records)
- GFS: `https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20260601/12/atmos/gfs.t12z.pgrb2.0p25.f001.idx` (+ f000, f006 cross-checks)

> **Not verified locally:** end-to-end cfgrib decode of the three new records — the `[nwp]` extra (`cfgrib`/`xarray`/`sklearn`) is not installed in the investigation environment. The `.idx` availability and the disambiguation logic are fully verified; the cfgrib short-name decode is the one remaining empirical gap (see [Open questions](#open-questions--decisions)).

---

## Empirical availability — real `.idx` records

GRIB mapping is `{canonical_column: (idx_variable, idx_level)}`, matched **exactly** by [`filter_records`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_idx.py#L212). Record numbers are from the 2026-06-01 12Z cycle.

| Requested column | `(variable, level)` | HRRR sfcf01 | GFS f001 (default `fxx=1`) | Native units |
|---|---|---|---|---|
| `cloud_cover_pct` | `(TCDC, entire atmosphere)` | ✅ 1 rec (#116) | ⚠️ **2 recs** (#636 `1 hour fcst` + #637 `0-1 hour ave fcst`) | percent |
| `visibility_m` | `(VIS, surface)` | ✅ 1 rec (#5) | ✅ 1 rec (#10) | metres |
| `ceiling_m` | `(HGT, cloud ceiling)` | ✅ 1 rec (#117) | ✅ 1 rec (#638) | gpm ≈ m |
| `pressure_pa_surface` | `(PRES, surface)` | ✅ already mapped (#62) | ✅ already mapped (#561) | Pa |
| `pressure_pa_mslp` | `(MSLMA / PRMSL, mean sea level)` | ✅ already mapped (#41) | ✅ already mapped (#1) | Pa |

Replaying the production grouping + ambiguity guard against the real `.idx` (SDK's own code):

```
===== HRRR sfcf01 (model=hrrr) =====
  cloud_cover_pct   ('TCDC', 'entire atmosphere')  -> 1 rec  ['1 hour fcst']
  visibility_m      ('VIS', 'surface')             -> 1 rec  ['1 hour fcst']
  ceiling_m         ('HGT', 'cloud ceiling')       -> 1 rec  ['1 hour fcst']

===== GFS f001 (DEFAULT fxx=1) (model=gfs) =====
  precip_mm_1h      ('APCP', 'surface')            -> 2 rec  ['0-1 hour acc fcst','0-1 hour acc fcst']  <-- AMBIGUOUS -> GribIntegrityError
  cloud_cover_pct   ('TCDC', 'entire atmosphere')  -> 2 rec  ['1 hour fcst','0-1 hour ave fcst']        <-- AMBIGUOUS -> GribIntegrityError
  visibility_m      ('VIS', 'surface')             -> 1 rec  ['1 hour fcst']
  ceiling_m         ('HGT', 'cloud ceiling')       -> 1 rec  ['1 hour fcst']

===== GFS f000 (analysis) (model=gfs) =====
  ...all keys -> 1 rec (APCP absent at analysis hour)   # masks the bug when fxx=0
```

HRRR `(TCDC, entire atmosphere)` is genuinely a single "total cloud cover" message; GFS publishes both an instantaneous and a window-averaged TCDC at the same level. `(HGT, cloud ceiling)` is distinct from `HGT:cloud base` / `HGT:cloud top` on both models, so exact-level matching already isolates the ceiling cleanly.

<a name="already-implemented"></a>
## Already implemented (2 of 5)

`pressure_pa_surface` and `pressure_pa_mslp` are **already in the schema and both model maps** — no work needed:

- Schema columns: [`forecast_nwp.py#L166-L167`](packages/core/src/mostlyright/core/schemas/forecast_nwp.py#L166)
- HRRR map: [`hrrr.py#L30-L31`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_grids/hrrr.py#L30) — `PRES:surface`, `MSLMA:mean sea level`
- GFS map: [`gfs.py#L24-L25`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_grids/gfs.py#L24) — `PRES:surface`, `PRMSL:mean sea level`

**Action:** confirm with the user they're on a build that includes these, and that any `NaN` they observe on GFS isn't the precip/cloud ambiguity bug aborting the whole fetch.

<a name="related-latent-bug-gfs-precip"></a>
## ⚠️ Related latent bug — GFS precip ambiguity (pre-existing)

`precip_mm_1h → (APCP, surface)` returns **two** records on a real GFS `f001` `.idx` (#596 and #597, both `0-1 hour acc fcst` — a well-known GFS quirk where APCP is emitted twice). [`filter_records`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_idx.py#L212) keeps both (distinct `record_no`), then the [`_extract_records` guard](packages/weather/src/mostlyright/weather/forecast_nwp.py#L416) raises `GribIntegrityError`.

Because **`fxx` defaults to `1` for GFS** ([`forecast_nwp.py#L651`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L651)), this is the *default* path. It's invisible today because:
- Live tests are skipped in CI (per the testing playbook), and
- The only "ambiguous" unit test ([`test_forecast_nwp.py:417`](packages/weather/tests/test_forecast_nwp.py)) exercises `_cfgrib_variable_name`, **not** the duplicate-`.idx`-record grouping path.

The cloud-cover disambiguation work below fixes this in the same stroke. **Recommend confirming with one live GFS call and splitting into its own bug if the team prefers a separate fix track.**

---

## Proposed implementation

### 1. The core decision: disambiguate non-unique `(variable, level)` records

This is the only non-mechanical part. Three options:

- **Option A (recommended): prefer the instantaneous record, then lowest `record_no`.** Among records sharing `(variable, level)`, drop window-aggregated periods (`forecast_period` matching `ave`/`acc`/`max`/`min` window patterns) in favor of the plain `N hour fcst` / `anl` record; break remaining ties by lowest `record_no`. Fixes GFS `cloud_cover_pct` (picks #636 inst over #637 ave) **and** GFS `precip_mm_1h` (both windows identical → first by `record_no` = #596). Minimal map churn; keeps the loud-fail guard for *genuinely* unexpected duplicates.
- **Option B: extend the map value to `(variable, level, forecast_period_matcher)`.** Explicit and self-documenting, but verbose across 24 model modules and still needs a `record_no` tiebreak for APCP's identical twins.
- **Option C: silently take the first `record_no` on any collision.** Simplest, but picks inst-vs-ave by luck of file order with no stated intent — least safe; rejected.

Whichever is chosen, preserve the existing `GribIntegrityError` for collisions that the rule *cannot* resolve, so true upstream layout changes still fail loud (per the RESEARCH "loud-fail" anti-pattern note).

### 2. Mechanical changes (per new field)

1. **Schema** — add three nullable `float64` `ColumnSpec`s to [`NwpForecastSchema`](packages/core/src/mostlyright/core/schemas/forecast_nwp.py#L101): `cloud_cover_pct` (`units="percent"`), `visibility_m` (`units="m"`), `ceiling_m` (`units="m"`). Additive + nullable → backward compatible within `schema.forecast_nwp.v1`. Regenerate exported JSON schema + `EXPORT_MANIFEST` and update `test_schemas_codegen.py`.
2. **Per-model maps** — add the three entries to [`hrrr.py`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_grids/hrrr.py) and [`gfs.py`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_grids/gfs.py):
   `cloud_cover_pct: ("TCDC", "entire atmosphere")`, `visibility_m: ("VIS", "surface")`, `ceiling_m: ("HGT", "cloud ceiling")`.
3. **cfgrib name table** — add entries to [`_GRIB_VAR_TO_CFGRIB_NAME`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L120) (likely `vis`, `tcc`, and a `gh`/`ceil` for HGT-at-ceiling — **confirm via decode**). The single-message fallback at [`forecast_nwp.py#L1006`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L1006) covers an unknown mapping, but explicit entries are preferred.
4. **Row assembly** — add the three columns to `nullable_numeric_cols` ([`forecast_nwp.py#L931`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L931)) and to `_empty_dataframe` ([`forecast_nwp.py#L1035`](packages/weather/src/mostlyright/weather/forecast_nwp.py#L1035)) so all-missing columns still round-trip as `float64`.
5. **QC rules** — add predicates to [`rules_nwp.py`](packages/weather/src/mostlyright/weather/qc/rules_nwp.py) `RULES_NWP_NCEP`: `cloud_cover_pct ∈ [0,100]`, `visibility_m ≥ 0`, `ceiling_m ≥ 0`. Mind the "no ceiling" encoding (below).
6. **Docs / CHANGELOG** — update [`docs/nwp-forecasts.md`](docs/nwp-forecasts.md) field table + CHANGELOG.

### 3. Tests (TDD — mandatory per CLAUDE.md)

- Unit: a `.idx` fixture containing the real **two-record** GFS `TCDC:entire atmosphere` and `APCP:surface` cases; assert the disambiguation picks the instantaneous record and that `forecast_nwp(..., "gfs")` no longer raises.
- Unit: HRRR single-record path returns the value unchanged (no regression).
- `@pytest.mark.live`: HRRR + GFS fetch of all three fields against a real cycle, asserting plausible physical ranges and confirming cfgrib decode.

---

## Caveats to handle

- **`ceiling_m` "no ceiling" (clear sky).** `HGT:cloud ceiling` carries a GRIB *missing* value when there is no ceiling. Confirm cfgrib yields `NaN` (preferred — and QC-safe) vs a sentinel like `20000`/`-5000` that would need masking. This is the most likely source of a surprising column.
- **`visibility_m` cap.** Model VIS saturates (e.g. ~24135 m). Document; do not clip (native-units convention).
- **`cloud_cover_pct` semantics.** `TCDC:entire atmosphere` is column-integrated total cloud — *not* the METAR/observation sky-cover the AWC/IEM adapters report. Document the difference so quants don't naively join them.
- **NBM not verified.** The sampled NBM `.idx` path did not resolve this pass, and NBM is a statistical blend with a different inventory (its map already omits `pressure_pa_surface`). Confirm NBM availability separately before extending [`nbm.py`](packages/weather/src/mostlyright/weather/_fetchers/_nwp_grids/nbm.py); ECMWF/MSC/HAFS families are out of scope here.

<a name="open-questions--decisions"></a>
## Open questions / decisions

1. **Naming: `ceiling_m` vs `cloud_ceiling_m`.** The observation adapters already use **`cloud_ceiling_m`** ([`docs/adapters/iem.md:42`](docs/adapters/iem.md)). `cloud_cover_pct` and `visibility_m` *already match* canonical names in `schema.forecast.station.v1` (Open-Meteo) and the observation adapters. Recommend **`cloud_ceiling_m`** for cross-source join consistency, unless we deliberately keep the user's requested `ceiling_m`. **Needs a call.**
2. **Disambiguation: Option A vs B** (above).
3. **Scope of the GFS precip fix** — bundle with this issue or split into its own bug?
4. **cfgrib short-names** for the three records — confirm via one decode run before finalizing the name table.

## Acceptance criteria

- [ ] `forecast_nwp(station, "hrrr")` and `forecast_nwp(station, "gfs")` return non-null `cloud_cover_pct`, `visibility_m`, `ceiling_m` for an in-domain station at default `fxx`.
- [ ] GFS default call no longer raises `GribIntegrityError` (cloud **and** precip disambiguated).
- [ ] New columns validate against `schema.forecast_nwp.v1`; all-missing columns are `float64`.
- [ ] QC predicates cover the three fields; "no ceiling" maps to `NaN`/`clean`.
- [ ] Unit + live tests added; docs + CHANGELOG updated.

## TS Parity (per CLAUDE.md dual-SDK rule)

This changes the public `forecast_nwp` column surface, so a TS-parity note is required:

- **TS equivalent:** identical public API — the TS SDK's NWP forecast result gains the same three columns with the same names/units. The GRIB byte-range + `.idx` mechanics differ in implementation but the column contract is shared.
- **Ship vs parity ticket:** recommend a **parity ticket** tracked per [`CROSS-SDK-SYNC.md`](.planning/CROSS-SDK-SYNC.md) rather than same-PR, since the disambiguation logic must be re-implemented and tested independently in TS.
- **TS-specific constraints:** none beyond existing NWP parity scope (no new deps; same schema id).

---

*Investigated empirically against live AWS BDP `.idx` data (cycle 2026-06-01 12Z) using the SDK's own parser. The cfgrib end-to-end decode is the single unverified step (the `[nwp]` extra was not installed in the investigation environment).*


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(weather): expose cloud_cover_pct / visibility_m / ceiling_m in forecast_nwp (HRRR/GFS); pressure_pa_* already ship #63

Summary

How this was investigated (methodology)

Empirical availability — real `.idx` records

Already implemented (2 of 5)

⚠️ Related latent bug — GFS precip ambiguity (pre-existing)

Proposed implementation

1. The core decision: disambiguate non-unique `(variable, level)` records

2. Mechanical changes (per new field)

3. Tests (TDD — mandatory per CLAUDE.md)

Caveats to handle

Open questions / decisions

Acceptance criteria

TS Parity (per CLAUDE.md dual-SDK rule)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Requested column	`(variable, level)`	HRRR sfcf01	GFS f001 (default `fxx=1`)	Native units
`cloud_cover_pct`	`(TCDC, entire atmosphere)`	✅ 1 rec (#116)	⚠️ 2 recs (#636 `1 hour fcst` + #637 `0-1 hour ave fcst`)	percent
`visibility_m`	`(VIS, surface)`	✅ 1 rec (#5)	✅ 1 rec (#10)	metres
`ceiling_m`	`(HGT, cloud ceiling)`	✅ 1 rec (#117)	✅ 1 rec (#638)	gpm ≈ m
`pressure_pa_surface`	`(PRES, surface)`	✅ already mapped (#62)	✅ already mapped (#561)	Pa
`pressure_pa_mslp`	`(MSLMA / PRMSL, mean sea level)`	✅ already mapped (#41)	✅ already mapped (#1)	Pa

feat(weather): expose cloud_cover_pct / visibility_m / ceiling_m in forecast_nwp (HRRR/GFS); pressure_pa_* already ship #63

Description

Summary

How this was investigated (methodology)

Empirical availability — real .idx records

Already implemented (2 of 5)

⚠️ Related latent bug — GFS precip ambiguity (pre-existing)

Proposed implementation

1. The core decision: disambiguate non-unique (variable, level) records

2. Mechanical changes (per new field)

3. Tests (TDD — mandatory per CLAUDE.md)

Caveats to handle

Open questions / decisions

Acceptance criteria

TS Parity (per CLAUDE.md dual-SDK rule)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Empirical availability — real `.idx` records

1. The core decision: disambiguate non-unique `(variable, level)` records