Add MARS archive support to IFS ENS reformatter #508

Open
mrshll wants to merge 2 commits into main from mars-staging-validation-test

mrshll commented on Mar 12, 2026

Summary

  • Enable the IFS ENS reformatter to fetch historical data (pre-2024-04-01) from the MARS staging bucket on source.coop, routing to ECMWF open data for later dates
  • Add mars_grib_index_param and mars_read_scale_factor to EcmwfInternalAttrs for handling MARS-specific differences (e.g. geopotential z → geopotential height gh conversion)
  • Extend GRIB index parsing with step filtering and missing column handling for MARS indexes
  • Add MARS S3 byte-range download path in region_job
  • Skip GRIB metadata assertions for MARS source (different field descriptions in MARS GRIBs)
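
The date-based routing in the first bullet could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `choose_source`, `MARS_CUTOFF`, and the returned labels are hypothetical names; only the 2024-04-01 cutoff comes from the summary above.

```python
from datetime import datetime

# Cutoff date taken from the PR summary; the constant name is hypothetical.
MARS_CUTOFF = datetime(2024, 4, 1)


def choose_source(init_time: datetime) -> str:
    """Return "mars" for init times before the cutoff, else "open_data"."""
    return "mars" if init_time < MARS_CUTOFF else "open_data"
```

Under these assumptions, an init time of 2016-03-08 (the date used in the integration test) routes to MARS, while 2024-04-01 itself routes to open data.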

Test plan

  • test_read_mars_staging_data — validates all 20 variables are readable from source.coop at a single step/member
  • test_backfill_local_mars_source — runs the full reformatter pipeline (backfill_local) on MARS data, verifying temperature and precipitation output including deaccumulation
  • All existing fast tests pass (29/29 ECMWF tests)
  • ruff format, ruff check, ty check all pass

Validates that the ECMWF IFS ENS MARS backfill data hosted on
source.coop can be downloaded via byte-range requests and read
correctly through rasterio. Tests all 20 template variables
(14 sfc + 6 pl) for correct shape and finite values.
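
For readers unfamiliar with byte-range reads, here is a rough sketch of the idea using a plain HTTP `Range` header from the standard library. The PR itself uses S3 byte-range downloads, so this swaps in a different transport for illustration; `range_header` and `fetch_message` are hypothetical helpers, and the assumption that each index entry supplies a byte offset and length is mine.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build a Range header value covering `length` bytes starting at `offset`."""
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_message(url: str, offset: int, length: int) -> bytes:
    """Download the bytes of a single GRIB message out of a larger file."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

The downloaded bytes can then be written to a local file and opened with a GRIB-capable reader, which is what the test does via rasterio.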

The test handles both the old JSON array and new JSON-lines
index formats, normalizing field names to the open data convention.
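
Handling both index layouts might look something like the sketch below. It only covers telling a JSON array apart from JSON-lines; the field-name normalization is left out because the actual mapping is not shown here. `parse_index` is a hypothetical name, not the function in the PR.

```python
import json


def parse_index(text: str) -> list[dict]:
    """Parse a GRIB index given as either a JSON array or JSON-lines."""
    stripped = text.strip()
    if not stripped:
        return []
    # Old format: the whole file is a single JSON array of entries.
    if stripped.startswith("["):
        return json.loads(stripped)
    # New format: one JSON object per line; skip any blank lines.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]
```

Both layouts yield the same list of dicts, so downstream filtering does not need to know which format the file used.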

Enable the reformatter to fetch historical data from the MARS staging
bucket (source.coop) for init times before 2024-04-01, routing to ECMWF
open data for later dates. Key changes:

- Add mars_grib_index_param and mars_read_scale_factor to EcmwfInternalAttrs
  for z→gh conversion (geopotential to geopotential height)
- Extend grib index parsing with step filtering and missing column handling
  for MARS indexes (all steps in one file, cf-only indexes lack number column)
- Add MARS download path in region_job using S3 byte-range downloads
- Skip GRIB metadata assertions for MARS source (different field descriptions)
- Add test_backfill_local_mars_source integration test running the full
  pipeline on 2016-03-08 MARS data, verifying temperature and precipitation
  output including deaccumulation
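
The z→gh conversion in the first item amounts to dividing geopotential (m² s⁻²) by standard gravity to get geopotential height (m). A minimal sketch, assuming the read-time scale factor is simply 1/9.80665; the value actually stored in `mars_read_scale_factor` is not shown in this PR description, and the names below are hypothetical.

```python
# Standard gravity in m s^-2.
STANDARD_GRAVITY = 9.80665

# Assumed scale factor applied on read; the PR's actual value may differ.
Z_TO_GH_SCALE = 1.0 / STANDARD_GRAVITY


def geopotential_to_height(z: float, scale: float = Z_TO_GH_SCALE) -> float:
    """Convert geopotential (m^2 s^-2) to geopotential height (m)."""
    return z * scale
```

Expressing the conversion as a multiplicative factor lets the same read path serve variables that need no conversion, for example with a factor of 1.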
mrshll changed the title from "Add slow test for MARS-staged GRIB data on source.coop" to "Add MARS archive support to IFS ENS reformatter" on Mar 12, 2026
mrshll requested a review from aldenks on Mar 12, 2026, 20:03
mrshll self-assigned this on Mar 12, 2026
grib_index_level_type="pl",
grib_index_level_value=925,
keep_mantissa_bits=11,
mars_grib_index_param="z",
mrshll (Member, Author) commented:

I don't like this pattern -- is there a better way to declare these changes over time as internal conventions change?

f"{grib_comment=} != {data_var.internal_attrs.grib_comment=}"
# MARS GRIBs have different comment/description metadata than open data,
# so we only validate these fields for open data sources.
if not coord.is_mars_source():
mrshll (Member, Author) commented:

I don't love this branching either -- is there a more "native" way to declare changes over the lifetime of the archive, or is it just internal logic like this?
