Skip to content

Add ECMWF AIFS deterministic forecast dataset#449

Open
mrshll wants to merge 20 commits intomainfrom
ecmwf-aifs
Open

Add ECMWF AIFS deterministic forecast dataset#449
mrshll wants to merge 20 commits intomainfrom
ecmwf-aifs

Conversation

@mrshll
Copy link
Member

@mrshll mrshll commented Feb 25, 2026

Summary

  • Add ECMWF AIFS deterministic forecast dataset integration (15-day, 0.25 degree)
  • Address all review feedback from initial submission

Review feedback addressed

  • Rename ecmwf/aifs to ecmwf/aifs_deterministic, update dataset_id to ecmwf-aifs-deterministic-forecast-15-day-0-25-degree
  • Use shared ecmwf_grib_index.py instead of custom byte-range parser; generalize it to support deterministic (non-ensemble) data
  • Set max_vars_per_download_group=10 and handle multi-band GRIB reads via rasterio metadata matching
  • Use group_by() from common.iterating for source_groups
  • Remove skin_temperature_surface variable, rename total_column_water_vapour_atmosphere to precipitable_water_atmosphere
  • Add replica storage config (EcmwfAifsIcechunkAwsOpenDataDatasetStorageConfig)
  • Update k8s resources to match GFS (cpu=3.5, memory=7G, shared_memory=1.5G, ephemeral=20G)
  • Update cron schedule to 25 */6 * * * with 30-minute deadline
  • Add comprehensive region_job unit tests (source_groups, generate_source_file_coords, download_file, read_data single/multi-band, operational_update_jobs)
  • Add snapshot value assertions in integration test
  • Add lat/lon spacing assertions in template_config test
  • Add cross-dataset consistency exceptions for ECMWF parameter naming
  • Add dataset integration guide sections to prevent common review issues

Test plan

  • uv run pytest tests/ecmwf/aifs_deterministic/ — all pass
  • uv run pytest tests/ecmwf/test_ecmwf_grib_index.py — all pass
  • uv run pytest tests/common/datasets_cf_compliance_test.py — all pass (excluding pre-existing NASA SMAP failure)
  • uv run pytest tests/common/common_template_config_subclasses_test.py — all pass (excluding pre-existing NASA SMAP failure)
  • uv run pytest -m slow tests/ecmwf/aifs_deterministic/ — integration test passes with snapshot values
  • uv run ruff format && uv run ruff check — clean
  • uv run ty check — all checks passed

Implements the ECMWF AIFS (Artificial Intelligence Forecasting System)
deterministic forecast product as a new dataset with 16 variables,
0.25° global grid, 6-hourly inits, 0-360h lead times. Handles the S3
path change (aifs/ -> aifs-single/) on 2025-02-26 and GRIB master table
version differences for precipitation variables between early and
current data.
Copy link
Member

@aldenks aldenks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol got nerd sniped into reviewing this. I do think we should do AIFS ENS first but after these comments its basically there

- Rename ecmwf/aifs to ecmwf/aifs_deterministic, update dataset_id
- Use shared ecmwf_grib_index.py instead of custom byte-range parser
- Support deterministic (non-ensemble) data in shared GRIB index parser
- Set max_vars_per_download_group=10, handle multi-band GRIB reads
- Use group_by() from common.iterating for source_groups
- Remove skin_temperature_surface, rename to precipitable_water_atmosphere
- Add replica storage config, update k8s resources to match GFS
- Add comprehensive region_job unit tests and snapshot value assertions
- Add cross-dataset consistency exceptions for ECMWF naming
Add guidance on: finding a reference dataset before coding, cross-dataset
variable naming consistency, running the chunk/shard tool instead of
estimating, shared provider utilities, minimum RegionJob test coverage,
copying k8s resource values from similar datasets, replica storage setup,
snapshot value assertions, and pre-submission cross-dataset checks.
@mrshll mrshll self-assigned this Feb 26, 2026
@mrshll mrshll requested a review from aldenks February 26, 2026 18:31
@mrshll mrshll marked this pull request as ready for review February 26, 2026 18:32
@mrshll
Copy link
Member Author

mrshll commented Feb 26, 2026

@aldenks robot addressed your suggestions. lmk how you want to proceed? I'm not precious about the PR if you want to redo or start with ENS only. I'd lean towards doing the deterministic one in addition to ENS

@aldenks
Copy link
Member

aldenks commented Feb 27, 2026

@mrshll to your larger question -- i think we should do this before aifs because it's basically there!

mrshll and others added 4 commits February 27, 2026 12:16
…nfig.py

Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
…nfig.py

Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
…nfig.py

Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
@aldenks aldenks linked an issue Feb 27, 2026 that may be closed by this pull request
mrshll pushed a commit that referenced this pull request Feb 27, 2026
- Fix chunk/shard sizes: 1 init_time per chunk/shard, larger spatial chunks
  (240x240) per chunk_shard_size tool output
- Make precipitable_water_atmosphere metadata consistent across datasets
  (short_name=pwat, long_name=Precipitable water) and remove stale CF
  compliance exceptions
- Restructure dataset_integration_guide.md: remove ECMWF-specific examples,
  revert chunk/shard docs to simple wording, simplify storage config section,
  remove pre-submission checks (duplicate of AGENTS.md)
- Use np.testing.assert_allclose in integration test and add post-update
  snapshot values for temperature_2m and precipitation_rate_surface
- Remove ecmwf-api-client dev dependency and scripts ruff ignore rules
- Regenerate zarr template

https://claude.ai/code/session_01C5o4hepUqwYHeEeYk5rqjy
aldenks and others added 13 commits March 5, 2026 22:41
- Rename precipitation_rate_surface → precipitation_surface and fix
  long_name to match IFS ENS / ECMWF parameter DB ("Precipitation rate")
- Remove convective_precipitation_rate_surface and precipitable_water_atmosphere
  (not in IFS ENS variable set)
- Replace geopotential_500hpa (z, m²/s²) with geopotential_height_500hpa,
  geopotential_height_850hpa, geopotential_height_925hpa (gh, m) matching
  IFS ENS; AIFS provides geopotential (z), so a scale_factor=1/g converts
  to geopotential height. Requires adding apply_data_transformations to
  AIFS region job.
- Add temperature_850hpa and temperature_925hpa to match IFS ENS
- Reorder variables to match IFS ENS ordering
- Rename S3 bucket to dynamical-ecmwf-aifs-deterministic

Missing from AIFS (not in source GRIB): categorical_precipitation_type_surface (ptype),
wind_gust_10m (10fg).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
template_config_test.py:
- test_dataset_attributes: check dataset_id, version format, time_domain/resolution
- test_dimension_coordinates_shapes_and_values: detailed type/value checks
- test_template_variables_have_required_attrs: iterate all vars for required fields
- test_geopotential_height_vars_have_scale_factor: verify z→gh conversion config
- test_derive_coordinates_and_spatial_ref: detailed derived coord checks
- test_coords_property_order_and_names: coordinate order
- test_get_template_spatial_ref: CRS matches expected

region_job_test.py:
- test_apply_data_transformations_scale_factor: geopotential→height conversion applied
- test_apply_data_transformations_no_scale_factor: passthrough when no scale_factor set
- test_read_data_alt_precip_metadata: tp readable using table v34+ GRIB metadata

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix chunk/shard layout: 1 init_time per chunk/shard, larger spatial chunks (240x240)
- Regenerate AIFS template with updated chunk/shard config
- Add complete snapshot value assertions for temperature_2m and precipitation_surface post-update
- Restructure dataset_integration_guide.md: move shared/common utility notes to Implementation notes section, remove Pre-submission checks (duplicate of AGENTS.md), simplify storage config and chunk/shard instructions
- Add reference dataset notes and expanded common utilities list to AGENTS.md
- Remove ecmwf-api-client dev dependency and scripts ruff config (not needed in this PR)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change latitude chunk from 240 to 241 (3 near-equal chunks of 241/241/239
over 721 pixels instead of 240/240/240/1). Shard of 723 covers all 721
pixels in a single shard, improving evenness from 0.501 to 1.000.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The index_cols and set_index calls were duplicated across the ensemble
and non-ensemble branches. Unify them with a single list that gets the
ensemble prefix prepended conditionally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete ecmwf_utils.py and add vars_available() which uses item() on the
set of date_available values to assert the grouping invariant: all vars
in a group must share the same date_available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assert full URL strings instead of substrings, add boundary and 00z
test cases. Simplify integration guide prose. Remove unused boto3 dev
dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These variables are provided as running accumulations over the entire
forecast duration. Add deaccumulate_to_rate=True with window_reset_frequency
of Timedelta.max to precipitation_surface, downward_long_wave_radiation_flux_surface,
and downward_short_wave_radiation_flux_surface. Apply deaccumulation in
region_job apply_data_transformations following the IFS ENS pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ECMWF AIFS (deterministic)

2 participants