Open
Conversation
Implements the ECMWF AIFS (Artificial Intelligence Forecasting System) deterministic forecast product as a new dataset with 16 variables, 0.25° global grid, 6-hourly inits, 0-360h lead times. Handles the S3 path change (aifs/ -> aifs-single/) on 2025-02-26 and GRIB master table version differences for precipitation variables between early and current data.
aldenks
requested changes
Feb 26, 2026
Member
aldenks
left a comment
There was a problem hiding this comment.
lol got nerd sniped into reviewing this. I do think we should do AIFS ENS first but after these comments its basically there
- Rename ecmwf/aifs to ecmwf/aifs_deterministic, update dataset_id - Use shared ecmwf_grib_index.py instead of custom byte-range parser - Support deterministic (non-ensemble) data in shared GRIB index parser - Set max_vars_per_download_group=10, handle multi-band GRIB reads - Use group_by() from common.iterating for source_groups - Remove skin_temperature_surface, rename to precipitable_water_atmosphere - Add replica storage config, update k8s resources to match GFS - Add comprehensive region_job unit tests and snapshot value assertions - Add cross-dataset consistency exceptions for ECMWF naming
Add guidance on: finding a reference dataset before coding, cross-dataset variable naming consistency, running the chunk/shard tool instead of estimating, shared provider utilities, minimum RegionJob test coverage, copying k8s resource values from similar datasets, replica storage setup, snapshot value assertions, and pre-submission cross-dataset checks.
Member
Author
|
@aldenks robot addressed your suggestions. lmk how you want to proceed? I'm not precious about the PR if you want to redo or start with ENS only. I'd lean towards doing the deterministic one in addition to ENS |
aldenks
requested changes
Feb 27, 2026
tests/ecmwf/aifs_deterministic/forecast/dynamical_dataset_test.py
Outdated
Show resolved
Hide resolved
tests/ecmwf/aifs_deterministic/forecast/dynamical_dataset_test.py
Outdated
Show resolved
Hide resolved
src/reformatters/ecmwf/aifs_deterministic/forecast/template_config.py
Outdated
Show resolved
Hide resolved
src/reformatters/ecmwf/aifs_deterministic/forecast/template_config.py
Outdated
Show resolved
Hide resolved
src/reformatters/ecmwf/aifs_deterministic/forecast/template_config.py
Outdated
Show resolved
Hide resolved
src/reformatters/ecmwf/aifs_deterministic/forecast/template_config.py
Outdated
Show resolved
Hide resolved
Member
|
@mrshll to your larger question -- i think we should do this before aifs because it's basically there! |
…nfig.py Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
…nfig.py Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
…nfig.py Co-authored-by: Alden Keefe Sampson <alden@dynamical.org>
mrshll
pushed a commit
that referenced
this pull request
Feb 27, 2026
- Fix chunk/shard sizes: 1 init_time per chunk/shard, larger spatial chunks (240x240) per chunk_shard_size tool output - Make precipitable_water_atmosphere metadata consistent across datasets (short_name=pwat, long_name=Precipitable water) and remove stale CF compliance exceptions - Restructure dataset_integration_guide.md: remove ECMWF-specific examples, revert chunk/shard docs to simple wording, simplify storage config section, remove pre-submission checks (duplicate of AGENTS.md) - Use np.testing.assert_allclose in integration test and add post-update snapshot values for temperature_2m and precipitation_rate_surface - Remove ecmwf-api-client dev dependency and scripts ruff ignore rules - Regenerate zarr template https://claude.ai/code/session_01C5o4hepUqwYHeEeYk5rqjy
- Rename precipitation_rate_surface → precipitation_surface and fix
long_name to match IFS ENS / ECMWF parameter DB ("Precipitation rate")
- Remove convective_precipitation_rate_surface and precipitable_water_atmosphere
(not in IFS ENS variable set)
- Replace geopotential_500hpa (z, m²/s²) with geopotential_height_500hpa,
geopotential_height_850hpa, geopotential_height_925hpa (gh, m) matching
IFS ENS; AIFS provides geopotential (z), so a scale_factor=1/g converts
to geopotential height. Requires adding apply_data_transformations to
AIFS region job.
- Add temperature_850hpa and temperature_925hpa to match IFS ENS
- Reorder variables to match IFS ENS ordering
- Rename S3 bucket to dynamical-ecmwf-aifs-deterministic
Missing from AIFS (not in source GRIB): categorical_precipitation_type_surface (ptype),
wind_gust_10m (10fg).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
template_config_test.py: - test_dataset_attributes: check dataset_id, version format, time_domain/resolution - test_dimension_coordinates_shapes_and_values: detailed type/value checks - test_template_variables_have_required_attrs: iterate all vars for required fields - test_geopotential_height_vars_have_scale_factor: verify z→gh conversion config - test_derive_coordinates_and_spatial_ref: detailed derived coord checks - test_coords_property_order_and_names: coordinate order - test_get_template_spatial_ref: CRS matches expected region_job_test.py: - test_apply_data_transformations_scale_factor: geopotential→height conversion applied - test_apply_data_transformations_no_scale_factor: passthrough when no scale_factor set - test_read_data_alt_precip_metadata: tp readable using table v34+ GRIB metadata Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix chunk/shard layout: 1 init_time per chunk/shard, larger spatial chunks (240x240) - Regenerate AIFS template with updated chunk/shard config - Add complete snapshot value assertions for temperature_2m and precipitation_surface post-update - Restructure dataset_integration_guide.md: move shared/common utility notes to Implementation notes section, remove Pre-submission checks (duplicate of AGENTS.md), simplify storage config and chunk/shard instructions - Add reference dataset notes and expanded common utilities list to AGENTS.md - Remove ecmwf-api-client dev dependency and scripts ruff config (not needed in this PR) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n schedules, improve tests (#498)
Change latitude chunk from 240 to 241 (3 near-equal chunks of 241/241/239 over 721 pixels instead of 240/240/240/1). Shard of 723 covers all 721 pixels in a single shard, improving evenness from 0.501 to 1.000. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The index_cols and set_index calls were duplicated across the ensemble and non-ensemble branches. Unify them with a single list that gets the ensemble prefix prepended conditionally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete ecmwf_utils.py and add vars_available() which uses item() on the set of date_available values to assert the grouping invariant: all vars in a group must share the same date_available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…update Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assert full URL strings instead of substrings, add boundary and 00z test cases. Simplify integration guide prose. Remove unused boto3 dev dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These variables are provided as running accumulations over the entire forecast duration. Add deaccumulate_to_rate=True with window_reset_frequency of Timedelta.max to precipitation_surface, downward_long_wave_radiation_flux_surface, and downward_short_wave_radiation_flux_surface. Apply deaccumulation in region_job apply_data_transformations following the IFS ENS pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Review feedback addressed
ecmwf/aifstoecmwf/aifs_deterministic, update dataset_id toecmwf-aifs-deterministic-forecast-15-day-0-25-degreeecmwf_grib_index.pyinstead of custom byte-range parser; generalize it to support deterministic (non-ensemble) datamax_vars_per_download_group=10and handle multi-band GRIB reads via rasterio metadata matchinggroup_by()fromcommon.iteratingforsource_groupsskin_temperature_surfacevariable, renametotal_column_water_vapour_atmospheretoprecipitable_water_atmosphereEcmwfAifsIcechunkAwsOpenDataDatasetStorageConfig)25 */6 * * *with 30-minute deadlineTest plan
uv run pytest tests/ecmwf/aifs_deterministic/— all passuv run pytest tests/ecmwf/test_ecmwf_grib_index.py— all passuv run pytest tests/common/datasets_cf_compliance_test.py— all pass (excluding pre-existing NASA SMAP failure)uv run pytest tests/common/common_template_config_subclasses_test.py— all pass (excluding pre-existing NASA SMAP failure)uv run pytest -m slow tests/ecmwf/aifs_deterministic/— integration test passes with snapshot valuesuv run ruff format && uv run ruff check— cleanuv run ty check— all checks passed