
feat: add time_units and calendar metadata to CMIP6/CMIP7 datasets #574

Open: lewisjared wants to merge 7 commits into main from feat/add-time-units-calendar


@lewisjared (Contributor)

Description

Adds time_units and calendar metadata columns to CMIP6 and CMIP7 dataset models, enabling proper handling of non-standard CF calendars (e.g., 360_day, noleap, proleptic_gregorian).

Previously, start_time/end_time were stored as datetime.datetime, which cannot represent dates like Feb 30 in a 360-day calendar. This PR:

  • Adds time_units and calendar columns to both CMIP6Dataset and CMIP7Dataset SQLAlchemy models
  • Changes DatasetFile.start_time/end_time from DateTime to String column type in the DB
  • Extracts calendar metadata from netCDF files via the complete parsers, which open each file (cmip6_parsers.py, cmip7_parsers.py)
  • Replaces parse_datetime with parse_cftime_dates in utils.py, producing cftime.datetime objects that support all CF calendars
  • Updates constraint comparisons (PartialDateTime, RequireContiguousTimerange, RequireOverlappingTimerange) to work with cftime.datetime via duck typing and mixed-calendar fallbacks
  • Converts the stored time strings back to cftime objects when loading from the DB in the base adapter, so all downstream code works with cftime objects
  • Handles parquet round-trips by converting cftime to/from strings in solve_helpers.py
  • Includes an Alembic migration that adds the new columns and sets finalised=False on existing datasets so they are re-finalised
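The string-storage approach can be sketched without cftime itself. The example below is a minimal, hypothetical sketch: CalendarDate is an illustrative stand-in for cftime.datetime, and stdlib_can_represent, to_db_string and from_db_string are assumed helper names, not the PR's actual functions. It shows why datetime.datetime is insufficient (it rejects 30 February) and how a zero-padded string plus a separate calendar column can round-trip a 360_day date. It assumes years 0 to 9999 and whole-second precision.

```python
import datetime
from dataclasses import dataclass


def stdlib_can_represent(year: int, month: int, day: int) -> bool:
    """Return True if datetime.datetime accepts this (Gregorian) date."""
    try:
        datetime.datetime(year, month, day)
    except ValueError:
        return False
    return True


@dataclass(frozen=True)
class CalendarDate:
    """Hypothetical stand-in for cftime.datetime: plain fields plus a calendar name."""

    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0
    second: int = 0
    calendar: str = "standard"

    def to_db_string(self) -> str:
        # Zero-padded fields: the string never needs datetime.datetime,
        # and within one calendar it sorts chronologically for years 0-9999.
        return (
            f"{self.year:04d}-{self.month:02d}-{self.day:02d} "
            f"{self.hour:02d}:{self.minute:02d}:{self.second:02d}"
        )

    @classmethod
    def from_db_string(cls, value: str, calendar: str) -> "CalendarDate":
        # The calendar comes from its own column; the string alone cannot
        # tell a 360_day date from a standard one.
        date_part, time_part = value.split(" ")
        year, month, day = (int(part) for part in date_part.split("-"))
        hour, minute, second = (int(part) for part in time_part.split(":"))
        return cls(year, month, day, hour, minute, second, calendar)


# 30 February exists in the 360_day calendar but not in datetime.datetime.
feb30 = CalendarDate(2000, 2, 30, calendar="360_day")
stored = feb30.to_db_string()  # "2000-02-30 00:00:00"
restored = CalendarDate.from_db_string(stored, calendar="360_day")
```

The same string form works for the parquet round-trip, because parquet only needs a plain string column; the calendar travels separately.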

Key design decisions

  • cftime.datetime objects stored as strings in DB (cftime doesn't subclass datetime.datetime)
  • During partial DRS finalisation, mixed calendars ("standard" vs "proleptic_gregorian") are handled by falling back to string-based comparison
  • climate-ref-core does NOT depend on cftime; PartialDateTime uses duck typing (getattr) instead of isinstance
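To illustrate the duck-typing decision, here is a hypothetical, simplified PartialDateTimeSketch. It is not the real climate-ref-core PartialDateTime, whose fields and comparison semantics may differ. It reads date components with getattr, so it accepts datetime.datetime and, by assumption, any cftime.datetime exposing the same year/month/day/hour/minute/second attributes. A SimpleNamespace stands in for a cftime 360_day date so the sketch needs no cftime dependency.

```python
import datetime
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional, Tuple

_FIELDS = ("year", "month", "day", "hour", "minute", "second")


@dataclass(frozen=True)
class PartialDateTimeSketch:
    """Hypothetical simplified partial datetime; fields left as None are ignored."""

    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None
    second: Optional[int] = None

    def _keys(self, value: Any) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
        # Duck typing: read attributes with getattr instead of checking
        # isinstance(value, datetime.datetime), so cftime.datetime objects
        # work without importing cftime.
        names = [name for name in _FIELDS if getattr(self, name) is not None]
        mine = tuple(getattr(self, name) for name in names)
        theirs = tuple(getattr(value, name) for name in names)
        return mine, theirs

    def matches(self, value: Any) -> bool:
        """True if every specified field equals the corresponding field of value."""
        mine, theirs = self._keys(value)
        return mine == theirs

    def is_before(self, value: Any) -> bool:
        """True if the specified fields sort strictly before those of value."""
        mine, theirs = self._keys(value)
        return mine < theirs


feb_2000 = PartialDateTimeSketch(year=2000, month=2)
gregorian = datetime.datetime(2000, 2, 15)
# SimpleNamespace stands in for a cftime 360_day date such as 2000-02-30.
day_360 = SimpleNamespace(year=2000, month=2, day=30, hour=0, minute=0, second=0)
```

Both feb_2000.matches(gregorian) and feb_2000.matches(day_360) hold, even though only one of the two objects is a datetime.datetime.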

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

During partial DRS finalisation, subgroups can contain cftime objects
from different calendars that agree for the dates involved (e.g., "standard" vs
"proleptic_gregorian", which are identical for all dates from 1582-10-15 onward).
Add a try/except fallback that switches to string-based comparison when direct
cftime subtraction raises TypeError.
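The fallback described in this commit might look roughly like the hypothetical sketch below. is_not_after and FakeCFDate are illustrative names. FakeCFDate only mimics cftime's refusal to subtract dates from different calendars, and the string fallback assumes both dates render as zero-padded "YYYY-MM-DD HH:MM:SS" strings, which is taken here as an assumption about the str() form of the real objects. The result is only meaningful because the two calendars agree for the dates being compared.

```python
import datetime
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeCFDate:
    """Hypothetical stand-in for cftime.datetime; cftime is not imported here."""

    value: datetime.datetime
    calendar: str

    def __sub__(self, other: "FakeCFDate") -> datetime.timedelta:
        # Mimic the refusal to subtract dates from different calendars.
        if self.calendar != other.calendar:
            raise TypeError("cannot subtract dates with different calendars")
        return self.value - other.value

    def __str__(self) -> str:
        # isoformat zero-pads the year to four digits.
        return self.value.isoformat(sep=" ")


def is_not_after(earlier: Any, later: Any) -> bool:
    """Return True if `earlier` <= `later`, tolerating mixed calendars."""
    try:
        return (later - earlier) >= datetime.timedelta(0)
    except TypeError:
        # Fallback: compare "YYYY-MM-DD HH:MM:SS" strings, whose lexical
        # order matches chronological order for zero-padded, non-negative years.
        return str(earlier) <= str(later)


standard = FakeCFDate(datetime.datetime(2000, 12, 31), "standard")
proleptic = FakeCFDate(datetime.datetime(2001, 1, 1), "proleptic_gregorian")
```

Here is_not_after(standard, proleptic) takes the string path and returns True. Same-calendar inputs, or plain datetime.datetime objects, use direct subtraction.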
@codecov: codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Δ by flag:

  • core: ?
  • providers: 89.93% <ø> (ø)

Flags with carried-forward coverage won't be shown.
See 81 files with indirect coverage changes.


PostgreSQL's sourcedatasettype enum stores the enum names (CMIP6, CMIP7),
not the Python enum values (cmip6, cmip7). SQLite accepted the lowercase values
because it has no native enum type, but PostgreSQL rejects them as invalid enum input.
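The name/value distinction is easy to trip over in hand-written migration SQL. A minimal, hypothetical sketch: the enum below only mirrors the names and values quoted in the commit message, and db_label is an illustrative helper, not part of the codebase. Raw SQL in an Alembic migration (for example, a statement passed to op.execute) would need the returned name rather than the value.

```python
import enum


class SourceDatasetType(enum.Enum):
    """Hypothetical mirror of the project's enum: uppercase names, lowercase values."""

    CMIP6 = "cmip6"
    CMIP7 = "cmip7"


def db_label(value: str) -> str:
    """Map a Python enum value (e.g. "cmip6") to the label stored in the DB.

    SQLAlchemy's Enum type persists member names by default (unless
    values_callable is configured), so a PostgreSQL enum created from
    this class accepts "CMIP6" but rejects "cmip6".
    """
    return SourceDatasetType(value).name
```

For example, db_label("cmip6") returns "CMIP6", the label PostgreSQL expects.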
