
Scheduled Subdaily-to-Daily Aggregation ETL Task #367

@daniel-slaugh

Description

Implement a dedicated scheduled ETL task that aggregates sub-daily HydroServer datastreams to daily datastreams (mean or end_of_interval), with UI configuration, run status, logs, and failure notifications.

DWRi's current setup:

  • Most Legacy TCP stations aggregate sub-daily data into a time-weighted daily mean dataset (trapezoidal)
  • One of their stations (beaver_mt) has a daily reservoir storage summary dataset that uses the last value of each day
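
The trapezoidal time-weighted daily mean mentioned above can be sketched as follows (a minimal illustration of the statistic, not DWRi's actual implementation):

```python
from datetime import datetime, timedelta

def time_weighted_daily_mean(samples):
    """Trapezoidal time-weighted mean over one day's (timestamp, value) pairs.

    `samples` must be sorted by timestamp and span the day being aggregated.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples for a trapezoidal mean")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = (t1 - t0).total_seconds()
        area += (v0 + v1) / 2.0 * dt  # trapezoid between consecutive samples
    total = (samples[-1][0] - samples[0][0]).total_seconds()
    return area / total

day = datetime(2024, 6, 1)
samples = [(day + timedelta(hours=h), v) for h, v in [(0, 10.0), (12, 20.0), (24, 10.0)]]
time_weighted_daily_mean(samples)  # → 15.0
```

Unlike a simple mean, this weights each value by how long it was in effect, so irregular sub-daily sampling intervals do not bias the daily result.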

Frontend Requirements

Task form:

  • Add a select field that defaults to "ETL", with "Aggregation" as the other option. This should map to task.type.
  • If the task.type == "Aggregation":
    1. Data connection isn't needed for aggregation tasks, so remove the data connection select field and set data connection to null.
    2. Add a timezone selector similar to the other timezone selectors on the frontend, except that HydroServer will always return a standard ISO timestamp. The user therefore only needs a two-button group to choose between a fixed offset and a daylight-saving-aware zone, then selects the specific offset or IANA timezone from the list.
  • The schedule section will remain the same
  • The swimlane section will require datastream selections for both the source and the target, instead of just the target.
  • Add a dropdown to select the aggregation statistic with three choices: (1) simple mean, (2) time-weighted daily mean, (3) last value of day.
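
The fixed-offset vs. daylight-saving-aware choice above determines where daily boundaries fall. A minimal Python sketch of how that selection could be resolved (the `resolve_timezone` helper and its arguments are hypothetical, not HydroServer's API):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def resolve_timezone(mode, selection):
    """Map the form's two-button choice to a tzinfo.

    mode="fixed": selection is a UTC offset string like "-07:00".
    mode="dst":   selection is an IANA name like "America/Denver".
    (Hypothetical helper; field names are assumptions.)
    """
    if mode == "fixed":
        sign = -1 if selection.startswith("-") else 1
        hours, minutes = selection.lstrip("+-").split(":")
        return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))
    return ZoneInfo(selection)

# Daily boundaries differ: a fixed offset never shifts, the IANA zone follows DST.
ts = datetime(2024, 7, 1, 6, 0, tzinfo=timezone.utc)
ts.astimezone(resolve_timezone("fixed", "-07:00")).hour         # → 23 (previous day)
ts.astimezone(resolve_timezone("dst", "America/Denver")).hour   # → 0 (MDT, UTC-6)
```

The same UTC instant lands on different local days depending on the choice, which is exactly why the form needs to capture it before daily windows are computed.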

hydroserverpy Requirements

  • Add aggregation execution flow:
    • Check phenomenon_end_time for the source and destination datastreams. Short-circuit if there is not at least a full day between them.
    • Fetch source data from 3 days behind the destination datastream's phenomenon_end_time up to the current date.
    • For each closed day in the range, apply the selected aggregation statistic.
    • Save the new values to the destination datastream.
    • Instead of using hydroserverpy's extract/transform/load pipeline, add a custom path that fetches, aggregates, and loads more efficiently. Make sure this works for both Celery and the Streaming Data Loader.
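
The windowing steps above can be sketched as follows (an illustrative outline only; `aggregation_window` and `closed_days` are hypothetical names, not hydroserverpy API):

```python
from datetime import datetime, timedelta, timezone

def aggregation_window(source_end, dest_end, now, lookback_days=3):
    """Compute the [start, end) re-aggregation window, or None to short-circuit.

    Skips the run unless the source extends at least one full day past the
    destination; refetches `lookback_days` behind the destination's
    phenomenon_end_time so late-arriving corrections are picked up.
    """
    if source_end - dest_end < timedelta(days=1):
        return None
    start = (dest_end - timedelta(days=lookback_days)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    # Only aggregate fully closed days: stop at today's midnight.
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, end

def closed_days(start, end):
    """Yield one (day_start, day_end) range per closed day in the window."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)

src = datetime(2024, 6, 10, tzinfo=timezone.utc)
dst = datetime(2024, 6, 5, tzinfo=timezone.utc)
now = datetime(2024, 6, 10, 8, 30, tzinfo=timezone.utc)
window = aggregation_window(src, dst, now)
len(list(closed_days(*window)))  # → 8 (June 2 through June 9)
```

Each yielded day range would then be fetched, reduced with the configured statistic, and written to the destination datastream in one pass, avoiding the generic extract/transform/load hops.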

Backend Requirements

  • Add user-friendly logs that allow a workspace owner to easily see the status of their aggregation tasks
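
For example, each run could record one plain-language status line per task (a hypothetical helper for illustration; HydroServer's actual log model may differ):

```python
def run_summary(task_name, days_written, error=None):
    """Build the user-facing status line for one aggregation task run.

    Hypothetical helper: keeps the owner-visible log free of stack traces,
    which can stay in the technical log instead.
    """
    if error is None:
        return f"{task_name}: aggregated {days_written} day(s) successfully"
    return f"{task_name}: failed after {days_written} day(s): {error}"

run_summary("beaver_mt daily storage", 3)
# → "beaver_mt daily storage: aggregated 3 day(s) successfully"
```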

Acceptance Criteria

  • Owner/Editor can configure an aggregation task with one or more mappings.
  • Task runs on schedule and via run-now.
  • Output daily values match selected statistic.
  • The aggregation window reliably converts sub-daily values into daily values.
  • UI displays task state and logs.
  • Failures are logged and notification emails are sent.
  • Viewers cannot modify task configuration.

Testing

  • Backend: aggregation correctness, schedule/run-now, error+notification paths, permissions.
  • Frontend: form validation, role-based actions, task status/log rendering.

Metadata

Labels: backend (backend repository), data mgmt app (data management app), python client (hydroserverpy)
