Fix archive data fidelity and add on-demand NetCDF download #26

Merged: cchwala merged 3 commits into main from improve_data_generator on Mar 10, 2026

@cchwala commented on Mar 10, 2026

PR summary

  • Fix flat-plateau bug: _get_netcdf_index_for_timestamp was stretching
    the source file across the full archive period; it now cycles through
    the source data at its native pace, eliminating identical-value plateaus.
  • On-demand download: ensure_netcdf_file() fetches the 3-month /
    10-second OpenMRG file automatically on startup if it is not present
    (URL from the NETCDF_FILE_URL env var); used by both archive_generator
    and mno_simulator.
  • Faster generation: a single contiguous isel(slice(...)).values call
    replaces thousands of individual isel() calls; load time drops from
    minutes to ~1 s.
  • New defaults: 1-day archive at 10-second resolution
    (overridable via ARCHIVE_DAYS / ARCHIVE_INTERVAL_SECONDS).
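
To illustrate the plateau fix, here is a minimal sketch of the two index mappings. The body of _get_netcdf_index_for_timestamp is not shown in this PR, so the function names `stretched_index` and `native_index`, their signatures, and the integer arithmetic are assumptions for illustration only. `original_duration` is taken here to be the number of source time steps multiplied by the source step.

```python
def stretched_index(elapsed_s: int, n_steps: int, loop_duration_s: int) -> int:
    """Assumed old behaviour: spread n_steps source samples over the whole loop.

    When loop_duration_s is longer than the source data, many consecutive
    timestamps land on the same source index, which shows up as a plateau.
    """
    return (elapsed_s % loop_duration_s) * n_steps // loop_duration_s


def native_index(elapsed_s: int, n_steps: int, step_s: int) -> int:
    """Assumed new behaviour: advance one source sample per source step.

    The source file wraps around after original_duration_s seconds, so the
    data repeats instead of being stretched.
    """
    original_duration_s = n_steps * step_s
    return (elapsed_s % original_duration_s) // step_s


# Hypothetical numbers: 6 source samples at 10 s (60 s of data),
# replayed into a 600 s archive sampled every 10 s.
timestamps = range(0, 120, 10)
old = [stretched_index(t, n_steps=6, loop_duration_s=600) for t in timestamps]
new = [native_index(t, n_steps=6, step_s=10) for t in timestamps]
# old == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   (plateau, then a jump)
# new == [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]   (cycles at native pace)
```

With the stretched mapping, ten consecutive archive rows reuse source sample 0 before jumping to sample 1; with the native mapping, every row advances by one sample and the source simply wraps.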

cchwala added 3 commits March 10, 2026 21:29
- _get_netcdf_index_for_timestamp: use original_duration as denominator
  instead of loop_duration_seconds so the source file cycles at its
  native 10 s resolution rather than being stretched across the archive
  period (eliminated flat plateaus followed by sudden jumps)
- add ensure_netcdf_file() helper: downloads via temp file with progress
  logging; archive_generator and mno_simulator call it at startup so the
  3-month file is fetched automatically when not already present
- generate_archive.py: add --netcdf-file-url / NETCDF_FILE_URL support;
  switch defaults to 1 day / 10 s; replace per-slice isel() loop with a
  single contiguous slice load (slice(0, max_idx+1)) for fast bulk read
- archive_generator and mno_simulator: set NETCDF_FILE to 3-month file,
  NETCDF_FILE_URL to the KIT download link; volumes writable so the file
  can be persisted to ./parser/example_data/ on the host
- Archive defaults: 1 day, 10 s interval (ARCHIVE_DAYS / ARCHIVE_INTERVAL_SECONDS)
- config.yml: update netcdf_file path to 3-month file
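
The download-if-missing behaviour described above can be sketched as follows. This is not the code from this PR: only the name ensure_netcdf_file and the NETCDF_FILE_URL variable come from the commit messages, while the signature, the `chunk_size` parameter, the `.part` suffix, and the use of `urllib` are assumptions. The sketch writes to a temporary file in the target directory and renames it into place only once the download has finished, so an interrupted download never leaves a truncated file behind.

```python
import logging
import os
import tempfile
import urllib.request
from pathlib import Path

logger = logging.getLogger(__name__)


def ensure_netcdf_file(path, url=None, chunk_size=1 << 20):
    """Return `path`, downloading it first if it does not exist yet.

    Hypothetical signature: `url` falls back to the NETCDF_FILE_URL env var.
    """
    target = Path(path)
    if target.exists():
        return target  # already present, nothing to fetch

    url = url or os.environ.get("NETCDF_FILE_URL")
    if not url:
        raise FileNotFoundError(f"{target} is missing and no download URL is set")

    target.parent.mkdir(parents=True, exist_ok=True)
    # Temp file lives next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            total = resp.headers.get("Content-Length") or "?"
            done = 0
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                tmp.write(chunk)
                done += len(chunk)
                logger.info("Downloaded %d of %s bytes", done, total)
        os.replace(tmp_name, target)  # atomic move into place
    finally:
        # Remove the partial file if the download or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
    return target
```

A service would call this once at startup, for example `ensure_netcdf_file(os.environ["NETCDF_FILE"])`, before opening the dataset; logging every chunk is kept here for simplicity and could be throttled for a multi-gigabyte file.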
@cchwala cchwala merged commit d136099 into main Mar 10, 2026
5 checks passed
codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 57.37705% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.44%. Comparing base (5c081f8) to head (e395975).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| mno_data_source_simulator/data_generator.py | 27.77% | 26 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
- Coverage   72.17%   71.44%   -0.73%     
==========================================
  Files          22       22              
  Lines        1980     2021      +41     
==========================================
+ Hits         1429     1444      +15     
- Misses        551      577      +26     

| Flag | Coverage Δ |
|---|---|
| mno_simulator | 84.18% <57.37%> (-3.99%) ⬇️ |
| parser | 77.91% <ø> (ø) |
| webserver | 44.73% <ø> (ø) |

