acestor is a production dengue intelligence system built around two pipelines:
| Pipeline | Purpose |
|---|---|
| `dengue_prep` | Downloads and prepares raw case and weather data into `prepared_data/` |
| `dengue` | Reads from `prepared_data/`, runs forecasting models, produces maps and reports |
Each pipeline is driven by a single YAML config file, and the two can be run independently or scheduled together.
## Contents

- Prerequisites
- Installation
- Running the pipelines
- Scheduling pipelines
- Configuration guide
- Pipeline stages
- Project layout
- Development

Detailed guides live in `docs/`:

- Architecture
- dengue_prep pipeline
- Running & scheduling both pipelines
- Config reference
- Troubleshooting
## Prerequisites

Before anything else, make sure you have:

- Python 3.10+ — python.org
- uv — fast Python package manager (install guide)
- Geospatial system libraries — required for geopandas/shapely:
  - Mac: `brew install gdal proj geos`
  - Linux (Debian/Ubuntu): `apt-get install gdal-bin libgdal-dev libgeos-dev libproj-dev`
  - Windows: install OSGeo4W or use WSL
- pdflatex (optional) — only needed if `report.compile_pdf: true` in your config
## Installation

1. Clone the repo

   ```bash
   git clone https://github.com/dsih-artpark/acestor.git
   cd acestor
   ```

2. Install dependencies

   ```bash
   uv sync --extra dengue --extra cds --extra s3
   ```

   | Extra | What it adds |
   |---|---|
   | `dengue` | Geospatial + modeling stack (geopandas, scikit-learn, etc.) |
   | `cds` | Copernicus CDS weather downloads |
   | `s3` | AWS S3 storage backend |
3. Set up environment variables

   Copy the example env file and fill in your secrets:

   ```bash
   cp .env.example .env   # if it exists, otherwise create .env manually
   ```

   At minimum, set these if you use CDS downloads or email notifications:

   ```
   CDS_API_KEY=your-key-here
   SMTP_PASSWORD=your-password-here
   ```

Secrets in YAML configs use `${VAR:-default}` syntax — never commit real keys.
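As a quick preflight, you can fail fast when required secrets are missing. A minimal sketch, assuming the variable names above and that the process reads them from its environment:

```python
import os
import sys

# Variable names taken from the example above; adjust to the
# secrets your config actually references.
REQUIRED = ["CDS_API_KEY", "SMTP_PASSWORD"]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Missing environment variables: {', '.join(missing)}", file=sys.stderr)
    sys.exit(1)
print("All required secrets are set.")
```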
## Running the pipelines

Run dengue_prep first to prepare data, then dengue to produce a forecast.
For full details — including how to chain them and schedule both — see docs/RUNNING_PIPELINES.md.

```bash
# Step 1 — prepare data
uv run python -m acestor.run \
  --pipeline pipelines.dengue_prep.pipeline:build_pipeline \
  --config configs/ap_district_prep.yaml

# Step 2 — run forecast
uv run python -m acestor.run \
  --pipeline pipelines.dengue.pipeline:build_pipeline \
  --config configs/ap_district.yaml \
  --run-id my-first-run
```

- `--pipeline` — points to the pipeline builder function
- `--config` — your YAML config file
- `--run-id` — any string to identify this run; outputs go under `{artifacts_base}/{run-id}/`

Exit code 0 = success, non-zero = failure.
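Because the runner signals failure through its exit code, the two steps chain cleanly in a small wrapper. A sketch, assuming the exact commands shown above:

```python
import subprocess
import sys

# Commands copied from the two-step example above.
STEPS = [
    ["uv", "run", "python", "-m", "acestor.run",
     "--pipeline", "pipelines.dengue_prep.pipeline:build_pipeline",
     "--config", "configs/ap_district_prep.yaml"],
    ["uv", "run", "python", "-m", "acestor.run",
     "--pipeline", "pipelines.dengue.pipeline:build_pipeline",
     "--config", "configs/ap_district.yaml",
     "--run-id", "my-first-run"],
]

for step in STEPS:
    result = subprocess.run(step)
    if result.returncode != 0:
        # Stop the chain on the first failure and propagate the exit code.
        sys.exit(result.returncode)
```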
Or use the Makefile shortcuts:

```bash
make run-dengue-pipeline DENGUE_RUN_ID=my-run

# With a custom config:
DENGUE_CONFIG=configs/ap_district.yaml make run-dengue-pipeline DENGUE_RUN_ID=my-run

# Incremental/staged graph (faster, for testing):
make run-dengue-pipeline-incremental DENGUE_RUN_ID=smoke-001
```

Outputs land under `{storages.artifacts.filesystem.base_path}/{run_id}/`:

```
{run_id}/
├── predictions/
├── plots/
├── reports/
└── results/        ← zipped LaTeX bundle, maps zip
```
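To pick up the results bundle from a finished run programmatically, you can glob the results/ folder. A minimal sketch, assuming the layout above; the base path and run id here are placeholders:

```python
from pathlib import Path

# Placeholders: substitute your configured
# storages.artifacts.filesystem.base_path and your run id.
base_path = Path("/path/to/outputs")
run_id = "my-first-run"

results_dir = base_path / run_id / "results"
for bundle in sorted(results_dir.glob("*.zip")):
    print(bundle)  # e.g. the zipped LaTeX bundle and maps zip
```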
## Scheduling pipelines

Use scripts/run_schedules.py to run one or more pipelines on a recurring schedule.
Edit the PIPELINES list at the top of scripts/run_schedules.py:

```python
PIPELINES = [
    {
        "name": "gba-weekly",
        "cron": "0 6 * * 1",  # every Monday at 06:00 UTC
        "pipeline": "pipelines.dengue.pipeline:build_pipeline",
        "config": "configs/gba_stage1_s3.yaml",
    },
    # add more pipelines here
]
```

Cron expression format: `minute hour day month day_of_week`
| Example | Meaning |
|---|---|
| `0 6 * * 1` | Every Monday at 06:00 UTC |
| `0 8 * * *` | Every day at 08:00 UTC |
| `*/30 * * * *` | Every 30 minutes |
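To sanity-check a cron expression before adding it to PIPELINES, you can preview its next fire times. A sketch assuming APScheduler 3.x (the library behind scripts/run_schedules.py), using an expression from the table above:

```python
from datetime import datetime, timedelta, timezone

from apscheduler.triggers.cron import CronTrigger

# One of the example expressions from the table above.
trigger = CronTrigger.from_crontab("0 6 * * 1", timezone="UTC")

# Preview the next three fire times.
now = datetime.now(timezone.utc)
for _ in range(3):
    now = trigger.get_next_fire_time(None, now)
    print(now)
    now += timedelta(seconds=1)  # step just past this hit to find the next one
```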
Foreground (useful for testing):

```bash
uv run python scripts/run_schedules.py
```

Background — Mac/Linux:

```bash
nohup uv run python scripts/run_schedules.py > .acestor/scheduler.out 2>&1 &
echo $!   # prints the PID — save it to stop the scheduler later
```

Background — Windows (PowerShell):

```powershell
Start-Process pythonw -ArgumentList "scripts\run_schedules.py" -WindowStyle Hidden
```

Stop the scheduler (Mac/Linux):

```bash
kill <PID>
```

Each run writes its own log file:

```
logs/
└── gba-weekly/
    ├── run-20260327_060000.log
    └── run-20260403_060000.log
```

Watch a run live:

```bash
tail -f logs/gba-weekly/run-20260327_060000.log
```

List all runs for a pipeline:

```bash
ls -lht logs/gba-weekly/
```

If the scheduler was briefly down and missed a scheduled run, it will catch up automatically (within a 1-hour grace window).
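The catch-up behavior corresponds to APScheduler's misfire handling. A sketch of how a job with a 1-hour grace window might be registered, assuming APScheduler 3.x; run_pipeline is a placeholder:

```python
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger


def run_pipeline():
    # Placeholder for the actual pipeline invocation.
    print("running gba-weekly")


scheduler = BlockingScheduler(timezone="UTC")
scheduler.add_job(
    run_pipeline,
    CronTrigger.from_crontab("0 6 * * 1", timezone="UTC"),
    id="gba-weekly",
    misfire_grace_time=3600,  # still fire if the scheduler was down < 1 hour
)
scheduler.start()
```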
For production deployment on Docker or AWS EC2 — including systemd setup, IAM roles, S3 artifact storage, and log monitoring — see docs/DEPLOYMENT.md.
A pre-built Docker image is available on Docker Hub:
```bash
docker pull dsihartpark/acestor:latest
```

https://hub.docker.com/repository/docker/dsihartpark/acestor
Images are tagged by version (e.g., dsihartpark/acestor:1.0.0) and built automatically on every GitHub release tag.
## Configuration guide

Start from an example config in configs/ — gba_docker_test.yaml is a good starting point.
For a full reference of every config key, see docs/CONFIG_REFERENCE.md.

```yaml
pipeline:
  name: dengue
  title: "Dengue Intelligence"      # used in report titles

run:
  run_date: "2026-03-18"            # the reference date for this run

storages:
  artifacts:
    filesystem:
      base_path: "/path/to/outputs" # where all run outputs are written

data:
  case_download:
    enabled: true
    source_path: "datasets/raw_linelist_data/..."
  geojson:
    base_path: "datasets/geojsons/geojsons_GBA"

email:                              # optional — run notifications
  enabled: false
  on: [success, failed]
  smtp_host: smtp.example.com
  to: [you@example.com]

report:
  compile_pdf: false                # set true if pdflatex is installed
```

Use `${VAR}` or `${VAR:-default}` anywhere in the config — they are resolved at load time:

```yaml
email:
  smtp_password: "${SMTP_PASSWORD}"
```
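Placeholder resolution like this can be implemented as a substitution pass over loaded config strings. A sketch of the mechanism, not the project's actual loader:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR} / ${VAR:-default} with environment values."""
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            return os.environ[name]
        if default is not None:
            return default
        raise KeyError(f"Environment variable {name} is not set and has no default")
    return _PLACEHOLDER.sub(substitute, value)

# Example: resolve_env("${SMTP_PASSWORD:-changeme}") -> env value, or "changeme"
```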
## Pipeline stages

### dengue_prep

Steps execute in DAG order. Names match logs and code under pipelines/dengue_prep/steps/.

| # | Step | What it does |
|---|---|---|
| 1 | `download_case_data` | Locates raw case files (filesystem or S3) |
| 2 | `parse_case_data` | Parses IHIP files → daily case counts by region, upserts into `prepared_data/` |
| 3 | `download_weather_data` | Downloads weather from OpenMeteo, CDS, or a pre-parsed source |
| 4 | `parse_weather_data` | Aggregates to daily weather features, upserts into `prepared_data/` |
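"Upsert" in steps 2 and 4 means freshly parsed rows replace overlapping ones rather than being appended. A minimal sketch of the idea with pandas; the file format, key columns, and helper name are hypothetical, not the pipeline's actual schema:

```python
from pathlib import Path

import pandas as pd

def upsert_daily(new_rows: pd.DataFrame, path: Path, keys=("region", "date")) -> None:
    """Merge new daily rows into an existing table, newest rows winning."""
    if path.exists():
        existing = pd.read_parquet(path)
        combined = pd.concat([existing, new_rows], ignore_index=True)
        # keep="last" lets freshly parsed rows overwrite older ones
        combined = combined.drop_duplicates(subset=list(keys), keep="last")
    else:
        combined = new_rows
    combined.sort_values(list(keys)).to_parquet(path, index=False)
```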
### dengue

Steps execute in DAG order. Names match logs and code under pipelines/dengue/steps/.

| # | Step | What it does |
|---|---|---|
| 1 | `identify_sampling_day` | Resolves the case window and run metadata |
| 2 | `parse_case_data` | Loads prepared daily case series from `prepared_data/` |
| 3 | `validate_case_data_sufficiency` | Optional gate — stops early if data is too thin |
| 4 | `parse_weather_data` | Loads prepared daily weather features from `prepared_data/` |
| 5 | `identify_cutoff_dates` | Case/weather cutoffs and prediction calendar |
| 6 | `generate_thresholds` | Builds threshold tables from history + config |
| 7 | `train_and_predict` | Fits models, writes predictions |
| 8 | `combine_predictions` | Single combined predictions table |
| 9 | `assess_thresholds` | Threshold assessment + figure metadata |
| 10 | `generate_maps` | Choropleth map PNGs |
| 11 | `generate_report` | JSON + LaTeX bundle + maps zip + optional PDF |
| 12 | `notify_run` | Sends success email (if configured) |
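"DAG order" means each step runs only after everything it depends on has finished. A sketch of the idea with Python's stdlib graphlib; the dependency edges below are illustrative, not the pipeline's real graph:

```python
from graphlib import TopologicalSorter

# Illustrative edges only: each step maps to the steps it depends on.
graph = {
    "parse_case_data": {"identify_sampling_day"},
    "validate_case_data_sufficiency": {"parse_case_data"},
    "parse_weather_data": {"identify_sampling_day"},
    "identify_cutoff_dates": {"validate_case_data_sufficiency", "parse_weather_data"},
    "generate_thresholds": {"identify_cutoff_dates"},
    "train_and_predict": {"identify_cutoff_dates", "generate_thresholds"},
}

for step in TopologicalSorter(graph).static_order():
    print(step)  # each step appears only after everything it depends on
```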
## Project layout

```
acestor-v2/
├── acestor/                  # Core runtime: config, orchestration, storage, CLI
├── pipelines/
│   ├── dengue_prep/          # Data preparation pipeline (download + parse)
│   └── dengue/               # Forecast pipeline (model + maps + report)
├── configs/                  # Example YAML configs (ap_district.yaml, ap_district_prep.yaml, …)
├── docs/                     # Guides: Architecture, DENGUE_PREP, RUNNING_PIPELINES, CONFIG_REFERENCE, …
├── scripts/
│   ├── run_schedules.py      # Multi-pipeline APScheduler process
│   └── install_schedule.py   # Crontab installer (single-pipeline alternative)
├── logs/                     # Per-run log files (created at runtime)
├── Dockerfile
└── pyproject.toml
```
## Development

```bash
# Install with dev extras
uv sync --all-extras

# Lint, format, test
make lint
make format
make test
```

Pre-commit hooks: `pre-commit install`
- For ARTPARK deployments and collaboration: artpark.in
- GitHub: github.com/dsih-artpark/acestor
- Issues: github.com/dsih-artpark/acestor/issues