📊 Add income distribution S3 export for bespoke viz #6128

Merged
marcelgerber merged 3 commits into master from data-add-income-distribution
May 18, 2026

Conversation

@marcelgerber
Member

@marcelgerber marcelgerber commented May 15, 2026

Our other bespoke viz also upload data to the owid-public S3 bucket (see e.g. #6067 and https://github.com/owid/etl/blob/6e3a93b6d1afcbc99771e00556162131eb3f9c59/etl/steps/export/s3/ihme_gbd/latest/gbd_treemap_json.py), so let's do that for the income distribution plot, too. We also transform the data into a JSON format that's easily consumable in JS.

@paarriagadap If you want, you could now get rid of the data://external/poverty_inequality/latest/thousand_bins_distribution step. I believe that specifying the latest version in this step's DAG should be enough.

Summary

  • Add an S3 export step for the 2026 World Bank PIP thousand-bin income distribution data.
  • Register export://s3/poverty_inequality/latest/income-distribution in the poverty and inequality DAG.
  • Emit compact country-keyed JSON at data/poverty-inequality/income-distribution.2026.json.
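
The "compact country-keyed JSON" mentioned above could look roughly like the following. This is only a minimal sketch: the input row layout (`country`, `quantile`, `income`), the sample values, and the helper names `to_country_keyed` and `dumps_compact` are assumptions made for illustration, not the columns or functions of the actual export step.

```python
import json
from collections import defaultdict

# Hypothetical input rows: (country, quantile, income).
# The real step's columns and values may differ.
rows = [
    ("Chile", 1, 1.25),
    ("Chile", 2, 2.5),
    ("Kenya", 1, 0.5),
    ("Kenya", 2, 0.75),
]


def to_country_keyed(rows):
    """Group incomes by country, ordered by quantile within each country."""
    by_country = defaultdict(list)
    for country, _quantile, income in sorted(rows, key=lambda r: (r[0], r[1])):
        by_country[country].append(income)
    return dict(by_country)


def dumps_compact(obj):
    """Serialize without the default whitespace after ',' and ':'."""
    return json.dumps(obj, separators=(",", ":"))


payload = dumps_compact(to_country_keyed(rows))
print(payload)
```

Keying by country lets a JS consumer look up one country's distribution directly, and dropping the separator whitespace trims the file size without changing the parsed result.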

Validation

  • DRY_RUN=1 .venv/bin/etlr export://s3/poverty_inequality/latest/income-distribution --export --private

@owidbot
Contributor

owidbot commented May 15, 2026

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-data-add-income-distribution

chart-diff: ✅ No charts for review.
data-diff: ✅ No differences found

Automatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2026-05-15 12:27:40 UTC
Execution time: 7.33 seconds

@marcelgerber marcelgerber changed the title from "📊 Add income distribution S3 export" to "📊 Add income distribution S3 export for bespoke viz" May 15, 2026
@marcelgerber marcelgerber requested a review from paarriagadap May 15, 2026 12:32
@marcelgerber marcelgerber marked this pull request as ready for review May 15, 2026 12:51
Contributor

@paarriagadap paarriagadap left a comment

Looks good!

My only comment is about hardcoding the year, which is not so future-proof. I would prefer to assign the latest year instead.

It's true that the previous step looks redundant now, but I am keeping it for now, because I might use it for static charts.

# S3 bucket name and folder where dataset files will be stored.
S3_BUCKET_NAME = "owid-public"
S3_DATA_DIR = Path("data/poverty-inequality")
EXPORT_YEAR = 2026
Contributor

I would prefer this to be the latest year rather than hardcoded to 2026.

S3_BUCKET_NAME = "owid-public"
S3_DATA_DIR = Path("data/poverty-inequality")
EXPORT_YEAR = 2026
OUTPUT_FILE = f"income-distribution.{EXPORT_YEAR}.json"
Contributor

Same here: hardcoding the year can be a problem for future versions of the data.

@marcelgerber
Member Author

I've now changed it so that it always creates a data file for each year. Then we don't need to hardcode anything here, and can just flexibly use whichever data file we need.
It's not like it costs us much in terms of time or space, anyhow.
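
The merged code is not shown in this conversation, so the following is only a sketch of the "one data file per year" idea, under stated assumptions: `S3_DATA_DIR` and the `income-distribution.{year}.json` naming come from the diff quoted above, while the helper `output_paths` and the example years are hypothetical.

```python
from pathlib import Path

# Taken from the diff under review.
S3_DATA_DIR = Path("data/poverty-inequality")


def output_paths(years):
    """Map each distinct year in the data to its own output path.

    Hypothetical helper: it replaces a hardcoded EXPORT_YEAR by deriving
    the file names from whatever years the data contains.
    """
    return {
        year: S3_DATA_DIR / f"income-distribution.{year}.json"
        for year in sorted(set(years))
    }


# Example years are made up for illustration.
paths = output_paths([2025, 2026, 2026])
for year, path in paths.items():
    print(year, path.as_posix())
```

With this shape, a consumer picks whichever year's file it needs, and a new data release adds a file without any constant having to change.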

@marcelgerber marcelgerber merged commit df5fb10 into master May 18, 2026
4 checks passed
@marcelgerber marcelgerber deleted the data-add-income-distribution branch May 18, 2026 08:22

3 participants