Description
Author of Proposal: maintainer
Reason or problem
Notebook 28 covers line rasterization and custom merge functions, but only with synthetic geometry. There's nothing showing how these work on real data, so it's hard to see what a custom merge actually does when you throw a few million rows at it.
Proposal
Add a user guide notebook (33_NYC_Taxi_Lines.ipynb) that rasterizes NYC yellow taxi trips as lines with a custom non-linear merge function.
Design:
- Pull the January 2025 yellow taxi parquet from the TLC public dataset
- Build LineStrings from pickup/dropoff coordinates
- Rasterize onto a Manhattan grid using built-in merges (count, sum) and a custom log-fare merge
- Show how the log-fare merge compresses dynamic range compared with a linear sum: high-volume corridors dominate the linear sum, while the log version reveals what is happening in the rest of the city
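The dynamic-range point in the last step can be sketched without the library. This is a minimal NumPy-only illustration, not the rasterize API: `rasterize_lines`, the `merge_*` helpers, and `dynamic_range` are hypothetical names, the line drawing is a naive per-cell stepping scheme, and "log-fare merge" is read here as keeping `log1p` of the running fare total in each cell, which is only one plausible interpretation of the proposal.

```python
import numpy as np

def merge_count(old, value):
    # Built-in style merge: number of lines touching each cell.
    return old + 1.0

def merge_sum(old, value):
    # Built-in style merge: linear total of the burned value.
    return old + value

def merge_log_sum(old, value):
    # Hypothetical custom merge: each cell holds log1p(total fare).
    # Undo the log, add the new fare, then re-apply the log.
    return np.log1p(np.expm1(old) + value)

def rasterize_lines(starts, ends, values, shape, merge):
    """Burn straight segments into a grid, merging overlaps cell by cell.

    starts, ends: (n, 2) integer arrays of (row, col) cells inside the grid.
    """
    grid = np.zeros(shape, dtype=float)
    for (r0, c0), (r1, c1), v in zip(starts, ends, values):
        # One sample per step along the longer axis, so cells are unique per line.
        n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
        rows = np.rint(np.linspace(r0, r1, n)).astype(int)
        cols = np.rint(np.linspace(c0, c1, n)).astype(int)
        grid[rows, cols] = merge(grid[rows, cols], v)
    return grid

def dynamic_range(grid):
    # Ratio of the brightest to the dimmest cell that any line touched.
    hit = grid[grid > 0]
    return hit.max() / hit.min()

# Toy data: 200 trips share one busy corridor, a single trip uses a quiet street.
starts = np.array([[5, 0]] * 200 + [[15, 0]])
ends = np.array([[5, 19]] * 200 + [[15, 19]])
fares = np.full(len(starts), 20.0)

linear = rasterize_lines(starts, ends, fares, (20, 20), merge_sum)
logged = rasterize_lines(starts, ends, fares, (20, 20), merge_log_sum)

print("linear range:", dynamic_range(linear))
print("log range:   ", dynamic_range(logged))
```

In the linear grid the corridor is 200 times brighter than the quiet street, so a linear colormap would render the quiet street as nearly empty. In the log grid the same two cells differ by less than a factor of three.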
Usage:
The notebook needs one parquet download, filters the trips to Manhattan, and runs on a laptop. It requires no library changes.
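The Manhattan filter and the mapping onto grid cells could look roughly like the sketch below. Everything here is an assumption for illustration: the bounding box values are approximate, the helper names (`in_bbox`, `filter_trips`, `to_grid`) are hypothetical, and the inputs are taken to be plain arrays of pickup and dropoff longitude and latitude, since the proposal does not say which columns supply the coordinates.

```python
import numpy as np

# Approximate Manhattan bounding box in degrees (assumed values, not from the proposal).
LON_MIN, LON_MAX = -74.03, -73.90
LAT_MIN, LAT_MAX = 40.70, 40.88

def in_bbox(lon, lat):
    # True where a point lies inside the bounding box.
    return (lon >= LON_MIN) & (lon <= LON_MAX) & (lat >= LAT_MIN) & (lat <= LAT_MAX)

def filter_trips(pu_lon, pu_lat, do_lon, do_lat):
    # Keep only trips whose pickup and dropoff both fall inside the box.
    return in_bbox(pu_lon, pu_lat) & in_bbox(do_lon, do_lat)

def to_grid(lon, lat, width, height):
    # Map lon/lat to integer (row, col) cells; row 0 is the northern edge.
    col = np.floor((lon - LON_MIN) / (LON_MAX - LON_MIN) * width).astype(int)
    row = np.floor((LAT_MAX - lat) / (LAT_MAX - LAT_MIN) * height).astype(int)
    # Points exactly on the far edges would land one cell outside, so clip them.
    return np.clip(row, 0, height - 1), np.clip(col, 0, width - 1)

# Two toy trips: one within the box, one starting well east of it.
pu_lon = np.array([-73.98, -73.78])
pu_lat = np.array([40.75, 40.64])
do_lon = np.array([-73.96, -73.97])
do_lat = np.array([40.78, 40.76])

keep = filter_trips(pu_lon, pu_lat, do_lon, do_lat)
rows, cols = to_grid(pu_lon[keep], pu_lat[keep], width=100, height=100)
```

A rectangular box also catches slivers of New Jersey, Brooklyn, and Queens, so a real notebook would probably want a polygon test instead.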
Value:
A working example of line rasterization on real data, plus a concrete case where a non-linear merge tells you something the built-in modes don't.
Stakeholders and impacts
Useful for people learning the rasterize API. No code changes to the library.
Drawbacks
The parquet file is around 100 MB.
Alternatives
Synthetic data would avoid the download, but notebook 28 already does that.
Unresolved questions
None.
Additional notes or context
Parquet source: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet