Add Porto taxi trajectory rasterization example with custom non-linear merge #1021

@brendancol

Description

Author of Proposal: maintainer

Reason or problem

Notebook 28 covers line rasterization and custom merge functions, but only with synthetic geometry. There's nothing showing how these work on real data, so it's hard to see what a custom merge actually does when you throw a few million rows at it.

Proposal

Add a user guide notebook (33_NYC_Taxi_Lines.ipynb) that rasterizes NYC yellow taxi trips as lines with a custom non-linear merge function.

Design:

  • Pull the January 2025 yellow taxi parquet from the TLC public dataset
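  • Build LineStrings from pickup/dropoff locations. The 2025 records carry taxi zone IDs (PULocationID, DOLocationID) rather than latitude/longitude, so endpoints have to come from zone geometry, e.g. zone centroids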
  • Rasterize onto a Manhattan grid using built-in merges (count, sum) and a custom log-fare merge
  • Show how log-sum compresses dynamic range vs linear sum. High-volume corridors dominate the linear sum; the log version lets you see what's happening in the rest of the city.
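The rasterize-with-merge step can be sketched in plain NumPy to make the merge semantics concrete. This is not the library's rasterize API: the merge(acc, value) signature, the helper names (to_cell, line_cells, rasterize_lines, merge_log_fare) and reading "log-fare" as "add log1p(fare) per trip" are all assumptions for illustration. Segments are burned with Bresenham's algorithm, so each trip touches each cell at most once.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple

import numpy as np

Point = Tuple[float, float]
# Assumed merge signature: (current cell value, incoming trip value) -> new cell value.
Merge = Callable[[float, float], float]


def merge_count(acc: float, value: float) -> float:
    return acc + 1.0


def merge_sum(acc: float, value: float) -> float:
    return acc + value


def merge_log_fare(acc: float, value: float) -> float:
    # Hypothetical "log-fare" merge: add log1p(fare) instead of the raw fare.
    return acc + math.log1p(value)


def to_cell(x, y, shape, bounds) -> Tuple[int, int]:
    """Map world (x, y) to (col, row); row 0 is the top (ymax) edge."""
    rows, cols = shape
    xmin, ymin, xmax, ymax = bounds
    col = math.floor((x - xmin) / (xmax - xmin) * cols)
    row = math.floor((ymax - y) / (ymax - ymin) * rows)
    return col, row


def line_cells(c0, r0, c1, r1) -> Iterator[Tuple[int, int]]:
    """Bresenham's line: yield each (col, row) on the segment exactly once."""
    dc, dr = abs(c1 - c0), -abs(r1 - r0)
    sc = 1 if c0 < c1 else -1
    sr = 1 if r0 < r1 else -1
    err = dc + dr
    while True:
        yield c0, r0
        if c0 == c1 and r0 == r1:
            return
        e2 = 2 * err
        if e2 >= dr:
            err += dr
            c0 += sc
        if e2 <= dc:
            err += dc
            r0 += sr


def rasterize_lines(segments: Sequence[Tuple[Point, Point]], values: Sequence[float],
                    shape, bounds, merge: Merge, fill: float = 0.0) -> np.ndarray:
    """Burn each segment into a grid, folding its value into every cell it crosses."""
    rows, cols = shape
    out = np.full(shape, fill, dtype=np.float64)
    for (p0, p1), value in zip(segments, values):
        c0, r0 = to_cell(*p0, shape, bounds)
        c1, r1 = to_cell(*p1, shape, bounds)
        for c, r in line_cells(c0, r0, c1, r1):
            if 0 <= r < rows and 0 <= c < cols:  # skip cells outside the grid
                out[r, c] = merge(out[r, c], value)
    return out


# Two toy trips sharing the bottom-left cell of a 4x4 grid.
segments = [((0.5, 0.5), (3.5, 0.5)), ((0.5, 0.5), (0.5, 3.5))]
fares = [10.0, 100.0]
shape, bounds = (4, 4), (0.0, 0.0, 4.0, 4.0)

counts = rasterize_lines(segments, fares, shape, bounds, merge_count)
sums = rasterize_lines(segments, fares, shape, bounds, merge_sum)
logs = rasterize_lines(segments, fares, shape, bounds, merge_log_fare)
```

Swapping merge_sum for merge_log_fare changes only the callable. A pure-Python inner loop like this is far too slow for millions of trips; it is only meant to show what each merge does to a shared cell.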

Usage:
One parquet download, filters to Manhattan, runs on a laptop. No library changes.
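The single download and the Manhattan filter might look like the sketch below. It assumes pandas with a parquet engine (pyarrow) and network access, and the TLC column names PULocationID, DOLocationID and fare_amount. The zone lookup CSV URL and its LocationID/Borough columns are assumptions here, as is the choice to drop non-positive fares. Because the trip records hold zone IDs rather than coordinates, Manhattan is selected by zone ID; converting zones into line endpoints is left out.

```python
import pandas as pd

# Trip data URL from this proposal.
TRIPS_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet"
# Assumption: TLC's taxi zone lookup table (LocationID -> Borough, Zone).
ZONES_URL = "https://d37ci6vzurychx.cloudfront.net/misc/taxi_zone_lookup.csv"

# Read only the columns the example needs to keep memory down on a laptop.
TRIP_COLUMNS = ["PULocationID", "DOLocationID", "fare_amount"]


def load_raw(trips_url: str = TRIPS_URL, zones_url: str = ZONES_URL):
    """Download trips and the zone lookup (needs a parquet engine and network)."""
    trips = pd.read_parquet(trips_url, columns=TRIP_COLUMNS)
    zones = pd.read_csv(zones_url)
    return trips, zones


def manhattan_trips(trips: pd.DataFrame, zones: pd.DataFrame) -> pd.DataFrame:
    """Keep trips that start and end in Manhattan with a positive fare."""
    manhattan_ids = set(zones.loc[zones["Borough"] == "Manhattan", "LocationID"])
    in_manhattan = (trips["PULocationID"].isin(manhattan_ids)
                    & trips["DOLocationID"].isin(manhattan_ids))
    # Non-positive fares occur in the raw records; dropping them keeps a log merge finite.
    positive_fare = trips["fare_amount"] > 0
    return trips.loc[in_manhattan & positive_fare]


# Usage (downloads the data):
# trips, zones = load_raw()
# trips = manhattan_trips(trips, zones)
```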

Value:
A working example of line rasterization on real data, plus a concrete case where a non-linear merge tells you something the built-in modes don't.
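Where the log is applied decides whether it compresses volume-driven range. The arithmetic below uses made-up cells (not TLC data): a busy corridor cell with 5,000 trips and a quiet cell with 20, all at the same $15 fare. Taking log1p of each cell's accumulated total squeezes the 250x gap to under 2x, while summing log1p of each fare leaves the ratio at 250x, so a per-trip log only compresses fare variation, not trip counts.

```python
import math


def range_ratios(n_busy: int, n_quiet: int, fare: float) -> dict:
    """Busy-to-quiet cell ratio under three ways of combining identical fares."""
    busy_sum, quiet_sum = n_busy * fare, n_quiet * fare
    return {
        "linear sum": busy_sum / quiet_sum,
        # log applied once to each cell's total
        "log1p(sum)": math.log1p(busy_sum) / math.log1p(quiet_sum),
        # log applied to each trip, then summed per cell
        "sum(log1p)": (n_busy * math.log1p(fare)) / (n_quiet * math.log1p(fare)),
    }


# Illustrative numbers only.
ratios = range_ratios(n_busy=5000, n_quiet=20, fare=15.0)
for name, r in ratios.items():
    print(f"{name:>10}: {r:7.2f}x")
```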

Stakeholders and impacts

Useful for people learning the rasterize API. No code changes to the library.

Drawbacks

The parquet file is around 100 MB.

Alternatives

Synthetic data would avoid the download, but notebook 28 already does that.

Unresolved questions

None.

Additional notes or context

Parquet source: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet
