Improve polygonize performance: single-pass tracing, JIT merge helpers, batch shapely #1010
Merged
brendancol merged 3 commits into master on Mar 16, 2026
…s, batch shapely (#1008)

Replace the two-pass _follow with a single-pass implementation using a dynamically-grown buffer. This eliminates retracing every polygon boundary a second time, which was the dominant cost for rasters with many small regions.

Add @ngjit to _point_in_ring, _simplify_ring, and _signed_ring_area so the dask chunk-merge path runs compiled instead of interpreted.

Use shapely.polygons() batch constructor for hole-free polygons in _to_geopandas (shapely 2.0+, with fallback for older versions).
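To illustrate what compiling the merge helpers involves, here is a minimal sketch of two ring helpers written as plain loops and decorated for numba. It is not the project's code: the function names, the closed `(n, 2)` ring layout, and the local `ngjit` stand-in (assumed here to be nopython mode with the GIL released) are assumptions made for the example.

```python
import numpy as np

try:
    from numba import njit
    # Hypothetical stand-in for the project's @ngjit decorator.
    ngjit = njit(nogil=True)
except ImportError:
    # Without numba the same loops still run, just interpreted.
    def ngjit(func):
        return func


@ngjit
def signed_ring_area(ring):
    """Shoelace formula over a closed (n, 2) ring; positive if counter-clockwise."""
    total = 0.0
    for i in range(ring.shape[0] - 1):
        total += ring[i, 0] * ring[i + 1, 1] - ring[i + 1, 0] * ring[i, 1]
    return 0.5 * total


@ngjit
def point_in_ring(px, py, ring):
    """Even-odd ray casting test against a closed (n, 2) ring."""
    inside = False
    for i in range(ring.shape[0] - 1):
        x0, y0 = ring[i, 0], ring[i, 1]
        x1, y1 = ring[i + 1, 0], ring[i + 1, 1]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y0 > py) != (y1 > py):
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
area_ccw = signed_ring_area(square)
area_cw = signed_ring_area(square[::-1].copy())
```

Loops of this shape are slow when interpreted and compile well because they touch only scalars and a contiguous float array.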
- Buffer growth: snake-shaped polygon with >64 boundary points
- JIT merge helpers: direct tests of _simplify_ring, _signed_ring_area, _point_in_ring
- Dask merge: checkerboard pattern forcing many boundary merges
- Geopandas batch: mixed hole-free and holed polygons through the shapely.polygons() batch path
…1008)

Benchmarks showed the single-pass _follow was 15-30% slower than the original two-pass version. The buffer growth check in the inner loop adds overhead that numba doesn't optimize away, and the two-pass approach benefits from the data being in cache on the second pass.

Reverted _follow to the original two-pass implementation. The JIT merge helpers (2.3-2.6x dask speedup) and batch shapely construction (1.3-1.6x geopandas speedup) are kept.
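The trade-off between the two strategies can be seen in a toy version of the pattern. This sketch does not reproduce _follow; it collects change points in a 1-D array as a stand-in for boundary tracing, and the function names and the initial capacity of 64 are assumptions for illustration.

```python
import numpy as np


def collect_two_pass(values):
    # Pass 1: count the outputs so the array can be allocated exactly once.
    n = 0
    for i in range(1, len(values)):
        if values[i] != values[i - 1]:
            n += 1
    out = np.empty(n, dtype=np.int64)
    # Pass 2: walk the same data again and fill the preallocated array.
    k = 0
    for i in range(1, len(values)):
        if values[i] != values[i - 1]:
            out[k] = i
            k += 1
    return out


def collect_single_pass(values, initial_capacity=64):
    out = np.empty(initial_capacity, dtype=np.int64)
    k = 0
    for i in range(1, len(values)):
        if values[i] != values[i - 1]:
            # The capacity check runs on every append, inside the hot loop.
            if k == out.shape[0]:
                grown = np.empty(2 * out.shape[0], dtype=np.int64)
                grown[:k] = out[:k]
                out = grown
            out[k] = i
            k += 1
    return out[:k].copy()


values = np.arange(200) % 2
two_pass = collect_two_pass(values)
single_pass = collect_single_pass(values)
```

Both functions return the same result. The single-pass form avoids the second traversal but pays for a branch on each append plus occasional copies, which is the kind of overhead the commit message describes.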
Summary
Two internal performance changes to polygonize. No API changes.

- _simplify_ring, _signed_ring_area, and _point_in_ring were plain Python loops in the dask chunk-merge path. Added @ngjit so they compile with numba.
- _to_geopandas now batch-constructs hole-free polygons via shapely.linearrings() + shapely.polygons() on shapely 2.0+. Polygons with holes and older shapely fall back to the scalar constructor.

(An earlier commit tried replacing the two-pass _follow with a single-pass buffer-growth approach. Benchmarks showed it was 15-30% slower -- numba already optimizes the two-pass loop well, and the buffer growth check added inner-loop overhead. Reverted.)

Benchmarks

- _point_in_ring, 10k pts
- _simplify_ring, 5k pts

Test plan

- _simplify_ring, _signed_ring_area, _point_in_ring

Closes #1008
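For readers unfamiliar with the shapely 2.0 batch constructors, the following is a minimal sketch of the batch-plus-fallback idea described in the summary. It is not the code in _to_geopandas: the function name `build_polygons` and the input layout (a list of `[exterior, hole, ...]` ring arrays per polygon) are assumptions for the example.

```python
import numpy as np
import shapely
from shapely.geometry import Polygon


def build_polygons(rings_per_polygon):
    """rings_per_polygon: list of [exterior, hole, ...]; each ring is a closed (n, 2) array."""
    result = [None] * len(rings_per_polygon)
    simple = [i for i, rings in enumerate(rings_per_polygon) if len(rings) == 1]

    # Batch path: shapely 2.0+ exposes vectorized constructors at module level.
    if simple and hasattr(shapely, "polygons"):
        exteriors = [rings_per_polygon[i][0] for i in simple]
        coords = np.concatenate(exteriors)
        # One ring index per coordinate lets rings of different lengths share one call.
        ring_index = np.repeat(np.arange(len(exteriors)), [len(r) for r in exteriors])
        linear_rings = shapely.linearrings(coords, indices=ring_index)
        for i, poly in zip(simple, shapely.polygons(linear_rings)):
            result[i] = poly

    # Scalar path: polygons with holes, or everything on older shapely.
    for i, rings in enumerate(rings_per_polygon):
        if result[i] is None:
            result[i] = Polygon(rings[0], rings[1:])
    return result


unit = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
big = unit * 4.0
hole = unit + 1.0
polys = build_polygons([[unit], [big, hole], [big]])
areas = [p.area for p in polys]
```

Splitting on hole count keeps the vectorized call simple, since a single `shapely.polygons()` call over exterior rings needs no per-polygon hole bookkeeping.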