Skip to content

Replace stackstac with odc-stac#270

Merged
jsignell merged 2 commits intomainfrom
feature/replace-stackstac-odcstac
Mar 13, 2026
Merged

Replace stackstac with odc-stac#270
jsignell merged 2 commits intomainfrom
feature/replace-stackstac-odcstac

Conversation

@tylanderson
Copy link
Contributor

@tylanderson tylanderson commented Mar 10, 2026

Replaces stackstac for loading data into xarray with odc-stac via odc.stac.load

No-data masking shown via da.nodata and xarray.where. CRS no longer needs to be explicitly specified

Closes #238

@tylanderson tylanderson requested a review from jsignell March 10, 2026 21:59
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions
Copy link

github-actions bot commented Mar 10, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NASA-IMPACT.github.io/veda-docs/pr-preview/pr-270/

Built to branch gh-pages at 2026-03-12 21:11 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:49Z
----------------------------------------------------------------

We can actually go ahead and delete this note!


@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:50Z
----------------------------------------------------------------

Is setting the dtype really required? That feels odd. Also is it possible to just use chunks="auto" rather than using a dict with the coords as keys?


tylanderson commented on 2026-03-12T17:59:45Z
----------------------------------------------------------------

For dtype, in this case the STAC item defines nodata as inf, but the data is actually int16. I kept dtype as float32 to allow masking that data in xarray.

For chunks, I thought I recalled that you could pass "auto" to chunks, but looking at the latest docs, they want a dict that has each dim and a chunk size or auto for each

jsignell commented on 2026-03-12T20:02:36Z
----------------------------------------------------------------

Ok! Thanks for looking into it. That's a bummer that it's more verbose and you need to know the coords names before opening the data :(

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:51Z
----------------------------------------------------------------

Line #2.    from odc.algo import mask_cleanup, to_float

these two look unused


@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:52Z
----------------------------------------------------------------

Line #7.    data = data.where(data != data.nodata)

Is this necessary?


tylanderson commented on 2026-03-12T18:13:02Z
----------------------------------------------------------------

odc.stac won't generate masked / fill nodata with NaN while loading, so it's expected users do this (unlike something like rioxarray with mask_and_scale). In this case, replacing nodata with NaN allows plotting cleanly

jsignell commented on 2026-03-12T20:03:19Z
----------------------------------------------------------------

Ok cool cool. Thanks for explaining.

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:53Z
----------------------------------------------------------------

Update the text here to make it clear that the data is already clipped coming out of odc.stac.load


@review-notebook-app
Copy link

review-notebook-app bot commented Mar 12, 2026

View / edit / reply to this conversation on ReviewNB

jsignell commented on 2026-03-12T16:26:54Z
----------------------------------------------------------------

Remove this section since the data is already clipped.


tylanderson commented on 2026-03-12T18:06:03Z
----------------------------------------------------------------

This will mask the data outside of the polgyon itself, while above it just limits the data by the bbox of the polygon. I think it's nice to have an example of reducing the initial array via bbox/geopolygon etc, but I could add a comment above to make clear the difference

@@ -107,7 +107,7 @@
"# include all your imports in this cell\n",
Copy link
Collaborator

@jsignell jsignell Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need pandas anymore


Reply via ReviewNB

@jsignell
Copy link
Collaborator

@tylanderson thanks for taking this on! In general I'd be interested in whether using chunks={} gives reasonable performance times.

Copy link
Contributor Author

For dtype, in this case the STAC item defines nodata as inf, but the data is actually int16. I kept dtype as float32 to allow masking that data in xarray.

For chunks, I thought I recalled that you could pass "auto" to chunks, but looking at the latest docs, they want a dict that has each dim and a chunk size or auto for each


View entire conversation on ReviewNB

Copy link
Contributor Author

This will mask the data outside of the polgyon itself, while above it just limits the data by the bbox of the polygon. I think it's nice to have an example of reducing the initial array via bbox/geopolygon etc, but I could add a comment above to make clear the difference


View entire conversation on ReviewNB

Copy link
Contributor Author

odc.stac won't generate masked / fill nodata with NaN while loading, so it's expected users do this (unlike something like rioxarray with mask_and_scale). In this case, replacing nodata with NaN allows plotting cleanly


View entire conversation on ReviewNB

@tylanderson
Copy link
Contributor Author

I'd be interested in whether using chunks={} gives reasonable performance times.

For chunks = {}, I believe what this will do is load using whatever the default chunking is on the layers (this means time will always have a chunk len of 1 for example). I think in many case it could be acceptable for reading single layers

Copy link
Collaborator

Ok! Thanks for looking into it. That's a bummer that it's more verbose and you need to know the coords names before opening the data :(


View entire conversation on ReviewNB

Copy link
Collaborator

Ok cool cool. Thanks for explaining.


View entire conversation on ReviewNB

Copy link
Collaborator

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!!

@jsignell jsignell merged commit 68c68ae into main Mar 13, 2026
1 check passed
@jsignell jsignell deleted the feature/replace-stackstac-odcstac branch March 13, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch to odc-stac once it works in pangeo-notebook image

2 participants