Skip to content

Conversation

@edoyango
Copy link
Collaborator

@edoyango edoyango commented Jan 8, 2026

In addition to adding tests, this "fixes" the numpy and dask Select classes so that they do what is said in the doc string. For example,

from pyearthtools.pipeline.operations.dask.select import Select
import dask.array as da

incoming_data = da.zeros((10,5,2))
select = Select([0])
print(select.apply_func(incoming_data).shape)
select = Select([0, None, 0])
print(select.apply_func(incoming_data).shape)

prints

(10, 5, 1)
(10, 5, 1)

instead of the expected (as per the doc string)

(5,2)
(5)

the old numpy Select does the same thing.

I found the logic difficult to follow, so to correct the behaviour, I rewrote the classes to use numpy/dask's native tuple/slice indexing instead.

The xarray select class does the right thing so didn't need changes.

this fixes the implementation of numpy.Select so that
the results match what is stated in the doc string.
This changes the implementation to use indexing with
native slices.
this fixes the implementation of dask.Select so that
the results match what is stated in the doc string.
This changes the implementation to use indexing with
native slices.
@coveralls
Copy link

Pull Request Test Coverage Report for Build 20802458553

Details

  • 107 of 107 (100.0%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.6%) to 65.631%

Totals Coverage Status
Change from base Build 20305027296: 0.6%
Covered Lines: 10636
Relevant Lines: 15846

💛 - Coveralls

@nikeethr
Copy link
Collaborator

nikeethr commented Jan 12, 2026

I'm also having trouble parsing what was done previously.

As long as:

  • the behavior is consistent with the docs
  • the behavior is consistent with the different "backends", and
  • doesn't regress any tutorials that use Select (in which case its more likely that the tutorial is what needs to be updated),

then the changes seem fine.

I'm not familiar enough with the codebase but Select seems like a pretty common thing which is why I in particular highlighted the last point re: tutorial regression. Also not familiar if (some subset of) tutorials are auto included in the testing - feel free to ignore that last point, if you think there's no chance the tutorials would be affected.

@edoyango
Copy link
Collaborator Author

the behavior is consistent with the docs

The new functionality ensures numpy and dask select match the doc string example. The description, however, is probably outdated as select doc string seems to be select array sections using a given index list - rather than an element.

Operation to select an element from a given array

Select data from a given index

the behavior is consistent with the different "backends"

It's a little hard to say that all the backends have consistent behaviour, as xarray doesn't have a select class - instead it has SelectDataset and SliceDataset. But the behaviour between the dask and numpy ops was, and still is, consistent.

doesn't regress any tutorials that use Select (in which case its more likely that the tutorial is what needs to be updated)

I couldn't find any tutorials that use the Select operation, so there shouldn't be any regressions.

@nikeethr
Copy link
Collaborator

Hmm, well dask and numpy definitely need to be consistent since they are pretty fundemental APIs.

With xarray it actually uses dask or numpy under the hood so I’m not sure why it’s a bit different. We can revisit it later I guess. For now at least the logic is a bit more readable.

odd that there’s no tutorials using select, given it’s pretty much bread and butter for dissecting data. I guess they all use the entire dataset all the time? Or maybe I’m misunderstanding what select does and it’s something more internal. I’m also not sure what the difference between select and slice would be.

Again I’m used to “slice of indices” to “select data”, so a slice is essentially a map to a particular selection, I haven’t looked into Slice in the context of PET to comment further.

In any case I’m fine with the changes given there’s no regression. We can revisit xarray backend being a bit different later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants