Description
See #1215 (comment). Quoting relevant part here:
I also have a quick question about memory usage. Since it worked fine on the other cluster I wanted to see the maximum size of array that will work. I noticed that the operation `a_train = a[train_mask]` requires peak memory of about 2.5 times the size of the original matrix `a`, when the mask `train_mask` selects 50% of the rows (`train_mask` is 1D array).

My expectation was a memory footprint of about 1.5x (1x for `a` + 0.5x for `a_train`). Can you confirm if this 2.5x peak memory usage is expected for boolean mask indexing? If so, could you briefly explain why the temporary memory requirement is so high?
The current implementation of the "masked copy" operation first calculates the offsets of the non-zeroes (which uses an array of size equal to that of the original mask array), then creates the output array and uses the offsets to fill it in. Therefore, the memory overhead you quote is expected, under the existing code. See https://github.com/nv-legate/cupynumeric/blob/main/src/cupynumeric/index/advanced_indexing.cu#L120.
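The two-phase approach described above can be illustrated with a small NumPy sketch. This is not the cuPyNumeric kernel: the function name `masked_copy_two_pass` is hypothetical, and the sequential loop only stands in for what would be a parallel scatter on the GPU. It shows where the mask-sized temporary comes from.

```python
import numpy as np

def masked_copy_two_pass(a, mask):
    """Select rows of `a` where `mask` is True via an explicit offsets array.

    Hypothetical illustration only; not the actual cuPyNumeric code.
    """
    # Phase 1: an exclusive prefix sum over the mask gives every selected row
    # its destination index. This temporary has one entry per mask element.
    offsets = np.cumsum(mask, dtype=np.int64) - mask
    count = int(mask.sum())

    # Phase 2: allocate the output, then scatter the selected rows into it.
    out = np.empty((count,) + a.shape[1:], dtype=a.dtype)
    for i in range(mask.shape[0]):
        if mask[i]:
            out[offsets[i]] = a[i]
    return out
```

In this sketch the input, the `offsets` temporary, and the output are all alive at the same time during phase 2, which is the kind of overlap that pushes peak memory above the naive "input plus output" estimate.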
But possibly for the case of a dense array we could use `thrust::copy_if`, which has lower memory requirements.
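For comparison, here is a minimal Python sketch of the copy-if (stream compaction) semantics: elements that satisfy the predicate are written to the output in their original order, and the caller never materializes a mask-sized offsets array. The name `masked_copy_compact` is hypothetical, and the loop is sequential; a real `thrust::copy_if` call runs in parallel and manages its own internal scratch space, so this shows only the interface-level idea, not its memory behavior.

```python
import numpy as np

def masked_copy_compact(a, mask):
    """Order-preserving compaction of the rows of `a` selected by `mask`.

    Hypothetical illustration of copy-if semantics; not a Thrust binding.
    """
    # A reduction yields the output size without a per-element temporary.
    count = int(np.count_nonzero(mask))
    out = np.empty((count,) + a.shape[1:], dtype=a.dtype)

    write = 0  # running output position, replacing the offsets array
    for i in range(mask.shape[0]):
        if mask[i]:
            out[write] = a[i]
            write += 1
    return out
```

The result matches `a[mask]` for a 1D boolean mask over the first axis, so the two sketches are interchangeable in terms of output and differ only in the temporaries they hold.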