Description
Hi scran team,
Thanks for the great package! I’m encountering a performance issue when using modelGeneVar() on a large HDF5-backed SingleCellExperiment.
Previously, I used:
variable_genes <- scran::modelGeneVar(sce_object)
This used to complete in around 10 hours on the same large, HDF5-backed dataset. Now, however, the same call runs for several days without finishing.
Here’s how we construct the object:
log_object <- HDF5Array::H5ADMatrix(raw_file_path, "data")
count_object <- HDF5Array::H5ADMatrix(raw_file_path, "counts")
sce_object <- SingleCellExperiment::SingleCellExperiment(
  assays = list(counts = count_object, logcounts = log_object)
)
It seems that modelGeneVar() might not be fully optimized for delayed operations or is trying to load the entire matrix into memory. However, I don’t see an explicit memory error.
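To help narrow this down, here is a minimal sketch of the kind of check that could be run: confirm the assay is still disk-backed, look at the DelayedArray block size, and time modelGeneVar() on a subset of genes. It uses a small mock dataset from scuttle::mockSCE() rather than our real file, so the object name sce_toy, the dataset sizes, and the 200-gene subset are all assumptions for illustration only.

```r
library(DelayedArray)
library(HDF5Array)
library(SingleCellExperiment)
library(scuttle)
library(scran)

# Hypothetical stand-in for the real data: a small mock dataset.
set.seed(100)
sce_toy <- mockSCE(ncells = 100, ngenes = 500)
sce_toy <- logNormCounts(sce_toy)

# Move the log-expression values to an HDF5 file so the assay is
# disk-backed, similar in spirit to the H5ADMatrix-backed object.
logcounts(sce_toy) <- writeHDF5Array(logcounts(sce_toy), with.dimnames = TRUE)

# Check that the assay is still a delayed, disk-backed matrix.
print(is(logcounts(sce_toy), "DelayedArray"))

# Block size (in bytes) that DelayedArray uses for block processing.
print(getAutoBlockSize())

# Time the variance modelling on a subset of genes (assumed subset size)
# to see how the runtime scales before committing to the full matrix.
timing <- system.time(
  res <- modelGeneVar(sce_toy,
                      subset.row = seq_len(200),
                      BPPARAM = BiocParallel::SerialParam())
)
print(timing)
```

Running the same timing on a subset of the real object, with different block sizes or a parallel BPPARAM, might show whether the slowdown comes from disk access or from the computation itself.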
Do you have any suggestions on how to speed this up or safely apply modelGeneVar() to HDF5-backed data?
Thanks,
Maggie