modelGeneVar() extremely slow with HDF5-backed SingleCellExperiment #124

@keyingkuang

Hi scran team,

Thanks for the great package! I’m encountering a performance issue when using modelGeneVar() on a large HDF5-backed SingleCellExperiment.

Previously, I used:
    variable_genes <- scran::modelGeneVar(sce_object)
This completed in around 10 hours on the same large, HDF5-backed dataset. However, running the same call now takes several days without completing.

Here’s how we construct the object:
    log_object <- HDF5Array::H5ADMatrix(raw_file_path, "data")
    count_object <- HDF5Array::H5ADMatrix(raw_file_path, "counts")
    sce_object <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = count_object, logcounts = log_object)
    )
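In case it helps with diagnosis, here is a minimal sketch of how the backing of such an assay can be inspected. It runs on a small toy matrix written to a temporary HDF5 file (`toy` and `h5` are stand-ins I made up for illustration, not the real data); the same calls should apply to `logcounts(sce_object)`.

```r
library(HDF5Array)      # also attaches DelayedArray

set.seed(1)
# Toy stand-in for the real assay: 100 genes x 20 cells
toy <- matrix(rpois(100 * 20, lambda = 5), nrow = 100)

# Write to a temporary HDF5 file and get a disk-backed DelayedMatrix back
h5 <- writeHDF5Array(toy)

is(h5, "DelayedMatrix")        # TRUE for HDF5-backed matrices
showtree(h5)                   # prints the seed and any pending delayed operations
chunks <- chunkdim(h5)         # on-disk chunk dimensions (rows, columns)
block_bytes <- getAutoBlockSize()  # bytes loaded per block during block processing
```

Chunks laid out mostly along one dimension combined with a small block size would mean many separate reads per gene, which is one possible (unconfirmed) explanation for the slowdown.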

It seems that modelGeneVar() either is not fully optimized for delayed operations or is trying to load the entire matrix into memory. However, I don’t see an explicit memory error.

Do you have any suggestions on how to speed this up or safely apply modelGeneVar() to HDF5-backed data?
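For context, this is roughly the kind of test I can run to compare settings: a sketch that times modelGeneVar() on a small HDF5-backed toy matrix after raising the DelayedArray block size. The toy data, the 5e8 byte block size, and the serial backend are all assumptions chosen for illustration, not tuned values for the real dataset.

```r
library(scran)
library(HDF5Array)
library(BiocParallel)

set.seed(1)
n_genes <- 500
n_cells <- 200
# Per-gene Poisson means so that the toy data has a mean-variance trend;
# lambda recycles down the rows because the matrix is filled column-wise
lambda <- runif(n_genes, 0.1, 20)
toy <- matrix(log2(rpois(n_genes * n_cells, lambda) + 1), nrow = n_genes)
h5 <- writeHDF5Array(toy)   # disk-backed stand-in for the logcounts assay

# Larger blocks mean fewer reads from disk, at the cost of more memory per block
setAutoBlockSize(5e8)

# BPPARAM could be swapped for e.g. MulticoreParam(workers = 4) where supported
timing <- system.time(
    res <- modelGeneVar(h5, BPPARAM = SerialParam())
)

timing[["elapsed"]]
head(res)
```

Running the same timing on a row subset of the real assay (for example the first few thousand genes) with different block sizes would show whether block processing is the bottleneck.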

Thanks,
Maggie
