Summary
Currently, retrieval methods such as get_tables, get_sites, and get_publications return data for all registered studies in a dataset instance.
When working with large-scale queries (e.g., multiple studies retrieved via search_studies), this behavior can become inefficient and unnecessarily memory-intensive.
This issue proposes adding parameterization to these methods to allow selective retrieval of data for specific study IDs, site IDs, or table IDs.
Problem
In workflows involving large datasets, search_studies() may register hundreds of studies.
Subsequent calls such as:
ds.get_tables()
ds.get_sites()
ds.get_publications()
then return the data associated with every registered study.
This leads to:
- Increased memory usage
- Slower execution
- Lack of fine-grained control for downstream workflows
Proposed Solution
Extend retrieval methods to accept optional filtering parameters.
Example API
ds.get_tables(study_ids=[13156, 19281])
ds.get_sites(site_ids=[101, 102])
ds.get_publications(study_ids=[13156])
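As a rough illustration of how such an optional filter could behave, here is a minimal sketch. The `Dataset` class, its `_tables_by_study` mapping, and the `register_study` helper are hypothetical stand-ins invented for this example, not the library's actual internals. The sketch also assumes that omitting `study_ids` keeps today's behavior and that unknown IDs should raise an error; both are open design choices.

```python
from typing import Dict, Iterable, List, Optional


class Dataset:
    """Hypothetical stand-in for the dataset class (not the real implementation)."""

    def __init__(self) -> None:
        # Assumed internal layout: study ID -> list of table records.
        self._tables_by_study: Dict[int, List[dict]] = {}

    def register_study(self, study_id: int, tables: List[dict]) -> None:
        # Hypothetical helper that mimics what search_studies() would register.
        self._tables_by_study[study_id] = tables

    def get_tables(self, study_ids: Optional[Iterable[int]] = None) -> List[dict]:
        if study_ids is None:
            # No filter given: return tables for every registered study.
            selected = list(self._tables_by_study)
        else:
            selected = list(study_ids)
            unknown = [s for s in selected if s not in self._tables_by_study]
            if unknown:
                # Assumed policy: fail loudly rather than silently skip IDs.
                raise KeyError(f"Study IDs not registered: {unknown}")
        # Flatten the tables of the selected studies, in the order requested.
        return [t for s in selected for t in self._tables_by_study[s]]


ds = Dataset()
ds.register_study(13156, [{"table_id": 1}])
ds.register_study(19281, [{"table_id": 2}, {"table_id": 3}])
ds.register_study(20000, [{"table_id": 4}])

all_tables = ds.get_tables()                         # unfiltered, as today
subset = ds.get_tables(study_ids=[13156, 19281])     # only the two studies
```

The same pattern would apply to get_sites(site_ids=...) and get_publications(study_ids=...). Whether unknown IDs should raise or be skipped is left open here.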