Feature Request: Parameterize data retrieval methods for selective access (NOAADataset & PangaeaDataset) #41

@doswal

Summary

Currently, methods such as get_tables, get_sites, get_publications, and similar retrieval APIs return data for all registered studies in a dataset instance.

When working with large-scale queries (e.g., multiple studies retrieved via search_studies), this behavior can become inefficient and unnecessarily memory-intensive.

This issue proposes adding parameterization to these methods to allow selective retrieval of data for specific study IDs, site IDs, or table IDs.

Problem

In workflows involving large datasets, search_studies() may register hundreds of studies.

Subsequent calls such as:

ds.get_tables()
ds.get_sites()
ds.get_publications()

then return the data for every registered study, with no way to narrow the result.

This leads to:

  • Increased memory usage
  • Slower execution
  • Lack of fine-grained control for downstream workflows

Proposed Solution

Extend retrieval methods to accept optional filtering parameters.

Example API

ds.get_tables(study_ids=[13156, 19281])
ds.get_sites(site_ids=[101, 102])
ds.get_publications(study_ids=[13156])
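
To make the intended semantics concrete, here is a minimal sketch of how such a filter could behave. It does not reflect the actual internals of NOAADataset or PangaeaDataset: the SketchDataset class, its register helper, and the assumed study-ID-to-tables mapping are all hypothetical and exist only for illustration. The sketch assumes that omitting study_ids keeps the current return-everything behavior and that unknown IDs are skipped rather than raising.

```python
from typing import Dict, Iterable, List, Optional


class SketchDataset:
    """Hypothetical stand-in; not the real NOAADataset / PangaeaDataset."""

    def __init__(self) -> None:
        # Assumed layout: study ID -> list of table records for that study.
        self._tables_by_study: Dict[int, List[dict]] = {}

    def register(self, study_id: int, tables: List[dict]) -> None:
        # Hypothetical helper standing in for whatever search_studies() does.
        self._tables_by_study[study_id] = tables

    def get_tables(self, study_ids: Optional[Iterable[int]] = None) -> List[dict]:
        if study_ids is None:
            # No filter given: return tables for every registered study.
            selected = list(self._tables_by_study)
        else:
            wanted = set(study_ids)
            # Keep registration order; skip IDs that were never registered.
            selected = [sid for sid in self._tables_by_study if sid in wanted]
        return [t for sid in selected for t in self._tables_by_study[sid]]


ds = SketchDataset()
ds.register(13156, [{"table_id": "a"}])
ds.register(19281, [{"table_id": "b"}, {"table_id": "c"}])
ds.register(20000, [{"table_id": "d"}])

subset = ds.get_tables(study_ids=[13156, 19281])
everything = ds.get_tables()
```

Defaulting the parameter to None would keep existing callers working unchanged. Whether unknown IDs should be skipped or should raise an error is a design choice left open here.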

Metadata

Labels: enhancement (New feature or request)
Status: Backlog