add example docs to all datasets and tasks possible #823

Open
jhnwu3 wants to merge 1 commit into master from improve/docs

@jhnwu3 jhnwu3 commented Feb 6, 2026

This pull request primarily adds usage examples to the docstrings of dataset and task classes across the codebase, making it easier for users to understand how to instantiate and use these classes. Additionally, there are minor formatting and code cleanup improvements. The most important changes are grouped below:

Documentation improvements (usage examples):

  • Added example code snippets to the docstrings of major dataset classes: EHRShotDataset, MIMIC3Dataset, MIMIC4EHRDataset, MIMIC4NoteDataset, MIMIC4CXRDataset, and MIMIC4Dataset to demonstrate typical usage and initialization.
  • Added example code snippets to the docstrings of task classes: BenchmarkEHRShot, BMDHSDiseaseClassification, ChestXray14BinaryClassification, ChestXray14MultilabelClassification, COVID19CXRClassification, DrugRecommendationMIMIC4, InHospitalMortalityMIMIC4, LengthOfStayStageNetMIMIC4, and MIMIC3ICD9Coding.

Code formatting and cleanup:

  • Improved formatting and consistency in argument lists for dataset class initializers by adding trailing commas.
  • Minor code formatting improvements and restructuring for readability in task files, including consistent sample creation and import placement.

Minor functional changes:

  • Simplified and clarified logic in the pre_filter method of BenchmarkEHRShot for filtering OMOP tables.

Miscellaneous:

  • Added blank lines and improved import organization for clarity in several task files.

These changes collectively enhance usability, readability, and maintainability of the codebase.

@jhnwu3 jhnwu3 requested a review from EricSchrock February 6, 2026 16:37

@EricSchrock EricSchrock left a comment

Just a couple small items I noticed.

from pyhealth.datasets import ChestXray14Dataset # Avoid circular import

if disease not in ChestXray14Dataset.classes:

The task currently only supports all lower-case disease names, so the added example will fail. Probably best to update the task to support upper-case too.

        self.disease = disease.lower()
        if self.disease not in ChestXray14Dataset.classes:
            msg = f"Invalid disease: '{self.disease}'! Must be one of {ChestXray14Dataset.classes}."
            logger.error(msg)
            raise ValueError(msg)
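
A minimal, standalone sketch of the case-insensitive check suggested above. `KNOWN_CLASSES` and `normalize_disease` are hypothetical names used only for illustration; they are not part of PyHealth, and the real list of valid names is `ChestXray14Dataset.classes`, which is assumed here to hold lower-case strings.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for ChestXray14Dataset.classes (assumed lower-case).
KNOWN_CLASSES = ["atelectasis", "cardiomegaly", "effusion"]


def normalize_disease(disease: str) -> str:
    """Lower-case the disease name, then validate it against the known classes."""
    normalized = disease.lower()
    if normalized not in KNOWN_CLASSES:
        msg = f"Invalid disease: '{normalized}'! Must be one of {KNOWN_CLASSES}."
        logger.error(msg)
        raise ValueError(msg)
    return normalized
```

With this normalization in place, a docstring example that passes a capitalized name such as "Cardiomegaly" would resolve to the same class as the lower-case form, while an unknown name would still raise `ValueError`.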

>>> from pyhealth.tasks import InHospitalMortalityMIMIC4
>>> dataset = MIMIC4EHRDataset(
...     root="/path/to/mimic-iv/2.2",
...     tables=["diagnoses_icd", "procedures_icd", "labevents"],

I don't think this task uses diagnoses_icd or procedures_icd.

Suggested change
- ...     tables=["diagnoses_icd", "procedures_icd", "labevents"],
+ ...     tables=["labevents"],

  samples = []
- for i in range(len(admissions) - 1):  # Skip the last admission since we need a "next" admission
+ for i in range(
+     len(admissions) - 1

This autoformatting seems a little aggressive. Is this what you're expecting?

Same question throughout this file.
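
One possible way to avoid the wrapped `range(...)` call, sketched under the assumption that the long trailing comment is what pushed the line past the formatter's length limit: move the comment onto its own line above the loop. `pair_consecutive` is a hypothetical helper written for illustration and is not the task's actual code.

```python
def pair_consecutive(admissions):
    """Pair each admission with the one that follows it."""
    samples = []
    # Skip the last admission since we need a "next" admission.
    for i in range(len(admissions) - 1):
        samples.append((admissions[i], admissions[i + 1]))
    return samples
```

Keeping the comment on a separate line leaves the `for` statement short enough that a line-length-based formatter has no reason to split it.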
