xhudec2/nuclei-graph

Nuclei Graph Model

This repository contains all code relevant to the bachelor thesis Cell Nuclei Classification with Graph Neural Networks by Lukáš Hudec.

The code and structure of the repository are derived from a template created and maintained by the RationAI group.

Please note that some parts of the source code require access to sensitive data that is not publicly available. Furthermore, the code is designed to run in a remote cloud environment with MLFlow logging to ensure reproducibility, and neither of these is publicly accessible. This limits the ability to run the scripts in the preprocessing/ directory, as all scripts rely on either sensitive data or access to the MLFlow instance private to the RationAI group. Similarly, nuclei_segmentation/ cannot be run, as it requires large WSIs to produce results.

This repository contains a toy_dataset of randomly generated graph data for demonstration purposes.

This README is inspired by other repositories maintained by the RationAI group.

Project Structure

The repository is organized as follows:

  • configs/ – Configuration files for experiments.
  • preprocessing/ – Scripts for data preprocessing.
  • nuclei_graph/ – Implementation of the model, graph convolution layers, transforms and dataset handling.
  • nuclei_segmentation/ – Scripts for nuclei segmentation using LKCell.

My Contribution

Modification of files from the RationAI Machine Learning Template

All configuration files in configs/.

Files in preprocessing/, apart from:

  • preprocessing/annotation_masking.py.

Files in nuclei_graph/, apart from:

  • nuclei_graph/masks/nuclei_mask.py,
  • nuclei_graph/data/samplers/weighed_random_sampler.py.

Model Training

The Graph Neural Network model is trained on nuclei segmented by LKCell from MMCI prostate cancer WSIs. Each nucleus is classified as positive (1) or negative (0). The ground truth is obtained from the intersection of CAM explainability masks, generated from the prostate model, and expert annotations. Graph construction and training proceed as follows:

  1. Apply the KNNGraph transform from PyTorch Geometric to the provided nuclei to build a WSI-level graph.
  2. Partition each WSI-level graph into smaller subgraphs with ClusterData partitioning.
  3. Filter subgraphs:
    • take all subgraphs created from negative WSIs
    • take only subgraphs with more than positivity_threshold% positive nuclei from positive WSIs, where positivity_threshold is a parameter to ClusterGraph.
      • the default value is set to 0, meaning that only graphs with at least one positive nucleus are kept.
  4. During training, all nuclei are kept in the positive subgraphs. A train_mask restricts the loss computation to nuclei that are marked as positive by both CAM and the expert annotation.
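The steps above can be illustrated with a minimal, library-free sketch. It is not the repository's implementation: the actual pipeline relies on PyTorch Geometric's KNNGraph and ClusterData, whereas the function names below (knn_edges, keep_subgraph, masked_bce), the brute-force neighbour search, the choice of binary cross-entropy, and all toy values are assumptions made purely for illustration. Partitioning (step 2) is omitted.

```python
import math


def knn_edges(points, k):
    """Step 1 (illustrative): link every nucleus to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Rank all other nuclei by Euclidean distance to nucleus i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        # Store edges as (neighbour, nucleus); the direction is a free choice here.
        edges.extend((j, i) for j in others[:k])
    return edges


def keep_subgraph(labels, wsi_is_positive, positivity_threshold=0.0):
    """Step 3 (illustrative): decide whether a subgraph is used for training."""
    if not wsi_is_positive:
        return True  # every subgraph from a negative WSI is kept
    positive_pct = 100.0 * sum(labels) / len(labels)
    # Strictly greater than the threshold; with 0 this means "at least one positive".
    return positive_pct > positivity_threshold


def masked_bce(probs, labels, train_mask):
    """Step 4 (illustrative): mean binary cross-entropy over masked-in nuclei only."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y, m in zip(probs, labels, train_mask)
        if m
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage with made-up centroids and labels.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = knn_edges(centroids, k=2)

cam_positive = [True, True, False, False]
expert_positive = [True, False, False, False]
# Assumption: the mask is the element-wise AND of both label sources.
train_mask = [c and e for c, e in zip(cam_positive, expert_positive)]
labels = [1 if m else 0 for m in train_mask]

kept = keep_subgraph(labels, wsi_is_positive=True)
loss = masked_bce([0.5, 0.9, 0.2, 0.1], labels, train_mask)
```

All four nuclei remain in the graph and take part in message passing; only the nucleus selected by train_mask contributes to the loss.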

Model Inference

The model creates predictions for the entire WSI-level graph. Partitioning the WSI-level graph into smaller subgraphs, like in training, is not necessary for inference.

Setup Instructions

To run the model, follow these steps:

1. Install pdm

If you don't have pdm installed, install it with:

pip install pdm

2. Install Dependencies

Install the CPU dependencies with:

pdm install

or the GPU version with:

pdm lock -G gpu && pdm install

Note: Due to limitations with PyTorch Geometric versions, you need either Python 3.12 without CUDA, or Python 3.11 and CUDA 12.1 on your machine for this to work. If this is not the case and you have problems installing torch-scatter, torch-sparse, torch-cluster, and pyg-lib, you need to change the pip wheels in pyproject.toml. The correct wheels for your specific version can be found at https://data.pyg.org/whl/. Unfortunately, the wheels must be specified explicitly: without them, the libraries either refuse to install or take a long time to install.
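As a hedged aid for locating the right wheels, the sketch below builds the index page URL that data.pyg.org publishes for each PyTorch and CUDA combination. The helper name pyg_wheel_index is hypothetical, and the torch version "2.3.0" is only an example, since this README does not state which PyTorch version the project pins; check pyproject.toml for the actual value.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda: Optional[str] = None) -> str:
    """Return the PyG wheel index URL for a torch version and CUDA tag.

    cuda is a tag such as "cu121"; None selects the CPU-only wheels.
    """
    platform = cuda if cuda else "cpu"
    return f"https://data.pyg.org/whl/torch-{torch_version}+{platform}.html"


# Example values only; substitute the versions installed on your machine.
gpu_index = pyg_wheel_index("2.3.0", "cu121")
cpu_index = pyg_wheel_index("2.3.0")
```

The resulting page lists the torch-scatter, torch-sparse, torch-cluster, and pyg-lib wheels per Python version and platform, which can then be referenced in pyproject.toml.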

3. Running the Model

Use the following command to run the model:

pdm {mode} experiment={experiment_config}

Where:

  • {mode} can be one of:
    • fit – Train the model
    • validate – Validate the model
    • test – Test the model
    • predict – Run inference
  • {experiment_config} is the name of the experiment; example experiments can be found in configs/experiment:
    • toy – Toy experiment for demonstration purposes

Training the Model

To train the model using the toy configuration, run:

pdm train experiment=toy

Slide-level Nuclei Predictions

To generate WSI prediction masks, run:

pdm predict experiment=predict checkpoint="{checkpoint_mlflow_uri}"

The predictions are saved to MLFlow.

Note: This requires access to the RationAI MLFlow instance.

License

The project is licensed under the MIT license.
