This repository contains all code relevant to the bachelor thesis Cell Nuclei Classification with Graph Neural Networks by Lukáš Hudec.
The code and structure of the repository are derived from a template created and maintained by the RationAI group.
Please note that some parts of the source code require access to sensitive data that is not publicly available. Furthermore, the code is designed to run in a remote cloud environment with MLFlow logging to ensure reproducibility; neither of these is publicly accessible. This limits the ability to run the scripts in the preprocessing/ directory, as all scripts rely on either sensitive data or access to MLFlow, which is private to the RationAI group. Similarly, nuclei_segmentation/ cannot be run, as it requires large WSIs to produce results.
This repository contains a toy_dataset of randomly generated graph data for demonstration purposes.
This README is inspired by other repositories maintained by the RationAI group.
The repository is organized as follows:
- configs/ – Configuration files for experiments.
- preprocessing/ – Scripts for data preprocessing.
- nuclei_graph/ – Implementation of the model, graph convolution layers, transforms and dataset handling.
- nuclei_segmentation/ – Scripts for nuclei segmentation using LKCell.
Modification of files from the RationAI Machine Learning Template
- All configuration files in configs/.
- Files in preprocessing/, apart from:
  - preprocessing/annotation_masking.py.
- Files in nuclei_graph/, apart from:
  - nuclei_graph/masks/nuclei_mask.py,
  - nuclei_graph/data/samplers/weighed_random_sampler.py.
The Graph Neural Network model is trained on nuclei segmented by LKCell from MMCI prostate cancer WSIs.
Each nucleus is classified as positive (1) or negative (0). The ground truth is obtained from the intersection of CAM explainability masks, generated from the prostate model, and expert annotations.
Graph construction and training proceed as follows:
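The labelling rule above can be illustrated with a minimal sketch. It assumes that, for every nucleus, two boolean flags have already been computed: whether the nucleus falls inside the CAM mask and whether it falls inside the expert annotation. How these flags are derived from the masks is not shown here, and the function name `nucleus_labels` is hypothetical, not part of this repository.

```python
from typing import Sequence


def nucleus_labels(
    in_cam_mask: Sequence[bool], in_annotation: Sequence[bool]
) -> list[int]:
    """Label a nucleus 1 only if it lies in both the CAM mask and the annotation.

    Hypothetical helper: the per-nucleus flags are assumed to be precomputed.
    """
    if len(in_cam_mask) != len(in_annotation):
        raise ValueError("both flag sequences must describe the same nuclei")
    return [int(cam and ann) for cam, ann in zip(in_cam_mask, in_annotation)]


# Four nuclei covering every combination of the two flags.
cam_flags = [True, True, False, False]
annotation_flags = [True, False, True, False]
labels = nucleus_labels(cam_flags, annotation_flags)
print(labels)  # [1, 0, 0, 0]
```

Only the first nucleus is labelled positive, because it is the only one covered by both sources.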
- Use the KNNGraph transform from PyTorch Geometric to create a WSI-level graph from the provided nuclei.
- Partition each WSI-level graph into smaller subgraphs with ClusterData partitioning.
- Filter subgraphs:
  - take all subgraphs created from negative WSIs
  - take only subgraphs with more than positivity_threshold% positive nuclei from positive WSIs, where positivity_threshold is a parameter to ClusterGraph.
    - the default value is set to 0, meaning that only graphs with at least one positive nucleus are kept.
- During training, all nuclei are kept in the positive subgraphs. A train_mask is used only to compute loss from nuclei which are marked as positive by both CAM and the expert annotation.
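The filtering rule and the masked loss described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's implementation: the function names `keep_subgraph` and `masked_bce` are hypothetical, binary cross-entropy is assumed as the loss, and `positivity_threshold` is interpreted as a percentage, as the text suggests.

```python
import math
from typing import Sequence


def keep_subgraph(
    labels: Sequence[int],
    from_positive_wsi: bool,
    positivity_threshold: float = 0.0,
) -> bool:
    """Decide whether a subgraph is kept for training (hypothetical helper)."""
    if not from_positive_wsi:
        return True  # every subgraph from a negative WSI is kept
    if not labels:
        return False  # an empty subgraph has no positive nuclei
    positive_percent = 100.0 * sum(labels) / len(labels)
    # Strictly "more than", so the default of 0 keeps any subgraph
    # that contains at least one positive nucleus.
    return positive_percent > positivity_threshold


def masked_bce(
    probs: Sequence[float], labels: Sequence[int], train_mask: Sequence[bool]
) -> float:
    """Mean binary cross-entropy over the nuclei selected by train_mask.

    Assumption: BCE is used here only for illustration.
    """
    eps = 1e-7  # keeps log() away from 0
    losses = []
    for p, y, use in zip(probs, labels, train_mask):
        if not use:
            continue  # the nucleus stays in the graph but adds no loss
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


# A positive-WSI subgraph with two positive nuclei out of four.
labels = [1, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.5]
train_mask = [True, False, False, True]

kept = keep_subgraph(labels, from_positive_wsi=True)
loss = masked_bce(probs, labels, train_mask)
print(kept, round(loss, 4))
```

All four nuclei remain in the subgraph and take part in message passing; the mask only restricts which predictions contribute to the loss.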
The model creates predictions for the entire WSI-level graph. Unlike in training, partitioning the WSI-level graph into smaller subgraphs is not necessary for inference.
To run the model, follow these steps:
If you don't have pdm installed, install it with:
pip install pdm
Run the following command to install the required dependencies:
use
pdm install
to install the CPU dependencies
or
pdm lock -G gpu && pdm install
for the GPU version.
Note: Due to limitations with PyTorch Geometric versions, you need either Python 3.12 without CUDA, or Python 3.11 with CUDA 12.1 on your machine for this to work. If this is not the case and you have problems installing torch-scatter, torch-sparse, torch-cluster, and pyg-lib, you need to change the pip wheels in pyproject.toml. The correct wheels for your specific version can be found at https://data.pyg.org/whl/. Unfortunately, the wheels must be specified explicitly; without them, the libraries either refuse to install or take a long time to install.
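To find the matching wheel index, it may help to know that the pages under https://data.pyg.org/whl/ are named after the PyTorch version and a compute tag such as `cpu` or `cu121`. The sketch below builds such a URL; the helper name `pyg_wheel_index` is hypothetical, and the version `2.3.0` in the example is only a placeholder, not necessarily the version this repository pins.

```python
def pyg_wheel_index(torch_version: str, compute_tag: str = "cpu") -> str:
    """Build the PyG wheel index URL for a PyTorch version and compute tag.

    Hypothetical helper: compute_tag is "cpu" or a CUDA tag such as "cu121".
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{compute_tag}.html"


# Placeholder versions for illustration; substitute the ones installed locally.
cpu_index = pyg_wheel_index("2.3.0")
gpu_index = pyg_wheel_index("2.3.0", "cu121")
print(cpu_index)
print(gpu_index)
```

The installed values can be read from `torch.__version__` and `torch.version.cuda` in a Python session where PyTorch is available.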
Use the following command to run the model:
pdm {mode} experiment={experiment_config}
Where:
- {mode} can be one of:
  - fit – Train the model
  - validate – Validate the model
  - test – Test the model
  - predict – Run inference
- {experiment_config} is the name of the experiment; example experiments can be found in configs/experiment:
  - toy – Toy experiment for demonstration purposes
To train the model using the toy configuration, run:
pdm train experiment=toy
To generate WSI prediction masks, run:
pdm predict experiment=predict checkpoint="{checkpoint_mlflow_uri}"
The predictions are saved to MLFlow.
Note: this requires access to the RationAI MLFlow.
The project is licensed under the MIT license.