This project explores implementing a 1D ResNet model to predict crystal violet concentrations from Raman spectra. The model's performance is evaluated using a combination of regression and classification metrics to reflect both the continuous nature of the predictions and the discrete levels of the target concentrations. During training, Mean Squared Error (MSE) is used as the loss function. The evaluation metrics (MAE, RMSE, R², and weighted kappa score) are calculated after rounding predictions to the nearest concentration level.
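The evaluation step described above can be sketched in plain Python. This is a minimal illustration, not the code in training.py: the concentration levels in `LEVELS`, the helper names, and the choice of quadratic weights for the kappa score are all assumptions made for the example.

```python
import math

# Hypothetical concentration levels (arbitrary units); the real levels come from the dataset.
LEVELS = [0.0, 1.0, 2.0, 3.0, 4.0]


def snap_to_level(value, levels=LEVELS):
    """Return the level closest to a continuous prediction."""
    return min(levels, key=lambda level: abs(level - value))


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R^2 for two equally long sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


def quadratic_weighted_kappa(y_true, y_pred, levels=LEVELS):
    """Cohen's kappa with quadratic weights (assumed weighting scheme)."""
    index = {level: i for i, level in enumerate(levels)}
    k, n = len(levels), len(y_true)
    # Confusion matrix: rows are true levels, columns are predicted levels.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1.0
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]          # observed disagreement
            den += weight * row[i] * col[j] / n     # disagreement expected by chance
    return 1.0 - num / den


# Made-up targets and raw model outputs, used only to exercise the functions.
y_true = [0.0, 1.0, 2.0, 3.0, 4.0, 2.0]
raw_pred = [0.2, 1.4, 1.6, 3.1, 3.4, 2.7]

y_pred = [snap_to_level(v) for v in raw_pred]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
kappa = quadratic_weighted_kappa(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f} kappa={kappa:.3f}")
```

Snapping to the nearest level first means that all four metrics are computed on the same discrete predictions, so the regression and agreement scores describe the same set of outputs.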
The dataset used in this project belongs to Prof. Dr. Alpan Bek. Due to data sharing restrictions, the original raw dataset is not shared with this repository.
As a result, the preprocessing.ipynb notebook, which was originally used to process the raw data, cannot be executed without access to the raw files. However, it is included to illustrate the preprocessing steps applied before training.
The preprocessed dataset used in this study is available via an external link. After downloading it, make sure to update the data_path variable inside the main.ipynb notebook so that it points to your local dataset folder.
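One way to set data_path defensively is to validate the folder before training starts. The helper below is a hypothetical sketch and is not part of the repository; the example location in the comment is a placeholder.

```python
from pathlib import Path


def resolve_data_path(path_str):
    """Expand and validate a local dataset folder (hypothetical helper)."""
    path = Path(path_str).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {path}")
    return path


# Example with a placeholder location; replace it with your own folder:
# data_path = resolve_data_path("~/datasets/raman_preprocessed")
```

Failing early with a clear message avoids a less obvious error later when the notebook tries to load the spectra.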
This project is based on the work of Ho et al. [1], and portions of the original code from [1] are adapted. Click here for the original code.
- preprocessing.ipynb: Jupyter notebook used for preprocessing the raw Raman spectroscopy data (not directly usable without the raw files)
- project.ipynb: main notebook to run training, evaluation, and visualization of results
- resnet.py: defines the custom 1D ResNet architecture used for the regression task
- data.py: loading and optional augmentation of the dataset for training
- augment.py: augmentation functions and tools for both online and offline data augmentation
- training.py: training loop, loss tracking, and performance evaluation
Footnotes
1. Ho, CS., Jean, N., Hogan, C.A. et al. Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning. Nat Commun 10, 4927 (2019). https://doi.org/10.1038/s41467-019-12898-9