OtolReader supports automated classification of hatchery marks in salmon otoliths. The package provides a Jupyter Notebook-based utility that allows a human to easily extract marked subsamples from an otolith image. These samples can be stored in a local directory. The samples can then be used to train a machine learning algorithm for otolith classification. The software automatically processes the samples to generate a list of 151 numbers summarizing each sample. The set of all sample summary vectors are then used to train a neural network to implement the classification.
1.) Generate a database of training samples using the TrainingDatabaseMaker.ipynb notebook
2.) Generate training arrays using the samples with the PublicationFiles/TrainingArrayGenerator.py script
3.) Upload training arrays to AWS S3 account
4.) Generate and train classification and binary models using the InternalTransferNetworkTraining/InternalTransferNetwork-TrainingCrossVal.ipynb notebook on AWS Sagemaker. The notebook calls the InternalTransferNetworkTraining/TransferNetworkTraining.py module and references files stored in AWS S3. The file paths may need to be updated to match the users AWS S3 file directory.
5.) Download the models from S3
6.) Run the PublicationFiles/ClassificationNetworkSampleSelection.py script to identify samples for the second round of training of the binary network. Running this with 150 images and ten folds took three full days.
7.) Use the PublicationFiles/BinaryTransferTraining notebook to retrain the top layer of the binary networks and save the resulting models.
8.) Use the PublicationFiles/CrossValProcessing.ipynb notebook to explore the performance of the cross validation models.
A package that stores otolith processing utilities.
stores several otolith-specific image processing functions that can be used by higher-level algorithms.
- draw_samples_2
- Used to draw rectangles at predetermined locations in an image, based on x, y, and rotation coordinates. Useful for visualizing sample selections.
- draw_samples_3
- Similar functionality to draw_samples_2, but a more efficient implementation. Used to draw rectangles at predetermined locations in an image, based on x, y, and rotation coordinates. Useful for visualizing sample selections.
- extract_samples_2
- Extracts samples from an image given coordinates and rotation. Assumes that the whole image is rotated before applying the coordinates.
- extract_samples_3
- Extracts samples from an image given coordinates and rotation. Assumes that the coordinates are applied, then the sample is extracted, then rotated and cropped to the final size.
- fcn_mark_im_ind
- Computes the mark index and image index (starting from 0) based on the raw image index and the number of images per mark. For example, If the raw image index is 5 and their are two images per mark, then the function would return mark index 2 and image index 1.
- high_low
- computes upper, lower, left and right indexes to use in extracting a sub-array, given the center coordinates and target width and height. Accounts for cases where the target sample is reduced in size due to overlapping the boundary of the original array.
- cross_validation
- This function implements ten-fold cross validation on a data set. It is rarely used now because it is not designed for work with AWS, but can reduce the overhead in implementing cross-validation tests.
defines two functions that implement a fast fourrier transform algorithm and use it to interpret otolith image samples.
- fft_score
- compresses an input 2D sample to a 1D array, computes a second derivative of the 1D array, calculates the FFT of the result and returns the sum of the f_fft over the range of frequencies specified by the user (defaults to 9:12)
- fft_finder
- Iterates across an image and returns the indexes in the image with the highest fft score (as calculated by the fft_score function)
Directory containing samples extracted from the five original image classes (3,5H10, 1,6H, None, 4n,2n,2H, 6,2H).
A jupyter notebook that can be used to efficiently extract marked samples from otolith images.
A directory of modules for creating and using classification neural networks.
A script used to generate a training array. The script process samples from SampleDatabase to create to generate synthetic samples, process them into 151 element vectors, and store all of them in a single array.
A script that iterates through models generated for cross validation and calculates the accuracy of classifications for each fold.
A script used to train a selection and classification network for cross validation.
All models and results documented in the publication "Determining salmon provenance with automated otolith reading" were generated using the scripts in the otolreader/PublicationFiles directory.