Methods
Summary
A digital compound light microscope will be used to photograph a population of a single dinoflagellate species. The specimens will be purchased from a vendor as permanent stained slides rather than collected from the environment, to ensure that each slide contains only one species.
Each image will be segmented into smaller images. These will first be sorted manually into images that contain cells or parts of cells and images that show only empty background. This small database will be used to train a convolutional neural network to distinguish empty images from images with cells, which will streamline database generation for the remaining dinoflagellate species and make future databases easier to build.
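The segmentation step above can be sketched as a function that computes the crop boxes for a grid of tiles. This is only a minimal illustration and not the preliminary script itself: the function name tile_boxes and the tile size are hypothetical, and the choice to clip edge tiles rather than pad or discard them is an assumption.

```python
def tile_boxes(width, height, tile):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Tiles are laid out row by row. Tiles on the right and bottom edges are
    clipped to the image bounds, so they may be smaller than tile x tile.
    """
    if tile <= 0:
        raise ValueError("tile must be a positive number of pixels")
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical example: a 100 x 50 pixel image cut into 25-pixel tiles.
boxes = tile_boxes(100, 50, 25)
print(len(boxes), boxes[0], boxes[-1])
```

Each box uses the (left, top, right, bottom) order, which is the order an image library such as Pillow expects when cropping, so every box could be passed to a crop call to produce one small image.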
After an image database of all the dinoflagellate species has been developed and divided into training, testing, and evaluation sets, a second convolutional neural network will be trained to classify the different species. Once an accuracy of 99% or higher is reached, the model weights will be published as open source for anyone to use free of charge.
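Dividing the database into training, testing, and evaluation sets could look roughly like the sketch below. The 70/15/15 proportions, the fixed seed, and the name split_dataset are assumptions made for illustration; the proposal does not specify them.

```python
import random


def split_dataset(items, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle items and split them into (train, test, evaluation) lists.

    The evaluation set receives whatever remains after the training and
    testing fractions are taken. A fixed seed makes the split repeatable.
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]
    return train, test, evaluation


# Hypothetical example: 100 image identifiers standing in for image files.
train, test, evaluation = split_dataset(range(100))
print(len(train), len(test), len(evaluation))
```

In practice it may be preferable to split each species separately, so that every species appears in all three sets in similar proportions.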
Finally, I will test samples from the environment to confirm the usability of the classifier.
Challenges
1. Developing the image database might be more difficult than planned. I have developed a preliminary script that currently seems to make the image database easier and faster to generate with minimal manual input, but this can only be confirmed once the script is deployed.
2. Training the neural network itself is currently more art than science, and this is the main challenge. I will be using standardized image classifiers, which will probably make this phase less painful.
3. The permanent dinoflagellate slides may not give the neural network enough information to find differences between species. This possibility is anticipated: if too little information is present, different staining methods will be used, or non-stained cells will be tested.
Pre Analysis Plan
I will use the Keras library on TensorFlow in Python to set up my artificial neural networks. Keras implements an accuracy metric that reports the percentage of correctly classified dinoflagellates. This is standard in machine learning protocols, and the accuracy should reach 99% or higher for a model to be deemed useful.
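The accuracy criterion can be written out in plain Python to make the acceptance test explicit. This is a sketch of the calculation only, not the Keras implementation; the function names and the species labels are hypothetical placeholders.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def meets_target(acc, target=0.99):
    """Return True when the accuracy reaches the 99% threshold."""
    return acc >= target


# Hypothetical example: four images, one of them misclassified.
truth = ["species_a", "species_b", "species_a", "species_c"]
predicted = ["species_a", "species_b", "species_c", "species_c"]
acc = accuracy(truth, predicted)
print(acc, meets_target(acc))
```

The same check would be applied to the accuracy measured on images held out from training, since accuracy on the training images alone can overstate how well the classifier generalizes.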
Protocols
This project has not yet shared any protocols.