Many months, changes, and models after...
Hello to everyone who has been following the development of our project. Well, "following" might be a big stretch in this case since we haven't really shared updates for the past seven months. Sincere apologies for that. In this labnote we give context on this silence, as well as share updates on what is new around this project - which is ALIVE!
Speaking of being alive...
One of the big events that greatly delayed our progress was a terrible accident Maniçoba (our canine partner in this research) suffered back in May. He was attacked by a Pitbull and almost passed away, undergoing surgery and a slow recovery from the many wounds on his torso.

The attack was by our neighbor's dog, right after we moved to a new home - a personal change that demanded a lot of my energy during Q1. So, alongside organizing a new place, we were very involved in Maniçoba's recovery, taking good care of wounds and emotions. All of this took me away from the lab, alongside work, family and Life.
Changes Everywhere
All of these life-changing events happened here in Brazil. Meanwhile in India, Akash (the project's co-lead) also experienced big changes. As part of Dognosis, he and his team successfully raised a first round of funds to begin pawing the ground towards canine olfactory detection of cancer cells - setting up a Dog Lab in Bangalore.
(BTW, Dognosis is currently raising another round of investments. If you'd like to know more, shoot Akash a message).
Reorienting
Only recently have we as a team been able to reunite and think through next steps for this research project. As documented before, we had stumbled upon a big barrier involving the training (and thus the performance) of our first ML model.
Fortunately, during the past month and a half we have been able to find workarounds and actually start working on the project again! So here are some of the new approaches we have tried and the results we have been getting.
Back to Dataset
One of the first things we did was decrease the number of canine behavior classes we were working with, to match the number of classes used in the study that collected and analyzed the data we are using (Kumpulainen et al. 2021). So we went from the 18 classes present in the raw dataset to 7.


Despite these changes, we kept the weight filter we applied to the data in our first attempt (trying to have a dataset that best fit Maniçoba's biotype). But more on that later on.
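To make the two steps above a bit more concrete, here is a minimal sketch of how a class reduction plus a weight filter could look. It is not our actual preprocessing script: the field names (`weight_kg`, `label`), the weight range and the "Walking" and "Shaking" labels are all hypothetical placeholders, and only "Standing" and "Sniffing" are class names mentioned in this labnote.

```python
from collections import Counter

# Illustrative target labels; a real run would list all 7 kept classes.
KEPT_LABELS = {"Standing", "Sniffing", "Walking"}


def filter_rows(rows, kept_labels, min_kg, max_kg):
    """Keep rows whose label is in kept_labels and whose dog weight is in range."""
    return [
        r for r in rows
        if r["label"] in kept_labels and min_kg <= r["weight_kg"] <= max_kg
    ]


# Toy rows standing in for labeled sensor samples (made-up values).
rows = [
    {"dog": "a", "weight_kg": 12.0, "label": "Walking"},
    {"dog": "a", "weight_kg": 12.0, "label": "Shaking"},   # label not kept
    {"dog": "b", "weight_kg": 31.0, "label": "Standing"},  # dog too heavy
    {"dog": "c", "weight_kg": 9.5, "label": "Sniffing"},
]

subset = filter_rows(rows, KEPT_LABELS, min_kg=5.0, max_kg=15.0)
counts = Counter(r["label"] for r in subset)
print(counts)
```

Counting the labels right after filtering is a cheap way to see whether a filter has quietly emptied out one of the classes.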
Back to Edge Impulse
With that we started engaging with Edge Impulse again, this time under a 14-day trial (unlimited) enterprise account. In revisiting both Edge Impulse's inner workings and the original paper whose dataset we are working with, we have been able to better inform Edge Impulse about the characteristics of the dataset, creating better structures for the processing of the data:

For instance, given that we are working with subparts of the dataset, Edge Impulse can't infer the sampling frequency of the data points from the timestamps. By using the override function, we can tell the platform the frequency directly.
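As a rough illustration of why timestamps from stitched-together subparts can mislead a frequency estimate, consider the sketch below. We do not know how Edge Impulse derives the rate internally, so this only shows the general problem. The 100 Hz rate, the 5-second gap and the function names are assumptions made up for the example.

```python
from statistics import median


def mean_rate_hz(timestamps_ms):
    """Naive estimate: number of intervals divided by the total time span."""
    span_ms = timestamps_ms[-1] - timestamps_ms[0]
    return 1000.0 * (len(timestamps_ms) - 1) / span_ms


def median_rate_hz(timestamps_ms):
    """Estimate based on the median gap between consecutive samples."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return 1000.0 / median(deltas)


# Two short segments recorded at 100 Hz (10 ms apart), separated by a gap
# where rows were filtered out.
segment_a = [0, 10, 20, 30, 40]
segment_b = [5000, 5010, 5020, 5030, 5040]
timestamps = segment_a + segment_b

print(round(mean_rate_hz(timestamps), 2))  # far below the true rate
print(median_rate_hz(timestamps))          # robust to the single large gap
```

When the true rate is known from the dataset's documentation, stating it explicitly avoids depending on either estimate.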

Another example is the sample size setting on Edge Impulse, which we have been able to define based on information gathered from (Kumpulainen et al. 2021). These parameters weren't set like this in our previous attempt, and they have definitely helped us get better results this time.
One great feature of an Edge Impulse enterprise account is a tool called EON Tuner, which helps a great deal in determining the best parameters to train the ML model (all the way from feature discovery to NN configurations):
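For readers less familiar with windowing, this is roughly what a sample (window) size means in practice: a duration in milliseconds is turned into a number of data points, and the signal is cut into windows of that length. The 2000 ms window, 100 Hz rate and 1000 ms stride below are illustrative values only, not the settings we used, and the helper names are ours.

```python
def window_length(window_ms, hz):
    """Number of data points in a window of window_ms at a sampling rate of hz."""
    return int(round(window_ms * hz / 1000))


def sliding_windows(signal, size, stride):
    """Cut a signal into fixed-size windows, moving forward by stride points."""
    return [
        signal[start:start + size]
        for start in range(0, len(signal) - size + 1, stride)
    ]


hz = 100                               # assumed sampling rate
size = window_length(2000, hz)         # 2 s window
stride = window_length(1000, hz)       # 1 s step, so windows overlap by half
signal = list(range(1000))             # 10 s of dummy single-axis data

windows = sliding_windows(signal, size, stride)
print(size, stride, len(windows))
```

Note that the point counts depend directly on the sampling frequency, which is one reason a wrong frequency setting also distorts the windows.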
"The EON Tuner analyzes your input data, potential signal processing blocks, and neural network architectures - and gives you an overview of possible model architectures that will fit your chosen device's latency and memory requirements"
This has enabled us to get some exciting results in terms of ML performance. Up to 72% overall accuracy, even for the quantized model version to be deployed on edge!

However, we noticed something else: the filter we applied to the dataset to match Maniçoba's biotype yielded a sub-dataset with very limited data on the "Standing" and "Sniffing" classes, which as one can imagine are rather relevant ones! This observation got us thinking more deeply about the filter we applied, what it means for model accuracy and possible trade-offs.
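One way to see why a thin class matters is to look at per-class results next to overall accuracy. The sketch below uses made-up predictions (they are not our results) in which a rare "Standing" class is always misclassified, yet overall accuracy still looks decent. The function names are hypothetical.

```python
from collections import Counter


def overall_accuracy(pairs):
    """Share of (true, predicted) pairs where the prediction is correct."""
    return sum(t == p for t, p in pairs) / len(pairs)


def per_class_recall(pairs):
    """For each true label, the share of its samples predicted correctly."""
    total = Counter(t for t, _ in pairs)
    correct = Counter(t for t, p in pairs if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy data: 8 "Walking" samples all correct, 2 "Standing" samples all wrong.
pairs = [("Walking", "Walking")] * 8 + [("Standing", "Walking")] * 2

acc = overall_accuracy(pairs)
recall = per_class_recall(pairs)
print(acc, recall)
```

A single overall number can therefore hide the fact that an underrepresented class is barely recognized at all.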
For instance, we realized that this approach is very likely overfitting our model to a subset of the overall dataset, and this does not necessarily mean that such a model will perform better for smaller dogs like Maniçoba.
At the same time, there is a practical reason why we filtered the dataset in the first place: it's too big as it is, and Edge Impulse can only take in datasets up to 100 MB. So moving forward we will use a randomized approach to decrease the dataset's size, hopefully arriving at better results.
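We have not settled on the exact procedure yet, but one possible variant of a randomized reduction is stratified sampling: draw the same fraction at random from every class, so that small classes shrink proportionally instead of disappearing. The sketch below assumes file size scales roughly with row count. The 500 MB starting size, the field names and the function names are hypothetical, and only the 100 MB limit comes from the text above.

```python
import random
from collections import Counter, defaultdict


def fraction_for_budget(current_mb, budget_mb):
    """Fraction of rows to keep so the data roughly fits the size budget."""
    return min(1.0, budget_mb / current_mb)


def stratified_downsample(rows, fraction, seed=0):
    """Randomly keep the same fraction of rows from each label (at least one)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    kept = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        kept.extend(rng.sample(group, k))
    return kept


# Toy dataset: one large class and one small class.
rows = [{"label": "Walking", "i": i} for i in range(100)]
rows += [{"label": "Standing", "i": i} for i in range(10)]

fraction = fraction_for_budget(current_mb=500.0, budget_mb=100.0)
kept = stratified_downsample(rows, fraction)
kept_counts = Counter(r["label"] for r in kept)
print(fraction, kept_counts)
```

Unlike a filter on dog weight, this keeps the class proportions of the full dataset, although it samples individual rows and would need adapting if whole recording windows must stay contiguous.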
That's it for now! Please stay tuned for upcoming updates - where you can expect an even better model, possibly already running on the Nicla Board with edge inference of Maniçoba's movements.