A Satisfactory Model, New Issues, Next Steps

Hi everyone. The past month has been a very active one for our project. In a nutshell, I spent a lot of time training different ML models to reach satisfactory performance on our dataset. In this update I break down the manipulation steps used to trim the original dataset, as well as the parameters used to train our model on Edge Impulse. And, of course, there are updates on next steps.

So, as you probably know by now, we are working with the "Movement Sensor Dataset for Dog Behavior Classification" (Vehkaoja et al. 2022), which is part of ESP's Bio-logger Ethogram Benchmark (BEBE). The original dataset is quite large (2 GB), and the Edge Impulse tool we are using only accepts datasets of up to 100 MB, so some data manipulation was required to get this step done. You can find the Jupyter notebook we used for the manipulation in the project's GitHub repository, alongside the final manipulated dataset and the prediction model itself.

Dataset Manipulation

Here's a quick recap of how we trimmed the dataset, accounting for the lessons we learned in previous attempts that filtered the data based on dog breed, as well as new insights into how the original dataset is structured. We started by filtering out rows where TestNum is equal to 2, which means removing the second-run data from dogs that were recorded twice in the original data collection. This was done under the assumption that it would be better to remove potentially "repetitive" data than to randomly drop rows to fit within the aforementioned limit.
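As a minimal sketch of this first filter, assuming the data is loaded into a pandas DataFrame with a `TestNum` column as named above (the tiny frame and the helper `drop_second_runs` are made up for illustration and are not taken from the project notebook):

```python
import pandas as pd

# Tiny made-up frame standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "DogID": [16, 16, 16, 18],
    "TestNum": [1, 2, 2, 1],
    "ABack_x": [0.012, 0.034, -0.051, 0.120],
})


def drop_second_runs(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep every row except those recorded during a dog's second run."""
    return frame[frame["TestNum"] != 2].reset_index(drop=True)


first_runs = drop_second_runs(df)
print(first_runs)
```

Only the rows with `TestNum` equal to 1 remain, so each dog contributes a single recording session.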

After that we filtered the DataFrame to keep only observations with a single annotated behavior (Behavior 1). In the original dataset some observations have up to three different observed behaviors (when dogs were doing more than one distinguishable movement), but in their own analysis the researchers also kept only entries with a single distinguishable behavior.
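One way to express this step in pandas is sketched below. It assumes the behavior columns are named `Behavior_1`, `Behavior_2` and `Behavior_3`, and that a missing secondary behavior is stored as the string `<undefined>`; the sample rows and the helper name are hypothetical.

```python
import pandas as pd

UNDEFINED = "<undefined>"  # assumed placeholder for "no behavior annotated"

# Made-up rows: only the first and third have a single annotated behavior.
df = pd.DataFrame({
    "Behavior_1": ["Walking", "Sniffing", "Sitting"],
    "Behavior_2": [UNDEFINED, "Walking", UNDEFINED],
    "Behavior_3": [UNDEFINED, UNDEFINED, UNDEFINED],
})


def keep_single_behavior(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where no second or third behavior was annotated."""
    mask = (frame["Behavior_2"] == UNDEFINED) & (frame["Behavior_3"] == UNDEFINED)
    return frame[mask].reset_index(drop=True)


single = keep_single_behavior(df)
print(single)
```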

Then we deleted a series of columns that we wouldn't use in our analysis. These included:

'DogID': The identifier used for each dog in the data gathering experiment
'TestNum': The number of the run for each dog (only 17 did 2 runs)
'ANeck_xyz': All Accelerometer data from Neck sensors - our project focuses on back Accelerometer sensors
'GNeck_xyz': All Gyroscope data from Neck sensors - our project focuses on back Accelerometer sensors
'GBack_xyz': All Gyroscope data from Back sensors - our project focuses on back Accelerometer sensors
'Task': The task the dog was instructed to do by its owner while its behavior was being recorded.
'Behavior_2 & 3': Concomitant observed behaviors in dogs - by this point these were full of <undefined> values as we had already filtered for it.
'PointEvent': Short events annotated separately, Bark for example.
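The column removal above could look roughly like the following. This is a sketch under assumptions: `ANeck_xyz` and friends are read as shorthand for three per-axis columns (`ANeck_x`, `ANeck_y`, `ANeck_z`, and so on), and the remaining column names (`t_sec`, `ABack_*`, `Behavior_1`) are guesses at the dataset layout rather than something confirmed in this post.

```python
import pandas as pd

AXES = ("x", "y", "z")

# Columns listed above, with each "_xyz" shorthand expanded into one column per axis.
columns_to_drop = (
    ["DogID", "TestNum", "Task", "Behavior_2", "Behavior_3", "PointEvent"]
    + [f"{sensor}_{axis}" for sensor in ("ANeck", "GNeck", "GBack") for axis in AXES]
)

# Assumed full column layout of the original file (a single dummy row of zeros).
all_columns = [
    "DogID", "TestNum", "t_sec",
    "ABack_x", "ABack_y", "ABack_z",
    "ANeck_x", "ANeck_y", "ANeck_z",
    "GBack_x", "GBack_y", "GBack_z",
    "GNeck_x", "GNeck_y", "GNeck_z",
    "Task", "Behavior_1", "Behavior_2", "Behavior_3", "PointEvent",
]
df = pd.DataFrame([[0] * len(all_columns)], columns=all_columns)

trimmed = df.drop(columns=columns_to_drop)
print(list(trimmed.columns))
```

What is left is the timestamp, the three back accelerometer axes and the behavior label, which is all the classifier needs.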

Then we narrowed the data down to the seven behaviors that were used in the authors' original analysis of the dataset:

selected_behaviors = ["Galloping", "Lying chest", "Sitting", "Sniffing", "Standing", "Trotting", "Walking"]
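A short sketch of how that list can be applied, assuming the label column is called `Behavior_1` (the sample rows, including the "Eating" and "Playing" labels, are invented for the example):

```python
import pandas as pd

selected_behaviors = ["Galloping", "Lying chest", "Sitting", "Sniffing",
                      "Standing", "Trotting", "Walking"]

# Made-up rows; "Eating" and "Playing" are not among the seven selected behaviors.
df = pd.DataFrame({
    "Behavior_1": ["Walking", "Eating", "Sitting", "Playing", "Trotting"],
    "ABack_x": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Keep only rows whose label is one of the seven selected behaviors.
filtered = df[df["Behavior_1"].isin(selected_behaviors)].reset_index(drop=True)
print(filtered)
```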

After this we were pretty close to a satisfactory dataset (around 120 MB). To get under the limit we rounded the accelerometer data, keeping 3 decimal places, and this was enough to get our working dataset.
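To illustrate why rounding helps, here is a small sketch that rounds assumed accelerometer columns (`ABack_x`, `ABack_y`, `ABack_z`) to 3 decimal places and compares the CSV size before and after. The values and the helper `csv_size_bytes` are made up; the real savings depend on how many digits the original file stores.

```python
import pandas as pd

accel_columns = ["ABack_x", "ABack_y", "ABack_z"]  # assumed column names

# Made-up readings with more decimals than we want to keep.
df = pd.DataFrame({
    "ABack_x": [0.0123456, -1.9876543],
    "ABack_y": [0.5, 0.3333333],
    "ABack_z": [-0.9999999, 0.1000001],
})


def csv_size_bytes(frame: pd.DataFrame) -> int:
    """Size of the frame when serialized as CSV text, in bytes."""
    return len(frame.to_csv(index=False).encode("utf-8"))


rounded = df.copy()
rounded[accel_columns] = rounded[accel_columns].round(3)

print(csv_size_bytes(df), "bytes before rounding")
print(csv_size_bytes(rounded), "bytes after rounding")
```

Fewer digits per value means fewer characters per row, which adds up quickly over millions of rows.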

ML Model Training

With the dataset in hand it was time to go back to Edge Impulse to get it trained. It is worth mentioning that I also seriously considered moving away from Edge Impulse and training the model from scratch with TensorFlow Lite. But I realized that learning this new tool would demand a lot of energy and effort that I'd rather dedicate to the future steps of the project, namely getting the haptics to work well. That's why I stuck with the easy interface of Edge Impulse for training (creating a new pro trial account), which by now I'm very familiar with.

Edge Impulse offers many different parameters that one can tweak in order to train a model, all the way from sample and window sizes to feature detection and NN configurations. I'll be brief here and just say that I tried many combinations. The setup that worked best in terms of model performance is the following:

Sample Size of 2000ms

Window Size of 2000ms
Window Increase of 1000ms

Wavelet Spectral Feature Analysis
- bior1.3

LGBM Random Forest Classifier
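To make the window settings concrete, the sketch below computes where sliding windows would start for a given recording length. It is a generic illustration, not Edge Impulse's actual implementation, and the 100 Hz sampling rate is an assumption about the dataset; `window_starts` is a hypothetical helper.

```python
def window_starts(n_samples, window_ms=2000, increase_ms=1000, sample_rate_hz=100):
    """Return the start index of every full window that fits in the recording."""
    window = window_ms * sample_rate_hz // 1000   # samples per window
    step = increase_ms * sample_rate_hz // 1000   # samples between window starts
    if n_samples < window:
        return []  # recording too short for even one window
    return list(range(0, n_samples - window + 1, step))


# A 10-second recording at the assumed 100 Hz has 1000 samples.
starts = window_starts(1000)
print(len(starts), "windows, starting at", starts)

# A 2000 ms sample holds exactly one 2000 ms window.
print(window_starts(200))
```

With a window increase of half the window size, consecutive windows overlap by 50%, so a longer recording yields roughly twice as many training windows as it would without overlap.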

All of this led to an all-time-high performing model with 85% accuracy! This was pretty exciting, and it really took me a long time to get to such results. After that, the next immediate step was deploying the model as a C++ or Arduino library, which I did using Edge Impulse's platform itself.

New Issues

Those who have been following our project since the beginning will remember that we already had a working ML model running on our edge device before we started this campaign. The only issue then was that the model was trained on human data. That "old" model was also trained on Edge Impulse, so I was familiar with the process of flashing such a model onto our edge device (the Nicla Sense ME board). Basically, one only had to swap a library to put the new model in place. However, due to a still unknown issue, when I replaced the old "human" model with the new "dog" one in the same codebase, an error occurred while compiling. Debugging attempts with GPT-4 suggest that it has something to do with a variable name conflict between one of the new model's header files and a header file from the Nicla_system.h library. This has made it impossible for us to test the new model in action on the edge device, but we are actively trying to find a workaround.

Next Steps

Our first next step is to try a different compiler approach for the Nicla board, to see if it would allow flashing both the Nicla_system.h library and the new "dog" prediction model. If that does not work, we will move to another edge device similar to the Nicla: the Seeed Studio XIAO nRF52840, which also has edge ML capabilities and a built-in 3-axis accelerometer, like the Nicla board. This board has been used in a similar project by another person, which gives us hope.

Finally, once that is done we really look forward to moving on to the haptics part of the project, where we will connect the edge ML predictions to vibrotactile feedback delivered by a dedicated bracelet. Another idea that has been growing on us is using a heart rate sensor alongside the accelerometer predictions, as a way to couple movement and physiological data while transferring both inputs to the bracelet, so we can really feel our dog's heartbeats. Keep watching for the next steps in this journey!

Danilo

1. Vehkaoja, A., Somppi, S., Törnqvist, H., Valldeoriola Cardó, A., Kumpulainen, P., Väätäjä, H., Majaranta, P., Surakka, V., Kujala, M. V., & Vainio, O. (2022). Description of movement sensor dataset for dog behavior classification. Data in Brief, 40, 107822. https://doi.org/10.1016/j.dib.2022.107822

Edge machine learning refers to the process of running embedded ML models on site using devices capable of collecting, processing, and recognizing patterns within collections of raw data. This project seeks to train one such device (Nicla Sense ME) with dog data from the Earth Species' Bio-logger Ethogram Benchmark (BEBE). The board will be used as a smart dog collar, with its ML inferences controlling haptic vibrations in a bracelet worn by a person.
