About This Project
Advances in audio machine learning have spurred improvements in bioacoustics and opened possibilities for automatically detecting and decoding the meaning of animal calls (1, 2, 3). However, state-of-the-art models have not seen the same success across the full range of vocalizations produced by different species, in particular the Egyptian fruit bat. By taking on the challenge of decoding a bat repertoire, we will develop more holistic models.
What is the context of this research?
The application of machine learning in bioacoustic research has led to significant breakthroughs in our understanding of animal behavior, ecology, and even conservation efforts. Traditional bioacoustics often involves manual tagging and categorization of all vocalizations, which requires expert-level knowledge of the species’ vocal behavior and can be both time-consuming and subject to human error. Deep neural networks can efficiently process large audio datasets, including unfamiliar data, reducing the need for manual curation. Yet our understanding of many species' vocal patterns remains limited (1, 2, 3): current models, designed mainly for human speech, often miss nuances in animal vocalizations. An unsupervised, broader framework for decoding animal "words" (call types) is therefore crucial.
What is the significance of this project?
By integrating traditional bioacoustics expertise with speech-processing machine learning, we aim to create an unsupervised model that can discern call types across animal vocal repertoires. Such advances would unlock our ability to understand intricate animal communication and to develop more effective conservation strategies (1, 2, 3). Validating this model on a songbird species with a well-characterized vocal repertoire and structure will not only attest to its accuracy, but may also highlight subtle vocal structures previously missed by supervised analyses. Finally, our unsupervised model will be used to reveal the vocal structure and ontogeny of a bat species for which we have extensive datasets. Using data from both adult and young bats, we will be able to identify how bats learn their own language.
What are the goals of the project?
Building on the HuBERT-based AVES model and inspired by non-parametric instance discrimination techniques, our first objective is to develop an unsupervised model architecture for call-type discovery. By incorporating species-specific feature spaces and harnessing data augmentation, we intend to create a latent space well suited to pinpointing animal vocal patterns. Next, the model will be tested and refined on a zebra finch dataset with vocalizer and call-type annotations (1, 2). Finally, we aim to adapt the model to explore the vocal repertoire of the Egyptian fruit bat. Using our extensive dataset of freely interacting bats (young and adults identified through audio data loggers), our focus is on discerning call types and how they emerge over development in that species.
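To make the intended pipeline concrete, here is a minimal sketch, not our final architecture: it embeds pre-segmented call clips with a generic pretrained HuBERT encoder (standing in for the AVES weights) and clusters the pooled embeddings into candidate call types. The file paths, encoder bundle, and cluster count are illustrative placeholders.

```python
# Minimal sketch (not our final architecture): embed call clips with a pretrained
# HuBERT encoder and cluster the pooled embeddings into candidate call types.
import glob

import torch
import torchaudio
from sklearn.cluster import KMeans

bundle = torchaudio.pipelines.HUBERT_BASE      # generic HuBERT weights, standing in for AVES
model = bundle.get_model().eval()

def embed_clip(path: str) -> torch.Tensor:
    """Return one embedding vector per call clip (mean-pooled last-layer frames)."""
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav.mean(0, keepdim=True), sr, bundle.sample_rate)
    with torch.inference_mode():
        features, _ = model.extract_features(wav)
    return features[-1].squeeze(0).mean(dim=0)

clips = sorted(glob.glob("calls/*.wav"))       # hypothetical directory of pre-segmented calls
embeddings = torch.stack([embed_clip(p) for p in clips]).numpy()

# Unsupervised grouping of calls; in practice the number of clusters (call types)
# would be chosen by model selection rather than fixed in advance.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)
```

In the full project, the encoder would be adapted with species-specific features, data augmentation, and an instance-discrimination-style objective rather than used off the shelf as above.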
Budget
To harness the power of cloud-based computing, we must host our data on an online platform. We are leveraging two datasets that have been previously obtained and partially curated. Each dataset comprises roughly 100 days of recordings at about 50 GB per day, for a combined size of approximately 10 TB. Dataset curation will be pursued independently of this project by Julie Elie, with students recruited through UC Berkeley. Our primary focus here is on training deep neural networks. To optimize our outcomes, we are pursuing two approaches: 1) fine-tuning large, pre-trained models to suit our specific domain needs, and 2) developing and training new models inspired by state-of-the-art architectures.
Having a dedicated budget for cloud storage and compute units is crucial for us to test various models online and achieve a robust classification pipeline.
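As a rough illustration of the first approach above, the sketch below fine-tunes a pretrained speech encoder with a small classification head on annotated calls. The encoder bundle, label count, and hyperparameters are placeholders rather than our actual training configuration.

```python
# Illustrative sketch of approach (1): fine-tune a pretrained speech encoder on
# labelled zebra finch calls. All names and hyperparameters are placeholders.
import torch
import torchaudio
from torch import nn

bundle = torchaudio.pipelines.HUBERT_BASE        # stand-in for domain-adapted AVES weights
encoder = bundle.get_model()

num_call_types = 10                              # hypothetical number of annotated call types
head = nn.Linear(768, num_call_types)            # HuBERT-base features are 768-dimensional

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-5
)
loss_fn = nn.CrossEntropyLoss()

def training_step(waveforms: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """One gradient step on a batch of (batch, samples) waveforms with integer labels."""
    features, _ = encoder.extract_features(waveforms)
    logits = head(features[-1].mean(dim=1))      # pool frames -> one prediction per clip
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```

In practice this loop would be wrapped around a data loader over the annotated zebra finch clips, with held-out data used to track accuracy, but the shape of the computation is the same.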
Project Timeline
Currently, we're progressing as planned with our HuBERT-based approach, intending to augment training data and adjust input parameters. We expect to finish the initial model training in the coming weeks. After that, we'll aim to spend another month evaluating alternative models before finalizing our training pipeline. This refined pipeline will help us pinpoint the most effective models for classification. Our aim is to achieve accurate bat call identification by February of next year.
Oct 31, 2023
Project Launched
Nov 01, 2023
Implement initial model pipeline prototype
Dec 01, 2023
Assess various model fits to zebra finch call domain within our training pipeline
Jan 01, 2024
Finalize model pipeline and assess model performance
Feb 15, 2024
Assess model fit to bat vocalizations and identify preliminary bat vocal repertoire
Meet the Team
Affiliates
Jonathan Wang
I'm a Software Engineer at Walmart Labs, concurrently collaborating on research with Dr. Julie Elie at the Yartsev Lab, UC Berkeley. I earned my degree from UC Berkeley in 2022, majoring in Electrical Engineering and Computer Science with a minor in Applied Mathematics. My fascination with animal research dates back to my elementary school days in Colorado, where I was immersed in nature and the great outdoors. Since embarking on my computer science journey, I have continually sought ways to intertwine my passion for animals with technology, which has shaped my present research interests.
Julie Elie
I am a computational neuroethologist passionate about vocal communication in birds and mammals. My multidisciplinary approach combines ethological observations, behavioral tests, bioacoustics, and machine learning to understand what animals are saying, and how and why they’re saying it. My main questions are: (1) how a species’ social structure affects the dynamics of vocal production; (2) how animals convey intents, needs, or emotional states in their vocalizations; (3) how much animals rely on learning to produce species-typical vocalizations. I work as a postdoc in the lab of Michael Yartsev at UC Berkeley.
Additional Information
UMAP interactive plot example (used to project UMAP-clustered data from spectrogram/biosound-package inputs, then visualize and interact with the data points in the resulting cluster plot): github
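For readers who prefer a self-contained version, the sketch below builds a similar interactive plot with umap-learn and Plotly, assuming a feature matrix with one row per call (e.g. flattened spectrograms or biosound-style acoustic measures) plus per-call metadata; the synthetic data and column names are placeholders, and the linked GitHub example may differ in its details.

```python
# Minimal sketch of an interactive UMAP view over per-call acoustic features.
# The feature matrix and metadata below are synthetic placeholders.
import numpy as np
import pandas as pd
import plotly.express as px
import umap

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                                     # 200 calls x 64 acoustic measures
call_type = rng.choice(["type_a", "type_b", "type_c"], size=200)   # placeholder annotations
clip_id = [f"clip_{i:03d}" for i in range(200)]

coords = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)

df = pd.DataFrame({"umap_1": coords[:, 0], "umap_2": coords[:, 1],
                   "call_type": call_type, "clip_id": clip_id})

# Hovering a point reveals which clip it is, so clusters can be inspected call by call.
fig = px.scatter(df, x="umap_1", y="umap_2", color="call_type", hover_data=["clip_id"])
fig.show()
```

Calling fig.write_html("umap_calls.html") instead of fig.show() produces a shareable interactive page.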
Project Backers
- 5 Backers
- 100% Funded
- $4,000 Total Donations
- $800.00 Average Donation