Detecting sounds (not voice)

Hi all, I am considering a new project that aims to detect different types of sounds, including various animal and mechanical sounds, but not voice. Considering to use the

I have searched a bit, but haven't found any really powerful frameworks or examples for sound fingerprinting. There are some, but they seem outdated or experimental.

Anyone with experience out there?

"Considering to use the … "

What are you considering?

I wouldn’t be surprised if there is no existing “powerful” framework for this. My friend just got his Master’s a couple weeks ago by fingerprinting individual whales and species of whales using tons of passive recordings. I think it’s just an emerging technology.

As for the DSP he did, I believe it wasn't all that compute-heavy, but rather very data-heavy.

Do you intend to just record in the field or record and process in the field?

Yep, considering the compute module or BeagleBone embedded Linux boards.

Will need recording and processing in the field.

Sounds interesting with the whales. Anything you can share?

I’ve already told him about your post because he was talking to me about joining forces on a similar project. Nothing I should share right now because this is his work. But I’ll let you know what he wants to share when I talk to him. Sounds like a cool project!


There is a ton of academic literature on this kind of work. In the old days, it was underwater acoustic detection of submarines that drove the technology. Nowadays, a lot of work has been done and published on detecting bird calls (check the Cornell Ornithology lab website, they have a ton of information) and whale calls. Almost certainly on a range of other sounds too, but those are the ones I'm familiar with. You should be able to find a ton of references with Google and/or Google Scholar. Recently, a lot of work has been done using machine learning to make training a detection algorithm easier. Again, there should be lots of references on the web e.g.



I spoke with my friend. He said his solution was very specific to whales and the specific data set he used. However, he told me that a paper, Toolbox For Animal Call Recognition, was very useful to him.

Here is the abstract and a method of purchasing access. You might be able to find free access another way.

This isn’t exactly going to be your catch-all for determining what hardware you need. To work that out for a project like this, it is probably worthwhile to define your problem in terms of capturing data (how, how much, how often, sample rate, storage methods), which algorithms you will use (expected library support, speed, RAM needed), and connectivity (how you are going to get the data out).
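To make the data-capture part concrete, here is a rough back-of-the-envelope sketch in Python. It assumes uncompressed PCM audio, and the function names and the deployment numbers (44.1 kHz, 16-bit mono, 10 minutes per hour, 30 days) are purely hypothetical placeholders, not anything from the whale project. Keep in mind that the sample rate needs to be at least twice the highest frequency you care about, so the right figure depends on the sounds you are after.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def storage_needed_bytes(sample_rate_hz, bits_per_sample, channels,
                         seconds_per_day, days):
    """Total bytes for a deployment recording seconds_per_day each day."""
    rate = pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels)
    return rate * seconds_per_day * days


if __name__ == "__main__":
    # Hypothetical deployment: 10 minutes recorded every hour, for 30 days.
    seconds_per_day = 10 * 60 * 24
    total = storage_needed_bytes(
        sample_rate_hz=44_100,   # assumed sample rate
        bits_per_sample=16,      # assumed sample depth
        channels=1,              # assumed mono microphone
        seconds_per_day=seconds_per_day,
        days=30,
    )
    print(f"Estimated raw storage: {total / 1e9:.1f} GB")
```

Plugging in your own numbers should tell you quickly whether an SD card is enough, and whether pulling the raw recordings out over your connectivity option is realistic or you need to process in the field and send only detections.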

Maybe if you could get a proof of concept running on the full .NET, that would be a good indicator that a product on this site would work as well.

P.S. I’m not that experienced, so if my advice seems off, that’s probably because I’m spitballing :slight_smile:
Good luck!

Thanks a lot, you pointed me in some interesting directions. The paper is available as a PDF right here:

I have realized that the challenge is significantly more complex than I first anticipated, but I will keep investigating.

Thanks again!