An audio dataset and IPython notebook for training a convolutional neural network to distinguish the sound of foosball goals from other noises using TensorFlow

So I was looking into ways to process and build machine learning models to work with audio data. Here is a beautiful article by humblesoftwaredev to do the same. 🙂 Do read.

humblesoftwaredev

tl;dr:

fooshttps://github.com/dk1027/ConvolutionalNeuralNetOnFoosballSounds
IPython notebook: CNN on Foosball sounds.ipynb
Trained CNN model using TensorFlow: model.ckpt
Pickled Pandas dataframe: full_dataset_44100.pickle

Abstract

I setup mics at the foosball table and recorded a few hours of foosball games. The audio files were labelled by hand and then segmented into one-second clips of goals / other noises. Mel spectrograms were created from the clips and about 200 samples were created and used for training, testing, and validation, resulting in 5% error on test data.

Data collection and labelling

I used a Zoom H5 XY stereo mic, a Shure SM57, and a few other mics for recording. Each mic had its own characteristic and they were placed at different locations around the table, for instance, pointing close to a goalie, high above the table pointing downward, or from one side of the table pointing at a goalie at the far side. There might be enough differences…

View original post 578 more words

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s