I was looking into ways to process audio data and build machine learning models that work with it. Here is a beautiful article by humblesoftwaredev on doing exactly that. 🙂 Do read.
IPython notebook: CNN on Foosball sounds.ipynb
Trained CNN model using TensorFlow: model.ckpt
Pickled Pandas dataframe: full_dataset_44100.pickle
I set up mics at the foosball table and recorded a few hours of foosball games. The audio files were labelled by hand and then segmented into one-second clips of goals and other noises. Mel spectrograms were created from the clips, and about 200 samples were used for training, testing, and validation, resulting in 5% error on test data.
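To make the clip-and-spectrogram step concrete, here is a minimal NumPy-only sketch of cutting a recording into one-second clips and turning each clip into a log mel spectrogram. It is not the code from the notebook: the 44100 Hz sample rate is only inferred from the dataset file name, and the FFT size, hop length, number of mel bands, and all function names are assumptions chosen for illustration. The recording here is synthetic noise standing in for real audio.

```python
import numpy as np

SAMPLE_RATE = 44100  # assumption: inferred from the file name full_dataset_44100.pickle


def split_into_clips(signal, sample_rate=SAMPLE_RATE, clip_seconds=1.0):
    """Cut a mono signal into non-overlapping fixed-length clips; the remainder is dropped."""
    clip_len = int(sample_rate * clip_seconds)
    n_clips = len(signal) // clip_len
    return signal[: n_clips * clip_len].reshape(n_clips, clip_len)


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lower, centre, upper = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        # Each filter is the positive part of a triangle peaking at `centre`.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(clip, sample_rate=SAMPLE_RATE, n_fft=2048, hop=512, n_mels=64):
    """Log-scaled mel spectrogram of one clip, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack([clip[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sample_rate) @ power.T
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)


# Synthetic 2.5-second "recording" standing in for real foosball audio.
rng = np.random.default_rng(0)
recording = rng.standard_normal(int(2.5 * SAMPLE_RATE))

clips = split_into_clips(recording)
spectrograms = np.stack([mel_spectrogram(clip) for clip in clips])
print(clips.shape, spectrograms.shape)
```

Each spectrogram is a small 2-D array, which is what makes it a natural input for a CNN. In practice an audio library would usually handle the mel transform; the hand-rolled version here only shows what the transform does.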
Data collection and labelling
I used a Zoom H5 XY stereo mic, a Shure SM57, and a few other mics for recording. Each mic had its own characteristics, and they were placed at different locations around the table: for instance, pointing close to a goalie, high above the table pointing downward, or from one side of the table pointing at a goalie at the far side. There might be enough differences…