
The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. The prime mover for the Million Song Dataset is Columbia researcher (and Echo Nest summer intern!) Thierry Bertin-Mahieux. Thierry has worked tirelessly to create and promote the MSD within the MIR community. Many thanks to Amazon for hosting the Million Song Dataset (they host it for free). Special thanks to Infochimps, who did the grunt work of preparing and uploading the dataset to Amazon. We hope to make the MSD as important to MIR research as the human genome is to medical research.

The dataset does not include any audio, only the derived features. Additional datasets have been attached to the Million Song Dataset; so far they contain lyrics and cover songs. Over time, you can expect to see even more data attached to the MSD, from social tags for the million songs to deep collaborative filtering and playlist data derived from millions of listeners.

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest.
A long-standing problem for researchers in the field of music information retrieval and music recommendation systems is that, due to licensing and copyright restrictions, it is very difficult to get access to large sets of music data. While commercial music sites such as Spotify, iTunes, Mog and Rdio boast collections of 5 to 10 million songs, a typical Music Information Retrieval experiment is conducted on a database of 10 thousand songs or fewer. This has had a major impact on the quality of research. Now you can access the MSD from your Amazon EC2 instances and start computing on this dataset in minutes.

Million Song Dataset HDF5 to CSV Converter, Alexis Greenstreet. The code in 'msdHDF5toCSV.py' is designed to convert the HDF5 files of the Million Song Dataset to a CSV by extracting various song properties.
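As a rough illustration of the kind of HDF5-to-CSV conversion that 'msdHDF5toCSV.py' is described as doing, here is a minimal Python sketch. It is not the actual script. It assumes the third-party `h5py` package is installed, that each per-song file stores its properties in compound tables named `metadata/songs` and `analysis/songs`, and that those tables have columns called `artist_name`, `title`, `duration` and `tempo`; check these names against your own files. The helper names `read_song`, `write_csv` and `convert` are made up for this example.

```python
import csv

# Assumed (table path, column name) pairs; verify against your HDF5 files.
FIELDS = [
    ("metadata/songs", "artist_name"),
    ("metadata/songs", "title"),
    ("analysis/songs", "duration"),
    ("analysis/songs", "tempo"),
]


def read_song(path, fields=FIELDS):
    """Read the selected properties of the first song stored in one HDF5 file."""
    import h5py  # third-party; imported lazily so write_csv works without it

    row = {}
    with h5py.File(path, "r") as h5:
        for table, column in fields:
            value = h5[table][0][column]  # first row of the compound table
            if isinstance(value, bytes):
                value = value.decode("utf-8")  # strings are stored as bytes
            row[column] = value
    return row


def write_csv(rows, out, columns):
    """Write an iterable of dict rows to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def convert(h5_paths, csv_path, fields=FIELDS):
    """Convert a list of per-song HDF5 files into a single CSV file."""
    columns = [column for _, column in fields]
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        write_csv((read_song(p, fields) for p in h5_paths), out, columns)
```

With a local copy of the files, a call such as `convert(glob.glob("data/**/*.h5", recursive=True), "songs.csv")` would produce one CSV row per song; the `data/` directory and the output file name here are placeholders.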
Million Song Dataset now on Amazon, August 15, 2011

Some big news! The Million Song Dataset is now available as a Public Data Set on Amazon. No longer do you have to worry about how you are going to download and store half a terabyte of music data, let alone figure out how to process it all on your two-year-old MacBook.
