The MTG-Jamendo Dataset for Automatic Music Tagging

Melon Playlist Dataset: A Public Dataset for Audio-Based Playlist Generation and Music Tagging

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

One of the main limitations in the field of audio signal processing is the lack of large public datasets with audio representations and high-quality annotations due to restrictions of copyrighted commercial music. We present Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags. All the data is gathered from Melon, a popular Korean streaming service. The dataset is suitable for music information retrieval tasks, in particular, auto-tagging and automatic playlist continuation. Even though the latter can be addressed by collaborative filtering approaches, audio provides opportunities for research on track suggestions and building systems resistant to the cold-start problem, for which we provide a baseline. Moreover, the playlists and the annotations included in the Melon Playlist Dataset make it suitable for metric learning and representation learning.
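
Concretely, an experiment on this dataset ties together three objects: per-track mel-spectrograms, playlists, and tag annotations. The sketch below shows one way to derive per-track tag labels from playlist membership; the file names and JSON keys are assumptions for illustration, not the dataset's official layout.

```python
# Minimal sketch: join playlists, tags, and mel-spectrograms for one track.
# "playlists.json" and the "mels/<id>.npy" layout are hypothetical.
import json
import numpy as np

with open("playlists.json", encoding="utf-8") as f:
    playlists = json.load(f)  # assumed: list of {"id", "tags", "songs"}

# Invert playlists into per-track tag annotations: a track inherits the
# tags of every playlist that contains it.
track_tags = {}
for pl in playlists:
    for song_id in pl["songs"]:
        track_tags.setdefault(song_id, set()).update(pl["tags"])

def load_mel(song_id):
    """Load the precomputed mel-spectrogram for one track (assumed .npy)."""
    return np.load(f"mels/{song_id}.npy")  # shape: (n_mels, n_frames)
```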

Autotagging music using supervised machine learning

Proceedings of the International …, 2007

Social tags are an important component of "Web2.0" music recommendation websites. In this paper we propose a method for predicting social tags using audio features and supervised learning. These automatically-generated tags (or "autotags") can furnish information about music that is untagged or poorly tagged. The tags can also serve to smooth the tag space from which similarities and recommendations are made by providing a set of comparable baseline tags for all tracks in a recommender system.
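
The general recipe described here, mapping audio features to tags with supervised learning, reduces to training one binary classifier per tag. A minimal sketch with scikit-learn follows; the feature matrix and tag matrix are synthetic placeholders, not the paper's actual setup.

```python
# One-vs-rest turns multi-label autotagging into independent binary problems.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))          # per-track audio features (synthetic)
Y = rng.integers(0, 2, size=(500, 5))   # binary indicator matrix: track x tag

autotagger = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
tag_probs = autotagger.predict_proba(X[:3])  # "autotag" scores per track
```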

Three current issues in music autotagging

2011

The purpose of this paper is to address several aspects of music autotagging. We start by presenting autotagging experiments conducted with two different systems and show performances on a par with a method representative of the state of the art. Beyond that, we illustrate via systematic experiments the importance of a number of issues relevant to autotagging, yet seldom reported in the literature. First, we show that the evaluation of autotagging techniques is fragile, in the sense that small alterations to the set of tags to be learned, or to the set of music pieces, may lead to dramatically different results. Hence, we stress a set of methodological recommendations regarding data and evaluation metrics. Second, we conduct experiments on the generality of autotagging models, showing that a number of different methods at a performance level similar to the state of the art fail to learn tag models able to generalize to datasets from different origins. Third, we show that the current performance of a direct mapping between audio features and tags still appears insufficient to exploit natural tag correlations in a second stage to improve performance.
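
The fragility claim is easy to probe in code: fix the model outputs, perturb the tag set, and watch the headline metric move. A small synthetic sketch of that protocol, assuming macro-averaged ROC-AUC as the evaluation metric:

```python
# Re-run the same evaluation with one tag dropped at a time and report the
# shift in the headline score. Data is synthetic; the point is the protocol.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
scores = rng.random((1000, 20))                      # model outputs: track x tag
labels = (rng.random((1000, 20)) < 0.1).astype(int)  # ground truth: track x tag

full = roc_auc_score(labels, scores, average="macro")
for drop in range(5):
    keep = [t for t in range(20) if t != drop]
    alt = roc_auc_score(labels[:, keep], scores[:, keep], average="macro")
    print(f"without tag {drop}: macro ROC-AUC shifts by {alt - full:+.4f}")
```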

Improving music auto-tagging by intra-song instance bagging

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014

Bagging is one of the most classic ensemble learning techniques in the machine learning literature. The idea is to generate multiple subsets of the training data via bootstrapping (random sampling with replacement), and then aggregate the outputs of the models trained on each subset via voting or averaging. As music is a temporal signal, we propose and study two bagging methods in this paper: inter-song instance bagging, which bootstraps song-level features, and intra-song instance bagging, which draws bootstrap samples directly from short-time features for each training song. In particular, we focus on the latter method, as it better exploits the temporal information of music signals. The bagging methods result in surprisingly effective models for music auto-tagging: incorporating the idea into a simple linear support vector machine (SVM) based system yields accuracies that are comparable or even superior to state-of-the-art, possibly more sophisticated, methods on three different datasets. As bagging is a meta-algorithm, it holds the promise of improving other MIR systems.
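
A minimal sketch of intra-song instance bagging as described above: bootstrap short-time feature frames within each training song, fit one linear SVM per bootstrap replicate, and average decision scores at test time. The frame features, number of bags, and single binary tag are simplifying assumptions; for multi-label tagging one would train one such ensemble per tag.

```python
# Intra-song instance bagging sketch: bootstrap frames per song, one linear
# SVM per bag, average the frame-level decision scores at prediction time.
import numpy as np
from sklearn.svm import LinearSVC

def train_bagged_taggers(song_frames, song_labels, n_bags=10, seed=0):
    """song_frames: list of (n_frames_i, n_dims) arrays, one per song.
    song_labels: 0/1 tag membership per song (both classes must occur)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_bags):
        X_parts, y_parts = [], []
        for frames, label in zip(song_frames, song_labels):
            idx = rng.integers(0, len(frames), size=len(frames))  # bootstrap frames
            X_parts.append(frames[idx])
            y_parts.append(np.full(len(frames), label))
        models.append(LinearSVC().fit(np.vstack(X_parts), np.concatenate(y_parts)))
    return models

def predict_song(models, frames):
    """Average frame-level SVM scores over frames and over bags."""
    return float(np.mean([m.decision_function(frames).mean() for m in models]))
```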

Evaluation of CNN-based automatic music tagging models

2020

Recent advances in deep learning have accelerated the development of content-based automatic music tagging systems. Music information retrieval (MIR) researchers have proposed various architecture designs, mainly based on convolutional neural networks (CNNs), that achieve state-of-the-art results in this multi-label binary classification task. However, due to differences in the experimental setups followed by researchers, such as using different dataset splits and software versions for evaluation, it is difficult to compare the proposed architectures directly with each other. To facilitate further research, in this paper we conduct a consistent evaluation of different music tagging models on three datasets (MagnaTagATune, Million Song Dataset, and MTG-Jamendo) and provide reference results using common evaluation metrics (ROC-AUC and PR-AUC). Furthermore, all the models are evaluated with perturbed inputs to investigate their generalization capabilities with respect to time stretch, pitch shift, dynamic range compression, and the addition of white noise.
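
For reference, the two metrics the paper standardizes on can be computed with scikit-learn, where average_precision_score is the usual stand-in for PR-AUC; the labels and scores below are synthetic placeholders:

```python
# Macro-averaged ROC-AUC and PR-AUC for multi-label tag predictions.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(2)
y_true = (rng.random((200, 50)) < 0.05).astype(int)  # track x tag ground truth
y_score = rng.random((200, 50))                       # model tag scores

print("macro ROC-AUC:", roc_auc_score(y_true, y_score, average="macro"))
print("macro PR-AUC :", average_precision_score(y_true, y_score, average="macro"))
```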

Automatic Tagging of Songs Using Machine Learning

International Journal of Database Theory and Application, 2016

In this work, automatic tagging of songs using machine learning is performed so that search becomes more effective when selecting songs. The goal of this paper is to propose a system that automatically recognizes the genre of tracks and tags them accordingly, using parameters obtained from acoustic analysis. The work evaluates different combinations of algorithms and musical parameters to classify tracks into genres accurately.

Musical Genre Tag Classification With Curated and Crowdsourced Datasets

2012

Analyzing music audio files based on genres and other qualitative tags is an active field of research in machine learning. When paired with particular classification algorithms, most notably support vector machines (SVMs) and k-nearest-neighbor (KNN) classifiers, certain features, including Mel-frequency cepstral coefficients (MFCCs), chroma attributes, and other spectral properties, have been shown to be effective for classifying music by genre. In this paper we apply these methods and features across two datasets (GTZAN and the Million Song Dataset) with four different tag sources (GTZAN, The Echo Nest, MusicBrainz, and Last.fm). Two of these tag sources are professionally curated (GTZAN and MusicBrainz), while the other two are crowdsourced, that is, created by unmonitored users for each track. Two of the sources provide labels on a track-by-track basis (GTZAN and Last.fm), while the other two label by artist. By exploring the cross-validation balanced accuracy...
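
A minimal sketch of the feature/classifier pairing evaluated here: summarize frame-level MFCC and chroma features per track, then feed the summaries to an SVM or KNN. librosa defaults are assumed for extraction, and the audio paths and genre labels are placeholders:

```python
# Per-track MFCC/chroma summaries fed to an SVM and a KNN (sketch).
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def track_features(path):
    """Summarize frame-level MFCC and chroma features as per-track stats."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    return np.concatenate([mfcc.mean(1), mfcc.std(1),
                           chroma.mean(1), chroma.std(1)])

# Hypothetical usage, given audio paths and per-track genre labels:
# X = np.stack([track_features(p) for p in audio_paths])
# for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
#     clf.fit(X_train, y_train)
```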

Easy as CBA: A simple probabilistic model for tagging music

2009

Many songs in large music databases are not labeled with semantic tags that could help users sort out the songs they want to listen to from those they do not. If the words that apply to a song can be predicted from audio, then those predictions can be used to automatically annotate the song with tags, allowing users to get a sense of what qualities characterize it at a glance. Automatic tag prediction can also drive retrieval by allowing users to search for the songs most strongly characterized by a particular word.
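
Both uses named here, annotation and retrieval, reduce to simple operations on a song-by-tag probability matrix, regardless of which probabilistic tagger (such as CBA) produced it. A small sketch under that assumption, with synthetic probabilities and made-up tag names:

```python
# Annotation = top tags per song; retrieval = top songs per tag.
import numpy as np

rng = np.random.default_rng(3)
tag_names = ["rock", "mellow", "female vocal", "electronic", "sad"]
P = rng.random((100, len(tag_names)))  # P[i, j] ~ p(tag j | song i), synthetic

def annotate(song_idx, k=3):
    """Top-k tags for one song (automatic annotation)."""
    return [tag_names[j] for j in np.argsort(P[song_idx])[::-1][:k]]

def retrieve(tag, k=5):
    """Indices of the songs most strongly characterized by a tag (retrieval)."""
    return np.argsort(P[:, tag_names.index(tag)])[::-1][:k]
```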

Multivariate autoregressive mixture models for music auto-tagging

cosmal.ucsd.edu

We propose the multivariate autoregressive model for content-based music auto-tagging. At the song level, our approach leverages the multivariate autoregressive mixture (ARM) model, a generative time-series model for audio which assumes each feature vector in an audio fragment is a linear function of previous feature vectors. To tackle tag-model estimation, we propose an efficient hierarchical EM algorithm for ARMs (HEM-ARM), which summarizes the acoustic information common to the ARMs modeling the individual songs associated with a tag. We compare the ARM model with the recently proposed dynamic texture mixture (DTM) model, and hence investigate the relative merits of different modeling choices for music time series: (i) the flexibility of selecting a higher memory order in the ARM, (ii) the capability of the DTM to learn a specific frequency basis for each particular tag, and (iii) the effect of the hidden layer of the DT versus the time efficiency of learning and inference with fully observable AR components. Finally, we experiment with a support vector machine (SVM) approach that classifies songs based on a kernel computed on the frequency responses of the corresponding song ARMs. We show that the proposed approach outperforms SVMs trained on a different kernel function based on a competing generative model.
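
The core assumption stated above, that each feature vector is a linear function of its predecessors, corresponds to a vector autoregressive model. The sketch below fits a single AR component by least squares; the full ARM mixes several such components and estimates them with (H)EM, which is not reproduced here.

```python
# Fit one vector-AR component by least squares: x_t ~ A_1 x_{t-1} + ... + bias.
import numpy as np

def fit_var(frames, order=2):
    """frames: (T, d) feature sequence, e.g. frames of one audio fragment."""
    T, d = frames.shape
    # Design matrix: for each t >= order, stack [x_{t-1}, ..., x_{t-order}, 1].
    rows = np.array([np.concatenate([frames[t - k] for k in range(1, order + 1)])
                     for t in range(order, T)])
    X = np.column_stack([rows, np.ones(T - order)])
    Y = frames[order:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)            # (d * order + 1, d)
    A = [coef[k * d:(k + 1) * d].T for k in range(order)]   # one (d, d) per lag
    return A, coef[-1]                                      # AR matrices, bias

def predict_next(A, bias, recent):
    """One-step prediction; recent[k] is the frame at lag k+1 (newest first)."""
    return sum(A_k @ recent[k] for k, A_k in enumerate(A)) + bias
```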

The million song dataset

Proceedings of the 11th …, 2011

We introduce the Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks. We describe its creation process, its content, and its possible uses. Attractive features of the Million Song Dataset include the range of existing resources to which it is linked and the fact that it is the largest current research dataset in our field. As an illustration, we present year prediction as an example application, a task that has, until now, been difficult to study owing to the absence of a large set of suitable data. We show positive results on year prediction and discuss more generally the future development of the dataset.
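
The year-prediction example amounts to a regression from per-track audio features to release year. A toy sketch with synthetic data standing in for the dataset's per-track feature summaries:

```python
# Year prediction as plain regression on per-track features (synthetic data).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 90))              # per-track feature summaries (synthetic)
years = rng.integers(1960, 2011, size=5000)  # release years (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, years, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("mean abs. error (years):", np.abs(model.predict(X_te) - y_te).mean())
```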