Dmitry Bogdanov | Pompeu Fabra University

Papers by Dmitry Bogdanov

Essentia TensorFlow Models for Audio and Music Processing on the Web

Recent advances in web-based machine learning (ML) tools empower a wide range of application developers in both industrial and creative contexts. The availability of pretrained ML models and JavaScript (JS) APIs in frameworks like TensorFlow.js enabled developers to use AI technologies without demanding domain expertise. Nevertheless, there is a lack of pre-trained models in web audio compared to other domains, such as text and image analysis. Motivated by this, we present a collection of open pre-trained TensorFlow.js models for music-related tasks on the Web. Our models currently allow for different types of music classification (e.g., genres, moods, danceability, voice or instrumentation), tempo estimation, and music feature embeddings. To facilitate their use, we provide a dedicated JS add-on module essentia.js-model within the Essentia.js library for audio and music analysis. It has a simple API, enabling end-to-end analysis from audio input to prediction results on web browser...

Audio and Music Analysis on the Web using Essentia.js

Transactions of the International Society for Music Information Retrieval, 2021

Open-source software libraries have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on Web clients. In this article, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS engines. Along with the Web Audio API, it can be used for both offline and real-time audio feature extraction on web browsers. Essentia.js is modular, lightweight, and easy to use, deploy, maintain, and integrate into the existing plethora of JS libraries and web technologies. It is powered by a WebAssembly back end cross-compiled from the Essentia C++ library, which facilitates a JS interface to a wide range of low-level and high-level audio features, including signal processing MIR algorithms as well as pre-trained TensorFlow.js machine learning models. It also provides a higher-level JS API and add-on MIR utility modules along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers and the Node.js engine, and four devices, including mobile Android and iOS, comparing it to the native performance of Essentia and the Meyda JS library.

Applications of Essentia on the web

Essentia.js, a port of the popular open-source C++ library Essentia, comes to the web to become a reference library to use together with the Web Audio API. Whilst there are many libraries for audio analysis and feature extraction in native computing languages, this Master's thesis exposes the need for such a library in JavaScript and proposes Essentia.js as the best option to cover those needs. The Music Information Retrieval community needs this tool to be able to develop software and research using web technologies, to continue evolving and stay at the state of the art. Having compiled the C++ library into a JavaScript audio analysis library, a study of the efficiency of the library is carried out by benchmarking the execution of algorithms available in Essentia.js and comparing them to their equivalents from Meyda.js, a library written in JavaScript. Also,...

Evaluation of CNN-based automatic music tagging models

Recent advances in deep learning accelerated the development of content-based automatic music tagging systems. Music information retrieval (MIR) researchers proposed various architecture designs, mainly based on convolutional neural networks (CNNs), that achieve state-of-the-art results in this multi-label binary classification task. However, due to the differences in experimental setups followed by researchers, such as using different dataset splits and software versions for evaluation, it is difficult to compare the proposed architectures directly with each other. To facilitate further research, in this paper we conduct a consistent evaluation of different music tagging models on three datasets (MagnaTagATune, Million Song Dataset, and MTG-Jamendo) and provide reference results using common evaluation metrics (ROC-AUC and PR-AUC). Furthermore, all the models are evaluated with perturbed inputs to investigate the generalization capabilities concerning time stretch, pitch shift, dyna...
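
As a rough illustration of the reference metrics mentioned above (macro-averaged ROC-AUC and PR-AUC for multi-label tagging), the sketch below computes them with scikit-learn; the array names, shapes, and tag count are hypothetical and not taken from the paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate_tagging(y_true: np.ndarray, y_score: np.ndarray) -> dict:
    """y_true: binary tag matrix (n_tracks, n_tags); y_score: model activations."""
    return {
        "roc_auc": roc_auc_score(y_true, y_score, average="macro"),
        # average precision is the usual estimator reported as PR-AUC
        "pr_auc": average_precision_score(y_true, y_score, average="macro"),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=(1000, 50))   # e.g., 50 tags
    y_score = rng.random(size=(1000, 50))
    print(evaluate_tagging(y_true, y_score))
```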

Cross-Collection Evaluation For Music Classification Tasks

Many studies in music classification are concerned with obtaining the highest possible cross-validation result. However, some studies have noted that cross-validation may be prone to biases and that additional evaluations based on independent out-of-sample data are desirable. In this paper we present a methodology and software tools for cross-collection evaluation for music classification tasks. The tools allow users to conduct large-scale evaluations of classifier models trained within the AcousticBrainz platform, given an independent source of ground-truth annotations, and its mapping with the classes used for model training. To demonstrate the application of this methodology we evaluate five models trained on genre datasets commonly used by researchers for genre classification, and use collaborative tags from Last.fm as an independent source of ground truth. We study a number of evaluation strategies using our tools on validation sets from 240,000 to 1,740,000 music recordings and discuss the results.
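
A minimal sketch of the cross-collection idea described above, assuming predictions from a trained genre model and tags from an independent source (e.g., Last.fm) keyed by recording ID; the tag-to-class mapping and all names are hypothetical and do not reflect the actual AcousticBrainz tooling.

```python
# Hypothetical mapping from external tags to the classes of a trained model;
# in practice this mapping is supplied by the user of the evaluation tools.
TAG_TO_CLASS = {"hip hop": "hip-hop", "hiphop": "hip-hop", "classic rock": "rock"}

def cross_collection_accuracy(predictions: dict, external_tags: dict) -> float:
    """predictions: recording_id -> predicted class label.
    external_tags: recording_id -> list of tags from the independent source."""
    hits, total = 0, 0
    for rec_id, predicted in predictions.items():
        mapped = {TAG_TO_CLASS[t] for t in external_tags.get(rec_id, []) if t in TAG_TO_CLASS}
        if not mapped:
            continue  # no usable out-of-sample ground truth for this recording
        total += 1
        hits += int(predicted in mapped)
    return hits / total if total else 0.0
```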

Quantifying Music Trends And Facts Using Editorial Metadata From The Discogs Database

While a vast amount of editorial metadata is being actively gathered and used by music collectors and enthusiasts, it is often neglected by music information retrieval and musicology researchers. In this paper we propose to explore Discogs, one of the largest databases of such data available in the public domain. Our main goal is to show how large-scale analysis of its editorial metadata can raise questions and serve as a tool for musicological research on a number of example studies. The metadata that we use describes music releases, such as albums or EPs. It includes information about artists, tracks and their durations, genre and style, format (such as vinyl, CD, or digital files), year and country of each release. Using this data we study correlations between different genre and style labels, assess their specificity and analyze typical track durations. We estimate trends in prevalence of different genres, styles, and formats across different time periods. In our analysis of styles we use electronic music as an example. Our contribution also includes the tools we developed for our analysis and the generated datasets that can be re-used by MIR researchers and musicologists.
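
The analyses described above (label co-occurrence and typical track durations) can be approximated with a few lines of pandas, assuming the Discogs dump has been flattened into a table with hypothetical columns 'year', 'genres' (list of labels per release) and 'track_durations_sec' (list of durations per release):

```python
import pandas as pd

def genre_cooccurrence(releases: pd.DataFrame) -> pd.DataFrame:
    """Symmetric count matrix of genre labels appearing on the same release."""
    onehot = (pd.get_dummies(releases["genres"].explode())
                .groupby(level=0).max())          # one binary row per release
    return onehot.T.dot(onehot)

def median_track_duration_by_year(releases: pd.DataFrame) -> pd.Series:
    """Median track duration per release year."""
    durations = releases[["year", "track_durations_sec"]].explode("track_durations_sec")
    durations["track_durations_sec"] = durations["track_durations_sec"].astype(float)
    return durations.groupby("year")["track_durations_sec"].median()
```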

Freesound Loop Dataset

<strong>Freesound Loop Dataset</strong> This dataset contains 9,455 loops from Freeso... more <strong>Freesound Loop Dataset</strong> This dataset contains 9,455 loops from Freesound.org and the corresponding annotations. These loops have tempo, key, genre and instrumentation annotation. <strong>Dataset Construction</strong> To collect this dataset, the following steps were performed: Freesound was queried with "loop" and "bpm", so as to collect loops which have a beats-per-minute(BPM) annotations. The sounds were analysed with AudioCommons extractor, so as to obtain key information. The textual metadata of each sound was analysed, to obtain the BPM proposed by the user, and to obtain genre information. Annotators used a web interface to annotate around 3,000 loops. <strong>Dataset Organisation</strong> The dataset contains two folders and two files in the root directory: 'FSL10K' encloses the audio files and their metadata and analysis. The audios are in the 'audio' folder and are named '&lt;freesound_sound_id&gt;.wav'. The AudioCommons analysis of the loops is present in the 'ac_analysis' directory, while the Essentia analysis of the loops obtained through the Freesound API is on the 'fs_analysis' directory. The textual metadata for each audio can be found in the 'metadata.json'. Finally, the audio analysis provided by the algorithms which were benchmarked in the paper is on the 'benchmark' directory. 'annotations' holds the expert provided annotation for the sounds in the dataset. The annotations are separated in a folder for each annotator and each annotation is stored as a .json file, named 'sound-&lt;freesound_sound_id&gt;.json', with a key for each of the features extracted. <strong>Licenses</strong> All the sounds have some kind of Creative Commons license. The license of each sound in the dataset can be obtained from the 'FSL10K/metadata.json' file <strong>Authors and Contact</strong> This dataset was developed by António Ramires et. al. Any questions related to this dataset please contact: António Ramires antonio.ramires@upf.edu aframires@gmail.com <strong>References</strong> Please cite this pa [...]

Essentia.js: A JavaScript library for music and audio analysis on the web

Open-source software libraries for audio/music analysis and feature extraction have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools on the native computing platforms, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on the Web. In this paper, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS-based servers. Along with the Web Audio API, it can be used for efficient and robust real-time audio feature extraction on the web browsers. Essentia.js is modular, lightweight, and easy-to-use, deploy, maintain, and integrate into the existing plethora of JS libraries and Web technologies. It is powered by a WebAssembly back-end of the Essentia C++ library, which facilitates a JS interface to a wide range of low-level and high-level audio features. It also provides a higher-level J...

How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

2020 28th European Signal Processing Conference (EUSIPCO), 2021

Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate model performances that can be achieved by reducing the input size in terms of both fewer frequency bands and larger frame rates. We use the MagnaTagATune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can serve researchers and practitioners in their trade-off decision between accuracy of the models, data storage size, and training and inference times. Index Terms: music auto-tagging, audio classification, convolutional neural networks.
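
To make the trade-off concrete, the sketch below computes mel spectrograms of one clip at several frequency and time resolutions with librosa; the specific band counts, hop sizes, and sample rate are illustrative, not the exact configurations evaluated in the paper.

```python
import librosa
import numpy as np

def mel_variants(path: str, sr: int = 16000) -> dict:
    """Return log-mel spectrograms of the same clip at varying resolutions."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    variants = {}
    for n_mels in (96, 48, 24):            # frequency resolution
        for hop in (256, 512, 1024):       # time resolution (larger hop = fewer frames)
            m = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                               hop_length=hop, n_mels=n_mels)
            variants[(n_mels, hop)] = librosa.power_to_db(m).astype(np.float32)
    return variants  # keys map to progressively smaller model inputs
```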

Proceedings of the 1st Workshop on Human-Centric Music Information Research Systems

Technology and music have a centuries-old history of coexistence: from luthiers to music information research. The emergence of machine learning for artificial intelligence in music technology has the potential to change the way music is experienced, learned, played and listened to. This raises concerns related to its fair and transparent use, avoiding discrimination, designing sustainable experimental frameworks, and being aware of the biases the algorithms and datasets have. The first edition of the Workshop on Designing Human-Centric Music Information Research Systems aims at bringing together people interested in discussing the ethical implications of our technologies and proposing robust ways to assess our systems for discrimination, sustainability, and transparency. We strongly believe that research on fairness, accountability, and transparency advances through multi-disciplinary research. Thus, this first edition hosts two keynote talks which bring a refreshing perspective from two dif...

The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale

This paper introduces the AcousticBrainz Genre Dataset, a large-scale collection of hierarchical multi-label genre annotations from different metadata sources. It allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how this could be addressed by genre recognition systems. Genre labels for the dataset are sourced from both expert annotations and crowds, permitting comparisons between strict hierarchies and folksonomies. Music features are available via the AcousticBrainz database. To guide research, we suggest a concrete research task and provide a baseline as well as an evaluation method. This task may serve as an example of the development and validation of automatic annotation algorithms on complementary datasets with different taxonomies and coverage. With this dataset, we hope to contribute to developments in content-based music genre recognition as well as cross-disciplinary studies...

Freesound content analyzed with Audio Commons Audio Extractor V2

This dataset contains the outputs of running the second prototype of the Audio Commons Audio Extractor (ACExtractorV2) over 292k clips of the Freesound collection. This version of the audio extractor is described in Deliverable D4.7 of the AudioCommons project, and includes several music properties such as pitch, key and tempo (along with their confidence measures) which can be applied to music samples and music loops. It also includes prototype versions of the timbral models described in Deliverable 5.6. The present dataset is structured as a single JSON file with a dictionary in which keys correspond to Freesound sound IDs. For each sound, the full output of the Audio Commons Audio Extractor is provided as another dictionary with the following keys: booming, note_midi, note_confidence, brightness, log_attack_time, sharpness, tonality_confidence, single_event, tempo, roughness, dynamic_range, dept...
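
Given the structure described above (one JSON file keyed by Freesound sound IDs, each value being a dictionary of descriptors), loading and filtering it is straightforward; the file name below is a placeholder.

```python
import json

def load_ac_analysis(path: str = "ac_extractor_v2_output.json") -> dict:
    """Load the single JSON file: {freesound_sound_id: {descriptor: value, ...}}."""
    with open(path) as f:
        return json.load(f)

def confidently_tonal_sounds(analysis: dict, min_confidence: float = 0.9) -> list:
    # 'tonality_confidence' is one of the keys listed in the description above
    return [sid for sid, descriptors in analysis.items()
            if descriptors.get("tonality_confidence", 0.0) >= min_confidence]
```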

Freesound Datasets: A Platform For The Creation Of Open Audio Datasets

Openly available datasets are a key factor in the advancement of data-driven research approaches, including many of the ones used in sound and music computing. In the last few years, quite a number of new audio datasets have been made available, but many of them still have major shortcomings that limit their research impact. Among the common shortcomings are the lack of transparency in their creation and the difficulty of making them completely open and sharable. They often do not include clear mechanisms to amend errors, and many times they are not large enough for current machine learning needs. This paper introduces Freesound Datasets, an online platform for the collaborative creation of open audio datasets based on principles of transparency, openness, dynamic character, and sustainability. As a proof of concept, we present an early snapshot of a large-scale audio dataset built using this platform. It consists of audio samples from Freesound organised in a hierarchy based on the AudioSet Ontology. We believe that building and maintaining datasets following the outlined principles and using open tools and collaborative approaches like the ones presented here will have a significant impact on our research community.

The MTG-Jamendo Dataset for Automatic Music Tagging

Paper presented at the ML4MD Machine Learning for Music Discovery Workshop of the ICML 2019 conference, held on 15 June 2019 in Long Beach, California.

Music Auto-tagging Using CNNs and Mel-spectrograms With Reduced Frequency and Time Resolution

arXiv, 2019

Automatic tagging of music is an important research topic in Music Information Retrieval that has achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate model performances that can be achieved by reducing the input size in terms of both fewer frequency bands and larger frame rates. We use the MagnaTagATune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can serve researchers and practitioners in their trade-off decision between accuracy of the models, data storage size, and training and inference times.

Tensorflow Audio Models in Essentia

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess the generalization capabilities in a cross-collection evaluation utilizing both external tag datasets as well as manual annotations tailored to the taxonomies of our models.
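
As a usage sketch of this TensorFlow interface through Essentia's Python bindings, the snippet below runs a MusiCNN-based tagging model on an audio file; the algorithm names follow Essentia's documented API for these models, while the model file name is a placeholder for a downloaded pre-trained graph.

```python
from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

# MusiCNN-based models operate on 16 kHz mono audio
audio = MonoLoader(filename="track.mp3", sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb")  # placeholder model file
activations = model(audio)            # per-patch tag activations over time
print(activations.mean(axis=0))       # average activations for the whole track
```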

The MediaEval 2018 AcousticBrainz Genre Task: Content-based Music Genre Recognition from Multiple Sources

Paper presented at the MediaEval 2018 Workshop, held in Sophia Antipolis (France) from 29 to 31 October 2018.

MediaEval 2018 AcousticBrainz Genre Task: A Baseline Combining Deep Feature Embeddings Across Datasets

Paper presented at the MediaEval 2018 Workshop, held in Sophia Antipolis (France) from 29 to 31 October 2018.

The AcousticBrainz Genre Dataset: Music Genre Recognition with Annotations from Multiple Sources

This paper introduces the AcousticBrainz Genre Dataset, a large-scale collection of hierarchical multi-label genre annotations from different metadata sources. It allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how this could be addressed by genre recognition systems. Genre labels for the dataset are sourced from both expert annotations and crowds, permitting comparisons between strict hierarchies and folksonomies. Music features are available via the AcousticBrainz database. To guide research, we suggest a concrete research task and provide a baseline as well as an evaluation method. This task may serve as an example of the development and validation of automatic annotation algorithms on complementary datasets with different taxonomies and coverage. With this dataset, we hope to contribute to developments in content-based music genre recognition as well as cross-disciplinary studi...

Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks

Paper presented at the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), held on 16 November 2017 in Munich, Germany.
