Freesound Datasets: A Platform For The Creation Of Open Audio Datasets

AcousticBrainz: A Community Platform For Gathering Music Information Obtained From Audio

2015

We introduce the AcousticBrainz project, an open platform for gathering music information. At its core, AcousticBrainz is a database of music descriptors computed from audio recordings using a number of state-of-the-art Music Information Retrieval algorithms. Users run a supplied feature extractor on audio files and upload the analysis results to the AcousticBrainz server. All submissions include a MusicBrainz identifier allowing them to be linked to various sources of editorial information. The feature extractor is based on the open source Essentia audio analysis library. From the data submitted by the community, we run classifiers aimed at adding musically relevant semantic information. These classifiers can be developed by the community using tools available on the AcousticBrainz website. All data in AcousticBrainz is freely available and can be accessed through the website or API. For AcousticBrainz to be successful we need to have an active community that contributes to and uses this platform, and it is this community that will define the actual uses and applications of its data.
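
As a rough illustration of the kind of access the API mentioned above provides, a minimal Python sketch (assuming the public v1 endpoints; the MusicBrainz recording ID below is a placeholder, and the JSON field names reflect the format the service typically returns) might look like:

    # Minimal sketch of retrieving descriptors for one recording from the
    # AcousticBrainz v1 API. The MBID is a placeholder; replace it with a
    # real MusicBrainz recording ID.
    import requests

    MBID = "00000000-0000-0000-0000-000000000000"   # placeholder recording MBID
    url = f"https://acousticbrainz.org/api/v1/{MBID}/low-level"
    doc = requests.get(url, timeout=30).json()

    print(doc["lowlevel"]["average_loudness"])                 # a low-level descriptor
    print(doc["rhythm"]["bpm"])                                # estimated tempo
    print(doc["tonal"]["key_key"], doc["tonal"]["key_scale"])  # estimated key

The same pattern applies to the high-level classifier output, which is served from a separate per-recording document.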

Sound recycling from public databases: Another BigData Approach to Sound Collections

Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, 2017

Discovering new sounds from large databases or the Internet is a tedious task: standard search tools and manual exploration fail to manage the amount of information now available. This paper presents a new approach to the problem that takes advantage of mature technologies such as Big Data and Machine Learning, keeping compositional concepts in mind and focusing on artistic performance. Among the various distributed systems useful for music experimentation, a new workflow is proposed based on analysis techniques from Music Information Retrieval (MIR) combined with massive online databases, dynamic user interfaces, physical controllers and real-time synthesis. The system is built on Free Software tools and standard communication protocols to classify, cluster and segment sound. The control architecture allows multiple clients to request the API services concurrently, enabling collaborative work. The resulting system can retrieve well-defined or pseudo-aleatory audio samples from the web, mix and transform them in real time during a live-coding performance, play like another instrument in a band or as a solo act combined with visual feedback, or run on its own as an automated multimedia installation.
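
A workflow of this kind typically starts by querying an online sound database programmatically. As a sketch only (assuming a Freesound APIv2 key, represented by a placeholder below, and the documented text-search endpoint), retrieving candidate samples might look like:

    # Sketch of querying Freesound's APIv2 text search for short clips.
    # FREESOUND_API_KEY is a placeholder; the filter uses the documented
    # Solr-style syntax to keep only clips between 1 and 8 seconds.
    import requests

    FREESOUND_API_KEY = "YOUR_API_KEY"  # placeholder credential
    resp = requests.get(
        "https://freesound.org/apiv2/search/text/",
        params={
            "query": "percussion loop",
            "filter": "duration:[1.0 TO 8.0]",
            "fields": "id,name,previews",
            "token": FREESOUND_API_KEY,
        },
        timeout=30,
    )
    for sound in resp.json()["results"]:
        # Preview URLs can be fetched and fed to a real-time synthesis engine.
        print(sound["id"], sound["name"], sound["previews"]["preview-lq-mp3"])

In the paper's architecture this kind of retrieval sits behind an API service that several performing clients can call concurrently.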

A Large Publicly Accessible Prototype Audio Database for Music Research

This paper introduces Codaich, a large and diverse publicly accessible database of musical recordings for use in music information retrieval (MIR) research. The issues that must be dealt with when constructing such a database are discussed, as are ways of addressing these problems. It is suggested that copyright restrictions may be overcome by allowing users to make customized feature extraction queries rather than allowing direct access to recordings themselves. The jMusicMetaManager software is introduced as a tool for improving metadata associated with recordings by automatically detecting inconsistencies and redundancies.

The Freesound Loop Dataset and Annotation Tool

2020

Music loops are essential ingredients in electronic music production, and there is a high demand for pre-recorded loops in a variety of styles. Several commercial and community databases have been created to meet this demand, but most of them are not suitable for research due to their strict licensing. In this paper, we present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts. The loops originate from Freesound, a community database of audio recordings released under Creative Commons licenses, so the audio in our dataset may be redistributed. The annotations include instrument, meter, key and genre tags. We describe the methodology used to assemble and annotate the data, and report on the distribution of tags in the data and inter-annotator agreement. We also present to the community an online loop annotator tool that we developed. To illustrate the usefulness of FSLD, we present short case studies on using it to estimate tempo and key...
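
To give a flavour of the tempo case study mentioned above, a minimal sketch (not the authors' evaluation code; it assumes librosa is installed, and both the loop file path and the annotated BPM below are placeholders) could compare an estimated tempo against an annotation:

    # Sketch of checking a tempo estimate against an FSLD-style annotation.
    # "loop.wav" and annotated_bpm are placeholders for a real loop and its label.
    import librosa
    import numpy as np

    annotated_bpm = 120.0                       # placeholder ground-truth annotation
    y, sr = librosa.load("loop.wav")            # placeholder path to a loop file
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    estimated_bpm = float(np.atleast_1d(tempo)[0])  # beat_track may return a 1-element array

    # Tolerate octave errors (double/half tempo), a common convention in MIR evaluation.
    candidates = (estimated_bpm, estimated_bpm * 2, estimated_bpm / 2)
    correct = any(abs(c - annotated_bpm) / annotated_bpm < 0.04 for c in candidates)
    print(f"estimated={estimated_bpm:.1f} BPM, annotated={annotated_bpm:.1f} BPM, ok={correct}")

Running such a check over all annotated loops gives the kind of accuracy figures the case studies report.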

WASABI: a Two Million Song Database Project with Audio and Cultural Metadata plus WebAudio enhanced Client Applications

2017

This paper presents the WASABI project, started in 2017, which aims at (1) the construction of a 2 million song knowledge base that combines metadata collected from music databases on the Web, metadata resulting from the analysis of song lyrics, and metadata resulting from audio analysis, and (2) the development of semantic applications with high added value that exploit this semantic database. A preliminary version of the WASABI database is already online and will be enriched throughout the project. The main originality of this project is the combination of algorithms that extract semantic metadata from the Web and from song lyrics with algorithms that work on the audio. The following WebAudio-enhanced applications will be associated with each song in the database: an online mixing table, guitar amp simulations with a virtual pedal-board, audio analysis visualization tools, annotation tools, and a similarity search tool that works by uploading audio extract...

Freesound 2: An Improved Platform for Sharing Audio Clips

2011

Freesound.org is an online collaborative sound database where people from different disciplines share recorded sound clips under Creative Commons licenses. It was started in 2005 and it is being further developed by the Music Technology Group (MTG) of the Universitat Pompeu Fabra. Freesound's initial goal was to give support to sound researchers, who often have trouble finding large royalty-free sound databases to test their algorithms, and to sound artists, who use pre-recorded sounds in their pieces.

The Million Song Dataset

Proceedings of the 11th …, 2011

We introduce the Million Song Dataset, a freely-available collection of audio features and metadata for a million contemporary popular music tracks. We describe its creation process, its content, and its possible uses. Attractive features of the Million Song Dataset include the range of existing resources to which it is linked, and the fact that it is the largest current research dataset in our field. As an illustration, we present year prediction as an example application, a task that has, until now, been difficult to study owing to the absence of a large set of suitable data. We show positive results on year prediction, and discuss more generally the future development of the dataset.
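
The dataset is distributed as one HDF5 file per track. As a small sketch of the kind of access used for applications such as year prediction (assuming the dataset's companion hdf5_getters module is available on the Python path, and with a placeholder track filename), reading one track might look like:

    # Sketch of reading fields from a single MSD per-track HDF5 file.
    # The filename is a placeholder; hdf5_getters is the dataset's helper module.
    import hdf5_getters

    h5 = hdf5_getters.open_h5_file_read("TRXXXXX12903CXXXXX.h5")  # placeholder path
    try:
        artist = hdf5_getters.get_artist_name(h5)
        title = hdf5_getters.get_title(h5)
        year = hdf5_getters.get_year(h5)    # 0 when the release year is unknown
        tempo = hdf5_getters.get_tempo(h5)
        print(artist, title, year, tempo)
    finally:
        h5.close()

For year prediction, tracks with year equal to 0 are typically filtered out before training.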

Can't trust the feeling? How open data reveals unexpected behavior of high-level music descriptors

2020

Copyright restrictions prevent the widespread sharing of commercial music audio. Therefore, the availability of resharable pre-computed music audio features has become critical. In line with this, the AcousticBrainz platform offers a dynamically growing, open and community-contributed large-scale resource of locally computed low-level and high-level music descriptors. Beyond enabling research reuse, the availability of such an open resource allows for renewed reflection on the music descriptors we have at hand: while they were validated to perform successfully under lab conditions, they now are being run 'in the wild'. Their response to these more ecological conditions can shed light on the degree to which they truly had construct validity. In this work, we seek to gain further understanding into this, by analyzing high-level classifier-based music descriptor output in AcousticBrainz. While no hard ground truth is available on what the true value of these descriptors should ...
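
As a sketch of the kind of inspection this analysis involves (assuming the public AcousticBrainz v1 API; the MusicBrainz recording ID below is a placeholder, and the field names reflect the format the service typically returns), the high-level classifier output for one recording can be examined as follows:

    # Sketch of inspecting high-level classifier output for one recording.
    # The MBID is a placeholder; each classifier reports a winning class and
    # its probability, and weakly confident outputs are one of the symptoms
    # this line of work looks into.
    import requests

    MBID = "00000000-0000-0000-0000-000000000000"   # placeholder recording MBID
    url = f"https://acousticbrainz.org/api/v1/{MBID}/high-level"
    doc = requests.get(url, timeout=30).json()

    for name, clf in doc["highlevel"].items():
        print(f"{name}: {clf['value']} (p={clf['probability']:.2f})")

Aggregating these outputs over many submissions is what allows descriptor behaviour 'in the wild' to be compared against the performance reported under lab conditions.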

MTG-DB: A Repository for Music Audio Processing

2004

Content-based audio processing researchers need audio and its related metadata to develop and test algorithms. We present a common repository of audio, metadata, ontologies and algorithms. We detail the hardware implementation, in the form of a massive storage and computation cluster, the software and database design, and the ontology management of the current system. The repository, as far as copyright licenses allow, is open to researchers outside the Music Technology Group to test and evaluate their algorithms.

Big Data for Musicology

Proceedings of the 1st International Workshop on Digital Libraries for Musicology - DLfM '14, 2014

Digital music libraries and collections are growing quickly and are increasingly made available for research. We argue that the use of large data collections will enable a better understanding of music performance and music in general, which will benefit areas such as music search and recommendation, music archiving and indexing, music production and education. However, to achieve these goals it is necessary to develop new musicological research methods, to create and adapt the necessary technological infrastructure, and to find ways of working with legal limitations. Most of the necessary basic technologies exist, but they need to be brought together and applied to musicology. We aim to address these challenges in the Digital Music Lab project, and we feel that with suitable methods and technology Big Music Data can provide new opportunities to musicology.