Music Signal Processing Research Papers
The goal of live song identification is to recognize a song based on a short, noisy cell phone recording of a live performance. We propose a system for known-artist live song identification and provide empirical evidence of its feasibility. The proposed system represents audio as a sequence of hashprints, which are binary fingerprints derived by applying a set of spectro-temporal filters to a spectrogram representation. The spectro-temporal filters can be learned in an unsupervised manner on a small amount of data, so the representation can be tailored to each artist. Matching is performed using a cross-correlation approach with downsampling and rescoring. We evaluate our approach on the Gracenote live song identification benchmark data set and compare our results to five other baseline systems. Compared to the previous state of the art, the proposed system improves the mean reciprocal rank from .68 to .79, while simultaneously reducing the average runtime per query from 10 seconds down to 0.9 seconds.
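The following NumPy sketch illustrates the two stages described above under stated assumptions: the spectro-temporal filters are taken as given (the paper learns them in an unsupervised manner from the artist's material), each bit records whether a filter response increases over a short temporal lag, and matching evaluates only a downsampled set of offsets. All function names and parameter values are hypothetical.

```python
import numpy as np

def hashprints(spec, filters, lag=4):
    """Binary fingerprints: each bit records whether a spectro-temporal
    filter response increases over `lag` frames.
    spec: (freq, time); filters: (n_bits, freq, width)."""
    n_bits, _, width = filters.shape
    n_frames = spec.shape[1] - width + 1
    resp = np.empty((n_bits, n_frames))
    for k in range(n_bits):
        for t in range(n_frames):
            resp[k, t] = np.sum(filters[k] * spec[:, t:t + width])
    return (resp[:, lag:] > resp[:, :-lag]).astype(np.uint8)

def coarse_match(query, db, hop=5):
    """Cross-correlation-style search with downsampling: evaluate the
    normalized Hamming distance only every `hop` frames (the paper
    additionally rescores the best candidates at full resolution)."""
    n = query.shape[1]
    offsets = np.arange(0, db.shape[1] - n + 1, hop)
    dists = [np.mean(query != db[:, o:o + n]) for o in offsets]
    return offsets[int(np.argmin(dists))]
```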
As a result of massive digitization efforts and the world wide web, there is an exploding amount of available digital data describing and representing music at various semantic levels and in diverse formats. For example, in the case of the Beatles songs, there are numerous recordings, including an increasing number of cover songs and arrangements, as well as MIDI data and other symbolic music representations. The general goal of music synchronization is to align the multiple information sources related to a given piece of music. This becomes a difficult problem when the various representations reveal significant differences in structure and polyphony, while exhibiting various types of artifacts. In this paper, we address the issue of how music synchronization techniques can be used to automatically reveal critical passages with significant differences between the two versions to be aligned. Using the corpus of the Beatles songs as a test bed, we analyze the kinds of differences occurring in the audio and MIDI versions available for the songs.
A synopsis of generalized orthogonal series (e.g., Legendre-Fourier), with emphasis on the trigonometric Fourier representation, is introduced. The presentation covers the Gibbs phenomenon and criteria for the convergence of series (Dirichlet conditions, Fourier's theorem, Fejér's theorem). Implications of the "Kingdom of Fourier" for acoustic engineering are discussed: consonance and dissonance, and musical instruments. The theory of Tukey tapers, the Lanczos window, and the use of Fourier series to model deterministic fractals are presented. The passage to the continuum leads to the Fourier transform, whose properties are reviewed. The Gabor-Heisenberg uncertainty principle is connected to Fourier theory. Computational calculation of the spectrum leads to the discrete Fourier transform (DFT) and to mechanisms for efficient spectrum computation (fast algorithms). Heideman's results on the multiplicative complexity of the DFT are presented. Classical spectral analysis then evolves: modern wavelet-based analysis operates in connection with the Fourier approach. The sampling theorem is discussed, with a "long live Fourier"-style proof, as is the 2BT theorem on signal dimensionality. Modern applications to signal denoising are argued from a pragmatic standpoint. Adapting Fourier analysis to stochastic signals leads to stochastic Fourier series and Karhunen-Loève expansions. Even nonlinear modeling of systems based on Volterra series is presented. Finally, the kingdom of Fourier definitively conquers the finite, digital world, migrating to the finite-field Fourier transform (the Galois-Fourier transform).
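As a small illustration of two convergence topics named above, the sketch below contrasts Dirichlet partial sums of a square wave, whose roughly 9% overshoot near jumps persists for every truncation order (the Gibbs phenomenon), with Fejér (Cesàro) averaging, which tames it. The function names are hypothetical.

```python
import numpy as np

def dirichlet_partial_sum(t, n):
    """n-th Fourier partial sum of a unit square wave; near the jumps
    the ~9% overshoot persists for every n (Gibbs phenomenon)."""
    s = np.zeros_like(t, dtype=float)
    for k in range(1, n + 1, 2):               # odd harmonics only
        s += (4 / np.pi) * np.sin(k * t) / k
    return s

def fejer_sum(t, n):
    """Cesàro average of the partial sums (Fejér's theorem): the
    averaged sums converge without the Gibbs overshoot."""
    sums = [dirichlet_partial_sum(t, m) for m in range(1, n + 1)]
    return np.mean(sums, axis=0)
```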
Music is a ubiquitous and vital part of the lives of billions of people worldwide. Musical creations and performances are among the most complex and intricate of our cultural artifacts, and the emotional power of music can touch us in surprising and profound ways. In view of the rapid and sustained growth of digital music sharing and distribution, the development of computational methods to help users find and organize music information has become an important field of research in both industry and academia. The Dagstuhl Seminar 16092 was devoted to a research area known as music structure analysis, where the general objective is to uncover patterns and relationships that govern the organization of notes, events, and sounds in music. Gathering researchers from different fields, we critically reviewed the state of the art for computational approaches to music structure analysis in order to identify the main limitations of existing methodologies. This triggered interdisciplinary discussions that leveraged insights from fields as disparate as psychology, music theory, composition, signal processing, machine learning, and information sciences to address the specific challenges of understanding structural information in music. Finally, we explored novel applications of these technologies in music and multimedia retrieval, content creation, musicology, education, and human-computer interaction. In this report, we give an overview of the various contributions and results of the seminar. We start with an executive summary, which describes the main topics, goals, and group activities. Then, we present a list of abstracts giving a more detailed overview of the participants' contributions as well as of the ideas and results discussed in the group meetings of our seminar.

Seminar: February 28 – March 4, 2016 (http://www.dagstuhl.de/16092)
1998 ACM Subject Classification: H.5.5 Sound and Music Computing

In this executive summary, we start with a short introduction to computational music structure analysis and then summarize the main topics and questions raised in this seminar.
A basic and one of the oldest socio-cognitive domains of the human species is music. Listening to music regularly helps keep neurons and synapses more active. Neurological studies have identified music as a valuable tool for evaluating the brain system. It has been observed that different parts of the brain are involved in processing music, including the auditory cortex, frontal cortex, cerebral cortex, and even the motor cortex. The objective of this study is to analyze the effect of Hindustani music on brain activity during normal relaxing conditions using electroencephalography (EEG). Two healthy male subjects (ages 20-23) without special musical education participated in the study. EEG signals were acquired at the frontal (F3/F4, F7/F8), temporal (T3/T4, T5/T6), central (C3/C4), parietal (P3/P4), and occipital (O1/O2) lobes while the subjects listened to music under three conditions, namely rest before music, with music, and after withdrawal of the music. Frequency analysis was carried out for the alpha and delta brain rhythms. The findings show that arousal-based activities were enhanced while listening to Hindustani music of contrasting emotions (romantic/sorrow) for both subjects. The important conclusion is that when the music stimulus is removed, discernible alpha and delta brain rhythms remain for some time, showing residual arousal. This is analogous to 'hysteresis', where the system retains some 'memory' of the former stimulated state. This was further corroborated by a non-linear analysis (Detrended Fluctuation Analysis) of the alpha rhythms.
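Detrended Fluctuation Analysis, named at the end of the abstract, has a compact standard formulation; the NumPy sketch below is a minimal version of it, and the scales and input are illustrative assumptions rather than the study's settings.

```python
import numpy as np

def dfa(x, scales=(16, 32, 64, 128, 256)):
    """Detrended Fluctuation Analysis: slope of log F(n) vs log n.
    x: 1-D signal (e.g., an alpha-band EEG amplitude envelope)."""
    y = np.cumsum(x - np.mean(x))              # integrated profile
    fluct = []
    for n in scales:
        n_seg = len(y) // n
        f2 = 0.0
        for i in range(n_seg):
            seg = y[i * n:(i + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            f2 += np.mean((seg - trend) ** 2)  # detrended variance
        fluct.append(np.sqrt(f2 / n_seg))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return alpha                               # scaling exponent
```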
This paper studies emotion recognition from musical tracks in the 2-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent neural networks (RNN), having significantly fewer parameters than the state-of-the-art (SOTA) method for the same task. We utilize one CNN layer followed by two branches of RNNs trained separately for arousal and valence. The method was evaluated using the "MediaEval 2015 Emotion in Music" dataset. We achieved an RMSE of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset.
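A minimal PyTorch sketch of the architecture family described above: one convolutional layer followed by separate recurrent branches for arousal and valence. The layer sizes, the GRU cell choice, and the mel-spectrogram input are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EmotionCRNN(nn.Module):
    """One conv layer over mel bands, then separate GRU branches
    regressing arousal and valence per frame (hypothetical sizes)."""
    def __init__(self, n_mels=64, conv_ch=32, rnn_size=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, kernel_size=(5, 5), padding=2),
            nn.ReLU(),
            nn.MaxPool2d((4, 1)))             # pool over frequency only
        feat = conv_ch * (n_mels // 4)
        self.arousal = nn.GRU(feat, rnn_size, batch_first=True)
        self.valence = nn.GRU(feat, rnn_size, batch_first=True)
        self.head_a = nn.Linear(rnn_size, 1)
        self.head_v = nn.Linear(rnn_size, 1)

    def forward(self, x):                      # x: (batch, 1, mels, time)
        z = self.conv(x)                       # (batch, ch, mels/4, time)
        z = z.flatten(1, 2).transpose(1, 2)    # (batch, time, feat)
        a, _ = self.arousal(z)
        v, _ = self.valence(z)
        return self.head_a(a), self.head_v(v)  # per-frame predictions
```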
The rapidly growing corpus of digital audio material requires novel retrieval strategies for exploring large music collections. Traditional retrieval strategies rely on metadata that describe the actual audio content in words. In the case that such textual descriptions are not available, one requires content-based retrieval strategies which only utilize the raw audio material. In this contribution, we discuss content-based retrieval strategies that follow the query-by-example paradigm: given an audio query, the task is to retrieve all documents that are somehow similar or related to the query from a music collection. Such strategies can be loosely classified according to their specificity, which refers to the degree of similarity between the query and the database documents. Here, high specificity refers to a strict notion of similarity, whereas low specificity refers to a rather vague one. Furthermore, we introduce a second classification principle based on granularity, where one distinguishes between fragment-level and document-level retrieval. Using a classification scheme based on specificity and granularity, we identify various classes of retrieval scenarios, which comprise audio identification, audio matching, and version identification. For these three important classes, we give an overview of representative state-of-the-art approaches, which also illustrate the sometimes subtle but crucial differences between the retrieval scenarios. Finally, we give an outlook on a user-oriented retrieval system, which combines the various retrieval strategies in a unified framework.
1998 ACM Subject Classification: H.5.5 Sound and Music Computing; J.5 Arts and Humanities – Music; H.5.1 Multimedia Information Systems; I.5 Pattern Recognition
Chroma-based audio features, which closely correlate to the aspect of harmony, are a well-established tool in processing and analyzing music data. There are many ways of computing and enhancing chroma features, which results in a large number of chroma variants with different properties. In this paper, we present a chroma toolbox [13], which contains MATLAB implementations for extracting various types of recently proposed pitch-based and chroma-based audio features. By providing the MATLAB implementations on a well-documented website under a GNU-GPL license, we aim to foster research in music information retrieval. As another goal, we want to raise awareness that there is no single chroma variant that works best in all applications. To this end, we discuss two example applications showing that the final music analysis result may crucially depend on the initial feature design step.
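For readers unfamiliar with the feature class, a minimal baseline chroma computation looks roughly as follows (NumPy; the toolbox itself provides MATLAB implementations with far more refined pitch decompositions and enhancements, so this is only a sketch of the basic binning idea, with illustrative parameters).

```python
import numpy as np

def simple_chroma(spec, sr, n_fft):
    """Fold an STFT magnitude spectrogram (bins, frames) into 12
    chroma bands by assigning each bin to its nearest pitch class."""
    freqs = np.arange(spec.shape[0]) * sr / n_fft
    chroma = np.zeros((12, spec.shape[1]))
    for k, f in enumerate(freqs):
        if f < 27.5:                       # skip sub-audio bins
            continue
        midi = 69 + 12 * np.log2(f / 440.0)
        chroma[int(round(midi)) % 12] += spec[k] ** 2
    norm = np.linalg.norm(chroma, axis=0) + 1e-12
    return chroma / norm                   # normalize each frame
```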
In this paper, we present a digital library system for managing heterogeneous music collections.
The digital revolution has brought about a massive increase in the availability and distribution of music-related documents of various modalities comprising textual, audio, as well as visual material. Therefore, the development of techniques and tools for organizing, structuring, retrieving, navigating, and presenting music-related data has become a major strand of research—the field is often referred to as Music Information Retrieval (MIR). Major challenges arise because of the richness and diversity of music in form and content leading to novel and exciting research problems. In this article, we give an overview of new developments in the MIR field with a focus on content-based music analysis tasks including audio retrieval, music synchronization, structure analysis, and performance analysis.
The development of automated methods for revealing the repetitive structure of a given music recording is of central importance in music information retrieval. In this paper, we present a novel scape plot representation that allows for visualizing repetitive structures of the entire music recording in a hierarchical, compact, and intuitive way. In a scape plot, each point corresponds to an audio segment identified by its center and length. As our main contribution, we assign to each point a color value so that two segment properties become apparent. Firstly, we use the lightness component of the color to indicate the repetitiveness of the encoded segment, where we revert to a recently introduced fitness measure. Secondly, we use the hue component of the color to reveal the relations between different segments. To this end, we introduce a novel grouping procedure that automatically maps related segments to similar hue values. By discussing a number of popular and classical music examples, we illustrate the potential and visual appeal of our representation and also indicate limitations.
Background music is often used to generate a specific atmosphere or to draw our attention to specific events. For example in movies or computer games it is often the accompanying music that conveys the emotional state of a scene and plays an important role for immersing the viewer or player into the virtual environment. In view of home-made videos, slide shows, and other consumer-generated visual media streams, there is a need for computer-assisted tools that allow users to generate aesthetically appealing music tracks in an easy and intuitive way. In this contribution, we consider a data-driven scenario where the musical raw material is given in form of a database containing a variety of audio recordings. Then, for a given visual media stream, the task consists in identifying, manipulating, overlaying, concatenating, and blending suitable music clips to generate a music stream that satisfies certain constraints imposed by the visual data stream and by user specifications. It is our main goal to give an overview of various content-based music processing and retrieval techniques that become important in data-driven sound track generation. In particular, we sketch a general pipeline that highlights how the various techniques act together and come into play when generating musically plausible transitions between subsequent music clips.
A swarm of bees buzzing " Let it be " by the Beatles or the wind gently howling the romantic " Gute Nacht " by Schu-bert – these are examples of audio mosaics as we want to create them. Given a target and a source recording, the goal of audio mosaicing is to generate a mosaic recording that conveys musical aspects (like melody and rhythm) of the target, using sound components taken from the source. In this work, we propose a novel approach for automatically generating audio mosaics with the objective to preserve the source's timbre in the mosaic. Inspired by algorithms for non-negative matrix factorization (NMF), our idea is to use update rules to learn an activation matrix that, when multiplied with the spectrogram of the source recording, resembles the spectrogram of the target recording. However, when applying the original NMF procedure , the resulting mosaic does not adequately reflect the source's timbre. As our main technical contribution, we propose an extended set of update rules for the iterative learning procedure that supports the development of sparse diagonal structures in the activation matrix. We show how these structures better retain the source's timbral characteristics in the resulting mosaic.
Sheet music and audio recordings represent and describe music on different semantic levels. Sheet music describes abstract high-level parameters such as notes, keys, measures, or repeats in a visual form. Because of its explicitness and compactness, most musicologists discuss and analyze the meaning of music on the basis of sheet music. On the contrary, most people enjoy music by listening to audio recordings, which represent music in an acoustic form. In particular, the nuances and subtleties of musical performances, which are generally not written down in the score, make the music come alive. In this paper, we address the problem of bridging the gap between the sheet music domain and the audio domain. In particular, we discuss aspects of music representations, music synchronization, and optical music recognition, while indicating various strategies and open research problems.
The general goal of music synchronization is to automatically align the multiple information sources, such as audio recordings, MIDI files, or digitized sheet music, related to a given musical work. In computing such alignments, one typically has to face a delicate tradeoff between robustness and accuracy. In this paper, we introduce novel audio features that combine the high temporal accuracy of onset features with the robustness of chroma features. We show how previous synchronization methods can be extended to make use of these new features. We report on experiments based on polyphonic Western music demonstrating the improvements of our proposed synchronization framework.
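Alignment in this family of methods is typically computed with dynamic time warping (DTW) over feature sequences. The sketch below shows the textbook version with cosine distance on chroma-like features; the paper's contribution lies in the features and framework, not in this basic algorithm.

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences (e.g., chroma, shape (12, N) and
    (12, M)) via DTW with cosine distance; returns the warping path."""
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12)
    C = 1.0 - Xn.T @ Yn                        # cost matrix (N, M)
    N, M = C.shape
    D = np.full((N + 1, M + 1), np.inf)        # accumulated cost
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = C[i-1, j-1] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    path, i, j = [], N, M
    while i > 0 and j > 0:                     # backtrack optimal path
        path.append((i - 1, j - 1))
        step = np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]])
        i, j = (i-1, j-1) if step == 0 else (i-1, j) if step == 1 else (i, j-1)
    return path[::-1]
```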
In this paper, we present a novel set of tempo-related audio features for applications in audio retrieval. As opposed to existing feature sets commonly used in the retrieval domain, which mainly focus on local spectral characteristics of the audio signal, our features capture its local temporal behaviour w.r.t. tempo, rhythm, and meter. As a key component for obtaining a high level of feature robustness, we introduce the cyclic beat spectrum (CBS), consisting of residual tempo classes which are constructed similarly to the well-known pitch chroma classes. We illustrate the use of the newly constructed features by applying them to robust timescale-invariant audio identification.
This paper approaches the problem of annotating measure positions in Western classical music recordings. Such annotations can be useful for navigation, segmentation, and cross-version analysis of music in different types of representations. In a case study based on Wagner's opera "Die Walküre", we analyze two types of annotations. First, we report on an experiment where several human listeners generated annotations in a manual fashion. Second, we examine computer-generated annotations which were obtained by using score-to-audio alignment techniques. As one main contribution of this paper, we discuss the inconsistencies of the different annotations and study possible musical reasons for deviations. As another contribution, we propose a kernel-based method for automatically estimating confidences of the computed annotations which may serve as a first step towards improving the quality of this automatic method.
Locating boundaries between coherent and/or repetitive segments of a time series is a challenging problem pervading many scientific domains. In this paper we propose an unsupervised method for boundary detection, combining three basic principles: novelty, homogeneity, and repetition. In particular, the method uses what we call structure features, a representation encapsulating both local and global properties of a time series. We demonstrate the usefulness of our approach in detecting music structure boundaries, a task that has received much attention in recent years and for which several benchmark datasets and publicly available annotations exist. We find our method to significantly outperform the best accuracies published so far. Importantly, our boundary approach is generic, and is thus applicable to a wide range of time series beyond the music and audio domains.
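To make the novelty principle concrete, here is the classic checkerboard-kernel novelty curve computed on a self-similarity matrix, following Foote's well-known approach. This is one ingredient of the principle family the paper combines, not the paper's structure-feature method itself.

```python
import numpy as np

def novelty_curve(ssm, L=16):
    """Slide a Gaussian-tapered checkerboard kernel along the main
    diagonal of a self-similarity matrix; peaks suggest boundaries."""
    t = np.arange(-L, L + 1)
    g = np.exp(-(t / (0.5 * L)) ** 2)
    # +1 in the past-past and future-future quadrants, -1 elsewhere
    kernel = np.outer(g, g) * np.outer(np.sign(t + 0.5), np.sign(t + 0.5))
    N = ssm.shape[0]
    pad = np.pad(ssm, L, mode="edge")
    nov = np.array([np.sum(kernel * pad[i:i + 2*L + 1, i:i + 2*L + 1])
                    for i in range(N)])
    return np.maximum(nov, 0.0)
```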
In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models.
1998 ACM Subject Classification: H.5.5 Sound and Music Computing; J.5 Arts and Humanities – Music; H.5.1 Multimedia Information Systems; I.5 Pattern Recognition
In the field of Music Information Retrieval (MIR), the automated detection of the singing voice within a given music recording constitutes a challenging and important research problem. In this study, our goal is to find those segments within a classical opera recording where one or several singers are active. As our main contributions, we first propose a novel audio feature that extends a state-of-the-art feature set that has previously been applied to singing voice detection in popular music recordings. Second, we describe a simple bootstrapping procedure that helps to improve the results in the case that the test data is not reflected well by the training data. Third, we show that a cross-version approach can help to stabilize the results even further.
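The bootstrapping step can be sketched generically: train on the labeled data, label the test recording, then retrain on the test frames whose predictions are most confident so the model adapts to unseen conditions. The scikit-learn sketch below uses an arbitrary classifier; the paper's actual features and classifier may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bootstrap_detector(X_train, y_train, X_test, thresh=0.9):
    """Generic bootstrapping for frame-wise detection. Assumes both
    classes (voice / no voice) receive some confident predictions."""
    clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    prob = clf.predict_proba(X_test)
    conf = prob.max(axis=1) >= thresh          # confident test frames
    pseudo = prob.argmax(axis=1)[conf]         # their pseudo-labels
    clf2 = RandomForestClassifier(n_estimators=100)
    clf2.fit(X_test[conf], pseudo)             # adapt to the test data
    return clf2.predict(X_test)
```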
Rapid growth of digital music data on the Internet in recent years has led to increasing user demand for search based on different types of metadata. One kind of metadata that we focus on in this paper is the emotion or mood of music. Music emotion recognition is a prevalent research topic today. We collected a database including 280 pieces of popular music labeled with the four basic emotions of Thayer's two-dimensional model. We used a two-level classifier, the process of which can be briefly summarized in three steps: 1) extracting the most suitable features from the pieces of music in the database to describe each song; 2) applying feature selection approaches to decrease correlations between features; 3) using SVM classifiers on two levels to train on these features. Finally, we increased the accuracy rate from 72.14% with a simple SVM to 87.27% with our hierarchical classifier.
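A compact scikit-learn sketch of such a two-level scheme, assuming the first level classifies the arousal half of Thayer's plane and the second level holds one valence SVM per arousal half; the feature count, kernels, and the exact hierarchy are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, f_classif

def train_hierarchical(X, y_arousal, y_valence, k=40):
    """Level 1: arousal half (high/low). Level 2: one valence SVM per
    arousal half. Feature selection reduces correlated features."""
    sel = SelectKBest(f_classif, k=k).fit(X, y_arousal)
    Xs = sel.transform(X)
    top = SVC(kernel="rbf").fit(Xs, y_arousal)
    lower = {a: SVC(kernel="rbf").fit(Xs[y_arousal == a],
                                      y_valence[y_arousal == a])
             for a in np.unique(y_arousal)}
    return sel, top, lower

def predict_quadrant(sel, top, lower, X):
    Xs = sel.transform(X)
    a = top.predict(Xs)
    v = np.array([lower[ai].predict(Xs[i:i + 1])[0]
                  for i, ai in enumerate(a)])
    return a, v          # quadrant = (arousal half, valence half)
```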
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
For a single musical work, there often exists a large number of relevant digital documents, including various audio recordings, MIDI files, or digitized sheet music. The general goal of music synchronization is to automatically align the multiple information sources related to a given musical work. In computing such alignments, one typically has to face a delicate tradeoff between robustness, accuracy, and efficiency. In this paper, we introduce various refinement strategies for music synchronization. First, we introduce novel audio features that combine the temporal accuracy of onset features with the robustness of chroma features. Then, we show how these features can be used within an efficient and robust multiscale synchronization framework. In addition, we introduce an interpolation method for further increasing the temporal resolution. Finally, we report on our experiments based on polyphonic Western music demonstrating the respective improvements of the proposed refinement strategies.
Chroma-based audio features are a well-established tool for analyzing and comparing music data. By identifying spectral components that differ by a musical octave, chroma features show a high degree of invariance to variations in timbre. In this paper, we describe a novel procedure for making chroma features even more robust to changes in timbre and instrumentation while keeping their discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we will discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the twelve chroma bins. Our systematic experiments show that the resulting chroma features have indeed gained a significant boost towards timbre invariance.
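The described pipeline (log-compressed pitch representation, DCT along the pitch axis, discarding the lower timbre-related coefficients, inverse DCT, folding to chroma) can be sketched as follows; the 120-bin pitch input and the coefficient cutoffs are illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct, idct

def crp_chroma(pitch_spec, n_keep=55, n_discard=15):
    """Timbre-robust chroma from a pitch spectrogram (120 MIDI bins
    x frames): log-compress, DCT over pitch, zero the lower
    (timbre-related) coefficients, invert, fold into 12 chroma bins."""
    v = np.log(1 + 100 * pitch_spec)            # log compression
    c = dct(v, axis=0, norm="ortho")
    c[:n_discard] = 0                           # drop timbre coefficients
    c[n_keep:] = 0                              # and very high ones
    v_hat = idct(c, axis=0, norm="ortho")
    chroma = np.zeros((12, v.shape[1]))
    for p in range(v.shape[0]):                 # MIDI pitches 0..119
        chroma[p % 12] += v_hat[p]
    return chroma / (np.linalg.norm(chroma, axis=0) + 1e-12)
```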
The extraction of local tempo and beat information from audio recordings constitutes a challenging task, particularly for music that reveals significant tempo variations. Furthermore, the existence of various pulse levels such as measure, tactus, and tatum often makes the determination of absolute tempo problematic. In this paper, we present a robust mid-level representation that encodes local tempo information. Similar to the well-known concept of cyclic chroma features, where pitches differing by octaves are identified, we introduce the concept of cyclic tempograms, where tempi differing by a power of two are identified. Furthermore, we describe how to derive cyclic tempograms from music signals using two different methods for periodicity analysis and finally sketch some applications to tempo-based audio segmentation.
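The core folding operation can be sketched directly: given a tempogram and its tempo axis, tempi whose ratio is a power of two are mapped to the same cyclic bin, in analogy to folding pitches into chroma classes. The reference tempo and bin count below are illustrative.

```python
import numpy as np

def cyclic_tempogram(tempogram, tempi, ref=60.0, n_bins=15):
    """Fold a tempogram (tempo x time) so that tempi differing by a
    power of two are identified. tempi: BPM value of each row."""
    cyclic = np.zeros((n_bins, tempogram.shape[1]))
    log_rel = np.log2(tempi / ref) % 1.0         # position in tempo octave
    for i, pos in enumerate(log_rel):
        cyclic[int(pos * n_bins) % n_bins] += tempogram[i]
    return cyclic / (cyclic.sum(axis=0, keepdims=True) + 1e-12)
```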
Techniques based on non-negative matrix factorization (NMF) can be used to efficiently decompose a magnitude spectrogram into a set of template (column) vectors and activation (row) vectors. To better control this decomposition, NMF has been extended using prior knowledge and parametric models. In this paper, we present such an extended approach that uses additional score information to guide the decomposition process. Here, opposed to previous methods, our main idea is to impose constraints on both the template as well as the activation side. We show that using such double constraints results in musically meaningful decompositions similar to parametric approaches, while being computationally less demanding and easier to implement. Furthermore, additional onset constraints can be incorporated in a straightforward manner without sacrificing robustness. We evaluate our approach in the context of separating note groups (e.g., the left or right hand) from monaural piano recordings.
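A sketch of the double-constraint idea: because multiplicative NMF updates keep zero entries at zero, initializing both factors with score-derived binary masks enforces the constraints throughout the iteration. The mask construction and the KL-divergence updates below are a minimal NumPy rendering under this assumption, not the paper's full method.

```python
import numpy as np

def score_informed_nmf(V, W_mask, H_mask, n_iter=50):
    """KL-NMF with score-derived constraints on both factors.
    W_mask (freq x notes) permits only plausible partials per note
    template; H_mask (notes x frames) permits activity only around
    notated onsets/durations. Zeros are preserved by the updates."""
    W = np.random.rand(*W_mask.shape) * W_mask
    H = np.random.rand(*H_mask.shape) * H_mask
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-12)
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + 1e-12)
    return W, H
```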
The separation of different sound sources from polyphonic music recordings constitutes a complex task, since one has to account for different musical and acoustical aspects. In recent years, various score-informed procedures have been suggested where musical cues such as pitch, timing, and track information are used to support the source separation process. In this paper, we discuss a framework for decomposing a given music recording into note-wise audio events which serve as elementary building blocks. In particular, we introduce an interface that employs the additional score information to provide a natural way for a user to interact with these audio events. By simply selecting arbitrary note groups within the score, a user can access, modify, or analyze corresponding events in a given audio recording. In this way, our framework not only opens up new ways for audio editing applications, but also serves as a valuable tool for evaluating and better understanding the results of source separation algorithms.
Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques are based on applying a suitable decomposition to the magnitude of the Short-Time Fourier Transform (STFT) of the mixture signal. The phase information required for the reconstruction of individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). There are different methods for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. In this paper, we propose a simple, yet effective extension of the iterative signal reconstruction procedure by Griffin and Lim to remedy this problem. In a first experiment, under laboratory conditions, we show that our method considerably attenuates pre-echos while still showing similar convergence properties as the original approach. A second, more realistic experiment involving score-informed audio decomposition shows that the proposed method still yields improvements, although to a lesser extent, under non-idealized conditions.
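The sketch below shows plain Griffin-Lim iteration from a target magnitude spectrogram, extended with a crude time-domain gate that zeros samples before a known transient onset in every iteration. This gate is a simplified stand-in for the paper's pre-echo remedy, and the onset position is assumed given.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim_gated(mag, onset_sample, fs, n_iter=50, nperseg=1024):
    """Iterative reconstruction (Griffin & Lim) from a target magnitude
    spectrogram `mag` (nperseg//2+1 bins x frames), with a time-domain
    gate suppressing pre-echo before `onset_sample`."""
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        x[:onset_sample] = 0.0                 # enforce silence pre-onset
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        if Z.shape[1] < mag.shape[1]:          # frame-count bookkeeping
            Z = np.pad(Z, ((0, 0), (0, mag.shape[1] - Z.shape[1])))
        phase = np.exp(1j * np.angle(Z[:, :mag.shape[1]]))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x
```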
Existing mobility models for wireless sensor networks generally do not preserve a uniform scattering of the sensor nodes within the monitored area. This paper proposes a coverage-preserving random mobility model called DPRMM. Direction and velocity are randomly chosen according to local information about sensor density. Our mobility model is devised in a manner that sensors move towards the least covered regions within their neighborhood. We show that this guarantees a rapid convergence to the steady state while preserving a uniform coverage degree over the monitored region. Throughout the simulations we have carried out, we find that the analytical study is corroborated by the practical experiments. Our experiments show that the average distance traveled by a target without being detected is improved by at least a factor of 2 using DPRMM.
Our goal is to improve the perceptual quality of signal components extracted in the context of music source separation. Specifically, we focus on decomposing polyphonic, mono-timbral piano recordings into the sound events that correspond to the individual notes of the underlying composition. Our separation technique is based on score-informed Non-Negative Matrix Factorization (NMF), which has been proposed in earlier works as an effective means to enforce a musically meaningful decomposition of piano music. However, the method still has certain shortcomings for complex mixtures where the tones strongly overlap in frequency and time. As the main contribution of this paper, we propose adding a restoration stage based on refined Wiener filter masks to score-informed NMF. Our idea is to introduce note-wise soft masks created from a dictionary of perfectly isolated piano tones, which are then adapted to match the timbre of the target components. A basic experiment with mixtures of piano tones shows improvements of our novel reconstruction method with regard to perceptually motivated separation quality metrics. A second experiment with more complex piano recordings shows that further investigations into the concept are necessary for real-world applicability.
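The soft-mask step itself is compact: note-wise magnitude estimates (here assumed to come from the adapted isolated-tone dictionary) define Wiener-style masks that are applied to the complex mixture STFT before inversion. A minimal NumPy sketch:

```python
import numpy as np

def wiener_separate(mixture_stft, note_mags):
    """Soft-mask filtering: each note's magnitude estimate defines a
    Wiener mask; applying it to the complex mixture STFT yields that
    note's complex spectrogram for time-domain inversion."""
    power = np.stack([m ** 2 for m in note_mags])     # (notes, F, T)
    total = power.sum(axis=0) + 1e-12
    return [mixture_stft * (p / total) for p in power]
```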
The automated analysis of vibrato in complex music signals is a highly challenging task. A common strategy is to proceed in a two-step fashion. First, a fundamental frequency (F0) trajectory for the musical voice that is likely to exhibit vibrato is estimated. In a second step, the trajectory is then analyzed with respect to periodic frequency modulations. As a major drawback, however, such a method cannot recover from errors made in the inherently difficult first step, which severely limits the performance during the second step. In this work, we present a novel vibrato analysis approach that avoids the first error-prone F0-estimation step. Our core idea is to perform the analysis directly on a signal's spectrogram representation where vibrato is evident in the form of characteristic spectro-temporal patterns. We detect and parameterize these patterns by locally comparing the spectrogram with a predefined set of vibrato templates. Our systematic experiments indicate that this approach is more robust than F0-based strategies.
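A minimal rendering of the template-matching idea: sinusoidal frequency-modulation trajectories of several rates and extents are rasterized into small kernels and correlated with a log-frequency spectrogram. All parameter values below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import correlate2d

def vibrato_salience(spec, rates=(4, 6, 8), extents=(1, 2, 3),
                     frame_rate=100, bins_per_semitone=10, width=100):
    """Correlate a log-frequency spectrogram (bins x frames) with
    sinusoidal vibrato templates of different rates (Hz) and extents
    (semitones); high response marks vibrato-like patterns."""
    t = np.arange(width) / frame_rate
    best = np.zeros_like(spec, dtype=float)
    for rate in rates:
        for ext in extents:
            traj = ext * bins_per_semitone * np.sin(2 * np.pi * rate * t)
            h = int(traj.max() - traj.min()) + 3
            kernel = np.zeros((h, width))
            rows = np.round(traj - traj.min()).astype(int) + 1
            kernel[rows, np.arange(width)] = 1.0
            kernel -= kernel.mean()              # zero-mean template
            resp = correlate2d(spec, kernel, mode="same")
            best = np.maximum(best, resp)        # best template per bin
    return best
```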
Recent digitization efforts have led to large music collections, which contain music documents of various modes comprising textual, visual and acoustic data. In this paper, we present a multimodal music player for presenting and browsing digitized music collections consisting of heterogeneous document types. In particular, we concentrate on music documents of two widely used types for representing a musical work, namely visual music representation (scanned images of sheet music) and associated interpretations (audio recordings). We introduce novel user interfaces for multi-modal (audiovisual) music presentation as well as intuitive navigation and browsing. Our system offers high quality audio playback with time-synchronous display of the digitized sheet music associated to a musical work. Furthermore, our system enables a user to seamlessly crossfade between various interpretations belonging to the currently selected musical work.
Electronic Music (EM) is a popular family of genres which has increasingly received attention as a research subject in the field of MIR. A fundamental structural unit in EM are loops—audio fragments whose length can span several seconds. The devices commonly used to produce EM, such as sequencers and digital audio workstations, impose a musical structure in which loops are repeatedly triggered and overlaid. This particular structure allows new perspectives on well-known MIR tasks. In this paper we first review a prototypical production technique for EM from which we derive a simplified model. We then use our model to illustrate approaches for the following task: given a set of loops that were used to produce a track, decompose the track by finding the points in time at which each loop was activated. To this end, we repurpose established MIR techniques such as fingerprinting and non-negative matrix factor deconvolution.
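Under the simplified model described above, and assuming the loop spectrograms are known, activation points can be estimated by sliding each loop over the track spectrogram and thresholding a normalized correlation. This is a crude stand-in for the fingerprinting and non-negative matrix factor deconvolution techniques the paper repurposes, and it ignores overlapping loops.

```python
import numpy as np

def loop_activations(track_spec, loop_specs, thresh=0.8):
    """For each known loop spectrogram (freq x frames), mark the track
    offsets where the normalized correlation with the track spectrogram
    exceeds `thresh` as candidate activation points."""
    activations = {}
    for name, L in loop_specs.items():
        T = L.shape[1]
        scores = []
        for o in range(track_spec.shape[1] - T + 1):
            win = track_spec[:, o:o + T]
            num = np.sum(win * L)
            den = np.linalg.norm(win) * np.linalg.norm(L) + 1e-12
            scores.append(num / den)
        activations[name] = np.flatnonzero(np.array(scores) >= thresh)
    return activations
```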
The decomposition of a monaural audio recording into musically meaningful sound sources or voices constitutes a fundamental problem in music information retrieval. In this paper, we consider the task of separating a monaural piano recording into two sound sources (or voices) that correspond to the left hand and the right hand. Since in this scenario the two sources share many physical properties, sound separation approaches identifying sources based on their spectral envelope are hardly applicable. Instead, we propose a score-informed approach, where explicit note events specified by the score are used to parameterize the spectrogram of a given piano recording. This parameterization then allows for constructing two spectrograms considering only the notes of the left hand and the right hand, respectively. Finally, inversion of the two spectrograms yields the separation result. First experiments show that our approach, which involves high-resolution music synchronization and parametric modeling techniques, yields good results for real-world non-synthetic piano recordings.