Audio Signal Processing Research Papers
Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
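As an illustration of the simplest of these methods, the sketch below computes PCA from the singular value decomposition of the centered data matrix. It is a minimal sketch under stated assumptions: the function name `pca` and the toy data are hypothetical, and PLS, CCA, OPLS and the kernel extensions discussed in the paper are not shown.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X (n_samples x n_features) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # The right singular vectors of the centered data are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # (n_components x n_features)
    explained_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return Xc @ components.T, components, explained_var

# Toy data: two strongly correlated features plus one near-constant feature
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, rng.normal(scale=0.01, size=(200, 1))])
Z, W, var = pca(X, 1)
```

Here one component captures almost all of the variance, because the first two features are linear functions of the same latent variable.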
A base-4 leading zero detector (LZD) design is proposed in this paper. The design is similar to the approach originally proposed by V.G. Oklobdzija but uses a different technique. The circuit modules used in the base-4 LZD approach are designed, and several N-bit LZD circuits are implemented with a standard-cell realization in the Taiwan Semiconductor Manufacturing Company (TSMC) 0.65 µm CMOS process. The performance and layout area of the base-4 LZD realization are compared for implementations that contain only 4-to-1 and 2-to-1 multiplexers.
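The hierarchical idea behind a base-4 LZD can be checked with a behavioral model: each node combines the (count, all-zero) outputs of four children, selecting the first non-zero child much as a 4-to-1 multiplexer would. This is a software sketch for illustration only, not the paper's circuit; the function name `lzd_tree` is hypothetical, and it assumes word widths that are powers of 4 (other widths would need the 2-to-1 stages mentioned above).

```python
def lzd_tree(word, n_bits):
    """Return (leading_zero_count, all_zero) for an n_bits-wide word; n_bits must be a power of 4."""
    if n_bits == 1:
        return (0, False) if word & 1 else (1, True)
    width = n_bits // 4
    mask = (1 << width) - 1
    # Split into four sub-words, most significant first, and solve each recursively
    children = [lzd_tree((word >> (width * (3 - i))) & mask, width) for i in range(4)]
    # Priority selection of the first non-zero child (the role of the 4-to-1 multiplexer)
    for i, (count, zero) in enumerate(children):
        if not zero:
            return i * width + count, False
    return n_bits, True

print(lzd_tree(0x0001, 16))   # a 16-bit word whose only set bit is the LSB
```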
For many audiovisual applications, the integration and synchronization of audio and video signals is essential. The objective of this paper is to develop a system that displays the active objects in the captured video signal, integrated with their respective audio signals in the form of text. The video and audio signals are captured and processed separately; they are then buffered, integrated, and synchronized using a time-stamping technique. Time stamps provide the timing information for each of the audio and video processes (speech recognition and object detection, respectively), and this information is needed to correlate the audio packets with the video frames. Integration is therefore achieved without using video information such as lip movements. The results obtained are based on a specific implementation of the speech recognition module, which was found to be the bottleneck process in the proposed system.
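A minimal sketch of how time stamps alone can correlate recognized speech with video frames: each speech segment is attached to the last frame whose time stamp does not exceed the segment's start time. The data layout (sorted frame times in seconds, `(start_time, text)` tuples) and the name `attach_captions` are assumptions for illustration, not the paper's design.

```python
from bisect import bisect_right

def attach_captions(frame_times, speech_segments):
    """Map each recognized speech segment to the index of the frame shown when it starts.

    frame_times: sorted list of frame time stamps in seconds.
    speech_segments: list of (start_time, text) tuples from the speech recognizer.
    """
    captions = {}
    for start, text in speech_segments:
        idx = bisect_right(frame_times, start) - 1   # last frame with time stamp <= start
        if idx >= 0:
            captions.setdefault(idx, []).append(text)
    return captions

frames = [i / 25 for i in range(100)]                # 25 fps, 4 s of video
segments = [(0.50, "hello"), (1.02, "world"), (3.99, "bye")]
captions = attach_captions(frames, segments)
```

Binary search keeps the lookup logarithmic in the number of buffered frames.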
Composers working in the sonic arts have frequently found themselves attempting to use spatial audio in ways that didn’t work as intended. Maybe more than any other facet of technological music, mastering spatial audio seems to involve a learning process in which one slowly discovers the things that work and those that don’t. The purpose of this paper is to foster understanding of spatial audio through examples of practical problems. These problems include both some general misconceptions about spatial hearing and some specific examples of things gone wrong. A particular lesson to be learned from this discussion is that there is no silver bullet for solving spatial audio problems, and every situation needs to be understood in appropriate terms.
Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals.
Filter banks with fixed time-frequency resolution, such as the Short-Time Fourier Transform (STFT), are a common tool for many audio analysis and processing applications, since they can be implemented efficiently via the Fast Fourier Transform (FFT). The fixed time-frequency resolution of the STFT can lead to undesirable smearing of events in both time and frequency. In this paper, we suggest adaptively varying the STFT time-frequency resolution in order to reduce filter-bank-specific artifacts while retaining adequate frequency resolution. Several strategies for systematic adaptation of the time-frequency resolution are proposed. The approach is demonstrated in applications to spectrogram displays, noise reduction, and spectral effects processing.
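The fixed trade-off that motivates the adaptation can be seen with a plain FFT-based STFT: a longer window gives finer frequency bins (fs / N Hz apart) but longer frames in time. The sketch below only illustrates this trade-off at two fixed window lengths; it does not implement the paper's adaptation strategies, and the name `stft`, the Hann window and the hop of N/4 are assumptions.

```python
import numpy as np

def stft(x, win_len, hop):
    """Magnitude STFT with a Hann window; returns an array of shape (n_frames, win_len // 2 + 1)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)          # 1 s of a 1 kHz tone
for n in (256, 2048):
    S = stft(x, n, n // 4)
    print(n, "bin spacing", fs / n, "Hz; frame length", 1000 * n / fs, "ms;", S.shape)
```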
The field of Text-to-Speech (TTS) has improved enormously in recent years thanks to deep learning techniques, and producing realistic speech is now possible. As a consequence, research on controlling expressiveness, which makes it possible to generate speech in different styles or manners, has attracted increasing attention. Systems able to control style have been developed and show impressive results. However, the control parameters often consist of latent variables that remain difficult to interpret. In this paper, we analyze and compare different latent spaces and obtain an interpretation of their influence on expressive speech. This will make it possible to build controllable speech synthesis systems with understandable behaviour.
This paper proposes a mining-based method for event detection in broadcast tennis videos. Using visual and aural information, we extract high-level features to describe video segments. The audiovisual features are then transformed into symbolic streams, and an efficient mining technique is applied to derive all frequent patterns that characterize tennis events. After mining, we categorize the frequent patterns into several kinds of events and achieve event detection for tennis videos by checking the correspondence between mined patterns and events. The experimental results show that the proposed approach is a promising way to detect events in broadcast tennis video.
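To make the notion of frequent patterns in symbolic streams concrete, the sketch below counts contiguous symbol sequences and keeps those that occur in at least `min_support` streams. This simple n-gram counting is a stand-in chosen for illustration, not the paper's mining technique, and the symbols (`far` for a far-view court shot, `near` for a close-up, `cheer` for audience cheering) are hypothetical.

```python
from collections import Counter

def frequent_patterns(streams, max_len, min_support):
    """Return contiguous symbol patterns that occur in at least min_support streams."""
    support = Counter()
    for stream in streams:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(stream) - n + 1):
                seen.add(tuple(stream[i : i + n]))
        support.update(seen)              # count each pattern at most once per stream
    return {p: c for p, c in support.items() if c >= min_support}

streams = [["far", "near", "cheer"],
           ["far", "near", "cheer", "far"],
           ["near", "far"]]
patterns = frequent_patterns(streams, max_len=3, min_support=2)
```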
Mixing multitrack music is an expert task where characteristics of the individual elements and their sum are manipulated in terms of balance, timbre and positioning, to resolve technical issues and to meet the creative vision of the artist or engineer. In this paper we conduct a mixing experiment where eight songs are each mixed by eight different engineers. We consider a range of features describing the dynamic, spatial and spectral characteristics of each track, and perform a multidimensional analysis of variance to assess whether the instrument, song and/or engineer is the determining factor that explains the resulting variance, trend, or consistency in mixing methodology. A number of assumed mixing rules from the literature are discussed in light of this data, and implications regarding the automation of various mixing processes are explored. Part of the data used in this work is published in a new online multitrack dataset through which public domain recordings, mixes, and mix ...
Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult computational task. In this article we present an end-to-end pipeline for flamenco singer identification based on acoustic motif embeddings. In our approach, the fundamental frequency is extracted directly from the raw audio signal and then approximated. This approximation reduces the high variability of the audio signal and allows small melodic patterns to be discovered using a sequential pattern mining technique, thus creating a dictionary of motifs. Several acoustic features are then used to extract fixed-length embeddings of variable-length motifs by means of convolutional architectures. We test the quality of the embeddings in a flamenco singer identification task, comparing our approach with previous deep learning architectures, and study the effe...
An approach to watermarking digital signals using frequency modulation ('chirp coding') is considered. The principles underlying this approach are based on the use of a matched filter to reconstruct a 'chirp stream' code that is uniquely robust. The method is generic in the sense that it can, in principle, be used for a variety of different signals (the authentication of speech and biomedical signals, for example). Further, by generating a bit stream that is signal dependent, chirp coding provides a method of self-authentication, ...
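The matched-filter principle can be illustrated by locating a known linear chirp buried in noise: correlating the received signal with the chirp template produces a sharp peak at the chirp's position. This is a minimal sketch of detection only; the chirp parameters, the helper names `linear_chirp` and `matched_filter`, and the noise level are assumptions, and the paper's code construction and self-authentication scheme are not reproduced.

```python
import numpy as np

def linear_chirp(n, f0, f1, fs):
    """n samples of a chirp sweeping linearly from f0 to f1 Hz."""
    t = np.arange(n) / fs
    k = (f1 - f0) / (n / fs)                     # sweep rate in Hz/s
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t**2))

def matched_filter(received, template):
    """Correlate with the template; index k scores received[k:k + len(template)]."""
    return np.correlate(received, template, mode="valid")

fs = 8000
chirp = linear_chirp(512, 500, 3000, fs)
rng = np.random.default_rng(1)
signal = rng.normal(scale=1.0, size=4000)        # noise with the same RMS order as the chirp
signal[1200:1200 + 512] += chirp
out = matched_filter(signal, chirp)
print(int(np.argmax(out)))                       # estimated start of the chirp
```

The chirp's sharp autocorrelation is what makes the correlation peak stand out even when the chirp is not visible in the waveform.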
This article reports on ongoing research related to the postdoctoral project "Multichannel audio devices applied to music, sound art and bioacoustics", developed by the author for UFF (2016-2018) with support from CAPES and in collaboration with the Laboratory of Acoustics and Sound Arts (LASom/DM/IA/UNICAMP). This work aims to (1) consolidate a spatial sound projection device involving signal processing, and (2) apply it in the domains of the arts (music) and science (biology and ecology).
These projects reflect my intention to offer musicians novel tools for performance and/or improvisation techniques. I conceive of them in contrast with the traditional modular approach. What I have in mind is neither a system with static parameters, e.g. a step sequencer with fixed rhythmic patterns or divisions and fixed interval ratios, nor a purely stochastic system. Moreover, the two projects are related to each other so as to create a more general environment that can be used in both musical and interactive domains.
The entire work builds on my Bachelor’s thesis, ‘Real-Time Intelligent Harmonizer based on AMDF Pitch Detection’.
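AMDF (average magnitude difference function) pitch detection picks the lag at which a frame differs least from a shifted copy of itself. The sketch below is a minimal version under assumed parameters (60 to 500 Hz search range, plain argmin); the function name `amdf_pitch` is hypothetical and this is not the thesis implementation.

```python
import numpy as np

def amdf_pitch(frame, fs, f_min=60.0, f_max=500.0):
    """Estimate f0 as fs / lag, where lag minimizes the average magnitude difference."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    lags = np.arange(lag_min, lag_max + 1)
    d = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    return fs / lags[np.argmin(d)]

fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 210 * t) + 0.5 * np.sin(2 * np.pi * 420 * t)
print(amdf_pitch(frame, fs))   # close to 210 Hz, limited by integer-lag resolution
```

Because the AMDF also dips at multiples of the period, a plain argmin can produce octave errors on strongly periodic input; practical detectors usually add a threshold or a bias toward shorter lags.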
Sound equalization is a common approach for objectively or subjectively defining the reproduction level in specific frequency bands. It is also well known that the human auditory system applies an inherent sound-weighting process. As a result, the perceived loudness changes with frequency and with the user-defined reproduction gain, so the perceived equalization deviates from the intended one as the sound level changes. In this work we introduce a novel equalization approach that takes this perceptual loudness effect into account in order to achieve subjectively constant equalization. A series of listening tests shows that the proposed equalization technique is an efficient and listener-preferred alternative for both professional and home audio reproduction applications.
In this paper, a new approach for automatic audio classification using non-negative matrix factorization (NMF) is presented. Training is performed on each audio class individually, whereas during the test phase each test recording is projected onto each of the training matrices. Experiments demonstrating the efficiency of the proposed approach were performed for musical instrument classification. Several perceptual features as well as MPEG-7 descriptors were measured for 300 sound recordings covering 6 musical instrument classes. Subsets of the feature set were selected using branch-and-bound search in order to obtain the most discriminating features for classification. Several NMF techniques were used, namely the standard NMF method, local NMF, and sparse NMF. The experiments demonstrate almost perfect classification (a classification error of 1.0%), outperforming the state-of-the-art techniques tested in the same experiment.
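A minimal sketch of per-class NMF classification: a basis matrix is learned for each class with the standard Lee-Seung multiplicative updates, a test feature vector is projected onto each basis with non-negative coefficients, and the class with the smallest reconstruction error is chosen. The decision rule, rank, iteration counts, function names and the synthetic 6-dimensional feature vectors are all assumptions; local and sparse NMF are not shown.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Standard NMF with multiplicative updates (Euclidean cost): V ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W, v, n_iter=200, eps=1e-9):
    """Non-negative coefficients h with W @ h ~ v, keeping the class basis W fixed."""
    v = v.reshape(-1, 1)
    h = np.ones((W.shape[1], 1))
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

def classify(train, v, rank=2):
    """train: dict mapping class name -> (n_features x n_examples) non-negative matrix."""
    bases = {c: nmf(V, rank)[0] for c, V in train.items()}
    errors = {c: np.linalg.norm(v.reshape(-1, 1) - W @ project(W, v)) for c, W in bases.items()}
    return min(errors, key=errors.get)

# Synthetic classes: "A" has energy in the first three features, "B" in the last three
rng = np.random.default_rng(0)
A = np.vstack([rng.random((3, 20)), 0.01 * rng.random((3, 20))])
B = np.vstack([0.01 * rng.random((3, 20)), rng.random((3, 20))])
print(classify({"A": A, "B": B}, np.array([0.9, 0.5, 0.7, 0.0, 0.01, 0.0])))
```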
Objective quality assessment models have been used increasingly in recent years to assess or monitor speech and audio quality in many multimedia and audio processing systems. These methods offer a clear and repeatable way to evaluate customer experience by estimating perceived quality on an easily understood subjective scale, such as a rating scale ranging from excellent to low quality. Consequently, service providers aim to offer reliable services that provide the end user with the best possible quality under the current network conditions, in order to avoid customer churn. This paper presents the design and performance evaluation of parametric models estimating the audio quality experienced by the end user of broadcasting systems and web-casting applications. The Random Forest (RF) algorithm is used to design non-intrusive parametric models that establish the relationship between the feature description and the perceived quality scores. For this purpose, broadcast and web-cast sub-databases were created: the web-cast sub-database includes 17,280 degraded samples, and the broadcast sub-database contains 1,080 degraded samples obtained from the Slovak Radio. The results reported for the proposed parametric audio quality models validate Random Forest as a powerful technique that provides good efficiency in terms of the Pearson Correlation Coefficient (PCC) and Root Mean Squared Error (RMSE).
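The two evaluation metrics named above can be computed directly from predicted and subjective quality scores. The sketch below uses their standard definitions; the function names and the example scores are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def pcc(pred, target):
    """Pearson correlation coefficient between predicted and subjective scores."""
    p = pred - pred.mean()
    t = target - target.mean()
    return float(np.sum(p * t) / np.sqrt(np.sum(p**2) * np.sum(t**2)))

def rmse(pred, target):
    """Root mean squared error, in the units of the rating scale."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

subjective = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical ratings
predicted = np.array([1.2, 1.9, 3.1, 3.8, 5.0])    # hypothetical model outputs
print(pcc(predicted, subjective), rmse(predicted, subjective))
```

PCC measures how well the model preserves the ordering and linear trend of the scores, while RMSE measures the absolute prediction error.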
Natural Language Processing is in growing demand with recent developments. The generator model presented here is one such example: a music generation system conditioned on lyrics. The proposed model has been tested only on songs with lyrics written in English, but the idea can be generalized to other languages. The main objective of this paper is to explain how one can create a music generator using statistical machine learning methods. The paper also explains how the outputs, which are music signals, can be formulated effectively, given that such signals comprise millions of samples even over a short time frame. The parameters mentioned in the paper serve only an explanatory purpose. This paper discusses an effective statistical formulation of the output that greatly reduces the number of output parameters to be estimated, and how to reconstruct the audio signals from the predicted parameters using a 'phase-shift algorithm'.
The automated extraction of chord labels from audio recordings constitutes a major task in music information retrieval. To evaluate computer-based chord labeling procedures, one requires ground truth annotations for the underlying audio material. However, the manual generation of such annotations on the basis of audio recordings is tedious and time-consuming. On the other hand, trained musicians can easily derive chord labels from symbolic score data. In this paper, we bridge this gap by describing a procedure that allows for transferring annotations and chord labels from the score domain to the audio domain and vice versa. Using music synchronization techniques, the general idea is to locally warp the annotations of all given data streams onto a common time axis, which then allows for a cross-domain evaluation of the various types of chord labels. As a further contribution of this paper, we extend this principle by introducing a multi-perspective evaluation framework for simultaneously comparing chord recognition results over multiple performances of the same piece of music. The revealed inconsistencies in the results not only indicate limitations of the employed chord labeling strategies but also deepen the understanding of the underlying music material.
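The label-transfer idea can be sketched with classic dynamic time warping (DTW): align two feature sequences, then give each frame of the target sequence the label of the source frame it is aligned to. This is a minimal sketch assuming Euclidean frame costs, the three standard DTW steps, and toy one-dimensional features; the function names are hypothetical and the paper's synchronization pipeline may differ.

```python
import numpy as np

def dtw_path(X, Y):
    """Optimal DTW warping path between feature sequences X (n x d) and Y (m x d)."""
    n, m = len(X), len(Y)
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)   # pairwise frame costs
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the path
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def transfer_labels(labels_x, path, m):
    """Give each of the m frames of Y the label of the first X frame aligned to it."""
    out = [None] * m
    for i, j in path:
        if out[j] is None:
            out[j] = labels_x[i]
    return out

score = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])                   # toy score features
audio = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [2.0], [2.0]])  # slower performance
labels = transfer_labels(["C", "C", "G", "G", "Am"], dtw_path(score, audio), len(audio))
```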
The computer-based harmonic analysis of music recordings, with the goal of automatically extracting chord labels directly from the given audio data, constitutes a major task in music information retrieval. In most automated chord recognition procedures, the given music recording is first converted into a sequence of chroma-based audio features, and then pattern matching techniques are applied to map the chroma features to chord labels. In this paper, we analyze the role of the feature extraction step within the recognition pipeline of various chord recognition procedures based on template matching strategies and hidden Markov models. In particular, we report on numerous experiments which show how the various procedures depend on the type of the underlying chroma feature as well as on parameters that control temporal and spectral aspects.
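A minimal template-matching recognizer maps each chroma frame to the most similar of 24 binary major/minor triad templates using cosine similarity. The templates, the label spelling (`C`, `A:min`) and the function names are assumptions for illustration; the hidden Markov model variants analyzed in the paper are not shown.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Unit-norm binary templates for the 12 major and 12 minor triads."""
    templates, names = [], []
    for root in range(12):
        for quality, third in (("", 4), (":min", 3)):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            templates.append(t / np.linalg.norm(t))
            names.append(PITCH_CLASSES[root] + quality)
    return np.array(templates), names

def recognize(chroma):
    """chroma: (n_frames x 12). Returns the best-matching chord label for each frame."""
    T, names = chord_templates()
    C = chroma / (np.linalg.norm(chroma, axis=1, keepdims=True) + 1e-12)
    return [names[k] for k in np.argmax(C @ T.T, axis=1)]

frames = np.array([[1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],    # C, E, G
                   [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]],   # A, C, E
                  dtype=float)
print(recognize(frames))
```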
To enhance the security and robustness of digital audio watermarking algorithms, this paper presents an algorithm based on mean quantization in the Discrete Wavelet Transform (DWT) domain. A binary image is used as the watermark and is encrypted with a chaotic encryption scheme using a secret key. The approach embeds the encrypted watermark in the lower-frequency components using a two wavelet function with adaptation to the frame size. The watermark is embedded in the lower-frequency components because their energy is high enough to hold the watermark inaudibly; it therefore should not alter the audible content and should not be easy to remove. The algorithm offers good security because only authorized parties can detect the copyright information embedded in the host audio signal. The watermark can be extracted blindly, without knowledge of the original signal. To evaluate the performance of the presented audio watermarking method, objective quality tests are conducted, including bit error rate (BER), normalized cross-correlation (NCC) and peak signal-to-noise ratio (PSNR) for the watermark, and signal-to-noise ratio (SNR) for the audio signals. The test results show that the approach maintains high audio quality and yields a high recovery rate after attacks by commonly used audio data manipulations such as noise addition, amplitude modification, low-pass filtering, re-quantization, re-sampling, cropping, cutting, and compression. Simulation results show that the approach not only ensures robustness against common attacks but also further improves systemic security and robustness against malicious attacks.
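Mean quantization in the wavelet domain can be sketched as follows: for each frame, take a one-level Haar DWT, shift all approximation (low-frequency) coefficients so that their mean lands on one of two interleaved quantization lattices chosen by the watermark bit, and invert the transform. Extraction is blind: it only checks which lattice the mean is closer to. This is a simplified sketch, not the paper's scheme; the Haar wavelet, single decomposition level, frame length and step size are assumptions, and the chaotic encryption of the watermark image is omitted.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def quantize_mean(m, bit, step):
    """Nearest point of the lattice for `bit` (offset 0 for bit 0, step/2 for bit 1)."""
    offset = bit * step / 2
    return step * np.round((m - offset) / step) + offset

def embed(x, bits, frame_len, step):
    y = np.array(x, dtype=float)                 # work on a copy of the host signal
    for k, bit in enumerate(bits):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        a, d = haar_dwt(y[seg])
        a += quantize_mean(a.mean(), bit, step) - a.mean()   # move the mean onto the bit's lattice
        y[seg] = haar_idwt(a, d)
    return y

def extract(y, n_bits, frame_len, step):
    bits = []
    for k in range(n_bits):
        a, _ = haar_dwt(y[k * frame_len:(k + 1) * frame_len])
        m = a.mean()
        bits.append(min((0, 1), key=lambda b: abs(m - quantize_mean(m, b, step))))
    return bits

rng = np.random.default_rng(0)
host = 0.1 * rng.normal(size=4096)               # stand-in for a host audio signal
mark = [int(b) for b in rng.integers(0, 2, size=16)]
marked = embed(host, mark, frame_len=256, step=0.05)
```

The step size trades inaudibility (smaller shifts of the low-frequency coefficients) against robustness (larger tolerance to perturbations of the mean).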
Most recommender systems present recommended products to the user in lists. In doing so, much information about the mutual similarity between recommended products is lost. We propose a graphical shopping interface that represents the mutual similarities of the recommended products in a two-dimensional space, where similar products are located close to each other and dissimilar products far apart. The graphical shopping interface can be used to navigate through the complete product space in a number of steps. We show a prototype application of the system to MP3 players.
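One standard way to place items in two dimensions so that distances reflect dissimilarities is classical multidimensional scaling (MDS). The sketch below is an illustration under that assumption; the paper may use a different mapping, and the function name and the toy dissimilarity matrix are hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed items in `dim` dimensions so that Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]             # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Toy dissimilarities between four products (here: exact distances of points on a rectangle)
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
coords = classical_mds(D)
```

When the dissimilarities are not exactly Euclidean, the negative eigenvalues are clipped and the layout becomes a least-squares style approximation.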
This report aims to study and apply the concepts covered in the course "Processamento de Sinais" (Signal Processing) as they relate to the processing, modeling and quantization of speech, by carrying out a series of experiments, simulations and computational models laid out in a task script proposed by [1], thus constituting a guided study. Along with this script, a folder containing data files, audio files and functions useful for carrying out the study was provided.
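As an example of the kind of speech quantization experiment described, the sketch below compares 8-bit uniform quantization with mu-law companded quantization on a low-level signal. Mu-law is a standard speech companding technique chosen here for illustration; it is an assumption that the report uses it, and the function names and test signal are hypothetical.

```python
import numpy as np

def uniform_quantize(x, bits=8):
    """Uniform quantizer with 2**bits levels spanning [-1, 1]."""
    levels = 2 ** bits
    return np.round((x + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

def mu_law_quantize(x, bits=8, mu=255.0):
    """Compress with the mu-law curve, quantize uniformly, then expand."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)      # compress
    q = uniform_quantize(y, bits)
    return np.sign(q) * np.expm1(np.abs(q) * np.log1p(mu)) / mu   # expand

def snr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

t = np.arange(8000) / 8000
x = 0.01 * np.sin(2 * np.pi * 200 * t)          # quiet signal, as in soft speech passages
print(snr_db(x, uniform_quantize(x)), snr_db(x, mu_law_quantize(x)))
```

Companding spends more quantization levels near zero, which is why it preserves quiet passages far better than a uniform quantizer with the same number of bits.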
The concept of similarity matrices (SMs) has been widely used for a multitude of music analysis and retrieval tasks, including audio structure analysis and version identification. For such tasks, improving the structural properties of the similarity matrix at an early stage of the processing pipeline has turned out to be of crucial importance. In this paper, we present the SM toolbox, which contains MATLAB implementations for computing and enhancing similarity matrices in various ways. Furthermore, our toolbox includes a number of additional tools for parsing, navigation, and visualization synchronized with audio playback. Finally, we provide the code for a recently proposed audio thumbnailing procedure that demonstrates the applicability and importance of enhancement concepts. By providing MATLAB implementations on a website under a GNU-GPL license, together with many illustrative examples, we aim to foster research and education in music information retrieval.
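Two basic operations behind such toolboxes can be sketched in a few lines: computing a cosine self-similarity matrix from a feature sequence, and enhancing repeated (diagonal) structure by averaging along the main-diagonal direction. This is a simplified Python sketch, not the SM toolbox's MATLAB code or interface; the function names, the forward-only filter and the toy features are assumptions.

```python
import numpy as np

def self_similarity(F):
    """Cosine self-similarity matrix of a feature sequence F (n_frames x d)."""
    N = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    return N @ N.T

def smooth_diagonal(S, L):
    """Average S along the main-diagonal direction over L frames (forward filter)."""
    n = S.shape[0]
    out = np.zeros_like(S)
    for k in range(L):
        out[: n - k, : n - k] += S[k:, k:]        # out[i, j] accumulates S[i + k, j + k]
    return out / L

F = np.tile(np.eye(4), (3, 1))                    # a 4-frame pattern repeated three times
S_smooth = smooth_diagonal(self_similarity(F), 3)
```

Averaging along diagonals reinforces path-like structures caused by repetitions while suppressing isolated, noisy similarity values.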
This work proposes a method for automatic transcription of polyphonic music that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting the use of spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also used in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model is superior to a non-temporally constrained model and also outperforms various state-of-the-art transcription systems in the same experiment.
Humans tend to organize perceived information into hierarchies and structures, a principle that also applies to music. Even musically untrained listeners unconsciously analyze and segment music with regard to various musical aspects, for example, identifying recurrent themes or detecting temporal boundaries between contrasting musical parts. This paper gives an overview of state-of-the-art methods for computational music structure analysis, where the general goal is to divide an audio recording into temporal segments corresponding to musical parts and to group these segments into musically meaningful categories. There are many different criteria for segmenting and structuring music audio. In particular, one can identify three conceptually different approaches, which we refer to as repetition-based, novelty-based, and homogeneity-based approaches. Furthermore, one has to account for different musical dimensions such as melody, harmony, rhythm, and timbre. In our state-of-the-art report, we address these different issues in the context of music structure analysis, while discussing and categorizing the most relevant and recent articles in this field.
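A common novelty-based technique slides a checkerboard kernel along the main diagonal of a self-similarity matrix: the score is high where frames are similar within the past and within the future but dissimilar across the two. The sketch below is a minimal version of that idea; the kernel size, zero padding, function names and toy two-segment features are assumptions, and the repetition- and homogeneity-based approaches are not shown.

```python
import numpy as np

def checkerboard_kernel(L):
    """(2L x 2L) kernel: +1 on the within-segment quadrants, -1 on the cross quadrants."""
    sign = np.concatenate([-np.ones(L), np.ones(L)])
    return np.outer(sign, sign)

def novelty_curve(S, L):
    """Correlate the kernel with S along its main diagonal (S zero-padded at the borders)."""
    n = S.shape[0]
    K = checkerboard_kernel(L)
    Sp = np.pad(S, L)
    return np.array([np.sum(K * Sp[i : i + 2 * L, i : i + 2 * L]) for i in range(n)])

# Toy recording: 10 frames of one timbre followed by 10 frames of another
F = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 1.0], (10, 1))])
S = F @ F.T                                       # cosine similarity (rows are unit vectors)
novelty = novelty_curve(S, 4)
print(int(np.argmax(novelty)))                    # frame index of the detected boundary
```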
As a result of massive digitization efforts and the World Wide Web, there is an exploding amount of available digital data describing and representing music at various semantic levels and in diverse formats. For example, in the case of the Beatles songs, there are numerous recordings, including an increasing number of cover songs and arrangements, as well as MIDI data and other symbolic music representations. The general goal of music synchronization is to align the multiple information sources related to a given piece of music. This becomes a difficult problem when the various representations reveal significant differences in structure and polyphony while exhibiting various types of artifacts. In this paper, we address how music synchronization techniques can be used to automatically reveal critical passages with significant differences between the two versions to be aligned. Using the corpus of Beatles songs as a test bed, we analyze the kinds of differences occurring in the audio and MIDI versions available for the songs.
Realizing Giorgio Nottoli's piece Incontro confronted us with a variety of profoundly different issues; tackled with the right means, they allowed us to begin building the tools needed for the interpretation and a future performance of the piece.
The documents provided by the composer, together with the manual of the principal instrument of this composition (the EMS VCS3), made the work of porting the instrument easier, giving us a thorough understanding of how to realize digitally a complex analog instrument made up of various interconnected types of hardware.
FAUST² and Max³ helped us along the way, allowing us to build the digital instrument from a "tabula rasa" that made us understand which algorithmic choices to make depending on the physical aspects required, showing and broadening the performer's expressive capabilities and extending them through the use of suitable electronic means.
The following text therefore addresses one of the most interesting aspects of the VCS3, namely the matrix it uses to connect all the pieces of the "electronic instrumental puzzle", acting as the common thread between the filters, oscillators and effects that Zinovieff⁴ introduced in this first "commercial" instrument.
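In a digital port, a patch matrix of this kind can be modeled, in its simplest linear form, as a matrix of pin gains between sources and destinations: each destination receives the weighted sum of the sources patched to it. The sketch below is a hypothetical simplification that ignores the electrical behaviour of the real pins; the function name `patch` and the source and destination assignments are assumptions, not the authors' FAUST or Max implementation.

```python
import numpy as np

def patch(matrix, sources):
    """Route source signals to destinations through a pin matrix.

    matrix: (n_sources x n_destinations) pin gains (0 means no pin inserted).
    sources: (n_sources x n_samples) source signals.
    Returns (n_destinations x n_samples): each destination sums the sources patched to it.
    """
    return matrix.T @ sources

# Hypothetical sources (rows): oscillator 1, oscillator 2, noise generator
sources = np.vstack([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)])
# Hypothetical destinations (columns): filter input, output amplifier
pins = np.array([[1.0, 0.0],
                 [1.0, 1.0],
                 [0.0, 0.5]])
routed = patch(pins, sources)
```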
The focus of this thesis is on the signal processing techniques used to increase the audio quality of the most common digital audio effects employed in electronic musical instruments, also taking into account the feasibility of implementing the proposed algorithms within the design constraints and the available computational resources. More specifically, four different issues are analyzed throughout this dissertation: artificial reverberation, analysis and emulation of nonlinear devices, audio morphing, and room response equalization.
Among audio effects, one of the most widely used is artificial reverberation. A great deal of research has been devoted over the last decades to improving the performance of digital artificial reverberators. Thanks to technological progress, the traditional techniques based on recursive structures (i.e., IIR filters) are now accompanied by new approaches based on fast convolution techniques and hybrid reverberator structures. On this basis, an efficient real-time implementation of a fast convolution algorithm for an embedded system has been proposed. Moreover, a technique for reducing the computational load of this operation using psychoacoustic expedients has been presented, based on a joint assessment of the energy decay relief and the absolute threshold of hearing. Finally, some techniques for approximating the convolution operation with low-cost recursive structures have been suggested.
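As an illustration of the fast convolution mentioned above, here is a minimal pure-Python sketch of FFT-based overlap-add convolution. It assumes a recursive radix-2 FFT and a fixed block size; the function names (`fft`, `next_pow2`, `overlap_add`) are hypothetical, and this is not the thesis's embedded implementation, which may use a different (for example, partitioned) scheme.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled (the caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def overlap_add(signal, ir, block_size=64):
    """Convolve `signal` with the impulse response `ir` block by block.

    Each input block is zero-padded to a power-of-two FFT size of at least
    block_size + len(ir) - 1, so the circular convolution computed with the
    FFT equals the linear one; block outputs are summed with overlap.
    """
    n_fft = next_pow2(block_size + len(ir) - 1)
    ir_spec = fft(list(ir) + [0.0] * (n_fft - len(ir)))  # computed once
    out = [0.0] * (len(signal) + len(ir) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        spec = fft(list(block) + [0.0] * (n_fft - len(block)))
        y = fft([a * b for a, b in zip(spec, ir_spec)], inverse=True)
        # Only the first len(block) + len(ir) - 1 output samples are nonzero.
        for i in range(len(block) + len(ir) - 1):
            out[start + i] += y[i].real / n_fft
    return out
```

Zero-padding each block to at least `block_size + len(ir) - 1` points is what makes the FFT's circular convolution equal to the linear one; the impulse-response spectrum is computed once and reused for every block.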
Although the convolution operation allows the exact reproduction of a linear system, it is important to consider that most audio effects are nonlinear systems (e.g., compressors, distortion, amplifiers). For this reason, the most commonly used black-box techniques for the emulation of nonlinear systems have been studied and analyzed. In particular, a technique for approximating the dynamic convolution operation by exploiting principal component analysis has been proposed. Using this procedure, it is possible to reduce the cost of dynamic convolution without lowering the perceived audio quality. An adaptive algorithm for the identification of nonlinear systems using orthogonal functions has also been presented.
In order to provide musicians with greater flexibility and artistic expression, several audio morphing techniques have been analyzed. Audio morphing makes it possible to combine two or more audio signals to create new, acoustically interesting sounds. This study has led to the development of an audio morphing algorithm for generating hybrid percussive sounds. The main features of the presented approach are a preprocessing of the audio references performed in the frequency domain and a time-domain linear interpolation to execute the morphing.
Finally, equalization techniques that improve the quality of sound reproduction systems by compensating for the room transfer function have been considered. In particular, two algorithms for adaptive minimum-phase equalization and a mixed-phase equalization technique have been proposed. To verify the suitability of the proposed systems, experiments in a realistic scenario have been carried out.
This paper reviews 3D sound technology. It focuses on the core technical issue of 3D sound: the scientific and engineering means by which a listener of a stereo reproduction system can perceive the direction of a sound in a 3D space. The scientific basis is first discussed, followed by the practical techniques for 3D stereo reproduction.
The decomposition of audio signals into perceptually meaningful modulation components is highly desirable, both for the development of new audio effects and as a building block for future efficient audio compression algorithms. In the past, there has always been a distinction between parametric coding methods and waveform coding: while waveform coding methods scale easily up to transparency (provided the necessary bit rate is available), parametric coding schemes are subject to the limitations of the underlying source models. On the other hand, parametric methods usually offer a wealth of manipulation possibilities that can be exploited to apply audio effects, while waveform coding is strictly limited to the most faithful possible reproduction of the original signal. The analysis/synthesis approach presented in this paper is an attempt to bridge this gap by enabling a seamless transition between both approaches.
This paper aims to develop a system for estimating a vehicle's speed by analyzing its drive-by acoustics recorded with a passive audio microphone. The analysis of the vehicle's acoustics primarily uses the Doppler shift and the instant at which the vehicle is at the closest point of approach. The approach uses a technique called seam carving to track the harmonics produced by the vehicle, particularly by its engine noise. The proposed method is computationally inexpensive and could easily be developed into a mobile application.
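The Doppler relation behind this kind of estimate can be sketched as follows. For a source moving straight toward and then away from a stationary microphone, a harmonic of frequency f0 is observed at f0·c/(c − v) while approaching and f0·c/(c + v) while receding, so the ratio of the two readings gives v without knowing f0. This is a minimal sketch assuming constant speed, readings taken well before and well after the closest point of approach, and c = 343 m/s; the function names are hypothetical, and the seam-carving harmonic tracker is not shown.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value (dry air at about 20 degrees C)


def speed_from_doppler(f_approach, f_recede, c=SPEED_OF_SOUND):
    """Estimate source speed (m/s) from the same harmonic measured well before
    and well after the closest point of approach.

    From f_a = f0 * c / (c - v) and f_r = f0 * c / (c + v):
        f_a / f_r = (c + v) / (c - v)  =>  v = c * (f_a - f_r) / (f_a + f_r)
    The unknown source frequency f0 cancels out.
    """
    if f_approach <= 0 or f_recede <= 0:
        raise ValueError("frequencies must be positive")
    return c * (f_approach - f_recede) / (f_approach + f_recede)


def to_kmh(v_ms):
    """Convert m/s to km/h."""
    return v_ms * 3.6
```

For example, a 120 Hz engine harmonic from a vehicle passing at 20 m/s (72 km/h) would be heard at about 127.4 Hz on approach and about 113.4 Hz after passing; feeding those two readings to `speed_from_doppler` returns 20 m/s.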
Music recommendation is receiving increasing attention as the music industry develops venues to deliver music over the Internet. The goal of music recommendation is to present users with lists of songs that they are likely to enjoy. Collaborative filtering and content-based recommendation are two widely used approaches that have been proposed for music recommendation. However, both approaches have their own disadvantages: collaborative-filtering methods need a large collection of user history data, and content-based methods lack the ability to understand the interests and preferences of users. To overcome these limitations, this paper presents a novel dynamic music similarity measurement strategy that utilizes both content features and user access patterns. Their seamless integration significantly improves the accuracy and performance of the music similarity measurement. Based on this strategy, recommended songs are obtained by means of label propagation over a graph representing music similarity. Experimental results on a real data set collected from http://www.newwisdom.net demonstrate the effectiveness of the proposed approach.
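The abstract does not specify which label propagation variant is used, so the following is only a sketch of one common formulation (the "local and global consistency" iteration F ← αSF + (1 − α)Y, where S is the symmetrically normalized similarity matrix). It assumes a precomputed symmetric similarity matrix `W` (in the paper this would combine content features and user access patterns) and a seed vector marking songs the user already likes; `propagate_scores` and `recommend` are hypothetical names.

```python
import math


def propagate_scores(W, seeds, alpha=0.8, iterations=100):
    """Spread preference scores from seed songs over a similarity graph.

    W: symmetric n x n list of non-negative similarities (zero diagonal).
    seeds: length-n list, 1.0 for songs the user is known to like, else 0.0.
    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2;
    for 0 < alpha < 1 this converges to (1 - alpha) (I - alpha S)^-1 Y.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    F = list(seeds)
    for _ in range(iterations):
        F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + (1 - alpha) * seeds[i]
             for i in range(n)]
    return F


def recommend(W, seeds, k=2, **kwargs):
    """Return indices of the k highest-scoring songs that are not seeds."""
    scores = propagate_scores(W, seeds, **kwargs)
    candidates = [i for i in range(len(W)) if seeds[i] == 0]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]
```

Songs strongly connected to the seeds, directly or through short chains of similar songs, end up with the highest scores; `alpha` trades off propagation against staying close to the original seeds.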
Just as many art forms emerged in the 20th century, music mixing evolved hand in hand with technological innovations involving sound and image. In a light conversation with the reader, Reynaldo Leite sets out the fundamental concepts for exercising your creativity, without formulas for success or rigid rules. A solid theoretical and practical foundation, useful for beginners and experienced practitioners of the art of mixing alike.
Available on Amazon Books.
In musical signals, the spectral and temporal contents of instruments often overlap. If the number of channels is at least equal to the number of instruments, statistical tools can be applied to highlight the characteristics of each instrument, making their identification possible. However, in the underdetermined case, in which there are fewer channels than sources, the task becomes challenging. One possible way to solve this problem is to seek regions in the time and/or frequency domains in which the content of a given instrument appears isolated. The strategy presented in this paper exploits the spectral disjointness among instruments by identifying isolated partials, from which a number of features are extracted. The information contained in those features, in turn, is used to infer which instrument is most likely to have generated each partial. Hence, the only condition for the method to work is that at least one isolated partial exists for each instrument somewhere in the signal. If several isolated partials are available, the results are summarized into a single, more accurate classification. Experimental results using 25 instruments demonstrate the good discrimination capabilities of the method.
The design and implementation of lossless audio signal processing using Finite Field Transforms are discussed. Finite field signal processing techniques are described, and the effects of filter length and coefficient accuracy are also discussed. Finite field transform algorithms suitable for lossless signal processing are presented.
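The key idea behind finite field transforms is that, with complex exponentials replaced by roots of unity in a prime field, convolving integer samples involves no rounding at all. The sketch below uses a number-theoretic transform modulo the prime P = 998244353 (= 119·2^23 + 1, primitive root 3), a common choice in software but an assumption here, not necessarily the field used in the paper. It uses a direct O(N²) transform for clarity, and `ntt` and `exact_circular_convolution` are hypothetical names. Results are exact as long as the true convolution values lie strictly between −P/2 and P/2.

```python
P = 998244353   # prime with P - 1 = 119 * 2**23
G = 3           # primitive root modulo P


def ntt(x, inverse=False):
    """Direct O(N^2) number-theoretic transform over GF(P).

    len(x) must divide P - 1 (any power of two up to 2**23 works).
    """
    n = len(x)
    if (P - 1) % n:
        raise ValueError("transform length must divide P - 1")
    w = pow(G, (P - 1) // n, P)      # primitive n-th root of unity mod P
    if inverse:
        w = pow(w, P - 2, P)         # w^-1 via Fermat's little theorem
    out = [sum(x[j] * pow(w, j * k, P) for j in range(n)) % P for k in range(n)]
    if inverse:
        n_inv = pow(n, P - 2, P)     # 1/n in GF(P)
        out = [v * n_inv % P for v in out]
    return out


def exact_circular_convolution(a, b):
    """Circular convolution of two equal-length integer sequences, computed
    exactly in GF(P) and mapped back to signed integers."""
    A = ntt([v % P for v in a])
    B = ntt([v % P for v in b])
    c = ntt([x * y % P for x, y in zip(A, B)], inverse=True)
    return [v - P if v > P // 2 else v for v in c]
```

Unlike a floating-point FFT, every step is integer arithmetic modulo P, so the output is bit-exact, which is what makes such transforms attractive for lossless processing.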
One of the challenges in computational acoustics is the identification of models that can simulate and predict the physical behavior of a system generating an acoustic signal. Whenever such models are used for commercial applications, an additional constraint is the time-to-market, which makes automating the sound design process desirable. In previous works, a computational sound design approach based on deep learning was proposed for the parameter estimation problem involving timbre matching, and applied to the synthesis of pipe organ tones. In this work we refine previous results by embedding the former approach in a multi-stage algorithm that also adds heuristics and a stochastic optimization method operating on objective cost functions based on psychoacoustics. The optimization method proves able to refine the first estimate given by the deep learning approach and substantially improves the objective metrics, with the additional benefit of reducing the sound design process time. Subjective listening tests are also conducted to gather additional insights into the results. Index Terms: physics-based acoustic modeling, neural networks, computational sound design, iterative optimization
The impulse response of an acoustical space or transducer is one of its most important characterizations. To measure impulse responses, four of the most suitable methods are compared: MLS (Maximum Length Sequence), IRS (Inverse Repeated Sequence), Time-Stretched Pulses, and SineSweep. These methods have already been described in the literature; nevertheless, choosing among them according to the measurement conditions is critical. Therefore, an extensive comparison has been carried out through the implementation of a complete, fast, reliable, and inexpensive measurement system. Finally, conclusions on the use of each method under the principal measurement conditions are presented. It is shown that in the presence of non-white noise, the MLS and IRS techniques seem to be more accurate, whereas in quiet environments the logarithmic SineSweep method seems to be the most appropriate.
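The MLS method rests on the two-valued circular autocorrelation of a maximum length sequence: N at lag 0 and −1 at every other lag. The sketch below is a hypothetical pure-Python illustration, not the measurement system described in the paper. It generates an MLS with a linear feedback shift register (assuming the primitive polynomial x^4 + x^3 + 1, giving N = 15) and recovers an impulse response by cross-correlation. It assumes the response was recorded over one full period in steady state and that the impulse response is no longer than the sequence, so there is no time aliasing.

```python
def mls(order, taps):
    """Maximum length sequence as a list of bipolar +/-1 values.

    Uses the binary recurrence a[n + order] = XOR of a[n + t] for t in taps.
    The taps must correspond to a primitive polynomial; e.g. order=4,
    taps=(0, 3) implements x^4 + x^3 + 1, giving a period of 2**4 - 1 = 15.
    """
    period = 2 ** order - 1
    a = [1] + [0] * (order - 1)          # any non-zero seed state works
    while len(a) < period:
        n = len(a) - order
        bit = 0
        for t in taps:
            bit ^= a[n + t]
        a.append(bit)
    return [1 - 2 * bit for bit in a]    # 0 -> +1, 1 -> -1


def circular_convolve(x, h):
    """Steady-state (circular) response of an FIR system h to periodic input x."""
    n = len(x)
    h = list(h) + [0.0] * (n - len(h))
    return [sum(h[j] * x[(k - j) % n] for j in range(n)) for k in range(n)]


def mls_impulse_response(excitation, response):
    """Recover the impulse response from one period of the MLS response.

    Cross-correlating gives C[k] = (N + 1) h[k] - sum(h); since sum(C) equals
    sum(h), the exact response is h[k] = (C[k] + sum(C)) / (N + 1).
    """
    n = len(excitation)
    C = [sum(response[m] * excitation[(m - k) % n] for m in range(n))
         for k in range(n)]
    total = sum(C)
    return [(c + total) / (n + 1) for c in C]
```

In practice the sequence order is chosen so that N exceeds the decay time of the system being measured, and averaging several periods reduces uncorrelated noise.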
In today's society, a large number of mobile electronic devices are used on a regular basis. These devices often require the user to configure many settings. One way of minimizing this and introducing smartness into these devices is through the concept of context awareness. The purpose of this Master's thesis is to implement a context classifier on a commercially available PDA and to evaluate the feasibility of such an application.
This paper presents a low-latency pitch-shifting algorithm based on the Short-Time Fourier Transform (STFT). Unlike existing STFT-based implementations of pitch shifting, the presented algorithm is more robust to reductions of the Fourier transform size. As a result, it achieves latencies as low as 12 ms while still producing good quality, whereas other algorithms perform much worse under similar low-latency constraints. The presented algorithm also provides an alternative way of mitigating the well-known phasiness problem of the phase vocoder.
Research has used the cardiac orienting response to show that structural changes in the auditory environment cause people to briefly but automatically pay attention to messages such as radio broadcasts, podcasts, and web streaming. The voice change, an example of an auditory structural feature, elicits orienting across multiple repetitions. This article reports two experiments designed to investigate whether automatic attention allocation to repeated instances of other auditory structural features (namely production effects, jingles, and silence) is a robust phenomenon or whether repetition leads to habituation. In Study 1 we show that listeners of a simulated radio broadcast exhibit orienting responses following the onset of auditory structural features that differ in semantic content. The prediction that listeners would not habituate to feature repetition was not supported. Instead, both jingles and synthesized production effects result in more iconic orienting responses to the second repetition than to the first. However, orienting significantly diminished following the third repetition of both. Study 2 replicates this result using multiple repetitions of structural features containing identical semantic content.
- by Robert F Potter and one co-author
- Marketing, Cognitive Psychology, Psychophysiology, Advertising
An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that indicate whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground-truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. Twenty-three instances are randomly chosen among the aforementioned instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported.
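The two features fed to the classifiers can be sketched as follows, assuming binary per-frame indicator functions (1 when the actor speaks in that frame). The frame rate, the biased 1/N normalization, and the use of a direct DFT of the cross-correlation as the cross-power spectral density estimate (Wiener-Khinchin) are assumptions of this sketch, not details taken from the paper; the function names are hypothetical and the classifiers themselves are not shown.

```python
import cmath


def cross_correlation(x, y):
    """Biased cross-correlation r[l] = (1/N) * sum_n x[n + l] * y[n]
    for lags l = -(N - 1) .. N - 1 (x and y have equal length N)."""
    n = len(x)
    lags = range(-(n - 1), n)
    r = []
    for lag in lags:
        s = 0.0
        for i in range(n):
            if 0 <= i + lag < n:
                s += x[i + lag] * y[i]
        r.append(s / n)
    return list(lags), r


def cross_power_magnitude(r):
    """Magnitude of the DFT of the cross-correlation sequence, i.e. a
    cross-power spectral density magnitude estimate (direct O(M^2) DFT).
    The lag offset of the sequence only changes the phase, not the magnitude."""
    m = len(r)
    return [abs(sum(r[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m)))
            for k in range(m)]
```

In a dialogue where two actors take turns, the indicator functions barely overlap at lag 0, while the cross-correlation peaks at lags close to the typical turn length, which is the kind of structure a classifier can pick up.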
Electronic communication is increasingly susceptible to eavesdropping and malicious interventions. The issues of security and privacy have traditionally been approached using tools from cryptography and steganography.
Steganography can be a feasible alternative to cryptography in countries where the use of encryption is illegal.
In this paper, a novel data-hiding scheme is introduced that provides a high level of security for digital media. The 4LSB and phase encoding algorithms are used for data embedding in video and audio files, respectively. The quality of the video file is strictly preserved even after embedding the secret data. Experimental results demonstrate the feasibility and efficiency of the proposed work.
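The 4LSB idea can be illustrated with a minimal sketch that hides each secret byte in the four least significant bits of two 8-bit cover samples (for example, pixel values of a video frame). The nibble order, the absence of any encryption or pseudo-random placement, and the function names are assumptions for illustration; the paper's exact embedding procedure is not described in the abstract, and the audio phase encoding step is not shown.

```python
def embed_4lsb(cover, secret):
    """Hide `secret` bytes in the 4 least significant bits of `cover` bytes.

    Each secret byte is split into two 4-bit nibbles (high nibble first), and
    each nibble replaces the low nibble of one cover byte, so the cover must
    hold at least 2 * len(secret) bytes. Each cover byte changes by at most 15.
    """
    if len(cover) < 2 * len(secret):
        raise ValueError("cover too small for secret")
    stego = bytearray(cover)
    for i, byte in enumerate(secret):
        for half, nibble in enumerate((byte >> 4, byte & 0x0F)):
            k = 2 * i + half
            stego[k] = (stego[k] & 0xF0) | nibble
    return bytes(stego)


def extract_4lsb(stego, length):
    """Recover `length` secret bytes embedded by embed_4lsb."""
    return bytes(((stego[2 * i] & 0x0F) << 4) | (stego[2 * i + 1] & 0x0F)
                 for i in range(length))
```

Using four bits per sample doubles the capacity of plain 1-bit LSB embedding at the cost of larger per-sample changes (up to 15 levels out of 255), which is why it is typically applied to high-redundancy media such as video.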