A BSS-based approach for localization of simultaneous speakers in reverberant conditions
Related papers
Proc. Eur. Signal Processing Conf.(EUSIPCO), Florence, Italy, 2006
The problem of blind separation of multiple acoustic sources has recently been addressed by the TRINICON framework. By exploiting higher-order statistics, it makes it possible to successfully separate acoustic sources when propagation takes place in a reverberant environment. In this paper we apply TRINICON to the problem of source localization, emphasizing that small localization errors can be achieved even when source separation is not perfectly obtained. Extensive simulations have been carried out in order ...
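The localization-from-separation idea sketched in this abstract can be illustrated with a minimal, hypothetical example: once a BSS algorithm (TRINICON or otherwise) has produced a 2x2 set of FIR demixing filters, the lag between the dominant taps of the two filters in one row gives a rough TDOA for the corresponding source, which maps to a direction of arrival. This is not the paper's estimator; the function name, interface, and far-field geometry below are assumptions.

```python
import numpy as np

def doa_from_demixing_row(w_ref, w_other, mic_dist, fs, c=343.0):
    """Rough DOA estimate from one row of a 2x2 BSS demixing filter set:
    the offset between the dominant taps of the two FIR filters approximates
    the source's TDOA at the microphone pair (far-field assumption)."""
    tau = (np.argmax(np.abs(w_other)) - np.argmax(np.abs(w_ref))) / fs  # seconds
    sin_theta = np.clip(c * tau / mic_dist, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Hypothetical usage: w[0][0], w[0][1] are the FIR filters of the row that
# extracts source 1 from a two-microphone recording (16 kHz, 0.2 m spacing).
# theta_1 = doa_from_demixing_row(w[0][0], w[0][1], mic_dist=0.2, fs=16000)
```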
Sound Localization Analysis of Stereo Audio Signals Based on Blind Source Separation
2008
In this paper, utilizing blind source separation (BSS) based on independent component analysis (ICA), we propose a method to analyze and control the sound localization of each sound source using only the information in the multichannel signals, i.e., the mixture of multiple sound sources. In conventional BSS, the demixing filter introduces distortion caused by the amplitude ambiguity. To obtain separated signals without distortion, reconstruction of the original sound localization at the audio channels using the inverse filter of the demixing filter has been proposed. The inverse filter of the demixing filter is useful for the analysis of sound localization; however, it carries not only the effect of sound-localization reconstruction but also the effect of distortion compensation. The compensation of distortion acts by imposing distortion on the signals other than the one distorted by BSS. Thus, when the inverse filter is used to control sound localization directly, the quality of the controlled sound source degrades. In this paper, we propose a method to extract the sound-localization information without including the compensation function. By obtaining a demixing filter without distortion, the effect of the distortion compensation is removed from the inverse filter. The resulting inverse filter is a replica of the source-to-channel transfer functions that determine sound localization. By modifying this inverse filter, the sound localization of each source can be controlled individually and freely.
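A standard way to handle the ICA amplitude ambiguity mentioned above is projection back (the minimal-distortion principle). The sketch below, written for a single frequency bin of a frequency-domain ICA system, is a generic illustration rather than the authors' exact procedure; the function name and the square-matrix assumption are mine. The inverse of the rescaled demixing matrix then approximates the source-to-channel transfer functions that determine sound localization.

```python
import numpy as np

def projection_back(W):
    """Rescale a per-bin demixing matrix W (n_src x n_mic, square) by the
    minimal-distortion principle, W <- diag(W^{-1}) W, so that each separated
    signal is expressed as observed at its reference channel; the inverse of
    the rescaled W approximates the source-to-channel transfer functions."""
    A = np.linalg.inv(W)                 # estimated mixing matrix for this bin
    return np.diag(np.diag(A)) @ W
```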
Multi-source localization in reverberant environments
The very large relative bandwidth of acoustic sources, coupled with the high number of reflections in a typical listening room, makes localization a challenging task, since all basic assumptions of classical array processing algorithms are at best viable approximations in real-world environments. In this work, a novel decentralized approach for acoustic localization in reverberant environments is presented. It is based on a two-stage strategy. First, candidate source positions are found by a time-difference-of-arrival (TDOA) analysis of the signals received by colocated pairs of microphones. Differential delays are estimated by a robust ROOT-MUSIC-based technique applied to the sample cross-spectrum of the whitened signals recorded by each microphone pair. A subsequent clustering stage in the spatial coordinates validates the raw TDOA estimates, eliminating most false detections. The new algorithm is capable of tracking multiple speakers at the same time and exhibits a very good co...
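The two-stage structure described above (per-pair TDOA candidates, then spatial clustering) can be sketched as follows. For brevity the ROOT-MUSIC step is replaced by a GCC-PHAT-style whitened cross-correlation, and the geometry mapping pairwise TDOAs to candidate (x, y) positions is omitted; function names, thresholds, and the greedy clustering rule are assumptions.

```python
import numpy as np

def pairwise_tdoa_candidates(x1, x2, fs, n_peaks=3, max_tau=None):
    """Candidate TDOAs for one colocated microphone pair from the whitened
    cross-spectrum (a GCC-PHAT stand-in for the ROOT-MUSIC step)."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cs = X1 * np.conj(X2)
    r = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)     # whitened correlation
    max_lag = int(max_tau * fs) if max_tau else n // 2
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags[np.argsort(r)[-n_peaks:]] / fs         # strongest lags, in seconds

def cluster_candidates(points, radius=0.3, min_support=2):
    """Greedy spatial clustering of candidate source positions; clusters
    supported by several pairs validate a detection, the rest are discarded."""
    clusters = []
    for p in points:
        for c in clusters:
            if np.linalg.norm(p - np.mean(c, axis=0)) < radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [np.mean(c, axis=0) for c in clusters if len(c) >= min_support]
```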
EURASIP Journal on Advances in Signal Processing, 2007
Speaker localization with microphone arrays has received significant attention in the past decade as a means for automated speaker tracking of individuals in a closed space for videoconferencing systems, directed speech capture systems, and surveillance systems. Traditional techniques are based on estimating the relative time difference of arrival (TDOA) between different channels by utilizing the cross-correlation function. As we show in the context of speaker localization, these estimates yield poor results due to the joint effect of reverberation and the directivity of sound sources. In this paper, we present a novel method that utilizes a priori acoustic information of the monitored region, which makes it possible to localize directional sound sources by taking the effect of reverberation into account. The proposed method shows a significant improvement in performance compared with traditional methods in "noise-free" conditions. Further work is required to extend its capabilities to noisy environments.
Efficient source localization and tracking in reverberant environments using microphone arrays
2005
In this paper, we propose an algorithm for acoustic source localization and tracking that is suitable for reverberant environments. The approach is based on the iterative identification of the FIR channels linking the source to the microphones through an LMS method (multichannel LMS), and we propose additional solutions that significantly improve this method in terms of computational efficiency and localization reliability, without affecting its convergence properties.
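The channel-identification step described above can be illustrated, for the simplest two-microphone case, by the cross-relation LMS sketch below: the FIR estimates are adapted so that x1 * h2 matches x2 * h1, and the TDOA is then read from the offset of the dominant taps. This is a generic illustration of the multichannel-LMS idea, not the paper's improved algorithm; the step size, filter length, and names are assumptions.

```python
import numpy as np

def blind_channel_tdoa(x1, x2, L=64, mu=0.01):
    """Two-channel cross-relation LMS: adapt FIR estimates (h1, h2) so that
    x1 * h2 matches x2 * h1, keeping the stacked vector at unit norm to avoid
    the trivial zero solution, then return the TDOA in samples."""
    h = np.zeros(2 * L)
    h[0] = 1.0                                   # [h1; h2], unit-norm init
    for n in range(L, len(x1)):
        u = np.concatenate((x2[n - L:n][::-1], -x1[n - L:n][::-1]))
        e = h @ u                                # cross-relation error
        h -= mu * e * u
        h /= np.linalg.norm(h) + 1e-12
    h1, h2 = h[:L], h[L:]
    return int(np.argmax(np.abs(h1)) - np.argmax(np.abs(h2)))
```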
A clustering approach to multi-source localization in reverberant rooms
Proceedings of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop. SAM 2000 (Cat. No.00EX410), 2000
Among the appealing features of the proposed approach are the capability of tracking multiple speakers simultaneously and the high accuracy of the closed form TDOA estimator.
In this paper, the tasks of speech source localization, source counting, and source separation are addressed for an unknown number of sources in a stereo recording scenario. In the first stage, the angles of arrival of the individual source signals are estimated through a peak-finding scheme applied to the angular spectrum, which is derived using a non-linear GCC-PHAT. Then, based on the known channel mixture coefficients, we propose an approach for separating the sources based on Maximum Likelihood (ML) estimation. The predominant source in each time-frequency bin is identified through ML, assuming a diffuse noise model. The separation performance improves over a binary time-frequency masking method. Performance is measured using standard metrics for blind source separation evaluation. The experiments are performed on synthetic speech mixtures in both anechoic and reverberant environments.
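The first stage described above (a GCC-PHAT-based angular spectrum scanned for peaks) can be sketched for a single stereo frame as follows; the paper's non-linear weighting is reduced to plain PHAT whitening, and the names, angle grid, and peak-picking rule are assumptions.

```python
import numpy as np

def angular_spectrum_doas(x_left, x_right, fs, mic_dist, n_src=None, c=343.0):
    """GCC-PHAT-style angular spectrum for a stereo pair: re-phase the whitened
    cross-spectrum for each candidate DOA and pick local maxima as sources."""
    angles = np.linspace(-90.0, 90.0, 181)
    n = len(x_left)
    X1, X2 = np.fft.rfft(x_left), np.fft.rfft(x_right)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    cs = X1 * np.conj(X2)
    cs /= np.abs(cs) + 1e-12                          # PHAT whitening
    taus = mic_dist * np.sin(np.radians(angles)) / c
    spec = np.array([np.real(cs * np.exp(2j * np.pi * freqs * t)).sum()
                     for t in taus])
    peaks = [i for i in range(1, len(spec) - 1)
             if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
    peaks.sort(key=lambda i: spec[i], reverse=True)
    return angles[peaks[:n_src]] if n_src else angles[peaks]
```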
Multi-source localization in reverberant environments by ROOT-MUSIC and clustering
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000
Localization of acoustic sources in reverberant environments by microphone arrays remains a challenging task in audio signal processing. As a matter of fact, most assumptions of commonly adopted models are not met in real applications. Moreover, in practical systems it is not convenient, or even possible, to employ sophisticated and costly architectures that require precise synchronization and fast data shuffling among sensors.
Acoustic Source Localization Based on Geometric Projection in Reverberant and Noisy Environments
IEEE Journal of Selected Topics in Signal Processing
Acoustic source localization (ASL) is a fundamental yet still challenging signal processing problem in sound acquisition, speech communication, and human-machine interfaces. Many ASL algorithms have been developed, such as the steered response power (SRP), the SRP phase transform (SRP-PHAT), the minimum variance distortionless response (MVDR), multiple signal classification (MUSIC), and Householder-transform-based methods, to name but a few. Most of those algorithms require hundreds or even thousands of snapshots to produce one reliable estimate, which makes it difficult for them to track moving sources. Moreover, little effort has been reported in the literature to show the intrinsic relationships among those methods. This paper deals with the ASL problem with its focal point placed on how to achieve ASL with a short frame of acoustic signal (corresponding to a single snapshot in the frequency domain). It reformulates the ASL problem from the perspective of geometric projection. Four types of power functions are proposed, leading to several different algorithms for ASL. By analyzing those power functions, we show the equivalence between popular conventional algorithms and our methods, which provides some new insights into the conventional algorithms. The relationships among the different algorithms are discussed, which makes it easy to comprehend the pros and cons of each of those methods. Experiments in real acoustic environments corroborate the theoretical analysis, which in turn justifies the contribution of this paper.
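As a concrete reference point for the single-snapshot setting discussed above, the sketch below computes a conventional SRP-PHAT map from one STFT frame per microphone over a grid of candidate positions; this is the baseline family the paper relates its projection-based power functions to, not the proposed method itself, and all names and parameters are assumptions.

```python
import numpy as np

def srp_phat_map(frames, mic_pos, grid, fs, c=343.0):
    """Single-snapshot SRP-PHAT: frames is (n_mics, frame_len), mic_pos is
    (n_mics, dims), grid is (n_points, dims); returns the argmax position and
    the full steered-response power map."""
    M, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    power = np.zeros(len(grid))
    for g, p in enumerate(grid):
        delays = np.linalg.norm(mic_pos - p, axis=1) / c   # propagation delays
        for i in range(M):
            for j in range(i + 1, M):
                cs = X[i] * np.conj(X[j])
                cs /= np.abs(cs) + 1e-12                   # PHAT weighting
                tau = delays[j] - delays[i]
                power[g] += np.real(cs * np.exp(2j * np.pi * freqs * tau)).sum()
    return grid[np.argmax(power)], power
```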
A Least-Squares Approach to Blind Source Separation in Multispeaker Environments
Journal of Computers, 2007
We propose a new approach to the solution of the cocktail party problem (CPP). The goal of the CPP is to isolate the speech signals of individuals who are talking concurrently while being recorded with a properly positioned microphone array. The new approach provides a powerful yet simple alternative to commonly used methods for the separation of speakers. It relies on the existence of so-called exclusive activity periods (EAPs) in the source signals. EAPs are time intervals during which only one source is active and all other sources are inactive (i.e., zero). The existence of EAPs is not guaranteed for arbitrary signal classes; they occur very frequently, however, in recordings of conversational speech. The methods proposed in this paper show how EAPs can be detected and how they can be exploited to improve the performance of blind source separation systems in speech processing applications. We consider both the instantaneous-mixture and the convolutive-mixture cases. A comparison of the proposed method with other popular source separation methods is drawn. The results show improved performance of the proposed method over earlier approaches.
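For the instantaneous-mixture case, the EAP idea can be illustrated with a short sketch: frames whose local mixture covariance is nearly rank one are taken as exclusive activity periods, and their principal directions estimate the columns of the mixing matrix (clustering the collected directions, one cluster per source, then yields the mixing matrix up to scale and permutation, and the sources follow by inverting it). The detector, threshold, and frame length below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def eap_directions(X, frame_len=1024, dominance=0.97):
    """Collect mixing-column estimates from exclusive activity periods: X is
    (n_mics, n_samples); frames whose covariance is nearly rank one contribute
    their principal eigenvector as a candidate mixing direction."""
    n_mics, n_samples = X.shape
    directions = []
    for start in range(0, n_samples - frame_len, frame_len):
        F = X[:, start:start + frame_len]
        C = F @ F.T / frame_len
        w, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
        if w.sum() < 1e-10:                      # skip silent frames
            continue
        if w[-1] > dominance * w.sum():          # near rank-1: one active source
            d = V[:, -1] * (1.0 if V[0, -1] >= 0 else -1.0)   # fix sign ambiguity
            directions.append(d)
    return np.array(directions)
```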