Robust Three-Microphone Speech Source Localization Using Randomized Singular Value Decomposition
Related papers
Journal of Signal Processing Systems, 2017
Robust speech source localization (SSL) is an important component of the speech processing pipeline for hearing aid devices (HADs). SSL via time difference of arrival (TDOA) estimation is known to improve the performance of HADs in noisy environments, thereby providing a better listening experience for hearing aid users. Smartphones can now connect to HADs through wired or wireless channels. In this paper, we present our findings on a nonuniform non-linear microphone array (NUNLA) geometry for improving SSL for HADs using the L-shaped three-element microphone array available on modern smartphones. The proposed method is implemented in a frame-based TDOA estimation algorithm using a modified dictionary-based singular value decomposition (SVD) method for localizing single speech sources at very low signal-to-noise ratios (SNRs). Unlike most methods developed for uniform microphone arrays, the proposed method has low spatial aliasing and low spatial ambiguity while providing robust, low-error estimates with 360° direction-of-arrival (DOA) scanning capability. We compare different types of microphone arrays and evaluate their performance using the proposed method.
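The 360° capability of an L-shaped three-microphone array can be illustrated with plain far-field geometry: the two orthogonal pairs that share a reference microphone yield TDOAs proportional to the cosine and the sine of the azimuth, so `atan2` resolves the full circle, whereas a single pair only gives an angle in [0°, 180°] with a front/back mirror ambiguity. This is a minimal sketch, assuming a far-field source, a reference microphone at the origin, the other two at distance `d` along the x and y axes, and c = 343 m/s; the function names are hypothetical, and it does not reproduce the paper's dictionary-based SVD estimator.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def l_array_tdoas(azimuth_deg: float, d: float, c: float = SPEED_OF_SOUND) -> tuple[float, float]:
    """Far-field TDOAs (seconds) for a hypothetical L-shaped array.

    Reference mic at (0, 0), mic X at (d, 0), mic Y at (0, d).
    Each TDOA is how much later the reference mic receives the wavefront
    than the other mic of the pair.
    """
    theta = math.radians(azimuth_deg)
    tau_x = d * math.cos(theta) / c
    tau_y = d * math.sin(theta) / c
    return tau_x, tau_y


def azimuth_from_tdoas(tau_x: float, tau_y: float) -> float:
    """Recover azimuth in [0, 360) degrees; d and c cancel inside atan2."""
    return math.degrees(math.atan2(tau_y, tau_x)) % 360.0


def azimuth_single_pair(tau_x: float, d: float, c: float = SPEED_OF_SOUND) -> float:
    """One pair only: arccos gives [0, 180] degrees, so front/back is ambiguous."""
    ratio = max(-1.0, min(1.0, c * tau_x / d))  # clip in case of noisy TDOAs
    return math.degrees(math.acos(ratio))


tx, ty = l_array_tdoas(250.0, d=0.1)
print(round(azimuth_from_tdoas(tx, ty), 6))      # 250.0
print(round(azimuth_single_pair(tx, d=0.1), 6))  # 110.0 (the mirror image)
```

The single-pair result lands on the mirror direction because cos(250°) = cos(110°); the second, orthogonal pair supplies the sine needed to break that tie.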
STAP approach for DOA estimation using microphone arrays
Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2010, 2010
In this paper, the Space-Time Array Processing (STAP) approach is applied to sound source localization using adaptive microphone arrays. Two beamforming methods, conventional and minimum variance distortionless response (MVDR), are used to estimate the direction of arrival (DOA) of sound signals arriving at the microphone array from different sensors in the observation area. The simulation scenario describes a situation in which three sensors generating three different sound signals (warning, alarm, and emergency) and one source of natural noise (a car) are located at various points in the observation area. The results demonstrate that, in contrast to conventional beamforming, MVDR gives accurate DOA estimates for all sound signals generated by the sensors in the observation area.
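The difference between conventional and MVDR beamforming can be illustrated on a narrowband uniform linear array. The sketch below builds an exact spatial covariance matrix R for two uncorrelated far-field sources plus white noise and scans both spectra, P_conv(θ) = a^H R a / M and P_MVDR(θ) = 1 / (a^H R^-1 a), where a is the steering vector. The array size, spacing, frequency, noise level, source angles, and function names are arbitrary assumptions, and this is not the simulation scenario of the paper.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def steering_vector(theta_deg, n_mics, spacing, freq):
    """Far-field steering vector of a uniform linear array (0 deg = broadside)."""
    m = np.arange(n_mics)
    delays = m * spacing * np.sin(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq * delays)


def spatial_spectra(R, angles, n_mics, spacing, freq):
    """Conventional (a^H R a / M) and MVDR (1 / a^H R^-1 a) spectra over a grid."""
    R_inv = np.linalg.inv(R)
    conv, mvdr = [], []
    for th in angles:
        a = steering_vector(th, n_mics, spacing, freq)
        conv.append(np.real(a.conj() @ R @ a) / n_mics)
        mvdr.append(1.0 / np.real(a.conj() @ R_inv @ a))
    return np.array(conv), np.array(mvdr)


# Hypothetical scenario: 8 mics, 4 cm spacing, 2 kHz, two equal-power sources.
M, d, f = 8, 0.04, 2000.0
source_angles = [-20.0, 10.0]
R = 0.01 * np.eye(M, dtype=complex)  # spatially white sensor noise
for ang in source_angles:
    a = steering_vector(ang, M, d, f)
    R += np.outer(a, a.conj())  # unit-power, mutually uncorrelated sources

grid = np.arange(-90.0, 90.5, 0.5)
p_conv, p_mvdr = spatial_spectra(R, grid, M, d, f)
mid = np.flatnonzero(grid == -5.0)[0]  # a direction between the two sources
print(p_conv[mid] / p_conv.max(), p_mvdr[mid] / p_mvdr.max())
```

With these settings the conventional spectrum dips only modestly between the two sources, while the MVDR spectrum drops by roughly two orders of magnitude, because MVDR keeps unit gain in the look direction while minimizing output power, which steers nulls toward the interfering source.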
The Journal of the Acoustical Society of America, 2014
Sound source localization using a two-microphone array is an active area of research, with considerable potential for use in video conferencing, mobile devices, and robotics. Based on the observed time differences of arrival between the received signals, a probability distribution over source locations is used to estimate the actual source positions. However, these algorithms assume a known number of sound sources. This paper gives an updated account of the solution presented in Escolano et al. [J. Acoust. Soc. Am. 132(3), 1257-1260 (2012)], where nested sampling is used to explore a probability distribution of the source position using a Laplacian mixture model, which allows both the number and the positions of speech sources to be inferred. Different experimental setups and scenarios are presented to demonstrate the viability of the proposed method, which is compared with some of the most popular sampling methods, showing that nested sampling is an accurate tool for speech localization.
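The Laplacian mixture at the heart of this approach can be sketched as a likelihood over per-frame direction (or TDOA) observations. Nested sampling would integrate such a likelihood over priors on the number, locations, and widths of the components to estimate the Bayesian evidence used to select the number of sources; that sampler is not shown here. The shared scale, equal weights, example data, and function names are assumptions for illustration, not the exact formulation of Escolano et al.

```python
import math


def laplace_logpdf(x, mu, b):
    """Log density of a Laplace distribution with location mu and scale b."""
    return -math.log(2.0 * b) - abs(x - mu) / b


def mixture_loglik(observations, centers, scale, weights=None):
    """Log-likelihood of 1-D observations (e.g. per-frame DOA estimates in degrees)
    under a K-component Laplacian mixture with a shared scale (assumption).

    Uses log-sum-exp per observation for numerical stability.
    """
    k = len(centers)
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal mixing weights
    total = 0.0
    for x in observations:
        terms = [math.log(w) + laplace_logpdf(x, mu, scale)
                 for w, mu in zip(weights, centers)]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


# Hypothetical observations clustered around two talkers at -30 and 45 degrees.
obs = [-31.0, -29.5, -30.2, -28.8, 44.1, 45.7, 46.0, 44.9]
two = mixture_loglik(obs, centers=[-30.0, 45.0], scale=2.0)
one = mixture_loglik(obs, centers=[7.5], scale=2.0)
print(two > one)  # True
```

Comparing maximized likelihoods alone would tend to favour ever more components; the evidence penalizes unnecessary complexity, which is what makes it suitable for inferring the number of sources.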
Proceedings of Meetings on Acoustics, 2017
Speech source localization has numerous applications, such as hearing aid devices (HADs) and consumer electronics. Utilizing the powerful processing hardware of smartphones, we demonstrate that smartphones are capable of instantaneous estimation of sound source location. In this paper, we present instantaneous direction of arrival (DOA) estimation using traditional Generalized Cross-Correlation (GCC) followed by a spatial post-filtering stage. A simple voice activity detector (VAD) is used in the post-filtering stage to improve noise robustness in realistic reverberant, noisy environments. Root mean square error (RMSE) is used as the evaluation criterion for the proposed method. Both real recorded data and simulated data under different noise types are used in the experiments. A real-time implementation of the method on an Android-based smartphone is also presented.
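A minimal GCC-PHAT TDOA estimator with an energy-based gate standing in for the VAD might look like the following. This is a sketch under stated assumptions: the frame length, the energy threshold, the synthetic test signal, and the function names are hypothetical, the gate simply drops whole frames rather than implementing the paper's spatial post-filter, and the mapping from TDOA to angle is omitted.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Delay of y relative to x in seconds (positive if y lags x), via GCC-PHAT."""
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:  # optionally restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Rearrange so index max_shift corresponds to lag 0.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def frame_tdoas(x, y, fs, frame_len, energy_threshold, max_tau=None):
    """Per-frame GCC-PHAT TDOAs, skipping frames a crude energy VAD marks as silent."""
    taus = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        fx = x[start:start + frame_len]
        fy = y[start:start + frame_len]
        if np.mean(fx ** 2) < energy_threshold:  # hypothetical VAD rule
            continue
        taus.append(gcc_phat(fx, fy, fs, max_tau))
    return taus


# Hypothetical test signal: white noise, mic 2 delayed by 5 samples, near-silent second half.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(8192 + 10)
s[4106:] *= 1e-3
x, y = s[10:8202], s[5:8197]  # y[n] = x[n - 5]
print(frame_tdoas(x, y, fs, frame_len=1024, energy_threshold=0.1))
```

In practice `max_tau` would be set to the microphone spacing divided by the speed of sound, which also suppresses spurious peaks at impossible lags.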
Effect of Reverberation on Different DOA Estimation Techniques using Microphone Array
The automatic estimation of DOA is very important for many practical applications, such as automatic speech recognition (ASR), speaker tracking, teleconferencing, human-computer interaction (HCI) in particular and human-machine interfaces (HMI) in general, robot audition, and blind signal separation (BSS). Acoustic source localization based on microphone arrays has been a mainstream research topic for over two decades. The solutions available in the literature can be broadly divided into three categories: those based on maximizing the steered response power (SRP) of a beamformer, those based on high-resolution spectral estimation (HRSE) methods, and those based on time difference of arrival (TDOA) estimation. Source localization methods in the second category are all based on analysis of the spatial covariance matrix (SCM) of the array sensor signals. The SCM is usually unknown and must be estimated from the acquired data. Popular HRSE-based algorithms include the minimum variance beamformer and the multiple signal classification (MUSIC) algorithm. These algorithms can be extended to wideband signals, e.g., speech, by decomposing the signal into narrowband components. Each narrowband component can be processed individually (the incoherent method), or a universal focusing SCM can be generated to perform coherent localization.
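The HRSE family can be illustrated with narrowband MUSIC: estimate the SCM from snapshots, split its eigenvectors into signal and noise subspaces, and scan steering vectors for directions nearly orthogonal to the noise subspace. The sketch below assumes a uniform linear array, a single frequency bin, and a known number of sources; for wideband speech it would be run per bin (the incoherent method) and the per-bin spectra combined, e.g. by summation. The array parameters, simulated data, and function names are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def ula_steering(theta_deg, n_mics, spacing, freq):
    """Far-field steering vector of a uniform linear array (0 deg = broadside)."""
    m = np.arange(n_mics)
    return np.exp(-2j * np.pi * freq * m * spacing * np.sin(np.deg2rad(theta_deg)) / C)


def music_spectrum(snapshots, n_sources, angles, spacing, freq):
    """Narrowband MUSIC pseudo-spectrum from an (n_mics, n_snapshots) array."""
    n_mics, n_snap = snapshots.shape
    scm = snapshots @ snapshots.conj().T / n_snap  # sample spatial covariance matrix
    _, eigvecs = np.linalg.eigh(scm)               # eigenvalues in ascending order
    noise_sub = eigvecs[:, : n_mics - n_sources]   # smallest-eigenvalue eigenvectors
    out = []
    for th in angles:
        a = ula_steering(th, n_mics, spacing, freq)
        out.append(1.0 / np.linalg.norm(noise_sub.conj().T @ a) ** 2)
    return np.array(out)


# Hypothetical single-bin simulation: 6 mics, 4 cm spacing, 2 kHz, sources at -30 and 25 deg.
rng = np.random.default_rng(1)
M, d, f, N = 6, 0.04, 2000.0, 500
A = np.column_stack([ula_steering(t, M, d, f) for t in (-30.0, 25.0)])
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + noise

grid = np.arange(-90.0, 90.5, 0.5)
p = music_spectrum(X, 2, grid, d, f)
```

Because the SCM is estimated from a finite number of snapshots, the noise subspace is only approximately orthogonal to the true steering vectors, so the pseudo-spectrum peaks are large but finite.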
Multiple source localization using spherical microphone arrays
2020
Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, multiple simultaneously active sources are often present. Therefore, DOA estimation that is robust to Multi-Source (MS) scenarios is of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has grown rapidly due to their ability to analyse the sound field with equal resolution in all directions. Such symmetry makes SMAs suitable for applications in robot audition, where talkers are expected at a variety of heights and positions. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD), which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimatio...
Robotics and Autonomous Systems, 2019
Human-robot interaction in natural settings requires filtering out the different sources of sound in the environment. This ability usually involves the use of microphone arrays to localize, track, and separate sound sources online. Multi-microphone signal processing techniques can improve robustness to noise, but the processing cost increases with the number of microphones used, limiting response time and widespread use on different types of mobile robots. Since sound source localization methods are the most expensive in terms of computing resources, as they involve scanning a large 3D space, minimizing the amount of computation required would facilitate their implementation and use on robots. The robot's shape also imposes constraints on the microphone array geometry and configuration. In addition, sound source localization methods usually return noisy features that need to be smoothed and filtered by tracking the sound sources. This paper presents a novel sound source localization method, called SRP-PHAT-HSDA, that scans space with coarse and fine resolution grids to reduce the number of memory lookups. A microphone directivity model is used to reduce the number of directions to scan and to ignore non-significant pairs of microphones. A configuration method is also introduced to automatically set parameters that are normally tuned empirically according to the shape of the microphone array. For sound source tracking, this paper presents a modified 3D Kalman (M3K) method capable of simultaneously tracking the directions of sound sources in 3D. Using a 16-microphone array and low-cost hardware, results show that SRP-PHAT-HSDA and M3K perform at least as well as other sound source localization and tracking methods while using up to 4 and 30 times fewer computing resources, respectively.
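The coarse-then-fine scanning idea can be illustrated on a one-dimensional azimuth grid. The sketch below is only an assumption-based illustration of the search strategy, not SRP-PHAT-HSDA itself: `score` stands in for any steered-response function, and the 10° coarse step, 1° fine step, example score, and function name are hypothetical.

```python
import math


def coarse_to_fine_argmax(score, coarse_step=10.0, fine_step=1.0):
    """Maximize score(azimuth_deg) over [0, 360) with a two-level grid.

    Returns (best_azimuth, number_of_score_evaluations).
    """
    evaluations = 0
    best_az, best_val = None, -math.inf
    # Level 1: coarse scan of the full circle.
    coarse = [i * coarse_step for i in range(int(round(360.0 / coarse_step)))]
    for az in coarse:
        val = score(az)
        evaluations += 1
        if val > best_val:
            best_az, best_val = az, val
    # Level 2: fine scan within +/- one coarse cell around the coarse winner.
    n_fine = int(round(coarse_step / fine_step))
    center = best_az
    for k in range(-n_fine, n_fine + 1):
        az = (center + k * fine_step) % 360.0  # wrap around 0/360 degrees
        val = score(az)
        evaluations += 1
        if val > best_val:
            best_az, best_val = az, val
    return best_az, evaluations


# Hypothetical smooth response peaked at 137 degrees.
print(coarse_to_fine_argmax(lambda a: math.cos(math.radians(a - 137.0))))  # (137.0, 57)
```

Here 57 evaluations replace the 360 that a full 1° scan would need; the trade-off is that a narrow peak falling between coarse samples can be missed if another coarse cell scores higher.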
Direction finding of more sources than sensors is appealing in situations with small sensor arrays. Potential applications include surveillance, teleconferencing, and auditory scene analysis for hearing aids. A new technique for time-frequency-sparse sources, such as speech and vehicle sounds, uses a coherence test to identify low-rank time-frequency bins. These low-rank bins are processed in one of two ways: (1) narrowband spatial spectrum estimation at each bin followed by summation of directional spectra across time and frequency, or (2) clustering of low-rank covariance matrices, averaging of covariance matrices within clusters, and narrowband spatial spectrum estimation of each cluster. Experimental results with omnidirectional microphones and colocated directional microphones demonstrate the algorithm's ability to localize 3-5 simultaneous speech sources over 4 s with 2-3 microphones with less than 1 degree of error, and to simultaneously localize two moving military vehicles and small arms gunfire.
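For two microphones, a coherence test of this kind can be sketched with the magnitude-squared coherence of one frequency bin, averaged over a few neighbouring STFT frames: if a single source dominates, the second microphone sees a fixed gain and phase copy of the first, the 2×2 covariance is rank one, and the coherence equals 1. The threshold of 0.9, the frame averaging, the example values, and the function names are assumptions; the paper's actual test statistic may differ.

```python
import cmath


def msc(x_bins, y_bins):
    """Magnitude-squared coherence between two microphones for one frequency bin.

    x_bins, y_bins: complex STFT values of that bin over a few neighbouring frames.
    A value near 1 means the 2x2 covariance is close to rank one, i.e. a single
    source dominates this time-frequency region.
    """
    r11 = sum(abs(a) ** 2 for a in x_bins)
    r22 = sum(abs(b) ** 2 for b in y_bins)
    r12 = sum(a * b.conjugate() for a, b in zip(x_bins, y_bins))
    return abs(r12) ** 2 / (r11 * r22)


def is_low_rank(x_bins, y_bins, threshold=0.9):
    """Hypothetical coherence test: keep the bin only if coherence is high."""
    return msc(x_bins, y_bins) >= threshold


# One source: mic 2 sees a fixed gain/phase copy of mic 1, so coherence is 1.
x = [1 + 0j, 0.5 - 1j, -0.3 + 0.8j, 2 + 0.1j]
gain = 0.7 * cmath.exp(-0.9j)
print(is_low_rank(x, [gain * v for v in x]))  # True

# Two sources with different transfer functions, active in different frames.
x_mix = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
y_mix = [1 + 0j, 1 + 0j, 1j, 1j]
print(round(msc(x_mix, y_mix), 6))  # 0.5
```

Averaging over several frames is essential: with a single frame the estimate is always exactly 1, so every bin would pass the test.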
Comparison Between Different Sound Source Localization Techniques Based on a Real Data Collection
2008 Hands-Free Speech Communication and Microphone Arrays, 2008
Comparing the sound source localization techniques proposed in the literature during the last decade is important for establishing the advantages and disadvantages of a given approach in a real-time implementation. Traditionally, sound source localization algorithms rely on estimating the Time Difference of Arrival (TDOA) at microphone pairs through GCC-PHAT. When several microphone pairs are available, the source position can be estimated as the point in space that best fits the set of TDOA measurements by applying the Global Coherence Field (GCF), also known as SRP-PHAT, or the Oriented Global Coherence Field (OGCF). A first analysis compares the performance of GCF and OGCF with a suboptimal least-squares (LS) search method. In a second step, Adaptive Eigenvalue Decomposition is implemented as an alternative to GCC-PHAT for TDOA estimation. Comparative experiments are conducted on signals acquired by a linear array during Wizard-of-Oz (WOZ) experiments in an interactive-TV scenario. Performance changes across different SNR levels are reported.
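Finding the point that best fits a set of TDOA measurements can be illustrated by an exhaustive least-squares search over candidate positions. This sketch assumes a 2-D geometry, noiseless TDOAs, hypothetical microphone coordinates, and a coarse grid; it is neither GCF nor OGCF, which accumulate GCC-PHAT coherence rather than squared TDOA residuals, and the function names are hypothetical.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def tdoa(src, mic_a, mic_b, c=C):
    """Near-field TDOA in seconds: arrival time at mic_b minus arrival time at mic_a."""
    return (math.dist(src, mic_b) - math.dist(src, mic_a)) / c


def ls_grid_search(pairs, measured, xs, ys, c=C):
    """Return the grid point whose predicted TDOAs best match `measured`
    in the least-squares sense. `pairs` lists (mic_a, mic_b) coordinates."""
    best, best_cost = None, math.inf
    for x in xs:
        for y in ys:
            cost = sum((tdoa((x, y), a, b, c) - m) ** 2
                       for (a, b), m in zip(pairs, measured))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


# Hypothetical linear array of 4 mics on the x axis; all pairs use mic 0 as reference.
mics = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (0.9, 0.0)]
pairs = [(mics[0], m) for m in mics[1:]]
true_src = (1.2, 2.0)
measured = [tdoa(true_src, a, b) for a, b in pairs]
xs = [i / 10 for i in range(0, 21)]  # 0.0 .. 2.0 m
ys = [j / 10 for j in range(5, 31)]  # 0.5 .. 3.0 m, in front of the array only
print(ls_grid_search(pairs, measured, xs, ys))  # (1.2, 2.0)
```

The grid is restricted to one side of the array because a linear array cannot distinguish a source from its mirror image across the array axis.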