Sina Hafezi - Academia.edu
Papers by Sina Hafezi
arXiv (Cornell University), Nov 29, 2023
Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario, be it real-world or simulated, is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforward. The design of robust, high-performance, adaptive beamformers for such scenarios is an ongoing challenge. This is due to the violation of the typically required assumptions on the noise field caused by, for example, rapid variations resulting from complex acoustic environments, and/or rotations of the listener's head. This work proposes a multi-channel speech enhancement algorithm which utilises the adaptability of signal-dependent beamformers while still benefiting from the computational efficiency and robust performance of signal-independent super-directive beamformers. The algorithm has two stages. (i) The first stage is a hybrid beamformer based on a dictionary of weights corresponding to a set of noise field models. (ii) The second stage is a wide-band subspace post-filter to remove any artifacts resulting from (i). The algorithm is evaluated using both real-world recordings and simulations of a cocktail-party scenario. Noise suppression, intelligibility and speech quality results show a significant performance improvement by the proposed algorithm compared to the baseline super-directive beamformer. A data-driven implementation of the noise field dictionary is shown to provide more noise suppression, and similar speech intelligibility and quality, compared to a parametric dictionary.
IEEE Conference Proceedings, 2016
Zenodo (CERN European Organization for Nuclear Research), Jan 25, 2018
A common approach to multiple Direction-of-Arrival (DOA) estimation of speech sources is to identify Time-Frequency (TF) bins with a dominant Single Source (SS) and apply DOA estimation such as Multiple Signal Classification (MUSIC) only on those TF bins. In the state-of-the-art Direct Path Dominance (DPD)-MUSIC, the covariance matrix, used as the input to MUSIC, is calculated using only the TF bins over a local TF region where only a SS is dominant. In this work, we propose an alternative approach to MUSIC in which all the SS-dominant TF bins for each speaker across the TF domain are globally used to improve the quality of the covariance matrix for MUSIC. Our recently proposed Multi-Source Estimation Consistency (MSEC) technique, which exploits the consistency of initial DOA estimates within a time frame based on adaptive clustering, is used to estimate the SS-dominant TF bins for each speaker. The simulation using a spherical microphone array shows that our proposed MSEC-MUSIC significantly outperforms the state-of-the-art DPD-MUSIC with less than 6.5° mean estimation error and strong robustness to widely varying source separation for up to 5 sources in the presence of realistic reverberation and sensor noise.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Components Analysis (PCA) denoising. In the first stage, the Hybrid-MVDR performs multiple MVDRs using a dictionary of pre-defined noise field models and picks the minimum-power outcome, which benefits from the robustness of signal-independent beamforming and the performance of adaptive beamforming. In the second stage, the outcomes of Hybrid and Iso are jointly used in a two-channel PCA-based denoising to remove the 'musical noise' produced by the Hybrid beamformer. On a dataset of real 'cocktail-party' recordings with a head-worn array, the proposed method outperforms the baseline superdirective beamformer in noise suppression (fwSegSNR, SDR, SIR, SAR) and speech intelligibility (STOI) with similar speech quality (PESQ) improvement.
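The minimum-power selection described for the first stage can be illustrated with a small sketch. It is not the authors' implementation: the two-microphone array, the single frequency bin, the steering vector, the three dictionary entries and all helper names are assumptions chosen only to keep the arithmetic visible. Each dictionary entry yields MVDR weights w = R^-1 d / (d^H R^-1 d), and the entry whose output power w^H R_x w is lowest for the observed covariance R_x is kept.

```python
# Illustrative sketch only: one frequency bin, two microphones, made-up numbers.

def inv2(m):
    """Inverse of a 2x2 complex matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(2)) for i in range(2))

def vdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d), unit gain towards the target."""
    rinv_d = matvec(inv2(noise_cov), steering)
    denom = vdot(steering, rinv_d)
    return tuple(x / denom for x in rinv_d)

def output_power(weights, signal_cov):
    """Beamformer output power w^H R_x w (real for a Hermitian R_x)."""
    return vdot(weights, matvec(signal_cov, weights)).real

def hybrid_select(dictionary, steering, signal_cov):
    """Design one MVDR per noise model and keep the minimum-power one."""
    candidates = {name: mvdr_weights(cov, steering)
                  for name, cov in dictionary.items()}
    best = min(candidates,
               key=lambda name: output_power(candidates[name], signal_cov))
    return best, candidates[best]

# Hypothetical target steering vector (equal phase at both microphones).
steering = (1 + 0j, 1 + 0j)

# Hypothetical dictionary of pre-defined noise covariance models.
dictionary = {
    "uncorrelated": ((1 + 0j, 0j), (0j, 1 + 0j)),
    "quadrature_interferer": ((1.1 + 0j, -1j), (1j, 1.1 + 0j)),
    "opposite_quadrature": ((1.1 + 0j, 1j), (-1j, 1.1 + 0j)),
}

# Observed covariance: target + strong interferer with steering (1, 1j)
# + a little sensor noise, i.e. d d^H + 4 d_i d_i^H + 0.1 I.
observed_cov = ((5.1 + 0j, 1 - 4j), (1 + 4j, 5.1 + 0j))

best, weights = hybrid_select(dictionary, steering, observed_cov)
print(best, output_power(weights, observed_cov))
```

In this toy setting the selected entry is the one whose noise model matches the interferer actually present, while the distortionless constraint towards the target holds for every candidate. A real system would repeat the selection per frequency bin and time frame.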
arXiv (Cornell University), Mar 15, 2023
2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Acoustic beamforming is routinely used to improve the SNR of the received signal in applications such as hearing aids, robot audition, augmented reality, teleconferencing, source localisation and source tracking. The beamformer can be made adaptive by using an estimate of the time-varying noise covariance matrix in the spectral domain to determine an optimised beam pattern in each frequency bin that is specific to the acoustic environment and that can respond to temporal changes in it. However, robust estimation of the noise covariance matrix remains a challenging task, especially in non-stationary acoustic environments. This paper presents a compact model of the signal covariance matrix that is defined by a small number of parameters whose values can be reliably estimated. The model leads to a robust estimate of the noise covariance matrix which can, in turn, be used to construct a beamformer. The performance of beamformers designed using this approach is evaluated for a spherical microphone array under a range of conditions using both simulated and measured room impulse responses. The proposed approach demonstrates consistent gains in intelligibility and perceptual quality metrics compared to the static and adaptive beamformers used as baselines.
EAGE Workshop on Fiber Optic Sensing for Energy Applications in Asia Pacific, 2020
EAGE GeoTech 2021 Second EAGE Workshop on Distributed Fibre Optic Sensing, 2021
Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, multiple simultaneously active sources are often present. Therefore DOA estimation which is robust to Multi-Source (MS) scenarios is of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has grown rapidly due to their ability to analyse the sound field with equal resolution in all directions. Such symmetry makes SMAs suitable for applications in robot audition where a variety of talker heights and positions is expected. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD) which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimatio...
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017
Pseudointensity vectors (PIVs) provide a means of Direction of Arrival (DOA) estimation for Spherical Microphone Arrays (SMAs) using only the zeroth and the first-order spherical harmonics. An Augmented Intensity Vector (AIV) is proposed which improves the accuracy of PIVs by exploiting higher order spherical harmonics. We compared DOA estimation using our proposed AIVs against PIVs, Steered Response Power (SRP) and subspace methods where the number of sources, their angular separation, the reverberation time of the room and the sensor noise level are varied. The results show that the proposed approach outperforms the baseline methods and performs at least as accurately as the state-of-the-art method with strong robustness to reverberation, sensor noise and number of sources. In the single and multiple source scenarios tested, which include realistic levels of reverberation and noise, the proposed method had an average error of 1.5° and 2°, respectively.
The Journal of the Acoustical Society of America, 2019
A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single source (SS) DOA estimation in time-frequency (TF) bins for which a SS assumption is valid. The typical SS-validity confidence metrics analyse the validity of the SS assumption over a fixed-size TF region local to the TF bin. The performance of such methods degrades as the number of simultaneously active sources increases due to the associated decrease in the size of the TF regions where the SS assumption is valid. A SS-validity confidence metric is proposed that exploits a dynamic MS assumption over relatively larger TF regions. The proposed metric first clusters the initial DOA estimates (one per TF bin) and then uses the members' spatial consistency as well as its cluster's spread to weight each TF bin. Distance-based and density-based clustering are employed as two alternative approaches for clustering DOAs. A noise-robust density-based clustering is also used ...
A cased hole well with inflow control devices (ICDs) was logged for production profiling as part of a field trial campaign testing a new fiber-optic wireline system. Pulled by a conventional tractor, both the fiber-optic wireline and a set of conventional production logging tools (PLTs) were placed at the bottom of a horizontal wellbore, locating the fiber across the reservoir for sensing purposes. The well produced at two different choke settings, enabling both technologies to capture low flow rate as well as high flow rate. The main objective of the testing was to compare the two technologies for production flow allocation and learn more about fiber-optic analytics. The two different measurements were performed as close in time as possible. While the fiber-optic cable was sensing, the PLT was stationary and not logging, and while the PLT was logging, the fiber optic was deactivated. From fiber optics, high-quality noise logging plots were generated with distributed acoustics (DAS), ...
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
The extraction of multiple Direction-of-Arrival (DoA) information from estimated spatial spectra can be challenging when such spectra are noisy or the sources are adjacent. Smoothing or clustering techniques are typically used to remove the effect of noise or irregular peaks in the spatial spectra. As we will explain and show in this paper, the smoothing-based techniques require prior knowledge of the minimum angular separation of the sources and the clustering-based techniques fail on noisy spatial spectra. A broad class of localization techniques give direction estimates in each Time-Frequency (TF) bin. Using this information as input, a novel technique for obtaining robust localization of multiple simultaneous sources is proposed using Estimation Consistency (EC) in the TF domain. The method is evaluated in the context of spherical microphone arrays. This technique does not require prior knowledge of the sources and by removing the noise in the estimated spatial spectrum makes clust...
2017 Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017
In Direction-of-Arrival (DOA) estimation for multiple sources, removal of noisy data points from a set of local DOA estimates increases the resulting estimation accuracy, especially when there are many sources and they have small angular separation. In this work, we propose a post-processing technique for the enhancement of DOA extraction from a set of local estimates using the consistency of these estimates within the time frame based on an adaptive multi-source assumption. Simulations in a realistic reverberant environment with sensor noise and up to 5 sources demonstrate that the proposed technique outperforms the baseline and state-of-the-art approaches. In these tests the proposed technique had a worst average error of 9°, robustness of 5° to widely varying source separation and 3° to number of sources.
The Journal of the Acoustical Society of America
A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single source (SS) DOA estimation in time-frequency (TF) bins for which a SS assumption is valid. Such methods use the W-disjoint orthogonality (WDO) assumption due to the sparseness of speech. As the number of sources increases, the chance of violating the WDO assumption increases. As shown in the challenging scenarios with multiple simultaneously active sources over a short period of time masking each other, it is possible for a strongly masked source (due to inconsistency of activity or quietness) to be rarely dominant in a TF bin. SS-based DOA estimators fail in the detection or accurate localization of masked sources in such scenarios. Two analytical approaches are proposed for narrowband DOA estimation based on the MS assumption in a bin in the spherical harmonic domain. In the first approach, eigenvalue decomposition is used to decompose a MS scenario into multiple SS scenarios, and a SS-based analytical DOA estimation is performed on each. The second approach analytically estimates two DOAs per bin assuming the presence of two active sources per bin. The evaluation validates the improvement to double accuracy and robustness to sensor noise compared to the baseline methods.
Direction-of-Arrival (DOA) estimation for multiple simultaneously active acoustic sources without knowledge of the number of sources and the noise level remains a challenging task. A method of source counting for DOA estimation using density-based clustering is proposed. Multiple Density-based Spatial Clustering of Applications with Noise (DBSCAN) runs with varying noise sensitivity are applied in an evolutionary procedure to obtain weighted centroids. An autonomous DBSCAN is finally run on the weighted centroids to extract the final DOA estimates. The results using generated and estimated DOAs show that the proposed technique significantly outperforms the conventional histogram peak picking as well as the original DBSCAN and variations of K-means with ≤4° DOA estimation accuracy and improves the source counting.
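For orientation, the sketch below shows only the plain, single-run DBSCAN that the abstract uses as a baseline, applied to azimuth estimates in degrees. It does not reproduce the proposed evolutionary procedure or its weighted centroids. The azimuth values, the `eps` and `min_pts` settings and all function names are assumptions made for illustration. The number of clusters found serves as the source count and each cluster's circular mean as a DOA estimate.

```python
import math

def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns a label per point (0, 1, ... or -1 for noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points))
                if angular_distance(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # core point: keep growing the cluster
    return labels

def circular_mean(angles):
    """Mean direction of a set of azimuths, in degrees within [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0

# Hypothetical per-bin azimuth estimates: two sources plus two outliers.
doas = [28, 30, 31, 29, 32, 355, 358, 2, 4, 359, 120, 200]

labels = dbscan(doas, eps=5.0, min_pts=3)
n_sources = max(labels) + 1
centroids = [circular_mean([a for a, l in zip(doas, labels) if l == k])
             for k in range(n_sources)]
print(n_sources, centroids)
```

The wrap-around distance matters here: the second group straddles 0°, so a plain absolute difference would split it into two clusters.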
2017 25th European Signal Processing Conference (EUSIPCO), Aug 1, 2017
A common approach to multiple Direction-of-Arrival (DOA) estimation of speech sources is to identify Time-Frequency (TF) bins with a dominant Single Source (SS) and apply DOA estimation such as Multiple Signal Classification (MUSIC) only on those TF bins. In the state-of-the-art Direct Path Dominance (DPD)-MUSIC, the covariance matrix, used as the input to MUSIC, is calculated using only the TF bins over a local TF region where only a SS is dominant. In this work, we propose an alternative approach to MUSIC in which all the SS-dominant TF bins for each speaker across the TF domain are globally used to improve the quality of the covariance matrix for MUSIC. Our recently proposed Multi-Source Estimation Consistency (MSEC) technique, which exploits the consistency of initial DOA estimates within a time frame based on adaptive clustering, is used to estimate the SS-dominant TF bins for each speaker. The simulation using a spherical microphone array shows that our proposed MSEC-MUSIC significantly outperforms the state-of-the-art DPD-MUSIC with less than 6.5° mean estimation error and strong robustness to widely varying source separation for up to 5 sources in the presence of realistic reverberation and sensor noise.