Suliang Bu - Academia.edu

Papers by Suliang Bu

TDOA Estimation of Speech Source in Noisy Reverberant Environments

2022 IEEE Spoken Language Technology Workshop (SLT)

Joint Estimation of DOA and Distance in Noisy Reverberant Conditions

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Sound source localization (SSL) using microphone arrays is an active research topic with many applications, but noise and reverberation make direction-of-arrival (DOA) and distance estimation challenging. In this work, we propose a novel method to jointly estimate the DOA and distance in noisy and reverberant environments. Our method exploits the linear phase structure across frequencies in a steering vector (SV). We convert the joint estimation task into an optimization problem, which can be solved by Newton's method augmented by a gradient ascent method. Our method does not depend on a particular microphone array geometry, and it can also be extended to estimate the elevation angle. We conducted experimental evaluations in simulated noisy and reverberant acoustic conditions, which verified the superiority of our proposed method over several established methods in estimation accuracy and computational efficiency.
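As a rough illustration of why DOA and distance both shape a steering vector, the sketch below computes per-microphone delays under a textbook free-field point-source model in 2D. It is not the optimization method from the paper; the function names, the speed of sound, and the two-microphone layout are all assumptions made for this example.

```python
import math

C = 343.0  # assumed speed of sound in m/s

def near_field_delays(mics, doa_rad, dist):
    """Delay of each mic relative to the array origin for a point source
    at angle doa_rad and range dist (free-field, 2D, hypothetical helper)."""
    sx, sy = dist * math.cos(doa_rad), dist * math.sin(doa_rad)
    return [(math.hypot(sx - mx, sy - my) - dist) / C for mx, my in mics]

def far_field_delays(mics, doa_rad):
    """Plane-wave limit: the delay depends on the DOA only."""
    ux, uy = math.cos(doa_rad), math.sin(doa_rad)
    return [-(mx * ux + my * uy) / C for mx, my in mics]

# Two mics 10 cm apart on the x-axis (assumed geometry).
mics = [(-0.05, 0.0), (0.05, 0.0)]
doa = math.pi / 3
close = near_field_delays(mics, doa, 0.3)     # source 30 cm away
distant = near_field_delays(mics, doa, 1000.0)  # source 1 km away
plane = far_field_delays(mics, doa)
```

Under this model the steering-vector phase at frequency f is -2πfτ for each delay τ, so a nearby source leaves a distance-dependent signature that vanishes as the source moves far away.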

Steering vector correction in MVDR beamformer for speech enhancement

Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing

A novel dynamic parameters calculation approach for model compensation

The model compensation approach has been successfully applied to various noise-robust speech recognition tasks. In this paper, based on the Continuous Time (CT) approximation, the dynamic mismatch function is derived without further approximation. With this mismatch function, a novel approach to deriving the formula for calculating the dynamic statistics is presented. We also provide insight into the handling of the pseudo-inverse of the non-square discrete cosine transform (DCT) matrix during model compensation. Experiments on Aurora 4 showed that the proposed approach obtained a 23.2% relative WER reduction over the traditional first-order Vector Taylor Series (VTS) approach.
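For reference, a relative WER reduction is conventionally measured against the baseline system's WER. The symbols below are illustrative and are not taken from the paper.

```latex
\[
\text{relative WER reduction}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{proposed}}}
         {\mathrm{WER}_{\text{baseline}}} \times 100\%
\]
```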

A Robust Nonlinear Microphone Array Postfilter for Noise Reduction

2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018

We propose a robust nonlinear microphone array postfilter for noise reduction. This postfilter is formulated as a function of the noise power ratio before and after beamforming and a local speech-to-observation power ratio. The two ratios are readily obtained during beamforming, and can be approximated by the local speech posterior probability or by time-frequency masks from a neural network. This avoids the difficulty of estimating the local speech and noise variances of a beamformed signal. On the CHiME-3 test set, we evaluated our proposed postfilter against two other postfiltering methods; it produced the best objective scores on the simulated noisy speech as well as higher listening preference scores on real noisy speech.

Learning Speech Structure to Improve Time-Frequency Masks

A Probability Weighted Beamformer for Noise Robust ASR

Interspeech 2018, Sep 2, 2018

We investigate a novel approach to spatial filtering that is adaptive to conditions at different time-frequency (TF) points for noise removal by taking advantage of speech sparsity. Our approach combines a noise reduction beamformer with a minimum variance distortionless response (MVDR) beamformer or Generalized Eigenvalue (GEV) beamformer through TF posterior probabilities of speech presence (PPSP). To estimate PPSP, we study both statistical model-based and neural-network-based methods: in the former, we use complex Gaussian mixture modeling (CGMM) on temporally augmented spatial spectral features, and in the latter, we use neural network (NN) based TF masks to initialize the speech and noise covariance matrices in CGMM. We conducted experiments on the CHiME-3 task. On its real noisy speech test set, our methods of feature augmentation, TF-dependent spatial filtering, and NN-based mask initialization of covariances for CGMM yielded cumulative relative word error rate (WER) reductions of 8%, 16%, and 25% over the original CGMM-based MVDR. On the real test data, the three methods also produced consistent WER reductions when MVDR was replaced by GEV.
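The MVDR beamformer named in this abstract has a standard closed form, w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance and d the steering vector. The sketch below implements only that textbook formula for two microphones at a single frequency bin; it does not reproduce the paper's PPSP weighting, and the function names and example values are assumptions.

```python
import cmath

def mvdr_weights_2mic(R, d):
    """Textbook MVDR weights for a 2-mic array at one frequency bin.
    R: 2x2 Hermitian noise covariance (nested lists of complex numbers).
    d: steering vector of length 2. Hypothetical helper for illustration."""
    (a, b), (c, e) = R
    det = a * e - b * c
    # Closed-form inverse of a 2x2 matrix.
    Rinv = [[e / det, -b / det], [-c / det, a / det]]
    # u = R^{-1} d
    u = [Rinv[0][0] * d[0] + Rinv[0][1] * d[1],
         Rinv[1][0] * d[0] + Rinv[1][1] * d[1]]
    # denom = d^H R^{-1} d (real-valued for Hermitian R)
    denom = d[0].conjugate() * u[0] + d[1].conjugate() * u[1]
    return [u[0] / denom, u[1] / denom]

def beamform(w, x):
    """Beamformer output w^H x for one TF point."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))

# Assumed example values: correlated noise and a pure phase-shift steering vector.
R = [[2.0 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 1.0 + 0j]]
d = [1.0 + 0j, cmath.exp(-0.7j)]
w = mvdr_weights_2mic(R, d)
gain_on_target = beamform(w, d)
```

The distortionless constraint means a signal arriving exactly along d passes with unit gain, while the weights minimize the output noise power implied by R.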

Multiple beamformers with ROVER for the CHiME-5 Challenge

CHiME 2018 Workshop on Speech Processing in Everyday Environments

A novel static parameter calculation method for model compensation

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

A Novel Method to Correct Steering Vectors in MVDR Beamformer for Noise Robust ASR

Interspeech 2019

Accurate steering vectors (SVs) are key to many beamformers. However, reliable SVs are not easy to obtain. In this work, we investigate a novel method to identify and correct phase errors in SVs for MVDR beamforming. Our idea stems from the linear relationship in the phase of a microphone component in narrowband SVs across frequency, as modeled by the acoustic transfer function. We utilize this property and feedforward neural nets to predict the phase of the microphone components in SVs, and use the predicted phase selectively for phase error correction and MVDR beamforming. Our method is robust to large fluctuations in a phase spectrum wrapped within [−π, π]. We evaluated our approach on CHiME-3 and obtained improved performance in both word error rate and short-time objective intelligibility in low-reverberation acoustic environments.
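The linear phase relationship mentioned here can be seen in a simple pure-delay model, where one microphone's SV phase at frequency f is -2πfτ wrapped into [−π, π]. The sketch below is not the paper's neural-network method: it only builds the wrapped phase for an assumed delay, unwraps it, and recovers the delay from the slope. The delay value, frequency grid, and function names are assumptions.

```python
import cmath
import math

def wrapped_phases(tau, freqs):
    """Wrapped phase of exp(-j*2*pi*f*tau) at each frequency (pure-delay model)."""
    return [cmath.phase(cmath.exp(-2j * math.pi * f * tau)) for f in freqs]

def unwrap(phases):
    """Remove 2*pi jumps, assuming adjacent bins differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out

def delay_from_phase(phases, freqs):
    """Least-squares slope through the origin, converted back to a delay."""
    unwrapped = unwrap(phases)
    slope = sum(f * p for f, p in zip(freqs, unwrapped)) / sum(f * f for f in freqs)
    return -slope / (2 * math.pi)

# Assumed setup: 257 bins spaced 31.25 Hz apart (0 to 8 kHz), 0.2 ms delay.
freqs = [k * 31.25 for k in range(257)]
tau_true = 2e-4
phases = wrapped_phases(tau_true, freqs)
tau_est = delay_from_phase(phases, freqs)
```

In this noiseless setting the unwrapped phase is an exact line, so a bin whose phase departs from that line would stand out as a candidate phase error.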

Second order vector Taylor series based robust speech recognition

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
