Suliang Bu - Academia.edu
Papers by Suliang Bu
2022 IEEE Spoken Language Technology Workshop (SLT)
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Sound source localization (SSL) using microphone arrays is an active research topic with many applications, but noise and reverberation make direction-of-arrival (DOA) and distance estimation challenging. In this work, we propose a novel method to jointly estimate the DOA and distance in noisy and reverberant environments. Our method exploits the linear phase structure across frequencies in a steering vector (SV). We convert the joint estimation task into an optimization problem, which can be solved by Newton's method augmented by a gradient ascent method. Our method does not depend on a particular microphone array geometry, and it can also be extended to estimate the elevation angle. We conducted experimental evaluations in simulated noisy and reverberant acoustic conditions, which verified the superiority of our proposed method over several established methods in estimation accuracy and computational efficiency.
IEEE/ACM Transactions on Audio, Speech, and Language Processing
The model compensation approach has been successfully applied to various noise-robust speech recognition tasks. In this paper, based on the Continuous Time (CT) approximation, the dynamic mismatch function is derived without further approximation. With this mismatch function, a novel approach to deriving the formula for calculating the dynamic statistics is presented. We also provide insight into the processing of the pseudo-inverse of the non-square discrete cosine transform (DCT) matrix during model compensation. Experiments on Aurora 4 showed that the proposed approach obtained a 23.2% relative word error rate (WER) reduction over the traditional first-order Vector Taylor Series (VTS) approach.
2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018
We propose a robust nonlinear microphone array postfilter for noise reduction. This postfilter is formulated as a function of the noise power ratio before and after beamforming and a local speech-to-observation power ratio. The two ratios are readily obtained during beamforming, and can be approximated by local speech posterior probabilities or by neural-network time-frequency masks. This avoids the difficulty of estimating the local speech and noise variances of a beamformed signal. On the CHiME-3 test set, we evaluated our proposed postfilter against two other postfiltering methods; it produced the best objective scores on the simulated noisy speech as well as higher listening preference scores on the real noisy speech.
Interspeech 2018, Sep 2, 2018
We investigate a novel approach to spatial filtering that adapts to conditions at different time-frequency (TF) points for noise removal by taking advantage of speech sparsity. Our approach combines a noise reduction beamformer with a minimum variance distortionless response (MVDR) beamformer or a Generalized Eigenvalue (GEV) beamformer through TF posterior probabilities of speech presence (PPSP). To estimate PPSP, we study both statistical model-based and neural network-based methods. In the former, we use complex Gaussian mixture modeling (CGMM) on temporally augmented spatial spectral features; in the latter, we use neural network (NN) based TF masks to initialize the speech and noise covariance matrices in CGMM. We conducted experiments on the CHiME-3 task. On its real noisy speech test set, our methods of feature augmentation, TF-dependent spatial filtering, and NN-based mask initialization of the CGMM covariances yielded cumulative relative word error rate (WER) reductions of 8%, 16%, and 25% over the original CGMM-based MVDR. On the real test data, the three methods also produced consistent WER reductions when MVDR was replaced by GEV.
CHiME 2018 Workshop on Speech Processing in Everyday Environments
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
Interspeech 2019
Accurate steering vectors (SVs) are key to many beamformers. However, reliable SVs are not easy to obtain. In this work, we investigate a novel method to identify and correct phase errors in SVs for MVDR beamforming. Our idea stems from the linear relationship across frequency in the phase of a microphone component in narrowband SVs, as modeled by the acoustic transfer function. We use this property and feedforward neural nets to predict the phase of the microphone components in SVs, and use the predicted phase selectively for phase error correction and MVDR beamforming. Our method is robust to large fluctuations in the phase spectrum wrapped within [−π, π]. We evaluated our approach on CHiME-3 and obtained improved performance in both word error rate and short-time objective intelligibility in low-reverberation acoustic environments.
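The linear phase property mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method (which uses neural nets for phase prediction); it only assumes an idealized free-field, far-field model in which microphone m sees a pure delay tau_m, so its SV phase is -2*pi*f*tau_m. The array geometry, source angle, frequency grid, and function names below are all hypothetical choices made for illustration, and the MVDR weights use the textbook formula w = R^{-1} d / (d^H R^{-1} d).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)


def steering_vector(delays, freq_hz):
    """Free-field SV: component m is exp(-j * 2*pi * f * tau_m)."""
    return np.exp(-2j * np.pi * freq_hz * np.asarray(delays))


def mvdr_weights(noise_cov, sv):
    """Textbook MVDR weights: w = R^{-1} d / (d^H R^{-1} d)."""
    rinv_d = np.linalg.solve(noise_cov, sv)
    return rinv_d / (sv.conj() @ rinv_d)


# Hypothetical 4-microphone linear array with 5 cm spacing,
# far-field source at 60 degrees from the array axis.
mic_x = np.arange(4) * 0.05
delays = mic_x * np.cos(np.deg2rad(60.0)) / SPEED_OF_SOUND

# Narrowband SVs on a frequency grid, one row per frequency.
freqs = np.linspace(100.0, 4000.0, 40)
svs = np.stack([steering_vector(delays, f) for f in freqs])

# Unwrap the phase along frequency and fit a line per microphone.
# Under this model the slope for microphone m is -2*pi*tau_m.
phase = np.unwrap(np.angle(svs), axis=0)
slopes = np.polyfit(freqs, phase, 1)[0]

# MVDR weights at one frequency bin, with spatially white noise
# (identity covariance) assumed purely for illustration.
w = mvdr_weights(np.eye(4), svs[10])
```

In this idealized setting the fitted slopes recover the per-microphone delays, and the MVDR weights satisfy the distortionless constraint w^H d = 1. Real SV estimates deviate from the straight line, and those deviations are the phase errors the abstract refers to.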
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014