EURASIP Journal on Applied Signal Processing 2003:11, 1064–1073. © 2003 Hindawi Publishing Corporation.

An Integrated Real-Time Beamforming and Postfiltering System for Nonstationary Noise Environments

EURASIP Journal on Applied Signal Processing, 2003

We present a novel approach for real-time multichannel speech enhancement in environments of nonstationary noise and time-varying acoustical transfer functions (ATFs). The proposed system integrates adaptive beamforming, ATF identification, soft signal detection, and multichannel postfiltering. The noise canceller branch of the beamformer and the ATF identification are adaptively updated online, based on hypothesis test results. The noise canceller is updated only during stationary noise frames, and the ATF identification is carried out only when desired source components have been detected. The hypothesis testing is based on the nonstationarity of the signals and the transient power ratio between the beamformer primary output and its reference noise signals. Following the beamforming and the hypothesis testing, estimates for the signal presence probability and for the noise power spectral density are derived. Subsequently, an optimal spectral gain function that minimizes the mean square error of the log-spectral amplitude (LSA) is applied. Experimental results demonstrate the usefulness of the proposed system in nonstationary noise environments.
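The optimal LSA spectral gain referred to above is the classical MMSE log-spectral amplitude estimator, often combined with a speech-presence probability in the postfilter. A minimal numpy sketch (the crude quadrature for the exponential integral and the parameter values are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def exp_integral_e1(v, upper=60.0, n=60000):
    # Crude numerical E1(v) = int_v^inf exp(-t)/t dt (scalar v > 0)
    t = np.linspace(v, v + upper, n)
    f = np.exp(-t) / t
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def lsa_gain(xi, gamma):
    # MMSE-LSA gain: G = xi/(1+xi) * exp(0.5 * E1(v)), v = gamma*xi/(1+xi),
    # where xi is the a priori SNR and gamma the a posteriori SNR
    v = gamma * xi / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp_integral_e1(v))

def soft_lsa_gain(xi, gamma, p, g_min=0.1):
    # Presence-weighted gain: G = G_LSA^p * Gmin^(1-p),
    # p = estimated speech presence probability for the bin
    return lsa_gain(xi, gamma) ** p * g_min ** (1.0 - p)
```

When the presence probability p falls to zero, the gain collapses to the spectral floor `g_min`, which is what makes the postfilter effective during noise-only segments.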

Adaptation mode control with residual noise estimation for beamformer-based multi-channel speech enhancement

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012

In this paper, we propose a new adaptation mode controller (AMC) for a generalized sidelobe canceller (GSC) having prior knowledge of the direction-of-arrival (DOA) of a desired speech source. In order to optimize the adaptation mode of a GSC, the residual noise remaining in the GSC output must be employed for adapting the AMC. The residual noise in the GSC output is estimated by using a short-time Fourier transform (STFT)-based Wiener filter, where a priori signal-to-noise ratio (SNR) and a posteriori target-to-nontarget-directional signal ratio (TNR) are estimated based on a decision-directed approach and a DOA-based approach, respectively. The estimated residual noise is finally incorporated as a control parameter into the adaptive filters in the AMC. The performance of the proposed AMC is evaluated by measuring the perceptual evaluation of speech quality (PESQ) scores and cepstral distortion in car noise environments with SNRs from 0 to 20 dB. Experimental results show that the proposed AMC performs better than the conventional AMCs.
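The decision-directed approach named above for the a priori SNR is a standard recursion; a short numpy sketch (the smoothing constant 0.98 is a typical illustrative choice, not taken from the paper):

```python
import numpy as np

def decision_directed_snr(noisy_psd, noise_psd, prev_clean_amp2, alpha=0.98):
    # a posteriori SNR: gamma = |Y|^2 / lambda_d
    gamma = noisy_psd / noise_psd
    # decision-directed a priori SNR:
    # xi = alpha * |S_hat(l-1)|^2 / lambda_d + (1-alpha) * max(gamma - 1, 0)
    xi = alpha * prev_clean_amp2 / noise_psd \
        + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
    return xi, gamma
```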

Signal enhancement using beamforming and nonstationarity with applications to speech

IEEE Transactions on Signal Processing, 2001

We consider a sensor array located in an enclosure, where arbitrary transfer functions (TFs) relate the source signal and the sensors. The array is used for enhancing a signal contaminated by interference. Constrained minimum power adaptive beamforming, which has been suggested by Frost and, in particular, the generalized sidelobe canceler (GSC) version, which has been developed by Griffiths and Jim, are the most widely used beamforming techniques. These methods rely on the assumption that the received signals are simple delayed versions of the source signal. The good interference suppression attained under this assumption is severely impaired in complicated acoustic environments, where arbitrary TFs may be encountered. In this paper, we consider the arbitrary TF case. We propose a GSC solution, which is adapted to the general TF case. We derive a suboptimal algorithm that can be implemented by estimating the TFs ratios, instead of estimating the TFs. The TF ratios are estimated by exploiting the nonstationarity characteristics of the desired signal. The algorithm is applied to the problem of speech enhancement in a reverberating room. The discussion is supported by an experimental study using speech and noise signals recorded in an actual room acoustics environment.
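The GSC structure discussed above (fixed beamformer, blocking branch, adaptive noise canceller) can be sketched for the simplest delay-aligned two-microphone case; this toy time-domain version with an NLMS canceller is only meant to show the signal flow, not the paper's TF-ratio formulation:

```python
import numpy as np

def gsc_two_mic(x1, x2, taps=8, mu=0.1, eps=1e-6):
    # Two-mic GSC sketch, assuming the desired source is already time-aligned:
    d = 0.5 * (x1 + x2)   # fixed beamformer output (primary branch)
    u = x1 - x2           # blocking-matrix output (ideally speech-free reference)
    w = np.zeros(taps)
    y = np.zeros_like(d)
    for n in range(taps, len(d)):
        u_vec = u[n - taps + 1:n + 1][::-1]
        y[n] = d[n] - w @ u_vec                          # noise-cancelled output
        w += mu * y[n] * u_vec / (eps + u_vec @ u_vec)   # NLMS update
    return y
```

Because the aligned target cancels in the blocking branch, the adaptive filter can remove correlated noise from the primary branch without distorting the target.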

Speech enhancement in nonstationary noise environments using noise properties

Speech Communication, 2006

Traditional short-time spectral attenuation (STSA) speech enhancement algorithms are ineffective in the presence of highly nonstationary noise due to difficulties in the accurate estimation of the local noise spectrum. With a view to improving the speech quality in the presence of random noise bursts, characteristic of many environmental sounds, a simple postprocessing scheme is proposed that can be applied to the output of an STSA speech enhancement algorithm. The postprocessing algorithm is based on using spectral properties of the noise in order to detect noisy time-frequency regions, which are then attenuated using an SNR-based rule. A suitable suppression rule is developed that is applied to the detected noisy regions so as to achieve significant reduction of noise with minimal speech distortion. The postprocessing method is evaluated in the context of two well-known STSA speech enhancement algorithms, and experimental results demonstrating improved speech quality are presented for a data set of real noise samples.
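An SNR-based detect-and-attenuate rule of the kind described can be sketched in a few lines; the thresholds below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def postprocess_mask(enhanced_mag, noise_mag_est,
                     snr_floor_db=-5.0, atten_db=-15.0):
    # Flag time-frequency bins whose local SNR (enhanced output vs. estimated
    # noise) is below a threshold, and attenuate only those bins further.
    snr_db = 20.0 * np.log10(enhanced_mag / np.maximum(noise_mag_est, 1e-12)
                             + 1e-12)
    gain = np.where(snr_db < snr_floor_db, 10.0 ** (atten_db / 20.0), 1.0)
    return enhanced_mag * gain
```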

A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement

Symmetry

Most deep-learning-based multi-channel speech enhancement methods focus on designing a set of beamforming coefficients, to directly filter the low signal-to-noise ratio signals received by microphones, which hinders the performance of these approaches. To handle these problems, this paper designs a causal neural filter that fully exploits the spectro-temporal-spatial information in the beamspace domain. Specifically, multiple beams are designed to steer towards all directions, using a parameterized super-directive beamformer in the first stage. After that, a deep-learning-based filter is learned by simultaneously modeling the spectro-temporal-spatial discriminability of the speech and the interference, so as to coarsely extract the desired speech in the second stage. Finally, to further suppress the interference components, especially at low frequencies, a residual estimation module is adopted to refine the output of the second stage. Experimental results demonstrate that the p...
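The super-directive beamformer used in the first stage is conventionally the MVDR solution under a spherically diffuse noise field, whose coherence between two microphones at distance d is sinc(2fd/c). A hedged numpy sketch for a linear array (geometry, frequency, and diagonal loading below are illustrative assumptions):

```python
import numpy as np

def superdirective_weights(mic_pos, theta, f, c=343.0, mu=1e-2):
    # w = Gamma^-1 d / (d^H Gamma^-1 d), with Gamma the diffuse-field
    # coherence matrix plus diagonal loading mu*I for robustness.
    # theta is the look direction measured from the array axis.
    d_vec = np.exp(-2j * np.pi * f * mic_pos * np.cos(theta) / c)
    dist = np.abs(mic_pos[:, None] - mic_pos[None, :])
    gamma = np.sinc(2.0 * f * dist / c) + mu * np.eye(len(mic_pos))
    g_inv_d = np.linalg.solve(gamma, d_vec)
    return g_inv_d / (d_vec.conj() @ g_inv_d)
```

Evaluating these weights for a grid of look directions yields the fixed multi-beam front end that the neural filter then selects among.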

Joint dereverberation and noise reduction using beamforming and a single-channel speech enhancement scheme

The REVERB challenge provides a common framework for the evaluation of speech enhancement algorithms in the presence of both reverberation and noise. This contribution proposes a system consisting of a commonly used combination of a beamformer with a single-channel speech enhancement scheme aiming at joint dereverberation and noise reduction. First, a minimum variance distortionless response beamformer with an on-line estimated noise coherence matrix is used to suppress the noise and possibly some reflections. The beamformer output is then processed by a single-channel speech enhancement scheme, incorporating temporal cepstrum smoothing which suppresses both reverberation and residual noise. Experimental results show that improvements are particularly significant in conditions with high reverberation times.
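The on-line estimated noise coherence matrix feeding the MVDR beamformer is typically a recursive average over frames classified as noise-only; a minimal sketch (the forgetting factor and the frame classification are assumed inputs, not the contribution's actual estimator):

```python
import numpy as np

def update_noise_cov(phi, y, is_noise_frame, lam=0.95):
    # Recursive noise PSD matrix estimate for one frequency bin,
    # updated only when the frame is classified as noise-only.
    if is_noise_frame:
        phi = lam * phi + (1.0 - lam) * np.outer(y, y.conj())
    return phi

def mvdr_weights(phi, d):
    # MVDR: w = Phi^-1 d / (d^H Phi^-1 d), distortionless toward d
    x = np.linalg.solve(phi, d)
    return x / (d.conj() @ x)
```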

On the application of the LCMV beamformer to speech enhancement

2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

In theory the linearly constrained minimum variance (LCMV) beamformer can achieve perfect dereverberation and noise cancellation when the acoustic transfer functions (ATFs) between all sources (including interferences) and the microphones are known. However, blind estimation of the ATFs remains a difficult task. In this paper the noise reduction of the LCMV beamformer is analyzed and compared with the noise reduction of the minimum variance distortionless response (MVDR) beamformer. In addition, it is shown that the constraint of the LCMV can be modified such that we only require relative transfer functions rather than ATFs to achieve perfect cancellation of coherent interferences. Finally, we evaluate the noise reduction performance achieved by the LCMV and MVDR beamformers for two coherent sources: one desired and one undesired.
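The closed-form LCMV solution analyzed above has a compact linear-algebra form; a numpy sketch (the example constraint vector is illustrative):

```python
import numpy as np

def lcmv_weights(R, C, f):
    # LCMV: w = R^-1 C (C^H R^-1 C)^-1 f, so that C^H w = f exactly.
    # Columns of C are the (relative) transfer functions of the sources;
    # f sets the desired response per source, e.g. f = [1, 0] keeps
    # source 1 distortionless and places a null on source 2.
    ri_c = np.linalg.solve(R, C)
    return ri_c @ np.linalg.solve(C.conj().T @ ri_c, f)
```

With f = [1, 0] this realizes exactly the "perfect cancellation of coherent interference" case the paper evaluates for one desired and one undesired source.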

Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1999

Speech enhancement algorithms which are based on estimating the short-time spectral amplitude of the clean speech have better performance when a soft-decision gain modification, depending on the a priori probability of speech absence, is used. In reported works a fixed probability, q, is assumed. Since speech is non-stationary and may not be present in every frequency bin when voiced, we propose a method for estimating distinct values of q for different bins which are tracked in time. The estimation is based on a decision-theoretic approach for setting a threshold in each bin followed by short-time averaging. The estimated q's are used to control both the gain and the update of the estimated noise spectrum during speech presence in a modified MMSE log-spectral amplitude estimator. Subjective tests resulted in higher scores than for the IS-127 standard enhancement algorithm, when pre-processing noisy speech for a coding application.
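The per-bin tracking of q (threshold decision followed by short-time averaging) can be sketched as a simple recursion; the threshold and smoothing constant below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def update_absence_prob(q, gamma, gamma_th=0.8, alpha=0.95):
    # Per-bin speech-absence probability: threshold the a posteriori SNR
    # gamma, then recursively average the hard decisions over time.
    indicator = np.asarray(gamma < gamma_th, dtype=float)  # 1 = looks noise-only
    return alpha * q + (1.0 - alpha) * indicator
```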

Successive Relative Transfer Function Identification Using Single Microphone Speech Enhancement

Zenodo (CERN European Organization for Nuclear Research), 2018

A distortionless speech extraction in a reverberant environment can be achieved by an application of a beamforming algorithm, provided that the relative transfer functions (RTFs) of the sources and the covariance matrix of the noise are known. In this contribution, we consider the RTF identification challenge in a multi-source scenario. We propose a successive RTF identification (SRI) scheme, based on the sole assumption that sources become successively active. The proposed algorithm identifies the RTF of the ith speech source assuming that the RTFs of all other sources in the environment and the power spectral density (PSD) matrix of the noise were previously estimated. The proposed RTF identification algorithm is based on the neural network Mix-Max (NN-MM) single microphone speech enhancement algorithm, followed by a least-squares (LS) system identification method. The proposed RTF estimation algorithm is validated by simulation.
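The LS system identification step can be sketched per frequency bin: given STFT frames of the enhanced reference signal and another microphone, the RTF is the least-squares ratio of cross- to auto-spectra over frames. A hedged sketch (variable names and regularization are my own):

```python
import numpy as np

def estimate_rtf_ls(s_ref, y_other):
    # Per-frequency LS estimate over frames l:
    # h(k) = sum_l Y(k,l) S*(k,l) / sum_l |S(k,l)|^2
    num = np.sum(y_other * s_ref.conj(), axis=-1)
    den = np.sum(np.abs(s_ref) ** 2, axis=-1)
    return num / np.maximum(den, 1e-12)
```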

A Brief Survey of Speech Enhancement

We present a brief overview of the speech enhancement problem for wide-band noise sources that are not correlated with the speech signal. Our main focus is on the spectral subtraction approach and some of its derivatives in the forms of linear and non-linear minimum mean square error estimators. For the linear case, we review the signal subspace approach, and for the non-linear case, we review spectral magnitude and phase estimators. Online estimation of the second-order statistics of speech signals using parametric and non-parametric models is also addressed.
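The spectral subtraction approach that anchors the survey reduces, in its basic magnitude form, to subtracting an estimated noise magnitude with an over-subtraction factor and a spectral floor; a minimal sketch with illustrative parameter values:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, over=1.0, floor=0.02):
    # Magnitude spectral subtraction: subtract the (possibly over-weighted)
    # noise magnitude estimate and clamp to a fraction of the noisy magnitude
    # to limit musical-noise artifacts.
    sub = noisy_mag - over * noise_mag
    return np.maximum(sub, floor * noisy_mag)
```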