A Robust Speech Enhancement Scheme on The Basis of Bone-conductive Microphones
Related papers
A Real-Time Dual-Microphone Speech Enhancement Algorithm Assisted by Bone Conduction Sensor
Sensors, 2020
The quality and intelligibility of speech are usually impaired by interference from background noise during internet voice calls. To solve this problem in the context of wearable smart devices, this paper introduces a dual-microphone, bone-conduction (BC) sensor-assisted beamformer and a simple recurrent unit (SRU)-based neural network postfilter for real-time speech enhancement. Assisted by the BC sensor, which is insensitive to environmental noise compared to a regular air-conduction (AC) microphone, accurate voice activity detection (VAD) can be obtained from the BC signal and incorporated into the adaptive noise canceller (ANC) and adaptive block matrix (ABM). The SRU-based postfilter consists of a recurrent neural network with a small number of parameters, which improves computational efficiency. Sub-band signal processing is designed to compress the input features of the neural network, and the scale-invariant signal-to-distortion ratio (SI-SDR) is ...
Air- and bone-conductive integrated microphones for robust speech detection and enhancement
2003
We present a novel hardware device that combines a regular microphone with a bone-conductive microphone. The device looks like a regular headset and can be plugged into any machine with a USB port. The bone-conductive microphone has an interesting property: it is insensitive to ambient noise and captures the low-frequency portion of the speech signal. Thanks to the signals from the bone-conductive microphone, we are able to detect very robustly whether the speaker is talking, eliminating more than 90% of background speech. Furthermore, by combining both channels, we are able to significantly remove background speech even when the background speaker speaks at the same time as the speaker wearing the headset.
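The talking/not-talking decision described above can be approximated by a simple per-frame energy detector on the bone-conduction channel, exploiting its insensitivity to ambient noise. The sketch below is a minimal illustration only, not the paper's actual detector; the function name, frame length, hop, and threshold are assumed values.

```python
import numpy as np

def frame_energy_vad(bc_signal, frame_len=256, hop=128, threshold_db=-30.0):
    """Flag frames of a bone-conduction signal as speech / non-speech by
    comparing per-frame energy, in dB relative to the loudest frame,
    against a fixed threshold (illustrative values only)."""
    n_frames = 1 + max(0, (len(bc_signal) - frame_len) // hop)
    energies = np.array([
        np.sum(bc_signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    ref = energies.max() + 1e-12          # loudest frame as 0 dB reference
    energy_db = 10.0 * np.log10(energies / ref + 1e-12)
    return energy_db > threshold_db       # boolean speech-activity flags
```

On a signal whose first half is silence and second half is voiced, the returned flags are False over the silent frames and True over the voiced ones; a real system would add hangover smoothing before feeding the flags to an adaptive filter.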
Model-based speech enhancement using a bone-conducted signal
The Journal of the Acoustical Society of America, 2012
Codebook-based single-microphone noise suppressors, which exploit prior knowledge about speech and noise statistics, provide better performance in nonstationary noise. However, because the enhancement involves a joint optimization over speech and noise codebooks, it incurs high computational complexity. A codebook-based method is proposed that uses a reference signal observed by a bone-conduction microphone and a mapping between air- and bone-conduction codebook entries generated during an offline training phase. A smaller subset of air-conducted speech codebook entries that accurately models the clean speech signal is selected using this reference signal. Experiments support the expected improvement in performance at low computational complexity.
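The subset-selection step can be sketched as a nearest-neighbour lookup: given a feature observed on the bone-conduction channel, pick the k closest BC codebook entries and keep only their paired air-conduction entries as the reduced search space. This is a hedged illustration of the idea, not the paper's actual mapping; the function and parameter names are invented.

```python
import numpy as np

def select_ac_subset(bc_feature, bc_codebook, ac_codebook, k=8):
    """Pick the k air-conduction codebook entries whose paired
    bone-conduction entries lie closest (Euclidean) to the observed
    BC feature, shrinking the search space for the enhancement stage."""
    dists = np.linalg.norm(bc_codebook - bc_feature, axis=1)
    idx = np.argsort(dists)[:k]           # indices of the k nearest BC entries
    return ac_codebook[idx], idx          # paired AC entries + their indices
```

The joint speech/noise optimization then runs only over the returned subset instead of the full air-conduction codebook, which is where the complexity saving comes from.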
Direct filtering for air- and bone-conductive microphones
2004
Air- and bone-conductive integrated microphones have been introduced by the authors [5,4] for speech enhancement in noisy environments. In this paper, we present a novel technique, called direct filtering, to combine the two channels from the air- and bone-conductive microphones for speech enhancement. Compared to the previous technique, the advantage of direct filtering is that it does not require any training and is speaker independent. Experiments show that this technique effectively removes noise and significantly improves speech recognition accuracy even in highly non-stationary noisy environments.
In this paper, we use voice activity detection to improve the de-noising ability of the previously proposed pre-image iteration speech enhancement method. We use a speech database consisting of two-channel recordings where the audio signal is recorded by both a bone-conductive microphone and a close-talking microphone. The bone channel is used for voice activity detection, as it can be assumed to be robust against environmental noise. The pre-image iteration method is prone to residual noise around speech components; we use the voice activity detection to remove this noise. The approach is evaluated using objective quality measures of the PEASS toolbox and shows an increase in de-noising capability compared to the original method. Index Terms: Speech de-noising, voice activity detection, bone-conductive microphone, pre-image iterations, speech enhancement
2011
Enhancement of speech degraded by additive background noise has received considerable attention over the past decade, due to the wide range of applications and the limitations of available methods. The main objective of speech enhancement is to improve the perceptual aspects of speech, such as overall quality, intelligibility, and degree of listener fatigue. Among all available methods, the spectral subtraction algorithm is historically one of the first proposed for background noise reduction. The greatest asset of the spectral subtraction algorithm lies in its simplicity, but the simple subtraction process comes at a price: more papers have been written describing variations of this algorithm that minimize the shortcomings of the basic method than about any other algorithm. In this paper we present a review of the basic spectral subtraction algorithm, its shortcomings, and different modified approaches, such as Spectral Subtr...
The speech processing systems used to communicate or store speech are usually designed for a noise-free environment, but the presence of background interference, in the form of additive background and channel noise, drastically degrades their performance, causing inaccurate information exchange and listener fatigue. To obtain a speech signal that is more intelligible and more pleasant to listen to, noise reduction is very much needed. Most implementations and variations of the basic spectral subtraction technique advocate subtracting the noise spectrum estimate over the entire speech spectrum. However, physical noise does not affect the speech uniformly over the entire spectrum. This work proposes a frequency-dependent spectral subtraction method, which takes into account the fact that background noise affects the speech spectrum differently at various frequencies. The proposed approach outperforms the standard power spectral subtraction method, resulting in improved...
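A minimal sketch of frequency-dependent spectral subtraction on one windowed frame, assuming per-band over-subtraction factors and a spectral floor; the band layout, parameter names, and default values are illustrative, not the paper's.

```python
import numpy as np

def band_spectral_subtraction(noisy_frame, noise_mag, band_alphas, floor=0.02):
    """Subtract a noise-magnitude estimate from one frame, with a
    different over-subtraction factor per frequency band, keeping a
    small spectral floor to avoid negative magnitudes."""
    spec = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spec), np.angle(spec)
    n_bins = len(mag)
    # split the rfft bins into len(band_alphas) contiguous bands
    edges = np.linspace(0, n_bins, len(band_alphas) + 1).astype(int)
    clean_mag = mag.copy()
    for b, alpha in enumerate(band_alphas):
        lo, hi = edges[b], edges[b + 1]
        clean_mag[lo:hi] = np.maximum(mag[lo:hi] - alpha * noise_mag[lo:hi],
                                      floor * mag[lo:hi])
    # resynthesize with the noisy phase, as standard spectral subtraction does
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```

Raising alpha in the low-frequency bands and lowering it where speech energy dominates is one way to encode the observation that noise does not affect all frequencies uniformly.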
Effect of Speech enhancement using spectral subtraction on various noisy environment
IRJET, 2022
Analysis-Modification-Synthesis (AMS) plays a key role in many audio signal processing applications, separating the audio stream into time intervals with speech activity and time intervals without it. Many features that reflect the presence of speech have been introduced in the literature. This article therefore presents a structured overview of several established speech enhancement features targeting different characteristics of speech, categorizes the features in terms of their exploitable properties, and evaluates their performance across background noise environments, different input-SNR categories, and several dedicated functions. Our analysis shows how to select promising VAD features and to find reasonable trade-offs between performance and complexity. To estimate clean speech, the noise spectrum estimated during speech pauses is subtracted from the noisy speech spectrum obtained with the Fast Fourier Transform (FFT), and the average amplitude of the clean spectrum is used in developing a new method to minimize the spectrum of loud noise. The noise reduction algorithm, implemented in MATLAB, splits the noisy speech data into half-overlapping frames (overlap-add processing), uses the FFT to compute the corresponding amplitude spectrum, removes the noise from the noisy speech, and reconstructs the enhanced signal in the time domain with the Inverse Fast Fourier Transform (IFFT).
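The FFT / overlap-add pipeline described above can be sketched in Python rather than MATLAB. This is a minimal analysis-modification-synthesis loop with plain magnitude subtraction; the window choice, frame length, and hop are assumed values, and a real system would estimate noise_mag adaptively during speech pauses.

```python
import numpy as np

def overlap_add_enhance(noisy, noise_mag, frame_len=512, hop=256):
    """Window each frame, FFT, subtract a fixed noise-magnitude
    estimate, IFFT with the noisy phase, and overlap-add the frames
    back together (normalizing by the summed squared window)."""
    win = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * win       # analysis
        spec = np.fft.rfft(frame)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)    # modification
        enhanced = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len)
        out[start:start + frame_len] += enhanced * win     # synthesis
        norm[start:start + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

With a zero noise estimate the loop reconstructs the input exactly (away from the unnormalized edges), which is a quick sanity check that the analysis and synthesis stages are consistent.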
SPEECH ENHANCEMENT USING SPECTRAL SUBTRACTION TECHNIQUE WITH MINIMIZED CROSS SPECTRAL COMPONENTS
The aim of speech enhancement is to obtain a significant reduction of noise and enhanced speech from noisy speech. There are several approaches to speech enhancement; earlier approaches did not take cross-spectral terms into account. Cross-spectral terms become prominent when the processing window becomes small, i.e. 20 ms to 30 ms. In this paper, an enhancement method is proposed for significant reduction of noise and improvement in the quality and perceptibility of speech degraded by correlated additive background noise. The proposed method is based on the spectral subtraction technique. The simple spectral subtraction technique results in poor reduction of noise; one of the main reasons for this is neglecting the cross-spectral terms of speech and noise, based on the assumption that clean speech and noise signals are completely uncorrelated with each other, which is not true on a short-time basis. In this paper an improvement in noise reduction is achieved compared to the earlier methods, mainly attributed to accounting for the cross-spectral terms between speech and noise. This algorithm can be implemented in hearing aids for the benefit of hearing-impaired people. Objective speech quality measures, spectrogram analyses, and subjective listening tests confirm that the proposed method is more effective than earlier speech enhancement techniques.
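The neglected term follows from expanding |Y(k)|^2 = |S(k) + N(k)|^2 = |S(k)|^2 + |N(k)|^2 + 2 Re(S(k) N*(k)). A small numpy demo on a synthetic 20 ms frame (a tone plus white noise, not the paper's data) shows that this cross term is exactly the gap between the true noisy power spectrum and the sum assumed by basic power spectral subtraction.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
frame = int(0.02 * fs)                      # one 20 ms analysis window
t = np.arange(frame) / fs
speech = np.sin(2 * np.pi * 300 * t)        # toy "speech": a 300 Hz tone
noise = 0.5 * rng.standard_normal(frame)    # toy additive white noise

S, N = np.fft.rfft(speech), np.fft.rfft(noise)
Y = S + N
cross = 2.0 * np.real(S * np.conj(N))       # the neglected cross term
exact = np.abs(Y) ** 2                      # true noisy power spectrum
approx = np.abs(S) ** 2 + np.abs(N) ** 2    # what basic subtraction assumes
# exact equals approx + cross bin by bin; on a 20 ms frame the cross
# term is not negligible, so the uncorrelatedness assumption fails
rel = np.abs(cross).sum() / exact.sum()     # relative size of the cross term
```

Over longer analysis windows the cross term averages toward zero, which is why the assumption is harmless for long frames but costly at 20 ms to 30 ms.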
The Performance of Wearable Speech Enhancement System Under Noisy Environment: An Experimental Study
IEEE Access
Wearable speech enhancement can improve the recognition accuracy of speech signals in stationary noise environments at 0 dB to 60 dB signal-to-noise ratio (SNR). Beamforming, adaptive noise reduction, and voice activity detection algorithms are used in wearable speech enhancement systems to enhance speech signals. In recent works, a word recognition accuracy of 63% at 0 dB SNR is not satisfactory for a robust speech recognition system. This paper discusses an experimental study using fixed beamforming, adaptive noise reduction, and voice activity detection algorithms, with SNRs from −10 dB to 20 dB for different types of noise, to test a wearable speech enhancement system's performance in noisy environments. It also compares deep learning-based noise reduction methods as a benchmark for speech enhancement and word recognition at different noise levels. We obtained an average word recognition accuracy of 5.74% at −10 dB and 93.79% at 20 dB in non-stationary noisy environments. The outcome of the experiments shows that the selected methods perform significantly better at higher SNRs for both stationary and non-stationary noise. We found no significant statistical difference in word recognition between stationary and non-stationary noise across SNR levels. However, the deep learning-based method performs significantly better than the fixed beamforming, adaptive noise reduction, and voice activity detection algorithms at all noise levels. INDEX TERMS Wearable speech enhancement, beamforming, adaptive noise reduction, voice activity detection, deep learning.