ON EQUALIZATION OF BONE CONDUCTED SPEECH FOR IMPROVED SPEECH QUALITY

Model-based speech enhancement using a bone-conducted signal

The Journal of the Acoustical Society of America, 2012

Codebook-based single-microphone noise suppressors, which exploit prior knowledge about speech and noise statistics, provide better performance in nonstationary noise. However, because the enhancement involves a joint optimization over speech and noise codebooks, their computational complexity is high. A codebook-based method is proposed that uses a reference signal observed by a bone-conduction microphone and a mapping between air- and bone-conduction codebook entries generated during an offline training phase. Using this reference signal, a smaller subset of air-conducted speech codebook entries that accurately models the clean speech signal is selected. Experiments support the expected improvement in performance at low computational complexity.
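The subset-selection step can be illustrated with a minimal sketch. It assumes each codebook entry is a fixed-length feature vector, that the offline air/bone mapping is stored as hypothetical `(bone_entry, air_entry)` pairs, and that closeness is plain Euclidean distance; the abstract specifies neither the features nor the distance measure, and the subsequent joint search over the reduced speech codebook and the noise codebook is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def select_air_subset(
    bone_frame: Vector,
    paired_codebook: List[Tuple[Vector, Vector]],
    k: int,
) -> List[Vector]:
    """Return the air-conduction entries whose paired bone-conduction entries
    lie closest to the observed bone-conduction frame.

    Hypothetical assumptions (not from the paper): entries are plain feature
    vectors and closeness is Euclidean distance.
    """
    ranked = sorted(paired_codebook, key=lambda pair: math.dist(bone_frame, pair[0]))
    # Keep only the k best matches; the enhancement search would then run
    # over this reduced air-conduction subset instead of the full codebook.
    return [air_entry for _, air_entry in ranked[:k]]


codebook = [
    ([0.0, 0.0], [10.0, 10.0]),
    ([1.0, 1.0], [20.0, 20.0]),
    ([5.0, 5.0], [30.0, 30.0]),
]
print(select_air_subset([0.9, 0.9], codebook, k=2))  # [[20.0, 20.0], [10.0, 10.0]]
```

In this sketch the cost saving comes from the joint speech/noise search scaling with `k` rather than with the full speech codebook size.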

Method of LP-based blind restoration for improving intelligibility of bone-conducted speech

2007

Bone-conducted (BC) speech is stable against surrounding noise in an extremely noisy environment, so it may be usable instead of air-conducted (AC) speech for communication. However, BC speech has very poor sound quality, and its intelligibility is degraded by transmission through bone conduction. The voice quality and intelligibility of BC speech therefore need to be improved blindly in actual speech communication, which is a challenging new topic in the speech signal processing field. In a previous study, we proposed an LP-based model to restore BC speech and improve its voice quality. Whereas other methods, such as the long-term Fourier transform, need numerous AC speech parameters to restore BC speech, the proposed model can restore BC speech blindly by predicting AC-LP coefficients from BC-LP coefficients. We improved the proposed model by (1) extending long-term processing to frame-basis processing, (2) using LSF coefficients in the LP representation, and (3) using a recurrent neural network to predict the parameters. We compared the improved model with other models to determine whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and Modified Rhyme Tests (MRTs). The evaluation of these three improvements to the LP-based model demonstrated the practicability of blind BC restoration.
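The LP analysis this model builds on can be sketched with the textbook autocorrelation method and the Levinson-Durbin recursion. This is only an analysis front end under standard assumptions (one analysis frame, biased autocorrelation); the mapping from BC-LP to AC-LP coefficients, which the paper learns with a neural network, is not shown, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve for LP coefficients a = [1, a1, ..., ap] of A(z) = 1 + sum_k a_k z^-k.

    Returns the coefficients and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i from the current prediction error.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        # Each stage shrinks the prediction-error energy by (1 - k^2).
        err *= 1.0 - k * k
    return a, err
```

For the AR(1)-like autocorrelation `[1.0, 0.5, 0.25]`, `levinson_durbin(r, 2)` gives a = [1, -0.5, 0] (the second-order coefficient vanishes, as expected for a first-order process) with error energy 0.75.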

LP-based method of blind restoration to improve intelligibility of bone-conducted speech

Thang TAT VU, Masashi UNOKI, and Masato AKAGI

Bone-conducted (BC) speech can be used instead of air-conducted (AC) speech in an extremely noisy environment. However, its intelligibility is degraded by transmission through bone conduction. The voice quality and intelligibility of BC speech therefore need to be improved blindly in actual speech communication, which is a challenging new topic in the field of speech signal processing. In a previous study, we proposed a linear prediction (LP) based model to restore BC speech and improve its voice quality. Whereas other methods, such as the long-term Fourier transform, need numerous AC speech parameters to restore BC speech, the model we proposed could restore BC speech blindly by predicting AC-LP coefficients from BC-LP coefficients. We improved the previous model by (1) extending long-term processing to frame-basis processing, (2) using line spectral frequency (LSF) coefficients in the LP representation, and (3) using a recurrent neural network to predict the parameters. We compared the improved model with others to determine whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (i.e., LSD, MCD, and LCD) and a subjective measure, a Japanese word intelligibility test (JWIT). The experimental results showed significant improvements with our newly proposed models (LSF and LSF-SRN). The LSF model significantly improved both the voice quality and the intelligibility of BC speech. Our other proposed model, LSF-SRN, clearly improved the intelligibility of BC speech even under blind restoration.
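Step (2) replaces the LP coefficients with line spectral frequencies. A minimal sketch of the standard conversion, assuming NumPy and a coefficient vector a = [1, a1, ..., ap] for A(z), forms P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) and keeps the angles of their unit-circle roots in (0, π). The function name is hypothetical, and root finding with `numpy.roots` is simple rather than the most efficient method used in practical coders.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LP coefficients a = [1, a1, ..., ap] of A(z) to LSFs in radians."""
    a = np.asarray(a, dtype=float)
    a_ext = np.append(a, 0.0)  # pad to length p + 2
    # Coefficients of z^-(p+1) A(1/z) are those of A(z) in reverse order.
    p_poly = a_ext + a_ext[::-1]  # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]  # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Drop the trivial roots at z = 1 and z = -1 and the negative-frequency conjugates.
    eps = 1e-6
    lsf = angles[(angles > eps) & (angles < np.pi - eps)]
    return np.sort(lsf)
```

For a flat second-order filter, `lpc_to_lsf([1.0, 0.0, 0.0])` returns approximately [π/3, 2π/3]. For a stable A(z) the LSFs are strictly increasing and interleave between P and Q, which makes them well-behaved targets for a predictor.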

A Robust Speech Enhancement Scheme on The Basis of Bone-conductive Microphones

This paper presents a novel speech enhancement scheme based on bone-conductive microphones. High performance in non-stationary noisy environments is achieved by combining two techniques: speech detection based on the bone-conductive microphone signal, and non-stationary noise suppression by adaptive spectral subtraction. Unlike regular air-conductive microphones, bone-conductive microphones are insensitive to environmental noise and background speech. Exploiting this advantage, we can detect speech very robustly in non-stationary noisy environments. Furthermore, based on the reliable speech detection results, the adaptive spectral subtraction can operate effectively even under low-SNR conditions.

... more reliably than conventional multi-channel schemes, in general the additional gains derived from them have not been commensurate with the greatly increased computation these methods require to combine information from the different sensors. To overcome this problem, this paper proposes a new speech enhancement scheme for non-stationary noisy environments. First, speech detection based on the bone-conductive microphone is used to distinguish between background-noise segments and speech segments. Second, the reliable speech detection results help determine when to re-estimate the noise threshold. Third, adaptive spectral subtraction attenuates the non-stationary noise, producing the final enhanced speech.
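The interaction between BC-based detection and adaptive spectral subtraction can be sketched as follows. This is a hypothetical minimal version, not the paper's implementation: it operates on per-frame magnitude spectra, takes the BC-derived speech/non-speech decisions as given, re-estimates a noise magnitude spectrum (standing in for the paper's noise threshold) only in non-speech frames, and applies magnitude subtraction with an over-subtraction factor `alpha` and a spectral floor `beta`. All parameter values are illustrative.

```python
from typing import List, Sequence


def adaptive_spectral_subtraction(
    frames: Sequence[Sequence[float]],
    speech_flags: Sequence[bool],
    alpha: float = 2.0,
    beta: float = 0.01,
    smoothing: float = 0.9,
) -> List[List[float]]:
    """Subtract a running noise estimate from each magnitude spectrum.

    speech_flags[i] is the (hypothetical) BC-microphone speech decision for
    frame i; the noise estimate is updated only when it is False.
    """
    noise = list(frames[0])  # assumption: the first frame contains noise only
    enhanced = []
    for magnitude, is_speech in zip(frames, speech_flags):
        if not is_speech:
            # Recursive averaging lets the estimate follow non-stationary noise.
            noise = [smoothing * n + (1.0 - smoothing) * m for n, m in zip(noise, magnitude)]
        # Over-subtract, then floor at a fraction of the input to limit musical noise.
        enhanced.append([max(m - alpha * n, beta * m) for m, n in zip(magnitude, noise)])
    return enhanced
```

Because speech frames never update `noise`, reliable BC-based detection keeps speech energy out of the noise estimate, which is the property the abstract relies on at low SNR.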

Bone-conducted speech enhancement using deep denoising autoencoder

Speech Communication

Bone-conduction microphones (BCMs) capture speech signals from the vibrations of the speaker's skull and are more resistant to noise than normal air-conduction microphones (ACMs). Because BCMs capture only the low-frequency portion of speech signals, their frequency response differs considerably from that of ACMs. Replacing an ACM with a BCM may give satisfactory noise suppression, but speech quality and intelligibility may be degraded because of the nature of solid-borne vibration. The mismatched characteristics of BCMs and ACMs can also impair automatic speech recognition (ASR) performance, and building a new ASR system from BCM voice data is infeasible. In this study, we propose a novel deep denoising autoencoder (DDAE) approach that bridges BCM and ACM to improve speech quality and intelligibility, so that the existing ASR system can be used directly without building a new one. Experimental results first demonstrated that the DDAE approach effectively improves speech quality and intelligibility according to standardized evaluation metrics. Moreover, the proposed system significantly improves ASR performance, achieving a 48.28% relative character error rate (CER) reduction (from 14.50% to 7.50%) under quiet conditions. In an actual noisy environment (sound pressure from 61.7 dBA to 73.9 dBA), the proposed system with a BCM outperforms an ACM, yielding an 84.46% relative CER reduction (proposed system: 9.13%; ACM: 58.75%).
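The relative CER figures follow from (baseline - new) / baseline. A small check of the arithmetic, with a hypothetical helper name:

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative error-rate reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


print(round(relative_reduction(14.50, 7.50), 2))  # 48.28 (quiet conditions)
print(round(relative_reduction(58.75, 9.13), 2))  # 84.46 (ACM vs. proposed, noisy)
```

Note that both quoted percentages are relative reductions, not absolute differences in CER (which would be 7.00 and 49.62 percentage points).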

A Real-Time Dual-Microphone Speech Enhancement Algorithm Assisted by Bone Conduction Sensor

Sensors, 2020

Speech quality and intelligibility are usually impaired by background noise during internet voice calls. To address this problem for wearable smart devices, this paper introduces a dual-microphone, bone-conduction (BC) sensor assisted beamformer and a simple recurrent unit (SRU)-based neural network postfilter for real-time speech enhancement. Because the BC sensor, unlike a regular air-conduction (AC) microphone, is insensitive to environmental noise, accurate voice activity detection (VAD) can be obtained from the BC signal and incorporated into the adaptive noise canceller (ANC) and adaptive block matrix (ABM). The SRU-based postfilter is a recurrent neural network with a small number of parameters, which improves computational efficiency. Sub-band signal processing is designed to compress the input features of the neural network, and the scale-invariant signal-to-distortion ratio (SI-SDR) is ...
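The abstract is cut off before explaining how SI-SDR is used. For reference, a minimal sketch of the standard definition: project the estimate onto the reference to get the scaled target, treat the remainder as distortion, and take the energy ratio in dB. The function name is illustrative, and the common zero-mean preprocessing step is omitted here.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy  # optimal scaling of the reference
    target = [scale * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]
    dist_energy = sum(d * d for d in distortion)
    if dist_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    return 10.0 * math.log10(sum(t * t for t in target) / dist_energy)
```

Because `target` absorbs any gain applied to the estimate, `si_sdr([2.0, 1.0], [1.0, 0.0])` and `si_sdr([6.0, 3.0], [1.0, 0.0])` both give 10 log10(4), about 6.02 dB.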

Pitch characteristics of bone conducted speech

2010 18th European Signal Processing Conference, 2010

This paper investigates the pitch characteristics of bone-conducted speech. Pitch determination from speech signals cannot attain the expected level of accuracy under adverse conditions. Bone-conducted speech is robust to ambient noise and has a regular harmonic structure in the lower spectral region; these two properties make it very suitable for pitch tracking. Few works in the literature have used bone-conducted speech to facilitate the detection and removal of unwanted signals from simultaneously recorded air-conducted speech. In this paper, we show that bone-conducted speech can also be used for robust pitch determination even in highly noisy environments, which can be very useful in many practical speech communication applications such as speech enhancement and speech/speaker recognition.
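The abstract does not name a pitch estimator. As a hypothetical illustration of why a regular low-frequency harmonic structure helps, the sketch below picks the autocorrelation peak within an assumed 60 to 400 Hz F0 range on a single frame; a practical tracker would add voicing decisions and octave-error checks across frames.

```python
import math
from typing import Sequence


def estimate_pitch_autocorr(
    frame: Sequence[float], fs: float, f_min: float = 60.0, f_max: float = 400.0
) -> float:
    """Return an F0 estimate in Hz from the highest autocorrelation peak
    within the lag range corresponding to [f_min, f_max]."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove DC so it does not dominate the autocorrelation
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(x) - 1)
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Biased (unnormalized) autocorrelation slightly favors shorter lags,
        # which helps avoid picking multiples of the true period.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


fs = 8000.0
tone = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(800)]
print(estimate_pitch_autocorr(tone, fs))  # 200.0
```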

An LP-based blind model for restoring bone-conducted speech

2008 Second International Conference on Communications and Electronics, 2008

Because of its stability against external noise, bone-conducted (BC) speech may be preferable to noisy air-conducted speech in an extremely noisy environment. However, the quality of bone-conducted speech is very low, and restoring it is a challenging topic in the speech signal processing field. To improve BC speech, many studies try to model and compensate for the degradation that occurs when the signal is conducted through bone. In a previous study, we proposed a linear prediction (LP) based blind-restoration model. In this paper, we comprehensively evaluated the proposed model in comparison with other models to determine whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and carrying out Japanese word-intelligibility tests (JWITs), Vietnamese word-intelligibility tests (VWITs), and Modified Rhyme Tests (MRTs) for English. The results of experiments on different languages, i.e., Japanese, English, and Vietnamese, demonstrated the practicability of blind BC restoration.
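Of the objective measures listed, the log-spectral distance (LSD) is the simplest to state. A minimal sketch, assuming power spectra on a shared frequency grid, the common per-frame RMS of dB differences, and averaging over frames; the small `eps` guard and the function names are illustrative, and the papers' exact LSD variant is not given in the abstract.

```python
import math
from typing import Sequence


def lsd_frame(ref_power: Sequence[float], est_power: Sequence[float], eps: float = 1e-12) -> float:
    """Log-spectral distance in dB between two power spectra of one frame."""
    diffs = [10.0 * math.log10((r + eps) / (e + eps)) for r, e in zip(ref_power, est_power)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def lsd(ref_frames: Sequence[Sequence[float]], est_frames: Sequence[Sequence[float]]) -> float:
    """Mean per-frame LSD over an utterance (lower is better)."""
    values = [lsd_frame(r, e) for r, e in zip(ref_frames, est_frames)]
    return sum(values) / len(values)
```

For example, a restored frame whose power is uniformly ten times the reference differs by 10 dB in every bin, so its LSD is 10 dB.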

The effect of bone conduction microphone locations on speech intelligibility and sound quality

Applied Ergonomics, 2011

This paper presents the results of three studies of the intelligibility and quality of speech recorded through a bone conduction microphone (BCM). All speech signals were captured and recorded using a Temco HG-17 BCM. Twelve locations on or close to the skull were selected for BCM placement. In the first study, listeners evaluated the intelligibility and quality of the bone-conducted speech signals presented through traditional earphones. Listeners in the second study evaluated the intelligibility and quality of signals presented through a loudspeaker. In the third study, the signals were reproduced through a bone conduction headset; however, evaluation was limited to speech intelligibility. In all three studies, the Forehead and Temple BCM locations yielded the highest intelligibility and quality rating scores. The Collarbone location produced the least intelligible and lowest-quality signals of all tested BCM locations.

Air and bone-conductive integrated microphones for robust speech detection and enhancement

2003

We present a novel hardware device that combines a regular microphone with a bone-conductive microphone. The device looks like a regular headset and can be plugged into any machine with a USB port. The bone-conductive microphone has an interesting property: it is insensitive to ambient noise and captures the low-frequency portion of speech signals. Thanks to the bone-conductive microphone signal, we can detect very robustly whether the speaker is talking, eliminating more than 90% of background speech. Furthermore, by combining both channels, we can significantly suppress background speech even when the background speaker talks at the same time as the headset wearer.
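The headset's detector is not described in the abstract. As a hypothetical illustration of why the BC channel helps, the sketch below makes a per-frame speech/non-speech decision from BC-channel energy alone with a fixed dB threshold; since the BC microphone picks up little ambient noise or speech from other talkers, even this simple rule is expected to respond mainly to the wearer. Frame length and threshold are assumptions.

```python
import math
from typing import List, Sequence


def bc_energy_vad(bc_signal: Sequence[float], frame_len: int, threshold_db: float) -> List[bool]:
    """Flag each non-overlapping frame of the BC signal as speech (True) or not."""
    flags = []
    for start in range(0, len(bc_signal) - frame_len + 1, frame_len):
        frame = bc_signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        level_db = 10.0 * math.log10(energy + 1e-12)  # small offset avoids log10(0)
        flags.append(level_db > threshold_db)
    return flags


# Silence followed by a constant-level segment (illustrative signal only).
print(bc_energy_vad([0.0] * 160 + [0.5] * 160, frame_len=160, threshold_db=-30.0))  # [False, True]
```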