Linear-Prediction-Based Accurate Spectrum Estimation with Pitch Extension for Bone-Conducted Speech
Related papers
Spectrum Compensation Method for Speech Signals Based on Prediction Error Filtering
WSEAS Transactions on Systems and Control archive, 2017
This paper proposes a technique for improving the performance of linear prediction (LP) by using a prediction error filter (PEF) as a pre-processor. LP-based estimation of the power spectrum of speech often runs into problems because the large spectral dynamic range of speech makes the autocorrelation matrix ill-conditioned. In the proposed method, the LP-based power spectrum estimate is compensated by the spectral characteristics of the designed PEF. The accuracy of formant frequency estimation is verified on synthetic speech, and the validity of the method is illustrated on real air-conducted and bone-conducted speech. The experiments show that the proposed method estimates the power spectrum more accurately than the conventional direct and pre-emphasis LP methods.
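The pre-processing idea in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes one plausible reading of "compensation": a low-order PEF flattens the signal, a second LP analysis is run on the PEF output, and the final spectrum divides by the squared magnitude responses of both filters. The function names, filter orders, and FFT size are hypothetical choices made for the example.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP solved with the Levinson-Durbin recursion.

    Returns the inverse filter A(z) = [1, a_1, ..., a_p] and the residual energy.
    """
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def direct_lp_spectrum(x, lp_order=12, n_fft=512):
    """Conventional LP power spectrum: residual energy / |A(e^jw)|^2."""
    a, err = lp_coefficients(x, lp_order)
    return err / np.abs(np.fft.rfft(a, n_fft)) ** 2

def pef_compensated_spectrum(x, pef_order=2, lp_order=12, n_fft=512):
    """Assumed scheme: PEF pre-whitening, LP on the PEF output, then compensation."""
    x = np.asarray(x, dtype=float)
    a_pef, _ = lp_coefficients(x, pef_order)        # low-order PEF designed from the input
    residual = np.convolve(x, a_pef)[:len(x)]       # PEF output with reduced dynamic range
    a_lp, err = lp_coefficients(residual, lp_order)  # LP analysis of the flattened signal
    pef_response = np.abs(np.fft.rfft(a_pef, n_fft)) ** 2
    lp_response = np.abs(np.fft.rfft(a_lp, n_fft)) ** 2
    # Undo the PEF's spectral shaping so the result describes the original input
    return err / (pef_response * lp_response)

# Example input: white noise shaped by a single resonance at 0.1 cycles/sample
rng = np.random.default_rng(0)
c1, c2 = -2 * 0.98 * np.cos(2 * np.pi * 0.1), 0.98 ** 2
noise = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(4000):
    x[n] = noise[n] - c1 * (x[n - 1] if n >= 1 else 0.0) - c2 * (x[n - 2] if n >= 2 else 0.0)
spectrum = pef_compensated_spectrum(x, pef_order=1, lp_order=6)
```

With `pef_order=0` the PEF reduces to the identity and the function returns the direct LP spectrum, which makes the two estimates easy to compare on the same frame.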
WSEAS Transactions on Signal Processing archive, 2017
This paper proposes a linear prediction (LP) method that accurately estimates the original power spectrum of the input speech signal. A prediction error filter (PEF) is used as a pre-processor, and the LP-based power spectrum estimate is compensated by the frequency characteristics of the designed PEF. Through experiments on synthetic vowels, we show that the proposed spectrum compensation method estimates the power spectrum more accurately than the direct and pre-emphasis LP methods.
Pitch characteristics of bone conducted speech
2010 18th European Signal Processing Conference, 2010
This paper investigates the pitch characteristics of bone-conducted speech. Pitch determination from a speech signal cannot attain the expected level of accuracy in adverse conditions. Bone-conducted speech is robust to ambient noise and has a regular harmonic structure in the lower spectral region. These two properties make it well suited to pitch tracking. Few works have been reported in the literature that use bone-conducted speech to facilitate the detection and removal of unwanted signals from simultaneously recorded air-conducted speech. In this paper, we show that bone-conducted speech can also be used for robust pitch determination even in highly noisy environments, which can be useful in many practical speech communication applications such as speech enhancement and speech or speaker recognition.
Pitch Determination from Bone Conducted Speech
IEICE Transactions on Information and Systems, 2016
This paper explores the potential of pitch determination from bone-conducted (BC) speech. Pitch determination from a normal air-conducted (AC) speech signal cannot attain the expected level of accuracy for every voice and background condition. In contrast, since BC speech is caused by vibrations that have traveled through the vocal tract wall, it is robust against ambient conditions. Although an appropriate model of BC speech is not known, it has a regular harmonic structure in the lower spectral region. Because of this lowpass nature, pitch determination from BC speech is not usually affected by the dominant first formant. Experiments conducted on simultaneously recorded AC and BC speech show that BC speech is more reliable for pitch estimation than AC speech. With little manual effort, the pitch contour estimated from BC speech can also be used as a pitch reference, serving as an alternative to the pitch contour extracted from laryngograph output, which is sometimes inconsistent with simultaneously recorded AC speech.
A novel method for spectral-domain fundamental frequency (F0) estimation is proposed. The basis of this method is estimating F0 using the power spectrum of a windowed speech segment. For this purpose, a new transform is introduced. The prominent feature of this transform is that it estimates F0 from the speech segment power spectrum by exploiting the window function power spectrum. As a result, this transform is named the Window-Based transform. By comparison between the proposed method and the autocorrelation and the cepstral pitch estimation methods, the superiority of the proposed method under noisy environments is demonstrated.
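For orientation, the autocorrelation method named above as a comparison baseline can be sketched as follows. This is not the Window-Based transform, whose definition is not given in the abstract. The function name and the 60-400 Hz search range are assumptions made for the example.

```python
import numpy as np

def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) as the lag of the autocorrelation peak within a search range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)                      # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # non-negative lags
    lag_min = int(fs / f_max)                           # shortest period considered
    lag_max = min(int(fs / f_min), len(frame) - 1)      # longest period considered
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

# Example: a 40 ms voiced-like frame with three harmonics of 125 Hz
fs = 8000
t = np.arange(320) / fs
frame = (np.sin(2 * np.pi * 125 * t)
         + 0.5 * np.sin(2 * np.pi * 250 * t)
         + 0.25 * np.sin(2 * np.pi * 375 * t))
f0 = autocorr_pitch(frame, fs)
```

The resolution of this baseline is limited to integer lags, and the peak becomes unreliable as noise rises, which is the regime where the abstract claims an advantage for the proposed transform.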
Method of LP-based blind restoration for improving intelligibility of bone-conducted speech
2007
Bone-conducted (BC) speech is stable against surrounding noise, so in an extremely noisy environment it may be usable instead of air-conducted (AC) speech for communication. However, it has very poor sound quality, and its intelligibility is degraded when transmitted through bone conduction. Therefore, the voice quality and intelligibility of BC speech need to be blindly improved in actual speech communication, and this is a challenging new topic in the speech signal processing field. We proposed an LP-based model to restore BC speech and improve its voice quality in a previous study. While other methods such as the long-term Fourier transform need numerous AC speech parameters to restore BC speech, the proposed model can blindly restore BC speech by predicting AC-LP coefficients from BC-LP coefficients. We improved the proposed model by (1) extending long-term processing to frame-basis processing, (2) using LSF coefficients on the LP representation, and (3) using a recurrent neural network for predicting parameters. We evaluated the improved model in comparison with other models to find out whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and carrying out Modified Rhyme Tests (MRTs). An evaluation of these three improvements to the LP-based model demonstrated the practicability of blind BC restoration.
LP-based method of blind restoration to improve intelligibility of bone-conducted speech
Thang TAT VU, Masashi UNOKI, and Masato AKAGI
Bone-conducted (BC) speech can be used instead of air-conducted (AC) speech in an extremely noisy environment. However, its intelligibility is degraded when transmitted through bone conduction. Therefore, the voice quality and intelligibility of BC speech need to be blindly improved in actual speech communication, and this is a challenging new topic in the field of speech signal processing. We proposed a linear prediction (LP) based model to restore BC speech and improve voice quality in a previous study. While other methods such as the long-term Fourier transform need numerous AC speech parameters to restore BC speech, the model we proposed was able to blindly restore BC speech by predicting AC-LP coefficients from BC-LP coefficients. We improved the previous model by (1) extending long-term processing to frame-basis processing, (2) using line spectral frequency (LSF) coefficients on an LP representation, and (3) using a recurrent neural network for predicting parameters. We evaluated the improved model in comparison with others to find out whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (i.e., LSD, MCD, and LCD) and a subjective measure, a Japanese-word intelligibility test (JWIT). The experimental results showed significant improvements with our newly proposed models (LSF and LSF-SRN). The LSF model significantly improved both the voice quality and the intelligibility of BC speech. The LSF-SRN model improved the intelligibility of BC speech even when using blind restoration.
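The abstract does not define its objective measures. Assuming LSD stands for log-spectral distortion in its common form (the root-mean-square difference in dB between two power spectra, averaged over frames), a minimal sketch looks like this. The function name, FFT size, and the small `eps` guard are choices made for the example and may differ from what the authors used.

```python
import numpy as np

def log_spectral_distortion(ref_frames, test_frames, n_fft=512, eps=1e-12):
    """Mean per-frame RMS difference (dB) between the power spectra of two frame sets.

    Both inputs are arrays of shape (num_frames, frame_length) holding
    time-aligned frames, e.g. AC reference frames and restored BC frames.
    """
    ref_power = np.abs(np.fft.rfft(ref_frames, n_fft, axis=-1)) ** 2
    test_power = np.abs(np.fft.rfft(test_frames, n_fft, axis=-1)) ** 2
    diff_db = 10.0 * np.log10((ref_power + eps) / (test_power + eps))
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=-1))  # RMS over frequency bins
    return float(np.mean(per_frame))                      # average over frames

# Example: a test signal that is the reference scaled by 2 in amplitude
rng = np.random.default_rng(1)
ref = rng.standard_normal((10, 256))
lsd_same = log_spectral_distortion(ref, ref)
lsd_scaled = log_spectral_distortion(ref, 2.0 * ref)
```

A pure gain change shifts every bin by the same number of dB, so some definitions remove the mean level before taking the RMS. This sketch does not.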
A pitch-based spectral enhancement technique for robust speech processing
Interspeech 2013, 2013
This paper presents a new pitch-based spectral enhancement algorithm on voiced frames for speech analysis and noise-robust speech processing. The proposed algorithm simultaneously determines a time-warping function (TWF) and the speaker's pitch with high precision. This technique reduces the smearing effect between harmonics when the fundamental frequency is not constant within the analysis window. To do so, we propose a metric called the harmonic residual, which measures the difference between the actual spectrum and the resynthesized spectrum derived from the linear model of speech production with various combinations of TWF and high-precision pitch values as parameters. The TWF and pitch pair that yields the minimum harmonic residual is selected, and the enhanced spectrum is obtained accordingly. We show how this new representation can be used for automatic speech recognition by proposing a robust spectral representation derived from harmonic amplitude interpolation.
An LP-based blind model for restoring bone-conducted speech
2008 Second International Conference on Communications and Electronics, 2008
Because of its stability against external noise, bone-conducted (BC) speech appears better suited than noisy air-conducted speech for use in an extremely noisy environment. However, the quality of bone-conducted speech is very low, and restoring it is a challenging topic in the speech signal processing field. As the main approach to improving BC speech, many studies try to model and resolve the degradation that occurs when the signal is conducted through bone. In a previous study, we proposed a linear prediction (LP) based blind restoration model. In this paper, we thoroughly evaluate the proposed model in comparison with other models to find out whether it can adequately improve the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and carrying out Japanese word-intelligibility tests (JWITs), Vietnamese word-intelligibility tests (VWITs), and Modified Rhyme Tests (MRTs) for English. The results of experiments on different languages, i.e., Japanese, English, and Vietnamese, demonstrated the practicability of blind BC restoration.