Pitch characteristics of bone conducted speech
Related papers
Pitch Determination from Bone Conducted Speech
IEICE Transactions on Information and Systems, 2016
This paper explores the potential of pitch determination from bone conducted (BC) speech. Pitch determination from a normal air conducted (AC) speech signal cannot attain the expected level of accuracy for every voice and background condition. In contrast, since BC speech is caused by vibrations that have traveled through the vocal tract wall, it is robust against ambient conditions. Though an appropriate model of BC speech is not known, it has a regular harmonic structure in the lower spectral region. Due to this lowpass nature, pitch determination from BC speech is not usually affected by the dominant first formant. Experiments conducted on simultaneously recorded AC and BC speech show that BC speech is more reliable for pitch estimation than AC speech. With little manual effort, the pitch contour estimated from BC speech can also be used as a pitch reference, serving as an alternative to the pitch contour extracted from laryngograph output, which is sometimes inconsistent with simultaneously recorded AC speech.
Linear-Prediction-Based Accurate Spectrum Estimation with Pitch Extension for Bone-Conducted Speech
Journal of Signal Processing
This paper proposes an approach to pitch-synchronous linear prediction (LP) for bone-conducted (BC) voiced speech. A combination of the spectral compensation (SC) method with pitch extension LP is used to obtain a more accurate power spectrum of the BC speech signal. Simulation experiments show that the proposed method provides better performance than the conventional autocorrelation and original SC methods.
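For context, the conventional autocorrelation method that this abstract uses as a baseline can be sketched as follows. This is only a minimal illustration of standard LP analysis via the Levinson-Durbin recursion, not the proposed pitch-extension or spectral-compensation method. The function names `autocorrelation` and `levinson_durbin` and the toy decaying signal are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient list [1, a1, ..., ap] and the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Toy check (hypothetical data): x[n] = 0.9**n is the impulse response of
# 1 / (1 - 0.9 z^-1), so the recursion should find a1 close to -0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
```

The LP power spectrum estimate would then be proportional to 1 / |A(e^{jw})|^2; how the paper modifies this for BC speech is not reproduced here.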
Pitch Estimation and Analysis of speech signal
Speech has been the principal form of human communication since human beings first began to communicate. The rate of vibration produced by the vocal cords is called the fundamental frequency (F0); its reciprocal is the pitch period. Consequently, pitch period estimation determines the fundamental frequency for use in speech signal processing applications. The audible frequency range for a person is about 20 Hz to 20 kHz, and the frequency of a sound wave determines its perceived tone and pitch. Spikes in the autocorrelation of voice data are used to determine the period and therefore the pitch of the signal. Numerous pitch determination algorithms (PDAs) have been proposed in the literature. In general, they can be categorized into three classes: time-domain, frequency-domain, and time-frequency-domain algorithms. The pitch tracking techniques considered use the autocorrelation method and the AMDF (Average Magnitude Difference Function) method, and involve preprocessing and extraction of the pitch pattern.
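The two time-domain methods named in this abstract can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the 50 to 400 Hz search range, the tie-breaking tolerance, and the synthetic 200 Hz test frame are all hypothetical choices, and no preprocessing step is included.

```python
import math


def lag_range(fs, fmin, fmax, frame_len):
    """Candidate pitch periods, in samples, for the F0 range [fmin, fmax]."""
    lo = max(1, int(fs // fmax))
    hi = min(frame_len - 1, int(fs // fmin))
    return lo, hi


def autocorr_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """F0 from the lag that maximises the (biased) autocorrelation."""
    lo, hi = lag_range(fs, fmin, fmax, len(frame))

    def r(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    return fs / max(range(lo, hi + 1), key=r)


def amdf_pitch(frame, fs, fmin=50.0, fmax=400.0, tol=1e-6):
    """F0 from the lag that minimises the average magnitude difference."""
    lo, hi = lag_range(fs, fmin, fmax, len(frame))
    n = len(frame)
    d = {lag: sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
         for lag in range(lo, hi + 1)}
    # Multiples of the true period give near-identical minima, so take the
    # shortest lag within a small tolerance of the global minimum.
    limit = min(d.values()) + tol * max(d.values())
    best = next(lag for lag in range(lo, hi + 1) if d[lag] <= limit)
    return fs / best


# Synthetic voiced frame (assumed data): 200 Hz fundamental plus two harmonics.
FS = 8000
frame = [math.sin(2 * math.pi * 200 * t / FS)
         + 0.5 * math.sin(2 * math.pi * 400 * t / FS)
         + 0.25 * math.sin(2 * math.pi * 600 * t / FS)
         for t in range(320)]
f0_autocorr = autocorr_pitch(frame, FS)
f0_amdf = amdf_pitch(frame, FS)
```

Both estimators agree on this clean frame; on real speech they tend to differ in how they fail (octave errors for autocorrelation, spurious minima for AMDF), which is why a search range and some form of tie-breaking are usually needed.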
ON EQUALIZATION OF BONE CONDUCTED SPEECH FOR IMPROVED SPEECH QUALITY
We propose an equalizer that attempts to improve the perceived speech quality of bone-conducted speech input with ear-insert microphones, which can provide clean speech input in noisy environments. We first show that the transfer characteristics of bone-conducted speech are both speaker and microphone dependent, and propose an equalizer which is trained using simultaneously recorded airborne and bone-conducted speech. The short-term FFT amplitude ratio of airborne and bone-conducted speech is used. The amplitudes are averaged and smoothed extensively before the ratio is calculated. The trained equalizer is applied to bone-conducted speech in the frequency domain. We show that the proposed equalizer provides notable quality improvement on the bone-conducted speech input, both subjectively and objectively. We also show that applying spectral subtraction helps decrease some constant-level background noise found in these types of microphones.
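The training step described in this abstract (average, smooth, then take the airborne to bone-conducted amplitude ratio) might look roughly like the sketch below. It assumes the short-term FFT magnitudes have already been computed and are passed in as lists of per-bin amplitudes. The function names, the moving-average smoother, and the toy spectra are assumptions for illustration; the abstract does not specify the smoothing actually used.

```python
def mean_spectrum(frames):
    """Average the per-bin amplitudes over all frames."""
    count = len(frames)
    return [sum(f[k] for f in frames) / count for k in range(len(frames[0]))]


def smooth(spectrum, width=3):
    """Moving average across neighbouring bins (an assumed smoother)."""
    half = width // 2
    out = []
    for k in range(len(spectrum)):
        segment = spectrum[max(0, k - half):k + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def train_equalizer(ac_frames, bc_frames, width=3, eps=1e-12):
    """Per-bin gain = smoothed mean AC amplitude / smoothed mean BC amplitude."""
    ac = smooth(mean_spectrum(ac_frames), width)
    bc = smooth(mean_spectrum(bc_frames), width)
    return [a / max(b, eps) for a, b in zip(ac, bc)]


def apply_equalizer(bc_magnitudes, gains):
    """Apply the trained gains to one BC frame in the frequency domain."""
    return [m * g for m, g in zip(bc_magnitudes, gains)]


# Toy data (hypothetical): BC frames are AC frames with a low-pass roll-off.
rolloff = [1.0, 0.5, 0.25, 0.125]
ac_frames = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
bc_frames = [[a * g for a, g in zip(f, rolloff)] for f in ac_frames]

# width=1 disables smoothing so the toy recovery is exact.
gains = train_equalizer(ac_frames, bc_frames, width=1)
restored = apply_equalizer(bc_frames[0], gains)
```

With a wider smoothing window the gains no longer invert the roll-off exactly, which is the intended trade-off: a smoother, more stable equalizer at the cost of fine spectral detail.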
Model-based speech enhancement using a bone-conducted signal
The Journal of the Acoustical Society of America, 2012
Codebook-based single-microphone noise suppressors, which exploit prior knowledge about speech and noise statistics, provide better performance in nonstationary noise. However, as the enhancement involves a joint optimization over speech and noise codebooks, this results in high computational complexity. A codebook-based method is proposed that uses a reference signal observed by a bone-conduction microphone, and a mapping between air- and bone-conduction codebook entries generated during an offline training phase. A smaller subset of air-conducted speech codebook entries that accurately models the clean speech signal is selected using this reference signal. Experiments support the expected improvement in performance at low computational complexity.
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Most frequency-domain techniques for pitch extraction, such as cepstrum, harmonic product spectrum (HPS) and summation of residual harmonics (SRH), operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as the argmax. In this paper, we investigate the extension of these three techniques to the phase and group delay (GD) domains. Our extensions exploit the observation that the bin at which F(magnitude) becomes maximum, for some monotonically increasing function F, is equivalent to the bin at which F(phase) has maximum negative slope and F(group delay) has the maximum value. To extract the pitch track from the speech phase spectrum, these techniques were coupled with the source-filter model in the phase domain that we proposed in earlier publications and a novel voicing detection algorithm proposed here. The accuracy and robustness of the phase-based pitch extraction techniques are illustrated and compared with their magnitude-based counterparts using six pitch evaluation metrics. On average, it is observed that the phase spectrum can be successfully employed in pitch tracking with comparable accuracy and robustness to the speech magnitude spectrum.
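To make the "fundamental frequency emerges as argmax" idea concrete, here is a minimal sketch of the magnitude-domain harmonic product spectrum. It illustrates only the conventional baseline, not the phase or group-delay extensions proposed in the paper. The function names, the naive DFT (used to keep the example dependency-free), the Hann window, the search range, and the synthetic frame are all assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame, n_bins):
    """Magnitudes of the first n_bins DFT bins of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [abs(sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n_bins)]


def hps_pitch(frame, fs, fmin=60.0, fmax=400.0, n_harmonics=3):
    """F0 as the bin maximising the product |X[k]| * |X[2k]| * ... ."""
    n = len(frame)
    k_min = max(1, math.ceil(fmin * n / fs))
    k_max = math.floor(fmax * n / fs)
    mag = magnitude_spectrum(frame, n_harmonics * k_max + 1)

    def product(k):
        p = 1.0
        for r in range(1, n_harmonics + 1):
            p *= mag[r * k]
        return p

    best_bin = max(range(k_min, k_max + 1), key=product)
    return best_bin * fs / n


# Synthetic voiced frame (assumed data): four harmonics of 200 Hz at 8 kHz.
FS = 8000
amplitudes = [1.0, 0.6, 0.4, 0.3]
frame = [sum(a * math.sin(2 * math.pi * 200 * (h + 1) * t / FS)
             for h, a in enumerate(amplitudes))
         for t in range(400)]
f0_hps = hps_pitch(frame, FS)
```

The frequency resolution of this estimate is fs / N (20 Hz here), so practical implementations usually zero-pad or interpolate around the peak.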
Evaluation of pitch detection algorithms in adverse conditions
Proc. 3rd International Conference on …, 2006
Robust fundamental frequency estimation in adverse conditions is important in various speech processing applications. In this paper a new pitch detection algorithm (PDA) based on the autocorrelation of the Hilbert envelope of the LP residual [1] is compared to another well-established algorithm from Goncharoff. A set of evaluation criteria is collected on which the two PDAs are compared. In order to evaluate the algorithms in adverse conditions, a suitable reference database was constructed. This reference database consists of parts of the SPEECON speech database where recordings of 60 speakers were selected and manually pitch marked. The recordings cover several adverse conditions, such as noise in the car cabin and reverberation in office rooms. The evaluation highlights the good performance of the new algorithm in comparison, but shows that low SNR conditions and strong reverberation are still a demanding challenge for future pitch detection algorithms.
A novel method for spectral-domain fundamental frequency (F0) estimation is proposed. The basis of this method is estimating F0 using the power spectrum of a windowed speech segment. For this purpose, a new transform is introduced. The prominent feature of this transform is that it estimates F0 from the speech segment power spectrum by exploiting the window function power spectrum. As a result, this transform is named the Window-Based transform. By comparison between the proposed method and the autocorrelation and the cepstral pitch estimation methods, the superiority of the proposed method under noisy environments is demonstrated.
A Robust Speech Enhancement Scheme on The Basis of Bone-conductive Microphones
This paper presents a novel speech enhancement scheme on the basis of bone-conductive microphones. High performance in non-stationary noisy environments is achieved by combining the following two techniques: the speech detection based on the bone-conductive microphone signal, and non-stationary noise suppression by adaptive spectral subtraction. Unlike regular air-conductive microphones, bone-conductive microphones are insensitive to environmental noises and background speech. By employing this advantage, we are able to detect speech very robustly in non-stationary noisy environments. Furthermore, based on the reliable speech detection results, the adaptive spectral subtraction can operate effectively even under low SNR conditions.
[...] more reliably than conventional multi-channel schemes, in general the additional gains to be derived from them have not been commensurate with the greatly increased computations required by these methods to combine information from these different sensors. To overcome this problem, this paper proposes a new speech enhancement scheme for a non-stationary noisy environment. Firstly, the bone-conductive microphone based speech detection is used to distinguish between background noise segments and speech segments. Secondly, the reliable speech detection results help to determine when to re-estimate the noise threshold. Thirdly, the adaptive spectral subtraction aims at attenuating the non-stationary noise. And the final enhanced speech is produced.
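The three steps in this abstract (BC-based speech detection, noise re-estimation during non-speech frames, spectral subtraction) can be sketched as below. This is a rough illustration under assumptions, not the paper's algorithm: the energy-threshold detector, the exponential noise update with factor `alpha`, the spectral `floor`, and all names and toy values are hypothetical. Inputs are per-frame magnitude spectra of the air-conducted signal and per-frame energies of the bone-conducted signal.

```python
def enhance(ac_magnitudes, bc_energies, vad_threshold, alpha=0.9, floor=0.05):
    """Spectral subtraction on AC frames, gated by a BC-based speech detector."""
    noise = None
    enhanced = []
    for mag, bc_energy in zip(ac_magnitudes, bc_energies):
        # Step 1: the BC channel is largely free of ambient noise, so a simple
        # energy threshold is assumed to be enough to flag speech frames.
        is_speech = bc_energy > vad_threshold

        # Step 2: re-estimate the noise spectrum only when no speech is present.
        if not is_speech:
            if noise is None:
                noise = list(mag)
            else:
                noise = [alpha * n + (1.0 - alpha) * m
                         for n, m in zip(noise, mag)]

        # Step 3: subtract the noise estimate, keeping a small spectral floor.
        estimate = noise if noise is not None else [0.0] * len(mag)
        enhanced.append([max(m - n, floor * m) for m, n in zip(mag, estimate)])
    return enhanced


# Toy frames (hypothetical): flat noise everywhere, speech only in frame 2.
ac_magnitudes = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [3.0, 1.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
bc_energies = [0.01, 0.02, 5.0, 0.01]
cleaned = enhance(ac_magnitudes, bc_energies, vad_threshold=1.0)
```

Because the noise estimate is frozen while the BC detector reports speech, the speech itself does not leak into the noise spectrum, which is the main benefit the abstract attributes to reliable detection.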
Air and bone-conductive integrated microphones for robust speech detection and enhancement
2003
We present a novel hardware device that combines a regular microphone with a bone-conductive microphone. The device looks like a regular headset and it can be plugged into any machine with a USB port. The bone-conductive microphone has an interesting property: it is insensitive to ambient noise and captures the low frequency portion of the speech signals. Thanks to the signals from the bone-conductive microphone, we are able to detect very robustly whether the speaker is talking, eliminating more than 90% of background speech. Furthermore, by combining both channels, we are able to significantly remove background speech even when the background speaker speaks at the same time as the speaker wearing the headset.