On the Relationship Between Instantaneous Frequency and Pitch in Speech Signals
Related papers
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Most frequency-domain techniques for pitch extraction, such as the cepstrum, the harmonic product spectrum (HPS) and summation of residual harmonics (SRH), operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as the argmax. In this paper, we investigate the extension of these three techniques to the phase and group delay (GD) domains. Our extensions exploit the observation that the bin at which F(magnitude) is maximal, for some monotonically increasing function F, coincides with the bin at which F(phase) has its maximum negative slope and F(group delay) has its maximum value. To extract the pitch track from the speech phase spectrum, these techniques were coupled with the phase-domain source-filter model that we proposed in earlier publications and with a novel voicing detection algorithm proposed here. The accuracy and robustness of the phase-based pitch extraction techniques are illustrated and compared with those of their magnitude-based counterparts using six pitch evaluation metrics. On average, the phase spectrum can be employed successfully in pitch tracking, with accuracy and robustness comparable to those of the magnitude spectrum.
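The magnitude-domain baseline this abstract starts from can be made concrete with the harmonic product spectrum. The following is a minimal NumPy sketch, not the authors' code; the zero-padding factor, the number of harmonics and the 50 to 500 Hz search band are assumptions. It sums log-magnitudes instead of multiplying magnitudes: because the logarithm is monotonically increasing, the argmax is unchanged, which is the same property the abstract relies on for F(magnitude).

```python
import numpy as np

def hps_pitch(frame, fs, n_harmonics=4, fmin=50.0, fmax=500.0):
    """Estimate F0 of one voiced frame with the harmonic product spectrum.

    Assumed settings (not from the paper): Hann window, 8x zero padding,
    4 harmonics and a 50-500 Hz search band.
    """
    nfft = 8 * len(frame)                       # zero-pad for a finer grid
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    # log(product) = sum(log): same argmax, no numerical underflow
    log_mag = np.log(mag + 1e-12)
    n_bins = len(mag) // n_harmonics
    hps = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        # log_mag[::h][k] is the log-magnitude at h times the frequency of bin k
        hps += log_mag[::h][:n_bins]
    freqs = np.arange(n_bins) * fs / nfft
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(hps[band])])
```

For a frame whose harmonics sit at multiples of 200 Hz, the summed log-spectrum is largest near 200 Hz, because only there do all the downsampled copies line up on harmonic peaks.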
Pitch Estimation and Analysis of speech signal
Speech has been the principal form of human communication since human beings first began to communicate. The rate at which the vocal cords vibrate is called the fundamental frequency (F0), or pitch, and its reciprocal is the pitch period. Pitch period estimation therefore amounts to determining the fundamental frequency for use in speech signal processing applications. The range of human hearing is about 20 Hz to 20 kHz, and the frequency of a sound wave determines the perceived tone and pitch. Peaks in the autocorrelation of voice data reveal the period, and therefore the pitch, of the signal. Numerous pitch determination algorithms (PDAs) have been proposed in the literature. In general, they can be categorized into three classes: time-domain, frequency-domain, and time-frequency-domain algorithms. The pitch tracking techniques considered here, the autocorrelation method and the AMDF (average magnitude difference function) method, involve preprocessing followed by extraction of the pitch pattern.
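The autocorrelation and AMDF methods named in this abstract can be sketched for a single voiced frame as below. This is a minimal NumPy illustration, not the paper's implementation; the function names, the 50 to 500 Hz range and the 10% valley threshold are assumptions, and preprocessing, voicing decisions and tracking are omitted. Autocorrelation takes the lag of the strongest peak, while the AMDF takes the first deep valley, since the AMDF is also near zero at multiples of the period.

```python
import numpy as np

def _lag_range(fs, n, fmin, fmax):
    # Lags (in samples) corresponding to the assumed F0 search band
    return int(fs / fmax), min(int(fs / fmin), n - 1)

def autocorr_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """F0 from the largest autocorrelation value within the lag range."""
    x = frame - np.mean(frame)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0 .. N-1
    lag_min, lag_max = _lag_range(fs, len(x), fmin, fmax)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

def amdf_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """F0 from the first deep AMDF valley, D(k) = mean |x[n] - x[n+k]|."""
    x = frame - np.mean(frame)
    lag_min, lag_max = _lag_range(fs, len(x), fmin, fmax)
    lags = np.arange(lag_min, lag_max + 1)
    d = np.array([np.mean(np.abs(x[:-k] - x[k:])) for k in lags])
    # First lag that dips close to the global minimum (assumed 10% threshold)...
    thresh = d.min() + 0.1 * (d.max() - d.min())
    i = int(np.flatnonzero(d <= thresh)[0])
    # ...then walk down to the bottom of that valley
    while i + 1 < len(d) and d[i + 1] < d[i]:
        i += 1
    return fs / lags[i]
```

Picking the first valley rather than the global minimum is a simple guard against the AMDF selecting a multiple of the true period.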
Pitch extraction (also called fundamental frequency estimation) has been a popular research topic in many fields since the advent of computers. Yet after some 50 years of study, current techniques have still not reached the desired level of accuracy and robustness. When presented with a single clean pitched signal, most techniques do well, but when the signal is noisy, or when there are multiple pitch streams, many current pitch algorithms still fail. This report presents a history of pitch detection techniques, as well as a survey of the current state of the art in pitch detection technology.
Uses of the pitch-scaled harmonic filter in speech processing
The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component as an estimate of the part attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance by extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF include revealing normally obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations.
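The central idea of pitch-scaled analysis can be illustrated with a simplified split of one frame. Assuming the frame spans exactly b pitch periods (b = 4 here, an assumed value) and a rectangular window, every harmonic of F0 falls on a DFT bin that is a multiple of b, so those bins form the periodic estimate and all other bins the aperiodic one. This is a hypothetical sketch in NumPy, not the published PSHF, which uses a tapered window and further processing omitted here.

```python
import numpy as np

def pshf_split(segment, b=4):
    """Simplified pitch-scaled split of one frame into periodic/aperiodic parts.

    Assumes len(segment) == b * T0 (T0 = pitch period in samples), so harmonic
    h of F0 lies on DFT bin h*b. Rectangular window: each harmonic occupies a
    single bin. Not the full published PSHF.
    """
    n = len(segment)
    spectrum = np.fft.fft(segment)
    harmonic = np.zeros(n, dtype=bool)
    harmonic[::b] = True          # bins 0, b, 2b, ... (positive and negative F0 multiples)
    periodic = np.fft.ifft(np.where(harmonic, spectrum, 0)).real
    aperiodic = np.fft.ifft(np.where(harmonic, 0, spectrum)).real
    return periodic, aperiodic
```

Because the two masks partition the spectrum, the periodic and aperiodic outputs always sum back to the input frame.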
EXTRACTION OF PITCH IN ADVERSE CONDITIONS
This paper proposes a method for extracting pitch in adverse conditions. In this study, an adverse condition is a real environment in which the signal is degraded by several unpredictable sources, such as additive noise, reverberation, and channel noise. The proposed method is based on knowledge of glottal closure (GC) events. A GC event is the instant within a pitch period at which the vocal folds close. The Hilbert envelope of the linear prediction (LP) residual gives information about the locations of GC events. Autocorrelation analysis is performed on the Hilbert envelope of the LP residual, and the properties of this envelope are exploited to extract pitch from the autocorrelation sequence. The results of the proposed method are compared with those of the Simple Inverse Filtering Technique (SIFT) algorithm, and the proposed algorithm is found to perform better, even in adverse conditions.
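The processing chain in this abstract (LP residual, then Hilbert envelope, then autocorrelation) can be sketched with NumPy and SciPy as follows. The LP order of 12, the Hamming window for LP analysis, the 50 to 500 Hz lag range and the function names are assumptions; the final step is a plain argmax rather than the property-based peak selection the authors describe.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert, lfilter

def lp_residual(frame, order=12):
    """Prediction error of the frame, LP coefficients from the autocorrelation method."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    a = solve_toeplitz(r[:order], r[1:order + 1])                    # normal equations
    # Inverse filter A(z) = 1 - sum_k a_k z^-k leaves spikes near glottal closures
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)

def hilbert_envelope_pitch(frame, fs, order=12, fmin=50.0, fmax=500.0):
    """F0 from the autocorrelation of the Hilbert envelope of the LP residual."""
    env = np.abs(hilbert(lp_residual(frame, order)))   # Hilbert envelope
    env = env - env.mean()
    r = np.correlate(env, env, mode="full")[len(env) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(env) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag
```

Working on the envelope of the residual rather than on the waveform removes most of the vocal-tract resonance, so the autocorrelation peak is driven mainly by the spacing of the excitation spikes.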
Mathematical Problems in Engineering, 2021
In this article, a novel pitch determination algorithm based on the harmonic differences method (HDM) is proposed. Most current algorithms rely on autocorrelation, the cepstrum, or, more recently, convolutional neural networks, and they suffer from limitations (small datasets, restriction to wideband or narrowband speech, musical sounds, temporal smoothing, etc.) as well as accuracy and speed problems. Very few works exploit the spacing between harmonics. HDM is designed for both wideband and purely narrowband (telephone) speech and seeks the most frequently repeated difference between the harmonics of the speech signal. We use three vowel databases in our experiments: the Hillenbrand Vowel Database, the Texas Vowel Database, and vowels from the TIMIT corpus. We compare HDM with the autocorrelation, cepstrum, YIN, YAAPT, CREPE, and FCN algorithms. The results show that harmonic differences are a reliable and fast choice for robust pitch detection, and HDM outperforms the other methods in most cases.
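To illustrate the idea of finding the most repeated difference between harmonics, here is a hypothetical NumPy/SciPy sketch. The abstract does not say how HDM picks peaks or resolves ties, so the peak threshold, the tolerance `tol_hz` and the voting rule are assumptions; this is not the published HDM algorithm. Because it relies only on the spacing between harmonics, it can still return F0 when the fundamental itself is absent, as in band-limited telephone speech.

```python
import numpy as np
from scipy.signal import find_peaks

def harmonic_difference_pitch(frame, fs, fmin=50.0, fmax=500.0, tol_hz=10.0):
    """Illustrative F0 estimate: most frequent spacing between spectral peaks."""
    nfft = 8 * len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    # Harmonic candidates: peaks above 10% of the maximum, at least fmin apart
    peaks, _ = find_peaks(mag, height=0.1 * mag.max(),
                          distance=max(1, int(fmin * nfft / fs)))
    diffs = np.diff(freqs[peaks])                 # neighbouring-peak spacings
    diffs = diffs[(diffs >= fmin) & (diffs <= fmax)]
    if diffs.size == 0:
        return None                               # no usable harmonic structure
    # Vote: each spacing counts the spacings within tol_hz of it
    support = [int(np.sum(np.abs(diffs - d) <= tol_hz)) for d in diffs]
    best = diffs[int(np.argmax(support))]
    return float(np.mean(diffs[np.abs(diffs - best) <= tol_hz]))
```

For example, a frame containing only the 2nd to 6th harmonics of 200 Hz (400 to 1200 Hz) still yields an estimate close to 200 Hz.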
Fundamental Frequency Estimation Based on Pitch-Scaled Harmonic Filtering
2007
In this paper, we present an algorithm for robustly estimating the fundamental frequency in speech signals. Our approach is based on pitch-scaled harmonic filtering (PSHF). Following PSHF, we perform filtering in the frequency domain using the short-time Fourier transform to separate the harmonic and non-harmonic parts of the processed signal. We enhance the standard PSHF approach by using a range of window lengths and a cost function that is applied to each window size. This cost function takes into account the energy at the harmonic and non-harmonic frequency coefficients to estimate the harmonic energy of a frame. Using energy peaks and a cost function that considers the change in pitch between consecutive frames, we then determine the final pitch contour. We evaluated our approach on the Keele database. The experimental results demonstrate that our method performs robustly on noisy speech and performs well on clean speech in comparison with state-of-the-art algorithms.
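The abstract does not define its inter-frame cost function, so the following is only a generic dynamic-programming (Viterbi-style) sketch of how a final contour can be chosen from per-frame F0 candidates while penalizing pitch changes between consecutive frames. The candidate lists, the local costs (which could come from a harmonic-energy measure), the log-frequency jump penalty and `jump_weight` are all assumptions, not the authors' method.

```python
import numpy as np

def track_pitch(candidates, local_cost, jump_weight=1.0):
    """Choose one F0 candidate per frame minimising local + transition cost.

    candidates[t][j]: j-th F0 candidate (Hz) in frame t (hypothetical input).
    local_cost[t][j]: frame-level cost of that candidate, lower is better.
    Transition cost: jump_weight * |log2(F0_t / F0_{t-1})|, i.e. octaves moved.
    """
    cost = np.asarray(local_cost[0], dtype=float)
    back = []                                   # back[t-1][j]: best predecessor of j at t
    for t in range(1, len(candidates)):
        prev_f = np.asarray(candidates[t - 1], dtype=float)
        cur_f = np.asarray(candidates[t], dtype=float)
        trans = jump_weight * np.abs(np.log2(cur_f[None, :] / prev_f[:, None]))
        total = cost[:, None] + trans           # rows: previous, columns: current
        back.append(np.argmin(total, axis=0))
        cost = total.min(axis=0) + np.asarray(local_cost[t], dtype=float)
    # Backtrack from the cheapest final candidate
    j = int(np.argmin(cost))
    path = [j]
    for t in range(len(candidates) - 2, -1, -1):
        j = int(back[t][j])
        path.append(j)
    path.reverse()
    return [float(candidates[t][path[t]]) for t in range(len(candidates))]
```

For instance, with candidates of 100 and 200 Hz in three frames and a slightly lower local cost for 200 Hz only in the middle frame, `jump_weight=1.0` keeps the contour at 100 Hz throughout, whereas `jump_weight=0.0` lets the middle frame jump an octave.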
International Conference on Innovation in Engineering and Technology (ICIET), 2018
This paper presents a speech feature extraction system that estimates the pitch and the first two formants of different vowel sounds embedded in different utterances, using autocorrelation and frequency-domain spectral analysis. The database contains sounds voiced by different male and female speakers. Pitch values are measured from the corresponding autocorrelation profiles, and formants are estimated by analyzing the corresponding frequency spectra. Pitch values are also computed for speech corrupted by white noise. The overall testbed has been simulated in MATLAB, and the performance evaluations corroborate the reliability of the presented framework. The proposed feature estimation technique can support effective voice recognition and speaker identification applications.
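This abstract estimates formants by analyzing frequency spectra; a common alternative, swapped in here for illustration, reads them from the roots of a linear-prediction (LPC) polynomial. The NumPy/SciPy sketch below assumes a default order of 10 (the usual rule of thumb of roughly 2 plus the sampling rate in kHz, for 8 kHz speech) and hypothetical 90 Hz and 400 Hz thresholds for discarding near-DC and overly broad roots; none of these settings come from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, fs, order=10, n_formants=2):
    """Estimate the lowest formant frequencies (Hz) from LPC polynomial roots."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    a = solve_toeplitz(r[:order], r[1:order + 1])                    # predictor coeffs
    roots = np.roots(np.concatenate(([1.0], -a)))                    # poles of 1/A(z)
    roots = roots[np.imag(roots) > 0]            # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> frequency
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi  # pole radius -> bandwidth
    # Assumed thresholds: drop near-DC roots and broad, non-formant resonances
    keep = (freqs > 90.0) & (bandwidths < 400.0)
    return np.sort(freqs[keep])[:n_formants]
```

Pitch in the same framework can be taken from the autocorrelation peak of the frame, as the abstract describes; the LPC route is shown only because it gives formant frequencies directly from the pole angles.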