The Performance of the Time-Frequency Analysis Software (TF32) in the Acoustic Analysis of the Synthesized Pathological Voice

A Comparative Study of Acoustic Voice Measurements by Means of Dr. Speech and Computerized Speech Lab

Journal of Voice, 2005

In this study, the results of acoustic voice analysis as calculated by two different analysis systems (Doctor Speech (DRS), Tiger Electronics, Neu-Anspach, Germany, and Computerized Speech Lab (CSL), Kay Elemetrics Corporation, Lincoln Park, NJ) are compared. A group of 120 normal adult voices was randomly selected for analysis of the objective parameters: fundamental frequency (F0), variation of F0 (F0 SD), jitter, shimmer, and harmonics-to-noise ratio (HNR). The aim of this comparison was to identify differences and similarities in the measurements of the two systems in order to make data transfer possible. A significant correlation was found for F0, HNR, and relative shimmer. The correlation for jitter (relative and absolute) and F0 SD was weak. DRS and CSL are not comparable in absolute figures, but their judgment against normative data is identical. Further research is necessary to explore the effect on pathological voices or child voices.
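The perturbation parameters named in this abstract can be illustrated with a minimal sketch. It uses common textbook definitions (mean absolute cycle-to-cycle difference, optionally normalized by the mean) and is not necessarily what DRS or CSL compute internally; the function names and the input format (lists of per-cycle periods in seconds and per-cycle peak amplitudes) are assumptions made for illustration.

```python
from statistics import mean, stdev


def absolute_jitter(periods):
    """Mean absolute difference between consecutive pitch periods (seconds)."""
    return mean(abs(b - a) for a, b in zip(periods[:-1], periods[1:]))


def relative_jitter(periods):
    """Absolute jitter as a percentage of the mean period."""
    return 100.0 * absolute_jitter(periods) / mean(periods)


def relative_shimmer(amplitudes):
    """Mean absolute cycle-to-cycle amplitude difference, in % of mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes[:-1], amplitudes[1:])]
    return 100.0 * mean(diffs) / mean(amplitudes)


def f0_mean_and_sd(periods):
    """Mean F0 and its standard deviation (Hz), from per-cycle periods."""
    f0 = [1.0 / p for p in periods]
    return mean(f0), stdev(f0)


# Hypothetical cycle data, for illustration only.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.0, 0.9, 1.0, 0.9]
jitter_pct = relative_jitter(periods)
shimmer_pct = relative_shimmer(amplitudes)
```

Because two systems may detect cycle boundaries differently and apply different normalizations, the same recording can yield different absolute values under such definitions, which is consistent with the weak jitter correlation reported above.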

Integrated software for analysis and synthesis of voice quality

Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals. In particular, it offers an integrated approach to voice analysis and synthesis and allows easy, precise, spectral-domain manipulations of the harmonic voice source. The synthesizer operates in near real time, using a parsimonious set of acoustic parameters for the voice source and vocal tract that a user can modify to accurately copy the quality of most normal and pathological voices. The software, user's manual, and audio files may be downloaded from http://brm.psychonomic-journals.org/content/supplemental. Future updates may be downloaded from www.surgery.medsch.ucla.edu/glottalaffairs/.

Analysis by synthesis of FM modulation and aspiration noise components in pathological voices

IEEE International Conference on Acoustics Speech and Signal Processing, 2002

FM and source noise characteristics of pathological voices are analyzed and modeled using precision interpolating pitch tracking. Detailed tracking data allows segregation of pitch variations into low frequency (tremor) and high frequency pitch variation (HFPV) time series. Tremor data is used to resample the original voice into a quasi-constant pitch signal, which results in a more accurate source noise estimate using the noise analysis algorithm described by de Krom [1]. Gaussian distributions are used for both source HFPV and aspiration noise models. Combined analysis parameters are used to drive a formant synthesizer, resulting in improved perceived fidelity.
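The segregation of a pitch track into tremor and HFPV can be sketched as a low-pass/residual split. The abstract does not state which filter the authors use; the centered moving average below, the window width, and the function names are assumptions chosen only to make the idea concrete.

```python
def moving_average(x, width):
    """Centered moving average; the window shrinks near the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def split_pitch_track(f0, width=5):
    """Split an F0 track into a slow component (tremor) and the residual (HFPV).

    Assumption: a moving average stands in for the low-pass filter.
    """
    tremor = moving_average(f0, width)
    hfpv = [v - t for v, t in zip(f0, tremor)]
    return tremor, hfpv


# Hypothetical per-cycle F0 values in Hz, for illustration only.
f0_track = [100.0, 101.5, 100.5, 102.0, 101.0, 102.5, 101.5, 103.0]
tremor, hfpv = split_pitch_track(f0_track, width=3)
```

By construction the two components sum back to the original track, so the slow component can drive resampling toward a quasi-constant pitch while the residual is modeled separately.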

Discrimination of Pathological Voices Using a Time-Frequency Approach

IEEE Transactions on Biomedical Engineering, 2005

Acoustical measures of vocal function are routinely used in the assessment of disordered voice, and for monitoring the patient's progress over the course of voice therapy. Typically, acoustic measures are extracted from sustained vowel stimuli, where short-term and long-term perturbations in fundamental frequency and intensity, and the level of "glottal noise", are used to characterize the vocal function. However, acoustic measures extracted from continuous speech samples may well be required for accurate prediction of abnormal voice quality that is relevant to the client's "real world" experience. In contrast with sustained vowel research, the literature on the effectiveness of acoustic measures extracted from continuous speech samples is relatively sparse. This is partially due to the challenge of segmenting the speech signal into voiced, unvoiced, and silence periods before features can be extracted for vocal function characterization. In this paper we propose a joint time-frequency approach for classifying pathological voices using continuous speech signals that obviates the need for such segmentation. The speech signals were decomposed using an adaptive time-frequency transform algorithm, and several features such as the octave max, octave mean, energy ratio, length ratio, and frequency ratio were extracted from the decomposition parameters and analyzed using statistical pattern classification techniques. Experiments with a database consisting of continuous speech samples from 51 normal and 161 pathological talkers yielded a classification accuracy of 93.4%.

Voice quality assessment using phase information: Application on voice pathology

Speech, along with hearing, is one of the most important human abilities. It is the primary way in which we engage with society. Our voice reveals a great deal of information about us to other people: our energy level, our emotions, our personality and our artistry. Voice abnormalities may cause social isolation or create problems in professional life. Because of this significance of the voice, the early detection of voice pathology is essential.

Preliminary Measurements of Voice Parameters using Multi Dimensional Voice Program

World journal of research and review, 2017

Voice plays a major role in speech and communication. A "normal" voice should have good quality, an appropriate balance of oral and nasal resonance, appropriate loudness, a habitual pitch level suitable for the age, size and sex of the individual, and proper voice inflections. The aim of this study is to analyze voice characteristics in young people using subjective and objective methods. The sample consisted of ten males and ten females, aged 19-30 years, with no voice pathology. Acoustic analysis was performed in the Multi-Dimensional Voice Program (MDVP; Kay Elemetrics Corporation, Lincoln Park, NJ) on recordings of the sustained vowels /a/, /e/, /i/, /o/, /u/. Subjective assessment used laryngoscopy for vocal cord function together with the Voice Handicap Index (VHI), the VoiSS scale and the Buffalo scale. Scores on each scale were within normal limits, and the laryngoscopy findings were normal for all participants. In addition, the maximum phonation time (MPT) and the s/z ratio were evaluated and revealed no abnormalities. The MDVP results showed significant differences between the sexes. Specifically, the mean fundamental frequency F0 for all vowels was significantly higher in women than in men (p<0.001). However, % jitter /i/ and NHR /a/, /e/ were significantly higher in men than in women, with p=0.029, p=0.006 and p<0.001. Finally, VTI /a/ differed significantly between men and women (p=0.035). In conclusion, % jitter /i/, NHR /a/, /e/ and VTI /o/ were higher in men, and only F0 was higher in women.

Analysis and synthesis of amplitude modulation components in pathological voices

Proceedings of 2002 IEEE Workshop on Speech Synthesis 2002 WSS-02, 2002

Previous work [1] addressed analysis and synthesis of the FM and aspiration noise components of pathological voices. The current study refines the previous analysis and adds amplitude modulation to model non-periodic components. The cepstral aspiration noise estimate is improved by removal of HFPV (high frequency pitch variation) from the original voice prior to estimating the noise. Amplitude and pitch pulse power tracking are performed on a pulse-by-pulse basis to accurately estimate the power time series, which is then segregated into low frequency power variation and high frequency power variation (shimmer). These AM effects are then also removed from the original voice and cepstral noise analysis is applied to estimate aspiration noise. Results show that incorporation of the AM effects improves the fidelity of the synthesized voice, and that AM has a minimal effect on the measured level of aspiration noise.
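Pulse-by-pulse power tracking and the removal of AM effects can be illustrated with a deliberately crude sketch. It assumes the pitch pulse boundaries are already known as sample indices, takes power as the mean squared amplitude within each pulse, and removes amplitude modulation by rescaling every pulse to the average power. The function names, the boundary format, and the per-pulse gain step are assumptions for illustration, not the authors' procedure.

```python
import math


def pulse_powers(signal, bounds):
    """Mean squared amplitude of each pulse.

    bounds holds the start index of every pulse plus the end index of the last.
    """
    return [
        sum(s * s for s in signal[a:b]) / (b - a)
        for a, b in zip(bounds[:-1], bounds[1:])
    ]


def remove_am(signal, bounds):
    """Rescale each pulse so that all pulses share the average power.

    Assumption: a constant gain per pulse; a real system would likely
    smooth the gain to avoid discontinuities at pulse boundaries.
    """
    powers = pulse_powers(signal, bounds)
    target = sum(powers) / len(powers)
    out = list(signal)
    for a, b, p in zip(bounds[:-1], bounds[1:], powers):
        gain = math.sqrt(target / p) if p > 0 else 1.0
        for i in range(a, b):
            out[i] = signal[i] * gain
    return out


# Hypothetical two-pulse signal, for illustration only.
signal = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
bounds = [0, 4, 8]
powers = pulse_powers(signal, bounds)
flattened = remove_am(signal, bounds)
```

The resulting power time series is the quantity that would then be separated into slow variation and shimmer, and the flattened signal is the kind of input on which a noise estimate could be computed without AM contributing to it.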