Speech analysis and synthesis using an AM–FM modulation model

Investigating source-filter interaction to specify classic speech production theory

2015

The paper is concerned with refining and improving the traditional source-filter model of the human vocal tract proposed by G. Fant and studied by many researchers. A new method of recording the glottal wave synchronously with the output speech signal was employed to obtain the experimental material. Comparing the recorded signals made it possible to analyze the structure of the speech signal at different stages of its generation. As a result, the classic vocal tract model was refined by distinguishing a feedback component that formalizes the processes in the vocal tract as a complex nonlinear acoustic system. One function of this component is to transfer acoustic energy upstream from the articulation system. The paper describes the recording method and presents the results of a perceptual experiment and of the acoustic analysis.

A new mathematical model for speech intonation: Theory and hardware/software realization

The Journal of the Acoustical Society of America, 1987

Y6. Speech formant trajectory estimation using dynamic programming with modulated transition costs. David Talkin (AT&T Bell Laboratories, Room 2D-410, 600 Mountain Avenue, Murray Hill, NJ 07974). A new algorithm to track speech formant frequencies automatically has been developed. Dynamic programming is used to optimize formant trajectory estimates by imposing appropriate frequency continuity constraints. The continuity constraints are modulated by a stationarity function. The formant frequencies are selected from candidates proposed by solving for the roots of the linear predictor polynomial computed periodically from the speech waveform. The local costs of all possible mappings of the complex roots to formant frequencies are computed at each frame, based on the frequencies and bandwidths of the component formants for each mapping. The cost of connecting each of these mappings with each of the mappings in the previous frame is then minimized using a modified Viterbi algorithm. Two sentences spoken by 88 males and 43 females were analyzed. The first three formants were tracked correctly in all sonorant regions in over 80% of the sentences. These performance results are based on spectrographic analysis and informal listening to formant-synthesized speech.

Y7. Some properties of autoregressive model related cepstrum. Biing-Hwang Juang and David Mansour (AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974)

… bridging the encrypted signals in a single B-voice channel) is discussed. In particular, an enhancement procedure using a prefilter, "dither" noise, and an adaptive postfilter is proposed. The system is based on an all-pole model for the degraded speech. The coefficients of the postfilter are easily derivable from the all-pole filter parameters. Informal listening tests have shown that the enhanced speech is near "toll" quality.

Y9. Invariant acoustic cues in stop consonants: A cross-language study using the Wigner distribution. H. Garudadri (Electrical Engineering, …
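The dynamic-programming scheme summarized in the Y6 abstract lends itself to a short illustration. The Python sketch below follows the general idea only (candidate pole sets per frame, local costs from bandwidths, transition costs penalizing frequency jumps, Viterbi-style minimization); the cost functions, the weights, and the `track_formants` interface are assumptions made for illustration, not Talkin's actual formulation.

```python
import itertools
import numpy as np

def track_formants(candidates, n_formants=3, jump_weight=0.01):
    """Choose one formant set per frame by Viterbi-style dynamic programming.

    candidates: list over frames; each frame is a list of (freq_hz, bw_hz)
    pole candidates (e.g. from LPC roots) sorted by frequency, with at least
    n_formants entries per frame. Cost functions here are illustrative only.
    """
    # All possible mappings of candidates to F1..Fn in each frame.
    mappings = [list(itertools.combinations(frame, n_formants))
                for frame in candidates]

    def local_cost(m):
        # Prefer mappings built from narrow-bandwidth poles (assumed heuristic).
        return sum(bw for _, bw in m) / 1000.0

    def trans_cost(prev, cur):
        # Continuity constraint: penalize large frame-to-frame frequency jumps.
        return jump_weight * sum(abs(f1 - f0)
                                 for (f0, _), (f1, _) in zip(prev, cur))

    # Forward pass: cumulative cost and backpointer for every mapping.
    cost = [local_cost(m) for m in mappings[0]]
    back = []
    for t in range(1, len(mappings)):
        new_cost, ptr = [], []
        for m in mappings[t]:
            totals = [c + trans_cost(p, m) for c, p in zip(cost, mappings[t - 1])]
            best = int(np.argmin(totals))
            new_cost.append(totals[best] + local_cost(m))
            ptr.append(best)
        cost, back = new_cost, back + [ptr]

    # Backtrace the minimum-cost path through the mappings.
    idx = int(np.argmin(cost))
    path = [idx]
    for ptr in reversed(back):
        idx = ptr[idx]
        path.append(idx)
    path.reverse()
    return np.array([[f for f, _ in mappings[t][i]] for t, i in enumerate(path)])
```

In the abstract's setting, the per-frame candidates would come from the roots of the linear predictor polynomial, and the transition cost would additionally be modulated by the stationarity function it mentions.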

Analysis/synthesis and modification of the speech aperiodic component

Speech Communication, 1996

The general framework of this paper is speech analysis and synthesis. The speech signal may be separated into two components: (1) a periodic component, which includes the quasi-periodic part of voiced sounds produced by regular vocal cord vibrations; (2) an aperiodic component, which includes the non-periodic part of voiced sounds (e.g., the fricative noise in /v/) and sounds emitted without any vocal cord vibration (e.g., unvoiced fricatives or plosives). This work is intended to contribute to a precise modelling of this second component, and particularly of modulated noises. First, a synthesis method inspired by the "shot noise effect" is introduced. This technique uses random point processes which define the arrival times of spectral events, represented by Formant Wave Forms (FWF). Based on the theoretical framework provided by the Rice representation and random modulation theory, an analysis/synthesis scheme is proposed. Perception tests show that this method makes it possible to synthesize very natural speech signals. The proposed representation also enables new types of voice quality modification (time scaling, vocal effort, breathiness, etc.).
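As a rough illustration of the shot-noise idea described above, the sketch below places damped-sinusoid grains (a simplified stand-in for Formant Wave Forms) at the arrival times of a Poisson point process, each with a random amplitude. The grain shape, event rate, and parameter values are illustrative assumptions, not those of the paper.

```python
import numpy as np

def fwf_grain(fc, bw, dur, fs):
    """One formant-wave-form-like grain: an exponentially damped sinusoid
    with a short smooth attack (a simplified stand-in for the FWF shape)."""
    t = np.arange(int(dur * fs)) / fs
    attack = 0.5 * (1 - np.cos(np.pi * np.minimum(t / 0.002, 1.0)))  # ~2 ms rise
    return attack * np.exp(-np.pi * bw * t) * np.sin(2 * np.pi * fc * t)

def shot_noise(rate, fc, bw, length_s, fs=16000, seed=0):
    """Aperiodic component as shot noise: grains placed at the arrival times
    of a Poisson point process, each scaled by a random amplitude."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    out = np.zeros(n)
    grain = fwf_grain(fc, bw, 0.02, fs)
    t = 0.0
    while True:
        t += rng.exponential(1.0 / rate)          # Poisson inter-arrival time
        start = int(t * fs)
        if start >= n:
            break
        seg = grain[:n - start]
        out[start:start + len(seg)] += rng.normal(1.0, 0.3) * seg
    return out

# Example: a noise band around 3 kHz built from ~2000 spectral events per second.
x = shot_noise(rate=2000.0, fc=3000.0, bw=400.0, length_s=0.5)
```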

Frequency modulations in the speech signal

Acoustical Physics, 2009

The paper examines the physical mechanisms of frequency modulations in the acoustics of the vocal tract and methods for estimating these modulations in the speech signal. It has been found that vibrations of the tract walls have a negligibly small effect on modulations of its resonance frequencies. A model of the speech production process that takes the subglottal cavity into account shows that the change in boundary conditions at the open glottis produces noticeable variations in the resonance frequencies. Along with this type of modulation, modulations determined by the shape of the excitation source also arise in the speech signal. They depend substantially on the ratio of the fundamental frequency to the resonance frequency and on the parameters of the modulation-estimation methods and of the speech-analysis methods. Overall, this may sometimes cause unstable and unpredictable modulations of the formant frequencies estimated from the speech signal.
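The estimation-side part of this claim can be probed with a small numerical experiment. The sketch below is an illustrative setup rather than the paper's method: it excites a single second-order resonator with impulse trains at several fundamental frequencies and tracks the resonance frame by frame with autocorrelation LPC, so the spread of the estimates can be compared as f0 approaches the resonance frequency. All parameter values are assumptions.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

FS = 8000  # sampling rate (Hz), illustrative

def resonator(fc, bw):
    """Second-order all-pole resonator (a single formant-like resonance)."""
    r = np.exp(-np.pi * bw / FS)
    theta = 2 * np.pi * fc / FS
    return [1.0], [1.0, -2 * r * np.cos(theta), r * r]

def lpc_formant(frame, order=4):
    """Frequency of the strongest LPC pole in one frame (autocorrelation method)."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])               # Yule-Walker predictor coefficients
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]
    if len(poles) == 0:
        return np.nan
    freqs = np.angle(poles) * FS / (2 * np.pi)
    return freqs[np.argmax(np.abs(poles))]

def formant_track(f0, fc=500.0, bw=80.0, n_frames=40, frame_len=240):
    """Excite the resonator with an impulse train at f0 and estimate the
    resonance frequency frame by frame."""
    n = n_frames * frame_len
    excitation = np.zeros(n)
    excitation[::int(FS / f0)] = 1.0
    b, a = resonator(fc, bw)
    y = lfilter(b, a, excitation)
    return np.array([lpc_formant(y[i:i + frame_len]) for i in range(0, n, frame_len)])

# Compare how the framewise estimates behave as f0 approaches the 500 Hz resonance.
for f0 in (100.0, 200.0, 400.0):
    est = formant_track(f0)
    print(f"f0={f0:5.0f} Hz  mean={np.nanmean(est):6.1f} Hz  std={np.nanstd(est):5.2f} Hz")
```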

On the Representation of Voice Source Aperiodicities in the MBE Speech Coding Model

2003

We present an investigation of the representation of voice source aperiodicities in the Multi-Band Excitation (MBE) speech model for the compression of narrowband speech. The MBE model is a fixed-frame analysis-synthesis algorithm which combines harmonic and stochastic components to reconstruct speech from estimated model parameters. Pitch cycle perturbations, such as jitter and shimmer, are not captured accurately in the framewise constant parameter estimates, which degrades the reproduced voice quality. The dependence of MBE-reconstructed voice quality on the voice pitch and the type of perturbation is explored through objective measurements and subjective listening with synthetic and natural speech.
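The point about framewise-constant parameter estimates can be illustrated with a toy calculation: if pulse instants carry cycle-to-cycle jitter but a fixed-frame analysis keeps only one pitch value per frame, the within-frame perturbations are averaged away. The sketch below uses an assumed frame length, jitter level, and helper names; it is not the MBE analysis itself.

```python
import numpy as np

FRAME = 0.020  # 20 ms fixed analysis frame, an illustrative choice

def pulse_times(f0, jitter, dur, seed=0):
    """Glottal-pulse instants (in seconds) with cycle-to-cycle period
    perturbation of relative size `jitter`."""
    rng = np.random.default_rng(seed)
    t, times = 0.0, []
    while t < dur:
        times.append(t)
        t += (1.0 / f0) * (1.0 + jitter * rng.standard_normal())
    return np.array(times)

def framewise_f0(times, dur):
    """One constant F0 value per fixed frame, as a fixed-frame analysis keeps."""
    f0s = []
    for k in range(int(dur / FRAME)):
        in_frame = times[(times >= k * FRAME) & (times < (k + 1) * FRAME)]
        periods = np.diff(in_frame)
        f0s.append(1.0 / periods.mean() if len(periods) else np.nan)
    return np.array(f0s)

times = pulse_times(f0=120.0, jitter=0.02, dur=1.0)          # 2% jitter
print("cycle-to-cycle period std: %.3f ms" % (1e3 * np.diff(times).std()))

# The framewise-constant description retains only one value per 20 ms frame,
# so the within-frame perturbations are averaged away.
print("per-frame F0 (first frames):", np.round(framewise_f0(times, dur=1.0)[:5], 1), "Hz")
```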

On decomposing speech into modulated components

Speech and Audio Processing, IEEE …, 2000

We model a segment of a filtered speech signal as a product of elementary signals, as opposed to a sum of sinusoidal signals. Using this model, one can better appreciate the basic relationships between envelopes and phases or instantaneous frequencies (IFs) of ...
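The envelope and instantaneous-frequency quantities referred to in this abstract are commonly computed from the analytic signal. The sketch below shows that standard Hilbert-transform route for a single band, as a point of reference rather than the paper's product-of-elementary-signals decomposition; the filter order, band edges, and toy signal are assumptions.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def envelope_and_if(x, fs, band):
    """Amplitude envelope and instantaneous frequency of one band of a signal,
    computed from the analytic signal (Hilbert transform)."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    xb = sosfiltfilt(sos, x)                        # isolate one formant-like band
    z = hilbert(xb)                                 # analytic signal
    env = np.abs(z)                                 # AM part: amplitude envelope
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)   # FM part: instantaneous frequency
    return env, inst_freq

# Toy signal: a 500 Hz carrier with slow amplitude and frequency modulation.
fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.3 * np.cos(2 * np.pi * 3 * t)) * np.cos(
    2 * np.pi * (500 * t + 10 * np.sin(2 * np.pi * 5 * t)))
env, inst_freq = envelope_and_if(x, fs, band=(300, 700))
print(env.mean(), inst_freq.mean())
```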

Modification of the aperiodic component of speech signals for synthesis

1996

Gael Richard and Christophe R. d'Alessandro. Modeling the excitation component of speech signals is a challenging problem for speech synthesis. Recently, several works have been devoted to periodic/ ...