Comparative analysis of linear and nonlinear speech signals predictors

Application of Nonlinear Methods for Analyzing Rate of Speech Production

Journal of fluency …, 1997

This pilot study has examined the rate of repetitive speech production through use of nonlinear methods. Durational measures were obtained from a normal subject who was required to produce a stimulus word in four speaking conditions: normal, controlled-normal, accelerated, and controlled-accelerated. Phase plots and accumulated time series plots were utilized to display intra-subject variability. Attractors were observed in each of the four phase plots and of particular interest was the direction of their shift for the different speaking conditions. The accumulated time series plots also revealed patterns of intra-subject variability across time. In summary, these two forms of nonlinear representation successfully characterized qualitative changes within, and across, the four speaking conditions. The observed spectral distributions and patterns of variability have implications for differentiating normal from abnormal speaking conditions.

Nonlinear prediction of speech signal using volterra-wiener series

Interspeech 2013, 2013

Linear Prediction (LP) analysis has proven to be very effective and successful in speech analysis and speech synthesis applications. This may be because LP analysis implicitly captures the time-varying vocal tract area function. However, it captures only the second-order statistical relationships and only the linear dependencies in the sequence of speech samples (and not the higher-order relations), as a result of which the LP residual is also intelligible. This paper studies the effectiveness of nonlinear prediction (NLP) of the speech signal using the state-of-the-art Volterra-Wiener series, and uses a novel chaotic titration method to analyze the chaotic characteristics of the residuals obtained by both the LP and NLP methods. The experimental results demonstrate that the proposed NLP approach gives a lower prediction error, a relatively flat residual spectrum, a lower PESQ score (i.e., an objective evaluation of MOS to a certain extent), and less chaoticity than its LP counterpart. Finally, the L1 norm and L2 norm of the NLP residual were found to be relatively lower than those of the LP residual for five instances of voiced and unvoiced regions extracted from speakers of the TIMIT database.
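To make the contrast between linear and Volterra-type prediction concrete, here is a minimal sketch and not the paper's implementation. It fits a linear predictor and a second-order (quadratic) Volterra predictor by ordinary least squares and compares their in-sample mean-squared errors. The test signal is a synthetic Hénon-map sequence chosen purely for illustration, not speech; the function names, the memory length of 2, and the least-squares fitting procedure are all assumptions of this sketch.

```python
def henon(n, a=1.4, b=0.3):
    """Synthetic chaotic test signal (assumption: stands in for real data)."""
    x = [0.0, 0.0]
    while len(x) < n + 2:
        x.append(1.0 - a * x[-1] ** 2 + b * x[-2])
    return x[2:]

def features(x, n, memory, quadratic):
    """Regressors for sample n: constant, past samples, optional pairwise products."""
    past = [x[n - k] for k in range(1, memory + 1)]
    row = [1.0] + past
    if quadratic:
        row += [past[i] * past[j] for i in range(memory) for j in range(i, memory)]
    return row

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / M[r][r]
    return z

def fit_predictor(x, memory, quadratic):
    """Least-squares fit via the normal equations; returns (coefficients, MSE)."""
    rows = [features(x, n, memory, quadratic) for n in range(memory, len(x))]
    targets = x[memory:]
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    coef = solve(gram, rhs)
    mse = sum((t - sum(c * f for c, f in zip(coef, r))) ** 2
              for r, t in zip(rows, targets)) / len(rows)
    return coef, mse

x = henon(2000)
coef_lin, mse_lin = fit_predictor(x, memory=2, quadratic=False)
coef_quad, mse_quad = fit_predictor(x, memory=2, quadratic=True)
print("linear MSE:", mse_lin)
print("quadratic Volterra MSE:", mse_quad)
```

Because the quadratic feature set contains the linear one, its in-sample error can never exceed the linear error. How much it improves on real speech depends on the data and is not something this toy signal can show.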

Parametric Non-linear Prediction of Speech

IEEE Automatic Speech Recognition Workshop, Arden House, Harriman, NY, pp. 35-38, 1991

In this paper we present several algorithms for simultaneous estimation of the non-linear as well as the linear prediction parameters of speech signals. Our study shows that the non-linear models retain substantially more information than linear-only models. Preliminary experiments on telephone-quality speech data clearly and consistently indicate that there is a significant reduction in the prediction error when the bilinear prediction components are included along with the LPC part. The results in this paper may have a significant effect on the performance accuracy of any speech recognition/synthesis/coding system that currently relies on linear prediction only.

Speech Analysis and Synthesis by Linear Prediction of the Speech Wave

The Journal of the Acoustical Society of America, 1970

We describe a procedure for efficient encoding of the speech wave by representing it in terms of time-varying parameters related to the transfer function of the vocal tract and the characteristics of the excitation. The speech wave, sampled at 10 kHz, is analyzed by predicting the present speech sample as a linear combination of the 12 previous samples. The 12 predictor coefficients are determined by minimizing the mean-squared error between the actual and the predicted values of the speech samples. Fifteen parameters (namely, the 12 predictor coefficients, the pitch period, a binary parameter indicating whether the speech is voiced or unvoiced, and the rms value of the speech samples) are derived by analysis of the speech wave, encoded and transmitted to the synthesizer. The speech wave is synthesized as the output of a linear recursive filter excited by either a sequence of quasiperiodic pulses or a white-noise source. Application of this method for efficient transmission and storage of speech signals as well as procedures for determining other speech characteristics, such as formant frequencies and bandwidths, the spectral envelope, and the autocorrelation function, are discussed.
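The core step described here, choosing predictor coefficients that minimize the mean-squared prediction error, can be sketched as follows. This is an illustrative sketch under stated assumptions and not the procedure of the paper: it uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to solve the resulting normal equations. The function names and the synthetic AR(2) test signal are hypothetical; only the default order of 12 comes from the abstract.

```python
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a[0..order-1], where the prediction is
    x_hat[n] = sum_k a[k-1] * x[n-k]. Returns (a, residual error power)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(x, order=12):
    """Linear prediction coefficients of x (order 12 as in the abstract)."""
    return levinson_durbin(autocorrelation(x, order), order)

def synth_ar2(n, a1=1.3, a2=-0.6, seed=0):
    """Synthetic AR(2) signal used only to check the estimator (assumption)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

signal = synth_ar2(5000)
coeffs, err = lpc(signal, order=2)
print("estimated coefficients:", coeffs)
print("residual power:", err)
```

On the synthetic signal the estimates should land close to the generating values 1.3 and -0.6. For real speech one would typically window short frames before calling `lpc`, a step omitted here.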

NONLINEAR SPEECH PROCESSING: OVERVIEW AND APPLICATIONS

From a physics and mathematics viewpoint, in the traditional linear approach to speech modeling the true nonlinear physics of speech production are approximated via the standard assumptions of linear acoustics and 1D plane wave propagation of the sound in the vocal tract. Despite the limited technological success of the linear model in several applications, there is strong theoretical and experimental evidence for the existence of important nonlinear 3D fluid dynamics phenomena during speech production that cannot be accounted for by the linear model. Examples of such phenomena include modulations of the speech airflow and turbulence. presents several physical measures that show turbulences in the airflow. The main arguments that provide evidence of nonlinearities in the speech signal are:

A Comparative Study of Speech Modeling Methods

2007

In this paper, we present two stochastic methods for identifying the parameters of a physical process with unbiased estimates, based on whitening the prediction error. These methods incorporate a recursive procedure that makes successive corrections in determining a linear mathematical model, based on the observed data, to represent the system considered. The ARMA model is a typical example. Several simulation tests were carried out to show the capabilities of the least mean squares (LMS) algorithm and the stochastic Newton algorithm. An application is then provided to identify the parameters of the AR model corresponding to a speech signal. Key-Words: Modeling, Identification, Unbiased estimator, Newton method, Gradient method, Speech signal.
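As a rough illustration of the recursive "successive corrections" idea, the sketch below runs a plain LMS update to identify the coefficients of an AR model. It is a minimal sketch, not the authors' algorithm: the step size `mu`, the function names, and the synthetic AR(2) data are assumptions, and the stochastic Newton variant mentioned in the abstract is not shown.

```python
import random

def lms_ar(x, order, mu):
    """Identify AR coefficients with the LMS recursion.

    At each sample the current weights predict x[n] from its past values,
    and the weights are nudged along the prediction error times the inputs.
    """
    w = [0.0] * order
    for n in range(order, len(x)):
        past = [x[n - k] for k in range(1, order + 1)]
        error = x[n] - sum(wk * pk for wk, pk in zip(w, past))
        w = [wk + mu * error * pk for wk, pk in zip(w, past)]
    return w

def synth_ar2(n, a1=1.3, a2=-0.6, seed=0):
    """Synthetic AR(2) process with unit-variance Gaussian input (assumption)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

weights = lms_ar(synth_ar2(20000), order=2, mu=0.001)
print("LMS estimate of (a1, a2):", weights)
```

The step size trades convergence speed against steady-state jitter in the weights. The value used here is small relative to the signal power so that the final weights settle near the generating coefficients.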

Performances of Least Squares Estimator for Modeling a Speech Signal

2009

This paper experimentally explores the computational properties of the least squares method. It is successfully applied to identify, with unbiased estimates, the parameters of a physical process based on whitening the prediction error, and to establish its performance (bias, variance, and mean squared error). Then, for every data observation set and missing pattern, the estimated parameters can be compared with those originally generated. An application to modeling a speech signal is presented.
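The three performance measures named here can be estimated by simulation when the generating parameters are known. The following is a hedged sketch of that idea and not the experiment of the paper: it repeatedly generates a first-order AR process with a known coefficient, fits it by least squares, and summarizes the estimates. The model order, sample sizes, and function names are assumptions made for illustration.

```python
import random

def ls_ar1(x):
    """Least-squares estimate of a in x[n] = a * x[n-1] + e[n]."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den

def monte_carlo(a_true=0.8, n_samples=500, n_trials=200, seed=0):
    """Return (bias, variance, mse) of the LS estimator over repeated trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        x = [0.0]
        for _ in range(n_samples):
            x.append(a_true * x[-1] + rng.gauss(0.0, 1.0))
        estimates.append(ls_ar1(x))
    mean = sum(estimates) / n_trials
    bias = mean - a_true
    variance = sum((e - mean) ** 2 for e in estimates) / n_trials
    mse = sum((e - a_true) ** 2 for e in estimates) / n_trials
    return bias, variance, mse

bias, variance, mse = monte_carlo()
print("bias:", bias, "variance:", variance, "MSE:", mse)
```

With these definitions the mean squared error equals the squared bias plus the variance, which gives a quick consistency check on the simulation.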

Linear prediction analysis/synthesis & noise cancellation techniques in speech signals

1980


Identification of nonlinear oscillator models for speech analysis and synthesis

2005

More than ten years ago the first successful application of a nonlinear oscillator model to high-quality speech signal processing was reported (Kubin and Kleijn, 1994). Since then, numerous developments have been initiated to turn nonlinear oscillators into a standard tool for speech technology. The present contribution reviews and compares several of these attempts, with a special emphasis on adaptive model identification from data and the approaches to the associated machine learning problems. This includes Bayesian methods for the regularization of the parameter estimation problem (including the pruning of irrelevant parameters) and Ansatz library (Lainscsek et al., 2001) based methods (structure selection of the model). We conclude with the observation that these advanced identification methods need to be combined with a thorough background in speech science to succeed in practical modeling tasks. This chapter corresponds to talks given at the COST 277 summer school at IIASS in Vietri sul Mare (IT), in Sept. 2004. We would sincerely like to thank Anna Esposito for organizing the summer school, and for her patience in editing this publication.