Time–Frequency Analysis of Vietnamese Speech Inspired on Chirp Auditory Selectivity

Abstract

In speech analysis, the pitch, or fundamental frequency, is usually treated as a parameter characterizing the vocal cord excitation, but it plays almost no role in the time–spectral analysis of the speech signal itself. In this paper, we present a novel speech analysis approach in which the pitch and its variation over time play a leading role. The pitch and the pitch rate are computed within each segment by minimizing Huber's loss over the short-time correlation according to a second-order polynomial fitting law. The proposed method is integrated with the Fan-Chirp transform and the Spectral All-Pole Estimation method, both previously proposed by the authors. Results on Vietnamese speech reveal the advantages of the proposed analysis methodology over the popular linear prediction estimation. Finally, the paper discusses the possible impact of the proposed method on speech coding, which is the subject of upcoming research.
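The abstract does not spell out how the Huber-loss minimization is carried out, so the following is only a generic sketch of the underlying idea: fitting a second-order polynomial under Huber's loss so that a few gross errors do not dominate the fit. Everything in it is an assumption made for illustration, not the authors' procedure: the function name `huber_polyfit2`, the threshold `delta`, the use of iteratively reweighted least squares (IRLS) as the solver, and the synthetic "pitch track" with artificial outliers. In particular, it fits sample values directly and does not reproduce the paper's objective, which is defined over the short-time correlation.

```python
import numpy as np

def huber_polyfit2(t, y, delta=1.0, n_iter=50, tol=1e-10):
    """Fit y ~ c0 + c1*t + c2*t**2 under Huber's loss using IRLS.

    Hypothetical helper for illustration; not taken from the paper.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.vander(t, 3, increasing=True)          # columns: 1, t, t^2
    coef = np.linalg.lstsq(A, y, rcond=None)[0]   # ordinary least-squares start
    for _ in range(n_iter):
        r = y - A @ coef
        absr = np.abs(r)
        # Huber weights: 1 in the quadratic zone, delta/|r| in the linear zone
        w = np.where(absr <= delta, 1.0, delta / np.maximum(absr, 1e-12))
        sw = np.sqrt(w)
        new_coef = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef

# Synthetic example (assumed values): a smooth quadratic track in Hz over
# normalized time, corrupted by a handful of gross errors.
t = np.linspace(-1.0, 1.0, 41)
true_track = 120.0 + 15.0 * t - 8.0 * t**2
y = true_track.copy()
y[[5, 6, 20, 33, 34]] += 60.0                     # artificial outliers

A = np.vander(t, 3, increasing=True)
coef_ls = np.linalg.lstsq(A, y, rcond=None)[0]    # plain least squares
coef_huber = huber_polyfit2(t, y, delta=1.0)      # robust fit

err_ls = np.max(np.abs(A @ coef_ls - true_track))
err_huber = np.max(np.abs(A @ coef_huber - true_track))
print(f"max deviation, least squares: {err_ls:.3f} Hz")
print(f"max deviation, Huber (IRLS):  {err_huber:.3f} Hz")
```

In this sketch the three coefficients play the role of a local value, slope, and curvature. The appeal of Huber's loss in this setting is that it behaves quadratically for small residuals and linearly for large ones, so isolated gross errors pull the fit far less than they would under plain least squares.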

Author information

Authors and Affiliations

  1. Commission for Scientific Visualisation, Austrian Academy of Sciences, Donau-City Strasse 1, 1220, Vienna, Austria
    Ha Nguyen & Luis Weruaga

Editor information

Editors and Affiliations

  1. Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
    Tu-Bao Ho
  2. Department of Computer Science & Technology, Nanjing University, 22 Hankou Road, 210093, China
    Zhi-Hua Zhou

Rights and permissions

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, H., Weruaga, L. (2008). Time–Frequency Analysis of Vietnamese Speech Inspired on Chirp Auditory Selectivity. In: Ho, T.-B., Zhou, Z.-H. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_28
