A robust speaker recognition system combining factor analysis techniques

Abstract

In this paper, we implement state-of-the-art factor-analysis-based methods and fuse their scores to obtain a channel-robust speaker recognition system. The two methods are joint factor analysis (JFA) and i-vectors, which define low-dimensional speaker- and channel-dependent spaces. For score fusion, we propose a simple weight computation that requires no training step. We evaluate our method under two conditions: 1) a channel-matched task, with telephone speech in both the training and test phases, and 2) a channel-mismatched task, with telephone speech in the training phase and microphone, GSM, and VoIP speech in the test phase. Our strategies outperform a state-of-the-art GMM-UBM-based system: compared with the standard GMM-UBM method, we obtain an absolute EER improvement of more than 4% in both conditions. Simulation results also show that the combined i-vector and JFA system performs better than every other method implemented.
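The abstract does not state the fusion rule, so the following Python sketch only illustrates the general evaluation pipeline under explicit assumptions. It assumes that i-vector trials are scored by cosine similarity (one common choice, not confirmed by the abstract). It also assumes that each subsystem's scores are standardized over the trial list so JFA and i-vector scores share a scale, and that the standardized scores are combined with fixed, hand-chosen weights. These weights are a hypothetical stand-in for the paper's training-free weight computation. The equal error rate (EER) is estimated by sweeping a decision threshold. All function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine_score(w_enroll: Sequence[float], w_test: Sequence[float]) -> float:
    """Cosine similarity between an enrollment i-vector and a test i-vector."""
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_e = math.sqrt(sum(a * a for a in w_enroll))
    norm_t = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_e * norm_t)


def standardize(scores: Sequence[float]) -> List[float]:
    """Shift and scale one subsystem's scores to zero mean and unit variance.

    Assumption: statistics are taken over the trial list itself, so that
    JFA and i-vector scores live on a comparable scale before fusion.
    """
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0.0:
        return [0.0 for _ in scores]  # all scores identical: nothing to scale
    return [(s - mean) / std for s in scores]


def fuse(score_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of standardized subsystem scores, computed trial by trial.

    The weights are fixed inputs (hypothetical), so no fusion training is done.
    """
    columns = [standardize(s) for s in score_lists]
    n_trials = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n_trials)]


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Estimate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold; the EER is taken
    at the threshold where the miss and false-alarm rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        p_miss = sum(s < t for s in target) / len(target)
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer
```

For example, `fuse([jfa_scores, ivector_scores], [0.5, 0.5])` gives an equal-weight fusion of two hypothetical score lists. `equal_error_rate` can then be applied to the fused scores of the target and non-target trials. Equal weights are the simplest rule that needs no training. Any other training-free weighting would simply replace the `weights` argument.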
