Feature bandwidth extension for Persian conversational telephone speech recognition

The goal of this paper is to configure a complete system for continuous conversational telephone speech recognition in Persian. For this purpose, two common methods, the Gaussian Mixture Model (GMM) and the Neural Network (NN), together with a proposed hybrid GMM-NN method, are considered for estimating full-bandwidth features from band-limited features. The performance of these methods is evaluated with two feature types, one spectral and one cepstral: LFBE and MFCC. The effect of speaker gender on the estimation process is also investigated. Our results show that the best phoneme recognition accuracy is obtained when MFCC features are reconstructed using two gender-dependent neural networks. In this configuration, phoneme accuracy was about 1.6% higher than the baseline. The tests were carried out on the TFarsDat corpus.
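As an illustration of the best-performing configuration described above (two gender-dependent neural networks that map band-limited features to full-bandwidth features), the following is a minimal sketch and not the system used in the paper. It assumes 13-dimensional feature vectors, a single hidden layer trained by plain gradient descent, and synthetic data in place of real MFCC frames; the names `train_mlp`, `estimate_wideband`, and `make_synthetic` are hypothetical.

```python
import numpy as np

# Assumed feature dimensions (hypothetical; not taken from the paper).
N_NB, N_WB = 13, 13


def train_mlp(X, Y, hidden=32, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network X -> Y by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # hidden activations
        P = H @ W2 + b2                         # estimated full-band features
        G = 2.0 * (P - Y) / n                   # gradient of the squared error, averaged over frames
        GH = (G @ W2.T) * (1.0 - H ** 2)        # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2


def estimate_wideband(models, gender, X):
    """Apply the network trained for the given gender to band-limited frames X."""
    W1, b1, W2, b2 = models[gender]
    return np.tanh(X @ W1 + b1) @ W2 + b2


def make_synthetic(seed, n=600):
    """Synthetic stand-in for paired (band-limited, full-band) feature frames."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, N_NB))
    A = rng.normal(0.0, 1.0 / np.sqrt(N_NB), (N_NB, N_WB))
    Y = np.tanh(X @ A) + 0.05 * rng.normal(size=(n, N_WB))
    return X, Y


models, mse, baseline = {}, {}, {}
for seed, gender in enumerate(("female", "male"), start=1):
    X, Y = make_synthetic(seed)                 # a different mapping per gender
    X_tr, Y_tr, X_te, Y_te = X[:500], Y[:500], X[500:], Y[500:]
    models[gender] = train_mlp(X_tr, Y_tr)
    est = estimate_wideband(models, gender, X_te)
    mse[gender] = float(np.mean((est - Y_te) ** 2))
    # Reference point: always predict the mean training frame.
    baseline[gender] = float(np.mean((Y_tr.mean(axis=0) - Y_te) ** 2))
    print(gender, "held-out MSE:", mse[gender], "mean-predictor MSE:", baseline[gender])
```

In a real setup the synthetic pairs would be replaced by time-aligned features extracted from band-limited and full-bandwidth versions of the same utterances, and the gender label would come from corpus metadata or a separate classifier; both are assumptions here.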