Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks
Achieving high accuracy in speech recognition has long been a major challenge in Natural Language Processing, and the predominant model for this task has been the GMM-HMM. With the rise of deep learning, however, neural models have overtaken this earlier approach: advances in parallel processing and GPU computing have enabled deep learning methods to outperform the GMM-HMM. This paper evaluates the performance of a deep learning algorithm, the Convolutional Neural Network (CNN), on a dataset of audio (.wav) files recording the recitation of the numerals 0 to 100 in the Punjabi language. The network's accuracy is evaluated on two versions of the dataset, one with noise reduction and one without. The CNN outperforms the baseline GMM-HMM, reducing the error rate by 3.23% on the noise-reduced data and by 3.76% on the data without noise reduction.
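To make the CNN component concrete, the following is a minimal forward-pass sketch in pure Python of how a convolutional network can map one utterance to one of the 101 numeral classes (0 to 100). The abstract does not describe the architecture, input features, or layer sizes, so everything here is an illustrative assumption and not the paper's model: the single 1-D convolutional layer with ReLU, global max pooling over time, one dense layer with softmax, the 13-features-per-frame input, and the helper names `conv1d`, `global_max_pool`, `dense`, `softmax`, `init_params`, and `classify_numeral` are all hypothetical. The weights are random and untrained, so the prediction is meaningless; only the data flow is illustrated.

```python
import math
import random

NUM_CLASSES = 101  # numerals 0 to 100 inclusive, matching the dataset description

def conv1d(frames, kernels, biases):
    """Valid 1-D convolution over time, followed by ReLU.

    frames:  list of T feature vectors (one per audio frame), each of length F
    kernels: list of C kernels, each a list of K vectors of length F
    biases:  list of C floats
    Returns a list of (T - K + 1) vectors, each of length C.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kernel, b in zip(kernels, biases):
            s = b
            for dt in range(k):
                s += sum(w * x for w, x in zip(kernel[dt], frames[t + dt]))
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out

def global_max_pool(feature_maps):
    """Keep the maximum of each channel over time, giving one vector of length C."""
    return [max(channel) for channel in zip(*feature_maps)]

def dense(x, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [b + sum(w * v for w, v in zip(row, x)) for row, b in zip(weights, biases)]

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(num_features, num_channels, kernel_size, seed=0):
    """Random, untrained parameters (a real system would learn these)."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0, 0.1) for _ in range(num_features)]
                for _ in range(kernel_size)] for _ in range(num_channels)]
    conv_b = [0.0] * num_channels
    fc_w = [[rng.gauss(0, 0.1) for _ in range(num_channels)] for _ in range(NUM_CLASSES)]
    fc_b = [0.0] * NUM_CLASSES
    return kernels, conv_b, fc_w, fc_b

def classify_numeral(frames, params):
    """Return (predicted numeral, class probabilities) for one utterance."""
    kernels, conv_b, fc_w, fc_b = params
    pooled = global_max_pool(conv1d(frames, kernels, conv_b))
    probs = softmax(dense(pooled, fc_w, fc_b))
    best = max(range(NUM_CLASSES), key=probs.__getitem__)
    return best, probs

# Usage with a synthetic utterance: 50 frames x 13 features (hypothetical sizes).
rng = random.Random(1)
utterance = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(50)]
params = init_params(num_features=13, num_channels=8, kernel_size=5)
label, probs = classify_numeral(utterance, params)
```

Global max pooling is used here only because it turns utterances of different lengths into a fixed-size vector with very little code; a trained system would typically stack several convolutional layers and learn the weights from the labeled recordings.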