AutoSSR: an efficient approach for automatic spontaneous speech recognition model for the Punjabi Language
Related papers
Speech is one of the easiest and fastest ways to communicate. Recognition of speech by computer for various languages is a challenging task. The accuracy of automatic speech recognition (ASR) systems remains one of the key challenges, even after years of research. Accuracy varies with speaker and language variability, vocabulary size and noise. It also depends on design choices such as the speech database, feature extraction techniques and performance evaluation. This paper describes the development of a speaker-independent isolated-word automatic speech recognition system for the Indian English language. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the most commonly used English words in everyday life. The speech database includes recordings of 76 Punjabi speakers (northwest Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 GMMs (Gaussian Mixture Models).
Punjabi Speech Recognition: A Survey
Punjabi is one of the most widely used languages in media and communication, so Punjabi speech recognition is the need of the hour. This paper therefore surveys work on Punjabi speech recognition, from boundary detection for isolated word recognition, viewed from a historical perspective, through to the present scenario. The work surveyed, however, is limited by constraints and assumptions. The paper discusses the related work and future challenges.
ASRoIL: a comprehensive survey for automatic speech recognition of Indian languages
Artificial Intelligence Review, 2019
India is a land of language diversity, with 22 major languages having more than 720 dialects, written in 13 different scripts. Of these 22, Hindi, Bengali and Punjabi are ranked the 3rd, 7th and 10th most spoken languages around the globe. Except for Hindi, where some significant research is under way, the other two major languages and the remaining Indian languages do not have fully developed automatic speech recognition systems. The main aim of this paper is to provide a systematic survey of the existing literature related to automatic speech recognition (i.e. speech to text) for Indian languages. The survey analyses the possible opportunities, challenges, techniques and methods, and aims to locate, appraise and synthesize the evidence from studies to provide empirical answers to the scientific questions. The survey was conducted based on the relevant research articles published from 2000 to 2018. Its purpose is to sum up the best available research on automatic speech recognition of Indian languages by synthesizing the results of several studies.
Speech Recognition Technology: A Survey on Indian Languages
This paper presents a brief survey of Automatic Speech Recognition (ASR) and discusses the major themes and advances made in the past 70 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. Despite years of research and consequent progress in the accuracy of ASR, accuracy remains one of the most important research challenges and calls for further research. The design of a speech recognition system, therefore, depends on the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, database, language models and performance evaluation. The problems that persist in ASR and the various techniques developed by researchers to solve these problems are presented in chronological order. This work highlights the contributions in the area of speech recognition, with special reference to Indian languages. The objective of this survey is to summarize and compare some of the well-known methods and toolkits used in various stages of a speech recognition system and also to identify research topics and applications which are at the forefront of this exciting and challenging field.
Speech Corpus Development for a Speaker Independent Spontaneous Urdu Speech Recognition System
2010
This paper reports the design and development of an 82 speaker Urdu speech corpus for speaker independent spontaneous speech recognition using the CMU Sphinx Open Source Toolkit for Speech Recognition. The corpus consists of 45 hours of spontaneous and read speech data from 82 speakers (42 male and 40 female), recorded over a microphone and a telephone line. The speech was collected from speakers ranging from 20 to 55 years of age. Recording sessions were conducted in office and home environments.
AN OVERVIEW OF HINDI SPEECH RECOGNITION
In this age of information technology, convenient access to information has gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to be able to carry out spoken dialogue with a computer [1]. A speech recognition system permits ordinary people to speak to the computer to retrieve information. It is desirable to have human-computer dialogue in the local language. Hindi, being the most widely spoken language in India, is the natural primary candidate for human-machine interaction. There are five pairs of vowels in Hindi; in each pair, one member is longer than the other. This paper gives an overview of speech recognition systems, of how speech is produced, and of the properties and characteristics of Hindi phonemes.
Automatic Speech Recognition System for Hindi Utterances with Regional Indian Accents: A Review
This paper presents a study of automatic speech recognition systems for Hindi utterances with regional Indian accents. In paper [3] we designed a MATLAB-based ASR and control system for eight English key words using a simple rule base. This rule-based algorithm is the beginning stage for key word recognition. In another paper we designed a Hindi key word recognition system for a home automation system using MFCC and DTW. Features of the speech signal are extracted in the form of MFCC coefficients, and Dynamic Time Warping (DTW) is used as the feature matching technique. The recognition results are tested on clean and noisy test data. Average accuracy for clean data is 97.50%, while that for noisy data is 91.25%. Because detecting the correct utterance in a noisy environment remains a problem, we now review different papers to identify techniques for designing our ASR control system for Hindi key words using MFCC and DTW in noisy environments.
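The DTW matching step named in the abstract above can be illustrated with a minimal sketch. It assumes each utterance has already been converted to a sequence of feature vectors (for example MFCC frames) and that one reference template is stored per key word. The function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the toy labels are illustrative assumptions, not details taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance and the
    classic step pattern (insertion, deletion, match) is used.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Toy one-dimensional "features"; real systems would use MFCC vectors.
templates = {
    "word_a": [[0.0], [1.0], [2.0]],   # hypothetical key word template
    "word_b": [[5.0], [5.0], [5.0]],   # hypothetical key word template
}
query = [[0.0], [1.0], [1.0], [2.0]]   # same shape as word_a, spoken more slowly
best = recognize(query, templates)
```

Because DTW lets one frame align with several frames of the other sequence, the slower query still matches `word_a` at zero cost, which is the property that makes it useful for template-based key word recognition.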
An Empirical Approach for Optimization of Acoustic Models in Hindi Speech Recognition Systems
A well-established paradigm for developing an automatic speech recognition (ASR) system is feature extraction at the front end and likelihood evaluation of feature vectors using hidden Markov models (HMMs) with Gaussian mixtures at the back end. To reduce the overall computational overhead and to handle HMM parameters properly, the appropriate selection of Gaussian mixtures and tied states is very important. This paper reviews the statistical framework and presents an empirical approach to select the optimum number of Gaussian mixtures and an appropriate degree of state tying in the context of the small amount of training data usually available for Indian languages, specifically Hindi. At the front end we have used perceptual linear prediction (PLP) combined with relative spectral (RASTA) processing for our proposed Hindi speech recognition system.
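The likelihood evaluation mentioned in the abstract above can be sketched for a single HMM state as the log-likelihood of one feature vector under a Gaussian mixture. This is only an illustration: the diagonal covariance structure and the names `log_gaussian_diag` and `gmm_log_likelihood` are assumptions, and the paper's actual model configuration is not given here.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance (assumed)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the largest term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy two-component mixture over 2-dimensional features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
score_near = gmm_log_likelihood([0.1, -0.2], weights, means, variances)
score_far = gmm_log_likelihood([10.0, 10.0], weights, means, variances)
```

The cost of this evaluation grows linearly with the number of mixture components, which is one reason the number of Gaussians per state matters for computational overhead as well as for accuracy on small training sets.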
Urdu Speech Corpus and Preliminary Results on Speech Recognition
Language resources for the Urdu language are not well developed. In this work, we summarize our work on the development of an Urdu speech corpus for isolated words. The corpus comprises 250 isolated words of Urdu recorded by ten individuals. The speakers include both native and non-native, male and female individuals. The corpus can be used for both speech and speaker recognition tasks. We also report our results on the automatic speech recognition task for the said corpus. The framework extracts Mel Frequency Cepstral Coefficients along with the velocity and acceleration coefficients, which are then fed to different classifiers to perform the recognition task. The classifiers used are Support Vector Machines, Random Forest and Linear Discriminant Analysis. Experimental results show that the best results are provided by the Support Vector Machines, with a test set accuracy of 73%. The results reported in this work may provide a useful baseline for future research on automatic speech recognition of Urdu.
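The velocity and acceleration coefficients mentioned in the abstract above are commonly computed with a regression over neighbouring frames. The sketch below assumes that standard regression formula, a window of two frames on each side, and edge padding by repeating the first and last frame; the names `delta` and `add_dynamic_features` are illustrative, and the paper does not state which variant it used.

```python
def delta(frames, width=2):
    """Regression-based delta coefficients for a list of feature frames.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond the edges are replaced by the first or last frame (assumed).
    """
    total = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(total):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, total - 1)][k]
                prv = frames[max(t - n, 0)][k]
                num += n * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(frames, width=2):
    """Append velocity (delta) and acceleration (delta-delta) to each frame."""
    velocity = delta(frames, width)
    acceleration = delta(velocity, width)
    return [c + v + a for c, v, a in zip(frames, velocity, acceleration)]


# Toy input: five frames of a single coefficient rising linearly.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
augmented = add_dynamic_features(ramp)
```

Each output frame is three times as long as the input frame (static, velocity, acceleration), so 13 static coefficients would give the 39-dimensional vectors often fed to classifiers such as the ones listed above.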
Chhattisgarhi speech corpus for research and development in automatic speech recognition
Automatic speech recognition (ASR) is a computerized interface which allows humans to communicate with machines through natural conversation. ASR has a wide range of applications in various fields, such as language development in young children, telecommunications, and assistive devices for the hearing impaired. The performance of an ASR system is greatly influenced by the database used for its implementation. In this paper, we discuss building a speech corpus for a rare but important Indian dialect, Chhattisgarhi. This speech corpus consists of 100 unique isolated words and four speech scripts aggregating 67 sentences, recorded from a total of 478 native speakers. The words were selected from the English to Chhattisgarhi dictionary published by Chhattisgarh Rajbhasha Aayog, and the scripts from Chhattisgarhi literature and newspaper articles. The dataset was collected by travelling over 60% of the geographical area of Chhattisgarh state. As a result, a valuable speech corpus has been prepared for Chhattisgarhi for the first time, with the aim of advancing speech research. Successful speech recognition experiments for both isolated and continuous speech samples have been demonstrated on the prepared database.