Shweta Sinha | Birla Institute of Technology, Mesra (Ranchi), India
Papers by Shweta Sinha
This paper describes the collection of a text and audio corpus for mobile personal communication in Hindi. Hindi is the largest of the Indian languages and is the first language of more than 200 million people, who use it not only for spoken mobile communication but also for sending text messages to each other. The main script for Hindi is Devanagari, but it is not well supported by the current generation of mobile devices. The Devanagari alphabet is about twice the size of the English alphabet, which makes it difficult to fit onto the small keypad of a mobile device. The aim of this project is to collect text and speech resources that can be used to train spoken language systems that aid text messaging on mobile devices, i.e. to train a speech recogniser for the mobile personal communication domain so that text can be entered by dictation rather than by typing. In total we collected a text corpus of 2 million words of natural messages in 12 different domains, and a spoken corpus of 100 speakers who each spoke 630 phonetically rich sentences - about 4 hours of speech. The speech utterances were recorded at 16 kHz through 3 recording channels: a mobile phone, a headset and a desktop-mounted microphone. The data sets are annotated and available for the development of speech recognition and synthesis systems in the mobile domain.
State-of-the-art automatic speech recognition systems use Mel frequency cepstral coefficients for feature extraction together with Gaussian mixture models for acoustic modeling, but there is no standard value for the number of mixture components used in the recognition process. The current choice of mixture components is arbitrary, with little justification. Moreover, the values established for European languages cannot be carried over to Hindi speech recognition, because the databases available for the languages differ in size. Parameter estimation with too many or too few components may estimate the mixture model poorly. The number of mixture components is therefore important for the initial estimation in the expectation-maximization process. In this research work, we estimate the number of Gaussian mixture components for a Hindi database based on the size of the vocabulary. Mel frequency cepstral features and perceptual linear predictive features, along with their extended variations with delta-delta-delta features, have been used to evaluate this number based on the optimal recognition score of the system. A comparative analysis of the recognition performance of both feature extraction methods on a medium-size Hindi database is also presented in this paper. Heteroscedastic linear discriminant analysis (HLDA) has been used as the feature reduction technique, and its impact on the recognition score is also highlighted.
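The "extended variations with delta-delta-delta features" mentioned above can be illustrated with a minimal sketch. The abstract does not say how the dynamic features were computed, so the code below assumes the common regression formula for delta coefficients, a window of 2 frames, and edge frames padded by repetition. The function names `delta` and `add_dynamic_features` are hypothetical and are not taken from the paper or from any toolkit.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames.

    Assumed formula (not confirmed by the abstract):
        d[t] = sum_{n=1..W} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..W} n^2)
    Frames beyond either end are replaced by the first or last frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]   # clamp at the end
                behind = frames[max(t - n, 0)][d]     # clamp at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static, order=3, window=2):
    """Append delta streams up to `order` (3 = delta-delta-delta) to each frame."""
    streams = [static]
    for _ in range(order):
        # Each higher-order stream is the delta of the previous stream.
        streams.append(delta(streams[-1], window))
    # Concatenate all streams frame by frame.
    return [sum((s[t] for s in streams), []) for t in range(len(static))]


if __name__ == "__main__":
    # Hypothetical input: 5 frames of 13 static coefficients each.
    static = [[float(t + d) for d in range(13)] for t in range(5)]
    extended = add_dynamic_features(static, order=3)
    print(len(extended), len(extended[0]))
```

With 13 static coefficients per frame (an assumed figure), third-order extension yields 52-dimensional vectors, which suggests why a reduction step such as HLDA is applied before acoustic modeling.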