ASR in mobile phones - an industrial approach

Rapid porting of ASR-systems to mobile devices

Interspeech 2005

Portable devices for the consumer market are becoming available in large quantities. Because of their design and use, human speech is often the input modality of choice, for example in car navigation systems or portable speech-to-speech translation devices. In this paper we describe our work in porting our existing desktop-PC-based speech recognition system to an off-the-shelf PDA running Windows CE 3.0. We do this in such a way that our already well-performing language and acoustic models can be taken over without retraining them for the PDA. To achieve acceptable run-time behavior we apply several optimization techniques to the preprocessing and decoding process. Among other things, we introduce the newly developed early feature vector reduction. In this way the execution time of our recognition system is reduced from initially 28x real-time to 2.6x real-time with a tolerable increase in word error rate. The size of the acoustic models is reduced to 25% of their original size.
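The "28x real-time" and "2.6x real-time" figures above are real-time factors: processing time divided by audio duration. The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (1.0 means real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf_before: float, rtf_after: float) -> float:
    """Return how many times faster the system became."""
    if rtf_after <= 0:
        raise ValueError("rtf_after must be positive")
    return rtf_before / rtf_after


# Figures quoted in the abstract above.
BASELINE_RTF = 28.0
OPTIMIZED_RTF = 2.6

if __name__ == "__main__":
    utterance_seconds = 10.0  # hypothetical utterance length, for illustration
    print(f"baseline:  {BASELINE_RTF * utterance_seconds:.0f} s to decode")
    print(f"optimized: {OPTIMIZED_RTF * utterance_seconds:.0f} s to decode")
    print(f"speedup:   {speedup(BASELINE_RTF, OPTIMIZED_RTF):.1f}x")
```

Under these figures the optimized system is roughly 10.8 times faster, but at 2.6x real-time it still needs about 26 seconds to decode a 10-second utterance.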

Evaluation of Phonetic System for Speech Recognition on Smartphone

International Journal of Innovative Technology and Exploring Engineering, 2019

This paper presents a detailed study and performance evaluation of a phonetic system by comparing it with various classification techniques for automatic speech recognition, such as Neural Networks, Hidden Markov Models, Support Vector Machines and Gaussian Mixture Models. In the phonetic system, recognized speech is processed using language processing, i.e. matching phonemes, and hence generates more correct output text. The speech recognition accuracy of the ASR classifiers and the phonetic system is evaluated on day-to-day human-to-machine communication using high-quality recording equipment, while the enhancement of existing systems is carried out on everyday Android phones and evaluated on normal conversations in Hindi and English. A classifier is used to classify the fragmented phonemes or words after the fragmentation of the speech signal. Different classification techniques are implemented and the speech recognition accuracy of the different classifiers is compared. It is seen that...
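The abstract does not state which accuracy metric is used. One common way to score recognizer output against a reference transcript is word error rate (WER), computed from a word-level edit distance. The sketch below assumes that metric; the function name and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of three reference words.
    print(word_error_rate("open the map", "open map"))
```

Word accuracy is often reported as 1 minus WER; because insertions count as errors, WER can exceed 1 and that accuracy figure can become negative.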

A robust high accuracy speech recognition system for mobile applications

IEEE Transactions on Speech and Audio Processing, 2002

This paper describes a robust, accurate, efficient, low-resource, medium-vocabulary, grammar-based speech recognition system using Hidden Markov Models for mobile applications. Among the issues and techniques we explore are improving robustness and efficiency of the front-end, using multiple microphones for removing extraneous signals from speech via a new multi-channel CDCN technique, reducing computation via silence detection, applying the Bayesian information criterion (BIC) to build smaller and better acoustic models, minimizing finite state grammars, using hybrid maximum likelihood and discriminative models, and automatically generating baseforms from single new-word utterances.
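The Bayesian information criterion mentioned above trades model fit against model size. A minimal sketch of the standard definition, BIC = k ln(n) - 2 ln(L), follows, with lower values preferred. How the paper applies the criterion to acoustic models is not described in the abstract, and the candidate models and their log-likelihoods below are invented for illustration.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def select_by_bic(candidates, num_samples):
    """Return the name of the candidate with the lowest BIC.

    candidates: iterable of (name, log_likelihood, num_params) tuples.
    """
    return min(candidates, key=lambda c: bic(c[1], c[2], num_samples))[0]


if __name__ == "__main__":
    # Hypothetical mixtures fitted to the same 1000 training frames.
    models = [
        ("2 Gaussians", -5200.0, 10),
        ("4 Gaussians", -5100.0, 20),
        ("8 Gaussians", -5090.0, 40),
    ]
    for name, log_lik, k in models:
        print(f"{name}: BIC = {bic(log_lik, k, 1000):.1f}")
    print("selected:", select_by_bic(models, 1000))
```

In this made-up example the largest model fits slightly better but is penalized for its extra parameters, so the mid-sized model is selected. This is the sense in which BIC can yield "smaller and better" models.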

ASR - A real-time speech recognition on portable devices

IEEE, 2016

This paper presents the implementation of real-time automatic speech recognition (ASR) for portable devices. The speech recognition is performed offline using PocketSphinx, which is the implementation of Carnegie Mellon University's Sphinx speech recognition engine for portable devices. In this work, a machine learning approach is used which converts graphemes into phonemes using TensorFlow's sequence-to-sequence model to produce the pronunciations of words. This paper also explains the implementation of a statistical language model for ASR. The novelty of this ASR system is its offline speech recognition, which requires no Internet connection, unlike other related works. Speech recognition services currently provide cloud-based processing of speech and therefore have access to the speech data of users. In offline ASR, however, the speech is processed on the handheld device, which enhances the privacy of users.
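To illustrate what a statistical language model does in a recognizer, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. It is not the model used in the paper and not PocketSphinx code; the class name, the smoothing choice and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word precedes another
        self.bigram_counts = Counter()   # how often each (previous, word) pair occurs
        self.vocab = set()               # every token that can be predicted
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
            self.vocab.update(tokens[1:])  # "<s>" is never predicted

    def prob(self, prev: str, word: str) -> float:
        """Smoothed P(word | prev)."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator

    def sentence_log_prob(self, sentence: str) -> float:
        """Natural-log probability of a whole sentence."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    model = BigramModel(["call home", "call the office", "open the map"])
    for candidate in ("call the office", "office the call"):
        print(candidate, model.sentence_log_prob(candidate))
```

A decoder can use such scores to prefer word sequences that resemble the training text over acoustically similar but unlikely ones.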

Innovative speech processing for mobile terminals: an annotated bibliography

Signal Processing, 2000

This paper gives an overview of recent bibliographic references dealing with speech processing in mobile terminals. Its purpose is to point out state-of-the-art issues in the area; thus a fairly large list of references taken from many conference proceedings and journals is given and commented on. General considerations about speech processing in mobile communications are firstly introduced; then we deal with audio processing for speech enhancement in mobile terminals and with low bit-rate speech coding. Speech recognition is addressed with some emphasis on mobile applications. A short overview of implementation aspects of speech processing algorithms in mobile terminals is also given. Finally, open issues and problems are listed.

Speech Recognition Native Module Environment Inherent in Mobiles Devices

Lecture Notes in Computer Science, 2015

Applications on mobile devices have been characterized by their usability. Voice is a natural means of interaction between users and mobile devices. Traditional speech recognition algorithms work in controlled environments, are targeted at specific population groups (e.g. age, gender or language, to name a few), and require a lot of computational resources to be effective. Therefore, pattern recognition in mobile applications is performed through web services. However, this type of solution creates a high dependence on Internet connectivity, so it is desirable to have an embedded module for this task that does not consume many computational resources and has a good level of effectiveness. This paper presents an embedded voice recognition module for mobile systems. The module works in noisy environments, works for users of any age, and has been shown to work for several languages.

A ROBUST SPEAKER–INDEPENDENT CPU–BASED ASR SYSTEM

1999

In this paper, a new CPU-based automatic speech recognition (ASR) software package called AlfaNum, with a few chosen heuristics optimized for applications in heterogeneous conditions, is described. AlfaNum is a discrete speaker-independent ASR product intended for use in the largest bank-by-phone interactive voice response (IVR) system in Yugoslavia, with many customers all over Serbia. That means a large variety of dialects, telephone line qualities, and microphones. The system has been tested on 500 speakers and achieved an average accuracy of 98.2% in real-life conditions. The whole software is developed in the C++ programming language. Object-oriented programming gave the software an elegant structure and minimized possible errors. In addition, the power of the C++ language and its tight interaction with the machine made the software fast and efficient.

Telephony Speech Recognition System: Challenges

IJCA Proceedings on National Conference on Communication Technologies Its Impact on Next Generation Computing 2012, 2012

This paper describes the challenges of designing a telephony Automatic Speech Recognition (ASR) system. Telephonic speech data are collected automatically from all geographical regions of West Bengal to cover the major dialectal variations of spoken Bangla. All incoming calls are handled by an Asterisk server, i.e. a computer telephony interface (CTI). The system asks some queries, and users' spoken responses are stored and transcribed manually for ASR system training. In a real-time scenario, telephonic speech contains channel drops, silence or no-speech events, truncated speech signals, noisy signals, etc., along with the desired speech event. This paper describes these kinds of challenges for a telephony ASR system. It also briefly describes some techniques that handle such unwanted signals in telephonic speech to a certain extent and can provide a signal close to the desired speech to the ASR system.
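The abstract does not say which techniques the authors use to handle silence or no-speech events. One simple and widely used option is a frame-energy threshold, sketched below. The frame length (160 samples, i.e. 20 ms at an assumed 8 kHz telephone sampling rate), the -40 dB threshold and the minimum frame count are illustrative assumptions, not values from the paper.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Mean-square energy in dB for each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # The small constant avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(mean_square + 1e-12))
    return energies


def speech_frames(samples, frame_len=160, threshold_db=-40.0):
    """Mark each frame True when its energy exceeds the threshold."""
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]


def is_no_speech(samples, frame_len=160, threshold_db=-40.0, min_speech_frames=3):
    """Treat a recording as a no-speech event if too few frames are active."""
    return sum(speech_frames(samples, frame_len, threshold_db)) < min_speech_frames


if __name__ == "__main__":
    rate = 8000  # assumed sampling rate in Hz
    silence = [0.0] * 1600
    # A 440 Hz tone stands in for speech; samples are scaled to [-1, 1].
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
    print(is_no_speech(silence))
    print(is_no_speech(silence + tone + silence))
```

A fixed threshold like this is fragile on noisy telephone lines; in practice the threshold would likely need to adapt to the measured background level.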