A configurable distributed speech recognition system, Biennial Workshop on DSP for In-Vehicle and Mobile Systems

An efficient framework for robust mobile speech recognition services

Acoustics, Speech, and …, 2003

A distributed framework for implementing automatic speech recognition (ASR) services on wireless mobile devices is presented. The framework is shown to scale easily to support a large number of mobile users connected over a wireless network and degrade ...

ASR - A real-time speech recognition on portable devices

IEEE, 2016

This paper presents the implementation of real-time automatic speech recognition (ASR) for portable devices. Speech recognition is performed offline using PocketSphinx, the portable-device implementation of Carnegie Mellon University's Sphinx speech recognition engine. In this work, a machine learning approach converts graphemes into phonemes using TensorFlow's sequence-to-sequence model to produce the pronunciations of words. The paper also describes the implementation of a statistical language model for the ASR system. The novelty of this system is its offline speech recognition, which, unlike related work, requires no Internet connection. Current speech recognition services process speech in the cloud and therefore have access to users' speech data; in the offline ASR presented here, speech is processed on the handheld device itself, which enhances user privacy.
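
For illustration, a minimal offline decoding loop with the classic pocketsphinx-python bindings might look like the sketch below. The model paths and the raw audio file are hypothetical placeholders; the abstract does not state which API the authors used.

```python
from pocketsphinx import Decoder

# Hypothetical model paths: acoustic model, statistical language model,
# and pronunciation dictionary (all bundled with PocketSphinx distributions).
config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')
config.set_string('-lm', 'model/en-us.lm.bin')
config.set_string('-dict', 'model/cmudict-en-us.dict')
decoder = Decoder(config)

# Feed 16 kHz, 16-bit mono PCM to the decoder in chunks (no network needed).
decoder.start_utt()
with open('utterance.raw', 'rb') as f:
    while True:
        buf = f.read(4096)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hyp = decoder.hyp()
print(hyp.hypstr if hyp else '(no hypothesis)')
```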

An Efficient Front-End for Distributed Speech Recognition over Mobile

International Journal of Computer and Communication Engineering, 2012

To improve the robustness of distributed speech front-ends in mobile communication, this paper introduces a new set of feature vectors estimated in three steps. First, Mel-Line Spectral Frequency (MLSF) coefficients are combined with conventional MFCCs, both extracted from acoustic frames denoised with a Wiener filter. Second, the stream weights of multi-stream HMMs are optimized with a discriminative approach. Finally, the features are transformed and reduced in a multi-stream scheme using the Karhunen-Loeve Transform (KLT). Recognition experiments on the Aurora 2 connected-digits database show that the proposed front-end yields a significant improvement in speech recognition accuracy for highly noisy GSM channels.
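
As a rough sketch of the multi-stream idea, the snippet below denoises a signal with a Wiener filter, fuses two per-frame feature streams, and decorrelates and reduces them with a KLT. Delta features stand in for the paper's MLSF stream (whose extraction is not detailed in the abstract), and the file name is hypothetical.

```python
import numpy as np
import librosa
from scipy.signal import wiener

def klt(features, keep):
    # Karhunen-Loeve Transform: project frames onto the top-variance
    # eigenvectors of the feature covariance matrix.
    centered = features - features.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1][:keep]
    return centered @ eigvecs[:, order]

y, sr = librosa.load('utterance.wav', sr=8000)  # hypothetical input file
y = wiener(y, mysize=29)                        # simple Wiener denoising pass

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # stream 1: MFCCs, (13, T)
second = librosa.feature.delta(mfcc)                # stand-in second stream
fused = np.vstack([mfcc, second]).T                 # per-frame fusion, (T, 26)
reduced = klt(fused, keep=13)                       # decorrelate and reduce
```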

DynaSpeak: SRI's Scalable Speech Recognizer for Embedded and Mobile Systems

2002

We introduce SRI's new speech recognition engine, DynaSpeak™, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural-language parsing, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.
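
The abstract does not say which fixed-point format DynaSpeak uses, but a Q15 sketch like the one below illustrates how a feature/weight dot product can be computed entirely in integer arithmetic, the property that matters on embedded processors without floating-point hardware.

```python
Q = 15                      # Q15 fixed point: 1 sign bit, 15 fractional bits
SCALE = 1 << Q

def to_q15(x):
    return int(round(x * SCALE))

def q15_mul(a, b):
    return (a * b) >> Q     # rescale the double-width product back to Q15

# A small integer-only multiply-accumulate, e.g. one term of an
# acoustic score; values here are illustrative only.
weights = [to_q15(w) for w in (0.25, -0.5, 0.125)]
feats   = [to_q15(f) for f in (0.9, 0.1, -0.4)]
acc = 0
for w, f in zip(weights, feats):
    acc += q15_mul(w, f)
print(acc / SCALE)          # ~0.125, matching the floating-point result
```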

Network and embedded applications of automatic speech recognition

2008

ASR (Automatic Speech Recognition) is one of the key technologies in the upcoming fields of Ubiquitous Computing and Ambient Intelligence. In this paper, we first report surveys of processing devices, such as microprocessors and memories, and of communication infrastructure, especially the wireless communication infrastructure relevant to ASR. Second, we report on an embedded version of CSR (Continuous Speech Recognition) software for the use of ASR in mobile environments.

Speech Recognition Native Module Environment Inherent in Mobile Devices

Lecture Notes in Computer Science, 2015

Applications on mobile devices are characterized by their usability, and voice is a natural means of interaction between users and mobile devices. Traditional speech recognition algorithms work in controlled media, are targeted at specific population groups (e.g. by age, gender, or language, to name a few), and require substantial computational resources to be effective. For this reason, pattern recognition in mobile applications is usually performed through web services. However, this kind of solution creates a strong dependence on Internet connectivity, so an embedded module that performs the task without consuming many computational resources, while maintaining a good level of effectiveness, is desirable. This paper presents an embedded voice recognition module for mobile systems. The module works in noisy environments, works for users of any age, and has been shown to work for several languages.

Speech recognition in mobile environments

2000

The growth of cellular telephony, combined with recent advances in speech recognition technology, creates sizeable opportunities for mobile speech recognition applications. Classic robustness techniques previously proposed for speech recognition yield only limited improvements against the degradation introduced by the idiosyncrasies of mobile networks. These sources of degradation include distortion introduced by the speech codec as well as artifacts arising from channel errors and discontinuous transmission.

Speech Recognition System

Speech recognition applications are becoming increasingly useful. Various interactive speech-aware applications are available on the market, but they are usually intended for, and executed on, traditional general-purpose computers. With the growth of embedded computing and the demand for emerging embedded platforms, speech recognition systems (SRS) need to be available on them as well. PDAs and other handheld devices are becoming more powerful and more affordable, and it is now possible to run multimedia on them. Speech recognition emerges as an efficient alternative for such devices, where small screens make typing difficult. This paper characterizes a speech recognition process on the PXA27x XScale processor, a widely used platform for handheld devices, and implements it for performing tasks on media files through a Linux media player, MPlayer.
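
MPlayer's slave mode (mplayer -slave) accepts text commands on stdin, so a recognizer can drive playback by writing commands such as pause or volume. A minimal sketch of this wiring, with a hypothetical command vocabulary and media file; the paper's actual command set is not given in the abstract:

```python
import subprocess

# Launch MPlayer in slave mode so it reads commands from stdin.
player = subprocess.Popen(
    ['mplayer', '-slave', '-quiet', 'song.mp3'],  # hypothetical media file
    stdin=subprocess.PIPE, text=True)

# Map recognized voice commands to MPlayer slave-mode commands.
COMMANDS = {
    'pause':  'pause\n',       # toggle pause
    'louder': 'volume 10\n',   # raise volume by 10
    'softer': 'volume -10\n',  # lower volume by 10
    'stop':   'quit\n',        # exit the player
}

def on_recognized(word):
    cmd = COMMANDS.get(word)
    if cmd:
        player.stdin.write(cmd)
        player.stdin.flush()

on_recognized('pause')  # e.g. an utterance decoded by the recognizer
```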

A distributed framework for enterprise level speech recognition services

2004 IEEE International Conference on Acoustics, Speech, and Signal Processing

This paper presents methods for improving the efficiency of automatic speech recognition (ASR) decoders in multiuser applications. The methods involve allocating ASR resources to service human-machine dialogs in deployments that use many low-cost commodity servers. It is shown that even very simple strategies for efficiently allocating ASR servers to incoming utterances have the potential to double the capacity of a multiuser deployment. This is important because, while a great deal of work has gone into increasing the efficiency of individual ASR engines, little effort has been applied to increasing overall efficiency at peak loads in multiuser scenarios.
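
The abstract does not spell out the allocation strategies, but a least-loaded-server policy is one plausible instance of a "very simple strategy". A minimal sketch:

```python
import heapq

class AsrPool:
    """Assign each incoming utterance to the least-loaded ASR server.

    A hedged sketch only: the paper's actual policies are not described
    in the abstract."""

    def __init__(self, n_servers):
        # Min-heap of (active_decodes, server_id) pairs.
        self.heap = [(0, sid) for sid in range(n_servers)]
        heapq.heapify(self.heap)

    def assign(self):
        load, sid = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + 1, sid))
        return sid

    def release(self, sid):
        # Decrement the finished server's count and restore heap order.
        self.heap = [(l - 1 if s == sid else l, s) for l, s in self.heap]
        heapq.heapify(self.heap)

pool = AsrPool(n_servers=4)
for utt in range(6):
    print('utterance', utt, '-> server', pool.assign())
```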