Integrating speech with keypad input for automatic entry of spelling and pronunciation of new words

A comparison of speech and typed input

1990

Meaningful evaluation of spoken language interfaces must be based on detailed comparisons with an alternate, well-understood input modality, such as the keyboard. This paper presents an empirical study in which users were asked to enter digit strings into the computer by voice and by keyboard. Two different ways of verifying and correcting the spoken input were also examined using either voice or keyboard. Timing analyses were performed to determine which aspects of the interface were critical to speedy completion of the task. The results show that speech is preferable for strings that require more than a few keystrokes. The results emphasize the need for fast and accurate speech recognition, but also demonstrate how error correction and input validation are crucial components of a speech interface. Although the performance of continuous speech recognizers has improved significantly in recent years [6], few application programs using such technology have been built. This discrepancy is based on the fallacy of equating speech recognition performance with the usability of a spoken language application. Clearly, the accuracy of the speech recognition component is a key factor in the usability of a spoken language system. However, other factors come into play when we consider a recognition system in the context of live use.

IJERT-Assiting Voice Input Recognition Application

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/assiting-voice-input-recognition-application https://www.ijert.org/research/assiting-voice-input-recognition-application-IJERTV3IS051239.pdf Assisting Voice Input Recognition Application (AVIRA for short) is a multifunctional software application capable of managing a computer's basic operations. Its user interface receives the user's voice as input, processes it, and performs the requested operation. It communicates back to the user via synthesized speech. The application accepts input in English and responds in English. The system is expected to be especially useful to differently abled and visually challenged persons who cannot read but want to access information in their education and in other aspects of daily life. Visually challenged persons normally acquire knowledge and exchange information with others mainly through speech and writing. This system is intended to help them in their education by making the computer more user friendly. The paper is based on the premise that the computer can interact with users and fulfill their needs. We introduce a new approach that combines speech recognition and speech synthesis for two-way communication, so that the user feels as if they are getting a reply from another person. A visually challenged person can then interact with computer systems without help from others. We have thus made an effort to create an intelligent speech processing environment in which the computer takes the user's speech as input and replies with an artificially generated, synthesized voice, which makes the process easy.

T12: an advanced text input system with phonetic support for mobile devices

2005

The popular T9 text input system for mobile devices uses a predictive dictionary-based disambiguation scheme, enabling a user to type in commonly-used words with low overhead. We present a new text input system called T12, which in addition to providing T9's capabilities, also allows a user to cycle through the possible choices based on phonetic similarity, and to elaborate commonly used abbreviations, acronyms and other short forms. This ability to cycle through the possible choices acts as a spelling checker, which provides suggestions from the dictionary with similar pronunciation as the input word.
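The dictionary-based disambiguation idea described above can be illustrated with a minimal sketch. It assumes the standard letter layout of a phone keypad and a tiny made-up word list with invented frequencies; the names `build_index` and `candidates` are hypothetical and are not taken from T9 or T12. It models only the key-sequence lookup, not T12's phonetic cycling or abbreviation expansion.

```python
from collections import defaultdict

# Standard phone keypad letter groups (assumed layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Map a word to the digit sequence typed with one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_freqs):
    """Group dictionary words by digit sequence, most frequent first."""
    index = defaultdict(list)
    for word, freq in word_freqs.items():
        index[to_digits(word)].append((freq, word))
    return {
        digits: [w for _, w in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for digits, entries in index.items()
    }


def candidates(index, digits):
    """Return the words the user can cycle through for a digit sequence."""
    return index.get(digits, [])


# Hypothetical dictionary with invented frequencies, for illustration only.
WORDS = {"home": 50, "good": 80, "gone": 30, "hood": 10, "cat": 5}
INDEX = build_index(WORDS)
print(candidates(INDEX, "4663"))
```

Here the four words "good", "home", "gone" and "hood" share the key sequence 4663, so the user would cycle through them in frequency order; a system like the one described would add further candidates chosen by phonetic similarity.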

Usability field-test of a spoken data-entry system

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1999

This paper reports on the field test of a speech-based data-entry system developed as a follow-up to an EC-funded project. The application domain is the data entry of personnel absence records from a huge historical paper file (about 100,000 records). The application was required by the personnel office of a public administration. The tested system proved both simple enough to make a detailed analysis feasible and sufficiently representative of the potential of spoken data entry.

Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices

2012

Dictation using speech recognition could potentially serve as an efficient input method for touchscreen devices. However, dictation systems today follow a mentally disruptive speech interaction model: users must first formulate utterances and then produce them, as they would with a voice recorder. Because utterances do not get transcribed until users have finished speaking, the entire output appears and users must break their train of thought to verify and correct it.

Speech Technology


Design and Evaluation of a Spoken-Feedback Keyboard

2007

Speech recognition technologies have come a long way in the past generation. Indeed, they are becoming ever more pervasive in our day-to-day lives, especially in the form of voice-activated menus so prevalent in many automated answering systems.

Applying Prediction Techniques to Phoneme-based AAC Systems

It is well documented that people with severe speech and physical impairments (SSPI) often experience literacy difficulties, which hinder them from effectively using orthographic-based AAC systems for communication. To address this problem, phoneme-based AAC systems have been proposed, which enable users to access a set of spoken phonemes and combine phonemes into speech output. In this paper we investigate how prediction techniques can be applied to improve user performance of such systems. We have developed a phoneme-based prediction system, which supports single phoneme prediction and phoneme-based word prediction using statistical language models generated using a crowdsourced AAC-like corpus. We incorporated our prediction system into a hypothetical 12-key reduced phoneme keyboard. A computational experiment showed that our prediction system led to 56.3% average keystroke savings. Keystroke Savings (KS) is defined as the percentage of keystrokes that the user saves by using prediction methods compared to using the MULTITAP method.
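The verbal definition of keystroke savings can be sketched in a few lines. The excerpt does not include the paper's formula, so the expression below is one straightforward reading of the definition (saved keystrokes as a percentage of the multitap baseline). The multitap counter uses a standard letter keypad purely for illustration, whereas the paper's keyboard is phoneme-based; all function names are hypothetical.

```python
# Letter groups of a standard phone keypad (an assumption for illustration;
# the paper's 12-key keyboard carries phonemes, not letters).
MULTITAP_GROUPS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]


def multitap_keystrokes(word):
    """Count key presses when each letter needs 1..n presses of its key."""
    total = 0
    for ch in word.lower():
        for group in MULTITAP_GROUPS:
            if ch in group:
                total += group.index(ch) + 1  # position on the key, 1-based
                break
        else:
            raise ValueError(f"no key for character {ch!r}")
    return total


def keystroke_savings(baseline, with_prediction):
    """Percentage of baseline keystrokes saved by the prediction method."""
    if baseline <= 0:
        raise ValueError("baseline keystroke count must be positive")
    return 100.0 * (baseline - with_prediction) / baseline


baseline = multitap_keystrokes("hello")
print(baseline)
print(keystroke_savings(10, 4))
```

For example, if a message costs 10 presses with multitap and 4 presses with prediction, this reading of the definition gives a saving of 60%.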

Automatic acquisition of names using speak and spell mode in spoken dialogue systems

Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03, 2003

This paper describes a novel multi-stage recognition procedure for deducing the spelling and pronunciation of an open set of names. The overall goal is the automatic acquisition of unknown words in a human-computer conversational system. The names are spoken and spelled in a single utterance, achieving a concise and natural dialogue flow. The first recognition pass extracts letter hypotheses from the spelled part of the waveform and maps them to phonemic hypotheses via a hierarchical sublexical model capable of generating grapheme-phoneme mappings. A second recognition pass determines the name by combining information from the spoken and spelled part of the waveform, augmented with language model constraints. The procedure is integrated into a spoken dialogue system where users are asked to enroll their names for the first time. The acquisition process is implemented in multiple parallel threads for real-time operation. Subsequent to inducing the spelling and pronunciation of a new name, a series of operations automatically updates the recognition and natural language systems to immediately accommodate the new word. Experiments show promising results for letter and phoneme accuracies on a preliminary dataset.

User Interaction with Word Prediction

ACM Transactions on Accessible Computing, 2009

Word prediction systems can reduce the number of keystrokes required to form a message in a letter-based AAC system. It has been questioned, however, whether such savings translate into an enhanced communication rate due to the additional overhead (e.g., shifting of focus and repeated scanning of a prediction list) required in using such a system. Our hypothesis is that word prediction has high potential for enhancing AAC communication rate, but the amount is dependent in a complex way on the accuracy of the predictions. Due to significant user interface variations in AAC systems and the potential bias of prior word prediction experience on existing devices, this hypothesis is difficult to verify. We present a study of two different word prediction methods compared against letter-by-letter entry at simulated AAC communication rates. We find that word prediction systems can in fact speed communication rate (an advanced system gave a 58.6% improvement), and that a more accurate word p...