Integrating speech with keypad input for automatic entry of spelling and pronunciation of new words

A comparison of speech and typed input

1990

Meaningful evaluation of spoken language interfaces must be based on detailed comparisons with an alternate, well-understood input modality, such as the keyboard. This paper presents an empirical study in which users were asked to enter digit strings into the computer by voice and by keyboard. Two different ways of verifying and correcting the spoken input, using either voice or keyboard, were also examined. Timing analyses were performed to determine which aspects of the interface were critical to speedy completion of the task. The results show that speech is preferable for strings that require more than a few keystrokes. The results emphasize the need for fast and accurate speech recognition, but also demonstrate how error correction and input validation are crucial components of a speech interface. Although the performance of continuous speech recognizers has improved significantly in recent years [6], few application programs using such technology have been built. This discrepancy is based on the fallacy of equating speech recognition performance with the usability of a spoken language application. Clearly, the accuracy of the speech recognition component is a key factor in the usability of a spoken language system. However, other factors come into play when we consider a recognition system in the context of live use.

T12: an advanced text input system with phonetic support for mobile devices

2005

The popular T9 text input system for mobile devices uses a predictive dictionary-based disambiguation scheme, enabling a user to type commonly used words with low overhead. We present a new text input system called T12, which, in addition to providing T9's capabilities, also allows a user to cycle through the possible choices based on phonetic similarity and to expand commonly used abbreviations, acronyms, and other short forms. This ability to cycle through the possible choices acts as a spelling checker, offering suggestions from the dictionary whose pronunciation is similar to that of the input word.
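To make the disambiguation-plus-phonetic-cycling idea concrete, here is a minimal sketch, assuming a toy dictionary, the standard 9-key letter mapping, and a crude Soundex-style code as a stand-in for whatever pronunciation similarity measure T12 actually uses; it is illustrative, not the paper's implementation.

```python
# Toy T9-style keypad disambiguation with a phonetic fallback, in the spirit
# of the T12 idea above. Dictionary, keypad map, and the Soundex-style code
# are illustrative assumptions, not the paper's actual design.

KEYPAD = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}
CHAR_TO_KEY = {c: k for k, letters in KEYPAD.items() for c in letters}

DICTIONARY = ["their", "there", "then", "the", "this"]

def key_sequence(word):
    """Digit sequence a user presses for a word on a 9-key pad."""
    return ''.join(CHAR_TO_KEY[c] for c in word.lower())

def phonetic_code(word):
    """Crude Soundex-like code standing in for a pronunciation measure."""
    groups = {'bfpv': '1', 'cgjkqsxz': '2', 'dt': '3',
              'l': '4', 'mn': '5', 'r': '6'}
    digits = ''
    for c in word.lower()[1:]:
        d = next((v for k, v in groups.items() if c in k), '')
        if d and (not digits or digits[-1] != d):
            digits += d
    return (word[0].upper() + digits + '000')[:4]

def candidates(digits):
    """Exact keypad matches first, then phonetically similar dictionary words."""
    exact = [w for w in DICTIONARY if key_sequence(w) == digits]
    codes = {phonetic_code(w) for w in exact}
    similar = [w for w in DICTIONARY
               if w not in exact and phonetic_code(w) in codes]
    return exact + similar

if __name__ == "__main__":
    # Keying "their" (8-4-3-4-7) yields "their" first, then the
    # similarly pronounced "there" for the user to cycle through.
    print(candidates(key_sequence("their")))   # ['their', 'there']
```

Cycling past the exact keypad match to a homophone is what gives the spelling-checker behaviour described in the abstract.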

Experimental Comparisons of Data Entry by Automated Speech Recognition, Keyboard, and Mouse

Human Factors: The Journal of the Human Factors and Ergonomics Society, 2002

In a series of experiments, isolated-word automated speech recognition (ASR) was compared with keyboard and mouse interfaces for three data entry tasks: textual phrase entry, selection from a list, and numerical data entry. To effect fair comparisons, the tasks were designed to minimize the transaction cycle for each input mode and data type, and the main comparisons used times from only correct data entries. With the hardware and software employed, the results indicate that for inputting short phrases, ASR competes only if the typist's speed is below 45 words per minute. For selecting an item from a list, ASR offers an advantage only if the list length exceeds 15 items. For entering numerical data, ASR offers no advantage over keypad or mouse. An extrapolation to latency-free ASR suggests that even as hardware and software become faster, human factors will dominate and the results would shift only slightly in favor of ASR.
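As a rough illustration of the break-even reasoning behind the 45-words-per-minute figure, the sketch below compares an assumed end-to-end ASR transaction time against typing time for a short phrase; the ASR cycle time and phrase length are placeholder assumptions, not values from the paper.

```python
# Back-of-the-envelope break-even check: ASR only wins if its full
# transaction cycle (speak, recognize, verify) beats simply typing.
# The 4-second ASR cycle and 3-word phrase are assumptions for illustration.

def typing_time(words, wpm):
    """Seconds to type a phrase at a given typing speed."""
    return words / (wpm / 60.0)

def asr_competitive(words, wpm, asr_transaction_s):
    """True if the ASR transaction is faster than typing the phrase."""
    return asr_transaction_s < typing_time(words, wpm)

if __name__ == "__main__":
    phrase_words = 3
    assumed_asr_cycle = 4.0   # seconds per phrase (assumption)
    for wpm in (30, 45, 60):
        t = typing_time(phrase_words, wpm)
        print(f"{wpm:>2} wpm: typing {t:.1f}s -> ASR competitive: "
              f"{asr_competitive(phrase_words, wpm, assumed_asr_cycle)}")
```

With these assumed numbers, a slow typist (30 wpm) is beaten by ASR while a 45-60 wpm typist is not, which is the shape of the result reported above.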

Usability field-test of a spoken data-entry system

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1999

This paper reports on the field test of a speech-based data-entry system developed as a follow-up to an EC-funded project. The application domain is the data entry of personnel absence records from a huge historical paper file (about 100,000 records). The application was required by the personnel office of a public administration. The tested system proved both simple enough to make a detailed analysis feasible and representative enough of the potential of spoken data entry.

Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices

2012

Dictation using speech recognition could potentially serve as an efficient input method for touchscreen devices. However, dictation systems today follow a mentally disruptive speech interaction model: users must first formulate utterances and then produce them, as they would with a voice recorder. Because utterances do not get transcribed until users have finished speaking, the entire output appears at once, and users must break their train of thought to verify and correct it.

Design and Evaluation of a Spoken-Feedback Keyboard

2007

Speech recognition technologies have come a long way in the past generation. Indeed, they are becoming ever more pervasive in our day-to-day lives, especially in the form of the voice-activated menus so prevalent in many automated answering systems.

Applying Prediction Techniques to Phoneme-based AAC Systems

It is well documented that people with severe speech and physical impairments (SSPI) often experience literacy difficulties, which hinder them from effectively using orthographic-based AAC systems for communication. To address this problem, phoneme-based AAC systems have been proposed, which enable users to access a set of spoken phonemes and combine phonemes into speech output. In this paper we investigate how prediction techniques can be applied to improve user performance with such systems. We have developed a phoneme-based prediction system, which supports single-phoneme prediction and phoneme-based word prediction using statistical language models generated from a crowdsourced AAC-like corpus. We incorporated our prediction system into a hypothetical 12-key reduced phoneme keyboard. A computational experiment showed that our prediction system led to 56.3% average keystroke savings. From Section 4.2.1 of the paper: Keystroke Savings (KS) is defined as the percentage of keystrokes that the user saves by using prediction methods compared with the MULTITAP method:
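The excerpt breaks off at the colon where the formula appeared. The standard formulation consistent with the definition quoted above is, with symbol names ours rather than necessarily the paper's:

```latex
\[
\mathrm{KS} \;=\; \frac{k_{\text{multitap}} - k_{\text{prediction}}}{k_{\text{multitap}}} \times 100\%
\]
```

where k_multitap and k_prediction are the keystrokes required with the MULTITAP baseline and with the prediction method, respectively.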

Automatic acquisition of names using speak and spell mode in spoken dialogue systems

Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03, 2003

This paper describes a novel multi-stage recognition procedure for deducing the spelling and pronunciation of an open set of names. The overall goal is the automatic acquisition of unknown words in a human-computer conversational system. The names are spoken and spelled in a single utterance, achieving a concise and natural dialogue flow. The first recognition pass extracts letter hypotheses from the spelled part of the waveform and maps them to phonemic hypotheses via a hierarchical sublexical model capable of generating grapheme-phoneme mappings. A second recognition pass determines the name by combining information from the spoken and spelled parts of the waveform, augmented with language model constraints. The procedure is integrated into a spoken dialogue system where users are asked to enroll their names for the first time. The acquisition process is implemented in multiple parallel threads for real-time operation. After inducing the spelling and pronunciation of a new name, a series of operations automatically updates the recognition and natural language systems to immediately accommodate the new word. Experiments show promising results for letter and phoneme accuracies on a preliminary dataset.
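The following is a highly simplified sketch of the two-pass idea: expand spelled-letter hypotheses into candidate phonemes, then score name hypotheses against both the spoken phonemes and the spelled-letter evidence. The letter-to-phoneme table, the small candidate list, and the weighting are toy assumptions; the actual system operates over an open set with a hierarchical sublexical model and language model constraints.

```python
# Toy two-pass speak-and-spell sketch (illustrative assumptions throughout).

# Pass 1: map hypothesized spelled letters to candidate phonemes.
LETTER_TO_PHONEMES = {
    'j': ['jh'], 'o': ['ow'], 'h': ['hh'], 'n': ['n'], 'a': ['aa', 'ax'],
}

def spelled_phoneme_sets(letter_hypotheses):
    """One set of candidate phonemes per hypothesized letter."""
    return [LETTER_TO_PHONEMES.get(letter, []) for letter in letter_hypotheses]

# Pass 2: rescore name hypotheses using both parts of the utterance.
def score(candidate_phonemes, spoken_phonemes, spelled_sets):
    """Arbitrary toy score: weighted spoken-phoneme overlap plus how many
    candidate phonemes are supported by some spelled letter."""
    spoken_overlap = len(set(candidate_phonemes) & set(spoken_phonemes))
    spelled_support = sum(
        any(p in letter_set for letter_set in spelled_sets)
        for p in candidate_phonemes
    )
    return 2 * spoken_overlap + spelled_support

def acquire_name(letter_hypotheses, spoken_phonemes, hypotheses):
    """Pick the (spelling, pronunciation) pair that best explains both parts."""
    spelled = spelled_phoneme_sets(letter_hypotheses)
    return max(hypotheses.items(),
               key=lambda item: score(item[1], spoken_phonemes, spelled))

if __name__ == "__main__":
    name_hypotheses = {                 # toy pronunciations
        "john":  ['jh', 'aa', 'n'],
        "joan":  ['jh', 'ow', 'n'],
        "jonah": ['jh', 'ow', 'n', 'ax'],
    }
    # Pretend the recognizer heard the letters J-O-H-N and phonemes jh aa n.
    print(acquire_name(['j', 'o', 'h', 'n'], ['jh', 'aa', 'n'], name_hypotheses))
    # expected: ('john', ['jh', 'aa', 'n'])
```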

User Interaction with Word Prediction

ACM Transactions on Accessible Computing, 2009

Word prediction systems can reduce the number of keystrokes required to form a message in a letter-based AAC system. It has been questioned, however, whether such savings translate into an enhanced communication rate, given the additional overhead (e.g., shifting of focus and repeated scanning of a prediction list) required to use such a system. Our hypothesis is that word prediction has high potential for enhancing AAC communication rate, but that the amount depends in a complex way on the accuracy of the predictions. Because of significant user interface variations in AAC systems and the potential bias of prior word prediction experience on existing devices, this hypothesis is difficult to verify. We present a study of two different word prediction methods compared against letter-by-letter entry at simulated AAC communication rates. We find that word prediction systems can in fact speed communication rate (an advanced system gave a 58.6% improvement), and that a more accurate word prediction method provides a greater improvement.
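A small model makes the accuracy/overhead trade-off tangible: if the prediction list must be scanned for every word but only helps when it actually contains the intended word, the net benefit depends sharply on accuracy. All timing constants and the accuracy sweep below are assumptions for illustration, not values measured in the study.

```python
# Illustrative model of keystroke savings vs. prediction-list overhead.
# Constants are assumptions chosen to mimic slow, AAC-like entry rates.

def time_per_word(chars_per_word, key_time, scan_time, hit_rate):
    """Expected entry time for one word.

    hit_rate  -- fraction of words the list offers early enough that the
                 user can select them after a single keystroke.
    key_time  -- seconds per letter-by-letter keystroke.
    scan_time -- seconds spent scanning the prediction list per word.
    """
    letter_by_letter = chars_per_word * key_time
    with_prediction = key_time + scan_time      # one keystroke, then select
    return hit_rate * with_prediction + (1 - hit_rate) * (letter_by_letter + scan_time)

if __name__ == "__main__":
    baseline = time_per_word(5, key_time=2.0, scan_time=0.0, hit_rate=0.0)
    for accuracy in (0.3, 0.5, 0.7, 0.9):
        t = time_per_word(5, key_time=2.0, scan_time=1.5, hit_rate=accuracy)
        print(f"accuracy {accuracy:.0%}: {t:.1f}s per word "
              f"(letter-by-letter baseline {baseline:.1f}s)")
```

With these assumed numbers, low-accuracy prediction barely beats letter-by-letter entry once the scanning overhead is paid, while high-accuracy prediction roughly halves the time per word, which is the kind of accuracy dependence the study's hypothesis describes.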

Voice key board

Proceedings of the 2009 international conference on Multimodal interfaces - ICMI-MLMI '09, 2009

Multimodal systems, incorporating more natural input modalities such as speech, hand gesture, and facial expression, can make human-computer interaction more intuitive by drawing inspiration from spontaneous human-human interaction. We present here a multimodal input device for Indic scripts called the Voice Key Board (VKB), which offers a simpler and more intuitive method for the input of Indic scripts. VKB exploits the syllabic nature of Indic language scripts and the user's mental model of them, wherein a base consonant character is modified by different vowel ligatures to represent the actual syllabic character. We also present user evaluation results for VKB, comparing it with the most common input method for the Devanagari script, the InScript keyboard. The results indicate a strong user preference for VKB in terms of input speed and learnability. Though VKB starts with a higher user error rate than InScript, the error rate drops by 55% by the end of the experiment, and the input speed of VKB is found to be 81% higher than InScript. Our user study results point to interesting research directions for the use of multiple natural modalities for Indic text input.
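The base-consonant-plus-vowel-ligature model that VKB builds on is directly visible in Unicode Devanagari, where combining a consonant code point with a dependent vowel sign produces one displayed syllable. The snippet below only illustrates that composition and says nothing about how VKB itself maps voice or key input to these characters.

```python
# Composing Devanagari syllables from a base consonant and vowel signs.
# The code points are standard Unicode; the framing as an example of VKB's
# underlying mental model is ours.

KA = "\u0915"            # क  (base consonant KA)
VOWEL_SIGN_I = "\u093F"  # ि  (dependent vowel sign I)
VOWEL_SIGN_U = "\u0941"  # ु  (dependent vowel sign U)

def syllable(consonant, vowel_sign):
    """Compose a syllabic character from a base consonant and a vowel ligature."""
    return consonant + vowel_sign

if __name__ == "__main__":
    print(syllable(KA, VOWEL_SIGN_I))  # कि  ("ki")
    print(syllable(KA, VOWEL_SIGN_U))  # कु  ("ku")
```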

Speech-to-text input method for web system using JavaScript

We have developed a speech-to-text input method for web systems. The system is provided as a JavaScript library comprising an Ajax-like mechanism based on a Java applet, CGI programs, and dynamic HTML documents. It allows users to access voice-enabled web pages without requiring special browsers. Web developers can embed it in their web pages by inserting only one line in the header field of an HTML document. This study also aims at observing natural spoken interactions in personal environments. We succeeded in collecting 4,003 inputs during a period of seven months via our public Japanese ASR server. To cover out-of-vocabulary words such as proper nouns, a web page for registering new words in the language model was developed. As a result, we obtained an improvement of 0.8% in recognition accuracy. With regard to the acoustic conditions, an SNR of 25.3 dB was observed.

Patterns of entry and correction in large vocabulary continuous speech recognition systems

Proceedings of the SIGCHI conference on Human factors in computing systems the CHI is the limit - CHI '99, 1999

A study was conducted to evaluate user performance and satisfaction in completion of a set of text creation tasks using three commercially available continuous speech recognition systems. The study also compared user performance on similar tasks using keyboard input. One part of the study (Initial Use) involved 24 users who enrolled, received training and carried out practice tasks, and then completed a set of transcription and composition tasks in a single session. In a parallel effort (Extended Use), four researchers used speech recognition to carry out real work tasks over 10 sessions with each of the three speech recognition software products. This paper presents results from the Initial Use phase of the study along with some preliminary results from the Extended Use phase. We present details of the kinds of usability and system design problems likely in current systems and several common patterns of error correction that we found. Keywords: speech recognition, input techniques, speech user interfaces, analysis methods.

An Empirical Approach for the Evaluation of Voice User Interfaces

intechopen.com

Nowadays, the convergence of devices, electronic computing, and mass media produces huge volumes of information, which demands faster and more efficient interaction between users and information. How to make information access manageable, efficient, and easy has become a major challenge for Human-Computer Interaction (HCI) researchers. The different types of computing devices, such as PDAs (personal digital assistants), tablet PCs, desktops, game consoles, and next-generation phones, provide many different modalities for information access. This makes it possible to dynamically adapt application user interfaces to the changing context. However, as applications become more and more pervasive, these devices reveal their limited input/output capacity, caused by small visual displays, the need to use hands to operate buttons, and the lack of an alphanumeric keyboard and mouse (Gu & Gilbert, 2004). Voice User Interface (VUI) systems are capable not only of recognizing the voice of their users but also of understanding voice commands and providing responses to them, usually in real time. The state of the art in speech technology already allows the development of automatic systems designed to work in real conditions. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, and even whether the customer will remain one. This chapter describes a practical methodology for creating an effective VUI design. The methodology is scientifically grounded in principles from linguistics, psychology, and language technology (Cohen et al., 2004; San-Segundo et al., 2005). Given the limited input/output capabilities of mobile devices, speech presents an excellent way to enter and retrieve information, either alone or in combination with other modalities. Furthermore, people with disabilities should be provided with a wide range of alternative interaction modalities beyond the traditional screen-and-mouse desktop computing devices. Whether the disability is temporary or permanent, people with reading difficulty, visual impairment, and/or difficulty using a keyboard or mouse can rely on speech as an alternative approach to information access.

Speech User Interface for Information Retrieval

Along with the rapid development of information technology, the amount of information generated at any given time far exceeds humans' ability to organize, search, and manipulate it without the help of automatic systems. Nowadays many tools and techniques are available for the storage and retrieval of information. Users interact with these techniques through interfaces, mostly text user interfaces (TUIs) or graphical user interfaces (GUIs). Here, I introduce a new interface for information retrieval: speech. The goal of this project is to develop a speech interface that can search for and read out the required information from the database effectively, efficiently, and in a user-friendly way. This tool will be highly useful to blind people: they will be able to request information from the computer by giving voice commands (keywords) through a microphone and listen to the required information through a speaker or headphones.

A Speech-In List-Out Approach to Spoken User Interfaces

2004

Spoken user interfaces are conventionally either dialogue-based or menu-based. In this paper we propose a third approach, in which the task of invoking responses from the system is treated as one of retrieval from the set of all possible responses. Unlike conventional spoken user interfaces that return a unique response to the user, the proposed interface returns a shortlist of possible responses, from which the user must make the final selection. We refer to such interfaces as Speech-In List-Out, or SILO, interfaces. Experiments show that SILO interfaces can be very effective, are highly robust to degraded speech recognition performance, and can impose significantly lower cognitive load on the user compared to menu-based interfaces.
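A minimal sketch of the retrieval framing follows, assuming a tiny response set, bag-of-words scoring, and a shortlist length of three; these are illustrative stand-ins for whatever indexing and scoring the actual SILO system uses.

```python
# Toy SILO-style interface: treat response selection as retrieval and
# return a shortlist instead of a single best guess. Response set and
# scoring are illustrative assumptions.

RESPONSES = [
    "play voicemail messages",
    "call the front desk",
    "show today's calendar",
    "play the next song",
]

def score(recognized_words, response):
    """Overlap between (possibly misrecognized) words and a candidate response."""
    return len(set(recognized_words) & set(response.split()))

def shortlist(recognized_words, k=3):
    """Return the top-k candidate responses for the user to pick from."""
    ranked = sorted(RESPONSES, key=lambda r: score(recognized_words, r), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Even with a recognition error ("mail" instead of "voicemail"),
    # the intended response still lands in the shortlist.
    print(shortlist(["play", "mail", "messages"]))
```

Because the user makes the final pick from the shortlist, a recognition error only hurts if it pushes the intended response out of the top k, which is one way to read the robustness claim in the abstract.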

Automated System Using Speech Recognition

IRJET, 2022

In today's world, where technology is advancing rapidly, new updates arrive daily. Tasks that once seemed impossible now appear trivial. Even so, some areas of technology still need to be taken a step further. People still use a mouse and keyboard for software access and modification. Although computers can be operated with only a keyboard and mouse, a new era of voice assistants has entered the world of technology: on mobile devices we have Siri, Alexa, and Google Assistant, and on computers we have Cortana. However, this technology needs to be extended to perform various computer operations rather than being limited to web search. We have therefore concentrated on building a desktop assistant that performs the same tasks as Siri or Alexa, namely web search, along with the ability to manage and modify files present on the system. In this project, we build an automated system that can be delivered directly in executable format and performs operations as soon as it is activated.
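As a generic illustration of this kind of assistant (not the authors' system), the sketch below listens for one command with the third-party SpeechRecognition package and dispatches it to either a web search or a file operation; the command phrases and the Windows-only file-open call are assumptions for illustration.

```python
# Generic voice-command dispatcher sketch: listen once, then either run a
# web search or open a file. Illustrative only; not the authors' assistant.

import os
import webbrowser

import speech_recognition as sr  # pip install SpeechRecognition (needs PyAudio)


def listen_for_command():
    """Capture one utterance from the microphone and return it as text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio).lower()


def handle(command):
    """Dispatch a spoken command to a web search or a file operation."""
    if command.startswith("search for "):
        query = command[len("search for "):]
        webbrowser.open(f"https://www.google.com/search?q={query}")
    elif command.startswith("open file "):
        path = command[len("open file "):].strip()
        os.startfile(path)   # Windows-only; assumption for illustration
    else:
        print(f"Unrecognized command: {command}")


if __name__ == "__main__":
    handle(listen_for_command())
```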