Further Development of the PhonicStick: The application of phonic-based acceleration methods to the speaking joystick

Further Development of the PhonicStick

2015

The PhonicStick is a novel Augmentative and Alternative Communication (AAC) joystick-like device that enables individuals with severe speech and physical disorders to access forty-two sounds (i.e. phonics) and blend the sounds together to create spoken words. The device aims to allow users to select phonics and produce speech without the need for a visual interface. One of the problems with the current prototype of the PhonicStick is that phonic entry is relatively slow and may require many physical movements, which causes great difficulty for users with poor hand function. Therefore, in this research we are investigating whether natural language processing (NLP) technology can be used to facilitate phonic retrieval and word creation. Our goal is to develop a set of phonic-based NLP acceleration methods, such as phonic disambiguation and phonic prediction, that reduce the effort required to select the target phonics and improve the speed of producing words. This paper will discuss the challenges of applying such methods to the PhonicStick and report on the current state of development of the proposed techniques. The presentation will also include a live demonstration of the latest prototype of the PhonicStick.
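
To make the idea of phonic prediction concrete, the following minimal sketch ranks candidate next phonics with a simple bigram model trained on phonic-transcribed words. The toy lexicon, the phonic labels, and the ranking scheme are illustrative assumptions only; they are not the PhonicStick's actual acceleration method.

    from collections import defaultdict

    # Hypothetical training data: words transcribed as phonic sequences.
    # A real system would use a large pronunciation lexicon.
    TRAINING_WORDS = [
        ["c", "a", "t"],
        ["c", "a", "n"],
        ["c", "a", "p"],
        ["s", "a", "t"],
        ["s", "u", "n"],
    ]

    def train_bigrams(words):
        """Count how often each phonic follows another across the lexicon."""
        counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            for prev, nxt in zip(word, word[1:]):
                counts[prev][nxt] += 1
        return counts

    def predict_next(counts, prev_phonic, top_n=3):
        """Return the most likely next phonics given the last selected phonic."""
        followers = counts.get(prev_phonic, {})
        ranked = sorted(followers.items(), key=lambda kv: kv[1], reverse=True)
        return [phonic for phonic, _ in ranked[:top_n]]

    if __name__ == "__main__":
        model = train_bigrams(TRAINING_WORDS)
        # After the user selects "c" then "a", offer the likeliest continuations.
        print(predict_next(model, "a"))  # ['t', 'n', 'p']

A deployed system would of course condition on longer phonic histories and on word-level language models, which is the direction the abstract describes.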

The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments

We present a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal parameters for continuous control tasks. Since discrete spoken commands are ill-suited to such tasks, our interface exploits a large set of continuous acoustic-phonetic parameters like pitch, loudness, vowel quality, etc. Their selection is optimized with respect to automatic recognizability, communication bandwidth, learnability, suitability, and ease of use. Parameters are extracted in real time, transformed via adaptation and acceleration, and converted into continuous control signals. This paper describes the basic engine, prototype applications (in particular, voice-based web browsing and a controlled trajectory-following task), and initial user studies confirming the feasibility of this technology.
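
As a rough illustration of the final stage of such an interface, the sketch below maps a recognized vowel and a normalized loudness value to a two-dimensional cursor velocity. The four-vowel direction map, the loudness acceleration curve, and all constants are hypothetical; the published VJ-engine uses a richer vowel space and its own adaptation and acceleration transforms.

    # Hypothetical mapping from vowel quality to a movement direction (dx, dy).
    # The actual Vocal Joystick uses a richer vowel space; this is illustrative.
    VOWEL_DIRECTIONS = {
        "a": (0.0, -1.0),   # move down
        "i": (0.0, 1.0),    # move up
        "u": (-1.0, 0.0),   # move left
        "e": (1.0, 0.0),    # move right
    }

    def cursor_velocity(vowel, loudness, base_speed=5.0, gamma=2.0):
        """Convert a recognized vowel and a normalized loudness (0..1)
        into a cursor velocity in pixels per frame.

        Loudness is accelerated nonlinearly so quiet sounds give fine
        control and loud sounds give fast, coarse movement."""
        dx, dy = VOWEL_DIRECTIONS.get(vowel, (0.0, 0.0))
        speed = base_speed * (loudness ** gamma)
        return dx * speed, dy * speed

    # Example: a moderately loud "i" nudges the cursor upward.
    print(cursor_velocity("i", loudness=0.6))  # approximately (0.0, 1.8)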

The vocal joystick data collection effort and vowel corpus

The Vocal Joystick is a mechanism that enables individuals with motor impairments to make use of vocal parameters to control objects on a computer screen (buttons, sliders, etc.) and, ultimately, electro-mechanical instruments (e.g., robotic arms, wireless home automation devices). In an effort to train the VJ-system, speech data from the TIMIT speech corpus was initially used. However, due to issues with co-articulation, we began a large data collection effort in a controlled environment that would not only address these issues but also yield a new vowel corpus representative of the utterances a user of the VJ-system would produce. The data collection process evolved over the course of the effort as new parameters were added and as factors affecting the quality of the collected data with respect to those parameters were considered. The result is a vowel corpus of approximately 11 hours of recorded data, comprising approximately 23,500 sound files of the monophthongs and vowel combinations (e.g. diphthongs) chosen for the Vocal Joystick project, varying along the parameters of duration, intensity and amplitude. This paper discusses how the data collection has evolved since its initiation and provides a brief summary of the resulting corpus.

The Application of Natural Language Processing to Augmentative and Alternative Communication

Assistive Technology, 2012

Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article surveys the current state of the science of NLP in AAC and discusses its future applications in the development of the next generation of AAC technology.

The Vocal Joystick: Evaluation of voice-based cursor control techniques for assistive technology

Disability & Rehabilitation: Assistive Technology, 2008

Mouse control has become a crucial aspect of many modern-day computer interactions. This poses a challenge for individuals with motor impairments or those whose use of their hands is restricted due to situational constraints. We present a system called the Vocal Joystick which allows the user to continuously control the mouse cursor by varying vocal parameters such as vowel quality, loudness and pitch. A survey of existing cursor control methods is presented to highlight the key characteristics of the Vocal Joystick. Evaluations were conducted to characterize the expert performance capability of the Vocal Joystick, and to compare novice user performance and preference for the Vocal Joystick and two other existing speech-based cursor control methods. Our results show that Fitts' law is a good predictor of the speed-accuracy tradeoff for the Vocal Joystick, and suggest that the optimal performance of the Vocal Joystick may be comparable to that of a conventional hand-operated joystick. Novice user evaluations show that the Vocal Joystick can be used by people without extensive training, and that it presents a viable alternative to existing speech-based cursor control methods.
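
For reference, the Fitts' law model used in such evaluations predicts movement time from target distance and width. The sketch below computes that prediction with placeholder regression coefficients, since the coefficients fitted for the Vocal Joystick are not reproduced here.

    import math

    def fitts_movement_time(distance, width, a=0.3, b=0.25):
        """Shannon formulation of Fitts' law: MT = a + b * log2(D/W + 1).

        a and b are empirical regression constants (seconds and
        seconds/bit); the values here are placeholders, not the
        coefficients fitted for the Vocal Joystick."""
        index_of_difficulty = math.log2(distance / width + 1)
        return a + b * index_of_difficulty

    # Example: a 400-pixel movement to a 40-pixel target.
    print(round(fitts_movement_time(400, 40), 3))  # about 1.165 seconds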

The Vocal Joystick

The Vocal Joystick is a novel human-computer interface mechanism designed to enable individuals with motor impairments to make use of vocal parameters to control objects on a computer screen (buttons, sliders, etc.) and ultimately electro-mechanical instruments (e.g., robotic arms, wireless home automation devices). We have developed a working prototype of our "VJ-engine" with which individuals can now control computer mouse movement with their voice. The core engine is currently optimized according to a number of criteria. In this paper, we describe the engine system design, engine optimization, and user-interface improvements, and outline some of the signal processing and pattern recognition modules that were successful. Lastly, we present new results comparing the Vocal Joystick with a state-of-the-art eye tracking pointing device, and show that not only is the Vocal Joystick already competitive, but for some tasks it appears to be an improvement.
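
To give an intuition for the kind of pattern-recognition module mentioned above, and emphatically not the VJ-engine's actual classifier, the sketch below assigns a vowel label by nearest-centroid matching on the first two formants. The centroid values are rough textbook-style figures used only for illustration; a real system would use trained statistical models.

    import math

    # Rough formant centroids (F1, F2 in Hz) for a few vowels, loosely based
    # on classic averaged measurements; real recognizers use trained
    # statistical models rather than fixed centroids.
    VOWEL_CENTROIDS = {
        "i": (270, 2290),
        "u": (300, 870),
        "a": (730, 1090),
        "ae": (660, 1720),
    }

    def classify_vowel(f1, f2):
        """Assign the measured formant pair to the nearest vowel centroid."""
        def dist(centroid):
            return math.hypot(f1 - centroid[0], f2 - centroid[1])
        return min(VOWEL_CENTROIDS, key=lambda v: dist(VOWEL_CENTROIDS[v]))

    print(classify_vowel(700, 1150))  # "a"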

Virtual keyboard with the prediction of words for children with cerebral palsy

Computer Methods and Programs in Biomedicine, 2020

One in every 200 people worldwide cannot express themselves orally because of cognitive, motor, neurological, or emotional problems. Assistive technologies can help people with impairments use computers to perform their daily activities independently and to communicate with others. This paper presents a Hidden Markov Model-based word prediction method that allows keyboard emulation software to predict words so that children with disabilities can type texts more quickly. The proposed system involved the development of a keyboard emulator, the construction and processing of a corpus, and a word prediction algorithm. Children with different cognitive profiles had to produce a text and type it twice: first with free typing, then using the virtual keyboard's word prediction. Results indicated that the keyboard emulator's word prediction reduced typing effort, although the software initially increased typing time when the corpus was not well adapted to users. The total number of clicks with word prediction decreased by around 26.2%, and 61% of participants typed the text in less time when using prediction. Tests performed with literate volunteers indicated a reduction in the number of clicks of up to 51.3%. This result surpasses the 15% achieved in a previous study with the Free Virtual Keyboard, whose word prediction was based on pure statistics. Moreover, all volunteers required fewer clicks to perform the task. People with impairments, especially children, could use the system and demonstrate their knowledge and abilities. The entire system is available on the Internet, and users have unrestricted, free access to it.
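
The sketch below illustrates the general shape of word prediction in a virtual keyboard: rank words that both follow the previous word and match the letters typed so far. It uses a plain bigram count as a stand-in for the paper's Hidden Markov Model-based predictor, and the mini-corpus is invented for the example.

    from collections import defaultdict

    # Hypothetical mini-corpus; the paper builds its corpus from texts
    # suited to the children using the keyboard.
    CORPUS = "the cat sat on the mat the cat ran to the car".split()

    def build_bigrams(tokens):
        """Count word-to-next-word transitions in the corpus."""
        counts = defaultdict(lambda: defaultdict(int))
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
        return counts

    def predict(counts, previous_word, typed_prefix, top_n=3):
        """Rank words that follow `previous_word` and start with the
        letters typed so far; a bigram stand-in for the HMM predictor."""
        candidates = counts.get(previous_word, {})
        matches = {w: c for w, c in candidates.items() if w.startswith(typed_prefix)}
        return sorted(matches, key=matches.get, reverse=True)[:top_n]

    model = build_bigrams(CORPUS)
    print(predict(model, "the", "ca"))  # ['cat', 'car']

Offering such completions after each keystroke is what reduces the number of clicks needed to produce a word, which is the effect the study quantifies.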

A self learning vocal interface for speech-impaired users

In this work we describe research aimed at developing an assistive vocal interface for users with a speech impairment. In contrast to existing approaches, the vocal interface is self-learning, meaning it is maximally adapted to the end user and can be used with any language, dialect, vocabulary, and grammar. The paper describes the overall learning framework and the vocabulary acquisition technique, and proposes a novel grammar induction technique based on weakly supervised hidden Markov model learning. We evaluate early implementations of these vocabulary and grammar learning components on two datasets: recorded sessions of a vocally guided card game played by non-impaired speakers, and recordings of speech-impaired users engaging in a home automation task.
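
As a loose illustration of weakly supervised vocabulary acquisition, and not the paper's actual technique, the sketch below learns word-to-action associations from utterances paired only with the demonstrated device action, then uses those associations to guess the action for a new utterance. The utterances, actions, and scoring are all invented for the example.

    from collections import defaultdict

    # Weakly supervised training pairs: an utterance (as recognized tokens)
    # and the device action the user demonstrated alongside it.
    SESSIONS = [
        (["lamp", "on"], "light_on"),
        (["turn", "the", "lamp", "on"], "light_on"),
        (["lamp", "off"], "light_off"),
        (["switch", "the", "lamp", "off"], "light_off"),
        (["open", "door"], "door_open"),
    ]

    def learn_associations(sessions):
        """Count how often each token co-occurs with each action."""
        assoc = defaultdict(lambda: defaultdict(int))
        for tokens, action in sessions:
            for token in tokens:
                assoc[action][token] += 1
        return assoc

    def guess_action(assoc, tokens):
        """Score each known action by how strongly its associated tokens
        appear in the new utterance, and return the best-scoring one."""
        def score(action):
            return sum(assoc[action].get(t, 0) for t in tokens)
        return max(assoc, key=score)

    model = learn_associations(SESSIONS)
    print(guess_action(model, ["please", "switch", "lamp", "on"]))  # light_on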

Speech and Language Processing for Multimodal Human-Computer Interaction

The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 2004

In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which addresses specifically the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs and smart phones. Although the implementation is not yet complete, the user study reported in this paper shows that speech and pen input have the potential to significantly improve the user experience. In this system-oriented paper we describe the main components of MiPad, with a focus on robust speech processing and spoken language understanding. The MiPad components discussed in detail include: distributed speech recognition considerations for the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; and HMM-based acoustic modeling.
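
As a generic illustration of noise-robust front-end processing, and not the stereo-based enhancement algorithm the abstract refers to, the sketch below applies per-utterance cepstral mean and variance normalization to a feature matrix; the feature values are random stand-ins.

    import numpy as np

    def cepstral_mean_variance_normalize(features):
        """Normalize a (frames x coefficients) feature matrix per utterance.

        This is generic cepstral mean and variance normalization, shown only
        to illustrate the idea of a noise-robust front end; the stereo-based
        enhancement described above is a different, data-driven technique."""
        mean = features.mean(axis=0, keepdims=True)
        std = features.std(axis=0, keepdims=True) + 1e-8  # avoid divide-by-zero
        return (features - mean) / std

    # Example with random stand-in "MFCC" features: 200 frames x 13 coefficients.
    mfcc = np.random.randn(200, 13) * 3.0 + 5.0
    normalized = cepstral_mean_variance_normalize(mfcc)
    print(normalized.mean(axis=0).round(3))  # roughly all zeros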