Speech processing tools: An introduction to interoperability
Related papers
Speech technology integration and research platform: A system study
Fifth European Conference on Speech …, 1997
We present a generic speech technology integration platform for application development and research across different domains. The goal of the design is twofold. On the application development side, the system provides an intuitive developer's interface, defined by a high-level application definition language and a set of convenient speech application building tools, which allows a novice developer to rapidly deploy and modify a spoken language dialogue application. On the system research and development side, the system uses a thin 'broker' layer to separate the system application programming interface from the service provider interface, which makes it easy to incorporate new technologies and new functional components. We also use a domain-independent acoustic model set covering US English phones for general speech applications. The system grammar and lexicon engine creates grammars and lexicon dictionaries on the fly to enable a practically unrestricted vocabulary for many recognition and synthesis applications.
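As a rough sketch of the 'broker' pattern this abstract describes (a thin layer separating the application-facing API from the service provider interface), consider the following Python fragment; all class and method names here are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of a broker layer: application code talks only to the
# Broker, while engines plug in behind a service provider interface (SPI).
# All names are hypothetical, not from the paper.
from abc import ABC, abstractmethod

class Recognizer(ABC):
    """SPI: what any speech recognition engine must implement."""
    @abstractmethod
    def recognize(self, audio: bytes) -> str: ...

class Broker:
    """Application-facing API; engines can be swapped without touching
    application code."""
    def __init__(self):
        self._providers: dict[str, Recognizer] = {}

    def register(self, name: str, provider: Recognizer) -> None:
        self._providers[name] = provider

    def recognize(self, audio: bytes, engine: str = "default") -> str:
        return self._providers[engine].recognize(audio)

class DummyRecognizer(Recognizer):
    def recognize(self, audio: bytes) -> str:
        return "<transcript>"

broker = Broker()
broker.register("default", DummyRecognizer())
print(broker.recognize(b"\x00\x01"))  # application sees only the Broker API
```

The point of the indirection is that new engines can be registered without any change to application code, which matches the paper's stated goal of incorporating new technologies easily.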
SPA: Web-based Platform for easy Access to Speech Processing Modules
2016
This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes it possible to use them through the web. It was developed with the aim of facilitating the use of the modules without the need to know about software dependencies and specific configurations. Apart from being accessed by a web browser, the platform also provides a REST API for easy integration with other applications. The platform is flexible, scalable, provides authentication for access restrictions, and was developed with consideration for the time and effort of providing new services. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including: automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyper-articulation detection, and two external modul...
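The abstract does not document the actual endpoints, so the following client sketch is purely illustrative of how such a REST API with authentication might be called; the URL, field names, and token are placeholders:

```python
# Hypothetical client for a REST speech-analytics service such as SPA.
# The endpoint, request fields, and token below are placeholders; the
# abstract only states that an authenticated REST API exists.
import requests

API_URL = "https://example.org/spa/api/transcribe"   # placeholder URL
TOKEN = "my-access-token"                            # placeholder credential

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"audio": ("meeting.wav", f, "audio/wav")},
    )
resp.raise_for_status()
print(resp.json())  # e.g. a transcript plus per-module analysis results
```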
Multi-lingual interoperability in speech technology
Speech Communication, 2001
Bringing together tools and resources for speech sciences
Subsidia: Tools and resources for speech sciences, 2019
Researchers in speech sciences develop materials and instruments to conduct their studies, to automate analyses and tasks, and to provide utilities applicable to different fields. This paper and the book that it introduces present a panorama of the current interests in these fields, and seek to provide an interdisciplinary perspective in order to bootstrap the creation of more and better tools and resources relevant to those areas.
Universal speech tools: The CSLU toolkit
Proceedings of the …, 1998
A set of freely available, universal speech tools is needed to accelerate progress in speech technology. The CSLU Toolkit represents an effort to make the core technology and fundamental infrastructure accessible, affordable, and easy to use. The CSLU Toolkit has been under development for five years. This paper describes recent improvements, additions, and uses of the CSLU Toolkit.
Plug and Play Software for Designing High-Level Speech Processing Systems
1998
Software engineering for research and development in signal processing is by no means unimportant. For speech processing in particular, it should be a priority: given the intrinsic complexity of text-to-speech or recognition systems, there is little hope of doing state-of-the-art research without solid and extensible code. This paper describes a simple and efficient methodology for the design of maximally reusable and extensible software components for speech and signal processing. The resulting programming ...
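In the spirit of the reusable, composable components the paper advocates, here is a minimal Python sketch of processing blocks with a uniform interface; the design and all names are my own illustration, not the paper's actual methodology:

```python
# Composable processing blocks: each stage exposes the same interface,
# so stages can be rearranged or replaced without rewriting the pipeline.
# Illustrative only; not the paper's design.
from typing import Callable, List

Frame = List[float]

class Block:
    """A processing stage wrapping a frame-to-frame function."""
    def __init__(self, fn: Callable[[Frame], Frame]):
        self.fn = fn
    def __call__(self, frame: Frame) -> Frame:
        return self.fn(frame)

def pipeline(*blocks: Block) -> Callable[[Frame], Frame]:
    """Chain blocks left to right into a single callable."""
    def run(frame: Frame) -> Frame:
        for b in blocks:
            frame = b(frame)
        return frame
    return run

# Example: pre-emphasis followed by peak normalization, each reusable alone.
pre_emphasis = Block(lambda x: [x[0]] + [x[i] - 0.97 * x[i - 1]
                                         for i in range(1, len(x))])
normalize = Block(lambda x: [v / (max(map(abs, x)) or 1.0) for v in x])

process = pipeline(pre_emphasis, normalize)
print(process([0.1, 0.5, -0.3, 0.2]))
```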
SpeechActs: a spoken-language framework
Computer, 1996
SpeechActs is a prototype testbed for developing spoken natural language applications. In developing SpeechActs, our primary goal was to enable software developers without special expertise in speech or natural language to create effective conversational speech applications, that is, applications with which users can speak naturally, as if they were conversing with a personal assistant. We also wanted SpeechActs applications to work with one another without requiring that each have specific knowledge of other applications running in the same suite. For example, if someone talks about "Tom Jones" in one application and then mentions "Tom" later in the conversation while in another application, that second application should know that the user means Tom Jones and not some other Tom. A discourse management component is necessary to embody the information that allows such a natural conversational flow.

Because technology changes so rapidly, we also did not want to tie developers to specific speech recognizers or synthesizers. We wanted them to be able to use these speech technologies as plug-in components. These constraints (integrated conversational applications, no specialized language expertise, and technology independence) led us to a minimalist, modular approach to grammar development, discourse management, and natural language understanding. This approach contrasts with those taken by other researchers working on spoken-dialogue systems. We believe we have achieved a degree of conversational naturalness similar to that of the outstanding Air Traffic Information Systems dialogues [1-3], and we have done so with simpler natural language techniques. At the same time, SpeechActs applications are unique in their level of speech technology independence.

Currently, SpeechActs supports a handful of speech recognizers: BBN's Hark [4], Texas Instruments' Dagger [5], and Nuance Communications' recognizers [6] (derived from SRI's Decipher). These recognizers are all continuous (they accept normally spoken speech with no artificial pauses between words) and speaker-independent (they require no training by individual users). For output, the framework provides text-to-speech support for Centigram's TruVoice and AT&T's TrueTalk. The system's architecture makes it straightforward to add new recognizers and synthesizers to the existing set.

Like several other research systems, SpeechActs supports multiple, integrated applications. For example, the Chatter system developed at the Massachusetts Institute of Technology [7] offers e-mail reading, voice mail access, telephone dialing, Rolodex access, and user-activity information. Chatter, however, was created as a single, monolithic system. Applications in a monolithic system can be very tightly integrated, allowing the user's conversation to flow seamlessly from one activity to another (Chatter: "Your next message is from Lisa." User: "What's her address?").
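As a toy illustration of the cross-application discourse management described above (resolving a later bare "Tom" to the previously mentioned "Tom Jones"), consider this hypothetical sketch; SpeechActs' actual discourse component is certainly richer:

```python
# Toy cross-application discourse memory: once "Tom Jones" is mentioned
# in one application, a later bare "Tom" in another application resolves
# to the same person. Purely illustrative; not SpeechActs code.
class DiscourseManager:
    def __init__(self):
        self._entities: list[str] = []   # most recent mention last

    def mention(self, name: str) -> str:
        """Record a mention, resolving partial names against prior ones."""
        for known in reversed(self._entities):
            if name != known and name in known.split():
                return known             # "Tom" -> most recent "Tom Jones"
        self._entities.append(name)
        return name

dm = DiscourseManager()
dm.mention("Tom Jones")       # said while in the mail application
print(dm.mention("Tom"))      # later, in the calendar -> "Tom Jones"
```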
MUSE: A scripting language for the development of interactive speech analysis and recognition tools
1997
Speech research is a complex endeavor, as reflected in the numerous tools and specialized languages the modern researcher needs to learn. These tools, while adequate for their intended purposes, are difficult to customize or extend in new directions, even though this is often required. We feel this situation can be improved, and we propose a new scripting language, MUSE, designed explicitly for speech research to facilitate the exploration of new ideas. MUSE is designed to support many modes of research, from interactive speech analysis through compute-intensive speech understanding systems, and has facilities for automating some of the more difficult requirements of speech tools: user interactivity, distributed computation, and caching. In this paper we describe the design of the MUSE language and our current prototype MUSE interpreter.
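MUSE itself is a dedicated scripting language whose syntax is not given in the abstract; as a loose analogue of the caching facility it mentions, the following Python sketch memoizes an expensive analysis to disk (all details are illustrative assumptions, not MUSE's mechanism):

```python
# Disk-backed memoization of an expensive analysis: the analysis runs
# once per distinct input and is served from the cache thereafter.
# Illustrative analogue only, not how MUSE implements caching.
import hashlib, json, os

CACHE_DIR = ".analysis_cache"

def cached(fn):
    os.makedirs(CACHE_DIR, exist_ok=True)
    def wrapper(*args):
        key = hashlib.sha1(json.dumps([fn.__name__, args]).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".json")
        if os.path.exists(path):                # reuse a previous result
            with open(path) as f:
                return json.load(f)
        result = fn(*args)
        with open(path, "w") as f:              # persist for the next run
            json.dump(result, f)
        return result
    return wrapper

@cached
def pitch_track(wav_path: str) -> list:
    print(f"analyzing {wav_path} ...")          # expensive step runs once
    return [120.0, 118.5, 121.2]                # stand-in for real analysis

pitch_track("vowel.wav")   # computes and stores
pitch_track("vowel.wav")   # served from the cache, no recomputation
```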
Annotation in the SpeechDat projects
2001
A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects, with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied for the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared to approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach for capturing specific phonological and phonetic phenomena is discussed. In the SpeechDat projects, a number of tools have been developed to carry out the transcription of the speech. This paper provides a short description of these tools and their properties. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the performed evaluations, and some of the results are presented.
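As a hedged sketch of what a hierarchically simple, time-aligned orthographic annotation record might look like, consider the following; the field names and the noise/mispronunciation markers are my own illustrations, not the actual SpeechDat conventions:

```python
# Illustrative record for a time-aligned orthographic transcription of a
# short utterance. Field names and markers are hypothetical, not the
# SpeechDat specification.
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_file: str    # speech file the transcription is synchronized with
    start_s: float     # onset within the recording, in seconds
    end_s: float       # offset, in seconds
    orthography: str   # orthographic transcription with event markers

utt = Utterance(
    audio_file="block00/ses001/item042.wav",     # hypothetical path
    start_s=0.32,
    end_s=2.71,
    orthography="[noise] one two three *fuor",   # [noise]: acoustic event,
)                                                # *: mispronounced word

print(f"{utt.audio_file}: '{utt.orthography}' "
      f"({utt.end_s - utt.start_s:.2f} s)")
```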