SGStudio: rapid semantic grammar development for spoken language understanding


Abstract: Grammar-based approaches to spoken language understanding are widely used in industry, particularly when developers face data sparsity. To ensure wide grammar coverage, developers typically revise their grammars iteratively: they deploy the application, collect and transcribe user utterances, and adjust the grammar. In this paper, we explore enhancing this iterative process by applying active learning with back-off grammars.

When dialogue system developers tackle a new domain, much effort is required; the development of different parts of the system usually proceeds independently. Yet it may be profitable to coordinate development efforts between different modules. Here, we focus our efforts on extending small amounts of language model training data by integrating semantic classes that were created for a natural language understanding module. By converting finite state parses of a training corpus into a probabilistic context free grammar and subsequently generating artificial data from the context free grammar, we can significantly reduce perplexity and ASR word error for situations with little training data. Experiments are presented using data from the ATIS and DARPA Communicator travel corpora.
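The general idea of estimating a probabilistic context free grammar from parses and then sampling artificial sentences from it can be illustrated with a minimal sketch. This is not the authors' implementation: the toy parses, the tuple-based tree format, and the function names `count_rules`, `estimate_pcfg`, and `generate` are all assumptions made for illustration, and rule probabilities are plain relative frequencies.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy parses: a tree is (label, children); leaves are plain words.
# Nonterminals are upper case so they never collide with words.
PARSES = [
    ("S", ["show", "flights",
           ("FROM", ["from", ("CITY", ["boston"])]),
           ("TO", ["to", ("CITY", ["denver"])])]),
    ("S", ["show", "flights",
           ("TO", ["to", ("CITY", ["seattle"])])]),
]

def count_rules(tree, counts):
    """Count every rule LHS -> RHS that occurs in one parse tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label][rhs] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(parses):
    """Relative-frequency estimate: P(rhs | lhs) = count(lhs -> rhs) / count(lhs)."""
    counts = defaultdict(Counter)
    for tree in parses:
        count_rules(tree, counts)
    return {
        lhs: {rhs: n / sum(rules.values()) for rhs, n in rules.items()}
        for lhs, rules in counts.items()
    }

def generate(pcfg, symbol, rng):
    """Sample a word sequence by expanding `symbol` top-down."""
    if symbol not in pcfg:  # a terminal word
        return [symbol]
    options = list(pcfg[symbol].items())
    rhs = rng.choices([r for r, _ in options],
                      weights=[p for _, p in options])[0]
    words = []
    for sym in rhs:
        words.extend(generate(pcfg, sym, rng))
    return words

pcfg = estimate_pcfg(PARSES)
rng = random.Random(0)
# Artificial sentences that could be added to language model training data.
artificial = [" ".join(generate(pcfg, "S", rng)) for _ in range(5)]
```

Because the three observed cities share the single `CITY` class, the sampler can produce combinations that never appeared in the toy corpus, such as a departure from Seattle. That recombination is the kind of generalization that sharing semantic classes is meant to provide.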

We will demonstrate an early-release software product for the rapid development of spoken dialog systems (SDSs), known as Lyrebird™ [1][2][3], which uses grammatical inference to build natural-language, mixed-initiative speech recognition applications. In the demonstration, the presenter will develop a spoken dialog system using Lyrebird™ and will show some features that are still in the prototype phase.