Formalizing Expert Knowledge for Developing Accurate Speech Recognizers
Related papers
An expert system for mapping acoustic cues into phonetic features
Information Sciences, 1984
The paper describes the design of the sublexical levels of a speech-understanding system as a society of experts. Experts cooperate in extracting and describing acoustic cues, generating and verifying phonetic hypotheses, and accessing a large lexicon. The knowledge of each expert is described in a frame language that integrates structural and procedural knowledge. Structural knowledge deals with relations between facts such as acoustic-cue descriptions and phonetic-feature hypotheses. Procedural knowledge deals with rules for the use of relations, for the generation of contextual constraints for relation application, and for the extraction of new cues in specified contexts. The main purpose of the proposed research is to provide both a model for computer perception and algorithms useful for designing complex systems that operate in real time. Some experimental results on the performance of the proposed system are reported.
Empowering Knowledge Based Speech Understanding through Statistics
In this paper we present an innovative approach to speech understanding which is based on a fine-grained knowledge representation automatically compiled from a semantic network and on iterative optimization. Besides allowing an efficient exploitation of parallelism, any-time capability is provided since after each iteration step a (sub-)optimal solution is always available. We apply this approach to a real-world task, which is a dialog system able to answer queries about the German train timetable. In order to speed up the search for the best interpretation of an utterance we make use of statistical methods, e.g. neural networks, n-grams, and classification trees, which are trained on application relevant utterances collected over the public telephone network. At the moment the real-time factor for interpreting the initial user's utterance is 0.7. 1. INTRODUCTION In order to make use of automatic speech understanding systems in real-world applications, those systems have ...
Using Dialog-Level Knowledge Sources to Improve Speech Recognition
We motivate and describe an implementation of the MINDS speech recognition system. MINDS uses knowledge of dialog structures, user goals and focus in a problem solving situation. The knowledge is combined to form predictions which translate into dynamically generated semantic network grammars. An experiment evaluated recognition accuracy given different levels of knowledge as constraints. Our results show that speech recognition accuracy improves dramatically when the maximally constrained dynamic network grammar is used to process the speech input signal. Length of article is about 3812 words and 2 tables. Topic is Perception and Signal Understanding (Speech Recognition), also Natural Language (Understanding). We wish to acknowledge Ed Smith and Philip Werner. This research would not have been possible without their assistance.
Learning and Plan Refinement in a Knowledge-Based System for Automatic Speech Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987
This paper shows how the semiautomatic design of a speech recognition system can be carried out as a planning activity. Recognition performance is used to decide plan refinement. Inductive learning is performed for setting action preconditions. Experimental results in the recognition of connected letters spoken by 100 speakers are presented. Index Terms: automatic speech recognition, connected speech recognition, inductive learning, letter recognition, network of actions, planning, stochastic rules.
SPREX: Speech recognition expert
The Journal of the Acoustical Society of America, 1988
situation, context, and language in the form of constraints to reduce exponential search and improve accuracy. This work will provide several examples from the areas of speech, image understanding and target identification, music analysis and recognition, and sensor-based diagnosis in expert systems where symbolic knowledge plays an important role in the interpretation of signals.
Learning from the experience of building automatic speech recognition systems
1996
To a language engineer constructing a human-computer interface, 'performance' is the only important measure of success. It can be measured in terms of words communicated or transaction time or ease-of-use; but in all cases it dominates other measures considered imperative in linguistic science: parsimony, elegance, universality, learnability, or psychological reality. Whereas a linguist is interested in the underlying structure of language that explains how meaning is encoded, or why only certain sentence structures occur, the engineer is only interested in successful communication. To him or her, the underlying structure of language is only interesting to the extent that it makes communication more accurate and more reliable. If the regularities and constraints in language can be captured and expressed in a mathematical formalism that can be exploited to make communication effective, then it doesn't matter that such a formalism 'allows' all kinds of unknown phenomena to occur, or is beyond human cognition. On the other hand, the engineer has a weaker sense of 'understanding' than the linguist, and apart from some primitive language acquisition systems [Gorin, 1995] is content to communicate word strings.
Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction
The prolificacy of human-robot interaction not only depends on a robot's ability to understand the intent and content of the human utterance but also gets impacted by the automatic speech recognition (ASR) system. Modern ASR can provide highly accurate (grammatically and syntactically) translation. Yet, the general purpose ASR often misses out on the semantics of the translation by incorrect word prediction due to open-vocabulary modeling. ASR inaccuracy can have significant repercussions as this can lead to a completely different action by the robot in the real world. Can any prior knowledge be helpful in such a scenario? In this work, we explore how prior knowledge can be utilized in ASR decoding. Using our experiments, we demonstrate how our system can significantly improve ASR translation for robotic task instruction. CCS CONCEPTS • Computing methodologies → Speech recognition; Knowledge representation and reasoning.
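The abstract above does not say how prior knowledge enters ASR decoding. As one simple illustration of the general idea, and not the authors' method, the sketch below rescores an n-best list by rewarding words that belong to a known task vocabulary. The function name `rescore_nbest`, the `bonus` weight, and the example hypotheses and scores are all hypothetical.

```python
from typing import List, Set, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    known_words: Set[str],
    bonus: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rescore (text, log_score) hypotheses using a task vocabulary.

    Each word of a hypothesis that appears in `known_words` adds `bonus`
    to its score. The result is sorted from best to worst score.
    This is an illustrative assumption, not a published algorithm.
    """
    rescored = []
    for text, score in nbest:
        hits = sum(1 for word in text.lower().split() if word in known_words)
        rescored.append((text, score + bonus * hits))
    # sorted() is stable, so ties keep the recognizer's original order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical recognizer output: the acoustically preferred hypothesis
# contains a word ("cap") that is not part of the robot's task vocabulary.
nbest = [("pick up the cap", -4.0), ("pick up the cup", -4.5)]
known = {"pick", "cup", "table"}

best_text = rescore_nbest(nbest, known)[0][0]
print(best_text)
```

With these made-up scores, the hypothesis containing two in-vocabulary words overtakes the one containing only one, which is the kind of correction the abstract motivates for robotic task instructions.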
IEEE Transactions on Computers, 2007
This paper introduces a general framework for incorporating additional sources of knowledge into an HMM-based statistical acoustic model. Since the knowledge sources are often derived from different domains, it may be difficult to formulate a probabilistic function of the model without learning the causal dependencies between the sources. We utilized a Bayesian network framework to solve this problem. The advantages of this graphical model framework are 1) it allows the probabilistic relationship between information sources to be learned and 2) it facilitates the decomposition of the joint probability density function (PDF) into a linked set of local conditional PDFs. This way, a simplified form of the model can be constructed and reliably estimated using a limited amount of training data. We applied this framework to the problem of incorporating wide-phonetic knowledge information, which often suffers from a sparsity of data and memory constraints. We evaluated how well the proposed method performed on a large-vocabulary continuous speech recognition (LVCSR) task using English speech data that contained two different types of accents. The experimental results revealed that it improved the word accuracy with respect to standard HMM, with or without additional sources of knowledge.
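The decomposition mentioned in this abstract can be written in its generic textbook form. The symbols below are assumptions for illustration (the abstract names none): X_1, ..., X_n are the variables of the Bayesian network and pa(X_i) denotes the parents of X_i in the graph. This is the standard Bayesian network factorization, not the specific model of the paper.

```latex
p(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} p\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

Each factor conditions only on a variable's parents, so each local conditional PDF has fewer parameters than the full joint, which is consistent with the abstract's claim that the model can be estimated from a limited amount of training data.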
A Knowledge-Based Architecture for using Semantics in Automatic Speech Recognition
Recognizing speech can be a challenge even for people. For instance, understanding speech in a noisy room can be very difficult. Yet, people can recognize speech even in adverse environments remarkably well, perhaps using other clues, such as context, body language, and our own expectation of the speaker's intention and meaning. Automatic speech recognition (ASR) systems typically do not have access to any of these additional clues.