Wendy Lehnert - Academia.edu

Papers by Wendy Lehnert

Cognition, Computers, and Car Bombs: How Yale Prepared Me for the 90's

From the East Side in midtown Manhattan, it was a brisk 20-minute walk going west to Eighth Avenue, another 30 minutes going north on the A train to 182nd Street, and then a final 10-minute walk going east to get to the Belfer Graduate School of Science at Yeshiva University. I made that trip every day for two years as a graduate student in mathematics. The math department was housed in a modern high-rise that stood out among the older and less majestic buildings of Washington Heights. Within that seemingly secular structure, each office door frame was uniformly adorned with a small white plastic mezuzah, courtesy of the university.

Evaluating an Information Extraction System

Integrated Computer-Aided Engineering

The Process of Question Answering. Research Report No. 88

The computational model of question answering proposed by a computer program, QUALM, is a theory of conceptual information processing based on models of human memory organization. It has been developed from the perspective of natural language processing in conjunction with story understanding systems. The processes in QUALM are divided into four phases: (1) conceptual categorization; (2) inferential analysis; (3) content specification; and (4) retrieval heuristics. QUALM provides a concrete criterion for judging the strengths and weaknesses of story representations. As a theoretical model, QUALM is intended to describe general question answering, where question answering is viewed as a verbal communication device between people. (Author/KP)

Symbolic/Subsymbolic Sentence Analysis: Exploiting the Best of Two Worlds

A number of semantically-oriented techniques have been devised over the years to address the problems of conceptual sentence analysis. We have implemented a natural language sentence analyzer, CIRCUS, which incorporates a number of well-known techniques from the symbolic information processing tradition along with original techniques based on numerical relaxation. Our basic system architecture supports a stack-controlled mechanism for managing syntactic predictions, as well as modules for handling two fundamentally distinct types of semantic preferences: predictive semantics and data-driven semantics. A marker passing algorithm is used for predictive semantics, and numerical relaxation is used for data-driven semantics. This paper provides a general introduction to CIRCUS, the opportunities for different kinds of memory interactions with CIRCUS, and detailed (but not technical) descriptions of our marker passing and numerical relaxation algorithms. CIRCUS is currently running under Common Lisp on the TI Explorer.

An evaluation of text analysis technologies

Description of the UMass Systems as Used for MUC-6

The process of question-answering

Problems in computational question answering assume a new perspective when question answering is viewed as a problem in natural language processing. A theory of question answering has been proposed which relies on ideas in conceptual information processing and theories of human memory organization. This theory of question answering has been implemented in a computer program, QUALM, currently being used by two story understanding systems to complete a natural language processing system which reads stories and answers questions about what was read. The processes in QUALM are divided into 4 phases: (1) conceptual categorization, which guides subsequent processing by dictating which specific inference mechanisms and memory retrieval strategies should be invoked in the course of answering a question; (2) inferential analysis, which is responsible for understanding what the questioner really meant when a question should not be taken literally; (3) content specification, which determines how much of an answer should be returned in terms of detail and elaborations; and (4) retrieval heuristics, which do the actual digging to extract an answer from memory.

Understanding and representation of text

A Performance Evaluation of Text Analysis Technologies

AI Magazine, Sep 15, 1991

Evaluation has become an important and pressing concern for researchers in AI. We need to reassure ourselves and our program managers that progress is taking place, and that our technology is indeed advancing according to reasonable metrics and assessments. The difficulties with evaluating AI systems are substantial and, to some extent, idiosyncratic, depending on the area of specialization. In an effort to evaluate state-of-the-art natural language processing systems, the Naval Ocean Systems Center (NOSC) has conducted three evaluations of English text analyzers during the last five years. This report describes the most recent and most sophisticated of these evaluations, the Third Message Understanding Conference (MUC-3). This evaluation was sponsored by the Defense Advanced Research Projects Agency (DARPA), which plays a key role in sponsoring evaluations for other types of language interpretation systems, including performance evaluations for speech recognition carried out by the National Institute of Standards and Technology (Pallett 1990).

Internet 101: A Beginner's Guide to the Internet and the World Wide Web

Narrative Complexity Based on Summarization Algorithms

Narrative structures can only be defined in terms of some internal memory representation, but narrative complexity is more properly characterized by information processing requirements. Story grammars, plan and goal hierarchies, and causal chain representations all provide a sense of structure which is largely removed from the processes that produce or access that memory representation. In this paper we introduce the notion of algorithmic equivalence as a means of generating more algorithmically-oriented taxonomies for memory representations. Using memory representations based on plot units, we define two narratives to be algorithmically equivalent if they can be effectively summarized by the same retrieval process. This perspective on representational strategies is an especially natural one from a processing point of view, since the computational complexity of a particular information processing task must be measured in terms of the algorithms involved.

Narrative Text Summarization

In order to summarize a story it is necessary to access a high level analysis that highlights the story's central concepts. A technique of memory representation based on affect units appears to provide the necessary foundation for such an analysis. Affect units are conceptual structures that overlap with each other when a narrative is cohesive. When overlapping intersections are interpreted as arcs in a graph of affect units, the resulting graph encodes the plot of the story. Structural features of the graph then reveal which concepts are central to the story. Affect unit analysis is currently being investigated as a processing strategy for narrative summarization.
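The graph construction described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the unit names, the event labels, and the use of node degree as the "structural feature" that marks a central concept (the abstract does not say which features are used).

```python
from itertools import combinations

# Hypothetical toy story: each affect unit maps to the story events it covers.
# Unit names and event labels are invented for illustration only.
affect_units = {
    "problem": {"e1", "e2"},
    "request": {"e2", "e3"},
    "denial": {"e3", "e4"},
    "retaliation": {"e3", "e5"},
    "resolution": {"e6"},
}


def build_graph(units):
    """Connect two affect units with an arc whenever they share an event."""
    graph = {name: set() for name in units}
    for a, b in combinations(units, 2):
        if units[a] & units[b]:  # non-empty intersection means overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


def central_units(graph):
    """Return the units with the most arcs (degree is an assumed centrality measure)."""
    top = max(len(neighbors) for neighbors in graph.values())
    return sorted(name for name, neighbors in graph.items() if len(neighbors) == top)


graph = build_graph(affect_units)
central = central_units(graph)
print(central)
```

In this toy data the "request" unit overlaps with three others, so it is reported as central, while "resolution" shares no events and stays isolated.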

The role of object primitives in natural language processing

CRYSTAL: Inducing a conceptual dictionary

One of the central knowledge sources of an information extraction (IE) system is a dictionary of linguistic patterns that can be used to identify references to relevant information in a text. Automatic creation of conceptual dictionaries is important for portability and scalability of an IE system. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.

Automatically Extracting Information in Clinical Free Text Can Make Available an Information Resource That is Largely Untapped

Issues in inductive learning of domain-specific text extraction rules

Lecture Notes in Computer Science, 1996

Automated dictionary construction for information extraction from text

Proceedings of 9th IEEE Conference on Artificial Intelligence for Applications, 1993

Knowledge-based natural language processing systems have achieved good success with certain tasks but they are often criticized because they depend on a domain-specific dictionary that requires a great deal of manual knowledge engineering. This knowledge engineering bottleneck makes knowledge-based NLP systems impractical for real-world applications because they cannot be easily scaled up or ported to new domains. In response to this problem, we developed a system called AutoSlog that automatically builds a domain-specific dictionary of concepts for extracting information from text. Using AutoSlog, we constructed a dictionary for the domain of terrorist event descriptions in only 5 person-hours. We then compared the AutoSlog dictionary with a hand-crafted dictionary that was built by two highly skilled graduate students and required approximately 1500 person-hours of effort. We evaluated the two dictionaries using two blind test sets of 100 texts each. Overall, the AutoSlog dictionary achieved 98% of the performance of the hand-crafted dictionary. On the first test set, the AutoSlog dictionary obtained 96.3% of the performance of the hand-crafted dictionary. On the second test set, the overall scores were virtually indistinguishable, with the AutoSlog dictionary achieving 99.7% of the performance of the hand-crafted dictionary.
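The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above. Treating the overall 98% as a simple mean of the two test-set results is an assumption, since the abstract does not say how the overall figure was computed.

```python
# Effort figures quoted in the abstract (person-hours).
autoslog_hours = 5
handcrafted_hours = 1500  # stated as approximate in the abstract

# How many times more effort the hand-crafted dictionary required.
effort_ratio = handcrafted_hours / autoslog_hours

# Relative performance of AutoSlog on the two blind test sets (percent).
relative_performance = [96.3, 99.7]

# Assumption: the overall figure is the unweighted mean of the two test sets.
mean_relative = round(sum(relative_performance) / len(relative_performance), 1)

print(f"effort ratio: {effort_ratio:.0f}x")
print(f"mean relative performance: {mean_relative}%")
```

Under that assumption the mean of the two test sets agrees with the reported overall 98%, and the hand-crafted dictionary took roughly 300 times the effort.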

Augmenting relevancy signatures with slot filler data

Proceedings of the workshop on Speech and Natural Language - HLT '91, 1992

UMass/Hughes TIPSTER project on extraction from text

Proceedings of the workshop on Human Language Technology - HLT '93, 1993

A Conceptual Theory of Question Answering
