Donald Hindle - Academia.edu

Papers by Donald Hindle

Spoken content-based audio navigation (SCAN)

We describe SCAN, a system for retrieving and browsing speech documents from large audio corpora that uses new information retrieval and speech processing techniques to create easily navigable presentations of documents relevant to a user query. Experiments show that the new interface is more effective than simple speech-alone interfaces.

Using statistics in lexical analysis

There are similar problems in Information Retrieval. Keyword systems work best when there are only a few dozen hits. But unfortunately, it is very easy for a user to select a keyword like food, and be buried under thousands of documents. There ought to be a set of tools that make it easier for a user to cope with a very large set of documents. In particular, the user should be able to ask the system to suggest a set of candidate keywords that would help disambiguate among the various senses of food, so that he can quickly focus on the sense that he is interested in.

Parsing, word associations and typical predicate-argument relations

There are a number of collocational constraints in natural languages that ought to play a more important role in natural language parsers. Thus, for example, it is hard for most parsers to take advantage of the fact that wine is typically drunk, produced, and sold, but (probably) not pruned. So too, it is hard for a parser to know which verbs go with which prepositions (e.g., set up) and which nouns fit together to form compound noun phrases (e.g., computer programmer). This paper will attempt to show that many of these types of concerns can be addressed with syntactic methods (symbol pushing), and need not require explicit semantic interpretation. We have found that it is possible to identify many of these interesting co-occurrence relations by computing simple summary statistics over millions of words of text. This paper will summarize a number of experiments carried out by various subsets of the authors over the last few years. The term collocation will be used quite broadly to include constraints on SVO (subject verb object) triples, phrasal verbs, compound noun phrases, and psycholinguistic notions of word association (e.g., doctor/nurse).
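The abstract does not say which summary statistics were computed. As one plausible illustration, the sketch below scores ordered word pairs by pointwise mutual information (PMI) over a sliding window. The function name `pmi_scores`, the `window` parameter, and the toy corpus are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=2):
    """Score ordered pairs (x, y) where y follows x within `window` tokens.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with each probability
    estimated as a raw count divided by the number of tokens.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pair_counts[(x, y)] += 1

    n = len(tokens)
    scores = {}
    for (x, y), c_xy in pair_counts.items():
        p_xy = c_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus, invented for illustration only.
tokens = "they drink wine and sell wine but prune trees and prune vines".split()
scores = pmi_scores(tokens, window=1)

# Pairs that never co-occur (such as prune/wine) simply get no score.
for pair in [("drink", "wine"), ("sell", "wine"), ("prune", "wine")]:
    print(pair, scores.get(pair))
```

On a corpus this small the numbers mean little; the abstract's point is that such counts become informative over millions of words.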

SCANMail: browsing and searching speech data by content

Increasing amounts of public, corporate, and private audio data are available for use, but limited in usefulness by the lack of tools to permit their browsing and search. In this paper, we describe SCANMail, a system that employs automatic speech recognition, information retrieval, information extraction, and human computer interaction technology to permit users to browse and search their voicemail messages by content through a graphical user interface. The SCANMail client also provides note-taking capabilities as well as browsing and querying features. A CallerId server also proposes caller names from existing caller acoustic models and is trained from user feedback. An Email server sends the original message plus its transcription to a mailing address specified in the user's profile.

Structural Ambiguity and Lexical Relations

Computational Linguistics, 1993

We propose that ambiguous prepositional phrase attachment can be resolved on the basis of the relative strength of association of the preposition with noun and verb, estimated on the basis of word distribution in a large corpus. This work suggests that a distributional approach can be effective in resolving parsing problems that apparently call for complex reasoning.
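One way to make "relative strength of association" concrete is a log ratio of how often the preposition follows the verb versus the noun. The sketch below is a minimal illustration under that assumption and is not necessarily the paper's estimation procedure. The function `attachment_score`, the additive smoothing constant `alpha`, and all counts are hypothetical.

```python
import math

def attachment_score(prep, verb, noun, verb_prep, verb_total,
                     noun_prep, noun_total, alpha=0.5):
    """Return log2 of smoothed P(prep | verb) over smoothed P(prep | noun).

    Positive values favor attaching the prepositional phrase to the verb,
    negative values favor attaching it to the noun. `alpha` is an additive
    smoothing constant (an assumption of this sketch) so that unseen pairs
    do not yield a zero probability.
    """
    p_verb = (verb_prep.get((verb, prep), 0) + alpha) / (verb_total.get(verb, 0) + 2 * alpha)
    p_noun = (noun_prep.get((noun, prep), 0) + alpha) / (noun_total.get(noun, 0) + 2 * alpha)
    return math.log2(p_verb / p_noun)

# Invented counts, for illustration only: how often each head word occurs,
# and how often it is followed by the given preposition.
verb_prep = {("send", "into"): 80, ("send", "of"): 2}
verb_total = {"send": 1000}
noun_prep = {("soldier", "into"): 1, ("soldier", "of"): 300}
noun_total = {"soldier": 1500}

into_score = attachment_score("into", "send", "soldier",
                              verb_prep, verb_total, noun_prep, noun_total)
of_score = attachment_score("of", "send", "soldier",
                            verb_prep, verb_total, noun_prep, noun_total)
print(f"into: {into_score:+.2f}  of: {of_score:+.2f}")
```

With these toy counts, "into" scores positive (verb attachment, as in "send soldiers into ...") and "of" scores negative (noun attachment), which is the kind of decision the abstract describes.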

SCANMail: audio navigation in the voicemail domain

This paper describes SCANMail, a system that allows users to browse and search their voicemail messages by content through a GUI. Content-based navigation is realized by use of automatic speech recognition, information retrieval, information extraction and human computer interaction technology. In addition to the browsing and querying functionalities, acoustics-based caller ID technology is used to propose caller names from existing caller acoustic models trained from user feedback. The GUI browser also provides a note-taking capability. In a user study comparing SCANMail to a regular voicemail interface, SCANMail performed better on both objective measures (time to and quality of solutions) and subjective measures.

SCAN: designing and evaluating user interfaces to support retrieval from speech archives

Previous examinations of search in textual archives have assumed that users first retrieve a ranked set of documents relevant to their query, and then visually scan through these documents to identify the information they seek. While document scanning is possible in text, it is much more laborious in speech archives, due to the inherently serial nature of speech. Yet, in developing tools for speech access, little attention has so far been paid to users' problems in scanning and extracting information from within "speech documents".

The AT&T 60,000 word speech-to-text system

AT&T at TREC7 SDR Track

AT&T participated in the Spoken Document Retrieval (SDR) track of TREC-7. Our speech retrieval system uses modern Information Retrieval (IR) methods in conjunction with in-house automatic speech recognition. The novel feature of our TREC-7 work is the use of document expansion to reduce the performance loss due to ASR errors. Results show that retrieval from automatic transcriptions of speech is quite competitive with doing retrieval from human transcriptions. Our experiments indicate that document expansion can be used to further improve retrieval from automatic transcripts. This paper presents some analysis of document expansion in the context of the TREC-7 SDR track task.

AT&T at TREC7

This year AT&T participated in the ad-hoc task and the Filtering, SDR, and VLC tracks. Most of our effort for TREC-7 was concentrated on SDR and VLC tracks. On the filtering track, we tested a preliminary version of a text classification toolkit that we have been developing over the last year. In the ad-hoc task, we introduce a new tf-factor in our term weighting scheme and use a simplified retrieval algorithm. The same weighting scheme and algorithm are used in the SDR and the VLC tracks. The...

AT&T at TREC6: SDR Track

AT&T participated in the Spoken Document Retrieval (SDR) track of TREC-7. Our speech retrieval system uses modern Information Retrieval (IR) methods in conjunction with in-house automatic speech recognition. The novel feature of our TREC-7 work is the use of document expansion to reduce the performance loss due to ASR errors. Results show that retrieval from automatic transcriptions of speech is quite ...

AT&T at TREC8

In 1999, AT&T participated in the ad-hoc task and the Question Answering (QA), Spoken Document Retrieval (SDR), and Web tracks. Most of our effort for TREC-8 focused on the QA and SDR tracks. Results from the SDR track show that our document expansion techniques, presented in [8, 9], are very effective for speech retrieval. The results for question answering are also encouraging. Our system, designed in a relatively short period for this task, can find the correct answer for about 45% of the user questions. This is especially good given the fact that our system extracts only a short phrase as an answer.

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars

D-theory: talking about talking about trees

Linguists, including computational linguists, have always been fond of talking about trees. In this paper, we outline a theory of linguistic structure which talks about talking about trees; we call this theory Description theory (D-theory). While important issues must be resolved ...
