Marc Verhagen - Profile on Academia.edu

Papers by Marc Verhagen

TimeBank 1.2 Documentation

Using ISO-Space for Annotating Spatial Information

In this paper, we describe ISO-Space, an annotation language currently under community development for encoding spatial and spatiotemporal information as expressed in natural language text. After reviewing the requirements of a specification for capturing such knowledge from linguistic descriptions, we demonstrate how ISO-Space aims to address these problems. ISO-Space is an emerging resource that is still in its early stages of development; hence community involvement from multiple disciplines and potential users and consumers is necessary if it is to achieve a level of descriptive adequacy and subsequent adoption. We describe the genres of text that are being used in a pilot annotation study, in order to both refine and enrich the specification language.

Geolocating Orientational Descriptions of Landmark Configurations

In this paper we outline how to translate verbal subjective descriptions of spatial relations into metrically meaningful positional information, and extend this capability to spatiotemporal monitoring. Document collections, transcriptions, cables, and narratives routinely make reference to objects moving through space over time. Integrating such information derived from textual sources into a geosensor data system can enhance the overall spatiotemporal representation in changing and evolving situations, such as when tracking objects through space with limited image data. We focus on landmark identification, since it proves to be a more tractable problem than open-domain image recognition.
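The abstract does not spell out the paper's method. As a generic, hypothetical illustration of turning a verbal relation such as "about 1 km north of the landmark" into a metric position, the Python sketch below applies the standard spherical destination-point formula. The `BEARINGS` table, the `offset_position` helper, and the spherical-Earth assumption are illustrative only and are not taken from the paper.

```python
import math

# Mean Earth radius in metres (spherical approximation; an assumption).
EARTH_RADIUS_M = 6_371_000.0

# Hypothetical mapping from verbal directions to compass bearings in degrees.
BEARINGS = {
    "north": 0.0, "northeast": 45.0, "east": 90.0, "southeast": 135.0,
    "south": 180.0, "southwest": 225.0, "west": 270.0, "northwest": 315.0,
}


def offset_position(lat, lon, direction, distance_m):
    """Return the (lat, lon) in degrees reached by moving distance_m metres
    from (lat, lon) toward a named compass direction, on a spherical Earth."""
    bearing = math.radians(BEARINGS[direction])
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    delta = distance_m / EARTH_RADIUS_M  # angular distance travelled
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(bearing))
    lam2 = lam1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)
```

An eight-word compass lookup is far cruder than interpreting subjective descriptions; the sketch only shows where a metric position would come from once a landmark, a direction, and a distance have been extracted from text.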

Temporal Information in Intensional Contexts

We present a system aimed at representing and interpreting the temporal information in intensional contexts, focusing particularly on the representation task. Take as an example the piece of discourse below. (1) 1998-03-03. Thousands of people began gathering in the capital Abuja early Tuesday for the two-day rally supporting General Sani Abacha’s candidacy (...). But as supporters of the military leader gathered in the north, riot police deployed in Nigeria’s southern commercial capital Lagos, to break up a protest rally called by the political opposition. The problem is how to represent temporal information about the break up event, which was introduced by a purpose clause in a syntactic subordination pattern. We use TimeML ([3]), an annotation language for temporal information, to obtain a TimeNet, a directed cyclic graph representing the temporal information in a text. TimeML uses a basic ontology of expressions denoting temporally relevant entities: time expressions are identifi...

Three approaches to learning TLINKs in TimeML

Annotation of Temporal Relations with Tango

Temporal annotation is a complex task characterized by low markup speed and low inter-annotator agreement scores. Tango is a graphical annotation tool for temporal relations. It is developed for the TimeML annotation language and allows annotators to build a graph that resembles a timeline. Temporal relations are added by selecting events and drawing labeled arrows between them. Tango is integrated with a temporal closure component and includes features like SmartLink, user prompting and automatic linking of time expressions. Tango has been used to create two corpora with temporal annotation, TimeBank and the AQUAINT Opinion corpus.
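The abstract mentions a temporal closure component without describing it. As a rough, hypothetical sketch of the idea, the Python below computes the transitive closure of BEFORE links only (AFTER links are flipped into BEFORE). A full TimeML closure engine would also reason over relation types such as INCLUDES, IBEFORE, and SIMULTANEOUS; the function names here are illustrative, not Tango's.

```python
from itertools import product


def normalize(links):
    """Convert (x, relType, y) triples into (a, b) pairs meaning 'a BEFORE b'.

    AFTER links are flipped; every other relation type is ignored in this sketch.
    """
    pairs = set()
    for x, rel, y in links:
        if rel == "BEFORE":
            pairs.add((x, y))
        elif rel == "AFTER":
            pairs.add((y, x))
    return pairs


def close_before(pairs):
    """Return the transitive closure of a set of BEFORE pairs."""
    closed = set(pairs)
    while True:
        # a BEFORE b and b BEFORE c together imply a BEFORE c
        derived = {(a, d) for (a, b), (c, d) in product(closed, closed) if b == c}
        if derived <= closed:
            return closed
        closed |= derived


def contradictions(closed):
    """Entities that end up BEFORE themselves signal inconsistent links."""
    return sorted(a for a, b in closed if a == b)
```

For example, `close_before(normalize([("e1", "BEFORE", "e2"), ("e3", "AFTER", "e2")]))` returns the pairs (e1, e2), (e2, e3), and the inferred (e1, e3). Because BEFORE is irreflexive, any derived pair of the form (x, x) means the input links contradict each other.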

TimeBank 1.2

TimeBank 1.2 is a corpus that contains 183 news articles that have been annotated with temporal information, adding events, times and temporal links between events and times. The annotation follows the TimeML 1.2.1 specification available at www.timeml.org. TimeML aims to capture and represent temporal information. This is accomplished using four primary tag types: TIMEX3 for temporal expressions, EVENT for temporal events, SIGNAL for temporal signals, and LINK for representing relationships.
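To make the four tag types concrete, here is a minimal Python sketch that reads a small, hand-made TimeML-style fragment with the standard library's `xml.etree.ElementTree`, counts tags by type, and extracts normalized time values. The fragment, its attribute values, and the helper names are hypothetical, and the link attributes are simplified: real TimeBank 1.2 files follow the full TimeML 1.2.1 specification, where links point at event instances rather than directly at events. TLINK, SLINK, and ALINK are folded into a single LINK count to match the four-way grouping in the description of TimeBank 1.2.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-made, simplified TimeML-style fragment (not taken from TimeBank).
SAMPLE = """<TimeML>
Police <EVENT eid="e1" class="OCCURRENCE">arrived</EVENT>
<SIGNAL sid="s1">before</SIGNAL>
<TIMEX3 tid="t1" type="TIME" value="1998-03-03T09:00">9 a.m.</TIMEX3>.
<TLINK lid="l1" eventID="e1" relatedToTime="t1" signalID="s1" relType="BEFORE"/>
</TimeML>"""

# TimeML link tags, grouped under one LINK category for counting.
LINK_TAGS = {"TLINK", "SLINK", "ALINK"}


def count_tag_types(xml_text):
    """Count annotation tags, folding TLINK/SLINK/ALINK into one LINK bucket."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        if elem is root:  # skip the <TimeML> wrapper itself
            continue
        counts["LINK" if elem.tag in LINK_TAGS else elem.tag] += 1
    return counts


def time_values(xml_text):
    """Map each TIMEX3 id to its normalized value attribute."""
    root = ET.fromstring(xml_text)
    return {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
```

On this fragment, `count_tag_types(SAMPLE)` finds one tag of each of the four types, and `time_values(SAMPLE)` returns `{"t1": "1998-03-03T09:00"}`.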

SemEval-2007 task 15: TempEval temporal relation identification

The TempEval task proposes a simple way to evaluate automatic extraction of temporal relations. It avoids the pitfalls of evaluating a graph of interrelated labels by defining three subtasks that allow pairwise evaluation of temporal relations. The task not only allows straightforward evaluation, it also avoids the complexities of full temporal parsing.
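Because each pair is scored independently, evaluation reduces to comparing labels pair by pair. The Python below is a simplified, hypothetical scorer, not the official TempEval scoring script: gold and system labels are dictionaries keyed by entity pair, and precision, recall, and F1 are computed from exact label matches. The pair identifiers and labels in the example are made up.

```python
def pairwise_scores(gold, system):
    """Score system relation labels against gold labels, pair by pair.

    gold and system map (entity1, entity2) pairs to a relation label.
    Precision = correct / pairs labelled by the system;
    recall = correct / pairs labelled in the gold standard.
    """
    correct = sum(1 for pair, label in system.items() if gold.get(pair) == label)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: four gold pairs, three system labels, two of them correct.
GOLD = {("e1", "t1"): "BEFORE", ("e2", "t1"): "OVERLAP",
        ("e3", "t1"): "AFTER", ("e4", "t1"): "BEFORE"}
SYSTEM = {("e1", "t1"): "BEFORE", ("e2", "t1"): "AFTER", ("e3", "t1"): "AFTER"}
```

Here `pairwise_scores(GOLD, SYSTEM)` gives precision 2/3, recall 0.5, and F1 4/7 (about 0.571). No graph reasoning is needed, which is the simplification the pairwise design buys.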

The Holy Grail of Sense Definition: Creating a Sense-Disambiguated Corpus from Scratch

This paper presents a methodology for creating a gold standard for sense definition using Amazon's Mechanical Turk service. We demonstrate how this method can be used to create in a single step, quickly and cheaply, a lexicon of sense inventories and the corresponding sense-annotated lexical sample. We show the results obtained by this method for a sample verb and discuss how it can be improved to produce an exhaustive lexical resource. We then describe how such a resource can be used to further other semantic annotation efforts, using as an example the Generative Lexicon Mark-up Language (GLML) effort.
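The abstract does not say how labels from different Mechanical Turk workers were combined. Assuming a simple majority vote with an agreement threshold (a common baseline, not necessarily the paper's procedure), a Python sketch might look like this; the item ids, sense labels, and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter


def aggregate_senses(annotations, min_agreement=0.5):
    """Majority-vote each item's sense label across workers.

    annotations: dict mapping item_id -> list of sense labels from workers.
    Returns item_id -> (label, agreement) for items whose most frequent label
    reaches min_agreement (its share of all votes); other items are dropped.
    """
    result = {}
    for item, labels in annotations.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        agreement = votes / len(labels)
        if agreement >= min_agreement:
            result[item] = (label, agreement)
    return result


# Hypothetical worker labels for two occurrences of a verb.
WORKER_LABELS = {"s1": ["A", "A", "B"], "s2": ["A", "B", "C", "D"]}
```

With this input, `aggregate_senses(WORKER_LABELS)` keeps `s1` with label `A` (2 of 3 votes) and drops `s2`, where no label reaches half of the votes. Items that fail the threshold are natural candidates for re-annotation or for revising the sense inventory.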

SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations

A constraint-based representation scheme of collocational structures

Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics, 1993

OTS, Trans 10, 3512 JK Utrecht (NL); CLMT, Essex University, Wivenhoe Park, CO4 3SQ Colchester (UK)

Modeling Debate within a Scientific Community

2013 International Conference on Social Intelligence and Technology, 2013

There is growing interest in automating the detection and tracking of new and significant developments in science and technology as they emerge within a given community. A significant component of detecting such patterns of emergence is identifying the presence of a debate in the scientific community. Such debate often reflects disagreements or uncertainties over technologies or concepts as they are actively being discussed and developed. In this paper, we present an algorithm for recognizing debate in large document collections. We distinguish three styles of debate over a document collection: (i) silent debate, (ii) active disagreement, and (iii) topical uncertainty. Our algorithm employs a number of indicators found in the metadata and full text of publications and patents to identify the presence of these types of debate in the community. The paper outlines the details of these features and indicators and reports on the results of applying them to data from several fields classified by subject matter experts (SMEs), which show that system outputs have high agreement with the SMEs' judgments.

Clinical TempEval

We describe the Clinical TempEval task, which is currently in preparation for the SemEval-2015 evaluation exercise. This task involves identifying and describing events, times and the relations between them in clinical text. Six discrete subtasks are included, focusing on recognising mentions of times and events, describing those mentions for both entity types, identifying the relation between an event and the document creation time, and identifying narrative container relations.

Medstract: the next generation

We present MedstractPlus, a resource for mining relations from the Medline bibliographic database. It was built on the remains of Medstract, a previously created resource that included a bio-relation server and an acronym database. MedstractPlus uses simple and scalable natural language processing modules to structure text and is designed with reusability and extensibility in mind.

Learning biological networks via bootstrapping with optimized GO-based gene similarity

2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010, 2010

Microarray gene expression data provide a unique information resource for learning biological networks using "reverse engineering" methods. However, there are a variety of cases in which we know which genes are involved in a given pathology of interest, but we do not have enough experimental evidence to support the use of fully-supervised/reverse-engineering learning methods. Moreover, corroboration of the reverse engineered

SlinkET: A Partial Modal Parser for Events

We present SlinkET, a parser for identifying contexts of event modality in text developed within the TARSQI (Temporal Awareness and Reasoning Systems for Question Interpretation) research framework. SlinkET is grounded on TimeML, a specification language for capturing temporal and event related information in discourse, which provides an adequate foundation to handle event modality. SlinkET builds on top of a robust

Annotating and Recognizing Event Modality in Text

The Florida AI Research Society Conference, 2006

Current results in basic Information Extraction tasks such as Named Entity Recognition or Event Extraction suggest that we are close to achieving a stage where the fundamental units for text understanding are put together; namely, predicates and their arguments. However, other layers of information, such as event modality, are essential for understanding, since the inferences derivable from

Towards a Generative Lexical Resource: The Brandeis Semantic Ontology

Language Resources and Evaluation, 2006

In this paper we describe the structure and development of the Brandeis Semantic Ontology (BSO), a large generative lexicon ontology and lexical database. The BSO has been designed to allow for more widespread access to Generative Lexicon-based lexical resources and help researchers in a variety of computational tasks. The specification of the type system used in the BSO largely follows

SemEval-2007 task 15

Proceedings of the 4th International Workshop on Semantic Evaluations - SemEval '07, 2007

The TempEval task proposes a simple way to evaluate automatic extraction of temporal relations. It avoids the pitfalls of evaluating a graph of interrelated labels by defining three subtasks that allow pairwise evaluation of temporal relations. The task not only allows straightforward evaluation, it also avoids the complexities of full temporal parsing.