BUAP_1: A Naïve Approach to the Entity Linking Task
Related papers
Entity linking leveraging automatically generated annotation
2010
Entity linking maps entity mentions in a document to their representations in a knowledge base (KB). In this paper, we propose to use additional information sources from Wikipedia to find more name variations for the entity linking task. In addition, as manually creating a training corpus for entity linking is labor-intensive and costly, we present a novel method to automatically generate large-scale corpus annotations for ambiguous mentions by leveraging their unambiguous synonyms in the document collection. Then, a binary classifier is trained to filter out KB entities that are not similar to the current mentions. This classifier not only effectively reduces ambiguity among the existing KB entities, but is also very useful for highlighting entities that are new to the KB for further population. Furthermore, we also leverage Wikipedia documents to provide additional information that is not available in our generated corpus, through a domain adaptation approach that provides further performance improvements. The experimental results show that our proposed method outperforms the state-of-the-art approaches.
Entity linking based on the co-occurrence graph and entity probability
Proceedings of the first international workshop on Entity recognition & disambiguation - ERD '14, 2014
This paper describes our system for the Entity Recognition and Disambiguation Challenge 2014. There are two tasks: one to find entities in queries (Short Track), the other to find entities in texts from web pages (Long Track). We participated in both tracks with the same system, tuned to each task. On the final test set, we reached an F-measure of 71.9% on the Long Track and 66.9% on the Short Track. We describe our system and its components in depth, together with their influence on performance. The specifics of each task are also discussed.
Dexter: an open source framework for entity linking
We introduce Dexter, an open source framework for entity linking. The entity linking task aims at identifying all the small text fragments in a document that refer to an entity contained in a given knowledge base, e.g., Wikipedia. The annotation is usually organized in three tasks. Given an input document, the first task consists of discovering the fragments that could refer to an entity. Since a mention could refer to multiple entities, it is necessary to perform a disambiguation step, where the correct entity is selected among the candidates. Finally, discovered entities are ranked by some measure of relevance. Many entity linking algorithms have been proposed, but unfortunately only a few authors have released the source code or some APIs. As a result, it is currently difficult to evaluate the performance of a method on a single subtask or to compare different techniques. In this work we present a new open framework, called Dexter, which implements some popular algorithms and provides all the tools needed to develop any entity linking technique. We believe that a shared framework is fundamental to perform fair comparisons and improve the state of the art.
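The three tasks named in this abstract (discovering candidate fragments, disambiguating them, and ranking the results) can be illustrated with a minimal Python sketch. It is not Dexter's API or implementation: the `ALIASES` dictionary, its prior probabilities, the entity names, and the functions `spot`, `disambiguate`, and `link` are all hypothetical, and disambiguation here simply picks the candidate with the highest assumed prior.

```python
import re
from dataclasses import dataclass

# Hypothetical toy alias dictionary: surface form -> {entity: prior probability}.
# A real system would derive this from a knowledge base such as Wikipedia.
ALIASES = {
    "paris": {"Paris_(France)": 0.9, "Paris_Hilton": 0.1},
    "hilton": {"Hilton_Hotels": 0.6, "Paris_Hilton": 0.4},
}


@dataclass
class Annotation:
    mention: str   # surface text found in the document
    start: int     # character offset of the mention
    entity: str    # chosen knowledge base entry
    score: float   # relevance score used for ranking


def spot(text):
    """Task 1: yield (mention, offset) for fragments that could refer to an entity."""
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in ALIASES:
            yield match.group(), match.start()


def disambiguate(mention):
    """Task 2: select one entity among the candidates (highest prior wins here)."""
    candidates = ALIASES[mention.lower()]
    entity = max(candidates, key=candidates.get)
    return entity, candidates[entity]


def link(text):
    """Task 3: rank the discovered entities by score, most relevant first."""
    annotations = [Annotation(m, s, *disambiguate(m)) for m, s in spot(text)]
    return sorted(annotations, key=lambda a: a.score, reverse=True)


if __name__ == "__main__":
    for annotation in link("Hilton opened a hotel in Paris"):
        print(annotation)
```

Keeping the three stages as separate functions mirrors the abstract's point that each subtask should be evaluable on its own; any stage could be swapped for a stronger method without touching the others.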
From Entity Recognition to Entity Linking: A Survey of Advanced Entity Linking Techniques
2012
Several studies have shown that finding information about specific entities is the most common information need of information retrieval users. These needs should be answered by returning specific entities, their properties, or related concepts rather than arbitrary documents. While some search engines are capable of recognizing specific types of entities, true entity-oriented search still has a long way to go because of the high ambiguity of names across documents. Entity linking (EL) goes beyond the entity recognition task by linking a textual named entity mention to a knowledge base entry. It is a difficult task involving several challenges. This paper gives a survey of EL tasks in the general and the biomedical domains. In addition, results of our latest EL work are provided for reference; they uncover new EL challenges found in biomedical text mining and are accompanied by a discussion of possible solutions.
Nus-i2r: Learning a combined system for entity linking
2010
In this paper, we report the joint participation of the NUS and I2R teams in Knowledge Base Population at the Text Analysis Conference 2010. For Entity Linking, we analyze IR approaches and SVM classification in the disambiguation stage and develop a supervised learner for combining these approaches. The combined system performs better than the individual components and achieves results much better than the median. Furthermore, according to our error analysis, a considerable number of errors are caused by the use of a different Wikipedia version, which prevents our system from showing significantly better performance.
Mining and Leveraging Background Knowledge for Improving Named Entity Linking
Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, 2018
Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness such as completeness, timeliness and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.
Context-Based Entity Linking – University of Amsterdam at TAC 2012
This paper describes our approach to the 2012 Text Analysis Conference (TAC) Knowledge Base Population (KBP) entity linking track. For this task, we turn to a state-of-the-art system for entity linking in microblog posts. Compared to the little context microblog posts provide, the documents in the TAC KBP track provide context of greater length and of a less noisy nature. In this paper, we adapt the entity linking system for microblog posts to the KBP task by extending it with approaches that explicitly rely on the query's context. We show that incorporating novel features that leverage the context on the entity-level can lead to improved performance in the TAC KBP task.
A Hybrid Approach to Domain-Specific Entity Linking
The current state-of-the-art Entity Linking (EL) systems are geared towards corpora that are as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, as knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hybrid use of simple specialist linkers in combination with an existing generalist system to address this problem. Our main findings are the following. First, we construct a new reusable benchmark for EL on a corpus of domain-specific conversations. Second, we test the performance of a range of approaches under the same conditions, and show that specialist linkers obtain high precision in isolation, and high recall when combined with generalist linkers. Hence, we can effectively exploit local context and get the best of both worlds.
Supervised Learning for Linking Named Entities to Knowledge Base Entries
2011
This paper addresses the challenging information extraction problem of linking named entities in text to entries in a knowledge base. Our approach uses supervised learning to (a) rank candidate knowledge base entries for each named entity, (b) classify the top-ranked entry as the correct disambiguation or not, and (c) group together the named entities without a corresponding entry in the knowledge base.
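The three steps (a) to (c) in this abstract can be sketched in Python under strong simplifying assumptions. This is not the authors' method: the paper uses supervised learning, whereas here a fixed lookup table `SCORES` stands in for a learned ranker, a plain `threshold` stands in for the learned classifier of step (b), and unlinked mentions are grouped naively by lowercased name. The `KB` contents and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical knowledge base lookup: mention string -> candidate entries.
KB = {"Michael Jordan": ["Michael_Jordan_(basketball)", "Michael_I._Jordan"]}

# Hypothetical scores standing in for the output of a learned ranking model.
SCORES = {"Michael_Jordan_(basketball)": 0.8, "Michael_I._Jordan": 0.2}


def candidates_for(mention):
    """Return candidate knowledge base entries for a mention (may be empty)."""
    return KB.get(mention, [])


def score(mention, entity):
    """Stand-in for a learned ranker; ignores the mention's context entirely."""
    return SCORES.get(entity, 0.0)


def link_mentions(mentions, threshold=0.5):
    links = {}
    nil_clusters = defaultdict(list)
    for mention in mentions:
        # (a) rank the candidates and keep the top-ranked entry
        best = max(candidates_for(mention),
                   key=lambda entity: score(mention, entity),
                   default=None)
        # (b) accept or reject the top-ranked entry (threshold replaces a classifier)
        if best is not None and score(mention, best) >= threshold:
            links[mention] = best
        else:
            # (c) group mentions without a knowledge base entry by normalized name
            nil_clusters[mention.lower()].append(mention)
    return links, dict(nil_clusters)


if __name__ == "__main__":
    print(link_mentions(["Michael Jordan", "ACME Corp", "acme corp"]))
```

In this toy run, "Michael Jordan" is linked to its top-ranked entry, while the two spellings of the unknown company end up in a single group of mentions without a knowledge base entry.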