An approach to web-scale named-entity disambiguation
Related papers
Named entity disambiguation: A hybrid statistical and rule-based incremental approach
The Semantic Web, 2008
The rapidly increasing use of large-scale data on the Web has made named entity disambiguation one of the main challenges for research in Information Extraction and for the development of the Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the right entities in Wikipedia. The method is hybrid and has two phases: the first uses heuristics and patterns to narrow down the candidates, and the second employs the vector space model to rank the ambiguous cases and choose the right candidate. The novelty is that the disambiguation process is incremental and includes several rounds that filter the candidates by exploiting previously identified entities, extending the text with the attributes of each entity as soon as it is resolved in a round. We test the performance of the proposed method in disambiguating names of people, locations, and organizations in news texts. The experimental results show that our approach achieves high accuracy and can be used to construct a robust named entity disambiguation system.
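The two-phase, incremental idea can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the pattern-based first phase is reduced to a single heuristic (a mention with only one candidate is accepted immediately), short candidate descriptions stand in for entity attributes, and `disambiguate_incrementally`, the `margin` parameter, and the toy entity IDs are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate_incrementally(text, mentions, candidates, margin=0.05, max_rounds=5):
    """candidates: mention -> {entity_id: attribute text} (hypothetical KB data)."""
    context = Counter(tokenize(text))
    resolved = {}
    for _ in range(max_rounds):
        progress = False
        for mention in mentions:
            cands = candidates.get(mention, {})
            if mention in resolved or not cands:
                continue
            if len(cands) == 1:
                # Heuristic phase (simplified): a single candidate is accepted directly.
                best = next(iter(cands))
            else:
                # Vector space phase: rank candidates by cosine similarity to the context.
                ranked = sorted(
                    ((cosine(context, Counter(tokenize(desc))), e) for e, desc in cands.items()),
                    reverse=True,
                )
                if ranked[0][0] - ranked[1][0] < margin:
                    continue  # Still ambiguous; retry in a later round.
                best = ranked[0][1]
            resolved[mention] = best
            # Extend the context with the attributes of the newly resolved entity.
            context.update(tokenize(cands[best]))
            progress = True
        if not progress:
            break
    return resolved


candidates = {
    "Jordan": {
        "Michael_Jordan": "basketball player nba",
        "Jordan_(country)": "country middle east",
    },
    "Pippen": {"Scottie_Pippen": "basketball player nba"},
}
result = disambiguate_incrementally("Jordan and Pippen celebrated.", ["Jordan", "Pippen"], candidates)
print(result)  # {'Pippen': 'Scottie_Pippen', 'Jordan': 'Michael_Jordan'}
```

In the first round the text gives no evidence for either `Jordan` candidate, so only `Pippen` is resolved; the attributes added to the context then let the basketball reading of `Jordan` win in the second round.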
Personalized Page Rank for Named Entity Disambiguation
The task of Named Entity Disambiguation is to map entity mentions in a document to their correct entries in some knowledge base. We present a novel graph-based disambiguation approach based on Personalized PageRank (PPR) that combines local and global evidence for disambiguation and effectively filters out noise introduced by incorrect candidates. Experiments show that our method outperforms state-of-the-art approaches, achieving 91.7% micro-accuracy and 89.9% macro-accuracy on a dataset of 27.8K named entity mentions.
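Personalized PageRank itself can be computed by power iteration. The sketch below is a simplification, not the paper's exact scheme: it builds an undirected graph over candidate entities, uses hypothetical local scores (for example, name-match priors) as the teleport distribution, and picks each mention's highest-ranked candidate. The restart probability `alpha=0.15` (the usual damping factor of 0.85), the function names, and all entity names are illustrative assumptions.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iters=100, tol=1e-12):
    """graph: node -> {neighbor: weight}; every neighbor must also be a key.
    seeds: node -> non-negative teleport weight (normalized here)."""
    nodes = list(graph)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: alpha * teleport[n] for n in nodes}
        for u in nodes:
            out_weight = sum(graph[u].values())
            if out_weight == 0:
                # Dangling node: redistribute its mass according to the teleport vector.
                for n in nodes:
                    new[n] += (1 - alpha) * rank[u] * teleport[n]
            else:
                for v, w in graph[u].items():
                    new[v] += (1 - alpha) * rank[u] * w / out_weight
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def disambiguate(candidates, local_scores, related):
    # Nodes are candidate entities; edges link related candidates (global evidence).
    graph = {e: {} for cands in candidates.values() for e in cands}
    for a, b, w in related:
        graph[a][b] = w
        graph[b][a] = w
    rank = personalized_pagerank(graph, local_scores)
    return {m: max(cands, key=rank.get) for m, cands in candidates.items()}


candidates = {
    "Jordan": ["Michael_Jordan", "Jordan_(country)"],
    "Bulls": ["Chicago_Bulls", "Bull_(animal)"],
}
local_scores = {"Michael_Jordan": 0.4, "Jordan_(country)": 0.6,
                "Chicago_Bulls": 0.5, "Bull_(animal)": 0.5}
related = [("Michael_Jordan", "Chicago_Bulls", 1.0)]
print(disambiguate(candidates, local_scores, related))
```

Here the local scores alone would favour `Jordan_(country)` (0.6 vs 0.4), but the edge to `Chicago_Bulls` concentrates rank mass on `Michael_Jordan`.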
Entity Disambiguation for Knowledge Base Population
2010
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very few resources.
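Two of the challenges named here, name variation and entities absent from the knowledge base, are often handled with name normalization plus a NIL decision. The following is a generic sketch of that idea, not the paper's system: `variants`, `build_index`, `link`, the initialism rule, the scoring callable, and `nil_threshold=0.5` are hypothetical choices, and a real scorer would compare the mention's context with each candidate.

```python
import re


def variants(name):
    # Normalize case and punctuation; add an initialism for multi-word names.
    words = re.sub(r"[^\w\s]", "", name).lower().split()
    out = {" ".join(words)}
    if len(words) > 1:
        out.add("".join(w[0] for w in words))  # "World Health Organization" -> "who"
    return out


def build_index(kb_names):
    # Map every normalized variant of every KB name to the entities that carry it.
    index = {}
    for entity, names in kb_names.items():
        for name in names:
            for v in variants(name):
                index.setdefault(v, set()).add(entity)
    return index


def link(mention, index, score, nil_threshold=0.5):
    cands = set()
    for v in variants(mention):
        cands |= index.get(v, set())
    if not cands:
        return None  # NIL: no KB entry matches any variant of the mention.
    best = max(sorted(cands), key=score)
    # Also NIL when even the best candidate is poorly supported.
    return best if score(best) >= nil_threshold else None


index = build_index({"KB:WHO": ["World Health Organization"], "KB:The_Who": ["The Who"]})
context_scores = {"KB:WHO": 0.8, "KB:The_Who": 0.3}  # hypothetical context similarities


def score(entity):
    return context_scores.get(entity, 0.0)


print(link("W.H.O.", index, score))     # KB:WHO
print(link("The Who", index, score))    # None (below threshold)
print(link("Acme Corp", index, score))  # None (no candidate)
```

Indexing the variants of KB names, not just of mentions, is what lets the short form "W.H.O." reach the full organization name.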
Graph Based Disambiguation of Named Entities using Linked Data
Identifying entities such as people, organizations, songs, or places in natural-language texts is needed for semantic search, machine translation, and information extraction. A key challenge is the ambiguity of entity names, which requires robust methods for mapping names to the entities registered in a knowledge base. Although several approaches aim to tackle this problem, they still achieve poor accuracy. We address this drawback by presenting a novel knowledge-base-agnostic approach to named entity disambiguation. Our approach combines the HITS algorithm with label expansion strategies and string similarity measures such as n-gram similarity. Based on this combination, we can efficiently detect the correct URIs for a given set of named entities within an input text.
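The combination of string similarity and HITS can be illustrated with a small sketch. It is not the authors' implementation: candidates are assumed to be KB entries whose label has a character-trigram Jaccard similarity of at least a hypothetical threshold of 0.25 with the mention, label expansion and graph exploration are omitted, the links between candidates are toy data, and each mention is assigned the candidate with the highest HITS authority score. The resource identifiers are illustrative.

```python
import math


def ngrams(text, n=3):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    # Jaccard similarity of the two strings' character n-gram sets.
    x, y = ngrams(a, n), ngrams(b, n)
    return len(x & y) / len(x | y) if x | y else 0.0


def hits(graph, iters=50):
    """graph: node -> set of nodes it links to. Returns (hub, authority) scores."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iters):
        # Authority: sum of the hub scores of nodes linking in.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of the authority scores of nodes linked to.
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


def disambiguate(mentions, labels, kb_links, threshold=0.25):
    cands = {m: [e for e, label in labels.items() if ngram_similarity(m, label) >= threshold]
             for m in mentions}
    nodes = {e for cs in cands.values() for e in cs}
    # Restrict the KB link graph to the candidate entities.
    graph = {u: {v for v in kb_links.get(u, ()) if v in nodes} for u in nodes}
    _, auth = hits(graph)
    return {m: max(cs, key=auth.get) if cs else None for m, cs in cands.items()}


labels = {"dbr:Paris": "Paris", "dbr:Paris_Hilton": "Paris Hilton",
          "dbr:Paris_Texas": "Paris, Texas", "dbr:Seine": "Seine"}
kb_links = {"dbr:Paris": {"dbr:Seine"}, "dbr:Seine": {"dbr:Paris"}}
print(disambiguate(["Paris", "Seine"], labels, kb_links))
```

All three "Paris" labels pass the string filter (similarity 1.0, 0.3, and 0.3), so the choice among them is made by the link structure: only `dbr:Paris` receives authority from another candidate.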
Entity Disambiguation and Linking over Queries using Encyclopedic Knowledge
The literature contains a large body of work on entity recognition and semantic disambiguation in text, but very little on noisy text data. In this paper, we present an approach for recognizing and disambiguating entities in text based on the high coverage and rich structure of an online encyclopedia. This work was carried out on a collection of query logs from the Bridgeman Art Library. Because queries are noisy, unstructured text, both pure natural language processing and other computational techniques can run into problems, so we need to contend with the impact of noise and the demands it places on query analysis. To cope with the noisy input, we use a machine learning method with statistical measures derived from Wikipedia, which provides a huge body of electronic text from the Internet that is itself noisy. Our approach is unsupervised and does not need any manual annotation by human experts. We show that data collected from Wikipedia can be used statistically to achieve good performance for entity recognition and semantic disambiguation over noisy, unstructured text. Also, as no language-specific tool is needed, the method can be applied to other languages in a similar manner with little adaptation.
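Two statistics commonly derived from Wikipedia's link structure are link probability (how often a phrase appears as a link anchor, relative to how often it appears at all) and commonness (how often an anchor points to a given article). Whether the cited paper uses exactly these measures is an assumption; the sketch only shows how they could drive a greedy, longest-match annotator for short queries. The counts, `min_link_prob`, and the function names are hypothetical, and the machine-learning step mentioned in the abstract is omitted.

```python
def link_probability(phrase, anchor_counts, occurrence_counts):
    # Fraction of the phrase's occurrences in which it is used as a link anchor.
    linked = sum(anchor_counts.get(phrase, {}).values())
    seen = occurrence_counts.get(phrase, 0)
    return linked / seen if seen else 0.0


def commonness(phrase, anchor_counts):
    # P(article | anchor text), estimated from anchor -> target link counts.
    targets = anchor_counts.get(phrase, {})
    total = sum(targets.values())
    return {article: c / total for article, c in targets.items()} if total else {}


def annotate_query(query, anchor_counts, occurrence_counts, min_link_prob=0.1, max_n=3):
    tokens = query.lower().split()
    i, found = 0, []
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            scores = commonness(phrase, anchor_counts)
            if scores and link_probability(phrase, anchor_counts, occurrence_counts) >= min_link_prob:
                found.append((phrase, max(scores, key=scores.get)))
                i += n
                break
        else:
            i += 1  # No entity starts at this token; move on.
    return found


# Hypothetical counts harvested from Wikipedia links and article text.
anchor_counts = {
    "monet": {"Claude_Monet": 90, "Monet_(band)": 10},
    "water lilies": {"Water_Lilies_(Monet_series)": 70, "Nymphaeaceae": 30},
    "water": {"Water": 50},
}
occurrence_counts = {"monet": 100, "water lilies": 200, "water": 10000}
print(annotate_query("Monet water lilies", anchor_counts, occurrence_counts))
```

Link probability filters out phrases that are rarely linked (here "water", at 0.005), which matters for noisy queries where most words are not entity mentions.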
Robust disambiguation of named entities in text
2011
Disambiguating named entities in natural-language text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
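The graph formulation can be sketched with a greedy heuristic in the spirit of the paper: mention-entity edges carry a weighted mix of prior and context similarity, entity-entity edges carry coherence, and the entity node with the smallest weighted degree is removed repeatedly while every mention keeps at least one candidate. This is a simplified sketch, not the published algorithm, which includes additional steps omitted here; the mixing weight `alpha=0.5`, the function names, and the example scores are hypothetical.

```python
def edge_weights(prior, context_sim, alpha=0.5):
    # Mention-entity weight: weighted mix of popularity prior and context similarity.
    return {m: {e: alpha * prior[m][e] + (1 - alpha) * context_sim[m][e] for e in prior[m]}
            for m in prior}


def greedy_dense_subgraph(me_weights, coherence):
    """me_weights: mention -> {entity: weight}, each mention with >= 1 candidate.
    coherence: frozenset({e1, e2}) -> entity-entity weight."""
    alive = {m: set(es) for m, es in me_weights.items()}

    def weighted_degree(e):
        remaining = {x for es in alive.values() for x in es}
        deg = sum(me_weights[m][e] for m in alive if e in alive[m])
        deg += sum(coherence.get(frozenset((e, x)), 0.0) for x in remaining if x != e)
        return deg

    while True:
        # An entity that is the last remaining candidate of some mention is taboo.
        taboo = {next(iter(es)) for es in alive.values() if len(es) == 1}
        removable = {e for es in alive.values() if len(es) > 1 for e in es} - taboo
        if not removable:
            break
        victim = min(sorted(removable), key=weighted_degree)
        for es in alive.values():
            es.discard(victim)
    return {m: sorted(es)[0] for m, es in alive.items()}


prior = {"Kashmir": {"Kashmir_(song)": 0.2, "Kashmir_(region)": 0.8},
         "Page": {"Jimmy_Page": 0.3, "Larry_Page": 0.7}}
context_sim = {"Kashmir": {"Kashmir_(song)": 0.6, "Kashmir_(region)": 0.1},
               "Page": {"Jimmy_Page": 0.5, "Larry_Page": 0.2}}
coherence = {frozenset(("Kashmir_(song)", "Jimmy_Page")): 0.9}
weights = edge_weights(prior, context_sim)
print(greedy_dense_subgraph(weights, coherence))
# {'Kashmir': 'Kashmir_(song)', 'Page': 'Jimmy_Page'}
```

Choosing each mention's highest-weighted edge alone would give `Kashmir_(region)` and `Larry_Page` (about 0.45 vs 0.40 in both cases); the coherence edge keeps the song and the musician together.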
A knowledge-based approach to named entity disambiguation in news articles
2007
Named entity disambiguation has been one of the main challenges for research in Information Extraction and the development of the Semantic Web. Therefore, it has attracted much research effort, with various methods introduced for different domains, scopes, and purposes. In this paper, we propose a new approach that is not limited to particular entity classes and does not require well-structured texts. The novelty is that it exploits relations between co-occurring entities in a text, as defined in a knowledge base, for disambiguation. Combined with class weighting and coreference resolution, our knowledge-based method outperforms the KIM system on this problem. The implemented algorithms and the experiments conducted for the method are presented and discussed.
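The core idea, preferring candidates that the knowledge base relates to candidates of other co-occurring mentions, can be sketched as follows. It is a simplified illustration, not the paper's algorithm: coreference resolution is omitted, the relation set and class labels are hypothetical toy data, and class weighting appears only as a tie-breaker with made-up weights.

```python
def disambiguate(candidates, related_pairs, entity_class, class_weight):
    """candidates: mention -> list of entity IDs.
    related_pairs: set of frozenset({e1, e2}) for entities linked in the KB."""
    result = {}
    for mention, cands in candidates.items():
        others = {e for m, cs in candidates.items() if m != mention for e in cs}

        def score(entity):
            # Primary: number of KB relations to candidates of co-occurring mentions.
            links = sum(1 for o in others if frozenset((entity, o)) in related_pairs)
            # Secondary: a prior weight for the entity's class.
            return links, class_weight.get(entity_class.get(entity), 0.0)

        result[mention] = max(cands, key=score)
    return result


candidates = {
    "Georgia": ["Georgia_(country)", "Georgia_(U.S._state)"],
    "Atlanta": ["Atlanta_(Georgia)", "Atlanta_(Texas)"],
    "Jordan": ["Jordan_(country)", "Michael_Jordan"],
}
related_pairs = {frozenset(("Atlanta_(Georgia)", "Georgia_(U.S._state)"))}
entity_class = {"Jordan_(country)": "Location", "Michael_Jordan": "Person"}
class_weight = {"Person": 0.6, "Location": 0.4}
print(disambiguate(candidates, related_pairs, entity_class, class_weight))
```

"Georgia" and "Atlanta" resolve each other through the single KB relation, while "Jordan" has no related co-occurring candidate and falls back to the class weight.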
Entity Disambiguation for Wild Big Data Using Multi-Level Clustering
2015
When RDF instances represent the same entity, they are said to corefer. For example, two nodes from different RDF graphs both refer to the same individual, the musical artist James Brown. Disambiguating entities is essential for knowledge base population and other tasks that result in the integration or linking of data. Often, however, entity instance data originates from different sources and can be represented using different schemas or ontologies. In the age of Big Data, data can have other characteristics, such as originating from sources that are schema-less or lack ontological structure. Our work involves researching new ways to process this type of data in order to perform entity disambiguation. Our approach uses multi-level clustering and includes fine-grained entity type recognition and contextualization of entities, with online processing that can be supported by a parallel architecture.
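One simple reading of multi-level clustering for coreferent RDF instances is to block instances by a coarse entity type first, then cluster within each block by label similarity. The sketch assumes each instance already has a type and a label (the fine-grained type recognition and contextualization described above are not shown); `multi_level_cluster`, the token-Jaccard measure, the 0.5 threshold, and the URIs are hypothetical. Because the blocks are independent, the second level is naturally parallelizable.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokens(label):
    # Lowercased word tokens of a label, ignoring punctuation.
    return set(re.findall(r"\w+", label.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_block(items, threshold):
    """Single-link clustering of (uri, label) pairs via union-find."""
    parent = {uri: uri for uri, _ in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path halving.
            x = parent[x]
        return x

    for (u, label_u), (v, label_v) in combinations(items, 2):
        if jaccard(tokens(label_u), tokens(label_v)) >= threshold:
            parent[find(u)] = find(v)
    groups = defaultdict(list)
    for uri, _ in items:
        groups[find(uri)].append(uri)
    return [sorted(g) for g in groups.values()]


def multi_level_cluster(instances, threshold=0.5):
    """instances: iterable of (uri, coarse_type, label)."""
    # Level 1: block by coarse entity type.
    blocks = defaultdict(list)
    for uri, etype, label in instances:
        blocks[etype].append((uri, label))
    # Level 2: fine-grained clustering inside each independent block.
    clusters = []
    for etype in sorted(blocks):
        clusters.extend(cluster_block(blocks[etype], threshold))
    return sorted(clusters)


instances = [
    ("g1:JamesBrown", "Person", "James Brown"),
    ("g2:James_Brown", "Person", "James Brown (musician)"),
    ("g1:JohnBrown", "Person", "John Brown"),
    ("g3:JamesBrown", "Album", "James Brown"),
]
print(multi_level_cluster(instances))
```

The two Person instances for James Brown end up together, `g1:JohnBrown` stays separate (token Jaccard 1/3 with "James Brown"), and the Album instance is never compared with the Person instances.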
Named Entity Disambiguation for Noisy Text
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
We address the task of Named Entity Disambiguation (NED) for noisy text. We present WikilinksNED, a large-scale NED dataset of text fragments from the web, which is significantly noisier and more challenging than existing news-based datasets. To capture the limited and noisy local context surrounding each mention, we design a neural model and train it with a novel method for sampling informative negative examples. We also describe a new way of initializing word and entity embeddings that significantly improves performance. Our model significantly outperforms existing state-of-the-art methods on WikilinksNED while achieving comparable performance on a smaller newswire dataset.
Enriching Ontologies for Named Entity Disambiguation
Detecting entity mentions in a text and mapping them to the right entities in a given knowledge source is significant for realizing the Semantic Web, as well as for the advanced development of natural language processing applications. The knowledge sources used are often closed ontologies, built by small groups of experts, and Wikipedia. To date, state-of-the-art methods proposed for named entity disambiguation mainly use Wikipedia as such a knowledge source. This paper proposes a method that enriches a closed ontology with Wikipedia and then disambiguates named entities in a text based on the enriched ontology. The method disambiguates named entities in a text iteratively and incrementally: the named entities identified in each step are used to disambiguate the remaining ones in the following steps. The experimental results show that enriching a closed ontology noticeably improves disambiguation performance.
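Enrichment can be pictured as merging alternative names from Wikipedia into the entries of the expert-built ontology, so that more surface forms in a text reach a candidate. The sketch assumes that an alignment from ontology IDs to Wikipedia titles is already available and that Wikipedia aliases (for example, from redirects) have been extracted; `enrich`, `build_alias_index`, and all IDs are hypothetical, and the iterative disambiguation step is not shown.

```python
from collections import defaultdict


def enrich(ontology, wiki_aliases, alignment):
    """ontology: entity_id -> set of names; wiki_aliases: title -> set of names;
    alignment: entity_id -> Wikipedia title (assumed given)."""
    enriched = {e: set(names) for e, names in ontology.items()}  # Copy; input untouched.
    for entity, title in alignment.items():
        if entity in enriched:
            enriched[entity] |= wiki_aliases.get(title, set())
    return enriched


def build_alias_index(entries):
    # Lowercased surface form -> set of entity IDs that carry it.
    index = defaultdict(set)
    for entity, names in entries.items():
        for name in names:
            index[name.lower()].add(entity)
    return index


ontology = {"ont:UN": {"United Nations"}, "ont:UNESCO": {"UNESCO"}}
wiki_aliases = {"United_Nations": {"UN", "U.N."}}
alignment = {"ont:UN": "United_Nations"}

before = build_alias_index(ontology)
after = build_alias_index(enrich(ontology, wiki_aliases, alignment))
print("u.n." in before, sorted(after["u.n."]))  # False ['ont:UN']
```

Before enrichment the mention "U.N." has no candidate at all; afterwards it maps to the ontology's own entry, so later disambiguation still works against the curated ontology.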