Graph Based Disambiguation of Named Entities using Linked Data
Related papers
An approach to web-scale named-entity disambiguation
2009
We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data.
Graph-based semantic relatedness for named entity disambiguation
Proceedings of S3T, 2009
Natural language is a means to express and discuss concepts, objects, and events; that is, it carries semantic content. The Semantic Web aims at tightly coupling content with its precise meaning. One of the ultimate roles of Natural Language Processing techniques is identifying the meaning of a text, providing effective ways to link textual references to real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their identities. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text, and considering only the named entities within it, the proposed paradigm achieves results competitive with the state of the art on a news story dataset.
Robust disambiguation of named entities in text
2011
Disambiguating named entities in natural-language text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
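To make the combination of the three measures concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it replaces the dense-subgraph approximation with an exhaustive search over all joint assignments, which is only feasible for a handful of mentions. The function name `best_joint_mapping`, the weights `alpha`, `beta`, `gamma`, and every number in the example are assumptions made up for illustration.

```python
from itertools import product


def best_joint_mapping(candidates, prior, similarity, coherence,
                       alpha=0.3, beta=0.3, gamma=0.4):
    """Return the mention -> entity mapping with the highest combined score.

    candidates: dict mention -> list of candidate entities
    prior, similarity: dict (mention, entity) -> score
    coherence: dict frozenset({entity_a, entity_b}) -> score
    The weights are hypothetical; exhaustive search stands in for the
    dense-subgraph computation described in the abstract.
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[m] for m in mentions)):
        # Local evidence: entity prior plus mention/entity context similarity.
        local = sum(alpha * prior[(m, e)] + beta * similarity[(m, e)]
                    for m, e in zip(mentions, combo))
        # Global evidence: pairwise coherence among the chosen entities.
        pairs = sum(coherence.get(frozenset((a, b)), 0.0)
                    for i, a in enumerate(combo) for b in combo[i + 1:])
        score = local + gamma * pairs
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best


# Toy input with invented scores.
candidates = {
    "Page": ["Jimmy_Page", "Larry_Page"],
    "Kashmir": ["Kashmir_(song)", "Kashmir_(region)"],
}
prior = {
    ("Page", "Jimmy_Page"): 0.4, ("Page", "Larry_Page"): 0.6,
    ("Kashmir", "Kashmir_(song)"): 0.2, ("Kashmir", "Kashmir_(region)"): 0.8,
}
similarity = {key: 0.5 for key in prior}
coherence = {frozenset(("Jimmy_Page", "Kashmir_(song)")): 1.0}

mapping = best_joint_mapping(candidates, prior, similarity, coherence)
print(mapping)
```

In this toy input the priors alone favor `Larry_Page` and `Kashmir_(region)`, but the coherence term between `Jimmy_Page` and `Kashmir_(song)` outweighs them, which is the effect collective disambiguation is meant to capture.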
Named Entity Recognition and Disambiguation using Linked Data and Graph-based Centrality Scoring
… of the 4th International Workshop on …, 2012
Named Entity Recognition (NER) is a subtask of information extraction and aims to identify atomic entities in text that fall into predefined categories such as person, location, organization, etc. Recent efforts in NER try to extract entities and link them to linked data entities. Linked data is a term used for data resources, such as DBpedia, that are created using semantic web standards.
Semantic Relatedness Approach for Named Entity Disambiguation
Communications in Computer and Information Science, 2010
Natural language is a means to express and discuss concepts, objects, and events; that is, it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of a text, providing effective ways to link textual references to their referents, that is, real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text, and considering only the named entities within it, the proposed paradigm achieves results competitive with the state of the art on two different datasets.
Entity Disambiguation for Knowledge Base Population
2010
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very few resources.
Enriching Ontologies for Named Entity Disambiguation
Detecting entity mentions in a text and then mapping them to the right entities in a given knowledge source is significant to the realization of the Semantic Web, as well as to the development of advanced natural language processing applications. The knowledge sources used are often close ontologies (built by small groups of experts) and Wikipedia. To date, state-of-the-art methods proposed for named entity disambiguation mainly use Wikipedia as such a knowledge source. This paper proposes a method that enriches a close ontology with Wikipedia and then disambiguates named entities in a text based on the enriched ontology. The method disambiguates named entities in a text iteratively and incrementally, in several steps. The named entities identified in each step are used to disambiguate the remaining ones in the following steps. The experimental results show that enriching a close ontology noticeably improves disambiguation performance.
Entity Disambiguation and Linking over Queries using Encyclopedic Knowledge
The literature contains a large amount of work on entity recognition and semantic disambiguation in text, but very little on their behavior on noisy text data. In this paper, we present an approach for recognizing and disambiguating entities in text based on the high coverage and rich structure of an online encyclopedia. This work was carried out on a collection of query logs from the Bridgeman Art Library. Because queries are noisy, unstructured text, pure natural language processing and computational techniques can run into problems, and we need to contend with the impact of noise and the demands it places on query analysis. To cope with the noisy input, we use a machine learning method with statistical measures derived from Wikipedia, which provides a huge body of electronic text from the Internet that is itself noisy. Our approach is unsupervised and does not need any manual annotation by human experts. We show that data collected from Wikipedia can be used statistically to achieve good performance for entity recognition and semantic disambiguation over noisy unstructured text. Also, as no language-specific tool is needed, the method can be applied to other languages in a similar manner with little adaptation.
Using Linked Open Data Sources for Entity Disambiguation
2013
Within the framework of RepLab 2013, the filtering task tries to determine whether a tweet is related to a given entity or not. Our work tries to take advantage of the Web of Data in order to create a context for every entity, extracted from the available Linked Data sources. In Natural Language Processing (NLP), context is the key element for distinguishing the semantics contained in a message, by analyzing the frame in which the words are embedded.
Personalized Page Rank for Named Entity Disambiguation
The task of Named Entity Disambiguation is to map entity mentions in the document to their correct entries in some knowledge base. We present a novel graph-based disambiguation approach based on Personalized PageRank (PPR) that combines local and global evidence for disambiguation and effectively filters out noise introduced by incorrect candidates. Experiments show that our method outperforms state-of-the-art approaches by achieving 91.7% in micro- and 89.9% in macro-accuracy on a dataset of 27.8K named entity mentions.
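For readers unfamiliar with Personalized PageRank, the following Python sketch shows the general technique by power iteration on a tiny entity graph. It illustrates PPR itself, not the paper's disambiguation pipeline: the function name `personalized_pagerank`, the damping value, the iteration count, and the toy graph are all assumptions chosen for the example.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank.

    graph: dict node -> list of out-neighbours
    seeds: nodes the random walk restarts at (the "personalization")
    Returns a dict node -> score; the scores sum to 1.
    """
    seeds = list(seeds)
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs} | set(seeds)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                # Otherwise it follows an outgoing edge chosen uniformly.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank


# Hypothetical entity graph: two candidates for the mention "Paris".
graph = {
    "France": ["Paris_(city)", "Eiffel_Tower"],
    "Eiffel_Tower": ["Paris_(city)", "France"],
    "Paris_(city)": ["France", "Eiffel_Tower"],
    "Paris_Hilton": ["Hilton_Hotels"],
    "Hilton_Hotels": ["Paris_Hilton"],
}
# Restart at entities already resolved from the document context.
scores = personalized_pagerank(graph, seeds=["France", "Eiffel_Tower"])
best = max(["Paris_(city)", "Paris_Hilton"], key=scores.get)
print(best)
```

Because the walk restarts only at the context entities, candidates that are not connected to them receive no score, which is one way a PPR-style method can down-weight unrelated candidates.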