Robust disambiguation of named entities in text

Graph Ranking for Collective Named Entity Disambiguation

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014

Named Entity Disambiguation (NED) refers to the task of mapping different named entity mentions in running text to their correct interpretations in a specific knowledge base (KB). This paper presents a collective disambiguation approach using a graph model. All possible NE candidates are represented as nodes in the graph and associations between different candidates are represented by edges between the nodes. Each node has an initial confidence score, e.g. entity popularity. Page-Rank is used to rank nodes and the final rank is combined with the initial confidence for candidate selection. Experiments on 27,819 NE textual mentions show the effectiveness of using Page-Rank in conjunction with initial confidence: 87% accuracy is achieved, outperforming both baseline and state-of-the-art approaches.
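The ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the PageRank score and the initial confidence are combined, so the product used here is an assumption, as are the function names, the damping factor of 0.85, and the toy entities and scores.

```python
from collections import defaultdict

def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected, weighted candidate graph."""
    neighbours = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated candidates is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not neighbours[u])
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[u] * weight / sum(neighbours[u].values())
                for u, weight in neighbours[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

def disambiguate(candidates, initial, edges):
    """Pick, per mention, the candidate maximising rank * initial confidence.

    The product is an assumed combination; the abstract only says the two
    scores are combined.
    """
    nodes = [c for cands in candidates.values() for c in cands]
    rank = pagerank(nodes, edges)
    return {
        mention: max(cands, key=lambda c: rank[c] * initial[c])
        for mention, cands in candidates.items()
    }

# Hypothetical toy input: two mentions, three KB candidates.
candidates = {"Paris": ["Paris_France", "Paris_Texas"], "Texas": ["Texas_State"]}
initial = {"Paris_France": 0.6, "Paris_Texas": 0.4, "Texas_State": 1.0}
edges = {("Paris_Texas", "Texas_State"): 1.0}  # coherence between candidates
result = disambiguate(candidates, initial, edges)
```

In this toy case the coherence edge outweighs the popularity prior, so the mention "Paris" resolves to `Paris_Texas` even though `Paris_France` has the higher initial confidence.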

System for collective entity disambiguation

Proceedings of the first international workshop on Entity recognition & disambiguation - ERD '14, 2014

We present an approach and a system for collective disambiguation of entity mentions occurring in natural language text. Given an input text, the system spots mentions and their candidate entities. Candidate entities across all mentions are jointly modeled as binary nodes in a Markov Random Field. Their edges correspond to the joint signal between pairs of entities. This facilitates collective disambiguation of the mentions, achieved by performing MAP inference on the MRF in a binary label space. Our model also allows for a natural treatment of mentions that either have no entity attached or have more than one attachment. By restricting cliques to nodes and edges and with a submodularity assumption on their potentials, we get an inference problem that is efficiently solved using graph min cut.
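To make the binary label space concrete, here is a small sketch of MAP inference on a pairwise MRF with one on/off node per candidate entity. It deliberately swaps the graph min cut mentioned in the abstract for exhaustive enumeration, which is only feasible for a handful of nodes. The energy form, the costs, and the entity names are illustrative assumptions, not values from the paper.

```python
from itertools import product

def map_assignment(unary, pairwise):
    """Exhaustive MAP inference (energy minimisation) for a binary pairwise MRF.

    unary[node]      = (cost of label 0, cost of label 1)
    pairwise[(u, v)] = non-negative reward subtracted when u and v are both 1;
                       a potential of this form is submodular.
    Brute force stands in for the min-cut solver and scales as 2**len(unary).
    """
    nodes = sorted(unary)
    best_labels, best_energy = None, float("inf")
    for labels in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, labels))
        energy = sum(unary[node][assignment[node]] for node in nodes)
        energy -= sum(
            reward
            for (u, v), reward in pairwise.items()
            if assignment[u] == 1 and assignment[v] == 1
        )
        if energy < best_energy:
            best_labels, best_energy = assignment, energy
    return best_labels

# Hypothetical candidates: label 1 means "this entity is attached".
unary = {
    "Paris_France": (0.3, 0.6),
    "Paris_Texas": (0.5, 0.6),
    "Texas_State": (0.5, 0.3),
}
pairwise = {("Paris_Texas", "Texas_State"): 0.5}
labels = map_assignment(unary, pairwise)
```

Because every candidate carries its own binary label, nothing forces exactly one candidate per mention to be switched on, which matches the abstract's remark that a mention may end up with no entity or with several.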

Graph Based Disambiguation of Named Entities using Linked Data

Identifying entities such as people, organizations, songs, or places in natural language texts is necessary for semantic search, machine translation, and information extraction. A key challenge is the ambiguity of entity names, which requires robust methods to disambiguate names to the entities registered in a knowledge base. Several approaches aim to tackle this problem, but they still achieve poor accuracy. We address this drawback by presenting a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the HITS algorithm with label expansion strategies and string similarity measures such as n-gram similarity. Based on this combination, we can efficiently detect the correct URIs for a given set of named entities within an input text.
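As an illustration of the string similarity component only, the following sketch scores a mention against candidate labels using character trigrams. The abstract does not define its n-gram similarity, so the Jaccard overlap, the lowercasing, and the function names below are assumptions rather than the paper's measure.

```python
def char_ngrams(text, n=3):
    """Set of lowercase character n-grams; empty if the text is shorter than n."""
    lowered = text.lower()
    return {lowered[i:i + n] for i in range(len(lowered) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-gram sets (an assumed definition)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def best_label(mention, labels):
    """Return the candidate label most similar to the mention."""
    return max(labels, key=lambda label: ngram_similarity(mention, label))

# Hypothetical candidate labels for the mention "Obama".
choice = best_label("Obama", ["Osaka", "Barack Obama"])
```

A score like this would typically act as one signal among several; in the approach described above it is paired with HITS and label expansion, neither of which is shown here.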

Graph-based semantic relatedness for named entity disambiguation

Proceedings of S3T, 2009

Natural language is a means to express and discuss concepts, objects, and events; that is, it carries semantic content. The Semantic Web aims at tightly coupling content with its precise meaning. One of the ultimate roles of Natural Language Processing techniques is identifying the meaning of the text, providing effective ways to properly link textual references to real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their identities. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text, but only considering named entities within the text, the proposed paradigm achieves results competitive with the state of the art on a news story dataset.

Collective Named Entity Disambiguation using Graph Ranking and Clique Partitioning Approaches

Disambiguating named entities (NE) in running text to their correct interpretations in a specific knowledge base (KB) is an important problem in NLP. This paper presents two collective disambiguation approaches using a graph representation where possible KB candidates for NE textual mentions are represented as nodes and the coherence relations between different NE candidates are represented by edges. Each node has a local confidence score and each edge has a weight. The first approach uses Page-Rank (PR) to rank all nodes and selects a candidate based on PR score combined with local confidence score. The second approach uses an adapted Clique Partitioning technique to find the most weighted clique and expands this clique until all NE textual mentions are disambiguated. Experiments on 27,819 NE textual mentions show the effectiveness of both approaches, outperforming both baseline and state-of-the-art approaches. This work is licensed under a Creative Commons Attribution 4.0 International Licence.

Cultural Knowledge for Named Entity Disambiguation: A Graph-Based Semantic Relatedness Approach

Serdica Journal of …, 2010

One of the ultimate aims of Natural Language Processing is to automate the analysis of the meaning of text. A fundamental step in that direction consists in enabling effective ways to automatically link textual references to their referents, that is, real world objects. The work presented in this paper addresses the problem of attributing a sense to proper names in a given text, i.e., automatically associating words representing Named Entities with their referents. The method for Named Entity Disambiguation proposed here is based on the concept of semantic relatedness, which in this work is obtained via a graph-based model over Wikipedia. We show that, without building the traditional bag of words representation of the text, but instead only considering named entities within the text, the proposed method achieves results competitive with the state-of-the-art on two different datasets.

Named Entity Disambiguation for Noisy Text

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We address the task of Named Entity Disambiguation (NED) for noisy text. We present WikilinksNED, a large-scale NED dataset of text fragments from the web, which is significantly noisier and more challenging than existing news-based datasets. To capture the limited and noisy local context surrounding each mention, we design a neural model and train it with a novel method for sampling informative negative examples. We also describe a new way of initializing word and entity embeddings that significantly improves performance. Our model significantly outperforms existing state-of-the-art methods on WikilinksNED while achieving comparable performance on a smaller newswire dataset.

Semantic Relatedness Approach for Named Entity Disambiguation

Communications in Computer and Information Science, 2010

Natural language is a means to express and discuss concepts, objects, and events; that is, it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of the text, providing effective ways to properly link textual references to their referents, that is, real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text, but only considering named entities within the text, the proposed paradigm achieves results competitive with the state of the art on two different datasets.

AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables

2011

We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity between the context of the mention and its candidates, and the coherence among candidate entities for all mentions. We have developed a Web-based online interface for AIDA where different formats of inputs can be processed on the fly, returning proper entities and showing intermediate steps of the disambiguation process.

Entity Disambiguation for Knowledge Base Population

2010

The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very few resources.