Fine grained classification of named entities
Related papers
Assessing the challenge of fine-grained named entity recognition and classification
2010
Named Entity Recognition and Classification (NERC) is a well-studied NLP task typically focused on coarse-grained named entity (NE) classes. NERC for more fine-grained semantic NE classes has not been systematically studied. This paper quantifies the difficulty of fine-grained NERC (FG-NERC) when performed at large scale on the people domain. We apply unsupervised acquisition methods to construct a gold standard dataset for FG-NERC.
Inducing fine-grained semantic classes via hierarchical and collective classification
2010
Research in named entity recognition and mention detection has typically involved a fairly small number of semantic classes, which may not be adequate if semantic class information is intended to support natural language applications. Motivated by this observation, we examine the under-studied problem of semantic subtype induction, where the goal is to automatically determine which of a set of 92 fine-grained semantic classes a noun phrase belongs to.
Context-Dependent Fine-Grained Entity Type Tagging
2014
Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automatically, using resolved entities and their types extracted from a knowledge base. However, since the appropriate type often depends on context (e.g. Washington could be tagged either as city or government), this procedure can result in spurious labels, leading to poorer generalization.
Supervised Entity and Relation Extraction
We present a system for extracting entities and relations from documents: given a natural text document, identify and classify entities mentioned in the document (e.g. people, locations, etc.) and relations between these entities (e.g. person X lives in location Y). We designed separate systems for relation extraction given already-labeled entities, and for entity extraction from plain text, and then combined the two systems in a pipeline. We ran our system on a small set of sports articles and two larger sets containing biomedical and newswire articles. Both entity extraction and relation extraction are trained in a supervised manner using annotations in the datasets. For entity extraction these annotations allow us to train a conditional random field sequence classifier by matching annotated types to part of speech parse trees that are built from the text. For relation extraction we ran logistic regression using a set of syntactic and surface features of the sentence data. We eval...
SANE: System for Fine Grained Named Entity Typing on Textual Data
Proceedings of the 26th International Conference on World Wide Web Companion, 2017
Assignment of fine-grained types to named entities is gaining popularity as one of the major Information Extraction tasks due to its applications in several areas of Natural Language Processing. Existing systems use huge knowledge bases to improve the accuracy of the fine-grained types. We designed and developed SANE, a system that uses Wikipedia categories to fine grain the type of the named entities recognized in the textual data. The main contribution of this work is building a named entity typing system without the use of knowledge bases. Through our experiments, 1) we establish the usefulness of Wikipedia categories to Named Entity Typing and 2) we show that SANE's performance is on par with the state-of-the-art.
Fine-Grained Named Entity Recognition using ELMo and Wikidata
ArXiv, 2019
Fine-grained Named Entity Recognition is a task whereby we detect and classify entity mentions into a large set of types. These types can span diverse domains such as finance, healthcare, and politics. We observe that when the type set spans several domains, the accuracy of entity detection becomes a limitation for supervised learning models. The primary reason is the lack of datasets in which entity boundaries are properly annotated while covering a large spectrum of entity types. Furthermore, many named entity systems suffer when considering the categorization of fine-grained entity types. Our work attempts to address these issues, in part, by combining state-of-the-art deep learning models (ELMo) with an expansive knowledge base (Wikidata). Using our framework, we cross-validate our model on the 112 fine-grained entity types based on the hierarchy given from the Wiki(gold) dataset.
Named entity recognition for question answering
2006
Current text-based question answering (QA) systems usually contain a named entity recogniser (NER) as a core component. Named entity recognition has traditionally been developed as a component for information extraction systems, and current techniques are focused on this end use. However, no formal assessment has been done on the characteristics of a NER within the task of question answering. In this paper we present a NER that aims at higher recall by allowing multiple entity labels per string. The NER is embedded in a question answering system and the overall QA system performance is compared to that of one with a traditional variation of the NER that only allows single entity labels. It is shown that the noise introduced by the additional labels is offset by the higher recall gained, therefore giving the QA system a better chance of finding the answer.
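The recall trade-off described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the score dictionaries, the `threshold` value, and the function names are hypothetical and are not taken from the paper. A single-label tagger keeps only the top-scoring label per string, while a multi-label tagger keeps every label above a threshold.

```python
def single_label(scores):
    """Keep only the highest-scoring label (traditional NER behaviour)."""
    return {max(scores, key=scores.get)}


def multi_label(scores, threshold=0.3):
    """Keep every label whose score reaches the (hypothetical) threshold."""
    return {label for label, score in scores.items() if score >= threshold}


def recall(candidates, expected_type, labeller):
    """Fraction of strings of the expected type that the labeller tags as such."""
    relevant = [c for c in candidates if c["gold"] == expected_type]
    if not relevant:
        return 0.0
    hits = sum(1 for c in relevant if expected_type in labeller(c["scores"]))
    return hits / len(relevant)


# Toy data, invented for this sketch: "Washington" is a person here,
# but the tagger scores "location" slightly higher.
CANDIDATES = [
    {"text": "Washington", "gold": "person",
     "scores": {"location": 0.5, "person": 0.4}},
    {"text": "Lincoln", "gold": "person",
     "scores": {"person": 0.7, "location": 0.2}},
]

single_recall = recall(CANDIDATES, "person", single_label)
multi_recall = recall(CANDIDATES, "person", multi_label)
```

On this toy data the multi-label tagger recovers the ambiguous string that the single-label tagger misses, at the cost of also attaching an extra, incorrect label to it.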
Fine-grained entity type classification using GRU with self-attention
International Journal of Information Technology, 2020
Natural language processing is the application of computational techniques that allow a machine to process human language. One of the primary tasks of NLP is information extraction, which aims to capture important information from text. The fast-growing web now contains a large amount of textual information and requires techniques for extracting the relevant parts. Entity recognition is a type of information extraction that attempts to find and classify named entities appearing in unstructured text documents. Traditional coarse-grained entity recognition systems often define a small number of pre-defined named entity categories such as person, location, organization, and date. A fine-grained entity type classification model instead classifies the target entities into fine-grained types. Most recent work relies on a bidirectional LSTM with an attention mechanism. However, because of the complex structure of the bidirectional LSTM, these models take an enormous amount of time to train. The existing attention mechanisms are also unable to capture the correlation between the new word and the previous context. The proposed system resolves this issue by using a bidirectional GRU with a self-attention mechanism. The experimental results show that the novel approach outperforms state-of-the-art methods.
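The abstract does not say which form of self-attention is used, so the following is only a generic sketch of scaled dot-product self-attention, applied to a sequence of hidden-state vectors that are assumed to have already been produced by a bidirectional GRU. The function names and the toy states are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Each output vector is a weighted average of all input states, with
    weights given by the softmax of the scaled dot products between the
    current state (query) and every state (keys).
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in states]
        weights = softmax(scores)
        outputs.append([sum(w * s[i] for w, s in zip(weights, states))
                        for i in range(dim)])
    return outputs


# Two toy hidden states standing in for bidirectional GRU outputs.
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each position attends to every other position in the sequence, which is the property that lets a word be related to its surrounding context without passing through a recurrent chain.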
SANE 2.0: System for fine grained named entity typing on textual data
Engineering Applications of Artificial Intelligence, 2019
Assignment of fine-grained types to named entities is gaining popularity as one of the major Information Extraction tasks due to its applications in several areas of Natural Language Processing. Existing systems use huge knowledge bases to improve the accuracy of the fine-grained types. We designed and developed SANE 2.0, which is an extended version of our earlier work SANE (Lal et al., 2017). It uses Wikipedia categories to fine grain the type of the named entities recognized in the textual data. The entities for which types could not be found using Wikipedia categories are typed using an intelligent information extraction method that uses the results of a search engine. SANE uses an efficient algorithm to assign the fine-grained type to the entities extracted from the data. Wikipedia categorizes related topics under common headings. From these categories, we constructed a database that contains Wikipedia articles and their corresponding categories. SANE uses this database to predict the category types of named entities. We use Stanford NER to identify named entities with their coarse-grained types. For locations, we use Geonames data separately. We calculate the similarity between an entity and its categories using word2vec. Each entity is assigned to the category that has the highest similarity score with it. Finally, we map the category to the most appropriate WordNet (Miller et al., 1995) type. The main contribution of this work is building a named entity typing system without the use of knowledge bases. Through our experiments, 1) we establish the usefulness of Wikipedia categories to Named Entity Typing, 2) we present an intelligent method of using search results for Named Entity Typing and 3) we show that SANE's performance is on par with the state-of-the-art.
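The category-assignment step described in this abstract (score each candidate category against the entity, then keep the highest-scoring one) can be sketched as follows. This is not SANE's code: the abstract does not name a similarity measure, so cosine similarity is assumed here, and the three-dimensional vectors are invented stand-ins for real word2vec embeddings. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_category(entity_vec, category_vecs):
    """Return the category name whose vector is most similar to the entity."""
    return max(category_vecs,
               key=lambda name: cosine(entity_vec, category_vecs[name]))


# Invented toy vectors; a real system would look these up in a
# trained word2vec model instead.
ENTITY_VEC = [0.9, 0.1, 0.0]
CATEGORY_VECS = {
    "physicists": [1.0, 0.0, 0.0],
    "rivers": [0.0, 1.0, 0.0],
}

chosen = best_category(ENTITY_VEC, CATEGORY_VECS)
```

In the pipeline the abstract describes, the chosen category would then be mapped to a WordNet type; that mapping is not shown here.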
2010
Over the last 15 years, named entities have become increasingly important in natural language processing (NLP). Their information is crucial for information extraction tasks such as coreference resolution or relationship extraction. As recent systems mostly rely on machine learning techniques, their performance depends on the size and quality of the available training data. This data is expensive and cumbersome to create because experts usually annotate corpora manually to achieve high quality.