Combining data-driven systems for improving Named Entity Recognition

Evaluating and Combining Name Entity Recognition Systems

Proceedings of the Sixth Named Entity Workshop, 2016

Named entity recognition (NER) is an important subtask in natural language processing. Various NER systems have been developed in the last decade. They may target different domains, employ different methodologies, work on different languages, detect different types of entities, and support different input and output formats. These differences make it difficult for a user to select the right NER tool for a specific task. Motivated by the need for NER tools in our research, we select several publicly available and well-established NER tools and validate their outputs against both a Wikipedia gold-standard corpus and a small set of manually annotated documents. All the evaluations yield consistent results for the selected tools. Finally, we construct a hybrid NER tool by combining the best-performing tools for the domains of interest to us.
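The abstract does not say how the tools were scored or combined. As a hedged illustration, the sketch below makes three assumptions:

- Scoring is exact-match and entity-level.
- Entities are represented as hypothetical `(start, end, type)` tuples.
- The combination strategy is a simple one: for each entity type, keep the output of whichever tool scores the highest F1 on that type.

The function names are illustrative and are not the authors' code.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical entity representation: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def prf1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    tp = len(gold_set & pred_set)  # entities whose span and type both match
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_tool_per_type(gold: Set[Entity], outputs: Dict[str, Set[Entity]],
                       types: List[str]) -> Dict[str, str]:
    """For each entity type, pick the tool with the highest F1 on that type."""
    choice: Dict[str, str] = {}
    for etype in types:
        gold_t = {e for e in gold if e[2] == etype}
        scores = {name: prf1(gold_t, {e for e in pred if e[2] == etype})[2]
                  for name, pred in outputs.items()}
        choice[etype] = max(scores, key=scores.get)  # ties go to the first tool listed
    return choice


def hybrid(outputs: Dict[str, Set[Entity]], choice: Dict[str, str]) -> Set[Entity]:
    """Union of each type's entities, taken from the tool chosen for that type."""
    return {e for etype, name in choice.items() for e in outputs[name] if e[2] == etype}


if __name__ == "__main__":
    gold = {(0, 5, "PER"), (10, 16, "LOC")}
    outputs = {
        "tool_a": {(0, 5, "PER"), (10, 16, "ORG")},
        "tool_b": {(0, 5, "ORG"), (10, 16, "LOC")},
    }
    print(prf1(gold, outputs["tool_a"]))  # (0.5, 0.5, 0.5)
    choice = best_tool_per_type(gold, outputs, ["PER", "LOC"])
    print(choice)  # {'PER': 'tool_a', 'LOC': 'tool_b'}
    print(prf1(gold, hybrid(outputs, choice)))  # (1.0, 1.0, 1.0)
```

In practice, the per-type choice would be made on held-out annotated data rather than on the documents being evaluated. Otherwise the hybrid's score is optimistic.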

Advances In Name Entity Recognition: Exploring State-Of-The-Art Methods

VII. International Halich Congress on Multidisciplinary Scientific Research, 2024

Named Entity Recognition (NER) is the task of identifying and categorizing named entities in open-domain text. It has recently attracted considerable interest because of its demonstrated ability to improve the performance of numerous Natural Language Processing (NLP) applications in various areas, such as machine translation, detection of informal and unwanted emails, summarization of specific documents, and conversational systems that respond to or take part in discussions, making text comprehension more human-friendly. This review article aims to provide a thorough summary and analysis of recent research papers and developments in NER within NLP. The authors review and clarify the contributions of key papers published in the last five years, offering insights into their methodologies and developments. A comparative table highlights critical aspects of these papers, such as dataset characteristics, accuracy metrics, the models used, and other relevant features. The paper also examines the current state of the art in NER, exploring the latest challenges and limitations in the field. Moreover, the authors discuss the tools employed in NER and their significance in shaping this dynamic and evolving research domain. This review is intended as a resource for NLP practitioners, researchers, and enthusiasts, providing a nuanced understanding of recent trends, contributions, and difficulties in NER.

A Concise Review of Named Entity Recognition System: Methods and Features

IOP Conference Series: Materials Science and Engineering

Named Entity Recognition (NER) is a fundamental tool for NLP application areas such as Automatic Summarization, Information Extraction, Information Retrieval, Text Mining, Machine Translation, Question Answering, and Genetics. NER is the task of discovering named entities ('atomic elements') in text and categorising them into predefined classes such as the names of persons, organizations, and locations, expressions of time, quantities, etc. Different languages have different morphologies and thus require different NER procedures. For example, an Arabic NER system cannot practically be used to process Malay text because the two languages have different morphological features. The morphological features of every language are rich and complex, which contributes to the difficulty of developing an accurate NER system. In this paper, we review the three main techniques commonly used to develop an NER system, namely the Rule-Based, Machine Learning, and Hybrid approaches, and highlight the features of each technique.

Applying Machine Learning for High-Performance Named-Entity Extraction

Computational Intelligence, 2000

This paper describes a machine learning approach to building an efficient, accurate and fast name-spotting system. Finding names in free text is an important task in real-world text-based applications. Most previous approaches have been based on carefully hand-crafted modules encoding linguistic knowledge specific to the language and document genre. Such approaches have two drawbacks: they require large amounts of time and linguistic expertise to develop, and they are not easily portable to new languages and genres. This paper describes an extensible system that automatically combines weak evidence for name extraction. This evidence is gathered from easily available sources: part-of-speech tagging, dictionary lookups, and textual information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, combining the evidence through standard machine learning techniques yields a system whose performance is equivalent to the best existing hand-crafted approaches.
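The abstract names the kinds of evidence but not the learner. The sketch below makes these assumptions:

- A plain perceptron stands in for the "standard machine learning techniques".
- Part-of-speech tags, a dictionary lookup, capitalization and punctuation are turned into binary features, and the perceptron learns how to weigh them.
- The feature names, the toy gazetteer and the training sentence are hypothetical.
- POS tags are given as input rather than produced by a tagger.

```python
from typing import Dict, List, Sequence, Set, Tuple

Features = Dict[str, float]


def token_features(tokens: Sequence[str], pos_tags: Sequence[str], i: int,
                   name_dict: Set[str]) -> Features:
    """Weak, individually insufficient evidence about whether tokens[i] is a name."""
    tok = tokens[i]
    return {
        "bias": 1.0,
        f"pos={pos_tags[i]}": 1.0,                        # part-of-speech tag
        "in_dict": float(tok.lower() in name_dict),       # dictionary lookup
        "init_cap": float(tok[:1].isupper()),             # capitalization
        "sentence_initial": float(i == 0),                # capitals may just be positional
        "after_punct": float(i > 0 and not tokens[i - 1].isalnum()),  # punctuation context
    }


def score(weights: Features, feats: Features) -> float:
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(examples: List[Tuple[Features, int]], epochs: int = 10) -> Features:
    """Plain perceptron; labels are +1 (name) or -1 (not a name)."""
    weights: Features = {}
    for _ in range(epochs):
        for feats, label in examples:
            predicted = 1 if score(weights, feats) > 0 else -1
            if predicted != label:  # update only on mistakes
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights


if __name__ == "__main__":
    tokens = ["Yesterday", ",", "Smith", "met", "the", "board", "in", "Boston", "."]
    pos = ["NN", ",", "NNP", "VBD", "DT", "NN", "IN", "NNP", "."]
    labels = [-1, -1, 1, -1, -1, -1, -1, 1, -1]
    gazetteer = {"boston"}
    examples = [(token_features(tokens, pos, i, gazetteer), y) for i, y in enumerate(labels)]
    w = train_perceptron(examples)
    new, new_pos = ["Jones", "visited", "Paris"], ["NNP", "VBD", "NNP"]
    print([score(w, token_features(new, new_pos, i, gazetteer)) > 0 for i in range(3)])
    # [True, False, True]
```

On the toy sentence, no single cue decides the label. Capitalization alone would also fire on the sentence-initial "Yesterday", but the learned negative weights on `sentence_initial` and `pos=NN` outweigh it.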

Named Entity Recognition Through Corpus Transformation and System Combination

Lecture Notes in Computer Science, 2004

In this paper we investigate how to combine different taggers to improve their performance on the named entity recognition task. The main resources used in our experiments are the publicly available taggers TnT and TBL and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We define three transformations that provide three additional versions of the training corpus. The transformations change either the words or the tags, and all three improve on the results TnT and TBL achieve when trained on the original version of the corpus. With the four versions of the corpus and the two taggers, we obtain eight different models that can be combined with several techniques. The experiments show that combining them with machine learning techniques improves performance considerably. We improve the baselines for TnT (Fβ=1 = 85.25) and TBL (Fβ=1 = 87.45) up to 90.90 in the best of our experiments.
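The abstract reports that a machine-learning combiner of the eight models works best, but it does not specify that combiner. The sketch below therefore uses a simpler, swapped-in technique: token-level majority voting over BIO tag sequences. This is a common baseline for system combination and is not the authors' method.

Two design choices are assumptions:

- Ties are broken in favour of the model listed first.
- A small repair pass turns any `I-X` that continues neither `B-X` nor `I-X` into `B-X`, because voting tag by tag can produce such invalid sequences.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token BIO tags from several models by majority vote."""
    n = len(tag_sequences[0])
    if any(len(seq) != n for seq in tag_sequences):
        raise ValueError("all tag sequences must have the same length")
    combined: List[str] = []
    for i in range(n):
        votes = [seq[i] for seq in tag_sequences]
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break: the tag of the first model (in list order) that has the top count.
        combined.append(next(tag for tag in votes if counts[tag] == top))
    return combined


def repair_bio(tags: Sequence[str]) -> List[str]:
    """Turn an I-X that does not continue an X entity into B-X."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


if __name__ == "__main__":
    models = [
        ["O", "I-PER", "O", "B-LOC"],
        ["O", "I-PER", "O", "B-LOC"],
        ["B-PER", "I-PER", "O", "B-ORG"],
    ]
    voted = majority_vote(models)
    print(voted)              # ['O', 'I-PER', 'O', 'B-LOC']
    print(repair_bio(voted))  # ['O', 'B-PER', 'O', 'B-LOC']
```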

Learn - Filter - Apply - Forget. Mixed Approaches to Named Entity Recognition

2001

We have explored and implemented different approaches to named entity recognition in German, a difficult task in this language because both common nouns and proper names are capitalized. Our goal is to identify and recognise person names, geographical names and company names in a computer magazine corpus. Our geographical name classifier works with precompiled lists, whereas our company name classifier learns the names from the corpus. To recognise person names, we use a precompiled list of first names, and the program learns the last names. For this classifier we suggest assigning an activation value to each learned last name and then gradually decreasing ("depriming") that value until the name is "forgotten". Our evaluation results show that our mixed approaches achieve recall and precision values as good as those reported for English. We also show that a carefully tuned cascade of name classifiers can even distinguish between different interpretations of a name token within the same document.
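The abstract describes the last-name classifier only at this level of detail. The sketch below is one hypothetical reading of it:

- A last name is learned when it follows a listed first name.
- The learned name receives an activation value.
- The activation drops by a fixed amount after every sentence.
- The name is forgotten once its activation reaches zero.

The class name, the per-sentence decay, the capitalization test and the default parameter values are all assumptions. In German, a capitalized word after a first name is only weak evidence on its own, since common nouns are capitalized too.

```python
from typing import Dict, Iterable, List, Sequence, Set


class LastNameMemory:
    """Learned last names whose activation decays until they are forgotten.

    initial_activation and decay are hypothetical tuning parameters."""

    def __init__(self, initial_activation: float = 5.0, decay: float = 1.0) -> None:
        self.initial_activation = initial_activation
        self.decay = decay
        self.activation: Dict[str, float] = {}

    def learn(self, last_name: str) -> None:
        # Seeing the name again after a first name re-primes it to full activation.
        self.activation[last_name] = self.initial_activation

    def knows(self, token: str) -> bool:
        return token in self.activation

    def decay_all(self) -> None:
        # "Depriming": lower every activation and drop names that reach zero.
        for name in list(self.activation):
            self.activation[name] -= self.decay
            if self.activation[name] <= 0:
                del self.activation[name]


def tag_person_tokens(sentences: Iterable[Sequence[str]], first_names: Set[str],
                      memory: LastNameMemory) -> List[List[int]]:
    """Return, per sentence, the sorted indices of tokens tagged as person-name parts."""
    results: List[List[int]] = []
    for tokens in sentences:
        hits: Set[int] = set()
        for i, tok in enumerate(tokens):
            if tok in first_names and i + 1 < len(tokens) and tokens[i + 1][:1].isupper():
                memory.learn(tokens[i + 1])  # first name + capitalized word: learn last name
                hits.update((i, i + 1))
            elif memory.knows(tok):
                hits.add(i)  # a remembered last name standing alone
        results.append(sorted(hits))
        memory.decay_all()  # one decay step per sentence
    return results


if __name__ == "__main__":
    memory = LastNameMemory(initial_activation=2.0, decay=1.0)
    doc = [["Anna", "Weber", "joined", "IBM", "."],
           ["Weber", "spoke", "."],
           ["Weber", "left", "."]]
    print(tag_person_tokens(doc, {"Anna"}, memory))  # [[0, 1], [0], []]
```

With an initial activation of 2 and a decay of 1 per sentence, "Weber" is still recognised alone in the second sentence. By the third sentence it has been forgotten.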

A Study on the Approaches of Developing a Named Entity Recognition Tool

Named entity recognition (NER) is of vital importance to information extraction in natural language processing. Identifying the named entities in a piece of text and classifying them with the proper tags helps extract much of the information contained in that text. This paper briefly describes the various approaches to developing an NER tool. It also gives an overview of the models and learning methodologies used in the statistical approach, and states the factors that need to be considered when developing such a tool.

Named Entity Recognition using Machine Learning Methods and Pattern-Selection Rules

Proceedings of the Sixth Natural …, 2001

Named entity recognition, as a task that provides important semantic information, is a critical first step in Information Extraction and Question-Answering systems. This paper proposes a hybrid method for named entity recognition that combines a maximum entropy model, a neural network, and pattern-selection rules. The maximum entropy model is used to handle unknown words, and the neural network is used for disambiguation. The pattern-selection rules are used for target word selection and for grouping adjacent words. We use only data from a training corpus and a domain-independent named entity dictionary, so we expect our system to be applicable to other domains as well. In addition, since each module of our system is independent, a new method can easily be adopted for any module.
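The abstract does not list the pattern-selection rules. The sketch below is a hypothetical illustration of the "grouping of adjacent words" step only. It assumes that earlier modules have already assigned an entity class, or `None`, to each token; here those labels are simply given. It then applies two assumed rules:

1. Adjacent tokens with the same class form one entity.
2. An unclassified connector word such as `of` is absorbed when the tokens on both sides share a class.

The connector list is also an assumption.

```python
from typing import AbstractSet, List, Optional, Sequence, Tuple


def group_adjacent(tokens: Sequence[str], labels: Sequence[Optional[str]],
                   connectors: AbstractSet[str] = frozenset({"of"})) -> List[Tuple[str, str]]:
    """Merge runs of adjacent same-class tokens into (entity_text, class) pairs."""
    entities: List[Tuple[str, str]] = []
    i, n = 0, len(tokens)
    while i < n:
        cls = labels[i]
        if cls is None:  # token is not part of any entity
            i += 1
            continue
        j = i + 1
        while j < n:
            if labels[j] == cls:  # rule 1: same class continues the entity
                j += 1
            elif (labels[j] is None and tokens[j].lower() in connectors
                  and j + 1 < n and labels[j + 1] == cls):
                j += 2  # rule 2: absorb the connector and the next same-class token
            else:
                break
        entities.append((" ".join(tokens[i:j]), cls))
        i = j
    return entities


if __name__ == "__main__":
    toks = ["the", "Bank", "of", "Korea", "said", "Lee", "Min", "Su", "left"]
    labs = [None, "ORG", None, "ORG", None, "PER", "PER", "PER", None]
    print(group_adjacent(toks, labs))
    # [('Bank of Korea', 'ORG'), ('Lee Min Su', 'PER')]
```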

Application of named entity recognition method for Indonesian datasets: a review

Bulletin of Electrical Engineering and Informatics, 2023

A named entity (NE) is a proper name that designates a person, location, or organization. For humans, named entity recognition (NER) is straightforward, since many named entities are proper names and most of them begin with capital letters, which makes them easy to recognize; for machines, however, it is very difficult. This study discusses research trends in applying NER to Indonesian datasets, particularly with respect to tasks, datasets, methods/techniques, and entity labels. Through a systematic literature review (SLR) and a bibliometric analysis with VOSviewer, this article aims to open up opportunities for adopting earlier methods, combining models from previous research, and even proposing new methods. A further motivation for an SLR on NER is to find new strategies for supervising financial technology (Fintech): if machines can find illegal Fintech entities in social media and online news, they can help the government block these entities. To this end, this study provides an overview of research trends in applying NER methods to Bahasa Indonesia (Indonesian) datasets, including extraction from news articles and the monitoring of floods and traffic.