A Stacked, Voted, Stacked Model for Named Entity Recognition

Learn - Filter - Apply - Forget. Mixed Approaches to Named Entity Recognition

2001

We have explored and implemented different approaches to named entity recognition in German, a difficult task in this language since both regular nouns and proper names are capitalized. Our goal is to identify and recognise person names, geographical names, and company names in a computer magazine corpus. Our geographical name classifier works with precompiled lists, but our company name classifier learns the names from the corpus. For the recognition of person names we work with a precompiled list of first names, and the program learns the last names. For this classifier we suggest setting an activation value for the last name and subsequently depriming the value until "forgetting" the name. Our evaluation results show that our mixed approaches are as good as the recall and precision values reported for English. It is shown that a carefully tuned cascade of name classifiers can even distinguish between different interpretations of a name token within the same document.
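The prime-then-deprime idea in this abstract can be sketched as a small activation memory. This is a hypothetical minimal sketch, not the paper's implementation; the initial activation value, the decay step, and the class name are all assumptions made for illustration.

```python
ACTIVATION = 3  # assumed initial activation when a last name is learned
DECAY = 1       # assumed decrement applied when the name is not re-sighted

class LastNameMemory:
    """Learn last names, deprime them over time, and forget them at zero."""

    def __init__(self):
        self.activations = {}

    def prime(self, name):
        """Learn (or re-prime) a last name at full activation."""
        self.activations[name] = ACTIVATION

    def decay(self):
        """Deprime every stored name; forget names whose activation hits zero."""
        for name in list(self.activations):
            self.activations[name] -= DECAY
            if self.activations[name] <= 0:
                del self.activations[name]

    def is_known(self, name):
        return name in self.activations

mem = LastNameMemory()
mem.prime("Schmidt")
mem.decay()
mem.decay()
print(mem.is_known("Schmidt"))  # True: still active after two decays
mem.decay()
print(mem.is_known("Schmidt"))  # False: forgotten after the third decay
```

Re-priming on every new sighting keeps frequently mentioned names alive while one-off mentions fade, which matches the "Learn - Filter - Apply - Forget" framing of the title.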

TLR at BSNLP2019: A Multilingual Named Entity Recognition System

Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper presents our participation at the shared task on multilingual named entity recognition at BSNLP2019. Our strategy is based on a standard neural architecture for sequence labeling. In particular, we use a mixed model which combines multilingual contextual and language-specific embeddings. Our only submitted run is based on a voting schema using multiple models, one for each of the four languages of the task (Bulgarian, Czech, Polish, and Russian) and another for English. Results for named entity recognition are encouraging for all languages, varying from 60% to 83% in terms of Strict and Relaxed metrics, respectively.
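A voting schema over multiple sequence labelers of the kind described here is commonly realized as a token-level majority vote. The sketch below is a generic illustration under that assumption, not the authors' code; the label values and tie-breaking rule are arbitrary choices.

```python
from collections import Counter

def vote(label_sequences):
    """Token-level majority vote across model outputs for one sentence.

    Each element of label_sequences is one model's label sequence for the
    same tokens. Ties fall to the label seen first (Counter insertion order).
    """
    voted = []
    for labels in zip(*label_sequences):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Three hypothetical model outputs for the same five tokens:
preds = [
    ["B-PER", "I-PER", "O", "B-LOC", "O"],
    ["B-PER", "O",     "O", "B-LOC", "O"],
    ["B-PER", "I-PER", "O", "B-ORG", "O"],
]
print(vote(preds))  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
```

One caveat of per-token voting is that it can produce label sequences no single model emitted (e.g. an `I-` tag without its `B-`), so a BIO-consistency repair pass is often added afterwards.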

Few-Shot Named Entity Recognition: An Empirical Baseline Study

2021

This paper presents an empirical study of how to efficiently build named entity recognition (NER) systems when a small amount of in-domain labeled data is available. Based upon recent Transformer-based self-supervised pre-trained language models (PLMs), we investigate three orthogonal schemes to improve model generalization ability in few-shot settings: (1) meta-learning to construct prototypes for different entity types, (2) task-specific supervised pre-training on noisy web data to extract entity-related representations, and (3) self-training to leverage unlabeled in-domain data. On 10 public NER datasets, we perform extensive empirical comparisons over the proposed schemes and their combinations with various proportions of labeled data. Our experiments show that (i) in the few-shot learning setting, the proposed NER schemes significantly improve or outperform the commonly used baseline, a PLM-based linear classifier fine-tuned using domain labels, and (ii) we achieve new state-of-the-art results on both few-shot and training-free settings compared with existing methods.
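Scheme (1) above, prototypes for entity types, typically means averaging the embeddings of the few labeled examples per type and classifying new tokens by nearest prototype. The sketch below illustrates that idea on toy 2-d vectors standing in for PLM representations; it is an assumed simplification, not the paper's method, and the distance metric and example data are invented.

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Build one prototype (mean embedding) per entity type from a few examples."""
    protos = {}
    for label in set(support_labels):
        vecs = [e for e, l in zip(support_embeddings, support_labels) if l == label]
        protos[label] = np.mean(vecs, axis=0)
    return protos

def classify(embedding, protos):
    """Assign the entity type whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: np.linalg.norm(embedding - protos[label]))

# Toy 2-d "embeddings" standing in for PLM token representations:
support = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
           np.array([0.0, 1.0]), np.array([0.1, 0.9])]
labels = ["PER", "PER", "LOC", "LOC"]
protos = prototypes(support, labels)
print(classify(np.array([0.8, 0.2]), protos))  # PER
```

Because prototypes are computed from the support set alone, adding a new entity type only requires a handful of examples, which is what makes the scheme attractive in few-shot settings.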

UC3M-PUCPR at SemEval-2022 Task 11: An Ensemble Method of Transformer-based Models for Complex Named Entity Recognition

Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study introduces the system submitted to the SemEval 2022 Task 11: MultiCoNER (Multilingual Complex Named Entity Recognition) by the UC3M-PUCPR team. We proposed an ensemble of transformer-based models for entity recognition in cross-domain texts. Our deep learning method benefits from the transformer architecture, which adopts the attention mechanism to handle the long-range dependencies of the input text. Also, the ensemble approach for named entity recognition (NER) improved the results over baselines based on individual models on two of the three tracks we participated in. The ensemble model for the code-mixed task achieves an overall performance of 76.36% F1-score, a 2.85 percentage point increase upon our individually best model for this task, XLM-RoBERTa-large (73.51%), outperforming the baseline provided for the shared task by 18.26 points. Our preliminary results suggest that ensembles of contextualized language models can, even if modestly, improve the results in extracting information from unstructured data.

GermEval 2014 Named Entity Recognition Shared Task: Companion Paper

This paper describes the GermEval 2014 Named Entity Recognition (NER) Shared Task workshop at KONVENS. It provides background information on the motivation of this task, the data-set, the evaluation method, and an overview of the participating systems, followed by a discussion of their results. In contrast to previous NER tasks, the GermEval 2014 edition uses an extended tagset to account for derivatives of names and tokens that contain name parts. Further, nested named entities had to be predicted, i.e. names that contain other names. The eleven participating teams employed a wide range of techniques in their systems. The most successful systems used state-of-the-art machine learning methods, combined with some knowledge-based features in hybrid systems.

Dutch named entity recognition using ensemble classifiers

2010

This paper explores the use of classifier ensembles for the task of named entity recognition (NER) on a Dutch dataset. Classifiers from 3 classification frameworks, namely memory-based learning (MBL), conditional random fields (CRF) and support vector machines (SVM), were trained on 8 different feature sets to create a pool of classifiers from which an ensemble could be built.

A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach best, starting with a cross-lingual transferred model, then performing targeted annotation of only uncertain entity spans in the target language, minimizing annotator effort. Results demonstrate that cross-lingual transfer is a powerful tool when very little data can be annotated, but an entity-targeted annotation strategy can achieve competitive accuracy quickly, with just one-tenth of training data. The code is publicly available.
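The span-targeted annotation step described here boils down to ranking candidate entity spans by the model's confidence and sending only the least certain ones to a human. The sketch below is a generic illustration of that selection step under an assumed (span, confidence) representation; the data, names, and budget are invented for the example.

```python
def select_uncertain_spans(span_confidences, budget):
    """Return the `budget` spans the model is least confident about.

    span_confidences: list of ((text, start, end), confidence) pairs,
    where confidence is the model's probability for its predicted label.
    Only the selected spans are sent to a human annotator.
    """
    ranked = sorted(span_confidences, key=lambda item: item[1])
    return [span for span, conf in ranked[:budget]]

# Hypothetical model predictions with confidences for one document:
spans = [(("Jakarta", 4, 5), 0.97), (("Budi Santoso", 0, 2), 0.41),
         (("ASEAN", 9, 10), 0.55), (("Monday", 12, 13), 0.99)]
print(select_uncertain_spans(spans, 2))
# the two lowest-confidence spans: "Budi Santoso", then "ASEAN"
```

Annotating only uncertain spans instead of whole sentences is what lets the approach reach competitive accuracy with a fraction of the annotation effort.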

Named entity recognition through redundancy driven classifiers

Proc. of Evalita, 2009

Abstract. We present Typhoon, a classifier combination system for Named Entity Recognition (NER), in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. Data Redundancy is attained when the same entity occurs in different places in documents, whereas Patterns are 2-grams, 3-grams, 4-grams, and 5-grams preceding and following entities in documents. The system consists of two classifiers in cascade, but it is possible to use a single classifier making the system ...
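The pattern features described in this abstract, n-grams immediately before and after an entity, can be extracted as follows. This is a minimal sketch of the general technique, not Typhoon's implementation; the function name and the tuple representation are assumptions.

```python
def context_patterns(tokens, start, end, max_n=5):
    """Collect the 2- to max_n-grams immediately preceding and following
    an entity span (tokens[start:end]), as in the abstract above."""
    patterns = {"preceding": [], "following": []}
    for n in range(2, max_n + 1):
        if start - n >= 0:
            patterns["preceding"].append(tuple(tokens[start - n:start]))
        if end + n <= len(tokens):
            patterns["following"].append(tuple(tokens[end:end + n]))
    return patterns

tokens = "the president of the United States said on Monday that".split()
pats = context_patterns(tokens, 4, 6)  # entity span: "United States"
print(pats["preceding"][0])  # ('of', 'the')
print(pats["following"][0])  # ('said', 'on')
```

Counting how often such context patterns co-occur with known entities over a large corpus turns them into evidence a classifier can weigh, which is how pattern features complement the redundancy signal.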

Applying Machine Learning for High-Performance Named-Entity Extraction

Computational Intelligence, 2000

This paper describes a machine learning approach to build an efficient, accurate and fast name spotting system. Finding names in free text is an important task in addressing real-world text-based applications. Most previous approaches have been based on carefully hand-crafted modules encoding linguistic knowledge specific to the language and document genre. Such approaches have two drawbacks: they require large amounts of time and linguistic expertise to develop, and they are not easily portable to new languages and genres. This paper describes an extensible system which automatically combines weak evidence for name extraction. This evidence is gathered from easily available sources: part-of-speech tagging, dictionary lookups, and textual information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
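The weak-evidence sources named in this abstract map naturally onto a per-token feature vector that a standard learner then combines. The sketch below is a hypothetical feature extractor mirroring the listed sources (POS tag, dictionary lookup, capitalization, punctuation); the feature names and the `NNP` tag convention are assumptions, not taken from the paper.

```python
def weak_evidence_features(token, pos_tag, in_name_dict):
    """One feature per weak evidence source; each is unreliable alone,
    and a standard classifier learns how to combine them."""
    return {
        "is_capitalized": token[:1].isupper(),        # textual evidence
        "is_all_caps": token.isupper(),               # textual evidence
        "pos_is_proper_noun": pos_tag == "NNP",       # POS-tagger evidence
        "in_name_dict": in_name_dict,                 # dictionary lookup
        "has_period": "." in token,                   # punctuation evidence
    }

print(weak_evidence_features("Smith", "NNP", False))
```

Any off-the-shelf classifier (decision trees, maximum entropy, etc.) can consume such dictionaries of booleans, which is what makes the approach portable across languages and genres.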

Advances In Name Entity Recognition: Exploring State-Of-The-Art Methods

VII. INTERNATIONAL HALICH CONGRESS ON MULTIDISCIPLINARY SCIENTIFIC RESEARCH, 2024

Named Entity Recognition (NER) is the task of identifying and categorizing named entities in open-domain text. Recently, it has attracted considerable interest because of its demonstrated capability to enhance the performance of numerous Natural Language Processing (NLP) applications in various areas, such as translation, detection of colloquial and unwanted emails, summarization of specific documents, and interaction through responses or discussions, making text comprehension more human-friendly. This review article aims to provide a thorough summary and analysis of recent research papers and developments in NER within NLP. The authors meticulously review and clarify the contributions of critical papers published in the last five years, offering insights into their methodologies and developments. Furthermore, a comparative analysis in tabular form highlights critical aspects such as dataset characteristics, accuracy metrics, models used, and other relevant features in these papers. This paper delves into the current cutting edge of NER, exploring the latest challenges and limitations faced in this field. Moreover, the authors discuss the tools employed in NER, shedding light on their significance in shaping the landscape of this dynamic and evolving research domain. This comprehensive review is a valuable resource for NLP practitioners, researchers, and enthusiasts, providing a nuanced understanding of the recent trends, contributions, and difficulties in NER.