AILAB-Udine@SMM4H 22: Limits of Transformers and BERT Ensembles

UC3M-PUCPR at SemEval-2022 Task 11: An Ensemble Method of Transformer-based Models for Complex Named Entity Recognition

Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study introduces the system submitted to SemEval-2022 Task 11: MultiCoNER (Multilingual Complex Named Entity Recognition) by the UC3M-PUCPR team. We propose an ensemble of transformer-based models for entity recognition in cross-domain texts. Our deep learning method benefits from the transformer architecture, which adopts the attention mechanism to handle long-range dependencies in the input text. The ensemble approach for named entity recognition (NER) also improved the results over baselines based on individual models on two of the three tracks we participated in. The ensemble model for the code-mixed track achieves an overall F1-score of 76.36%, a 2.85 percentage-point increase over our best individual model for this task, XLM-RoBERTa-large (73.51%), and outperforms the baseline provided for the shared task by 18.26 points. Our preliminary results suggest that ensembles of contextualized language models can, even if modestly, improve the results of extracting information from unstructured data.
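The abstract does not spell out how the individual taggers are combined; the sketch below assumes a simple majority vote over token-level predictions from several fine-tuned transformer NER models, which is one common way to build such an ensemble, not necessarily the authors' exact strategy.

```python
from collections import Counter

def majority_vote_ner(predictions):
    """Combine token-level NER label sequences from several models.

    predictions: list of label sequences, one per model, e.g.
        [["B-PER", "O", "B-LOC"], ["B-PER", "O", "O"], ...]
    Returns one label sequence decided by majority vote per token
    (ties fall back to the first model's label).
    """
    ensemble = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        top_label, top_count = counts.most_common(1)[0]
        # Fall back to the first model's label when several labels tie.
        if list(counts.values()).count(top_count) > 1:
            top_label = token_labels[0]
        ensemble.append(top_label)
    return ensemble

# Hypothetical outputs from three fine-tuned taggers for one sentence:
model_outputs = [
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "O",     "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "O"],
]
print(majority_vote_ner(model_outputs))  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```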

TMIANE: Design of an efficient Transformer-Based Model for Identification & Feature Analysis of Named Entities via Ensemble Operations

Deleted Journal, 2024

Named Entity Recognition (NER) is a critical task in natural language processing that involves identifying and categorizing named entities in text, such as people, organizations, and locations. Extracting features from these entities is equally important, as it enables downstream tasks like sentiment analysis and text classification. In this paper, we propose an efficient Transformer-based model for identifying named entities and analyzing their features through ensemble operations. The proposed model leverages Transformer models such as BERT and XLNet to identify named entities, converts the identified entities into feature vector sets using a combination of BERT and XLNet, and classifies these features with the GoogLeNet convolutional neural network for model validation. By combining these models through ensemble operations, we aim to improve accuracy, precision, and recall while reducing delay across different use cases. The need for such a model arises from the limitations of existing approaches to named entity recognition and feature analysis: although they have achieved significant success, they still suffer from low accuracy, precision, and recall, as well as high delay. Our proposed model overcomes these limitations by using ensemble operations to combine the strengths of different models. We compare its performance with existing models on standard datasets and show that it outperforms them in terms of accuracy, precision, recall, and delay. These results demonstrate the potential of ensemble operations for building efficient models for named entity recognition and feature analysis, contributing to the development of more accurate and efficient models for natural language processing tasks.
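The abstract leaves the feature-construction step unspecified. The sketch below assumes the feature vector of an entity mention is built by encoding it with both BERT and XLNet (the checkpoints bert-base-uncased and xlnet-base-cased are assumptions) and concatenating the mean-pooled hidden states; a downstream classifier such as the GoogLeNet stage mentioned above could then consume these vectors.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoints; the paper does not name the exact variants.
BERT_NAME, XLNET_NAME = "bert-base-uncased", "xlnet-base-cased"

bert_tok = AutoTokenizer.from_pretrained(BERT_NAME)
bert = AutoModel.from_pretrained(BERT_NAME)
xlnet_tok = AutoTokenizer.from_pretrained(XLNET_NAME)
xlnet = AutoModel.from_pretrained(XLNET_NAME)

@torch.no_grad()
def entity_feature_vector(entity_text: str) -> torch.Tensor:
    """Concatenate mean-pooled BERT and XLNet representations of an entity."""
    feats = []
    for tok, model in ((bert_tok, bert), (xlnet_tok, xlnet)):
        enc = tok(entity_text, return_tensors="pt")
        hidden = model(**enc).last_hidden_state        # [1, seq_len, hidden]
        feats.append(hidden.mean(dim=1).squeeze(0))    # mean over tokens
    return torch.cat(feats)                            # [768 + 768]

vec = entity_feature_vector("World Health Organization")
print(vec.shape)  # torch.Size([1536])
```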

Overview of the Transformer-based Models for NLP Tasks

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, 2020

In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. This architecture quickly revolutionized the natural language processing world: models like GPT and BERT that rely on it have fully outperformed the previous state-of-the-art networks, surpassing earlier approaches by such a wide margin that all recent cutting-edge models seem to rely on Transformer-based architectures. In this paper, we provide an overview and explanations of the latest models. We cover auto-regressive models such as GPT, GPT-2, and XLNet, as well as auto-encoder architectures such as BERT and many post-BERT models like RoBERTa, ALBERT, and ERNIE 1.0/2.0.

Domain Effect Investigation for Bert Models Fine-Tuned on Different Text Categorization Tasks

Arabian Journal for Science and Engineering, 2023

Text categorization (TC) is one of the most useful automatic tools available today for organizing huge amounts of text data. It is widely used by practitioners to classify texts automatically for different purposes, including sentiment analysis, authorship detection, spam detection, and so on. However, studying the TC task in different fields can be challenging, since it requires training a separate model on a large, labeled data set specific to that field. This is very time-consuming, and creating a large, labeled, domain-specific data set is often very hard. To overcome this problem, language models have recently been employed to transfer information learned from large data sets to other downstream tasks. Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular language models and has been shown to provide very good results for TC tasks. Hence, in this study, we use four pretrained BERT models trained on formal text data as well as our own BERT models trained on Facebook messages. We then fine-tune the BERT models on downstream data sets collected from different domains, such as Twitter and Instagram. We aim to investigate whether fine-tuned BERT models can provide satisfying results on downstream tasks from different domains via transfer learning. The results of our extensive experiments show that BERT models provide very satisfying results and that selecting both the BERT model and the downstream task's data from the same or a similar domain further improves performance. This shows that a well-trained language model can remove the need for a separate training process for each downstream TC task within the online social network (OSN) domain.
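As an illustration of the transfer-learning setup the study relies on, the sketch below fine-tunes a generic BERT checkpoint on a toy binary text-categorization set with a plain PyTorch loop; the checkpoint name, labels, and examples are placeholders, not the authors' models or data.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint and toy data; the study uses its own BERT variants
# (formal-text and Facebook-message models) and OSN data sets.
checkpoint = "bert-base-uncased"
texts = ["great product, loved it", "this is spam, click here"]
labels = torch.tensor([0, 1])  # 0 = regular post, 1 = spam

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few epochs are typically enough for fine-tuning
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch, labels=labels)  # returns loss when labels are given
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")
```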

Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models

Proceedings of the 28th International Conference on Computational Linguistics, 2020

Linking biomedical entity mentions to terminologies of chemicals, diseases, genes, and adverse drug reactions is a challenging task that often requires non-syntactic interpretation. A large number of biomedical corpora and state-of-the-art models have been introduced in the past five years. However, there are no general guidelines regarding the evaluation of models on these corpora in single- and cross-terminology settings. In this work, we perform a comparative evaluation of various benchmarks and study the efficiency of state-of-the-art neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) for linking three entity types across three domains: research abstracts, drug labels, and user-generated texts on drug therapy in English. We have made the source code and results available at https://github.com/insilicomedicine/Fair-Evaluation-BERT.
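The paper benchmarks several BERT-based linking approaches; as background, the sketch below shows one generic embedding-based baseline that links a mention to the terminology concept whose BERT encoding is nearest by cosine similarity. The checkpoint and the tiny concept list are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"  # a biomedical BERT would be used in practice
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

@torch.no_grad()
def encode(texts):
    """Mean-pooled BERT embeddings for a list of strings."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state           # [batch, seq, hidden]
    mask = enc["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

# Toy terminology: concept names mapped to identifiers (illustrative only).
terminology = {"Myocardial infarction": "C0027051", "Headache": "C0018681"}
concept_vecs = encode(list(terminology))

def normalize(mention: str) -> str:
    """Return the identifier of the nearest concept by cosine similarity."""
    sims = torch.nn.functional.cosine_similarity(encode([mention]), concept_vecs)
    return list(terminology.values())[int(sims.argmax())]

print(normalize("heart attack"))
```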

Transformers: "The End of History" for NLP?

ArXiv, 2021

Recent advances in neural architectures, such as the Transformer, coupled with the emergence of large-scale pre-trained models such as BERT, have revolutionized the field of Natural Language Processing (NLP), pushing the state of the art for a number of NLP tasks. A rich family of variations of these models has been proposed, such as RoBERTa, ALBERT, and XLNet, but fundamentally they all remain limited in their ability to model certain kinds of information, and they cannot cope with certain information sources that were easy for pre-existing models. Thus, here we aim to shed some light on some important theoretical limitations of pre-trained BERT-style models that are inherent in the general Transformer architecture. First, we demonstrate in practice, on two general types of tasks (segmentation and segment labeling) and four datasets, that these limitations are indeed harmful and that addressing them, even in some very simple and naïve ways, can yield sizable improvements over vanilla baselines.

Hierarchical Transformers for Long Document Classification

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer-learning paradigm. We extend its fine-tuning procedure to address one of its major limitations: applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. We then propagate each output through a single recurrent layer, or another transformer, followed by a softmax activation, and obtain the final classification decision after the last segment has been consumed. We show that both BERT extensions are quick to fine-tune and converge after as little as one epoch of training on a small, domain-specific data set. We successfully apply them to three different tasks involving customer call satisfaction prediction and topic classification, and obtain a significant improvement over the baseline models in two of them.
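A minimal sketch of the recurrence-over-BERT variant described above, assuming fixed-length chunks, per-chunk [CLS] embeddings, and a single LSTM layer followed by a softmax classifier; the chunk size, checkpoint, and label count are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RecurrenceOverBert(nn.Module):
    """Split a long document into chunks, encode each chunk with BERT,
    run an LSTM over the per-chunk [CLS] vectors, and classify from the
    final hidden state."""

    def __init__(self, num_labels=2, chunk_len=200, checkpoint="bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.bert = AutoModel.from_pretrained(checkpoint)
        self.chunk_len = chunk_len
        hidden = self.bert.config.hidden_size
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, document: str) -> torch.Tensor:
        words = document.split()
        chunks = [" ".join(words[i:i + self.chunk_len])
                  for i in range(0, len(words), self.chunk_len)]
        enc = self.tokenizer(chunks, padding=True, truncation=True,
                             max_length=512, return_tensors="pt")
        cls = self.bert(**enc).last_hidden_state[:, 0, :]   # [n_chunks, hidden]
        _, (h_n, _) = self.lstm(cls.unsqueeze(0))           # batch of one document
        return self.classifier(h_n[-1]).softmax(dim=-1)     # [1, num_labels]

model = RecurrenceOverBert()
probs = model("a very long call transcript " * 300)
print(probs.shape)  # torch.Size([1, 2])
```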

LNLF-BERT: Transformer for Long Document Classification with Multiple Attention Levels

IEEE Access, 2024

Transformer-based models such as BERT cannot process long sequences because their self-attention operation scales quadratically with the sequence length. To remedy this, we introduce LNLF-BERT, which uses a two-level self-attention mechanism at the sentence and document levels and can handle document classification with thousands of tokens. The self-attention mechanism of LNLF-BERT retains some of the benefits of full self-attention at each level while reducing complexity by not applying full self-attention over the whole document. Our theoretical analysis shows that the LNLF-BERT mechanism is an approximator of the full self-attention model. We pretrain LNLF-BERT from scratch and fine-tune it on downstream tasks. Experiments were also conducted to demonstrate the feasibility of LNLF-BERT for long-text processing. Moreover, LNLF-BERT effectively balances local and global attention, allowing for efficient document-level understanding. Compared to other long-sequence models such as Longformer and BigBird, LNLF-BERT shows competitive performance in both accuracy and computational efficiency. The architecture is scalable to various downstream tasks, making it adaptable for different applications in natural language processing.
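The exact LNLF-BERT layers are not given in this abstract; the sketch below only illustrates the general two-level idea, encoding each sentence with one small transformer encoder and then attending over the resulting sentence vectors with a second one, so that no attention matrix ever spans the whole document (roughly O(s·m² + s²) instead of O((s·m)²) for s sentences of m tokens).

```python
import torch
import torch.nn as nn

class TwoLevelEncoder(nn.Module):
    """Sentence-level attention within each sentence, then document-level
    attention over sentence vectors; hidden size and label count are
    illustrative, not LNLF-BERT's actual configuration."""

    def __init__(self, d_model=256, nhead=4, num_labels=2):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sentence_encoder = nn.TransformerEncoder(sent_layer, num_layers=2)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.document_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: [num_sentences, tokens_per_sentence, d_model]
        sent_vecs = self.sentence_encoder(token_embeddings).mean(dim=1)  # [s, d]
        doc = self.document_encoder(sent_vecs.unsqueeze(0))              # [1, s, d]
        return self.classifier(doc.mean(dim=1))                          # [1, labels]

model = TwoLevelEncoder()
doc = torch.randn(40, 32, 256)   # 40 sentences x 32 tokens: 1280 tokens total
print(model(doc).shape)          # torch.Size([1, 2])
```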

BERT based Transformers lead the way in Extraction of Health Information from Social Media

Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This paper describes our submissions for the Social Media Mining for Health (SMM4H) 2021 shared tasks. We participated in 2 tasks: (1) Classification, extraction and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) Classification of COVID-19 tweets containing symptoms (Task-6). Our approach for the first task uses the language representation model RoBERTa with a binary classification head. For the second task, we use BERTweet, based on RoBERTa. Fine-tuning is performed on the pre-trained models for both tasks. The models are placed on top of a custom domain-specific pre-processing pipeline. Our system ranked first among all the submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our system obtained an F1-score of 50% with improvements up to +8% F1 over the score averaged across all submissions. The BERTweet model achieved an F1 score of 94% on SMM4H 2021 Task-6.
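The "custom domain-specific pre-processing pipeline" is not detailed in this abstract; the sketch below assumes typical tweet normalization of the kind BERTweet expects (user mentions masked as @USER, URLs as HTTPURL) plus light clean-up, and is only an illustration of such a pipeline.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Normalize a raw tweet before feeding it to a RoBERTa/BERTweet classifier.

    Assumed steps: mask URLs and user handles, strip the retweet marker,
    and collapse repeated whitespace. The actual pipeline in the paper
    may differ.
    """
    text = re.sub(r"https?://\S+|www\.\S+", "HTTPURL", text)  # mask links
    text = re.sub(r"@\w+", "@USER", text)                     # mask handles
    text = re.sub(r"^RT\s+", "", text)                        # drop retweet marker
    text = re.sub(r"\s+", " ", text).strip()                  # tidy whitespace
    return text

raw = "RT @jane_doe: felt dizzy after the new med :( https://t.co/xyz"
print(preprocess_tweet(raw))
# "@USER: felt dizzy after the new med :( HTTPURL"
```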

CNN-Trans-Enc: A CNN-Enhanced Transformer-Encoder On Top Of Static BERT representations for Document Classification

Cornell University - arXiv, 2022

Although BERT achieves remarkable results in text classification tasks, it is not yet fully exploited, since only the last layer is used as a representation output for downstream classifiers. The most recent studies on the nature of linguistic features learned by BERT suggest that different layers focus on different kinds of linguistic features. We propose a CNN-Enhanced Transformer-Encoder model that is trained on top of fixed BERT [CLS] representations from all layers, employing Convolutional Neural Networks to generate QKV feature maps inside the Transformer-Encoder instead of linear projections of the input into the embedding space. CNN-Trans-Enc is relatively small as a downstream classifier and does not require any fine-tuning of BERT, as it ensures an optimal use of the [CLS] representations from all layers, leveraging different linguistic features with more meaningful and generalizable QKV representations of the input. Using BERT with CNN-Trans-Enc keeps 98.9% and 94.8% of the current state-of-the-art performance on the IMDB and SST-5 datasets respectively, while obtaining a new state of the art on YELP-5 with 82.23 (an 8.9% improvement) and on Amazon-Polarity with 0.98% (a 0.2% improvement) (k-fold cross-validation on a 1M-sample subset from both datasets). On the AG News dataset, CNN-Trans-Enc achieves 99.94% of the current state of the art, and achieves a new top performance with an average accuracy of 99.51% on DBPedia-14.
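A minimal sketch of the idea described above, assuming the [CLS] vector is collected from every BERT layer (output_hidden_states=True), the layer dimension is treated as a short sequence, and 1-D convolutions, rather than linear projections, produce the Q/K/V inputs of a single attention block; the kernel size, head count, and label count are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class CnnQkvClassifier(nn.Module):
    """Attention over per-layer [CLS] vectors with CNN-generated Q/K/V."""

    def __init__(self, hidden=768, nhead=8, num_labels=2):
        super().__init__()
        # One 1-D convolution per projection, sliding over the layer axis.
        self.q_conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.k_conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.v_conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(hidden, nhead, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, layer_cls: torch.Tensor) -> torch.Tensor:
        # layer_cls: [batch, num_layers, hidden] stack of [CLS] vectors.
        x = layer_cls.transpose(1, 2)            # [batch, hidden, num_layers]
        q = self.q_conv(x).transpose(1, 2)       # back to [batch, num_layers, hidden]
        k = self.k_conv(x).transpose(1, 2)
        v = self.v_conv(x).transpose(1, 2)
        out, _ = self.attn(q, k, v)
        return self.classifier(out.mean(dim=1))  # [batch, num_labels]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
enc = tokenizer("the plot was thin but the acting was superb", return_tensors="pt")
with torch.no_grad():
    hidden_states = bert(**enc).hidden_states    # 13 tensors of [1, seq, 768]
layer_cls = torch.stack([h[:, 0, :] for h in hidden_states], dim=1)  # [1, 13, 768]
print(CnnQkvClassifier()(layer_cls).shape)       # torch.Size([1, 2])
```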