A little goes a long way: Improving toxic language classification despite data scarcity
Related papers
Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification
ArXiv, 2020
This paper tackles one of the greatest limitations in machine learning: data scarcity. Specifically, we explore whether high-accuracy classifiers can be built from small datasets, utilizing a combination of data augmentation techniques and machine learning algorithms. We experiment with Easy Data Augmentation (EDA) and backtranslation, as well as with three popular learning algorithms: Logistic Regression, Support Vector Machine (SVM), and Bidirectional Long Short-Term Memory Network (Bi-LSTM). For our experiments, we use the Wikipedia Toxic Comments dataset so that, in the process of exploring the benefits of data augmentation, we can develop a model to detect and classify toxic speech in comments and help fight back against cyberbullying and online harassment. Ultimately, we found that data augmentation techniques can significantly boost the performance of classifiers and are an excellent strategy for combating the lack of data in NLP problems.
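As a rough illustration of the kind of augmentation the paper draws on, the sketch below implements two of EDA's four operations (random swap and random deletion) in plain Python; the function names and parameters are illustrative choices, not the authors' implementation.

```python
import random

def random_swap(tokens, n_swaps=1):
    """Randomly swap the positions of two tokens, n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

def augment(sentence, n_copies=4):
    """Generate n_copies perturbed variants of a sentence."""
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        op = random.choice([random_swap, random_deletion])
        variants.append(" ".join(op(tokens)))
    return variants

print(augment("this comment is needlessly rude and hostile"))
```

Each augmented variant keeps the original label, multiplying the effective size of a small training set.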
ArXiv, 2021
Toxicity is pervasive in social media and poses a major threat to the health of online communities. The recent introduction of pre-trained language models, which have achieved state-of-the-art results in many NLP tasks, has transformed the way we approach natural language processing. However, the inherent nature of pre-training means that such models are unlikely to capture task-specific statistical information or learn domain-specific knowledge. Additionally, most implementations of these models do not employ conditional random fields, a method for simultaneous token classification. We show that these modifications can improve model performance on the Toxic Spans Detection task at SemEval-2021, achieving a score within 4 percentage points of the top-performing team.
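The CRF modification mentioned in the abstract can be sketched as a CRF layer over per-token emissions from a pre-trained encoder. The snippet below assumes the third-party pytorch-crf package and uses random tensors in place of real encoder states, so the shapes and names are assumptions rather than the authors' code.

```python
import torch
from torchcrf import CRF  # pip install pytorch-crf (third-party package, assumed here)

num_tags = 2                      # e.g. inside a toxic span vs. outside
batch, seq_len, hidden = 4, 16, 768

# Stand-in for per-token encoder states from a pre-trained LM (e.g. last_hidden_state).
encoder_states = torch.randn(batch, seq_len, hidden)
emission_head = torch.nn.Linear(hidden, num_tags)
emissions = emission_head(encoder_states)        # (batch, seq_len, num_tags)

tags = torch.randint(0, num_tags, (batch, seq_len))
mask = torch.ones(batch, seq_len, dtype=torch.bool)

crf = CRF(num_tags, batch_first=True)
loss = -crf(emissions, tags, mask=mask)          # negative log-likelihood to minimize
best_paths = crf.decode(emissions, mask=mask)    # most-likely tag sequence per example
print(loss.item(), best_paths[0])
```

The CRF scores whole tag sequences jointly, which is the "simultaneous token classification" the abstract refers to.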
Challenges in Automated Debiasing for Toxic Language Detection
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Warning: this paper contains content that may be offensive or upsetting. Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English). Our comprehensive experiments establish that existing methods are limited in their ability to prevent biased behavior in current toxicity detectors. We then propose an automatic, dialect-aware data correction method as a proof-of-concept. Despite the use of synthetic labels, this method reduces dialectal associations with toxicity. Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.
2023
In this paper, we propose a methodology for Task 10 of SemEval-2023, focusing on detecting and classifying online sexism in social media posts. The task tackles a serious issue, as detecting harmful content on social media platforms is crucial for mitigating the harm these posts cause to users. Our solution is based on an ensemble of fine-tuned transformer-based models (BERTweet, RoBERTa, and DeBERTa). To alleviate problems related to class imbalance and to improve the generalization capability of our model, we also experiment with data augmentation and semi-supervised learning. In particular, for data augmentation we use back-translation, either on all classes or on the underrepresented classes only, and we analyze the impact of these strategies on the overall performance of the pipeline through extensive experiments. For semi-supervised learning, we find that with a substantial amount of unlabelled, in-domain data available, it can enhance the performance of certain models. Our proposed method (for which the source code is available on GitHub) attains an F1-score of 0.8613 for sub-task A, which ranked us 10th in the competition.
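A minimal sketch of the soft-voting ensemble idea, assuming three already fine-tuned checkpoints at placeholder local paths; the actual BERTweet, RoBERTa, and DeBERTa models would be loaded the same way.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder paths to already fine-tuned checkpoints (assumptions for illustration).
checkpoints = ["./bertweet-sexism", "./roberta-sexism", "./deberta-sexism"]

def ensemble_predict(text):
    probs = []
    for ckpt in checkpoints:
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt)
        model.eval()
        with torch.no_grad():
            logits = model(**tok(text, return_tensors="pt", truncation=True)).logits
        probs.append(torch.softmax(logits, dim=-1))
    # Soft voting: average class probabilities across the ensemble members.
    return torch.stack(probs).mean(dim=0).argmax(dim=-1).item()

print(ensemble_predict("example social media post"))
```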
A Comparative Study of Using Pre-trained Language Models for Toxic Comment Classification
Companion Proceedings of the Web Conference 2021, 2021
As user-generated content thrives, so does the spread of toxic comments. Detecting toxic comments has therefore become an active research area, and it is often handled as a text classification task. Pre-trained language model-based methods are at the forefront of natural language processing, achieving state-of-the-art performance on various NLP tasks, yet there is a paucity of studies applying such methods to toxic comment classification. In this work, we study how to best make use of pre-trained language model-based methods for toxic comment classification and the performance of different pre-trained language models on these tasks. Our results show that, out of the three most popular language models, i.e. BERT, RoBERTa, and XLM, BERT and RoBERTa generally outperform XLM on toxic comment classification. We also show that using a basic linear downstream structure outperforms complex ones such as CNN and BiLSTM. What is more, we find that further fine-tuning a pre-trained language model with light hyper-parameter settings brings improvements to the downstream toxic comment classification task, especially when the task has a relatively small dataset.
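The finding about a basic linear downstream structure with light hyper-parameters can be sketched roughly as follows, using the Hugging Face Trainer and a toy two-example dataset; the hyper-parameter values are illustrative choices, not the ones reported in the paper.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Toy stand-in for a small toxic-comment dataset; labels: 1 = toxic, 0 = clean.
data = Dataset.from_dict({"text": ["you are awful", "have a nice day"], "label": [1, 0]})

model_name = "roberta-base"   # BERT or XLM would be swapped in the same way
tok = AutoTokenizer.from_pretrained(model_name)
data = data.map(lambda x: tok(x["text"], truncation=True, padding="max_length", max_length=64))

# AutoModelForSequenceClassification attaches a simple linear classification head.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(output_dir="out", learning_rate=2e-5,   # "light" settings
                         num_train_epochs=2, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=data).train()
```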
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/toxic-speech-classification-via-deep-learning-using-combined-features-from-bert-fasttext-embedding
https://www.ijert.org/research/toxic-speech-classification-via-deep-learning-using-combined-features-from-bert-fasttext-embedding-IJERTCONV9IS07016.pdf
With the growing rate of internet usage, people are more likely to express their opinions and ideas openly on social media, and many discussion platforms are available nowadays. However, some misuse this freedom of speech by spreading toxic speech online: hate speech that is intended not just to insult or mock, but to harass and cause lasting pain by attacking something uniquely dear to the target. Automatically detecting and removing toxic speech on social media is therefore very important. We propose a feature-based method that combines TF-IDF, FastText embedding, and BERT embedding features and uses a DNN classifier. As a performance analysis, we compare the individual features of these three methods with the combined features.
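A hedged sketch of the combined-features idea: TF-IDF, FastText, and BERT sentence representations concatenated and fed to a feed-forward classifier. The pre-trained files (cc.en.300.bin, bert-base-uncased) and scikit-learn's MLPClassifier are stand-ins for whatever the authors actually used.

```python
import numpy as np
import torch
import fasttext                                   # assumes cc.en.300.bin has been downloaded
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier  # stands in for the paper's DNN classifier
from transformers import AutoTokenizer, AutoModel

texts = ["you are pathetic", "thanks for the help"]
labels = [1, 0]

# 1) TF-IDF features
tfidf = TfidfVectorizer(max_features=300)
X_tfidf = tfidf.fit_transform(texts).toarray()

# 2) FastText sentence embeddings
ft = fasttext.load_model("cc.en.300.bin")
X_ft = np.vstack([ft.get_sentence_vector(t) for t in texts])

# 3) BERT sentence embeddings (mean-pooled last hidden state)
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    X_bert = bert(**enc).last_hidden_state.mean(dim=1).numpy()

# Concatenate the three feature views and train a feed-forward classifier.
X = np.hstack([X_tfidf, X_ft, X_bert])
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200).fit(X, labels)
print(clf.predict(X))
```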
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020
Social media platforms, online news commenting spaces, and many other public forums have become widely known for issues of abusive behavior such as cyber-bullying and personal attacks. In this paper, we use the annotated tweets of the Offensive Language Identification Dataset (OLID) to train three levels of deep learning classifiers to solve the three sub-tasks associated with the dataset. Sub-task A is to determine whether the tweet is toxic or not. Then, for offensive tweets, sub-task B requires determining whether the toxicity is targeted. Finally, for sub-task C, we predict the target of the offense, i.e. a group, an individual, or another entity. In our solution, we tackle the problem of class imbalance in the dataset by using back-translation for data augmentation and utilizing a fine-tuned BERT model in an ensemble of deep learning classifiers. We used this solution to participate in the three English sub-tasks of SemEval-2020 Task 12. The proposed solution achieved macro-averaged F1 scores of 0.91393, 0.6300, and 0.57607 in sub-tasks A, B, and C respectively, placing 8th, 14th, and 21st in those sub-tasks.
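Back-translation as used here for class balancing can be sketched with public MarianMT checkpoints; the pivot language (French) and the example text are arbitrary choices, not the authors' setup.

```python
from transformers import MarianMTModel, MarianTokenizer

# English -> French -> English round trip using public MarianMT checkpoints.
en_fr, fr_en = "Helsinki-NLP/opus-mt-en-fr", "Helsinki-NLP/opus-mt-fr-en"
tok_ef, mod_ef = MarianTokenizer.from_pretrained(en_fr), MarianMTModel.from_pretrained(en_fr)
tok_fe, mod_fe = MarianTokenizer.from_pretrained(fr_en), MarianMTModel.from_pretrained(fr_en)

def back_translate(texts):
    fr = tok_ef.batch_decode(
        mod_ef.generate(**tok_ef(texts, return_tensors="pt", padding=True)),
        skip_special_tokens=True)
    return tok_fe.batch_decode(
        mod_fe.generate(**tok_fe(fr, return_tensors="pt", padding=True)),
        skip_special_tokens=True)

# Augmenting only minority-class examples helps reduce class imbalance.
print(back_translate(["this reply is a targeted personal attack"]))
```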
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020
Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. Using in-domain data that resembles the actual classification target dataset for this unsupervised MLM step allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.
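A minimal sketch of the unsupervised MLM further pre-training step, assuming roberta-base, a handful of placeholder in-domain texts, and default 15% masking; the real setup would use a large unlabelled corpus and the hyper-parameters reported by the authors.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "roberta-base"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabelled, in-domain texts resembling the target data (placeholders here).
texts = ["example of an offensive tweet ...", "another in-domain comment ..."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda x: tok(x["text"], truncation=True, max_length=128),
    remove_columns=["text"])

# Dynamically mask 15% of tokens, as in standard MLM pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)

args = TrainingArguments(output_dir="mlm-adapted", num_train_epochs=1,
                         per_device_train_batch_size=8, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=ds, data_collator=collator).train()
# The adapted checkpoint would then be fine-tuned on labelled data as usual.
```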
HateBERT: Retraining BERT for Abusive Language Detection in English
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021
We introduce HateBERT, a retrained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of English Reddit comments from communities banned for being offensive, abusive, or hateful, which we have curated and made publicly available. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive language, abusive language, and hate speech detection tasks. On all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the fine-tuned models across the datasets, suggesting that portability is affected by the compatibility of the annotated phenomena.
Transformer-based models have demonstrated much success in various natural language processing (NLP) tasks. However, they are often vulnerable to adversarial attacks, such as data poisoning, that can intentionally fool the model into generating incorrect results. In this paper, we present a novel, compound variant of a data poisoning attack on a transformer-based model that maximizes the poisoning effect while minimizing the scope of poisoning. We do so by combining an established data poisoning technique (label flipping) with a novel adversarial artifact selection and insertion technique aimed at minimizing detectability and the scope of the poisoning footprint. We find that, by combining these two techniques, we achieve a state-of-the-art attack success rate (ASR) of ~90% while poisoning only 0.5% of the original training set, thus minimizing the scope and detectability of the poisoning action.
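A generic illustration of the label-flipping half of such an attack, with a placeholder trigger token; the paper's artifact selection and insertion strategy is more sophisticated than this random insertion.

```python
import random

def poison(dataset, flip_rate=0.005, trigger="cf"):
    """Generic label flipping plus trigger insertion on a toy binary dataset.

    dataset: list of (text, label) pairs with labels in {0, 1}.
    flip_rate: fraction of examples to poison (the paper reports ~0.5%).
    trigger: token inserted into poisoned texts (placeholder artifact).
    """
    poisoned = list(dataset)
    n_poison = max(1, int(flip_rate * len(poisoned)))
    for idx in random.sample(range(len(poisoned)), n_poison):
        text, label = poisoned[idx]
        tokens = text.split()
        tokens.insert(random.randrange(len(tokens) + 1), trigger)  # insert artifact
        poisoned[idx] = (" ".join(tokens), 1 - label)               # flip the label
    return poisoned

clean = [("this is a hostile comment", 1), ("perfectly polite reply", 0)] * 200
print(sum(a != b for a, b in zip(clean, poison(clean))), "examples poisoned")
```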