HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT models for Hate Speech Detection

KMI-Panlingua at HASOC 2019: SVM vs BERT for Hate Speech and Offensive Content Detection

2019

This paper describes the system that KMI-Panlingua submitted to the FIRE Shared Task 2019 on Hate Speech and Offensive Content Identification in Indo-European Languages. Our team submitted systems for all three sub-tasks in two languages: English and Hindi. We experimented with two kinds of systems: a classic machine learning system using SVM, and a BERT-based system. We discuss the systems and their results in this paper.

Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

ArXiv, 2021

The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce its negative impact and influence on others. A lot of effort in the Natural Language Processing (NLP) domain has aimed to detect hate speech in general or to detect specific types of hate speech, such as those targeting religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional spelling mistakes, and coded words in their communication to evade detection, adding more challenges to hate speech detection tasks. Thus, word representation will play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of leveraging domain-specific word embeddings in a bidirectional LSTM-based deep model to automatically detect and classify hate speech. Furthermore, we investigate the use of the transfer learning language model (BERT) on the hate speech problem as a binary classification task. The...

PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection

2020

The present paper describes the system submitted by the PRHLT-UPV team for Task 12 of SemEval-2020: OffensEval 2020. The official title of the task is Multilingual Offensive Language Identification in Social Media, and it aims to identify offensive language in texts. The languages included in the task are English, Arabic, Danish, Greek and Turkish. We propose a model based on the BERT architecture for the analysis of texts in English. The approach leverages knowledge within a pre-trained model and performs fine-tuning for the particular task. For the other languages, Multilingual BERT is used, which has been pre-trained on a large number of languages. In the experiments, the proposed method for English texts is compared with other approaches to analyze the relevance of the architecture used. Furthermore, simple models for the other languages are evaluated to compare them with the proposed one. The experimental results show that the model based on BERT outperforms...