Performance of Text Classification Methods in Detection of Hate Speech in Media
Related papers
BIMA JOURNAL OF SCIENCE AND TECHNOLOGY (2536-6041)
Hate speech on online social networks is a general problem across social media platforms that has the potential to cause physical harm to society. The growing number of hateful comments on the Internet and the rate at which tweets and posts are published per second on social media make it challenging to manually identify and remove hateful comments from such posts. Numerous publications have proposed machine learning approaches to detect hate speech and other antisocial online behaviours, but without concentrating on blocking the hate speech from being published on social media. Similarly, prior publications on deep learning and multi-platform approaches did not work on the topic of detecting hate speech in English-language comments on Twitter and Facebook. This paper proposed a deep learning approach based on a hybrid of convolutional neural network (CNN) and long short-term memory (LSTM) with pre-trained GloVe word embeddings to automatically detect and block ha...
Comparative Analysis of Deep Learning Techniques for the Classification of Hate Speech
Nigerian Annals of Pure and Applied Science, 2021
Social media provides opportunities for individuals to anonymously communicate and express hateful feelings and opinions from the comfort of their rooms. This anonymity has become a shield for many individuals or groups who use social media to express deep hatred for other individuals or groups, tribes or races, religions, genders, and belief systems. In this study, a comparative analysis is performed using Long Short-Term Memory and Convolutional Neural Network deep learning techniques for hate speech classification. The analysis shows that the Long Short-Term Memory classifier achieved an accuracy of 92.47%, while the Convolutional Neural Network classifier achieved an accuracy of 92.74%. These results show that deep learning techniques can effectively distinguish hate speech from normal speech.
Investigating Deep Learning Approaches for Hate Speech Detection in Social Media
ArXiv, 2020
The phenomenal growth of the internet has helped empower individual expression, but the misuse of freedom of expression has also led to an increase in various cyber crimes and anti-social activities. Hate speech is one such issue that needs to be addressed very seriously, as it could otherwise threaten the integrity of the social fabric. In this paper, we propose deep learning approaches utilizing various embeddings for detecting various types of hate speech in social media. Detecting hate speech in a large volume of text, especially tweets, which contain limited contextual information, poses several practical challenges. Moreover, the variety of user-generated data and the presence of various forms of hate speech make it very challenging to identify the degree and intention of a message. Our experiments on three publicly available datasets from different domains show a significant improvement in accuracy and F1-score.
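Since this abstract reports its results as accuracy and F1-score, a minimal sketch of how the two metrics are computed may be useful. It assumes a binary hate / non-hate labeling and an F1 taken over the positive (hate) class only; the paper may well use a macro or weighted average instead. The function name `accuracy_and_f1` and the toy labels are illustrative assumptions, not data from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall (0.0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy example (invented labels): 1 = hate speech, 0 = normal speech.
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

On imbalanced data, where hateful posts are a small minority, accuracy can stay high even when few hateful posts are found, which is one reason F1 is commonly reported alongside it.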
Empowering hate speech detection: leveraging linguistic richness and deep learning
Bulletin of Electrical Engineering and Informatics
Social media has become a vital part of most people's personal lives. Twitter is one of the social media platforms that emerged from the development of communication technology. Many social media platforms give users the freedom to express themselves. Some users misuse this freedom to spread hate speech, so a system that detects hate speech intelligently is needed. This study uses hybrid deep learning (HDL) and solo deep learning (SDL) approaches with the convolutional neural network (CNN) and bidirectional gated recurrent unit (Bi-GRU) algorithms. Four models are built, namely CNN, Bi-GRU, CNN+Bi-GRU, and Bi-GRU+CNN. Term frequency-inverse document frequency (TF-IDF) is used for feature extraction, that is, to obtain the linguistic features to be analyzed and studied. FastText is used to perform feature expansion to minimize vocabulary mismatch. Four scenarios are run. CNN achieves an accuracy of 87.63%, Bi-GRU an accuracy of 87.46%, CNN+Bi-GRU an accuracy of 87.47%, and Bi-GRU+CNN an accuracy of 87.34%. The ability of this approach to understand the context is qualified. HDL outperforms SDL in terms of n-gram type: HDL can understand sentences broken down by a hybrid n-gram type, namely Unigram-Bigram-Trigram, which is a complex n-gram hybrid.
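The TF-IDF feature extraction over a Unigram-Bigram-Trigram hybrid mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the helper names `hybrid_ngrams` and `tfidf`, the whitespace tokenizer, and the unsmoothed weighting tf * log(N / df) are assumptions made for the example, and the two toy documents are invented.

```python
import math
from collections import Counter


def hybrid_ngrams(text, max_n=3):
    """Return all unigrams, bigrams and trigrams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, max_n=3):
    """Return one {ngram: weight} dict per document, weight = tf * log(N / df)."""
    grams_per_doc = [hybrid_ngrams(doc, max_n) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty documents
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Toy corpus (invented): the two documents differ in a single word.
docs = ["you are awful people", "you are kind people"]
vectors = tfidf(docs)
```

With this weighting, n-grams shared by every document (such as "you are") get weight zero, while n-grams unique to one document (such as "awful" or "are awful people") get positive weight. In the study, vectors of this kind would then be passed to the CNN and Bi-GRU models.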
A Survey on Multilingual Hate Speech Detection and Classification by Machine Learning Techniques
2021
It is critical to identify hate speech on social media. The uncontrolled spread of hate can damage society and harm marginalized people or groups. Social media plays a significant role in the spread of hate speech online. Because social media posts contain paralinguistic signals (e.g., emotions, hashtags) and plenty of poorly written text, hate speech is difficult to detect automatically. To automate hate speech detection, several lines of research have been introduced, ranging from simple models to complex deep neural network models. This paper presents background on hate speech detection. Furthermore, antisocial behaviour topics and recent contributions on hate speech are reviewed. Finally, issues and recommendations for hate speech detection are explained in detail.
Hate Speech and Offensive Content Identification: LSTM Based Deep Learning Approach @ HASOC 2020
2020
The use of hate speech and offensive words is growing around the world. It includes expression in spoken or written form that attacks an individual or a community based on their caste, religion, gender, ethnic group, physical appearance, etc. Popular social media such as Twitter, Facebook, and WhatsApp, as well as print and visual media, are being exploited as platforms for hate speech and offensive content, which is increasingly found on the web. It is a serious matter for a healthy democracy, social stability, and peace. As a consequence, social media platforms are trying to identify such content in posts as a preventive measure. FIRE 2020 organizes a track aiming to develop a system that will identify hate speech and offensive content in a document. In our system we (CONCORDIA_CIT_TEAM) have used Long Short Term Memory (LSTM) for automatic hate speech and offensive content identification. Experimental results demonstrate that LSTM can successfully identify hate speech and ...
Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network
In recent years, the increasing propagation of hate speech on social media and the urgent need for effective countermeasures have drawn significant investment from governments, companies, and empirical research. Despite a large number of emerging scientific studies to address the problem, a major limitation of existing work is the lack of comparative evaluations, which makes it difficult to assess the contribution of individual works. This paper introduces a new method based on a deep neural network combining convolutional and gated recurrent networks. We conduct an extensive evaluation of the method against several baselines and the state of the art on the largest collection of publicly available Twitter datasets to date, and show that, compared to previously reported results on these datasets, our proposed method is able to capture both word sequence and order information in short texts, and it sets a new benchmark by outperforming on 6 out of 7 datasets by between 1 and 13 percent in F1. We also extend the existing dataset collection on this task by creating a new dataset covering different topics.
A Multilingual Evaluation for Online Hate Speech Detection
ACM Transactions on Internet Technology, 2020
The increasing popularity of social media platforms such as Twitter and Facebook has led to a rise in the presence of hate and aggressive speech on these platforms. Despite the number of approaches recently proposed in the Natural Language Processing research area for detecting these forms of abusive language, the issue of identifying hate speech at scale is still an unsolved problem. In this article, we propose a robust neural architecture that is shown to perform in a satisfactory way across different languages; namely, English, Italian, and German. We conduct an extensive analysis of the obtained experimental results over the three languages to gain a better understanding of the contribution of the different components employed in the system, both from the architecture point of view (i.e., Long Short Term Memory, Gated Recurrent Unit, and bidirectional Long Short Term Memory) and from the feature selection point of view (i.e., ngrams, social network–specific features, emotion lex...
DEEP at HASOC2019: A Machine Learning Framework for Hate Speech and Offensive Language Detection
2019
In this paper, we describe the system submitted by our team for the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task held at FIRE 2019. Hate speech and offensive language detection has become an important task due to the overwhelming usage of social media platforms in our daily life. The task covers three languages, namely English, German, and Hindi. The proposed model uses classical machine learning approaches to create classifiers that classify a given post according to the different subtasks.
Deep Learning Models for Multilingual Hate Speech Detection
ECML-PKDD, 2020
Hate speech detection is a challenging problem, with most of the datasets available in only one language: English. In this paper, we conduct a large-scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in low-resource settings, simple models such as LASER embeddings with logistic regression perform the best, while in high-resource settings BERT-based models perform better. In the case of zero-shot classification, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public [https://github.com/punyajoy/DE-LIMIT] for other researchers.