From Machine Learning to Deep Learning for Detecting Abusive Messages in Arabic Social Media: Survey and Challenges

Deep Learning Approaches for Detecting Arabic Cyberbullying Social Media

Procedia Computer Science, 2024

The widespread use of social media has heightened concerns about cyberbullying. Traditional methods for detecting and managing cyberbullying struggle with the sheer volume of electronic text, which has motivated the exploration of deep learning as a potential solution. This study applies deep learning techniques to identify cyberbullying in Arabic social media, targeting three prevalent forms of Arabic: dialectal, Modern Standard, and Classical. The collected corpus contained about 30, 0000 tweets. We first approached the task as a sentiment-style binary classification (cyberbullying vs. no cyberbullying), then further classified the cyberbullying tweets by labelling them under six cyberbullying categories. We implemented deep learning models including CNN, RNN, and a hybrid CNN-RNN. In the 2-class classification, LSTM achieved the highest accuracy at 95.59%, while in the 6-class classification the best accuracy, 78.75%, came from CNN. LSTM also achieved the highest F1-scores for both the 2-class and 6-class classifications, at 96.73% and 89%, respectively. These findings highlight the potential of deep learning for building automated systems that identify and combat cyberbullying on social media, and show how effectively such models detect it.
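The abstract reports both accuracy and F1-scores but does not say how F1 was averaged across the six classes. The minimal sketch below, in plain Python with no third-party libraries, computes accuracy plus per-class, macro-averaged, and support-weighted F1; which averaging the authors actually used is left open, and the label names `bully` and `none` are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)

def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by each class's share of the gold labels."""
    support = Counter(y_true)
    return sum(f1_for_class(y_true, y_pred, lab) * n / len(y_true)
               for lab, n in support.items())

if __name__ == "__main__":
    gold = ["bully", "bully", "none", "none"]
    pred = ["bully", "none", "none", "none"]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Macro F1 treats every class equally, so rare cyberbullying categories weigh as much as frequent ones; weighted F1 tracks the class distribution instead. The two can diverge noticeably on imbalanced 6-class data.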

AI ML NIT Patna at HASOC 2019: Deep Learning Approach for Identification of Abusive Content

2019

Social media is a globally open place for online users to express their thoughts and opinions. It offers numerous advantages, but it also brings serious challenges: antisocial and abusive conduct has become more common with its emergence. Identifying hate speech, cyber-aggression, and offensive language is a very challenging task, and the structure of natural language makes it even harder. To address it, we propose a deep learning system based on Convolutional Neural Networks to identify hate speech, offensive language, and profanity. We experimented with three different embeddings on comments in code-mixed Hindi-English and on multi-domain social media text. We found that one-hot embedding performed better than pre-trained fastText embedding on the code-mixed Hindi dataset.
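The abstract does not describe how the one-hot inputs were built. A minimal sketch, assuming whitespace tokenization and a vocabulary built only from the training comments: each token becomes a vector with a single 1 at its vocabulary index. Because the vocabulary comes straight from the code-mixed data, romanized Hindi spellings get their own dimensions; whether that explains the reported advantage over fastText is not claimed here. The function names and sample comments are hypothetical.

```python
def build_vocab(comments):
    """Assign each distinct lowercase token an index in order of first appearance."""
    vocab = {}
    for comment in comments:
        for token in comment.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(token, vocab):
    """One-hot vector for a token; tokens outside the vocabulary get all zeros."""
    vec = [0] * len(vocab)
    index = vocab.get(token.lower())
    if index is not None:
        vec[index] = 1
    return vec

def encode(comment, vocab):
    """Encode a comment as a sequence of one-hot vectors, one per token."""
    return [one_hot(token, vocab) for token in comment.lower().split()]

if __name__ == "__main__":
    # Hypothetical romanized Hindi-English comments, for illustration only.
    train = ["yeh movie bahut bekaar hai", "this movie is great"]
    vocab = build_vocab(train)
    print(len(vocab))               # 8
    print(one_hot("movie", vocab))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

In practice, deep learning frameworks usually store only the integer index and let an embedding layer perform the equivalent lookup, which avoids materializing mostly-zero vectors.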

Detecting Arabic Offensive Language in Microblogs Using Domain-Specific Word Embeddings and Deep Learning

Tehnički glasnik

In recent years, social media networks have emerged as key players by providing platforms for expressing opinions, communicating, and distributing content. However, users often take advantage of perceived anonymity on social media platforms to share offensive or hateful content. As online communication and the popularity of social media platforms have grown, offensive language has become a significant issue. This problem has drawn considerable attention to methods for detecting offensive content and preventing its spread on online social networks. This paper therefore aims to develop an effective Arabic offensive language detection model using deep learning together with semantic and contextual features. It proposes a deep learning approach that uses a bidirectional long short-term memory (BiLSTM) model and domain-specific word embeddings extracted from an Arabic offensive dataset. The detection approach was evaluated on an Arabic dataset coll...

Arabic cyberbullying detecting using ensemble deep learning technique

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), 2023

Interest in research on abusive language and cyberbullying detection has grown substantially in recent years because of its effects on both individual victims and societies. Hate speech, bullying, racism, aggressive content, harassment, and other forms of abuse have all increased significantly on Facebook, Instagram, and other social media platforms (SMPs). Because there is a significant need to detect, control, and prevent the circulation of offensive content on social networking sites, we undertook this study to automate the identification of abusive language and cyberbullying. A balanced Arabic dataset is used in the offensive-content detection process. Ensemble machine learning has recently been used to improve the effectiveness of classification models. Detection in Arabic becomes more precise when each spatial text feature can be related to all other contextual information. We used a hybrid model that combines a convolutional neural network (CNN) with bidirectional long short-term memory (Bi-LSTM) and an inverse document frequency gated recurrent unit (GRU), without any post-processing. Our work outperformed every other publicly released state-of-the-art ensemble model under the specifications of the official deep learning challenge. The findings indicate that the three-layer inverse document frequency long short-term memory (LSTM) classifier achieved the highest accuracy among the compared algorithms, at 92.75%.
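The abstract pairs "inverse document frequency" with the GRU and LSTM layers without saying how the two are combined; one plausible reading is that IDF-weighted term features feed into, or are merged with, the recurrent models, but that is an assumption. The sketch below only shows the weighting itself, using the same smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default (without its final L2 normalization). The toy corpus is hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1, where df is document frequency."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # count each term at most once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms never seen when fitting are dropped."""
    counts = Counter(doc.split())
    return {term: n * idf[term] for term, n in counts.items() if term in idf}

if __name__ == "__main__":
    corpus = ["you idiot", "you ok", "you"]
    idf = idf_weights(corpus)
    print(idf["you"])  # 1.0: a term in every document gets the minimum weight
    print(tfidf("you idiot idiot", idf))
```

Terms that appear in every document receive the minimum weight of 1, while rarer, potentially more discriminative terms (such as insults) are weighted up.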

Deep Random Forest and AraBert for Hate Speech Detection from Arabic Tweets

JUCS - Journal of Universal Computer Science

Hate speech detection in Arabic tweets currently attracts the attention of many researchers, and numerous systems and techniques have been proposed to address this classification challenge. Nonetheless, three major limitations persist: deep learning models with an excessive number of hyperparameters, reliance on hand-crafted features, and the need for a large amount of training data to achieve satisfactory performance. In this study, we propose Contextual Deep Random Forest (CDRF), a hate speech detection approach that combines contextual embedding and Deep Random Forest. The experimental findings show that the Arabic contextual embedding model is highly effective for hate speech detection, outperforming static embedding models. We also show that the proposed CDRF significantly improves the performance of Arabic hate speech classification.

Detecting Arabic textual threats in social media using artificial intelligence: An overview

Indonesian Journal of Electrical Engineering and Computer Science, 2022

Recent studies show that social media has become an integral part of everyone's daily routine, and people often use it to convey their ideas, opinions, and critiques. Its growing use has also led malicious users to exploit the anonymity it offers and engage in socially unacceptable behavior. The use of inappropriate language on social media is one of the greatest societal dangers today, so there is a need to monitor and evaluate social media posts using automated methods and techniques. Most studies on offensive language classification in texts have used English datasets, while offensive language detection in Arabic has received less attention. Arabic has its own rules and structures, which differ from those of English. This article provides a thorough review of research studies that have used artificial intelligence (AI) to identify Arabic offensive language in various contexts.

DETECTING ABUSIVE AND INSULTING COMMENTS ON SOCIAL MEDIA

IJIEMR, 2022

Social media is increasingly exposed to harmful behaviour such as personal attacks and cyberbullying. Manually checking which comments need to be blocked is inconvenient and time-consuming, so automating the recognition and blocking of abusive comments both saves time and helps ensure user safety. This paper applies machine learning and deep learning techniques to this challenge. The model is trained on a Twitter dataset with two classes of comments: abusive and non-abusive. The resulting system is intended to identify and regulate harmful social media comments.
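The abstract does not name the specific machine learning or deep learning models used. As a stand-in, the sketch below shows multinomial Naive Bayes with add-one smoothing, a common classical baseline for binary abusive/non-abusive text classification; it is not the paper's model, and the class name and toy training comments are hypothetical.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text, c):
        """Unnormalized log P(c | text) = log P(c) + sum of log P(token | c)."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for token in text.lower().split():
            if token in self.vocab:  # ignore tokens never seen in training
                score += math.log((self.word_counts[c][token] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_posterior(text, c))

if __name__ == "__main__":
    texts = ["you are an idiot", "shut up idiot",
             "have a nice day", "thanks for sharing"]
    labels = ["abusive", "abusive", "non-abusive", "non-abusive"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("what an idiot"))  # abusive
```

Working in log space avoids numeric underflow when many small token probabilities are multiplied, and the add-one smoothing keeps a word unseen in one class from zeroing out that class entirely.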

MACHINE LEARNING AND DEEP LEARNING TECHNIQUES FOR DETECTING ABUSIVE CONTENT ON TWITTER

IRJET, 2022

Cyber abuse is the use of online systems and virtual devices to inflict harm that is psychological, emotional, sexual, racist, sexist, or otherwise negative in nature. Sentiment analysis is the interpretation and classification of text data by polarity: positive, negative, or neutral. It relies on various analysis techniques, and the choice of technique differs from author to author. Major companies worldwide use sentiment analysis to identify the emotions circulating in the market and adjust their systems accordingly in order to grow.
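To make the three-way polarity definition concrete, here is a minimal lexicon-based sketch: sum per-word scores and map the sign of the total to positive, negative, or neutral. The tiny lexicon and its scores are hypothetical; practical systems use curated lexicons or trained classifiers, and the abstract does not commit to either.

```python
import string

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def polarity(text, lexicon=LEXICON):
    """Classify text as positive, negative, or neutral by summed word scores."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    print(polarity("I love this phone!"))     # positive
    print(polarity("Awful, awful service."))  # negative
    print(polarity("The bus arrived."))       # neutral
```

A plain word sum ignores negation ("not good" scores as positive), which is one reason learned models are usually preferred over simple lexicon counts.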

Abusive comment identification on Indonesian social media data using hybrid deep learning

IAES International Journal of Artificial Intelligence (IJ-AI)

Half of all social media users in Indonesia have experienced cyberbullying, which often takes the form of attacks using abusive words. An abusive word is a word or phrase that contains harassment, whether spoken or written. This is a serious problem that must be controlled because it affects victims' psychology and can cause trauma leading to depression. This study proposes identifying abusive comments on Indonesian-language social media using a deep learning approach. The architecture is a hybrid model combining a recurrent neural network (RNN) and long short-term memory (LSTM): the RNN maps input sequences to fixed-size hidden vectors, and the LSTM is used to mitigate the unstable gradient growth that can arise in RNNs. The steps carried out include preprocessing, modelling, implementation, and evaluation. The dataset used is Indonesian abus...
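The abstract credits the LSTM with keeping gradients stable and the recurrent layer with producing a fixed-size vector. The sketch below is a forward pass of the standard LSTM cell (input gate, forget gate, output gate, candidate state) in plain Python with untrained random weights; it is not the paper's hybrid RNN-LSTM architecture, and names such as `LSTMCell` and `run` are hypothetical. The additive cell update `c = f * c_prev + i * c_tilde` is the mechanism usually credited with easing unstable (especially vanishing) gradients during backpropagation through time.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Standard LSTM cell; gate keys: i=input, f=forget, o=output, c=candidate."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One weight matrix per gate over the concatenated [x_t, h_{t-1}] vector.
        self.W = {g: init(hidden_size, input_size + hidden_size) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev)
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_tilde = [math.tanh(v) for v in pre["c"]]
        # Additive cell update: keep part of the old state, add gated new content.
        c = [fi * cp + ii * ct for fi, cp, ii, ct in zip(f, c_prev, i, c_tilde)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def run(self, sequence):
        """Process a sequence of input vectors; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h  # fixed-size summary of a variable-length sequence

if __name__ == "__main__":
    cell = LSTMCell(input_size=3, hidden_size=4)
    print(cell.run([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```

Whatever the sequence length, `run` returns a vector of `hidden_size` values, each strictly between -1 and 1, which a downstream classifier layer could map to abusive or non-abusive.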

Automatic Detection of Cyberbullying and Abusive Language in Arabic Content on Social Networks: A Survey

Procedia Computer Science 189 (2021) 156–166

Online social networks have emerged as key players in today's world, providing platforms for expression and content distribution. This technology lets users communicate easily with each other and share their data instantly. However, the internet is not inherently safe; it can be a source of abusive and harmful content that causes harm to others. Given the negative effects of abusive language and cyberbullying, approaches and strategies to address these issues are greatly needed. Arabic text is known for its challenges, its complexity, and the scarcity of its resources. Many efforts have been made to develop automated solutions for detecting abusive language and cyberbullying in other languages, but far fewer for Arabic. This work analyzes 27 studies on automatic detection of Arabic abusive language and cyberbullying and the related detection approaches. Its goal is to review the findings of previous studies on cyberbullying and abusive-content detection in Arabic content on online social networks and to help future researchers develop effective and realistic automatic detection systems.