Modeling the Detection of Textual Cyberbullying

Automatic detection of cyberbullying in social media text

PLOS ONE, 2018

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments o...

Automatic Detection and Prevention of Cyberbullying

The recent development of social media poses new challenges to the research community in analyzing online interactions between people. Social networking sites offer great opportunities for connecting with others, but also increase the vulnerability of young people to undesirable phenomena, such as cybervictimization. Recent research reports that on average, 20% to 40% of all teenagers have been victimized online. In this paper, we focus on cyberbullying as a particular form of cybervictimization. Successful prevention depends on the adequate detection of potentially harmful messages. However, given the massive information overload on the Web, there is a need for intelligent systems to identify potential risks automatically. We present the construction and annotation of a corpus of Dutch social media posts annotated with fine-grained cyberbullying-related text categories, such as insults and threats. Also, the specific participants (harasser, victim or bystander) in a cyberbullying conversation are identified to enhance the analysis of human interactions involving cyberbullying. Apart from describing our dataset construction and annotation, we present proof-of-concept experiments on the automatic identification of cyberbullying events and fine-grained cyberbullying categories.

Cyber Bullying Intensity and Category Prediction of Tweets

International Journal of Advances in Engineering Architecture Science and Technology, 2023

Background: Cyberbullying is a growing problem on social media platforms, and it can have detrimental effects on the mental health and wellbeing of individuals. Objectives: In this paper, we propose a machine learning model for predicting the intensity and category of cyberbullying in tweets using an SVM model and the VADER sentiment analysis tool. Methods / Statistical Analysis: We collected a dataset of 47,000 tweets containing examples of cyberbullying in different categories (age, ethnicity, gender, religion, other cyberbullying, and non-cyberbullying) and pre-processed the data using text normalisation, tokenization, stop-word removal, lemmatization, and stemming. We then trained the SVM model on this data to predict the category of each tweet, and used the VADER sentiment analysis tool to predict the intensity of cyberbullying in the tweets. Findings: The model achieved an accuracy of 83% in predicting the intensity of cyberbullying in tweets and an accuracy of 75% in predicting the category. This model has the potential to be useful in identifying and preventing cyberbullying on social media platforms, promoting a safer and healthier online environment. Applications / Improvements: Future research can further improve the accuracy and performance of the model by incorporating additional features or using more advanced NLP techniques. The main objective of predicting the cyberbullying intensity and category of tweets is to identify and categorise harmful online behaviours and assess their severity.
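The intensity step described in this abstract can be illustrated with a much-simplified, VADER-style lexicon scorer. This is a minimal sketch: the tiny lexicon and the intensity thresholds below are invented for illustration, while the paper itself uses the real VADER tool (which adds heuristics for negation, punctuation, capitalisation, and degree modifiers) and an SVM for the category step.

```python
import math

# Toy valence lexicon (invented values; real VADER ships ~7,500 rated entries).
TOY_LEXICON = {"stupid": -1.8, "idiot": -2.3, "hate": -2.7, "ugly": -1.9,
               "nice": 1.8, "love": 3.2}

def compound_score(text):
    """Sum lexicon valences and squash into [-1, 1] with VADER's normalisation."""
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in text.lower().split())
    return total / math.sqrt(total * total + 15)  # alpha = 15, as in VADER

def intensity_band(score):
    """Map the compound score to a coarse intensity label (thresholds invented)."""
    if score <= -0.5:
        return "high"
    if score <= -0.05:
        return "medium"
    return "low"

print(intensity_band(compound_score("you are a stupid ugly idiot")))  # "high"
print(intensity_band(compound_score("have a nice day")))              # "low"
```

In the paper's pipeline this intensity label would sit alongside the SVM's category prediction; only the scoring idea is sketched here.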

Cyberbullying Detection: A Comparative Study of Classification Algorithms

International Journal of Computer Science and Mobile Computing (IJCSMC), 2024

In the realm of social media, cyberbullying's pervasive impact raises urgent concerns about its emotional and psychological toll on victims. This study addresses the imperative of effectively detecting cyberbullying. By leveraging ML and DL techniques, we aim to develop reliable methods that accurately identify instances of cyberbullying in social media data, enhancing detection efficiency and accuracy. This facilitates timely intervention and support for affected individuals. In this comprehensive analysis of existing systems, various ML and DL models are extensively tested for cyberbullying detection. The evaluated models include Random Forest, XGBoost, Naive Bayes, SVM, CNN, RNN, and BERT. Pre-processed datasets are utilized to train and evaluate the models. To evaluate the ability of each model to reliably identify cyberbullying in social media data, performance metrics such as F1 score, recall, precision, and accuracy are used. The findings of this study demonstrate the efficacy of different ML and DL models in detecting cyberbullying in social media data. Among the models evaluated, the BERT model exhibits exceptional performance, achieving the highest accuracy rates of 88.8% for binary classification and 86.6% for multiclass classification.
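The metrics this comparison relies on can be written directly in terms of binary confusion-matrix counts. A minimal sketch (the counts in the usage line are invented, not taken from the study):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of flagged posts, how many were bullying
    recall = tp / (tp + fn)             # of bullying posts, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a bullying / not-bullying classifier:
acc, p, r, f1 = binary_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(acc, 3), round(p, 3), round(r, 3), round(f1, 3))  # 0.85 0.889 0.8 0.842
```

For the multiclass setting the study also reports, these per-class values are typically averaged (macro or weighted), which the sketch omits.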

Cyberbullying Detection - Technical Report 2/2018, Department of Computer Science, AGH University of Science and Technology

ArXiv, 2018

The research described in this paper concerns automatic cyberbullying detection in social media. There are two goals to achieve: building a gold standard cyberbullying detection dataset and measuring the performance of the Samurai cyberbullying detection system. The Formspring dataset provided in a Kaggle competition was re-annotated as part of the research. The annotation procedure is described in detail and, unlike many other recent data annotation initiatives, does not use Mechanical Turk for finding people willing to perform the annotation. The new annotation seems to be more coherent than the old one, since all tested cyberbullying detection systems performed better on it. The performance of the Samurai system is compared with 5 commercial systems and one well-known machine learning algorithm used for classifying textual content, namely fastText. It turns out that Samurai scores the best in all measures (accuracy, precision and recall), while fastText is the sec...

Using Machine Learning to Detect Cyberbullying

Cyberbullying is the use of technology as a medium to bully someone. Although it has been an issue for many years, the recognition of its impact on young people has recently increased. Social networking sites provide a fertile medium for bullies, and teens and young adults who use these sites are vulnerable to attacks. Through machine learning, we can detect language patterns used by bullies and their victims, and develop rules to automatically detect cyberbullying content.

Combining textual features to detect cyberbullying in social media posts

2020

Cyberbullying has become prevalent in social media communication. To create a safe space for cyber communication, an effective cyberbullying detection method is needed. This study focuses on using a combination of textual features to detect cyberbullying across social media platforms. A lexicon-enhanced rule-based method was applied to detect cyberbullying in Facebook comments. The resulting algorithm was evaluated using the performance measures of accuracy, precision, recall, and F1 score, and showed promising performance with an average recall of 95.981%.
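The lexicon-enhanced rule-based idea can be sketched in a few lines. The lexicon and the single co-occurrence rule below are invented stand-ins; the study's actual lexicon and rule set are not reproduced here.

```python
# Invented toy resources; a real system would use curated lexicons and many rules.
ABUSIVE_TERMS = {"loser", "idiot", "stupid", "ugly"}
SECOND_PERSON = {"you", "your", "u", "ur"}

def is_bullying(comment):
    """Rule: flag a comment when an abusive term co-occurs with a
    second-person pronoun, i.e. the insult is directed at someone."""
    tokens = {tok.strip(".,!?").lower() for tok in comment.split()}
    return bool(tokens & ABUSIVE_TERMS) and bool(tokens & SECOND_PERSON)

print(is_bullying("You are such a loser!"))      # True
print(is_bullying("I feel like a loser today"))  # False: not directed at anyone
```

The appeal of this approach, as the abstract suggests, is high recall on explicit abuse with no training step; its known weakness is implicit or obfuscated bullying that no lexicon entry matches.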

Detection of Cyberbullying on Social Media using Machine Learning

IRJET, 2022

With the rise of the Internet, the usage of social media has increased tremendously, and it has become the most influential networking platform in the twenty-first century. However, increasing social connectivity frequently causes problems: negative societal effects such as online harassment, cyberbullying, online trolling, and other cybercrime. Cyberbullying frequently leads to severe mental and physical distress, especially in women and children, sometimes driving victims to attempt suicide. Because of its harmful impact on society, online abuse attracts attention, and many incidents of Internet harassment, such as sharing private messages, spreading rumours, and sexual comments, have occurred recently all across the world. As a result, the detection of bullying texts or messages on social media has grown in popularity. The data we used for our work were collected from the website kaggle.com, which contains a high percentage of bullying content. Electronic databases like ERIC, ProQuest, and Google Scholar were used as the data sources. In this work, we present an approach to detect cyberbullying using machine learning techniques. We evaluated our model on two classifiers, SVM and Neural Network, and used TF-IDF and sentiment analysis algorithms for feature extraction. This achieved 92.8% accuracy using the Neural Network with 3-grams and 90.3% accuracy using SVM with 4-grams, while using TF-IDF and sentiment analysis.
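The TF-IDF-over-n-grams feature extraction this abstract mentions can be sketched in plain Python. The smoothed-IDF variant used below is an assumption (it mirrors a common library convention); the paper does not specify its weighting formula.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Word n-grams of a whitespace-tokenised, lowercased text."""
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def tfidf(docs, n=3):
    """Per-document TF-IDF weights over word n-grams (smoothed IDF).

    Returns one {ngram: weight} dict per input document.
    """
    grams_per_doc = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for counts in grams_per_doc for g in counts)
    N = len(docs)
    return [{g: tf * (math.log((1 + N) / (1 + df[g])) + 1)
             for g, tf in counts.items()} for counts in grams_per_doc]

weights = tfidf(["you are so stupid really", "you are so kind really"], n=3)
# "you are so" appears in both docs, so it is down-weighted relative to
# the n-grams unique to each document.
```

A classifier such as the SVM or Neural Network in the study would then consume these weight vectors (after mapping n-grams to fixed feature indices), possibly concatenated with sentiment scores.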

An Approach to Detect Cyberbullying on Social Media

Model and Data Engineering, 2021

Detecting cyberbullying remains an important issue. Existing approaches often rely on advanced techniques, including machine learning and Natural Language Processing algorithms. In this paper, we propose an ontology- and classifier-based approach to detect cyberbullying cases in the context of social media. We propose a cyberbullying ontology in terms of cyberbullying categories and a vocabulary of representative terms. This ontology is used to build and annotate the toxicity of our training dataset, which was extracted from different data sources. Various unit classifiers are used, including message toxicity detection, a gender classifier, age estimation, and personality estimation. The outputs of these classifiers can be combined to intercept content that could constitute cyberbullying cases.

Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

Language Technologies for the Challenges of the Digital Age, 2018

The sheer ease with which abusive and hateful utterances can be made online (typically from the comfort of one's home, and without any immediate negative repercussions) using today's digital communication technologies, especially social media, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different sets of data we work on were classified for aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.