Automatic detection of cyberbullying in social media text

Automatic Detection and Prevention of Cyberbullying

The recent development of social media poses new challenges to the research community in analyzing online interactions between people. Social networking sites offer great opportunities for connecting with others, but also increase the vulnerability of young people to undesirable phenomena, such as cybervictimization. Recent research reports that on average, 20% to 40% of all teenagers have been victimized online. In this paper, we focus on cyberbullying as a particular form of cybervictimization. Successful prevention depends on the adequate detection of potentially harmful messages. However, given the massive information overload on the Web, there is a need for intelligent systems to identify potential risks automatically. We present the construction and annotation of a corpus of Dutch social media posts annotated with fine-grained cyberbullying-related text categories, such as insults and threats. Also, the specific participants (harasser, victim or bystander) in a cyberbullying conversation are identified to enhance the analysis of human interactions involving cyberbullying. Apart from describing our dataset construction and annotation, we present proof-of-concept experiments on the automatic identification of cyberbullying events and fine-grained cyberbullying categories.

Social Media Cyberbullying Detection using Machine Learning

International Journal of Advanced Computer Science and Applications

With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. Social networks provide a rich environment for bullies, who can use these networks to attack victims. Given the consequences of cyberbullying for victims, it is necessary to find suitable ways to detect and prevent it. Machine learning can help detect the language patterns of bullies and hence generate a model that automatically detects cyberbullying actions. This paper proposes a supervised machine learning approach for detecting and preventing cyberbullying. Several classifiers are used to train and recognize bullying actions. The evaluation of the proposed approach on a cyberbullying dataset shows that a Neural Network performs better, achieving an accuracy of 92.8%, while SVM achieves 90.3%. Also, NN outperforms other classifiers from similar work on the same dataset.

Cyber Bullying Detection on Social Media using Machine Learning

ITM Web of Conferences, 2021

The use of the internet and social media often involves sending, receiving, and posting negative, harmful, false, or mean content about another individual, which constitutes cyberbullying. Bullying over social media likewise takes the form of threatening, slandering, and chastising the individual. Cyberbullying has led to a severe increase in mental health problems, especially among the young generation. It has resulted in lower self-esteem and increased suicidal ideation. Unless some measure against cyberbullying is taken, self-esteem and mental health issues will affect an entire generation of young adults. Many traditional machine learning models have been implemented in the past for the automatic detection of cyberbullying on social media, but these models have not considered all the necessary features that can be used to identify or classify a statement or post as bullying. In this paper, we proposed a model based on various features that should be considered while detec...

Cyberbullying Detection: A Comparative Study of Classification Algorithms

International Journal of Computer Science and Mobile Computing (IJCSMC), 2024

In the realm of social media, cyberbullying's pervasive impact raises urgent concerns about its emotional and psychological toll on victims. This study addresses the imperative of effectively detecting cyberbullying. By leveraging ML and DL techniques, we aim to develop reliable methods that accurately identify instances of cyberbullying in social media data, enhancing detection efficiency and accuracy. This facilitates timely intervention and support for affected individuals. In this comprehensive analysis of existing systems, various ML and DL models are extensively tested for cyberbullying detection. The evaluated models include Random Forest, XGBoost, Naive Bayes, SVM, CNN, RNN, and BERT. Pre-processed datasets are utilized to train and evaluate the models. To evaluate the ability of each model to reliably identify cyberbullying in social media data, performance metrics such as F1 score, recall, precision, and accuracy are used. The findings of this study demonstrate the efficacy of different ML and DL models in monitoring cyberbullying in social media data. Among the models evaluated, the BERT model exhibits exceptional performance, achieving the highest accuracy rates of 88.8% for binary classification and 86.6% for multiclass classification.
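
The four metrics named in this abstract can be illustrated with a short sketch. It is not taken from the study: the function name `binary_metrics` and the toy labels are assumptions made here for illustration, and the binary case (1 = bullying, 0 = not bullying) is assumed.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only: 1 = bullying, 0 = not bullying.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

Because cyberbullying datasets are often imbalanced, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason precision, recall, and F1 are commonly reported alongside it.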

Detection of Cyber Bullying on Social Media

International Journal of Advanced Research in Science, Communication and Technology

This study focuses on the pressing issue of cyberbullying on the internet, which negatively impacts both teenagers and adults, sometimes leading to tragic consequences such as suicide and depression. In order to address this problem, there is a growing need to establish regulations regarding content on social media platforms. To tackle this issue, the study aims to utilize data from two distinct forms of cyberbullying: hate speech tweets from Twitter and comments based on specific attacks from Wikipedia forums. The primary objective is to develop an effective model using Natural Language Processing and Machine Learning techniques to identify cyberbullying in textual data. The study explores three different approaches for feature extraction and evaluates the performance of four classifiers to determine the most effective method. The results of the study reveal that the developed model achieves an impressive accuracy of over 90% when applied to tweet data and over 80% when applied to ...

Detection of Cyberbullying on Social Media using Machine Learning

IRJET, 2022

With the rise of the Internet, the usage of social media has increased tremendously, and it has become the most influential networking platform of the twenty-first century. However, increasing social connectivity frequently causes problems: negative societal effects that lead to disastrous outcomes such as online harassment, cyberbullying, online trolling, and cybercrime. Cyberbullying frequently leads to severe mental and physical distress, especially in women and children, on occasion driving them to attempt suicide. Because of its harmful impact on society, online abuse attracts attention. Many incidents have occurred recently all across the world, including internet harassment such as sharing private messages, spreading rumors, and making sexual comments. As a result, the detection of bullying texts or messages on social media has grown in popularity. The data we used for our work were collected from the website kaggle.com and contain a high percentage of bullying content. Electronic databases like ERIC, ProQuest, and Google Scholar were used as the data sources. In this work, we present an approach to detect cyberbullying using machine learning techniques. We evaluated our model on two classifiers, SVM and Neural Network, and we used TF-IDF and sentiment analysis algorithms for feature extraction. The model achieved 92.8% accuracy using Neural Network with 3-grams and 90.3% accuracy using SVM with 4-grams while using TF-IDF and sentiment analysis.
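
As a rough illustration of the TF-IDF n-gram features mentioned in this abstract, the sketch below computes TF-IDF weights over word 3-grams in plain Python. Several things are assumptions made here, not details from the paper: that the n-grams are word-level rather than character-level, the unsmoothed weighting tf × log(N / df), whitespace tokenization, and the helper names `word_ngrams` and `tfidf_vectors`. The resulting sparse vectors are the kind of input an SVM or neural network would then be trained on.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Split on whitespace and return the list of word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Return one {ngram: weight} dict per document.

    Weight = (count / n-grams in doc) * log(num_docs / doc_frequency).
    This is one simple TF-IDF variant, chosen for illustration.
    """
    grams_per_doc = [Counter(word_ngrams(d, n)) for d in docs]
    doc_freq = Counter()
    for counts in grams_per_doc:
        doc_freq.update(counts.keys())  # each doc counts once per n-gram
    num_docs = len(docs)
    vectors = []
    for counts in grams_per_doc:
        total = sum(counts.values())
        vectors.append({
            gram: (count / total) * math.log(num_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors


# Made-up posts for illustration only.
docs = ["you are so dumb", "you are so kind", "nobody likes you at all"]
vectors = tfidf_vectors(docs, n=3)
```

In this toy corpus the 3-gram "you are so" appears in two of the three posts, so it receives a lower weight than "are so dumb", which appears in only one.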

Automated Detection of Cyberbullying Using Machine Learning

IRJET, 2020

Increasing use of the Internet and easier access to online communities such as social media have led to the emergence of cybercrime. Cyberbullying is very common nowadays, and because it goes untracked it may harm any individual, business, society, or country. In recent times, riots appear to have happened because of statements made by one community about another, so it is important to identify content that spreads hate or harms a community. Text processing with NLP (natural language processing) is an emerging field; with the help of NLP and machine learning algorithms such as Naive Bayes, Random Forest, and SVM, we identify cyberbullying on Twitter. The objectives of this implementation are given in the objectives section. We also extract characters from images with OCR to find image-based cyberbullying, and the impact on an individual basis is checked on a dummy system. Machine learning and natural language processing techniques are used to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified traits. On the basis of our extensive literature review, we categorise existing approaches into 4 main classes, namely supervised learning, lexicon-based, rule-based, and mixed-initiative approaches. Supervised learning-based approaches typically use classifiers such as SVM and Naïve Bayes to develop predictive models for cyberbullying detection. Index Terms: cyberbullying, natural language processing, machine learning algorithms, social networking.
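
Of the four classes of approach listed in this abstract, the lexicon-based one is the simplest to sketch. The example below shows one common reading of the idea, scoring a post by the share of its tokens that appear in a list of abusive terms. The abstract does not define the approach in detail, so the tiny lexicon, the threshold of 0.2, and the function names `lexicon_score` and `is_flagged` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny lexicon; real systems use curated lists.
LEXICON = {"idiot", "loser", "stupid"}


def lexicon_score(text, lexicon=LEXICON):
    """Return the fraction of tokens in `text` found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits / len(tokens)


def is_flagged(text, threshold=0.2, lexicon=LEXICON):
    """Flag a post when its lexicon score reaches the (assumed) threshold."""
    return lexicon_score(text, lexicon) >= threshold


flagged = is_flagged("You are a stupid loser")   # 2 of 5 tokens match
not_flagged = is_flagged("Have a nice day")      # no tokens match
```

A matcher like this needs no training data, but it misses misspellings and context, which is part of why supervised classifiers are often preferred.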

Modeling the Detection of Textual Cyberbullying

The scourge of cyberbullying has assumed alarming proportions with an ever-increasing number of adolescents admitting to having dealt with it either as a victim or as a bystander. Anonymity and the lack of meaningful supervision in the electronic medium are two factors that have exacerbated this social menace. Comments or posts involving sensitive topics that are personal to an individual are more likely to be internalized by a victim, often resulting in tragic outcomes. We decompose the overall detection problem into the detection of sensitive topics, which lends itself to text classification sub-problems. We experiment with a corpus of 4500 YouTube comments, applying a range of binary and multiclass classifiers. We find that binary classifiers for individual labels outperform multiclass classifiers. Our findings show that the detection of textual cyberbullying can be tackled by building individual topic-sensitive classifiers.
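
The decomposition described in this abstract, one binary classifier per sensitive topic instead of a single multiclass model, can be sketched as follows. The abstract does not say which classifiers were used, so the multinomial Naive Bayes model with add-one smoothing is a stand-in chosen here for brevity. The topic names, the toy comments, and the names `BinaryNaiveBayes` and `train_per_topic` are likewise hypothetical.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for a yes/no label."""

    def fit(self, docs, labels):
        if set(labels) != {True, False}:
            raise ValueError("need both positive and negative examples")
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.word_counts[label].values())
        prior = self.class_counts[label] / sum(self.class_counts.values())
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        tokens = doc.lower().split()
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_per_topic(docs, topic_labels, topics):
    """Train one independent yes/no classifier for each topic."""
    return {
        topic: BinaryNaiveBayes().fit(docs, [lab == topic for lab in topic_labels])
        for topic in topics
    }


# Made-up comments and topic labels for illustration only.
docs = ["you look so ugly", "your face is ugly", "you are so stupid",
        "such a stupid dumb answer", "see you at practice", "nice game today"]
labels = ["appearance", "appearance", "intelligence",
          "intelligence", "none", "none"]
models = train_per_topic(docs, labels, ["appearance", "intelligence"])
```

Each classifier answers only "is this comment about my topic?", so a comment can be flagged by several topic classifiers or by none, which a single multiclass model cannot express.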

Cyberbullying Detection using Natural Language Processing

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

Around the world, the use of the Internet and social media has increased exponentially, and they have become an integral part of daily life. They allow people to share their thoughts, feelings, and ideas with their loved ones. But with social networking sites becoming more popular, cyberbullying is on the rise. Using technology as a medium to bully someone is known as cyberbullying. The Internet can be a source of abusive and harmful content and cause harm to others. Social networking sites provide a great medium for harassment and bullying, and youngsters who use these sites are vulnerable to attacks. Bullying can have long-term effects on adolescents' ability to socialize and build lasting friendships. Victims of cyberbullying often feel humiliated. Social media users can often hide their identity, which helps them misuse the available features. The use of offensive language has become one of the most prominent issues on social networking sites. Offensive language is text containing any form of abusive conduct intended to hurt others. Cyberbullying frequently leads to serious mental and physical distress, particularly for women and children, and sometimes drives them to commit suicide. The purpose of this project is to develop an effective technique to detect and avoid cyberbullying on social networking sites using Natural Language Processing and other machine learning algorithms. The dataset that we used for this project was collected from Kaggle; it contains data from Twitter that is then labeled to train the algorithm. Several classifiers are used to train and recognize bullying actions. The evaluation of the proposed model on the cyberbullying dataset shows that Logistic Regression performs better and achieves higher accuracy than the SVM, Random Forest, Naive Bayes, and XGBoost algorithms.

Cyberbullying Detection - Technical Report 2/2018, Department of Computer Science AGH, University of Science and Technology

ArXiv, 2018

The research described in this paper concerns automatic cyberbullying detection in social media. There are two goals to achieve: building a gold standard cyberbullying detection dataset and measuring the performance of the Samurai cyberbullying detection system. The Formspring dataset provided in a Kaggle competition was re-annotated as a part of the research. The annotation procedure is described in detail and, unlike many other recent data annotation initiatives, does not use Mechanical Turk for finding people willing to perform the annotation. The new annotation, compared to the old one, seems to be more coherent since all tested cyberbullying detection systems performed better on the former. The performance of the Samurai system is compared with 5 commercial systems and one well-known machine learning algorithm used for classifying textual content, namely Fasttext. It turns out that Samurai scores the best in all measures (accuracy, precision and recall), while Fasttext is the sec...