Detection of harassment on Web 2.0
Related papers
2009
Web 2.0 has led to the development and evolution of web-based communities and applications. These communities provide places for information sharing and collaboration, but they also open the door for inappropriate online activities such as harassment, in which some users post messages to a virtual community that are intentionally offensive to other members. Detecting online harassment is a new and challenging task, and currently few systems attempt to solve this problem.
Machine learning approaches to detect online harassment using bag of words
AIP Conference Proceedings, 2023
The time people spend online has increased dramatically in recent years, and users can remain anonymous when posting, sharing their opinions, and participating in online chats. Because of this, more and more people, especially children, are sexually harassed on social media. This paper aims to detect sexual harassment early using three machine learning methods (SVM, logistic regression, XGBoost) with a bag-of-words text representation. It works with two kinds of datasets: chats with sexual predators (CSP) and comments containing sexual harassment (CSH). Logistic regression achieved accuracies of 98.44% on CSH and 93.71% on CSP, while XGBoost achieved 96.57% and 90.65%, respectively. XGBoost was used to avoid overfitting and shows promising results on both datasets.
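The bag-of-words representation this abstract relies on can be sketched in a few lines. The toy corpus and function names below are illustrative, not drawn from the paper's CSP/CSH datasets:

```python
# Minimal bag-of-words sketch: each document becomes a fixed-length
# vector of token counts that any classifier (SVM, logistic
# regression, XGBoost) can consume.
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(doc, vocab):
    """Turn one document into a count vector over the shared vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(token, 0) for token in vocab]

docs = ["you are great", "you are awful awful"]
vocab = build_vocabulary(docs)
X = [vectorize(d, vocab) for d in docs]
# X is now a document-term matrix: one row per document.
```

In practice a library vectorizer (e.g. scikit-learn's `CountVectorizer`) would handle tokenization and sparse storage, but the underlying representation is the same.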
Technology Solutions to Combat Online Harassment
Proceedings of the First Workshop on Abusive Language Online
This work is part of a new initiative to use machine learning to identify online harassment in social media and comment streams. Online harassment goes underreported due to the reliance on humans to identify and report harassment, reporting that is further slowed by requirements to fill out forms providing context. In addition, the time for moderators to respond and apply human judgment can take days, but response times in terms of minutes are needed in the online context. Though some of the major social media companies have been doing proprietary work in automating the detection of harassment, there are few tools available for use by the public. In addition, the amount of labeled online harassment data and availability of cross platform online harassment datasets is limited. We present the methodology used to create a harassment dataset and classifier and the dataset used to help the system learn what harassment looks like.
A Large Labeled Corpus for Online Harassment Research
Proceedings of the 2017 ACM on Web Science Conference - WebSci '17, 2017
A fundamental part of conducting cross-disciplinary web science research is having useful, high-quality datasets that provide value to studies across disciplines. In this paper, we introduce a large, hand-coded corpus of online harassment data. A team of researchers collaboratively developed a codebook using grounded theory and labeled 35,000 tweets. Our resulting dataset has roughly 15% positive harassment examples and 85% negative examples. This data is useful for training machine learning models, identifying textual and linguistic features of online harassment, and for studying the nature of harassing comments and the culture of trolling.
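With roughly 15% positive and 85% negative examples, a classifier that always predicts "not harassment" already scores 85% accuracy, which is why recall on the positive class matters when training on this corpus. A minimal worked check (the labels below are synthetic, mimicking the split, not the actual tweets):

```python
# Synthetic labels mimicking the corpus's ~15%/85% class split.
labels = [1] * 15 + [0] * 85          # 1 = harassment, 0 = not
majority_preds = [0] * len(labels)    # always predict the majority class

accuracy = sum(p == y for p, y in zip(majority_preds, labels)) / len(labels)
true_pos = sum(p == 1 and y == 1 for p, y in zip(majority_preds, labels))
recall = true_pos / sum(labels)       # recall on the harassment class

print(accuracy)  # 0.85 -- looks strong, but...
print(recall)    # 0.0  -- the baseline never finds any harassment
```

This is why per-class precision/recall, rather than raw accuracy, is the meaningful yardstick for models trained on imbalanced harassment data.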
IRJET- Real-Time Cyberbullying Analysis on Social Media Using Machine Learning And Text Mining
IRJET, 2020
People nowadays rely heavily on internet technology, and its growth has enabled both legal and illegal activities. Much first-hand news is discussed in Internet forums well before it is reported in traditional mass media, and the same channels are used for illegal activities such as dissemination of copyrighted movies, threatening messages, and online gambling. Law enforcement agencies are looking for ways to monitor these discussion forums for possible criminal activity and to download suspected postings as evidence for investigation. We propose a system to tackle this problem. In this project we use a machine learning algorithm, an Artificial Neural Network (ANN), to detect criminal activity and abusive conversation. Comments containing abusive words affect the psychology of teenagers and demoralize them, so we analyze the words that may psychologically affect an individual and display the predicted abusive words as censored content. By censoring abusive words, we aim to protect innocent victims from distressing content.
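The censoring step described above, masking words the model flags as abusive, can be sketched independently of the ANN classifier. Here a plain word list stands in for whatever the trained model predicts:

```python
import re

def censor(text, flagged_words):
    """Replace each flagged word with asterisks of the same length,
    matching case-insensitively on whole words only."""
    def mask(match):
        return "*" * len(match.group(0))
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, flagged_words)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, text)

# 'stupid' here is a stand-in for whatever the trained ANN flags.
print(censor("You are stupid", ["stupid"]))  # You are ******
```

Using `\b` word boundaries avoids masking innocent substrings inside longer words, and replacing with same-length asterisks preserves the message layout.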
Detecting Cyberbullying Entries on Informal School Websites Based on Category Relevance Maximization
We propose a novel method to detect cyberbullying entries on the Internet. "Cyberbullying" is defined as humiliating and slandering behavior towards other people through Internet services such as BBS, Twitter, or e-mail. In Japan, members of the Parent-Teacher Association (PTA) perform manual Web site monitoring called "net-patrol" to stop such activities; unfortunately, reading through the whole Web manually is an uphill task. We propose a method for automatic detection of cyberbullying entries: we first use seed words from three categories to calculate a PMI-IR semantic orientation score, and then maximize the relevance of the categories. In the experiment we checked cases where the test data contains 50% (laboratory condition) and 12% (real-world condition) of cyberbullying entries; in both cases the proposed method outperformed the baseline settings.
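The PMI-IR semantic orientation score mentioned above can be illustrated with toy co-occurrence counts. The counts and seed categories below are made up for illustration; the paper derives its statistics differently, and this is only a sketch of the general PMI-IR idea:

```python
import math

def pmi(n_both, n_word, n_seed, n_total):
    """Pointwise mutual information estimated from raw counts
    (counts here stand in for the search hits used in PMI-IR)."""
    return math.log2((n_both * n_total) / (n_word * n_seed))

def semantic_orientation(n_word, n_with_harmful, n_harmful,
                         n_with_harmless, n_harmless, n_total):
    """SO(w): how much more strongly word w associates with harmful
    seed words than with harmless ones."""
    return (pmi(n_with_harmful, n_word, n_harmful, n_total)
            - pmi(n_with_harmless, n_word, n_harmless, n_total))

# Toy counts: out of 1000 documents, the word appears 50 times,
# co-occurring 20 times with harmful seeds but only twice with
# harmless ones.
so = semantic_orientation(50, 20, 40, 2, 40, 1000)
print(so > 0)  # a positive score flags the word as likely harmful
```

A strongly positive score means the word keeps company with harmful vocabulary far more often than chance would predict, which is the signal the category-relevance step then builds on.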
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Cyberbullying is bullying carried out through digital technology. It can take place on social media, messaging and gaming platforms, and mobile phones, and it affects teenagers as well as adults, leading to problems such as depression, mental health issues, and even suicide. Regulating content on social media platforms has therefore become a growing need, and detecting cyberbullying at an early stage can help alleviate its impact on victims. In this research we use data from two different sources: tweets from Twitter and comments involving personal attacks from Wikipedia forums. The approach is to build a model for detecting cyberbullying in text data using Natural Language Processing and machine learning, targeting accuracy of up to 90% on tweets and up to 80% on Wikipedia forums. Detection is framed as a binary classification problem over two major forms of cyberbullying, hate speech on Twitter and personal attacks on Wikipedia, classifying each text as containing cyberbullying or not. The end result of this proposed work may reduce negative impacts on victims through early detection.
Cyber Bullying Text Detection Using Machine Learning
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
With the rise of the Internet, the use of social media has exploded, and it has become the most powerful networking platform of the century. However, increased social networking often has negative consequences for society, contributing to unwanted phenomena such as online abuse, harassment, cyberbullying, cybercrime, and trolling. Cyberbullying causes severe mental and physical distress in many people, particularly women and children, and can even lead to suicide; its damaging social impact is attracting growing attention. Many incidents of online harassment, such as sharing private chats, spreading rumours, and making sexual remarks, have recently occurred around the world. As a result, researchers are paying closer attention to detecting bullying in texts and messages on social media. By combining natural language processing and machine learning, the aim of this study is to construct an effective method for detecting abusive and bullying texts online. The accuracy of six different machine learning techniques is evaluated using several features, in particular a count vectorizer.
Cyberbullying Detection using Natural Language Processing
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Around the world, the use of the Internet and social media has increased exponentially, and they have become an integral part of daily life, allowing people to share their thoughts, feelings, and ideas with their loved ones. But as social networking sites become more popular, cyberbullying, the use of technology as a medium to bully someone, is on the rise. The Internet can be a source of abusive and harmful content, and social networking sites provide a convenient medium for harassment; the young people who use these sites are vulnerable to attacks. Bullying can have long-term effects on adolescents' ability to socialize and build lasting friendships, and victims of cyberbullying often feel humiliated. Social media users can often hide their identity, which makes it easier to misuse the available features. The use of offensive language, text containing abusive conduct intended to hurt others, has become one of the most prominent issues on social networks. Cyberbullying frequently leads to serious mental and physical distress, particularly for women and children, and sometimes drives them to suicide. The purpose of this project is to develop an effective technique to detect and prevent cyberbullying on social networking sites using Natural Language Processing and other machine learning algorithms. The dataset used for this project was collected from Kaggle; it contains Twitter data that is labeled to train the algorithms. Several classifiers are trained to recognize bullying actions. Evaluation of the proposed model on the cyberbullying dataset shows that Logistic Regression performs better and achieves higher accuracy than the SVM, Random Forest, Naive Bayes, and XGBoost algorithms.
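Logistic regression, which this abstract reports as the best performer, can be sketched from scratch on toy feature vectors. The features, data, and training loop below are illustrative only, not the Kaggle Twitter dataset or the authors' pipeline:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical features, e.g. [count of offensive tokens, length / 10].
X = [[0, 1.2], [0, 0.8], [3, 1.0], [4, 1.5]]
y = [0, 0, 1, 1]   # 1 = bullying
w, b = train_logreg(X, y)
predict = lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5
print([predict(x) for x in X])  # [False, False, True, True]
```

In practice one would use a library implementation (e.g. scikit-learn) with regularization and a proper train/test split, but the decision rule, a sigmoid over a weighted feature sum, is exactly this.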