User Content – Analysis and Classification
Related papers
International Journal of Computer Science and Engineering, 2024
Trolls and abusive users tend to infiltrate online communities and spoil the healthy interactions that members could otherwise have, overwhelming other members in the virtual space. Our work therefore aims to develop models for the automatic detection and classification of toxic comments. The study is executed in four steps. The first step is data preparation, in which the data is loaded and preprocessed. The second step is Exploratory Data Analysis (EDA), where we describe the toxic labels in the data and how they vary. The text is then standardized with preprocessing techniques such as lowercasing and punctuation removal before model training. For model training, logistic regression and Naive Bayes models are used to label each category of the toxicity classifier. More than 96% accuracy was achieved across the categories: 96.9% for toxic comments, 97.2% for severe toxicity, 97.7% for obscenity, 98.9% for threats, 97.1% for insults, and 96.9% for identity hate. The models were also fast and robust: the whole run took only 2 minutes and 58.24 seconds, an indication of their effectiveness and scalability.
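As a rough illustration of the pipeline this abstract describes, the sketch below trains one binary logistic regression classifier per toxicity label on TF-IDF features after lowercasing and punctuation removal. The file name train.csv and the Jigsaw-style column names (comment_text plus six binary label columns) are assumptions for illustration; a MultinomialNB classifier could be swapped in for the Naive Bayes variant.

```python
# A minimal sketch of the per-label pipeline described above, assuming the
# Jigsaw-style six-label layout (a 'comment_text' column plus one binary column
# per label) and a hypothetical local file 'train.csv'.
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def clean(text: str) -> str:
    """Lowercase the text and strip punctuation, as in the preprocessing step."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

df = pd.read_csv("train.csv")                       # hypothetical path
df["comment_text"] = df["comment_text"].astype(str).map(clean)

X_train, X_test, y_train, y_test = train_test_split(
    df["comment_text"], df[LABELS], test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(max_features=50_000)
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

# One binary classifier per toxicity category, as in the abstract.
for label in LABELS:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(Xtr, y_train[label])
    print(f"{label}: accuracy {accuracy_score(y_test[label], clf.predict(Xte)):.3f}")
```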
Detection and Classification of Online Toxic Comments
2021
In the current century, social media has created many job opportunities and has become a unique place for people to freely express their opinions. But, as every coin has two sides, along with the pros social media has many cons. A small group of users takes advantage of this system and misuses the opportunity to express a toxic mindset (i.e., insulting, verbal sexual harassment, foul behavior, etc.), and cyberbullying has therefore become a major problem. If we can filter out the hurtful, toxic words expressed on social media platforms like Twitter, Instagram, and Facebook, the online world will become a safer and more harmonious place. We gained initial ideas for this design by researching current toxic comment classifiers, and then used what we found to make the most user-friendly product possible. For this project, we created a Toxic Comments Classifier which classifies comments depending on the category of t...
IJERT-Multilabel Toxic Comment Detection and Classification
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/multilabel-toxic-comment-detection-and-classification https://www.ijert.org/research/multilabel-toxic-comment-detection-and-classification-IJERTV10IS050012.pdf Toxic comments are hateful online comments classified as disrespectful or abusive towards an individual or a community. With the boom of the internet, many users have been drawn to online social discussion platforms. These platforms were created to exchange ideas, learn new things, and have meaningful conversations. But because of toxic comments, many users are unable to put forward their points in online discussions, which degrades the quality of discussion. In this paper we check the toxicity of a comment and, if the comment is toxic, classify it into different categories to examine the type of toxicity. We utilize different machine learning and deep learning algorithms on our dataset and select the best algorithms based on our evaluation methodology. Moving forward, we seek to attain high performance through our machine learning and deep learning models, which will help limit the toxicity present on various discussion sites.
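The "select the best algorithms based on our evaluation methodology" step could look roughly like the sketch below: several candidate classifiers scored on the same TF-IDF features with cross-validated ROC AUC. The candidate set, the toy data, and the metric are assumptions, not the paper's actual protocol.

```python
# A rough sketch of model selection: candidate classifiers compared on the same
# TF-IDF features with cross-validated ROC AUC. Candidates, toy data, and the
# metric are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for a real labelled corpus (1 = toxic, 0 = non-toxic).
texts = ["you are awful", "great point, thanks", "nobody asked you", "well argued reply"]
labels = [1, 0, 1, 0]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}

for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=2, scoring="roc_auc")
    print(f"{name}: mean ROC AUC {scores.mean():.3f}")
```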
An Intense Study of Machine Learning Research Approach to Identify Toxic Comments
International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2022
A large number of online public-domain comments are constructive, but a significant proportion are toxic. Raw comments contain many irregularities, so the dataset must be processed through a variety of cleaning tasks before the raw comments are fed to classification models trained with an ML method. In this study, we propose the classification of toxic comments using an ML approach on a multilingual toxic comment dataset. The logistic regression method is applied to the processed dataset to distinguish toxic comments from non-toxic comments. The multi-headed model estimates toxicity (obscene, insult, severe toxic, threat, and identity hate) versus non-toxicity. We implemented four models (LSTM, GRU, RNN, and BiLSTM) and detected the toxic comments. In Python 3, all models have a simple structure that can be adapted to other tasks. The classification results are presented with the aid of the proposed models. We conclude that all models solve the challenge effectively, but the BiLSTM is the most effective at ensuring the best practicable accuracy.
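A minimal Keras sketch of the best-performing variant mentioned above, a BiLSTM with a sigmoid output unit per toxicity label, is shown below; the vocabulary size, sequence length, and layer widths are illustrative assumptions rather than the paper's configuration.

```python
# Hedged Keras sketch of a BiLSTM with a sigmoid "multi-headed" output, one unit
# per toxicity label. Vocabulary size, sequence length, and layer widths are
# illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000
MAX_LEN = 200
NUM_LABELS = 6   # toxic, severe toxic, obscene, threat, insult, identity hate

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_LABELS, activation="sigmoid"),   # independent per-label probabilities
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",             # one binary loss per output head
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```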
Toxic Comment Classification Using Neural Networks and Machine Learning
International Advanced Research Journal in Science, Engineering and Technology, 2018
A cornucopia of data is generated through online conversations and human interactions. This has contributed considerably to the quality of human life, but it also brings serious dangers, as online text communications with high toxicity cause personal attacks, online provocation, and harassing practices. This has mobilized both industry and the research community over the last few years, with several attempts to identify an efficient model for online toxic comment classification and prediction. However, these efforts are still in their infancy, and new approaches and frameworks are required. In parallel, the constant explosion of data makes the development of new machine learning tools for managing this information a basic need. Thankfully, advances in big data management, hardware, and cloud computing allow the development of Deep Learning approaches that have shown very encouraging performance so far. Recently, Convolutional Neural Networks and Recurrent Neural Networks have been applied to text classification. In this work, we use this approach to find toxic comments in an extensive pool of records provided by a recent Kaggle competition on Wikipedia's talk page edits, which divides the level of toxicity into 6 labels: toxicity, severe toxicity, obscenity, threat, insult, and identity hate.
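A minimal Keras sketch of the CNN approach for the six Kaggle labels might look as follows; the filter count, kernel size, and embedding dimension are assumptions for illustration, not the paper's reported configuration.

```python
# A minimal Keras sketch of a text CNN for the six Kaggle toxicity labels.
# Filter count, kernel size, and embedding dimension are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000
MAX_LEN = 200
NUM_LABELS = 6   # toxicity, severe toxicity, obscenity, threat, insult, identity hate

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 100),
    layers.Conv1D(128, kernel_size=5, activation="relu"),   # n-gram-like convolution filters
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_LABELS, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```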
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Conversational toxicity is a problem that can drive people to stop truly expressing themselves and seeking out other people's opinions for fear of being attacked or harassed. The purpose of this research is to employ natural language processing (NLP) techniques to detect toxicity in writing, which could be used to alert people before they transmit potentially toxic messages. NLP is a branch of machine learning and artificial intelligence that enables computers to understand and interpret natural language rather than simply reading it: understanding, analysing, manipulating, and potentially generating human language with the help of a machine. Using NLP, machines can understand written or spoken text and perform tasks such as speech recognition, sentiment analysis, text classification, and automatic text summarization.
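The "alert people before transmitting" idea could be wired up roughly as below: a toxicity classifier (here a toy TF-IDF plus logistic regression stand-in) wrapped in a hypothetical warn_before_send check with an assumed probability threshold.

```python
# Sketch of the "alert before sending" idea: a toxicity classifier (here a toy
# TF-IDF + logistic regression stand-in) wrapped in a hypothetical
# warn_before_send check with an assumed probability threshold.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data stands in for a real training corpus (1 = toxic, 0 = non-toxic).
texts = ["you are pathetic", "thanks for the help", "shut up idiot", "nice explanation"]
labels = [1, 0, 1, 0]

toxicity_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
toxicity_model.fit(texts, labels)

def warn_before_send(message: str, threshold: float = 0.7) -> bool:
    """Return True if the draft message should trigger a toxicity warning."""
    prob_toxic = toxicity_model.predict_proba([message])[0][1]
    return prob_toxic >= threshold

print(warn_before_send("you are pathetic and useless"))
```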
A Semi-Supervised Approach to Detect Toxic Comments
Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications
Toxic comments contain forms of non-acceptable language targeted towards groups or individuals. These comments have become a serious concern for government organizations, online communities, and social media platforms. Although there are some approaches to handling non-acceptable language, most of them focus on supervised learning and the English language. In this paper, we treat toxic comment detection as a semi-supervised strategy over a heterogeneous graph. We evaluate the approach on a toxic dataset in Portuguese, outperforming several graph-based methods and achieving competitive results compared to transformer architectures.
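As a simplified stand-in for the semi-supervised idea, the sketch below uses scikit-learn's LabelSpreading over a k-NN graph of TF-IDF vectors rather than the paper's heterogeneous graph; the toy comments and graph settings are assumptions.

```python
# A simplified stand-in for the semi-supervised approach: scikit-learn's
# LabelSpreading over a k-NN graph of TF-IDF vectors, rather than the paper's
# heterogeneous graph. Unlabelled comments are marked with -1.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelSpreading

comments = [
    "you are a disgrace and nobody wants you here",   # labelled toxic
    "thanks, that was a helpful and great summary",   # labelled non-toxic
    "nobody wants you here",                          # unlabelled
    "great summary, thanks",                          # unlabelled
]
labels = np.array([1, 0, -1, -1])   # -1 marks unlabelled examples

X = TfidfVectorizer().fit_transform(comments).toarray()

model = LabelSpreading(kernel="knn", n_neighbors=2)
model.fit(X, labels)
print(model.transduction_)   # inferred labels for every comment, including unlabelled ones
```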
Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification
ArXiv, 2020
This paper tackles one of the greatest limitations in Machine Learning: Data Scarcity. Specifically, we explore whether high-accuracy classifiers can be built from small datasets, using a combination of data augmentation techniques and machine learning algorithms. We experiment with Easy Data Augmentation (EDA) and backtranslation, as well as with three popular learning algorithms: Logistic Regression, Support Vector Machine (SVM), and Bidirectional Long Short-Term Memory Network (Bi-LSTM). For our experimentation, we utilize the Wikipedia Toxic Comments dataset, so that in the process of exploring the benefits of data augmentation we can develop a model to detect and classify toxic speech in comments and help fight back against cyberbullying and online harassment. Ultimately, we found that data augmentation techniques can significantly boost the performance of classifiers and are an excellent strategy to combat the lack of data in NLP problems.
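For reference, two of the four Easy Data Augmentation operations (random swap and random deletion) can be sketched in a few lines, as below; synonym replacement and random insertion are omitted because they require a thesaurus such as WordNet.

```python
# A minimal sketch of two of the four Easy Data Augmentation (EDA) operations:
# random swap and random deletion. The probabilities and number of variants are
# illustrative defaults.
import random

def random_swap(words, n=1):
    """Swap two randomly chosen word positions n times."""
    words = words.copy()
    for _ in range(n):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def eda_augment(sentence, num_aug=4):
    """Produce num_aug augmented variants of a sentence."""
    words = sentence.split()
    return [" ".join(random.choice([random_swap, random_deletion])(words))
            for _ in range(num_aug)]

print(eda_augment("this comment is rude and unnecessary"))
```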
IJERT-Audio and Video Toxic Comments Detection and Classification
International Journal of Engineering Research and Technology (IJERT), 2020
https://www.ijert.org/audio-and-video-toxic-comments-detection-and-classification https://www.ijert.org/research/audio-and-video-toxic-comments-detection-and-classification-IJERTV9IS120099.pdf Social networking and online conversation platforms give us the power to share our views and ideas. However, many people now take these platforms for granted and see them as an opportunity to harass and target others, leading to cyber-attacks and cyber-bullying that cause traumatic experiences and, in extreme cases, suicide attempts. Manually identifying and classifying such comments is a long, tiresome, and unreliable process. To solve this challenge, we have developed a deep learning system that identifies such negative content on online discussion platforms and classifies it into the proper labels. Our proposed model applies a text-based Convolutional Neural Network (CNN) with word embeddings, using the fastText word embedding technique; fastText has shown more efficient and accurate results than the Word2Vec and GloVe models. Our model aims to improve the detection of different types of toxicity and thereby improve the social media experience. It classifies comments into six classes: Toxic, Severe Toxic, Obscene, Threat, Insult, and Identity-hate. Multi-label classification lets us provide an automated solution to the toxic comment problem we are facing.
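Wiring pre-trained fastText vectors into a frozen Keras Embedding layer, as this abstract describes, might look roughly like the sketch below; the local .vec file, the gensim loader, and the toy word index are assumptions for illustration.

```python
# Hedged sketch of loading fastText vectors and building a frozen Keras
# Embedding layer from them. The .vec file path and the toy word index are
# assumptions, not details from the paper.
import numpy as np
import tensorflow as tf
from gensim.models import KeyedVectors

EMB_DIM = 300

# fastText .vec files use the word2vec text format, so gensim can read them directly.
vectors = KeyedVectors.load_word2vec_format("crawl-300d-2M.vec")   # assumed local file

word_index = {"toxic": 1, "comment": 2, "platform": 3}   # stand-in for a real tokenizer's index
embedding_matrix = np.zeros((len(word_index) + 1, EMB_DIM))
for word, idx in word_index.items():
    if word in vectors:                                   # out-of-vocabulary rows stay zero
        embedding_matrix[idx] = vectors[word]

# Frozen embedding layer initialised with the fastText vectors; it would feed a
# Conv1D stack like the CNN sketched after the 2018 abstract above.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=EMB_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)
```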
Convolutional Neural Networks for Toxic Comment Classification
Proceedings of the 10th Hellenic Conference on Artificial Intelligence
A flood of information is produced on a daily basis through global internet usage arising from online interactive communication among users. While this contributes significantly to the quality of human life, it unfortunately involves enormous dangers, since online texts with high toxicity can cause personal attacks, online harassment, and bullying behaviour. This has engaged both industry and the research community in the last few years, and there have been several attempts to identify an efficient model for online toxic comment prediction. However, these steps are still in their infancy, and new approaches and frameworks are required. In parallel, the constant data explosion makes the construction of new machine learning tools for managing this information an imperative need. Thankfully, advances in hardware, cloud computing, and big data management allow the development of Deep Learning approaches that have shown very promising performance so far. For text classification in particular, the use of Convolutional Neural Networks (CNNs) has recently been proposed, approaching text analytics in a modern manner with emphasis on the structure of words in a document. In this work, we employ this approach to discover toxic comments in a large pool of documents provided by a recent Kaggle competition on Wikipedia's talk page edits. To justify this decision, we compare CNNs against the traditional bag-of-words approach for text analysis combined with a selection of algorithms proven to be very effective in text classification. The reported results provide enough evidence that CNNs enhance toxic comment classification, reinforcing research interest in this direction.
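The traditional bag-of-words baseline the CNN is compared against can be sketched as a simple count-vectorizer plus linear classifier pipeline; the toy data and the LinearSVC choice below are illustrative assumptions rather than the paper's exact algorithm selection.

```python
# A minimal sketch of a bag-of-words baseline: word/bigram counts fed to a
# linear classifier. Toy data and LinearSVC are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["you are worthless", "interesting edit, thanks",
         "get lost, nobody likes you", "good catch on that typo"]
labels = [1, 0, 1, 0]   # 1 = toxic, 0 = non-toxic

baseline = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
baseline.fit(texts, labels)
print(baseline.predict(["nobody likes your edits"]))
```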