Detection of child exploiting chats from a mixed chat dataset as a text classification task

A Learning-Based Approach for the Identification of Sexual Predators in Chat Logs

2012

The existence of sexual predators who enter chat rooms or forums and try to convince children to provide sexual favours is a socially worrying issue. Manually monitoring these interactions is one way to attack this problem. However, this manual approach simply cannot keep pace because of the high number of conversations and the huge number of chat rooms or forums where these conversations take place daily. We need tools that automatically process massive amounts of conversations and raise alerts about possible offenses. The sexual predator identification challenge within PAN 2012 is a valuable way to promote research in this area. Our team approached this task as a Machine Learning problem and designed several innovative sets of features that guide the construction of classifiers for identifying sexual predation. Our methods are driven by psycholinguistic, chat-based, and tf/idf features and yield very effective classifiers.
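As a rough illustration of the tf/idf component mentioned above, the sketch below computes tf-idf weights per chatter, treating all lines written by one chatter as a single document. It is a minimal standard-library sketch under our own assumptions (lowercase whitespace tokenization, raw counts as tf, idf = log(N/df)); it is not the authors' feature extractor, and the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Illustrative assumptions: lowercase whitespace tokenization,
    raw term counts as tf, and unsmoothed idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors

# Each string stands for all messages written by one (hypothetical) chatter.
chatters = [
    "hi how old are you",
    "hi are you home alone",
    "hi did you finish the homework",
]
for vec in tfidf_vectors(chatters):
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur in every chatter's text, such as "hi" here, receive a weight of zero, which is the intended effect of the idf factor.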

Automated Identification of Child Abuse in Chat Rooms by Using Data Mining

Data Mining Trends and Applications in Criminal Science and Investigations, 2000

Providing a safe environment for juveniles and children in online social networks is considered one of the major factors in improving public safety. Due to the prevalence of online conversations, mitigating the undesirable effects of child abuse in cyberspace has become essential. Combating this kind of crime automatically is challenging and demands efficient and scalable data mining techniques. The problem can be cast as a combination of textual preprocessing in data/text mining and pattern classification in machine learning. This chapter covers different data mining methods, including preprocessing, feature extraction, and popular ways of enriching features by extracting sentiment and emotional features. A brief tutorial on classification algorithms in the domain of automated predator identification is also presented. Finally, the chapter summarizes the discussion and outlines the challenges and open issues in this application domain.
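The textual preprocessing stage described above can be sketched as a small normalization function. The specific steps (lowercasing, collapsing characters repeated three or more times, stripping non-alphanumeric symbols, dropping a tiny stop-word list) are our assumptions about a typical chat pipeline, not the chapter's prescribed procedure.

```python
import re

# A deliberately tiny, illustrative stop-word list (a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of"}

def preprocess(message):
    """Normalize one chat message into a list of tokens."""
    text = message.lower()
    # Collapse runs of 3+ identical characters ("soooo" -> "soo"), common in chat.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess("Heyyyy!!! Are you at the park??"))
# → ['heyy', 'you', 'at', 'park']
```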

Child Predator Detection in Online Chat Conversation using Support Vector Machine

2021

The increase in Internet use and easier access to social media platforms have helped predators establish online relationships with children, which has led to an increase in online solicitation. We propose a system that detects predators in online chats using text classification. In this paper, a machine learning algorithm, the support vector machine (SVM), is used to identify cyber predators. The main objective of our system is to detect child predators based on the chats, comments and posts of social media accounts and to send predator records to the cyber cell administrator; the PAN12 dataset is used for text classification. This paper presents our current progress toward building the child predator detection system using SVM text classification.
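To make the SVM step concrete, the following sketch trains a linear SVM on bag-of-words features using the basic Pegasos stochastic sub-gradient method (without a bias term or the optional projection step). Pegasos is our substitution for illustration; the paper does not say which SVM solver it uses, and the toy messages below are invented, not drawn from PAN12.

```python
import random
from collections import Counter

def featurize(text):
    """Bag-of-words term counts over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())

def train_pegasos(samples, labels, lam=0.01, epochs=200, seed=0):
    """Linear SVM without a bias term, trained with Pegasos sub-gradient steps.

    labels must be +1 (predator) or -1 (non-predator).
    """
    rng = random.Random(seed)
    data = list(zip(samples, labels))
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            # Regularization step: shrink every weight by (1 - eta * lam).
            for term in w:
                w[term] *= 1.0 - eta * lam
            # Hinge-loss step, applied only when the margin is violated.
            if margin < 1:
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w

def predict(w, text):
    return 1 if dot(w, featurize(text)) > 0 else -1

# Invented toy messages standing in for labeled chat segments.
train = [
    ("are you alone at home", 1),
    ("send me a photo of you", 1),
    ("dont tell your parents about us", 1),
    ("did you finish the math homework", -1),
    ("the game starts at five", -1),
    ("see you at practice tomorrow", -1),
]
w = train_pegasos([featurize(text) for text, _ in train], [y for _, y in train])
print(predict(w, "send me a photo"), predict(w, "math homework tomorrow"))
```

In practice a library solver (for example a linear SVM with tf-idf inputs) would replace this hand-written loop; the sketch only shows the hinge-loss and regularization updates that such a solver optimizes.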

Sexual predator detection in chats with chained classifiers

This paper describes a novel approach for sexual predator detection in chat conversations based on sequences of classifiers. The proposed approach divides documents into three parts, which, we hypothesize, correspond to the different stages that a predator employs when approaching a child. Local classifiers are trained for each part of the documents and their outputs are combined by a chain strategy: predictions of a local classifier are used as extra inputs for the next local classifier. Additionally, we propose a ring-based strategy, in which the chaining process is iterated several times, with the goal of further improving the performance of our method. We report experimental results on the corpus used in the first international competition on sexual predator identification (PAN'12). Experimental results show that the proposed method outperforms a standard (global) classification technique for the different settings we consider; moreover, the proposed method compares favorably with most methods evaluated in the PAN'12 competition.
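The chain and ring strategies described above can be sketched as follows. Each document is assumed to be already split into three parts with one numeric feature vector per part; the local classifier is a plain logistic regression we chose for brevity, the 0.5 starting input is our assumption, and the feature values are invented. The paper's actual local classifiers and features may differ.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j, xij in enumerate(xi):
                gw[j] += err * xij
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(a * c for a, c in zip(w, x)) + b)

def train_chain(docs, y, rounds=2):
    """Train a ring of chained local classifiers.

    docs[i] holds one feature vector per document part (here, three parts).
    Each local classifier sees its part's features plus the probability
    produced by the previous classifier; the first classifier of the first
    round starts from an uninformative 0.5 (our assumption). With rounds > 1,
    the last classifier's output feeds the first classifier again.
    """
    prev = [0.5] * len(docs)
    ring = []
    for _ in range(rounds):
        chain = []
        for k in range(len(docs[0])):
            X = [doc[k] + [p] for doc, p in zip(docs, prev)]
            model = train_logreg(X, y)
            prev = [predict_proba(model, x) for x in X]
            chain.append(model)
        ring.append(chain)
    return ring

def predict_chain(ring, doc):
    """Run a document through every round of the ring; return the final probability."""
    p = 0.5
    for chain in ring:
        for k, model in enumerate(chain):
            p = predict_proba(model, doc[k] + [p])
    return p

# Hypothetical 2-dimensional features for the beginning, middle and end of each chat.
docs = [
    [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9]],  # predatory
    [[0.8, 0.2], [0.7, 0.8], [0.3, 0.8]],  # predatory
    [[0.1, 0.0], [0.2, 0.1], [0.1, 0.0]],  # non-predatory
    [[0.2, 0.1], [0.1, 0.0], [0.2, 0.1]],  # non-predatory
]
labels = [1, 1, 0, 0]
ring = train_chain(docs, labels, rounds=2)
print([round(predict_chain(ring, d), 2) for d in docs])
```

For simplicity the chained inputs are produced on the training data itself; a more careful version would generate them with cross-validated predictions to avoid overly optimistic extra inputs.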

Advanced Data Preprocessing for Detecting Cybercrime in Text-Based Online Interactions

Pattern Recognition and Artificial Intelligence, 2020

Social media provides a powerful platform for individuals to communicate globally. This capability has many benefits, but it can also be used by malevolent individuals, i.e., predators. Anonymity exacerbates the problem. The motivation of our work is to help protect our children from this potentially hostile environment, without excluding them from utilizing its benefits. In our research, we aim to develop an online sexual predator identification system, designed to detect cybercrime related to child grooming. We will use AI techniques to analyze chat interactions available from different social networks. However, before any meaningful analysis can be carried out, chats must be preprocessed into a consistent and suitable format. This task poses challenges in itself. In this paper we show how different and diverse chat formats can be automatically normalized into a consistent text-based format that can be subsequently used for analysis.
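The normalization idea can be illustrated with a few regular expressions that map differently formatted chat lines onto a single `speaker: text` form. The three input formats below are invented for illustration; they are not the formats handled in the paper.

```python
import re

# Hypothetical source formats, each with named groups "speaker" and "text".
PATTERNS = [
    # "[12:01] alice: hello"
    re.compile(r"^\[\d{1,2}:\d{2}\]\s*(?P<speaker>[^:]+):\s*(?P<text>.*)$"),
    # "<bob> hello"  (IRC style)
    re.compile(r"^<(?P<speaker>[^>]+)>\s*(?P<text>.*)$"),
    # "carol says: hello"
    re.compile(r"^(?P<speaker>\S+) says:\s*(?P<text>.*)$"),
]

def normalize_line(line):
    """Return 'speaker: text', or None if no known format matches."""
    line = line.strip()
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return f"{m.group('speaker').strip()}: {m.group('text').strip()}"
    return None

raw = ["[12:01] alice: hi there", "<bob> hey", "carol says: whats up", "???"]
print([normalize_line(line) for line in raw])
# → ['alice: hi there', 'bob: hey', 'carol: whats up', None]
```

Returning `None` for unrecognized lines lets a caller log or inspect them instead of silently mis-parsing a new format.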

Combining Psycho-linguistic, Content-based and Chat-based Features to Detect Predation in Chatrooms

The Digital Age has brought great benefits for the human race but also some drawbacks. Nowadays, people from opposite corners of the world can communicate online via instant messaging services. Unfortunately, this has introduced new kinds of crime. Sexual predators have adapted their predatory strategies to these platforms, and the target victims are usually kids. The authorities cannot manually track all threats because massive amounts of online conversations take place on a daily basis. Automatic methods for raising alerts about these crimes need to be designed. This is the main motivation of this paper, where we present a Machine Learning approach to identify suspicious subjects in chat rooms. We propose novel types of features for representing the chatters and we evaluate different classifiers against the largest benchmark available. This empirical validation shows that our approach is promising for the identification of predatory behaviour. Furthermore, we carefully analyse the characteristics of the learnt classifiers. This preliminary analysis is a first step towards profiling the behaviour of sexual predators when chatting on the Internet.
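Chat-based features describe how a user chats rather than what they write. The sketch below computes a few such statistics per chatter from (conversation id, author, text) triples. The particular features (number of conversations, number of messages, average message length, share of messages ending in a question mark, and the chatter's share of lines in their conversations) are our illustrative assumptions, not the paper's feature set.

```python
from collections import defaultdict

def chat_features(messages):
    """Compute illustrative chat-based features per author.

    messages: list of (conversation_id, author, text) triples.
    Returns {author: {feature_name: value}}.
    """
    by_author = defaultdict(list)
    conv_sizes = defaultdict(int)
    for conv, author, text in messages:
        by_author[author].append((conv, text))
        conv_sizes[conv] += 1
    features = {}
    for author, msgs in by_author.items():
        convs = {conv for conv, _ in msgs}
        n = len(msgs)
        features[author] = {
            "n_conversations": len(convs),
            "n_messages": n,
            "avg_length": sum(len(text) for _, text in msgs) / n,
            "question_ratio": sum(text.rstrip().endswith("?") for _, text in msgs) / n,
            # Share of all lines in this author's conversations written by the author.
            "message_share": n / sum(conv_sizes[c] for c in convs),
        }
    return features

log = [
    ("c1", "u1", "hi"),
    ("c1", "u2", "hello?"),
    ("c1", "u1", "how old are you?"),
    ("c2", "u1", "hey"),
]
for author, feats in chat_features(log).items():
    print(author, feats)
```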

Cyberbully Detection Using Term Weighting Scheme and Naïve Bayes Classifier

International Journal of Innovative Computing

The Internet, especially social media, has become a major platform where people interact with each other. Thanks to advances in technology, we are able to interact regardless of time and place. Unfortunately, not all of these interactions are good or positive. One negative interaction that can happen online is cyberbullying, which has increased rapidly over the years, whether through social media, email or texting. It is therefore important to prevent cyberbullying, which is why this research was carried out. Detecting the presence of cyberbullying is one of the main issues in preventing it. Cyberbullying detection can be challenging because of the many languages used in the world; moreover, slang and informal language are used most of the time, and special characters such as emoji also appear in online conversations. The aim of this research is to detect the presence of textual cyberbullying in online posts. Two term weighting s...
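Because the abstract is cut off before naming its two term weighting schemes, the sketch below does not reproduce them. It shows, under our own assumptions, how a term weighting function can feed a multinomial Naïve Bayes classifier with add-alpha (Laplace) smoothing: the classifier accumulates per-class term weights instead of raw counts. The `raw_count` and `binary` weighting functions and the `WeightedNB` class are hypothetical examples.

```python
import math
from collections import Counter, defaultdict

def raw_count(tokens):
    """Weight = number of occurrences in the document."""
    return Counter(tokens)

def binary(tokens):
    """Weight = 1 if the term occurs in the document at all."""
    return {t: 1 for t in set(tokens)}

class WeightedNB:
    """Multinomial Naive Bayes over per-document term weights, with add-alpha smoothing."""

    def __init__(self, weighting=raw_count, alpha=1.0):
        self.weighting = weighting
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_weights = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            weights = self.weighting(text.lower().split())
            self.term_weights[label].update(weights)
            self.vocab.update(weights)
        self.n_docs = len(labels)
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text)."""
        weights = self.term_weights[label]
        total = sum(weights.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for token in text.lower().split():
            if token in self.vocab:  # terms never seen in training are ignored
                score += math.log((weights[token] + self.alpha) / total)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

texts = ["you are so stupid", "nobody likes you loser",
         "great game today", "thanks for the help"]
labels = ["bully", "bully", "normal", "normal"]
nb = WeightedNB(weighting=binary).fit(texts, labels)
print(nb.predict("you loser"), nb.predict("great game"))
```

Swapping `binary` for `raw_count` (or any other function returning a term-to-weight mapping) is how two weighting schemes could be compared with the same classifier.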

ChatBot Detection using Text Classification

The application's main aim is to detect chatbots in a given chat. To this end, the patterns observed in chatbots were studied, and this research helped determine which patterns should be monitored to implement the chatbot detection features. It was found that many of the features, such as the time-based and size-based methods, can be implemented passively. Other features, like the text classifier, can be implemented by training a model. Various graphical tools can be implemented to analyze the efficiency of the system and to help end users make decisions. The sentiment of the text will also be analyzed to gain insight and to classify the texts. The application can be used by chat room administrators and websites to detect whether any chatbots are present on their sites, helping them save resources that chatbots would otherwise consume. E-commerce websites, for example, can prevent their resources from being used up by chatbots by detecting them and directing those resources to real clients.
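The passive time-based and size-based checks mentioned above could, for example, measure how regular a sender's inter-message gaps are and how little their message lengths vary, since scripted bots often post at near-constant intervals. The coefficient-of-variation heuristic, the 0.1 thresholds, and the function name `looks_like_bot` are hypothetical; the application's actual rules are not given.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 when the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def looks_like_bot(timestamps, messages, max_gap_cv=0.1, max_len_cv=0.1):
    """Flag a sender whose timing AND message sizes are suspiciously uniform.

    timestamps: sorted send times in seconds; messages: the corresponding texts.
    At least three messages are needed to measure gap variability.
    """
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    lengths = [len(m) for m in messages]
    return (coefficient_of_variation(gaps) <= max_gap_cv
            and coefficient_of_variation(lengths) <= max_len_cv)

print(looks_like_bot([0, 10, 20, 30], ["Buy now at shop!"] * 4))
print(looks_like_bot([0, 3, 40, 45], ["hi", "how are you doing today", "lol", "brb"]))
```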

IRJET- Cyber Bullying Detection in Web Chat Application

IRJET, 2020

Today, the influence of social media is growing steadily, and cyberbullying has emerged as a critical problem afflicting children, youngsters and teenagers. Machine learning techniques make the automatic detection of bullying messages in social media possible, and this may help build a healthy and safe social media environment. One critical issue in this important research area is the reliable and discriminative numerical representation of text messages. Here, cyberbullying is harassment or bullying carried out through digital devices such as computers, laptops, smartphones and tablets. Cyberbullying can be defined as belligerent, intentional actions performed by an individual or a group of people via digital communication methods, such as sending messages and posting comments against a victim. This paper presents a review; many companies could use this application for chatting, email notifications, group chats and meetings. The web application is helpful for people who work in companies because many people use abusive words in chats or in comments on posts: when that happens, the application does not show the abusive word but only related symbols such as asterisks or small square boxes. As a result, when a person chats with another person or comments on a post, they cannot use abusive words.
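The masking behaviour described above (showing symbols in place of an abusive word) can be sketched with a word-boundary regular expression. The word list is a placeholder, and matching whole words case-insensitively is our assumption, not the application's documented behaviour.

```python
import re

# Placeholder list; a deployed system would load a curated lexicon.
ABUSIVE_WORDS = ["idiot", "stupid", "loser"]

# \b ... \b restricts matches to whole words, so "stupidity" is left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABUSIVE_WORDS)) + r")\b", re.IGNORECASE
)

def mask_abuse(message):
    """Replace every listed word with asterisks of the same length."""
    return _PATTERN.sub(lambda m: "*" * len(m.group(0)), message)

print(mask_abuse("You are such an IDIOT, loser!"))
# → You are such an *****, *****!
```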

Natural Language Processing and Naïve Bayes Classifier Algorithm to Automate the Detection of Cyberbullying

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2023

The impact of social media on contemporary culture has been unprecedented, making it the most significant medium of our times. While it has had a positive effect on people's worldview, social media has also been linked to a rise in undesirable phenomena such as cyberbullying, cyberstalking, and cybercrime. Cyberbullying, in particular, can harm individuals' mental health and has even been identified as the root cause of mental health issues in some cases. The proliferation of sexually explicit comments and the spread of rumors by multiple individuals are among the negative influences observed in the social media ecosystem. In recent years, academics have become increasingly concerned about the indicators of online harassment. Our goal is to develop a system that can detect instances of online abuse using Natural Language Processing (NLP) and Naïve Bayes, among other techniques. Cultural norms have shifted dramatically due to the rapid spread of the COVID-19 virus, resulting in a rise in cyberbullying, especially among adolescents. The younger generation is more likely to engage in this practice, which has become more widespread with the stratospheric rise in popularity of various online engagement-promoting platforms. The COVID-19 pandemic has changed the way people interact online, and as more people began working from home, bullying became a more significant concern. Our proposed system includes modules for data cleansing, text mining, word embedding, and regression analysis, among others. We use lemmatization for text mining, which improves the model's precision. We also use VADER for feature extraction, which generates word vectors that are distributed numerical representations of word attributes; this would help in creating vectors that connect words with similar meanings. Additionally, Naïve Bayes is used for data categorization to prevent overfitting in the proposed model.
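VADER itself is a lexicon- and rule-based sentiment analyzer distributed as a Python package, and the sketch below does not use it. Instead it mimics only the general idea with a tiny hypothetical lexicon: word valences are summed and the total is squashed into the range (-1, 1) with x / sqrt(x² + alpha), the normalization VADER applies to its compound score (alpha = 15 by default). The lexicon values and the `compound_score` name are our illustrative assumptions; such a score could then serve as an extra per-message feature.

```python
import math

# Tiny hypothetical valence lexicon (VADER ships a much larger, human-rated one).
LEXICON = {"love": 3.2, "great": 3.1, "ugly": -2.5, "hate": -2.7, "stupid": -2.4}

def compound_score(text, alpha=15.0):
    """Sum word valences, then squash the total into (-1, 1)."""
    total = sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())
    return total / math.sqrt(total * total + alpha)

for msg in ["you are ugly and stupid", "i love this great game", "see you later"]:
    print(msg, round(compound_score(msg), 3))
```

Unlike this sketch, VADER also applies rules for negation, intensifiers, capitalization and punctuation before normalizing.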