Hate Speech Hashtag Classification on Twitter Using the Hybrid Classifier Method
Related papers
Hate Speech Hashtag Classification Using Hybrid Artificial Neural Network (ANN) Method
JURIKOM (Jurnal Riset Komputer), 2022
Twitter is a social networking site that communities, forums, and individuals frequently use to gather information and discuss particular topics. Information spread on Twitter can be positive or negative. One form of negative information is hate speech expressed through hashtags. In this study, hate speech hashtags were classified with a hybrid Artificial Neural Network (ANN) method, with the aim of outperforming earlier methods such as KNN. The authors argue that Twitter's large volume of data favors hybrid learning and yields good accuracy. Using hybrid learning with 5-fold cross-validation, the highest accuracy was 79%, the lowest 73%, and the average 76%.
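The accuracy range above comes from 5-fold cross-validation. The sketch below shows how the highest, lowest, and average fold accuracies are obtained. It does not reproduce the hybrid ANN. The function names, the shuffling seed, and the majority-class stand-in model are hypothetical placeholders, and any classifier with the same `fit` shape could be plugged in.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and split it into k folds whose sizes differ by at most one."""
    if n < k:
        raise ValueError("need at least k examples")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(order[start:start + size])
        start += size
    return folds


def majority_baseline(train_x, train_y):
    """Hypothetical stand-in model: always predicts the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda _x: label


def cross_validate(xs, ys, fit, k=5, seed=0):
    """Train on k-1 folds and test on the held-out fold. Returns the per-fold
    accuracies and a (lowest, highest, average) summary."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies, (min(accuracies), max(accuracies), mean(accuracies))
```

With a real model, `fit` would train the classifier and return its prediction function. The summary tuple corresponds to the lowest, highest, and average accuracies the abstract reports.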
Hate Speech Classification Using SVM and Naive Bayes
2022
The spread of hatred, once limited to verbal communication, has rapidly moved onto the Internet. Social media and community forums, where people discuss and express their opinions, are becoming platforms for spreading hate messages. Many countries have enacted laws against online hate speech. These laws hold the companies that run social media platforms responsible for failing to remove it. But as online content continues to grow, so does the spread of hate speech. Manual analysis of hate speech on online platforms is infeasible because of the sheer volume of data, since it is expensive and time-consuming. It is therefore important to process online user content automatically to detect and remove hate speech. Many recent approaches suffer from an interpretability problem, meaning it can be difficult to understand why the systems make the decisions they do. This work proposes solutions for automatically detecting hate messages using Support Vector Machine (SVM) and Naïve Bayes algorithms. The approach achieves near state-of-the-art performance while being simpler and producing more easily interpretable decisions than other methods. On the test set, empirical evaluation yielded classification accuracies of approximately 99% for SVM and 50% for NB.
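The abstract credits SVM with simpler, more interpretable decisions. To illustrate why a linear SVM is interpretable, the sketch below trains one with the Pegasos stochastic subgradient method on dense feature vectors. Pegasos is a standard SVM solver but not necessarily the one the authors used. Each learned weight corresponds to one input feature, so large positive or negative weights can be inspected directly. Feature extraction is omitted. For brevity, the bias is regularized along with the other weights. Names and hyperparameters are illustrative assumptions.

```python
import random


def train_pegasos(xs, ys, lam=0.01, epochs=1000, seed=0):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 / -1. A constant 1.0 feature is
    appended to every example so the last weight acts as a bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, epochs + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization step
        if margin < 1:  # hinge loss is active: move toward y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score; each w[j] is the weight of feature j."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In a text setting, `x` would be a bag-of-words or TF-IDF vector. Sorting features by weight then lists the terms that push a tweet most strongly toward or away from the hate class.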
Detecting hate speech on Twitter: A comparative study on the naive Bayes classifier
2017
Hate speech and cyberbullying on Twitter are a growing problem, and machine learning and computer science are increasingly used to combat them. This study investigates and compares different configurations of the naive Bayes classifier for classifying hate speech on Twitter. We collected a dataset of nearly 13,000 tweets, some containing hate speech, and trained and tested the classifier with different configurations. The study shows that character-level n-grams outperform word-level n-grams. The optimal character-level configuration combines n-gram sizes from 1 to 3.
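The best configuration reported above uses character n-grams of sizes 1 to 3. A minimal multinomial naive Bayes over such features, with Laplace smoothing, might look like the following. The smoothing constant, the lowercasing, the choice to ignore n-grams unseen in training, and the toy data in the test are assumptions, not the study's settings.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """All character n-grams of sizes n_min..n_max (inclusive) of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial naive Bayes over character n-gram counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.ngram_counts.items()}
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text): log prior plus smoothed log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for g in char_ngrams(text):
            if g in self.vocab:  # n-grams never seen in training are ignored
                score += math.log((self.ngram_counts[label][g] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))
```

Comparing this against a word-level variant only requires swapping `char_ngrams` for a word tokenizer, which is the kind of configuration comparison the study describes.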
Computer Science Review, 2020
Twitter is a microblogging tool that generates big data through short digital content. This study surveys machine learning techniques for classifying hate speech in Twitter data streams. This remains an active research focus, but little effort has been devoted to designing a generic metadata architecture, to threshold settings, and to fragmentation issues. Techniques in the literature address some of the challenges inherent in Twitter data streams but remain limited with respect to these issues. The study presents a collection of benchmark hate speech datasets suitable for testing the efficiency of classification models. It also weighs the pros and cons of single and hybrid machine learning methods for hate speech classification and summarizes performance evaluations of the surveyed methods. It further proposes a generic metadata architecture for hate speech classification on Twitter that addresses issues with Twitter data streams. Compared with similar methods, this architecture performed better across all evaluation metrics for hate speech detection, with 0.95 accuracy, 0.93 precision, 0.92 recall, and 0.93 F1-score. For hate speech sentiment classification, it also outperformed related methods, with an F1-score of 91.5%. It also achieved a higher AUC (0.97) than similar methods. Statistical validation of the results supports the efficiency of the developed system. Finally, the results show that the system is well suited to automatic topic detection and categorization.
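The survey reports accuracy, precision, recall, F1-score, and AUC. For readers comparing those figures, the sketch below gives the standard definitions for a binary hate/non-hate task. AUC is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting half. The function names and the convention that label 1 marks hate speech are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples. That is fine for illustration, but a sort-based computation is preferable for large test sets.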
Hate speech content detection system on Twitter using K-nearest neighbor method
INTERNATIONAL CONFERENCE ON INFORMATICS, TECHNOLOGY, AND ENGINEERING 2021 (InCITE 2021): Leveraging Smart Engineering
Twitter is a social media platform that many Indonesians use to express their thoughts on a variety of topics. In Indonesia, social media use is governed by the Information and Electronic Transactions Law. However, enforcement of this law has so far been subpar: violations still occur, and no legal action has been taken against them. Hate speech is a common violation on Twitter. The goal of this research is to build a system that detects potentially violating content on Twitter, particularly content containing hate speech. The research uses the k-nearest neighbor (KNN) method together with TF-IDF feature extraction. The system checks whether a tweet the user wants to post violates a specific article of the Information and Electronic Transactions Law. In model validation, the classifier achieved an accuracy of 67.86% with K = 10. In user validation, the system achieved an accuracy of 77%.
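The pipeline above combines TF-IDF features with a KNN classifier, and the reported accuracy used K = 10. The sketch below makes several assumptions: whitespace tokenization, relative term frequency, a smoothed IDF of log(N/df) + 1 (one of several common variants), and cosine similarity with majority voting. The study's exact preprocessing for Indonesian text is not described in the abstract. The tweets and labels in the test are toy data.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log(N / df) + 1 for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms absent from the training vocabulary are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(text, train_texts, train_labels, idf, k=10):
    """Majority label among the k most cosine-similar training tweets."""
    query = tfidf_vector(text, idf)
    sims = (cosine(query, tfidf_vector(t, idf)) for t in train_texts)
    ranked = sorted(zip(sims, train_labels), key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # most_common is stable, so a vote tie goes to the label of the nearer neighbor.
    return votes.most_common(1)[0][0]
```

Recomputing the training vectors on every query keeps the sketch short. A real system would precompute them once.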
IJERT-Detection of Hate Speech using Text Mining and Natural Language Processing
International Journal of Engineering Research and Technology (IJERT), 2020
https://www.ijert.org/detection-of-hate-speech-using-text-mining-and-natural-language-processing https://www.ijert.org/research/detection-of-hate-speech-using-text-mining-and-natural-language-processing-IJERTV9IS110257.pdf
In today's world, technology is doing wonderful things for humanity. On the other hand, social networks that offer anonymity bring out the very worst in people in the form of hate speech. Hate speech on social media is a serious societal problem that can magnify violence ranging from lynching to ethnic cleansing. One critical task in automatic hate speech detection is distinguishing hate speech from other offensive language. Existing lexical methods for separating the two categories showed very low performance, leading to major misclassification. Supervised machine learning approaches gave significantly better results in distinguishing hate from offensive language. However, the presence or absence of certain words in both classes can either help or hinder accurate classification. In this paper, tweets are classified into three classes (hate speech, offensive, and neither) using multi-class classifiers. Four classifiers were evaluated: Logistic Regression, Random Forest, Support Vector Machines (SVM), and Naïve Bayes. Random Forest performs strongly with almost all feature combinations, reaching a maximum accuracy of 0.90 with TF-IDF features.
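The key difficulty named above is separating hate speech from merely offensive language, and a single accuracy figure can hide that. A confusion matrix with per-class F1 makes hate/offensive confusions visible. The sketch below is a generic evaluation helper for the three-class setup. The label strings and the toy predictions in the test are assumptions, not the paper's data.

```python
LABELS = ("hate", "offensive", "neither")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """matrix[i][j] counts examples whose true label is labels[i] and predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_f1(matrix, labels=LABELS):
    """One-vs-rest F1 per class, read off the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # column minus diagonal
        fn = sum(matrix[i]) - tp  # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def macro_f1(matrix, labels=LABELS):
    scores = per_class_f1(matrix, labels)
    return sum(scores.values()) / len(scores)
```

Large off-diagonal counts in the hate/offensive cells point to exactly the confusion the abstract describes, even when overall accuracy looks high.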
Detecting Hate Speech In Twitter Using Long Short-Term Memory and Naïve Bayes Method
Syntax Literate: Jurnal Ilmiah Indonesia, 2022
Information technology has become so sophisticated and easy to use that it is now part of the lifestyle of people throughout the world, and Indonesia is no exception. One benefit of information technology is the emergence of social networking sites such as Facebook, Twitter, and Instagram. However, technological developments have negative as well as positive impacts, such as the crimes of insult and hate speech. This study aims to classify Indonesian sentences as hate speech or neutral using the Long Short-Term Memory (LSTM) method. The research data consist of Indonesian-language tweets. In testing, the LSTM method is compared with the Naïve Bayes method.
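For readers unfamiliar with LSTMs, the sketch below runs a standard LSTM cell (forget, input, and output gates plus a candidate state) over a sequence of word vectors. It returns the final hidden state, which a classifier layer would then consume. It uses plain Python lists, random toy weights, and no training, so it illustrates the mechanism only. The abstract does not give the study's embeddings, layer sizes, or training setup, and the parameter layout here is a hypothetical choice.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(input_size, hidden_size, seed=0):
    """Per gate: input weights W (hidden x input), recurrent weights U (hidden x hidden), bias b."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("f", "i", "o", "g")}


def lstm_step(x, h, c, params):
    """One LSTM time step: returns the new hidden state and cell state."""
    def pre(gate):
        w, u, b = params[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(w, x), matvec(u, h), b)]

    f = [sigmoid(z) for z in pre("f")]    # forget gate
    i = [sigmoid(z) for z in pre("i")]    # input gate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(sequence, params):
    """Run the cell over a sequence of input vectors; return the final hidden state."""
    hidden_size = len(params["f"][2])
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

Because the cell state is updated additively through the forget and input gates, the final hidden state can retain information from early words in the tweet. A bag-of-words Naïve Bayes model ignores word order entirely.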
Identification of HATE speech tweets in Pashto language using Machine Learning techniques
International Journal of Advanced Trends in Computer Science and Engineering, 2021
In the last few years, researchers have been strongly drawn to sentiment analysis, especially hate speech detection systems. Across many languages, the spread of hate speech on social media has attracted significant attention. Hate speech has a great impact on society, and hateful words harm others' dignity. Hate speech detection systems are important for stopping hateful words from turning into crimes. In this research, a framework is developed for hate speech detection in the Pashto language. Because no related data were available, a dataset was created from Twitter. Most research in this domain has targeted other languages and is quite mature in detecting hate speech. Little work, however, has been done on morphologically rich languages, especially Pashto. This research collected tweets related to ethnicity and religion from Twitter. The tweets were annotated manually, and each was labeled as hate or not by comparing it with offensive content. To study the impact of different features/attributes on hate speech detection, experiments were run with existing classifiers: SVM, Naïve Bayes, Decision Tree, and KNN. On the 500-tweet dataset, SVM produced the highest result among all classifiers (74%). On the 1,500-tweet dataset, KNN and Decision Tree produced the same result (65.0%). On the 2,800-tweet dataset, Decision Tree produced the highest result (72%), and SVM produced 71.9%.
Detection and Classification of Hate Speech
International Journal of Engineering Applied Sciences and Technology, 2021
The challenges of handling hate speech are not new. In the past few years, as Internet usage has grown, hateful activity across social media has increased rapidly. Improved technology has made it possible to create platforms where people feel free to share their opinions and experiences. That alone would not be a problem, but social media is also full of hateful comments targeting a person or a community. Hate speech is a statement that targets a person or community and discriminates on the basis of caste, creed, nationality, and so on. Our project aims to address this problem with machine learning techniques that automatically detect hate speech and classify it into classes such as extremely positive, positive, neutral, etc. We use a lexicon-based classifier and compare it with classifiers that do not use lexicons. The intended beneficiaries of this model are people targeted on social media, who can use the results to gauge the intensity of the comments.
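A lexicon-based classifier of the kind described above can be sketched as a sum of word polarity weights mapped to intensity bands. The lexicon, its weights, and the thresholds below are hypothetical toy values. Only "extremely positive", "positive", and "neutral" come from the abstract. The negative bands are assumed by symmetry, and negation and context are not handled.

```python
# Hypothetical toy lexicon: word -> polarity weight. A real system would use a curated lexicon.
LEXICON = {"love": 2.0, "great": 2.0, "good": 1.0, "bad": -1.0, "hate": -2.0, "disgusting": -3.0}


def intensity(text, lexicon=LEXICON):
    """Sum of lexicon weights over whitespace tokens, ignoring case and trailing punctuation."""
    return sum(lexicon.get(token.strip(".,!?").lower(), 0.0) for token in text.split())


def intensity_class(score):
    """Map a summed score to a class using hypothetical thresholds."""
    if score >= 3.0:
        return "extremely positive"
    if score > 0.0:
        return "positive"
    if score == 0.0:
        return "neutral"
    if score > -3.0:
        return "negative"
    return "extremely negative"
```

The numeric score doubles as the comment "intensity" the abstract mentions. The non-lexicon classifiers it is compared against would instead learn word weights from labeled data.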
Hate Speech Detection on Twitter in Indonesia with Feature Expansion Using GloVe
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Twitter is a popular social medium for voicing opinions in the form of criticism and suggestions. Criticism becomes hate speech when it attacks something (an individual, race, or group). Because a tweet is limited to 280 characters, abbreviations often cause vocabulary mismatch, which word embeddings can help resolve. This study uses feature expansion with Global Vectors (GloVe) to reduce vocabulary mismatch in Indonesian-language hate speech on Twitter. Feature selection for the best model is carried out with Logistic Regression (LR), Random Forest (RF), and Artificial Neural Network (ANN) algorithms. The best model is Random Forest with 5,000 features, combining TF-IDF with a Tweet corpus built with GloVe. It achieves an average accuracy of 88.59%, which is 1.25% higher than the predetermined baseline. The number of features us...
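Embedding-based feature expansion can be sketched as follows, under stated assumptions. A similarity corpus maps each vocabulary word to its top-n most similar words by cosine similarity of GloVe-style vectors. A TF-IDF feature that is zero in a tweet then borrows the weight of the first similar word that does appear. This mirrors the abbreviation problem described above, with `tdk` as a short form of `tidak` in the test. It is an illustrative formulation, not necessarily the paper's exact procedure. The two-dimensional vectors are invented, not trained GloVe output, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_similarity_corpus(embeddings, top_n=3):
    """For each word, the top_n other words ranked by cosine similarity of their vectors."""
    corpus = {}
    for word, vec in embeddings.items():
        others = [w for w in embeddings if w != word]
        others.sort(key=lambda w: cosine(vec, embeddings[w]), reverse=True)
        corpus[word] = others[:top_n]
    return corpus


def expand_features(weights, feature_names, corpus):
    """weights maps word -> TF-IDF weight for one tweet (missing means 0).
    A zero-valued feature copies the weight of its most similar word present in the tweet."""
    expanded = {}
    for feature in feature_names:
        value = weights.get(feature, 0.0)
        if value == 0.0:
            for similar in corpus.get(feature, []):
                if weights.get(similar, 0.0) != 0.0:
                    value = weights[similar]
                    break
        expanded[feature] = value
    return expanded
```

After expansion, a tweet that writes `tdk` also activates the `tidak` feature, so a classifier trained mostly on the full spelling can still use that evidence.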