Textual-based Turkish Offensive Language Detection Model

Harmful social media comments and posts have a variety of unintended repercussions for individuals. In addition to being linked to psychological disorders, the problem has been studied as a possible cause of suicidal behavior. With approximately 16.1 million users, Turkey was the sixth-largest Twitter community by active users in 2022, with a demographically diverse user base for its size. As a result, there is growing demand for a high-quality Turkish hate speech detection model for use in social networks. Most prior research has been conducted on small, label-imbalanced datasets. This study investigates traditional machine learning and recent deep learning algorithms for detecting hate speech in Turkish. We applied several classification methods to detect offensive Turkish language on a large dataset of more than 53,000 posts. The results show that BERT-Features achieved promising results, while BiLSTM and Logistic Regression achieved the best performance on this dataset. Across all models, LGBM, Logistic Regression, and BiLSTM proved robust at detecting offensive language, with ROC AUC scores of around 95%.
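For readers unfamiliar with the reported metric, ROC AUC can be read as the probability that a classifier scores a randomly chosen offensive post higher than a randomly chosen non-offensive one. The following is a minimal sketch of that pairwise definition, not the evaluation code used in the study. The function name `roc_auc`, the label convention (1 = offensive), and the toy scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via its pairwise definition.

    labels: 0/1 values (assumed convention: 1 = offensive, 0 = not offensive)
    scores: classifier scores, where higher means "more likely offensive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for six posts; these are not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")  # prints "ROC AUC = 0.889"
```

The pairwise loop is quadratic in the number of posts, so it is suitable only for illustration. On a dataset of tens of thousands of posts, a sort-based computation or a library routine would be the practical choice. Because the metric depends only on how posts are ranked, it is less sensitive to label imbalance than plain accuracy.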