Buse Carik (Student) | Sabancı University

Papers by Buse Carik (Student)

SU-NLP at SemEval-2022 Task 11: Complex Named Entity Recognition with Entity Linking

Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

SU-NLP at CheckThat! 2021: Check-Worthiness of Turkish Tweets

The growth in social media usage increases the spread of misinformation on these platforms. To prevent this spread, automated fact-checking systems that identify and verify claims are needed. The first step of such systems is identifying whether a claim is worth checking. This paper describes our participation in the check-worthiness task of the CLEF 2021 CheckThat! Lab for Turkish tweets. We propose an ensemble of BERT models, which ranked second in terms of mean average precision (MAP).

Articles in Journals by Buse Carik (Student)

A Turkish Hate Speech Dataset and Detection System

Language Resources and Evaluation (LREC), 2022

Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition of hate speech that is in line with our goals and amenable to easy annotation, then designed the annotation schema for the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants, filtered by commonly used keywords related to immigrants. Finally, we developed a hate speech detection system using the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. Evaluated with 5-fold cross-validation, the binary classification accuracy is 77% on the Istanbul Convention dataset and 71% on the Refugees dataset. We also tested a regression model, which obtained RMSE values of 0.66 and 0.83 on a scale of [0-4] for the Istanbul Convention and Refugees datasets, respectively.
