Hateful and Other Negative Communication in Online Commenting Environments: Content, Structure and Targets

Targets of Online Hate Speech in Context. A Comparative Digital Social Science Analysis of Comments on Public Facebook Pages from Romania and Hungary

Intersections. East European Journal of Society and Politics, 2019

Online hate speech, especially on social media platforms, is the subject of both policy and political debate in Europe and globally - from the fragmentation of network publics to echo chambers and bubble phenomena, from networked outrage to networked populism, from trolls and bullies to propaganda and non-linear cyberwarfare. Both researchers and Facebook Community standards see the identification of the potential targets of hateful or antagonistic speech as key to classifying and distinguishing the latter from arguments that represent political viewpoints protected by freedom of expression rights. This research is an exploratory analysis of mentions of targets of hate speech in comments in the context of 106 public Facebook pages in Romanian and Hungarian from January 2015 to December 2017. A total of 1.8 million comments were collected through API interrogation and analyzed using a text-mining niche-dictionaries approach and co-occurrence analysis to reveal connections to events on the media and political agenda and discursive patterns. Findings indicate that in both countries the most prominent targets mentioned are connected to current events on the political and media agenda, that targets are most frequently mentioned in contexts created by politicians and news media, and that discursive patterns in both countries involve the proliferation of similar stereotypes about certain target groups.

Hate speech in the Romanian Online Media

This article investigates hate speech in three of the most important online spaces for public expression: user comments on Facebook Pages, blogs and online news outlets. The co-occurrence of terms referencing frequent targets of hate speech with elements of violent or offensive language was analyzed in order to detect instances of hate speech in a sample of over 2.6 million comments published in Romanian in the first six months of 2015. Results indicate a relatively low occurrence of hate speech (below 1% of the analyzed sample), but also several well-defined contexts and timeframes associated with a high occurrence of hate speech, suggesting possibilities for further in-depth work focusing especially on these particular contexts.
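As a rough illustration of the co-occurrence idea described in this abstract, the following Python sketch flags a comment when a target term and an offensive term appear in it together. The two word lists (neutral placeholders), the tokenizer, and the function names are all hypothetical; the article's actual dictionaries and matching rules are not given here.

```python
import re

# Hypothetical placeholder dictionaries; these are NOT the article's lists.
TARGET_TERMS = {"groupa", "groupb"}
OFFENSIVE_TERMS = {"insult1", "insult2"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def flags_cooccurrence(comment):
    """Return True if a target term and an offensive term occur in the same comment."""
    tokens = set(tokenize(comment))
    return bool(tokens & TARGET_TERMS) and bool(tokens & OFFENSIVE_TERMS)


def flagged_share(comments):
    """Fraction of comments flagged by the rule (0.0 for an empty list)."""
    if not comments:
        return 0.0
    return sum(flags_cooccurrence(c) for c in comments) / len(comments)
```

A comment-level rule like this is deliberately crude: it ignores negation, quotation, and word order, so any share it reports would be a starting point for manual inspection rather than a measurement.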

Hate speech targets in COVID-19 related comments on Ukrainian news websites

Journal of Computer-Assisted Linguistic Research

The research focuses on hate speech in the comments section of Ukrainian news websites. Restricted solely to COVID-19-related comments, it seeks to analyze the development of hate speech rates throughout the pandemic. Using a semi-automated machine-learning-aided approach, the paper identifies hate speech in the comments and defines its main targets. The research shows that a crisis like the COVID-19 pandemic can strengthen existing negative stereotypes and give rise to new forms of stigmatization against social and ethnic groups.

Measuring and Characterizing Hate Speech on News Websites

12th ACM Conference on Web Science, 2020

The Web has become the main source for news acquisition. At the same time, news discussion has become more social: users can post comments on news articles or discuss news articles on other platforms like Reddit. These features empower and enable discussions among the users; however, they also act as a medium for the dissemination of toxic discourse and hate speech. The research community lacks a general understanding of what type of content attracts hateful discourse and the possible effects of social networks on the commenting activity on news articles. In this work, we perform a large-scale quantitative analysis of 125M comments posted on 412K news articles over the course of 19 months. We analyze the content of the collected articles and their comments using temporal analysis, user-based analysis, and linguistic analysis, to shed light on what elements attract hateful comments on news articles. We also investigate commenting activity when an article is posted on either 4chan's Politically Incorrect board (/pol/) or six selected subreddits. We find statistically significant increases in hateful commenting activity around real-world divisive events like the "Unite the Right" rally in Charlottesville and political events like the second and third 2016 US presidential debates. Also, we find that articles that attract a substantial number of hateful comments have different linguistic characteristics when compared to articles that do not attract hateful comments. Furthermore, we observe that the posting of a news article on either /pol/ or the six subreddits is correlated with an increase of (hateful) commenting activity on the news article.

Dynamics of online hate and misinformation

Scientific Reports, 2021

Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model, trained and fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of "pure haters", meant as active users posting exclusively hateful comments. Moreover, consistent with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents' community. Interestingly, users loyal to reliable sources use on average more toxic language than their counterparts. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of the number of comments and time. Our results show that, consistent with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.

Public debates on social media platforms are often heated and polarised [1-3]. Back in the 90s, Mike Godwin coined a theorem, today known as Godwin's law, stating that "As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches one". More recently, with the advent of social media, an increasing number of people are reporting exposure to online hate speech [4], leading institutions and online platforms to investigate possible solutions and countermeasures [5].
To prevent and counter the spread of hate speech online, for example, the European Commission agreed with Facebook, Microsoft, Twitter, YouTube, Instagram, Snapchat, Dailymotion, Jeuxvideo.com, and TikTok on a "Code of conduct on countering illegal hate speech online" [6]. In addition to fuelling the toxicity of the online debate, hate speech may have severe offline consequences. Some researchers hypothesised a causal link between online hate and offline violence [7-9]. Furthermore, there is empirical evidence that online hate may induce fear of offline repercussions [10]. However, detecting and countering hate speech is complicated. There are still ambiguities in the very definition of hate speech, with academics and relevant stakeholders providing their own interpretations [4], including social media companies such as Facebook [11], Twitter [12], and YouTube [13]. We use the term "hate speech" to cover the whole spectrum of language used in online debates, from the normal and acceptable to the extreme, which incites violence. On the extreme end, violent speech covers all forms of expression which spread, incite, promote or justify racial hatred, xenophobia, antisemitism or other forms of hatred based on intolerance, including: intolerance expressed by aggressive nationalism and ethnocentrism, discrimination and hostility against minorities, migrants and people of immigrant origin [14]. Less extreme forms of unacceptable speech include inappropriate (e.g., profanity) and offensive language (e.g., dehumanisation, offensive remarks), which is not illegal, but deteriorates public discourse and can lead to a more radicalised society. In this work, we analyse a corpus of more than one million comments on Italian YouTube videos related to COVID-19 to unveil the dynamics and trends of online hate. First, we manually annotate a large corpus of YouTube comments for hate speech, and train and fine-tune a hate speech deep learning model to accurately detect it.
Then, we apply the model to the entire corpus, aiming to characterise the behaviour of users producing hate, and shed light on the (possible) relationship between the consumption of misinformation and usage of hate and toxic language. The reason for performing hate speech detection on the Italian language is twofold: First, Italy was one of the countries most affected by the COVID-19 pandemic and especially by the early application of non-pharmaceutical interventions (a strict lockdown began on March 9, 2020). Such an event, by forcing people to stay at home, increased internet use and was likely to exacerbate the public debate and foment hate speech against specific targets such as the government and politicians. Second, Italian is a less studied language

A Web Interface for Analyzing Hate Speech

Future Internet, 2021

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large numbers of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for the creation and the query of a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search in the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated, aiming to record the users' opinions about the web interface and the corresponding functionality.

In search of hate speech in Lithuanian public discourse: A corpus-assisted analysis of online comments

Lodz Papers in Pragmatics 14.1; Special issue on Narrating hostility, 2018

The present paper aims to report on the preliminary findings from the initial stages of ongoing research on hate speech in Lithuanian online comments. Comments are marked strongly by such phenomena as flaming and trolling; therefore, in this genre we can expect a high degree of hostility, obscenity, high incidence of insults and aggressive lexis, which can inflict harm to individuals or organizations. The goal of the current research is thus to make an attempt to identify some features of verbal aggression in Lithuanian by applying the principles and instruments of corpus linguistics, which proved to be a useful approach when dealing with such issues as trolling. It is expected that further analysis of those features will help to identify and define formal linguistic criteria that could facilitate identification of hate speech in public discourse. The data has been obtained from the Lithuanian corpus of user-generated comments collected from one major Lithuanian portal, www.delfi.lt. The corpus consists of all the comments posted in the year 2014 and in total includes 17,909 comments, which make up 1,160,109 words. For the initial data analysis, linguistic aspects, such as wordlists, collocations, and formulaic language, were analysed by using the AntConc software. The interpretations of the results are still very tentative, but what the initial findings show is that aggressive lexis does not feature among the most frequent and most salient features of comments, since aggressive rhetoric often resorts to creative language use, which emerges mainly through content analysis.
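The wordlist and collocation steps mentioned in this abstract can be approximated in a few lines of Python. This is only a minimal sketch using the standard library, not a reproduction of AntConc or of the paper's settings; the tokenizer, the function names, and the default window size are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def wordlist(comments):
    """Count how often each token occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def collocates(comments, node, window=2):
    """Count tokens found within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                # Neighbours to the left and right, excluding the node itself.
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts
```

Calling `wordlist(comments).most_common(20)` would give a raw frequency list, and `collocates(comments, node)` a raw neighbour count for a chosen node word. Raw counts like these carry no association statistics, so they serve only as a first look at the data.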

Tuning Out Hate Speech on Reddit: Automating Moderation and Detecting Toxicity in the Manosphere

AoIR Selected Papers of Internet Research, 2020

Over the past two years social media platforms have been struggling to moderate at scale. At the same time, they have come under fire for failing to mitigate the risks of perceived ‘toxic’ content or behaviour on their platforms. In an effort to better cope with content moderation, to combat hate speech, ‘dangerous organisations’ and other bad actors present on platforms, discussion has turned to the role that automated machine-learning (ML) tools might play. This paper contributes to thinking about the role and suitability of ML for content moderation on community platforms such as Reddit and Facebook. In particular, it looks at how ML tools operate (or fail to operate) effectively at the intersection between online sentiment within communities and social and platform expectations of acceptable discourse. Through an examination of the r/MGTOW subreddit we problematise current understandings of the notion of ‘toxicity’ as applied to cultural or social sub-communities online and explai...