Svetlana Bodrunova | St. Petersburg State University (Russian Federation)

Papers by Svetlana Bodrunova

2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2019

Topic modelling is widely used today to detect the hidden topicality of text corpora, including those from social media. But for many widespread online languages, such as Russian, topic modelling is still rarely used. For Russian Twitter, only a handful of works exist, and these lack substantial discussion of topic interpretability. Also, the impact of various text properties upon the modelling results remains largely unexplored. We partly cover these gaps by assessing a mid-range text corpus of a conflictual Twitter discussion in two respects. Continuing our earlier study, which applied three topic modelling algorithms (LDA, WNTM, and BTM) and assessed their quality via automated means, we here juxtapose automated assessment with human coding and link the human evaluation of topic quality to the sentiment of the topics. We show that human coding disagrees with the results of the objective metrics on the number of interpretable topics, showing slightly higher interpretability for the LDA algorithm, while inter-coder reliability is much higher for BTM. We discuss a range of coding issues that hold for all three topic models. We also find that the interpretability of a topic by human coders is linked to the presence of negative keywords among the topic descriptors, with the strongest linkage shown by BTM.

Proceedings of the 3rd International Conference on Applications in Information Technology, 2018

Over the past few years, sentiment analysis of users' posts in social networks has become very popular among researchers. In this paper, the authors present and describe a multilingual knowledge-based approach to sentiment analysis of major conflictual ad hoc discussions on Twitter. In an experiment, the quality of the proposed method is evaluated with different parameters on two real ad hoc discussions: the Ferguson unrest (USA) and the Biryuliovo bashings (Russia). The results of the experiment show good quality of sentiment analysis for these discussions. In particular, the accuracy averaged over Russian and English is 0.65, and the F-measure is 0.7.

Internet Science, 2019

To search for various communities on the Internet, one may use an automatic typology of text messages obtained via cluster analysis. In this paper, we address one of the significant issues in text classification via cluster analysis, namely determining the number of clusters. For clustering based on semantics, text documents are typically represented as vectors within an n-dimensional linear space. As a method for determining the number of clusters, we suggest agglomerative clustering of vectors in this linear space. In our work, statistical analysis is combined with neural network algorithms to obtain a more accurate semantic portrait of a text. Then, using the techniques of distributional semantics, a mapping of the derived network structures into vector form is constructed. A statistical criterion for the completion of the clustering process is derived, defined as a Markov moment. Having obtained an automatic partitioning into clusters, one can compare the texts closest to the centroids with actual content samples or have such texts evaluated by experts. If the mapping of texts into vector form is adequate, all informational messages from a fixed cluster have the same meaning and the same emotional coloring. In addition, we discuss the possibility of using vector representations of texts for sentiment detection in short texts such as search-engine queries or tweets.

Social Media + Society, 2021

The Russian-speaking diaspora has spread across the world during the last century and plays a significant role in the cultural and political life of the host countries. But its virtual presence remains heavily understudied; only Russian-speaking news websites have received some scholarly attention. This study aims at estimating the globality of mass self-communication of Russian emigrants on Instagram in the context of studies of virtual diasporas as a new form of imagined communities. Instagram communication of emigrants illustrates how the nature of mass self-communication influences the nature of ties between diaspora members. Using network analysis, we confirm the global scale of the ties developed by the Russian-speaking “InstaMigrants”. We also show that such seemingly apolitical publics possess a potential for politicization of everyday life and migration experience in unconventional ways.

Social Computing and Social Media: Experience Design and Social Network Analysis, 2021

Background. Public discussions on social networks are trans-border and multilingual in nature. This is especially true for conflictual discussions that reach global trending topics. Being part of the global public sphere, such discussions were expected by many observers to become horizontal, all-involving, and democratically efficient. But, with time, criticism of the democratic quality of discussions in social media arose, with many works discovering patterns of echo chambering in social networks. Even so, there is still scarce knowledge on how affective hashtags work in terms of user clusterization, as well as on the differences between emotionally ‘positive’ and ‘negative’ hashtags. Objectives. We address this gap by analyzing the Twitter discussion on the Charlie Hebdo massacre of 2015. In this discussion, the Twittersphere created #jesuischarlie and #jenesuispascharlie - two discussion clusters with, allegedly, opposite sentiments towards the journal’s ethics and ...

Social Computing and Social Media: Experience Design and Social Network Analysis, 2021

Social Media + Society, 2021

YouTube-based discussions are a growing area of academic attention. However, we still lack knowledge on whether YouTube enables the formation of critical publics in countries with no established democratic tradition. To address this question, we study commenting on Belarusian oppositional YouTube blogs in advance of the major wave of Belarusian post-election protests of 2020. Based on data crawled for the whole of 2018 for six Belarusian political videoblogs, we define the structure of the commenters’ community, detect the core commenters, and assess their discourse for aggression, orientation of dialogue, direction of criticism, and antagonism/agonism. We show that, on Belarusian YouTube, the commenters represented a genuine adversarial self-critical public with cumulative patterns of solidarity formation, and we find markers of readiness for the protest spillover.

Design, User Experience, and Usability: Theory, Methodology, and Management, 2017

Background. Understanding the relations between user perception and aesthetics is crucial for web design. But in today's graphic and media design, rules established by practitioners even before the advent of the Internet, and still untested empirically, are frequently taught at design schools and widely used for online interface design. So far, there is no well-established linkage between the in-class recommendations and our empirical knowledge on usability, for which design plays a role just as crucial as web projecting. Will webpages that are better from the designers' viewpoint perform better in terms of usability? And can one have a list of recommendations tested empirically? This is especially important for large-scale organizational web spaces where design plays a huge role in brand recognition and visual unity. Large web spaces need complex ergonomic assessment both on the level of selected nodes and on that of architecture/navigation. Of the many large web spaces, university portals are best suited for elaboration and pre-testing of such a methodology, as they serve various publics, contain sub-domains, and often face criticism for their user-unfriendly design and messy structure. Objectives. We aim at creating a two-level usability expert test for a large web space that would be based on design recommendations tested empirically, thus eliminating the necessity of tech-based assessment of newcoming products. In this paper, we elaborate the node-level methodology. For this, based on leading design literature, we create a page usability index (U-index) for 'good' design that provides quantitative measurement for traditional design decisions on the micro- and macro-level of a web page. Then, we test by eye tracking whether 'better' design (corresponding to higher U-index values) favors a particular pattern of content consumption: not 'random search' but more efficient 'reading'. Research design. To check whether web design measured qualitatively correlates with perception of web pages as tested by eye tracking, we first define target nodes by collecting the hyperlink structure and constructing web graphs for three web spaces of the biggest universities in the USA and Russia (Harvard University, Moscow State University, St. Petersburg State University). For this, we combine web crawling and web analytics. Second, we construct the U-index with the maximum value of 22. Third, to assess user perception of the target web pages, we create a series of tasks on information search and measure three test parameters (number of eye fixations, duration of fixations, and saccade length) and their derivatives, as well as heat maps. To avoid bias in quantitative

International Journal of Communication, 2017

Communication in social media is increasingly being found to reproduce or even reinforce ethnic prejudice and hostility toward migrants. In Russia of the 2010s, which had the world’s second-largest immigrant population, polls detected high levels of hostility of the Russian population toward migranty (migrants), a label attached to resettlers from Central Asia and the Caucasus. We tested the online hostility hypothesis by using the data of 363,000 posts from the Russian-language LiveJournal. We applied data mining, regression analysis, and selective interpretative reading to map bloggers’ attitudes toward migranty, among other ethnicities and nations. Our findings significantly alter the picture drawn from the polls: Migranty neither provoke the largest amount of discussion nor experience the worst treatment in Russian blogs, in which Americans take the lead. Furthermore, Central Asians and North Caucasians are treated very differently.

The linkages between the intensity and topicality of online discussions, on the one hand, and those of offline on-street political activity, on the other, have recently become a subject of studies around the world. But the results of quantitative assessment of causal relations between onsite and online activities of citizens are contradictory. In our research, we use conflicts with violent triggers and the subsequent lines of events that include street rallies, political manifestations, and/or peaceful mourning, as well as public political talk, to trace the pivotal points in the conflict by measuring Twitter content. We show that in some cases, such as the Cologne mass harassment, the Granger test does not work well for detecting causality between online and onsite activities. In order to suggest a way to qualitatively assess the linkages between online and offline activities of users, we deploy topic modeling and further qualitative assessment of the changes in the topicali...

Future Internet, 2021

The community-based structure of communication on social networking sites has long been a focus of scholarly attention. However, the problem of discovering and describing hidden communities, including defining the proper level of user aggregation, remains unresolved. Studies of online communities have clear social implications, as they allow for assessment of preference-based user grouping and the detection of socially hazardous groups. The aim of this study is to comparatively assess the algorithms that effectively analyze large user networks and extract hidden user communities from them. The results we have obtained show the most suitable algorithms for Twitter datasets of different volumes (tens of thousands, hundreds of thousands, and millions of tweets). We show that the Infomap and Leiden algorithms provide the best results overall, and we advise testing a combination of these algorithms for detecting discursive communities based on user traits or vi...

Internet Science, 2019

To this day, classification of documents into negative, neutral, or positive remains a key task within the analysis of text tonality/sentiment. There are several methods for the automatic analysis of text sentiment. The method based on network models, the most linguistically sound in our view, allows us to take into account the syntagmatic connections of words. It also relies on the assumption that not all words in a text are equivalent; some words have more weight and a stronger impact upon the tonality of the text than others. We find it natural to represent a text as a network for sentiment studies, especially in the case of short texts, where grammar structures play a greater role in forming the text pragmatics and the text cannot be seen as just “a bag of words”. We propose a method of text analysis that combines a lexical mask with an efficient clustering mechanism. In this case, cluster analysis is one of the main methods of typology, which demands formal rules for calculating the number of clusters. The choice of a set of clusters and the moment of completion of the clustering algorithm depend on each other. We show that cluster analysis of data from an n-dimensional vector space using the “single linkage” method can be considered a discrete random process. Sequences of “minimum distances” define the trajectories of this process. The “approximation-estimating test” allows establishing the Markov moment of completion of the agglomerative clustering process.
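
As a minimal illustration of the single-linkage process described in the abstract above, the sketch below records the sequence of “minimum distances” produced by agglomerative merging. The function names and the toy points are hypothetical. The stopping rule shown (cut before the largest jump in the sequence) is a simple stand-in heuristic and is not the authors' approximation-estimating test, which the abstract does not specify.

```python
from itertools import combinations
from math import dist


def single_linkage_merge_distances(points):
    """Return the minimum inter-cluster distance chosen at each merge step."""
    clusters = [[p] for p in points]  # start with one cluster per point
    merge_distances = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merge_distances.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return merge_distances


def clusters_by_largest_jump(merge_distances):
    """Stand-in heuristic (an assumption, not the paper's test):
    stop merging just before the largest jump in the distance sequence."""
    jumps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    if not jumps:
        return 1
    k = jumps.index(max(jumps))        # merges 0..k are accepted
    n_points = len(merge_distances) + 1
    return n_points - (k + 1)          # each accepted merge removes one cluster


# Toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_distances = single_linkage_merge_distances(toy_points)
toy_cluster_count = clusters_by_largest_jump(toy_distances)
```

The brute-force search over cluster pairs keeps the sketch short; it is only suitable for small inputs.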

Communications in Computer and Information Science, 2018

Ad hoc discussions have been gaining a growing amount of attention in scholarly discourse. But earlier research has raised doubts about the comparability of ad hoc discussions in social media, as they are formed by unstable, affective, and hardly predictable issue publics. We have chosen inter-ethnic conflicts in the USA, Germany, France, and Russia (six cases altogether, from the Ferguson riots to the attack against Charlie Hebdo) to see whether similar patterns are found in the discussion structure across countries, cases, and vocabulary sets. Choosing degree distribution as the structural proxy for differentiating discussion types, we show that exponents change in the same manner across cases if the discussion density changes, this being true for neutral vs. affective hashtags, as well as hashtags vs. hashtag conglomerates. This adds to our knowledge on the comparability of ad hoc discussions online, as well as on structural differences between core and periphery in them.
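
To make the notion of a degree-distribution exponent in the abstract above concrete, here is a minimal sketch that computes node degrees from an edge list and estimates a power-law exponent. The abstract does not state which estimator the authors used; the formula below is the standard approximate maximum-likelihood estimate for discrete data, alpha ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)), and is an assumption made for illustration. The function names, the `k_min` parameter, and the toy edge list are hypothetical.

```python
from collections import Counter
from math import log


def degree_sequence(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return list(degrees.values())


def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of the exponent, using degrees >= k_min.

    Assumed estimator (not taken from the paper):
    alpha = 1 + n / sum(ln(k / (k_min - 0.5))).
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Toy star graph: one hub linked to four leaves.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
toy_degrees = degree_sequence(toy_edges)
toy_alpha = powerlaw_exponent(toy_degrees)
```

Comparing exponents estimated this way for, say, a neutral and an affective hashtag would only be meaningful on real discussion graphs with many nodes; the toy graph merely shows the mechanics.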

Human Interaction, Emerging Technologies and Future Systems V, 2021

Internet Science, 2018

The article looks at one of the factors that may affect interest in the educational programs of today’s universities in various regions of the world, namely the effective web presence of a university in the global information space. To successfully promote a university on the World Wide Web, efficient interaction with the global networked audience is necessary.

Proceedings of the International Conference on Electronic Governance and Open Society: Challenges in Eurasia, 2016

Understanding the mechanisms of visual perception is important in the context of both media research and its applications in design practice. Within the functional approach to interface design, eye tracking is an established method to analyze interface efficacy. At the same time, in today's media design, many rules have been established by practitioners and remain untested. In this mixed-method study, we combine web crawling, web analytics and heat map analysis based on eye tracking, and qualitative usability analysis of the composite-graphic model of a website. We check whether eye tracking test results (numeric data and heat map analysis) correlate with the usability of key pages of a large website, as measured qualitatively according to recommendations of leading design literature. Among large web spaces, university website clusters represent a special type and are well suited for our analysis, as they unite very different publics and are multi-task. We elaborate and pre-test the methodology on three sites of leading universities in the USA and Russia (Harvard University, Moscow State University, and St. Petersburg State University). Our results suggest that there is no direct link between design-based elements of page usability and numeric eye tracking data, but heat maps show correlation with design quality; this means we need to continue checking the suggested methodology on a larger number of assessors.

Qualitative studies, such as sociological research, opinion analysis, or media studies, can benefit greatly from automated topic mining provided by topic models such as LDA. However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along the way to using topic models in qualitative studies: the lack of a good quality metric that closely matches human judgement in understanding topics, and the need to indicate specific subtopics that a specific qualitative study may be most interested in mining. For the first problem, we propose a new quality metric, tf-idf coherence, that reflects human judgement more accurately than regular coherence, and conduct an experiment to verify this claim. For the second problem, we propose an interval semi-supervised LDA approach, or ISLDA, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We also present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis. In Russia of 2006–2013, grassroots political activity and its radicalization both rose rapidly. Urban ethnic conflicts with Caucasian-origin resettlers and emigrants from ex-Soviet states, including those in Moscow in 2010–2013, have created new research agendas focused on mapping and prognosis of current ethnic attitudes. The project aims at mapping current ethnic attitudes in the Russian-language LiveJournal community by big data research methods, LiveJournal being selected for its ‘platform for Runet elite’ image. To interpret the topics, framing theory is used: ‘politicized vs. ritualized ethnicities’ semi-automated frame analysis and manual discourse analysis are deployed, altogether forming a mixed quality-assessment method for LDA topics. Factors influencing ‘politicization/ritualization’ are suggested.
