Cultural Diversity of Quality of Information on Wikipedias

Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages

Proceedings of recent advances in natural language processing: Hissar, Bulgaria, 2011

The motivations for writing in Wikipedia reported in the current literature often coincide, but none of the studies presents the hypothesis that editors contribute in order to increase the visibility of content related to their own nation or language. Similar to topical coverage studies, we outline a method for collecting the articles with this content, which are later analysed in several dimensions. To prove its universality, the tests are repeated for up to twenty language editions of Wikipedia. Finally, from the best indicators in each dimension we obtain an index which represents the degree of autoreferentiality of the encyclopedia. Last, we point out the impact of this fact and the risk of not considering its existence in the design of applications based on user generated content.

Global perspective on Wikipedia research

Proceedings of The Asist Annual Meeting, 2008

This panel will provide a global perspective on Wikipedia research. The literature on Wikipedia is mostly anecdotal, and most of the research has focused attention primarily on the English Wikipedia examining the accuracy of entries compared to established online encyclopedias (Emigh & Herring, 2005; Giles, 2005; Rosenzweig, 2006) and analyzing the evolution of articles over time (Viégas, Wattenberg, & Dave, 2004; Viégas, Wattenberg, Kriss, & van Ham, 2007). Others have examined the quality of contribution (Stvilia et al., 2005). However, only a few studies have conducted comparative analyses across languages or analyzed Wikipedia in languages other than English (e.g., Pfeil, Zaphiris, & Ang, 2006). There is a need for international, cross-cultural understanding of Wikipedia. In an effort to address this gap, this panel will present a range of international and cross-cultural research of Wikipedia. The presenters will contribute different perspectives of Wikipedia as an international sociocultural institution and will describe similarities and differences across various national/language versions of Wikipedia. Shachaf and Hara will present variation of norms and behaviors on talk pages in various languages of Wikipedia. Herring and Callahan will share results from a cross-language comparison of biographical entries that exhibit variations in content of entries in the English and Polish versions of Wikipedia and will explain how they are influenced by the culture and history of the US and Poland. Stvilia will discuss some of the commonalities and variability of quality models used by different Wikipedias, and the problems of cross-language quality measurement aggregation and reasoning. Matei will describe the social structuration and distribution of roles and efforts in wiki teaching environments. Solomon's comments, as a discussant, will focus on how these comparative insights provide evidence of the ways in which an evolving institution, such as Wikipedia, may be a force for supporting cultural identity (or not).

Quality and Importance of Wikipedia Articles in Different Languages

Communications in Computer and Information Science, 2016

This article aims to analyse the importance of Wikipedia articles in different languages (English, French, Russian, Polish) and the impact of importance on article quality. Based on an analysis of the literature and our own experience, we collected measures related to articles, specifying various aspects of quality, that will be used to build models of article importance. For each language version, we select the influential parameters that may allow automatic assessment of the validity of the article. Links between articles in different languages offer opportunities for comparing and verifying the quality of information provided by various Wikipedia communities. Therefore, the model can be used not only for a relative assessment of the content of the whole article, but also for a relative assessment of the quality of data contained in its structural parts, the so-called infoboxes.

The Struggle of Small and Non-Western Wikipedia Editions

2018

The online encyclopedia Wikipedia has become one of the most influential Internet platforms on the World Wide Web and is currently the sixth-most visited website overall. For smaller languages, creating their own Wikipedia editions can constitute a tremendous boost to their general online presence. This paper investigates whether Wikipedia’s internal structure and culture are really inclusive in their treatment and representation of minority, endangered, regional, and non-Western languages. The paper argues that Wikipedia and, indeed, the Internet itself favor Western, mainstream languages and content and thus make it almost impossible for smaller languages to achieve a meaningful online presence.

Interactions of Cultures and Top People of Wikipedia from Ranking of 24 Language Editions

PLOS ONE, 2015

Wikipedia is a huge global repository of human knowledge that can be leveraged to investigate intertwinements between cultures. With this aim, we apply methods of Markov chains and the Google matrix to the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by the PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people's names, we obtain the top 100 historical figures for each edition and for each algorithm. We investigate their spatial, temporal, and gender distributions depending on their cultural origins. Our study demonstrates not only the existence of skewness towards local figures, mainly recognized only in their own cultures, but also the existence of global historical figures appearing in a large number of editions. By determining the birth time and place of these persons, we perform an analysis of the evolution of such figures through 35 centuries of human history for each language, thus recovering interactions and entanglement of cultures over time. We also obtain the distributions of historical figures over world countries, highlighting geographical aspects of cross-cultural links. Considering historical figures who appear in multiple editions as interactions between cultures, we construct a network of cultures and identify the most influential cultures according to this network.
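As a rough illustration of the ranking step mentioned in this abstract, the sketch below computes PageRank and CheiRank on a toy four-page link graph. It is not the authors' implementation: the graph, the damping factor `alpha=0.85`, and the helper names `google_matrix` and `stationary_vector` are assumptions made for the example. CheiRank is taken here in its usual sense, as PageRank on the network with all link directions inverted; 2DRank, which combines the two, is omitted.

```python
import numpy as np

def google_matrix(adj, alpha=0.85):
    """Build a column-stochastic Google matrix.

    adj[i, j] = 1 means page j links to page i (assumed convention).
    Columns with no outgoing links are replaced by a uniform column.
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    S = np.where(col_sums > 0, adj / safe_sums, 1.0 / n)
    return alpha * S + (1.0 - alpha) / n

def stationary_vector(G, tol=1e-12, max_iter=1000):
    """Power iteration for the leading eigenvector of G, normalised to sum to 1."""
    n = G.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = G @ p
        nxt /= nxt.sum()
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

# Hypothetical toy graph with links 0->1, 0->2, 1->2, 2->0, 3->2.
adj = np.zeros((4, 4))
for src, dst in [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]:
    adj[dst, src] = 1.0

pagerank = stationary_vector(google_matrix(adj))    # favours pages with many incoming links
cheirank = stationary_vector(google_matrix(adj.T))  # same procedure on inverted links

print("PageRank order:", np.argsort(-pagerank))
print("CheiRank order:", np.argsort(-cheirank))
```

In this toy graph page 2 receives links from three other pages, so it comes first in the PageRank ordering. On a real language edition the adjacency matrix would be sparse and far too large for a dense array, so a sparse representation would be needed.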

Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions

The online encyclopedia Wikipedia is the largest general information repository created through collaborative efforts from all over the globe. Although the project's goal is to gather the sum of human knowledge, there are strong content imbalances across the language editions. In order to quantify and investigate these imbalances, we study the impact of cultural context in 40 language editions. For this purpose, we developed a computational method to identify articles that can be related to the editors' cultural context associated with each Wikipedia language edition. We employed a combination of strategies taking into account geolocated articles, specific keywords and categories, as well as links between articles. We verified the method's quality with manual assessment and found an average precision of 0.92 and an average recall of 0.95. The results show that about a quarter of each Wikipedia language edition is dedicated to representing the corresponding cultural context. Although a considerable part of this content was created during the first years of the project, its creation is sustained over time. An analysis of cross-language coverage of this content shows that most of it is unique in its original language, and reveals special links between cultural contexts; at the same time, it highlights gaps where the encyclopedia could extend its content. The approach and findings presented in this study can help to foster participation and inter-cultural enrichment of Wikipedias. The datasets produced are made available for further research.
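For readers unfamiliar with the evaluation measures quoted in this abstract, the following minimal sketch shows how precision and recall could be computed for one language edition by comparing the articles selected by a method against a manually assessed set. The function name `precision_recall` and the article identifiers are hypothetical; the figures of 0.92 and 0.95 reported above come from the study itself and are not reproduced here.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a selected set against a manually labelled set."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    # Precision: share of selected articles that are truly relevant.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: share of relevant articles that the method managed to select.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical article identifiers for a single language edition.
selected_by_method = {"A", "B", "C", "D"}
manually_relevant = {"A", "B", "C", "E"}

precision, recall = precision_recall(selected_by_method, manually_relevant)
print(precision, recall)
```

An average over editions, as reported in the abstract, would then be the mean of these per-edition values.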

Examining Wikipedia across linguistic and temporal borders

The Web has grown to be an integral part of modern society, offering novel ways for humans to communicate, interact, and share information. New collaborative platforms are forming which provide individuals with new communities and knowledge bases and, at the same time, offer insights into human activity for researchers, policy-makers and engineers. On a global scale, the role of cultural and language barriers when studying such phenomena becomes particularly relevant and presents significant challenges: due to insufficient information, it is often hard to establish the cultural or language groups to which individuals belong, while there are technical difficulties in establishing the relevance of resources in different languages and in analysing them. This paper presents a framework for addressing those issues by leveraging data on the use of Wikipedia. Resources available in different languages are explicitly correlated in Wikipedia, along with time-stamped logs of access to its articles. The framework enables temporal page views in Wikipedia to be associated with specific geographic profiles. It is then used to examine the exchange of information between English-speaking and Chinese-speaking localities, and we report initial findings on the role of language and culture in diffusion in this context.

Cultural Identities in Wikipedias

In this paper we study identity-based motivation in Wikipedia as a drive for editors to act congruently with their cultural identity values by contributing content related to them. To assess its influence, we developed a computational method to identify articles related to the cultural identities associated with a language and applied it to 40 Wikipedia language editions. The results show that about a quarter of each Wikipedia language edition is dedicated to representing the corresponding cultural identities. The topical coverage of these articles shows that geography, biographies, and culture are the most common themes, although each language shows its idiosyncrasy and other topics are also present. The majority of these articles remain exclusive to each language, which is consistent with the idea that a Cultural Identity is defined in relation to others; as entangled and separated. An analysis of how this content is shared among language editions reveals special links between cultures. The approach and findings presented in this study can help to foster participation and inter-cultural enrichment of Wikipedias. The datasets produced in this study are made available for further research.

Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles

2017

Despite the fact that Wikipedia is often criticized for its poor quality, it continues to be one of the most popular knowledge bases in the world. Articles in this free encyclopedia on various topics can be created and edited independently in about 300 different language versions. Our research showed that in language-sensitive topics the quality of information can be relatively better in the relevant language versions. However, in most cases it is difficult for Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have its own rules for manually assessing content quality. This makes automatic quality comparison of articles between various languages a challenging task. The paper presents results of relative quality and popularity assessment of over 28 million articles in 44 selected language versions. In addition, a comparative analysis of the quality and popularity of articles in some topics was conducte...