Wikipedias as complex networks (original) (raw)

Wikipedias: Collaborative web-based encyclopedias as complex networks

Physical Review E, 2006

Wikipedia is a popular web-based encyclopedia edited freely and collaboratively by its users. In this paper we present an analysis of Wikipedias in several languages as complex networks. The hyperlinks pointing from one Wikipedia article to another are treated as directed links while the articles represent the nodes of the network. We show that many network characteristics are common to different language versions of Wikipedia, such as their degree distributions, growth, topology, reciprocity, clustering, assortativity, path lengths and triad significance profiles. These regularities, found in the ensemble of Wikipedias in different languages and of different sizes, point to the existence of a unique growth process. We also compare Wikipedias to other previously studied networks.

Network Analysis for Wikipedia

Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. The hyperlinked structure of Wikipedia and the ongoing, incremental editing process behind it make it an interesting and unexplored target domain for network analysis techniques.

Social networks of Wikipedia

2010

Wikipedia, the free online encyclopedia anyone can edit, is a live social experiment: millions of individuals volunteer their knowledge and time to collective create it. It is hence interesting trying to understand how they do it. While most of the attention concentrated on article pages, a less known share of activities happen on user talk pages, Wikipedia pages where a message can be left for the specific user. This public conversations can be studied from a Social Network Analysis perspective in order to highlight the structure of the "talk" network. In this paper we focus on this preliminary extraction step by proposing different algorithms. We then empirically validate the differences in the networks they generate on the Venetian Wikipedia with the real network of conversations extracted manually by coding every message left on all user talk pages. The comparisons show that both the algorithms and the manual process contain inaccuracies that are intrinsic in the freedom and unpredictability of Wikipedia growth. Nevertheless, a precise description of the involved issues allows to make informed decisions and to base empirical findings on reproducible evidence. Our goal is to lay the foundation for a solid computational sociology of wikis. For this reason we release the scripts encoding our algorithms as open source and also some datasets extracted out of Wikipedia conversations, in order to let other researchers replicate and improve our initial effort.

Connecting every bit of knowledge: The structure of Wikipedia's First Link Network

Journal of Computational Science, 2017

Apples, porcupines, and the most obscure Bob Dylan song-is every topic a few clicks from Philosophy? Within Wikipedia, the surprising answer is yes: nearly all paths lead to Philosophy. Wikipedia is the largest, most meticulously indexed collection of human knowledge ever amassed. More than information about a topic, Wikipedia is a web of naturally emerging relationships. By following the first link in each article, we algorithmically construct a directed network of all 4.7 million articles: Wikipedia's First Link Network. Here, we study the English edition of Wikipedia's First Link Network for insight into how the many articles on inventions, places, people, objects, and events are related and organized. By traversing every path, we measure the accumulation of first links, path lengths, groups of path-connected articles, and cycles. We also develop a new method, traversal funnels, to measure the influence each article exerts in shaping the network. Traversal funnels provides a new measure of influence for directed networks without spill-over into cycles, in contrast to traditional network centrality measures. Within Wikipedia's First Link Network, we find scale-free distributions describe path length, accumulation, and influence. Far from dispersed, first links disproportionately accumulate at a few articles-flowing from specific to general and culminating around fundamental notions such as Community, State, and Science. Philosophy directs more paths than any other article by two orders of magnitude. We also observe a gravitation towards topical articles such as Health Care and Fossil Fuel. These findings enrich our view of the connections and structure of Wikipedia's ever growing store of knowledge.

Properties of language networks in Japanese Wikipedia

The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems, 2012

Linguistic activity is highly complicated things that is produced from human brain. When the topic which is written or spoken becomes difficult, the produced sentence and article become more complex. Traditional analysis of the linguistic activity was based on the word frequency in use. Recently, the analysis based on the relation between word usage is attracting attention. These relation can be represented by network called "language networks." Many findings from the research of complex networks can be applied to this area. In this study, we investigate cooccurrence networks that are made from Wikipedia's article. Several network indices are used to classify the co-occurrence networks. We found that the co-occurrence networks made from the similar categories show the similarities in terms of indices.

Network analysis of collaboration structure in Wikipedia

Proceedings of the 18th international conference on World wide web, 2009

In this paper we give models and algorithms to describe and analyze the collaboration among authors of Wikipedia from a network analytical perspective. The edit network encodes who interacts how with whom when editing an article; it significantly extends previous network models that code author communities in Wikipedia. Several characteristics summarizing some aspects of the organization process and allowing the analyst to identify certain types of authors can be obtained from the edit network. Moreover, we propose several indicators characterizing the global network structure and methods to visualize edit networks. It is shown that the structural network indicators are correlated with quality labels of the associated Wikipedia articles.

Temporal Analysis of the Wikigraph

2006

Wikipedia (www.wikipedia.org) is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc we have a "Wikigraph", a graph that represents the link structure of Wikipedia.

Complex Networks in Different Languages: A Study of an Emergent Multilingual Encyclopedia

Unifying Themes in Complex Systems, 2010

There is an increasing interest to the study of complex networks in an interdisciplinary way. Language, as a complex network, has been a part of this study due to its importance in human life. Moreover, the Internet has also been at the center of this study by making access to large amounts of information possible. With these ideas in mind, this work aims to evaluate conceptual networks in different languages with the data from a large and open source of information in the Internet, namely Wikipedia. As an evolving multilingual encyclopedia that can be edited by any Internet user, Wikipedia is a good example of an emergent complex system. In this paper, different * This work is partially supported by Bogazici University Research Fund under the grant number 06A105.