Properties of language networks in Japanese Wikipedia

Comparison of the language networks from literature and blogs

In this paper we compare linguistic networks constructed from literature and blog texts. The networks are built as directed, weighted co-occurrence networks of words: words are nodes, and a link is established between two nodes if the corresponding words co-occur directly (i.e., are adjacent) within a sentence. The network structure is compared at the global (network) level in terms of average node degree, average shortest path length, diameter, clustering coefficient, density and number of components. We also analyze the local (node) level by comparing the rank plots of in- and out-degree, strength and selectivity. The selectivity-based results indicate that the structure of the networks constructed from literature differs from that of the networks constructed from blogs.
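The construction and the node-level measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes whitespace tokenization, treats "directly co-occurring" as "adjacent within a sentence", skips self-loops, and takes selectivity as strength divided by degree (computed separately for the in- and out-direction). Directed density is the number of arcs divided by N(N-1).

```python
from collections import defaultdict

def build_cooccurrence_network(sentences):
    """Directed, weighted co-occurrence network: the arc u -> v gains weight 1
    each time word v directly follows word u in a sentence (assumed tokenization:
    lower-cased whitespace split; self-loops are skipped)."""
    weights = defaultdict(int)  # (u, v) -> co-occurrence count
    for sentence in sentences:
        words = sentence.lower().split()
        for u, v in zip(words, words[1:]):
            if u != v:
                weights[(u, v)] += 1
    return dict(weights)

def node_measures(weights):
    """In/out degree (k), in/out strength (s) and in/out selectivity (e = s / k)."""
    nodes = {n for arc in weights for n in arc}
    m = {n: {"k_in": 0, "k_out": 0, "s_in": 0, "s_out": 0} for n in nodes}
    for (u, v), w in weights.items():
        m[u]["k_out"] += 1
        m[u]["s_out"] += w
        m[v]["k_in"] += 1
        m[v]["s_in"] += w
    for d in m.values():
        d["e_in"] = d["s_in"] / d["k_in"] if d["k_in"] else 0.0
        d["e_out"] = d["s_out"] / d["k_out"] if d["k_out"] else 0.0
    return m

def density(weights):
    """Directed density without self-loops: arcs / (N * (N - 1))."""
    n = len({x for arc in weights for x in arc})
    return len(weights) / (n * (n - 1)) if n > 1 else 0.0
```

For the two sentences "the cat sat on the mat" and "the cat ran", the arc the -> cat has weight 2, so "the" has out-degree 2, out-strength 3 and out-selectivity 1.5. Ranking nodes by e_in or e_out then gives the selectivity rank plots mentioned above.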

Complex Networks in Different Languages: A Study of an Emergent Multilingual Encyclopedia

Unifying Themes in Complex Systems, 2010

There is increasing interest in studying complex networks from an interdisciplinary perspective. Language, as a complex network, has become part of this research because of its importance in human life. The Internet has also been central to it by making large amounts of information accessible. With these ideas in mind, this work aims to evaluate conceptual networks in different languages using data from a large, open source of information on the Internet, namely Wikipedia. As an evolving multilingual encyclopedia that any Internet user can edit, Wikipedia is a good example of an emergent complex system. In this paper, different …

* This work is partially supported by Bogazici University Research Fund under the grant number 06A105.

Analysis of the semantic network structure of Japanese word associations

2007

This paper presents network analyses of a large-scale semantic network representation of the Japanese Word Association Database (JWAD). Version 1 of the JWAD consists of the word association responses of up to 50 respondents to a selection of approximately 2,100 basic Japanese kanji and words. Graph representation and graph-theoretic techniques are particularly promising methods for detecting and perceiving the intricate patterns of connectivity within large-scale linguistic knowledge resources. This paper examines the structure of the JWAD association network from the perspective of two important statistical features: the distribution of node connections, and the clustering coefficient as an index of the interconnectivity strength between neighboring nodes. The resulting association network is shown to exhibit scale-free characteristics and a pattern of sparse connectivity. The Recurrent Markov Clustering (RMCL) method of graph clustering is also applied to the JWAD representation. RMCL improves on van Dongen's Markov Clustering algorithm by making it possible to adjust the proportions of cluster sizes, thereby providing greater control over the sizes of concept domains by modifying graph granularity. RMCL clustering yields structurally simpler network representations that can be used to organize semantic spaces hierarchically. The technique is therefore especially useful for visualizing large-scale linguistic resources.
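The two statistics named in the abstract, the degree distribution and the clustering coefficient, can be sketched as follows. This assumes the association data are available as (cue, response) pairs and, as a simplification, ignores link direction; the pairs used below are a hypothetical toy example, not JWAD data. Nodes with fewer than two neighbours are given a local clustering of 0, which is one common convention. RMCL is not shown.

```python
from collections import Counter
from itertools import combinations

def to_undirected(pairs):
    """Adjacency sets from (cue, response) pairs, ignoring direction and self-pairs."""
    adj = {}
    for a, b in pairs:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined clustering counted as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def degree_distribution(adj):
    """P(k): fraction of nodes whose degree is k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}
```

On a toy graph with the triangle dog, cat, animal plus the extra link animal to forest, "dog" has clustering 1.0, "animal" has 1/3, and the degree distribution is {1: 0.25, 2: 0.5, 3: 0.25}. A scale-free network would instead show P(k) decaying roughly as a power law over a wide range of k.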

The network of concepts in written texts

2005

Complex network theory is used to investigate the structure of meaningful concepts in the written texts of individual authors. Networks have been constructed after a two-phase filtering, in which words with little meaningful content are eliminated and all remaining words are reduced to their canonical form, without any number, gender or tense inflection. Each sentence in the text is added to the network as a clique. A large number of written texts have been scrutinized, and it is found that the texts have small-world as well as scale-free structures. The growth process of these networks has also been investigated, and a universal evolution of network quantifiers has been found among the set of texts written by distinct authors. Further analyses, based on shuffling procedures applied either to the texts or to the constructed networks, provide hints on the role played by the word frequency and sentence length distributions in shaping the network structure. Since the meaningful words are related to concepts in the aut...
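The clique construction can be illustrated with a small sketch. The stop-word set and the lemma table below are hypothetical stand-ins for the paper's two-phase filtering (removal of low-content words, reduction to canonical form); a real pipeline would use a proper stop list and a lemmatizer. Each filtered sentence then contributes a link between every pair of distinct words it contains.

```python
from itertools import combinations

# Hypothetical filtering resources, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "was", "were"}
LEMMAS = {"cats": "cat", "dogs": "dog", "ran": "run", "runs": "run"}

def filter_sentence(sentence):
    """Phase 1: drop low-content words; phase 2: map words to a canonical form."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    return [LEMMAS.get(w, w) for w in words if w and w not in STOP_WORDS]

def add_sentence_as_clique(adj, sentence):
    """Link every pair of distinct filtered words in the sentence (a clique)."""
    concepts = set(filter_sentence(sentence))
    for w in concepts:
        adj.setdefault(w, set())
    for u, v in combinations(sorted(concepts), 2):
        adj[u].add(v)
        adj[v].add(u)

def build_concept_network(sentences):
    """Undirected concept network grown sentence by sentence."""
    adj = {}
    for s in sentences:
        add_sentence_as_clique(adj, s)
    return adj
```

Because the network is grown one sentence at a time, recording quantities such as the number of nodes or the clustering after each call to `add_sentence_as_clique` is one way to follow the growth process the abstract refers to.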

Construction and analysis of the word network based on the Random Reading Frame (RRF) method

Network Biology, 2021

In the present study, a method was developed to construct and analyze word networks. The core of the method is the Random Reading Frame (RRF) method. First, text files on the topics of interest (in various formats, e.g., pdf, txt, doc, docx, rtf, html) are downloaded from the Internet or collected from a local machine. All files are then combined into a single text file. Apart from word separators and stop words, all words are arranged in a word vector in the order in which they appear in the combined text file. In the RRF method, for a given pair of unique words $(x, y)$, with $x, y \in \{u_1, u_2, \dots, u_m\}$, a reading frame of randomly varying width is placed at a random position on the vector, and the occurrences of each of the two words within the frame are counted. Repeating this random procedure $p$ times yields the paired counts $(x_1, y_1), (x_2, y_2), \dots, (x_p, y_p)$, and paired counts are obtained in the same way for all pairs of unique words. For a given pair $(x, y)$, Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is then used to calculate a correlation value from the paired counts, and its statistical significance is determined by a $t$-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a $\chi^2$-test (point correlation). In this way, all statistically significant word pairs are identified for the correlation measure chosen by the user. Finally, the word network for that correlation measure is constructed from these word pairs, with no links between statistically insignificant word pairs. Network analysis is conducted on the word network built from the significant positive between-word correlations among all unique words. Word centrality measures, word trees, word chains, word modules, etc., can be calculated with the method. The Matlab software wordNetwork, which implements the method, is also provided.
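A rough Python sketch of the RRF counting step for one word pair, using Pearson correlation only, is given below. It is not the wordNetwork implementation. The frame-width bounds (`min_width`, `max_width`) are hypothetical parameters, and, to stay within the standard library, the two-sided p-value of the statistic $t = r\sqrt{(p-2)/(1-r^2)}$ is approximated with a normal distribution instead of the exact $t$ distribution, which is reasonable only for large $p$.

```python
import math
import random
from statistics import NormalDist

def rrf_counts(words, x, y, p, min_width, max_width, rng):
    """Place p reading frames of random width at random positions on the word
    vector and count the occurrences of x and y inside each frame."""
    n = len(words)
    pairs = []
    for _ in range(p):
        width = rng.randint(min_width, min(max_width, n))
        start = rng.randint(0, n - width)
        frame = words[start:start + width]
        pairs.append((frame.count(x), frame.count(y)))
    return pairs

def pearson(pairs):
    """Pearson correlation of the paired counts (0.0 if either count is constant)."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treated as no association
    return sxy / math.sqrt(sxx * syy)

def significant_positive(r, p, alpha=0.05):
    """True if r > 0 and the normal-approximated two-sided p-value is below alpha."""
    if abs(r) >= 1.0:
        return r > 0
    t = r * math.sqrt((p - 2) / (1 - r * r))
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return r > 0 and p_value < alpha
```

Running `rrf_counts` for every pair of unique words and keeping only the pairs for which `significant_positive` holds gives the edge list of the positive-correlation word network. The pairwise loop grows quadratically with the vocabulary size, so in practice the word vector would be scanned once per frame and all counts collected together.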

Network Analysis for Wikipedia

Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. The hyperlinked structure of Wikipedia and the ongoing, incremental editing process behind it make it an interesting and unexplored target domain for network analysis techniques.

The network of concepts in written texts

European Physical Journal B, 2006

Complex network theory is used to investigate the structure of meaningful concepts in the written texts of individual authors. Networks have been constructed after a two-phase filtering, in which words with little meaningful content are eliminated and all remaining words are reduced to their canonical form, without any number, gender or tense inflection. Each sentence in the text is added to the network as a clique. A large number of written texts have been scrutinised, and it is found that the texts have small-world as well as scale-free structures. The growth process of these networks has also been investigated, and a universal evolution of network quantifiers has been found among the set of texts written by distinct authors. Further analyses, based on shuffling procedures applied either to the texts or to the constructed networks, provide hints on the role played by the word frequency and sentence length distributions in shaping the network structure.
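One text-level shuffling procedure of the kind described can be sketched as follows. This is an assumed variant, not necessarily the authors' exact procedure: all filtered word tokens are permuted across the whole text while each sentence keeps its original length. Word frequencies and the sentence-length distribution are therefore preserved, but which words share a sentence becomes random, so comparing networks built from original and shuffled texts isolates the effect of those two distributions.

```python
import random

def shuffle_words_keep_lengths(sentences, seed=None):
    """Permute all word tokens across the text, keeping each sentence's length.

    `sentences` is a list of already-filtered word lists. Word frequencies and
    the sentence-length distribution are unchanged; co-occurrences are randomised.
    """
    rng = random.Random(seed)
    tokens = [w for s in sentences for w in s]
    rng.shuffle(tokens)
    shuffled, i = [], 0
    for s in sentences:
        shuffled.append(tokens[i:i + len(s)])
        i += len(s)
    return shuffled
```

The shuffled sentences can then be fed to the same clique-based construction as the original ones, and network quantifiers such as clustering or average path length compared between the two.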

Wikipedias as complex networks

arXiv (Cornell University), 2006

Wikipedia is a popular web-based encyclopedia edited freely and collaboratively by its users. In this paper we present an analysis of Wikipedias in several languages as complex networks. The hyperlinks pointing from one Wikipedia article to another are treated as directed links while the articles represent the nodes of the network. We show that many network characteristics are common to different language versions of Wikipedia, such as their degree distributions, growth, topology, reciprocity, clustering, assortativity, path lengths and triad significance profiles. These regularities, found in the ensemble of Wikipedias in different languages and of different sizes, point to the existence of a unique growth process. We also compare Wikipedias to other previously studied networks.
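Two of the characteristics listed, reciprocity and path length, can be sketched for a directed hyperlink graph given as (source article, target article) pairs. This is an illustrative implementation under stated assumptions: self-links are ignored for reciprocity, and the average path length is taken over ordered pairs (s, t) with t reachable from s, computed by breadth-first search from every node; unreachable pairs are skipped rather than counted as infinite.

```python
from collections import deque

def reciprocity(links):
    """Fraction of directed links u -> v for which v -> u also exists."""
    arcs = {(u, v) for u, v in links if u != v}
    if not arcs:
        return 0.0
    return sum(1 for u, v in arcs if (v, u) in arcs) / len(arcs)

def average_path_length(links):
    """Mean directed shortest-path length over reachable ordered pairs (s, t), s != t."""
    out = {}
    for u, v in links:
        out.setdefault(u, set()).add(v)
        out.setdefault(v, set())
    total = pairs = 0
    for s in out:
        dist = {s: 0}  # BFS distances from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For the links A -> B, B -> A, B -> C and C -> D, half of the links are reciprocated, and the average over the seven reachable pairs is 11/7. Running one BFS per node is fine for small samples; a full Wikipedia graph would call for sampling source nodes.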

Network of words

Artificial Life and Robotics, 2004

Chinese characters form a web of semantic and phonetic links between characters. These links can be observed as structural components in the ideographic structure of characters. We examined the structural principles underlying the semantic and phonetic network by studying the statistical properties of the network structure. We report preliminary results on the distribution of the number of links from components and between characters, along with other quantities.
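The two link counts mentioned, links from components to characters and links between characters, can be illustrated with a small sketch. The decomposition table below is a hypothetical toy sample (好 = 女 + 子, 妈 = 女 + 马, 字 = 宀 + 子, 吗 = 口 + 马), and, as a simplification, the sketch does not distinguish semantic from phonetic components; two characters are linked whenever they share any component.

```python
from collections import Counter
from itertools import combinations

def component_links(decompositions):
    """Number of characters each component links to (component -> character links)."""
    return Counter(c for comps in decompositions.values() for c in set(comps))

def character_network(decompositions):
    """Link two characters if they share at least one component."""
    by_component = {}
    for char, comps in decompositions.items():
        for c in set(comps):
            by_component.setdefault(c, set()).add(char)
    adj = {char: set() for char in decompositions}
    for chars in by_component.values():
        for a, b in combinations(sorted(chars), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Hypothetical toy decompositions, for illustration only.
TOY = {"好": ["女", "子"], "妈": ["女", "马"], "字": ["宀", "子"], "吗": ["口", "马"]}
```

In the toy data, 女, 子 and 马 each link to two characters, and 妈 is linked to 好 through the shared component 女 and to 吗 through 马. Histograms of `component_links(...)` values and of character degrees are the kind of distributions the abstract reports on.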