The website «Minority languages of Russia»: visualizing language data

Computational Linguistics and Intellectual Technologies: Russian Minority Languages on the Web: Descriptive statistics

The paper presents quantitative data about the web segments in minority languages of Russia. An ad hoc search procedure makes it possible to locate sites and social network pages that contain texts in a given language of Russia. According to our data, texts in at least 48 of the examined languages exist on the Internet. We compared the gathered statistics with data from Wikipedia and with the number of native speakers, and found that none of the "live" online data correlates well with the offline life of the language. Keywords: minority languages, web as a resource, social networks, sociolinguistics. Acknowledgements: we thank Kirill Reshetnikov and Daria Ignatenko for collecting lexical markers for some of the languages and Dmitry Granovsky for his assistance in proofreading the text.
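The comparison between online presence and speaker numbers described above can be illustrated with a rank correlation. The following is a minimal sketch, assuming Spearman's coefficient as the measure (the abstract does not say which statistic the authors used). The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data for four unnamed languages (NOT from the paper).
toy_speakers = [1000, 5000, 20000, 300000]
toy_web_pages = [40, 10, 500, 120]
toy_rho = spearman(toy_speakers, toy_web_pages)
```

A value of `toy_rho` near 1 would mean that languages with more speakers also tend to have more web pages; values near 0 indicate no monotone relation.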

A data-based classification of Slavic languages: Indices of qualitative variation applied to grapheme frequencies

2015

Ord's graph is a simple graphical method for displaying frequency distributions of data or theoretical distributions in the two-dimensional plane. Its coordinates are proportions of the first three moments, either empirical or theoretical. A modification of Ord's graph based on proportions of indices of qualitative variation is presented. This modification makes the graph applicable to categorical data as well. In addition, the indices are normalized to values between 0 and 1, which makes it possible to compare data sets divided into different numbers of categories. Both the original and the new graph are used to display grapheme frequencies in eleven Slavic languages. As the original Ord's graph requires numbers to be assigned to the categories, graphemes were ordered by decreasing frequency. Data were taken from parallel corpora, i.e., we work with grapheme frequencies from a Russian novel and its translations into ten other Slavic languages. Cluster analysis is then applied to the graph coordinates. While the original graph yields results which are not linguistically interpretable, the modification reveals meaningful relations among the languages. M. Koščová and J. Mačutek were supported by VEGA grant 2/0047/15.
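The two kinds of coordinates mentioned in this abstract can be sketched in a few lines. The example below assumes the usual formulation of Ord's coordinates, I = m2/m1 and S = m3/m2, where m1 is the mean, m2 the variance and m3 the third central moment of the rank distribution. For the index of qualitative variation it uses one widely known normalized index, K/(K-1) · (1 − Σ p_i²); the abstract does not say which indices the authors combine, so this choice and the function names are assumptions.

```python
def ord_coordinates(freqs):
    """Ord's (I, S) for frequencies ranked 1..K in decreasing order."""
    ordered = sorted(freqs, reverse=True)
    total = sum(ordered)
    probs = [f / total for f in ordered]
    # Ranks 1..K play the role of the numeric values of the categories.
    mean = sum(r * p for r, p in enumerate(probs, start=1))
    m2 = sum((r - mean) ** 2 * p for r, p in enumerate(probs, start=1))
    m3 = sum((r - mean) ** 3 * p for r, p in enumerate(probs, start=1))
    if m2 == 0:
        raise ValueError("variance is zero; S = m3/m2 is undefined")
    return m2 / mean, m3 / m2


def normalized_iqv(freqs):
    """Normalized index of qualitative variation, between 0 and 1."""
    k = len(freqs)
    if k < 2:
        raise ValueError("at least two categories are required")
    total = sum(freqs)
    sum_sq = sum((f / total) ** 2 for f in freqs)
    return k / (k - 1) * (1 - sum_sq)
```

Because the index is divided by its maximum for K categories, a uniform distribution scores 1 and a distribution concentrated in a single category scores 0, regardless of K. This is the property that makes alphabets of different sizes comparable.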

Tishkov V. A. Languages of a Nation // Herald of the Russian Academy of Sciences, 2016, Vol. 86, No. 2, pp. 64–76.

Herald of the Russian Academy of Sciences, 2016

Two postulates about the role of ethnic diversity and the fate of languages in the world are revised on the basis of Russian materials. The author makes the following conclusions: (a) the ethnic fragmentation of the population and language diversity of the countries in the world do not correlate directly with their levels of democracy, presence of conflicts, and economic success and (b) widely publicized predictions about the quick extinction of most languages in the world have turned out to be a myth, and international campaigns and declarations in support of endangered languages were excessively politicized. The process of revitalization of languages is under way; they are acquiring a higher status, acknowledgment, and support on the territory of the former Soviet Union, including the minority languages of the peoples of Dagestan, the North, and Siberia. The state policy of providing an official status for regional languages and the ethnic component of the federative system as ethnocultural autonomy for individual regions and ethnic communities play a key role. A list of endangered languages is given; motives and factors of assimilation in favor of the Russian language in Russia are explained. Categories such as mother tongue and national language, and the social practices based on them, are revised in favor of multiple and mutually nonexclusive approaches.

The World's Languages Explorer: Visual Analysis of Language Features in Genealogical and Areal Contexts

2012

This paper presents a novel Visual Analytics approach that helps linguistic researchers to explore the world's languages with respect to several important tasks: (1) The comparison of manually and automatically extracted language features across languages and within the context of language genealogy, (2) the exploration of interrelations among several of such features as well as their homogeneity and heterogeneity within subtrees of the language genealogy, and (3) the exploration of genealogical and areal influences on the features. We introduce the WORLD'S LANGUAGES EXPLORER, which provides the required functionalities in one single Visual Analytics environment. Contributions are made for different parts of the system: We introduce an extended Sunburst visualization whose so-called feature-rings allow for a cross-comparison of a large number of features at once, within the hierarchical context of the language genealogy. We suggest a mapping of homogeneity measures to all levels of the hierarchy. In addition, we suggest an integration of information from the areal data space into the hierarchical data space. With our approach we bring Visual Analytics research to a new application field, namely Historical Comparative Linguistics, and Linguistic and Areal Typology. Finally, we provide evidence of the good performance of our system in this area through two application case studies conducted by domain experts.
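The idea of mapping a homogeneity measure to every level of the genealogy can be sketched as a recursive pass over a tree. The example below is only an illustration under stated assumptions: it uses the share of the most frequent feature value among the languages under a node as the homogeneity score, which is not necessarily the measure used in the WORLD'S LANGUAGES EXPLORER. The dictionary layout, node names and feature values are hypothetical.

```python
from collections import Counter


def leaf_values(node):
    """Collect the feature values of all languages (leaves) under a node."""
    if "value" in node:
        return [node["value"]]
    values = []
    for child in node["children"]:
        values.extend(leaf_values(child))
    return values


def homogeneity_by_node(node, out=None):
    """Map each internal node name to the share of its most common value."""
    if out is None:
        out = {}
    if "children" in node:
        values = leaf_values(node)
        most_common_count = Counter(values).most_common(1)[0][1]
        out[node["name"]] = most_common_count / len(values)
        for child in node["children"]:
            homogeneity_by_node(child, out)
    return out


# Hypothetical toy genealogy with one word-order feature per language.
toy_tree = {
    "name": "family",
    "children": [
        {"name": "branch_1", "children": [
            {"name": "lang_a", "value": "SVO"},
            {"name": "lang_b", "value": "SVO"},
        ]},
        {"name": "branch_2", "children": [
            {"name": "lang_c", "value": "SOV"},
            {"name": "lang_d", "value": "SVO"},
        ]},
    ],
}
toy_scores = homogeneity_by_node(toy_tree)
```

In a Sunburst-style display, scores like these could be drawn on the ring segment of each subtree, so that homogeneous and heterogeneous branches stand out at every depth.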

Matrёška and Areal Clusters Involving Varieties of Slavic: On Methodology and Data Treatment

This article is based on the assumption that any “linguistic area is an arbitrarily defined areal subsample of the global sample that can be shown to deviate significantly from other areal subsamples or the global sample” (Wiemer/Wälchli 2012: 6, following Dahl 2001). In order to “see” significant deviance, however, one has to use a suitable background (in terms of geography and/or populations), and the granularity of the features must fit the scope of areal comparison. Furthermore, it proves problematic to point at ‘the typical representative of Slavic’ or to give substance to ‘common Slavic heritage’. This is demonstrated on the basis of convenience samples. The article also addresses the question of how useful circumstantialist descriptions of features encountered in different Slavic-speaking regions are, and it critically assesses some basic questions related to the quantification of inner-Slavic differentiation. The methodological claims made are substantiated by case studies dealing mainly with morphosyntax (participle-based constructions, use of adpositions and cases, so-called case syncretism, reflexive marking), but also, very briefly, with phonology (spirantization of /g/).

Interculturality in the Modern Russian Linguistic Landscape

Philological Class, 2021

The purpose of this article is to give a quick overview of intercultural tendencies in certain Russian regions’ modern linguistic landscapes: where they can be found, why languages other than Russian are used, what the purpose of their use is, and who uses them. The material for this study includes several thousand photos taken between 2010 and 2018 in different regions of Russia, representing advertising material and signboards where different languages and cultures meet. Methodologically, the photos were classified and analyzed according to the types of code-switching and hybrid structures appearing in and on them. Some history is given on the cities studied, as well as the state of the languages that are part of their linguistic repertory. A few particular situations are scrutinized, involving national republics and other areas where linguistic minorities exist (major cities, provinces, villages). A strong tendency for the use of foreign culture was evident in the findings all ov...

Cultural and Linguistic Minorities in the Russian Federation and the European Union

This is the first comprehensive volume to compare the sociolinguistic situations of minorities in Russia and in Western Europe. As such, it provides insight into language policies, ethnolinguistic vitality and the struggle for reversal of language shift, language revitalization and empowerment of minorities in Russia and the European Union. The volume shows that, even though largely unknown to a broader English-reading audience, the linguistic composition of Russia is by no means less diverse than multilingualism in the EU. It is therefore a valuable introduction to the historical backgrounds and current linguistic, social and legal affairs with regard to Russia’s manifold ethnic and linguistic minorities, mirrored in the discussion of recent issues in a number of well-known Western European minority situations. Keywords: Aboriginal culture in Northern Russia, Basque language, Finnic minorities of Ingria, Frisian, Global biodiversity in the early 21st century, Global extinction of languages, Languages in the Russian Federation, Latgalian, Linguistic Rights of National Groups, Minority languages and cultures, Scottish Gaelic, Sociolinguistic ethnolinguistic variation, Sorbian languages in Germany, Sámi languages in Finland, languages in Mari El, languages in Udmurtia, linguistic and cultural diversity, minority language speakers, revitalization of endangered languages, saami languages

'Invisible minorities' and 'hidden diversity' in Saint-Petersburg's linguistic landscape

Language and Communication, 2019

The article deals with the representation of labour migrants' languages in St. Petersburg's linguistic landscape. The data analyzed in the article were gathered through fieldwork (in 2016–2017) in different districts of the city. Communication between the majority and ethnic minorities is conducted only in Russian, both in official and in informal exchanges, such as between commercial agencies and non-Russian speakers. Even in places with no official regulation, the use of non-Russian languages is very rare and occurs predominantly within in-group communication. Only two languages, Chinese and Uzbek, are occasionally used in advertisements, and these target minorities exclusively. Both the official language policy and the attitudes of the ethnic majority tend to ignore the actual diversity of the city, maintaining an urban monolingual 'façade'. The full text is available at https://www.sciencedirect.com/science/article/pii/S0271530918302180?via=ihub

On New Text Corpora For Minority Languages On The Helsinki korp.csc.fi Server

2019

The korp.csc.fi server in Finland provides text corpora of multiple varieties for numerous languages large and small. The Korp infrastructure is developed by the Swedish Språkbanken at the University of Gothenburg, and the source code is released under the MIT license. The open nature of the system makes it easy to transfer to new environments, and there are already numerous Korp installations available. The one we discuss is maintained by the Language Bank of Finland.

A graphic representation of language distribution in multilingual societies

Most Europeans are monolingual, but multilingualism is a frequent phenomenon in other world regions, and there are important exceptions in Europe itself. Multilingualism arises in societies where different languages or language communities coexist in specific constellations. One such pattern of language distribution is diglossia, where languages (or varieties of one language) are chosen depending on the formality of the situation and the intimacy with the interlocutor. Alternatively, there can be segregation, where individual speakers or communities have a preferential language which they aim to use exclusively as long as they do not transgress community borders. The two parameters, register and population, allow for a graphic representation of different multilingual situations. The basic patterns are minorized languages, prestige languages, mixed patterns, unstable patterns, official languages as lingua franca, or several languages for formal purposes. Since the important works of Ferguson (1959), Fishman (1967), Stewart (1968), and others, the situation has changed drastically in some of Europe’s multilingual corners. We will look at the situation in Luxembourg, where the traditional pattern of language distribution has become unstable, and we will see an example from outside Europe, the ABC islands, where Papiamentu, Dutch, English and Spanish co-exist. My graphic representation aims at schematizing this complex interplay between linguistic varieties in multilingual societies for a general audience. Finally, a global outlook will show that languages can be classified into five groups with respect to their chances of being used in formal situations. In such a hierarchy, English is at the top, followed by a handful of prestigious languages, whereas thousands of “small” languages have no access at all to formal situations.