Sergei Monakhov | Friedrich-Schiller-Universität Jena

Papers by Sergei Monakhov

Complex Words as Shortest Paths in the Network of Lexical Knowledge

Cognitive Science, 2024

Lexical models diverge on the question of how to represent complex words. Under the morpheme‐based approach, each morpheme is treated as a separate unit, while under the word‐based approach, morphological structure is derived from complex words. In this paper, we propose a new computational model of morphology that is based on graph theory and is intended to elaborate the word‐based network approach. Specifically, we use a key concept of network science, the notion of shortest path, to investigate how complex words are learned, stored, and processed. The shortest path problem is the task of finding the shortest, or otherwise optimal, path connecting two non‐adjacent nodes in a network. Building on this notion, the current study shows (i) that new complex words can be segmented into morphemes through shortest path analysis; (ii) that attested English words tend to represent shortest paths in the morphological network; and (iii) that novel (unattested) words receive higher acceptability ratings in experiments when they are formed along established optimal paths. The model's performance is tested in two experiments with human participants as well as against behavioral data from the English Lexicon Project. We interpret our empirical results from the perspective of a usage‐based model of grammar and argue that network science provides a powerful tool for analyzing language structure.
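Point (i), segmenting a new complex word through shortest path analysis, can be illustrated with a generic shortest-path search. The sketch below is not the model from the paper: it assumes a hypothetical lexicon of known morphemes, treats positions in the string as nodes, adds an edge from position i to position j whenever `word[i:j]` is a known morpheme, and uses breadth-first search, so that the shortest path is the segmentation with the fewest morphemes (all edges weighted equally).

```python
from __future__ import annotations

from collections import deque


def segment(word: str, morphemes: set[str]) -> list[str] | None:
    """Split `word` into the fewest known morphemes, or return None.

    Nodes are string positions 0..len(word); an edge i -> j exists when
    word[i:j] is in `morphemes`. Breadth-first search finds a path from 0
    to len(word) with the fewest edges, i.e. the fewest morphemes.
    """
    n = len(word)
    prev: dict[int, int] = {0: -1}  # node -> predecessor in the BFS tree
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n:
            break
        for j in range(i + 1, n + 1):
            if j not in prev and word[i:j] in morphemes:
                prev[j] = i
                queue.append(j)
    if n not in prev:
        return None  # no path: the word cannot be covered by known morphemes
    parts: list[str] = []
    j = n
    while j > 0:  # walk predecessors back from the end of the word
        i = prev[j]
        parts.append(word[i:j])
        j = i
    return parts[::-1]


# Hypothetical toy lexicon, not data from the study.
lexicon = {"un", "kind", "ness", "help", "ful"}
print(segment("unkindness", lexicon))  # ['un', 'kind', 'ness']
print(segment("unhelpful", lexicon))   # ['un', 'help', 'ful']
```

Replacing the uniform edge weights with, for example, negative log frequencies and switching to Dijkstra's algorithm would favour frequent morphemes rather than merely fewer ones; the abstract does not say which weighting the paper uses.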

Parsability revisited and reassessed

Journal of Linguistics, 2024

This paper provides evidence that the inveterate way of assessing linguistic items' degrees of analysability by calculating their derivative-to-base frequency ratios may obfuscate the difference between two meaning-processing models: one based on the principle of compositionality and another on the principle of parsability. I propose to capture the difference between these models by estimating the ratio of two transitional probabilities for complex words: P(affix | base) and P(base | affix). When the transitional probabilities are comparably low, each of the elements entering into the combination is equally free to vary. The combination itself is judged by speakers to be semantically transparent, and its derivational element tends to be more linguistically productive. In contrast, multi-morphemic words that are characterised by greater discrepancies between transitional probabilities are similar to collocations in the sense that they also consist of a node (conditionally independent element) and a collocate (conditionally dependent element). Such linguistic expressions are also considered to be semantically complex but appear less transparent because the collocate's meaning does not coincide with the meaning of the respective free element (even if it exists) and has to be parsed out from what is available.
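The two transitional probabilities can be estimated from frequency counts of base-affix combinations. The following is a minimal sketch assuming a hypothetical table of token frequencies for (base, affix) pairs: P(affix | base) is the share of the base's combinations that take this affix, and P(base | affix) is the share of the affix's combinations that take this base. The ratio is oriented as P(affix | base) / P(base | affix) purely for illustration; the abstract does not specify the orientation or the corpus used.

```python
from __future__ import annotations

from collections import Counter


def transitional_probabilities(
    freqs: dict[tuple[str, str], int], base: str, affix: str
) -> tuple[float, float]:
    """Return (P(affix | base), P(base | affix)) from (base, affix) frequencies."""
    base_totals = Counter()   # total frequency of each base across all affixes
    affix_totals = Counter()  # total frequency of each affix across all bases
    for (b, a), f in freqs.items():
        base_totals[b] += f
        affix_totals[a] += f
    if base_totals[base] == 0 or affix_totals[affix] == 0:
        raise ValueError("base or affix not attested in the frequency table")
    joint = freqs.get((base, affix), 0)
    return joint / base_totals[base], joint / affix_totals[affix]


# Invented toy frequencies, not corpus data.
freqs = {
    ("kind", "ness"): 30,
    ("kind", "ly"): 50,
    ("dark", "ness"): 40,
    ("dark", "en"): 10,
    ("sad", "ness"): 30,
}
for base, affix in [("kind", "ness"), ("dark", "en")]:
    p_ab, p_ba = transitional_probabilities(freqs, base, affix)
    print(f"{base}+{affix}: P(affix|base)={p_ab:.3f}, "
          f"P(base|affix)={p_ba:.3f}, ratio={p_ab / p_ba:.2f}")
```

In this invented table, kind+ness has comparable probabilities (0.375 and 0.3, ratio 1.25), whereas dark+en shows a large discrepancy (0.2 and 1.0, ratio 0.2), the kind of asymmetry the abstract associates with collocation-like, less transparent words.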

How Complex Verbs Acquire Their Idiosyncratic Meanings

Language and Speech, 2023

Complex verbs with the same preverb/prefix/particle that is both linguistically productive and analyzable can be compositional as well as non-compositional in meaning. For example, the English particle on has compositional spatial uses (put a hat on) but also a non-spatial "continuative" use, where its semantic contribution remains the same across multiple verbs (we played / worked / talked on despite the interruption). Comparable examples can be given with German preverbs or Russian prefixes, which are the main data analyzed in the present paper. The preverbs/prefixes/particles that encode non-compositional, construction-specific senses have been extensively studied; however, it is still far from clear how their semantic idiosyncrasies arise. Even when one can identify the contribution of the base, it is counterintuitive to assign the remaining sememes to the preverb/prefix/particle part. Therefore, on the one hand, there seems to be an element without meaning, and on the other, there is a word sense that apparently comes from nowhere. In this article, I suggest analyzing compositional and non-compositional complex verbs as instantiations of two different types of constructions: one with an open slot for the preverb/prefix/particle and a fixed base verb, and another with a fixed preverb/prefix/particle and an open slot for the base verb. Both experimental and corpus evidence supporting this decision is provided for Russian data. I argue that each construction implies its own meaning-processing model and that the actual choice between the two can be predicted by taking into account the discrepancy between the probabilities of transition from preverb/prefix/particle to base and from base to preverb/prefix/particle.

Terminological subsystems of modern Russian school textbooks: A study based on Word2Vec and neural networks

Journal of Applied Linguistics and Lexicography, 2021

The article reports the results of a study that explored the inventory and functioning of scientific terms and special lexemes in textbooks for Russia's secondary schools. The toolset included modern methods of natural language processing and deep learning. The number of terms from different fields of knowledge that a secondary school student should learn has never been evaluated. According to preliminary evaluations based on the Model Basic Curriculum for General and Secondary Education 2015, a secondary school leaver is supposed to be able to understand, recognise and use about 1,000 terms and terminological combinations in the subject Russian Language alone. Thus, taking into account the number of school subjects, the total number of special vocabulary items studied in general education schools runs into the thousands. At the same time, the comparative characteristics of the inventory and functioning of terms in textbooks for different school subjects are under-scrutinized and remain unknown. Besides, it is unclear how the terminological density of school textbooks for different subjects correlates with the place occupied by these subjects in the curriculum.

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Scientific Data, 2020

Advances in computer-assisted linguistic research have been greatly influential in reshaping the field. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, impose high demands for rigour in preparing and curating datasets. Here we present the Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing stude...
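At its core, a colexification count records, for each pair of concepts, in how many languages a single word form expresses both. The sketch below illustrates only that counting step on a tiny hand-made wordlist of (language, concept, form) rows; it is not the CLICS software or data pipeline, and the function name `count_colexifications` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def count_colexifications(rows):
    """Map each concept pair to the number of languages that colexify it.

    `rows` is an iterable of (language, concept, form) triples. Two concepts
    colexify in a language when that language uses the same form for both.
    """
    concepts_by_form = defaultdict(set)
    for language, concept, form in rows:
        concepts_by_form[(language, form)].add(concept)

    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # sorted() gives each unordered concept pair one canonical key
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


# Hand-made toy wordlist with simplified forms, not CLICS data.
rows = [
    ("Russian", "HAND", "ruka"), ("Russian", "ARM", "ruka"),
    ("Russian", "TREE", "derevo"), ("Russian", "WOOD", "derevo"),
    ("Polish", "HAND", "ręka"), ("Polish", "ARM", "ręka"),
    ("English", "HAND", "hand"), ("English", "ARM", "arm"),
    ("English", "TREE", "tree"), ("English", "WOOD", "wood"),
]
print(count_colexifications(rows))  # {('ARM', 'HAND'): 2, ('TREE', 'WOOD'): 1}
```

Treating concept pairs as weighted edges (weight = number of languages) turns these counts into a colexification network; a real analysis would also need to handle form normalisation and language-family sampling, which this sketch ignores.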

Acquisition of demonstratives in cross-linguistic perspective

Journal of Child Language, 2022

This paper examines the acquisition of demonstratives (e.g., that, there) from a cross-linguistic perspective. Although demonstratives are often said to play a crucial role in L1 acquisition, there is little systematic research on this topic. Using extensive corpus data of spontaneous child speech, the paper investigates the emergence and development of demonstratives in three European (English, French, Spanish) and four non-European languages (Japanese, Chinese, Hebrew, Indonesian) between ages 1;0 and 6;0. The data show that, across languages, demonstratives are among the earliest and most frequent child words, but their frequency decreases with age and MLU. As children grow older, they tend to use other types of referring terms (e.g., anaphoric pronouns) and other types of spatial expressions (e.g., adpositions). Considering these results, we hypothesize that children shift from using a body-oriented strategy of deictic communication to more abstract and disembodied strategies of encoding reference and space during the preschool years.

Understanding Troll Writing as a Linguistic Phenomenon

The current study yielded a number of important findings. We managed to build a neural network that achieved an accuracy score of 91 per cent in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more likely to be labelled correctly and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in the skewed distribution of many different language parameters of troll messages. Taking as an example the distribution of topics and the vocabulary associated with those topics, we showed some very pronounced distributional anomalies, thus confirming our prediction.

Collective language creativity as a trade-off between priming and antipriming

PLOS ONE, 2021

It is now a matter of scientific consensus that priming, a recency effect of activation in memory, has a significant impact on language users' choice of linguistic means. However, it has long remained unclear how priming effects coexist with the creative aspect of language use, and the importance of the latter has been somewhat downplayed. By introducing the results of two experiments, with English and Russian native speakers, this paper seeks to explain the mechanisms that balance priming and language creativity. In study 1, I discuss the notion of collective language creativity, which I understand as the product of two interacting factors: cognitive priming effects and the unsolicited desire of discourse participants to be linguistically creative, that is, to say what one wants to say using words that have not yet been used. In study 2, I explore how priming and antipriming effects work together to produce collective language creativity. By means of cluster analysis and Bayesian network modelling, I show that patterns of repetition in both languages differ drastically depending on whether or not participants could see what others had written before them while composing their messages.

New Method of Automated Terminology Extraction: Case Study of Russian-Language Textbooks

Lecture Notes in Networks and Systems

How analysis of mobile app reviews problematises linguistic approaches to internet troll detection

Humanities and Social Sciences Communications, 2021

State-sponsored internet trolls repeat themselves in a unique way. They have a small number of messages to convey, but they have to convey them multiple times. Understandably, they are afraid of being repetitive because that will inevitably lead to their identification as trolls. Hence, their only possible strategy is to keep diluting their target message with ever-changing filler words. That is exactly what makes them so susceptible to automatic detection. One serious challenge to this promising approach is posed by the fact that the same troll-like effect may arise as a result of collaborative repatterning that is not indicative of any malevolent practices in online communication. The current study addresses this issue by analysing more than 180,000 app reviews written in English and Russian and verifying the obtained results in an experimental setting where participants were asked to describe the same picture in two experimental conditions. The main finding of the study is that both observational and experimental samples became less troll-like as the time distance between their elements increased. Their 'troll coefficient', calculated as the ratio of the proportion of repeated content words among all content words to the proportion of repeated content word pairs among all content word pairs, was found to be a function of the time distance between separate individual contributions. These findings definitely render the task of developing efficient linguistic algorithms for internet troll detection more complicated. However, the problem can be alleviated by our ability to predict what the troll coefficient of a certain group of texts would be if it depended solely on these texts' creation time.
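The troll coefficient described above can be computed directly from counts. Because the abstract does not fix every detail, the sketch below makes explicit, hypothetical assumptions: content words are lowercase alphabetic tokens outside a small stop-word list; "content word pairs" are unordered pairs of distinct content words co-occurring in the same text; and a word or pair counts as repeated if it occurs more than once in the whole collection, with proportions taken over occurrences. The paper's own tokenisation and pairing may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical minimal stop-word list; a real study would use a proper one.
STOP_WORDS = {"a", "about", "an", "and", "for", "in", "is", "of", "the", "to"}


def content_words(text):
    """Lowercase alphabetic tokens that are not stop words (an assumption)."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]


def repeated_share(counts):
    """Share of all occurrences that belong to items occurring more than once."""
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0


def troll_coefficient(texts):
    """Repeated-word share divided by repeated-pair share across `texts`."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        words = content_words(text)
        word_counts.update(words)
        # each unordered pair of distinct content words, counted once per text
        pair_counts.update(combinations(sorted(set(words)), 2))
    word_share = repeated_share(word_counts)
    pair_share = repeated_share(pair_counts)
    if pair_share == 0:
        # ratio undefined: infinite if words recur, otherwise nothing repeats
        return float("inf") if word_share > 0 else 0.0
    return word_share / pair_share


texts = [
    "the bridge project is great",
    "great news about the bridge",
    "local bridge news today",
]
print(round(troll_coefficient(texts), 2))  # 2.1
```

Here 7 of 10 content-word occurrences are repeated but only 4 of 12 pair occurrences are, giving 0.7 / (1/3) ≈ 2.1. Higher values mean that words recur while their combinations do not, which is the pattern of diluting a target message with ever-changing filler words that the abstract describes.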

A Study of the Terminological Subsystems of Modern Russian-Language School Textbooks Using the Word2Vec Model for the Semantic Analysis of Natural Languages

Journal of Applied Linguistics and Lexicography, 2021

The aim of the study, whose first results are presented in this article, is to analyse the composition and functioning of terminological vocabulary in textbooks for secondary schools of the Russian Federation using the methods and tools of computational linguistics. The number of terms from different fields of knowledge that a student must master during secondary school has never been estimated. According to preliminary counts based on the Model Basic Curriculum for General and Secondary Education 2015, for the subject "Russian Language" alone, a student in grades 5–11 must understand, recognise and be able to use about 1,000 terms and terminological combinations from this field of knowledge. Thus, taking into account the number of school subjects, the total number of special vocabulary items studied in general education schools runs into the thousands. At the same time, the comparative characteristics of the composition and functioning of terms in textbooks for different school subjects have not been studied and remain unknown. The correlation between the terminological density of textbook texts in different subjects and the place these subjects occupy in curricula is also unclear. The traditional way of extracting terms from specialised texts is to read through them and compile the relevant lists "by hand". Although this method is reliable as far as the informed application of selection principles is concerned, it scales poorly to large datasets and reflects neither the frequency of terms, nor the specifics of their syntagmatic links, nor the systemic relations between terms formed by their collocational behaviour.
The project involves building a full-text corpus of school textbooks for grades 5–11 included in the Federal List of the Ministry of Education of the Russian Federation; automatically extracting and stratifying terms using methods of distributional semantics; and creating and training a deep neural network that, given a group of vector representations of terms as input, can identify the school subject, the level of study and the topic. The results of the study may be of theoretical interest for the development of terminology studies and may find practical application in the creation of school textbooks of various types.
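Distributional methods such as Word2Vec represent each term as a vector, so that terms used in similar contexts end up close together, and closeness is usually measured with cosine similarity. The sketch below shows only that comparison step, with hand-written three-dimensional vectors standing in for trained embeddings; the terms, vectors and the helper `nearest_terms` are hypothetical, and no Word2Vec training is performed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(term, vectors, k=1):
    """Return the k terms whose vectors are most similar to `term`'s vector."""
    target = vectors[term]
    scored = [
        (other, cosine(target, vec)) for other, vec in vectors.items() if other != term
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hand-written stand-ins for trained term embeddings (hypothetical values).
vectors = {
    "photosynthesis": [0.9, 0.1, 0.0],
    "chlorophyll": [0.8, 0.2, 0.1],
    "integral": [0.0, 0.9, 0.3],
    "derivative": [0.1, 0.8, 0.4],
}
print(nearest_terms("photosynthesis", vectors))  # ['chlorophyll']
print(nearest_terms("integral", vectors))        # ['derivative']
```

Grouping terms by such similarities hints at subject areas (biology versus mathematics in the toy data); the project described above goes further and trains a deep neural network to predict subject, level and topic, which this sketch does not reproduce.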

Russian prefixed verbs as constructional schemas

Russian Linguistics, 2021

This study tests the morphological gradience theory on Russian prefixed verbs. With the help of a specially designed experiment, in which participants were asked to evaluate the semantic transparency of a prefixed nonce verb given in minimal context, as well as to semanticise it by suggesting an existing Russian verb with the same prefix, we offer evidence that these verbs can be analysed as constructional schemas and that the degree of their morphological decomposition depends upon the different levels of activation of their sequential and lexical links. We prove that speakers of Russian are very sensitive to the etymological connection between verb prefixes and the prepositions they are related to. Thus, prefix-stem constructions with prefixes that correspond to prepositions are more likely to be morphologically decomposed, while prefix-stem constructions with prefixes that do not relate to prepositions tend to be regarded as single lexical units. Moreover, the general, highly abstract semantics of Russian prefix-stem constructions, especially of those that retain their 'prepositional' meaning, is undoubtedly accessible to language users, as confirmed by the fact that the interpretability of these constructions is affected by priming.

Early detection of internet trolls: Introducing an algorithm based on word pairs / single words multiple repetition ratio

PLOS ONE, 2020

Troll internet messages, especially those posted on Twitter, have recently been recognised as a very powerful weapon in hybrid warfare. Hence, an important task for the academic community is to provide a tool for identifying internet troll accounts as quickly as possible. At the same time, this tool must be highly accurate so that its employment will not violate people's rights or restrict freedom of speech. Though such a task can be effectively fulfilled on purely linguistic grounds, as yet very little work has been done that could help to explain the discourse-specific features of this type of writing. In this paper, we suggest a quantitative measure for identifying troll messages which is based on taking into account certain sociolinguistic limitations of troll speech, and discuss two algorithms that both require as few as 50 tweets to establish the true nature of the tweets, whether 'genuine' or 'troll-like'.

Understanding Troll Writing as a Linguistic Phenomenon

Intelligent Systems and Applications, 2021

The current study yielded a number of important findings. We built a neural network that achieved an accuracy score of 91% in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more likely to be labelled correctly and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in the skewed distribution of language parameters of troll messages. Taking as an example the distribution of topics and the vocabulary associated with them, we showed some very pronounced distributional anomalies, thus confirming our prediction.

One mechanism of Russian poetic language

Journal of Applied Linguistics and Lexicography, 2020

Traditionally, the phenomenon of the semantic aura of the verse metre was regarded exclusively as historically determined; the question of a potential synaesthesia (the imitative potential possessed by the rhythmic structure of a poetic text) was essentially disregarded. This paper aims to approach the problem of "metre and meaning" from the perspective of the possible actualisation of certain language forms in the metrical structures of binary and ternary metres; in other words, to analyse how the metrical nature of verse determines its basic semantic model. We have come to the conclusion that the fundamental difference between Russian binary and ternary metres lies in the level of rhythmical prominence of metrically dual words, the majority of which are pronouns. The very structure of binary metres suggests a constant possibility for pronouns to be in proximity to an unstressed syllable and to receive more or less heavy stress. In ternary metres, pronouns find themselves inside the circle of metrical stresses and, being inevitably adjacent to either the preceding or the following stress, lose their accent and are swallowed in pronunciation. This, in turn, weakens the deictic and anaphoric functions of language and undermines the established logic of textual development. That is where different, i.e., poetic, mechanisms of creating meaning come to the fore. Ternary metres put rhythmic stress on notional words, creating between them, in accordance with the law of poetic analogy and via the omission of intermediary elements, linguistically unpredictable associations; binary metres emphasise semi-notional and functional words, stressing the logical and grammatical order of text development.

English and Russian Genitive Alternations: A Study in Construction Typology

Russian Journal of Linguistics, 2020

There is little doubt that one of the most important areas of future research within the framework of Construction Grammar will be the comparative study of constructions in different languages of the world. One significant gain that modern Construction Grammar can make thanks to the cross-linguistic perspective is finding a clue to some contradictory cases of construction alternation. The aim of the present paper is to communicate the results of a case study of two pairs of alternating constructions in English and Russian: the s-genitive (SG) and of-genitive (OG) in English, and noun + noun in the genitive case (NNG) and relative adjective derived from a noun + noun (ANG) in Russian. Long years of elaborate scientific analysis have not yielded any universally accepted view on the problem of English genitive alternation. There are at least five different accounts of this problem: the hypotheses of the animacy hierarchy, the given-new hierarchy, the topic-focus hierarchy, the end-weight principle, and two semantically distinct constructions. We hypothesised that in this case a comparison of the distribution of the two English and the two Russian genitives could be insightful. The analysis involved two consecutive steps. First, we established the inter-language comparability of the two pairs of constructions in English and Russian. Second, we tested the similarity of the intra-language distribution of each pair of constructions from the perspective of the animacy hierarchy. For these two purposes, two types of corpora were used: (1) a translation corpus consisting of original texts in one language and their translations into one or more languages; and (2) national corpora consisting of original texts in the two respective languages. It was established that in both languages, the choice between members of an alternating pair is governed by the rules of animacy hierarchisation.
Additionally, it was possible to disprove the idea that the animacy hierarchy is necessarily based on the linearisation hierarchy. The two Russian constructions are typologically aligned with their English counterparts not on the grounds of the linear order of head and modifier but on the grounds of structural similarity. The English SG and Russian NNG constructions are diametrically opposed in terms of word order. However, they reveal the same underlying structure of the inflectional genitive, as contrasted with the analytical genitive of the Russian ANG and the English OG. These findings speak strongly in favour of the animacy hierarchy account of English genitive alternation.

«Горе от ума» [Woe from Wit] by Alexander Griboedov in the Light of Debates about Russian Ballads in the 1810–1820s

In his poetic practice, Griboedov successfully followed the principles he introduced in his critical response to Gnedich. To be precise, he used all the available literary context as a resource to describe characters and situations. It is interesting to analyse the connection between the different ballad plots represented in this comedy. Griboedov takes the image of a sentimental man from «Людмила» [Lyudmila], the motif of a dead man in a dream from «Светлана» [Svetlana], the motif of playing music at night from «Эолова арфа» [The Aeolian Harp], and the motifs of a «ball of the dead» from a wide range of sources. This brings us to the conclusion that the scheme of the opposition between the Archaists and the Innovators offered by Tynianov is an extreme simplification that does more harm than good. Facts are presented so as to give the impression that the debate at the beginning of the 19th century was about different genres and words: the Archaists liked one genre, while the Innovators liked another. Eventually, one gets the impression that the development of Russian literature was determined by Katenin's «летучая сволочь» [flying rabble], which is as absurd as the idea, mocked by Leo Tolstoi, that the outcome of the Battle of Borodino was predetermined by a valet who forgot to give Napoleon his warm boots. In fact, as we can see, it was not certain genres or expressions that Griboedov objected to. He objected to the very fact of separating the word from the subject it defined by some sort of stylistic prism.

The Poetics of Disharmonic Imprecision: N. A. Nekrasov's Cycle of Satires «О погоде» [On the Weather]

The article focuses on one characteristic feature of N. A. Nekrasov's poetics that most attracted the attention of Formalist scholars, who saw in it a distinctive reinterpretation of the stylistic forms of the poetry of the first third of the 19th century. As our observations on the cycle of satires «О погоде» [On the Weather] show, Nekrasov's parodying of the templates of earlier literary traditions is merely a consequence of the poet's characteristic principle of thematic-expressive text construction. The need to return again and again, as the lyrical plot unfolds, to words of this kind while avoiding overt repetition forces Nekrasov to search for synonymous and periphrastic constructions, including ones belonging to different stylistic layers. It is easy to see why Nekrasov's verse is strewn with fragments of different stylistic formations. Whereas in the literary systems preceding Nekrasov's the text unfolds vertically, through the selection of words and phrases assigned to a particular style or genre, in Nekrasov the structure is organised horizontally, along the line of the expressive colouring of expressions. As a result, Nekrasov's text, as it were, cuts across the separate verticals of earlier poetic epochs, presenting them in a single cross-section: the negative expressive colouring of the word.

On the Vertical Rhythm of N. A. Nekrasov's Ternary Metres

The paper analyses the principles of vertical (within-the-stanza) rhythmic organisation of the Russian poet N. Nekrasov’s ternary trimeters. The problem is formulated as follows: are there any distinct patterns in the arrangement of word boundaries in different lines of a quatrain and, if so, how do they correspond to the arrangement of word boundaries within a line?

The frequency distribution of the simplest, paired, vertical combinations of word boundaries dependent on the rhyme scheme (in pairs of odd and even lines of an AbAb quatrain) reveals a contrasting opposition between Nekrasov’s anapest and amphibrach; it encompasses both the tendencies in the distribution of different types of word boundaries and the strong positions where these tendencies are actualized.

Thus, the feminine and the dactylic word boundaries, on the one hand, and the masculine word boundary, on the other hand, are opposed in the first strong position of anapest and the second strong position of amphibrach: 1) the number of feminine and dactylic word boundaries in anapest increases in the second line of each pair, while the number of masculine word boundaries decreases; 2) the number of feminine and dactylic word boundaries in amphibrach, on the contrary, decreases in the second line of each pair, while the number of masculine word boundaries increases.

In the second strong position of anapest and the first strong position of amphibrach three different patterns can be distinguished: 1) the distribution of “foot-constituting” word boundaries – the masculine in anapest and the feminine in amphibrach – is governed by the same principle: they are less frequent in the third line of the quatrain than in the first one and more frequent in the fourth line of the quatrain than in the second one; 2) the distribution of dactylic word boundaries in anapest and masculine word boundaries in amphibrach is governed by the same principle: their frequency decreases in the second lines of both pairs in the quatrain; 3) the distribution of feminine word boundaries in anapest and dactylic word boundaries in amphibrach is governed by the same principle: their frequency increases in the second lines of both pairs in the quatrain.

These tendencies suggest that the vertical (within-the-stanza) arrangement of word boundaries in Nekrasov’s anapest and amphibrach reproduces, at a higher, stanzaic level, the same “double-pitched” rhythmic movement that was earlier identified at the horizontal (within-the-line) level: in accordance with it, the first word boundary determines the beginning of the line (anapest uses the masculine one, amphibrach the feminine / dactylic one), while the second word boundary is contrasted with the first on the grounds of “ascending” and “descending” rhythmical movement (anapest uses the feminine / dactylic one, amphibrach the masculine one) and tends to coincide with a clausula.

In other words, the word-boundary rhythms of line and stanza in Nekrasov’s poetry are homogeneous.

Genre and Style Models in N. V. Gogol's Poem «Мертвые души» [Dead Souls]

The article characterises the principles of spatio-temporal organisation in the worlds of Gogol's five landowners. It concludes that the preceding literary tradition on which Gogol drew did not so much predetermine the genre and stylistic principles of the poem as enter it as an object of description, in other words, as the most important, space-forming characteristic of the worlds created in the poem. It is suggested that Gogol arranged the estates visited by Chichikov in accordance with the chronological order in which the "genre-and-style models" governing these estates succeeded one another in Russian literature (counting from the beginning of the 19th century): Manilov, the sentimental model; Korobochka, the idyllic; Nozdryov, the Romantic; Sobakevich, the folk-epic; Plyushkin, the naturalistic.


Research paper thumbnail of How Complex Verbs Acquire Their Idiosyncratic Meanings

Language and Speech, 2023

Complex verbs with the same preverb/prefix/particle that is both linguistically productive and analyzable can be compositional as well as non-compositional in meaning. For example, the English on has compositional spatial uses (put a hat on) but also a non-spatial "continuative" use, where its semantic contribution is consistent with multiple verbs (we played / worked / talked on despite the interruption). Comparable examples can be given with German preverbs or Russian prefixes, which are the main data analyzed in the present paper. The preverbs/prefixes/particles that encode non-compositional, construction-specific senses have been extensively studied; however, it is still far from clear how their semantic idiosyncrasies arise. Even when one can identify the contribution of the base, it is counterintuitive to assign the remaining sememes to the preverb/prefix/particle part. Therefore, on the one hand, there seems to be an element without meaning, and on the other, there is a word sense that apparently comes from nowhere. In this article, I suggest analyzing compositional and non-compositional complex verbs as instantiations of two different types of constructions: one with an open slot for the preverb/prefix/particle and a fixed base verb, and another with a fixed preverb/prefix/particle and an open slot for the base verb. Both experimental and corpus evidence supporting this decision is provided for Russian data. I argue that each construction implies its own meaning-processing model and that the actual choice between the two can be predicted by taking into account the discrepancy between the probability of transition from preverb/prefix/particle to base and the probability of transition from base to preverb/prefix/particle.
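Both transitional probabilities can be estimated directly from corpus counts of prefix-base combinations. The sketch below is only an illustration. The token list is hypothetical. The discrepancy is expressed here as the ratio P(prefix | base) / P(base | prefix). That choice is an assumption: it follows the ratio-based formulation of the Parsability paper listed on this page, not a formula stated in this abstract.

```python
from collections import Counter


def transitional_probabilities(pairs):
    """Estimate P(prefix | base) and P(base | prefix) for every attested pair.

    `pairs` holds one (prefix, base) tuple per corpus token of a prefixed verb.
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    prefix_counts = Counter(prefix for prefix, _ in pairs)
    base_counts = Counter(base for _, base in pairs)
    return {
        (prefix, base): (n / base_counts[base], n / prefix_counts[prefix])
        for (prefix, base), n in pair_counts.items()
    }


def probability_ratio(p_prefix_given_base, p_base_given_prefix):
    """Ratio of the two transitional probabilities; 1.0 means no discrepancy."""
    return p_prefix_given_base / p_base_given_prefix


# Hypothetical token counts, for illustration only.
tokens = ([("за", "петь")] * 3 + [("по", "петь")] * 1
          + [("за", "играть")] * 2 + [("по", "работать")] * 4)
probabilities = transitional_probabilities(tokens)
```

In this toy sample, "за" + "петь" gives P(prefix | base) = 0.75 and P(base | prefix) = 0.6, a ratio of 1.25. Pairs whose ratio lies far from 1 would be the collocation-like, conditionally dependent cases.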

Research paper thumbnail of Terminological subsystems of modern Russian school textbooks: A study based on Word2Vec and neural networks

Journal of Applied Linguistics and Lexicography, 2021

The article reports the results of a study that explored the inventory and functioning of scientific terms and special lexemes in textbooks for Russia's secondary schools. The toolset included modern methods of natural language processing and deep learning. The number of terms from different fields of knowledge that a secondary school student should learn has never been evaluated. According to preliminary estimates based on the Model Basic Curriculum for General and Secondary Education 2015, a secondary school leaver is supposed to be able to understand, recognise and use about 1,000 terms and terminological combinations in the subject Russian Language alone. Thus, taking into account the number of school subjects, the total amount of specialised vocabulary studied in general education schools runs into the thousands. At the same time, the comparative characteristics of the inventory and functioning of terms in textbooks for different school subjects are under-scrutinised and remain unknown. Besides, it is unclear how the terminological density of school textbooks for different subjects correlates with the place these subjects occupy in the curriculum.

Research paper thumbnail of The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Scientific Data, 2020

Advances in computer-assisted linguistic research have been greatly influential in reshaping the field. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, also raise the bar for rigour in preparing and curating datasets. Here we present the Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing stude...
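The core notion is simple to operationalise: two concepts are colexified in a language when a single word form expresses both. The Python sketch below is a minimal version of that idea. It assumes each language is given as a list of (concept, form) entries, and the word lists are simplified illustrations. It is not the CLICS pipeline, which also involves concept standardisation and the construction of colexification networks.

```python
from collections import defaultdict
from itertools import combinations


def count_colexifications(wordlists):
    """Count in how many languages each pair of concepts is colexified.

    `wordlists` maps a language name to a list of (concept, form) entries;
    two concepts are colexified in a language when they share a form.
    """
    languages_per_pair = defaultdict(set)
    for language, entries in wordlists.items():
        concepts_by_form = defaultdict(set)
        for concept, form in entries:
            concepts_by_form[form].add(concept)
        for concepts in concepts_by_form.values():
            # Sorting makes ("ARM", "HAND") and ("HAND", "ARM") the same key.
            for pair in combinations(sorted(concepts), 2):
                languages_per_pair[pair].add(language)
    return {pair: len(langs) for pair, langs in languages_per_pair.items()}


# Simplified, illustrative word lists.
wordlists = {
    "Russian": [("ARM", "рука"), ("HAND", "рука"), ("TREE", "дерево"), ("WOOD", "дерево")],
    "Ukrainian": [("ARM", "рука"), ("HAND", "рука")],
    "Spanish": [("ARM", "brazo"), ("HAND", "mano"), ("TREE", "árbol"), ("WOOD", "madera")],
}
colexifications = count_colexifications(wordlists)
```

Counting languages rather than word forms per concept pair makes the figure a cross-linguistic frequency, which is the quantity a colexification network is weighted by.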

Research paper thumbnail of Acquisition of demonstratives in cross-linguistic perspective

Journal of Child Language, 2022

This paper examines the acquisition of demonstratives (e.g., that, there) from a cross-linguistic perspective. Although demonstratives are often said to play a crucial role in L1 acquisition, there is little systematic research on this topic. Using extensive corpus data of spontaneous child speech, the paper investigates the emergence and development of demonstratives in three European (English, French, Spanish) and four non-European languages (Japanese, Chinese, Hebrew, Indonesian) between the ages of 1;0 and 6;0. The data show that, across languages, demonstratives are among the earliest and most frequent child words, but their frequency decreases with age and MLU. As children grow older, they tend to use other types of referring terms (e.g., anaphoric pronouns) and other types of spatial expressions (e.g., adpositions). Considering these results, we hypothesize that children shift from using a body-oriented strategy of deictic communication to more abstract and disembodied strategies of encoding reference and space during the preschool years.
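The abstract relates demonstrative frequency to MLU (mean length of utterance). As a rough illustration of the two measures, the sketch below computes MLU in words together with the share of demonstrative tokens in a sample of transcribed utterances. These are simplifications, not the study's coding scheme. Classic MLU counts morphemes rather than words. The demonstrative inventory, the tokeniser and the sample utterances are hypothetical.

```python
# Hypothetical English demonstrative inventory, for illustration only.
DEMONSTRATIVES = {"this", "that", "these", "those", "here", "there"}


def tokenize(utterance):
    """Lower-case whitespace tokenisation with basic punctuation stripped."""
    words = (word.strip(".,!?").lower() for word in utterance.split())
    return [word for word in words if word]


def mlu_in_words(utterances):
    """Mean length of utterance in words (not morphemes), over non-empty utterances."""
    lengths = [len(tokenize(u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths)


def demonstrative_rate(utterances):
    """Proportion of word tokens that belong to DEMONSTRATIVES."""
    tokens = [t for u in utterances for t in tokenize(u)]
    return sum(t in DEMONSTRATIVES for t in tokens) / len(tokens)


sample = ["that!", "there ball", "I want that one."]
```

Computed per recording session, the two values can be plotted against each other to check whether demonstrative rate falls as MLU rises.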

Research paper thumbnail of Understanding Troll Writing as a Linguistic Phenomenon

The current study yielded a number of important findings. We built a neural network that achieved an accuracy of 91 per cent in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more susceptible to correct labelling and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in a skewed distribution of many different language parameters of troll messages. Taking as an example the distribution of topics and of the vocabulary associated with those topics, we showed some very pronounced distributional anomalies, thus confirming our prediction.

Research paper thumbnail of Collective language creativity as a trade-off between priming and antipriming

PLOS ONE, 2021

It is now a matter of scientific consensus that priming, a recency effect of activation in memory, has a significant impact on language users' choice of linguistic means. However, it has long remained unclear how priming effects coexist with the creative aspect of language use, and the importance of the latter has been somewhat downplayed. By introducing the results of two experiments with English and Russian native speakers, this paper seeks to explain the mechanisms that balance priming and language creativity. In study 1, I discuss the notion of collective language creativity, which I understand as the product of two interacting factors: cognitive priming effects and the unsolicited desire of discourse participants to be linguistically creative, that is, to say what one wants to say using words that have not yet been used. In study 2, I explore how priming and antipriming effects work together to produce collective language creativity. By means of cluster analysis and Bayesian network modelling, I show that patterns of repetition in both languages differ drastically depending on whether or not participants in the experiment could see what others had written before them while communicating their own messages.

Research paper thumbnail of New Method of Automated Terminology Extraction: Case Study of Russian-Language Textbooks

Lecture Notes in Networks and Systems

Research paper thumbnail of How analysis of mobile app reviews problematises linguistic approaches to internet troll detection

Humanities and Social Sciences Communications, 2021

State-sponsored internet trolls repeat themselves in a unique way. They have a small number of messages to convey, but they have to convey them multiple times. Understandably, they are afraid of being repetitive, because that would inevitably lead to their identification as trolls. Hence, their only possible strategy is to keep diluting their target message with ever-changing filler words. That is exactly what makes them so susceptible to automatic detection. One serious challenge to this promising approach is that the same troll-like effect may arise from collaborative repatterning that is not indicative of any malevolent practices in online communication. The current study addresses this issue by analysing more than 180,000 app reviews written in English and Russian and by verifying the results in an experimental setting where participants were asked to describe the same picture in two experimental conditions. The main finding of the study is that both observational and experimental samples became less troll-like as the time distance between their elements increased. Their 'troll coefficient', calculated as the ratio of the proportion of repeated content words among all content words to the proportion of repeated content word pairs among all content word pairs, was found to be a function of the time distance between separate individual contributions. These findings make the task of developing efficient linguistic algorithms for internet troll detection considerably more complicated. However, the problem can be alleviated by our ability to predict what the troll coefficient of a given group of texts would be if it depended solely on these texts' creation time.
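The troll coefficient defined above can be sketched in a few lines of Python. The abstract leaves two points open, and both are assumptions here. First, a "content word pair" is taken to be two adjacent content words within one text. Second, a token counts as "repeated" when its type occurs more than once in the whole sample. The stop-word list is a hypothetical stand-in for proper content-word identification.

```python
from collections import Counter

# Hypothetical stop-word list standing in for a proper content-word filter.
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "or", "to", "of", "i", "this"}


def content_words(text):
    """Lower-case, strip basic punctuation and drop stop words."""
    words = (word.strip(".,!?;:").lower() for word in text.split())
    return [word for word in words if word and word not in STOP_WORDS]


def repeated_share(items):
    """Share of tokens whose type occurs more than once in the sample."""
    if not items:
        return 0.0
    counts = Counter(items)
    return sum(1 for item in items if counts[item] > 1) / len(items)


def troll_coefficient(texts):
    """Repeated-word share divided by repeated-adjacent-pair share."""
    words, pairs = [], []
    for text in texts:
        tokens = content_words(text)
        words.extend(tokens)
        pairs.extend(zip(tokens, tokens[1:]))  # pairs never cross text boundaries
    pair_share = repeated_share(pairs)
    if pair_share == 0:
        return float("inf")  # no pair repeats (or there are no pairs at all)
    return repeated_share(words) / pair_share


reviews = ["Great app, works fast!", "Great app, crashes often.", "Works fast, great design."]
```

On these three sample reviews, content words recur more often than content word pairs, which pushes the coefficient above 1. Messages that reuse key words while constantly changing their neighbours would push it higher still.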

Research paper thumbnail of Studying the Terminological Subsystems of Modern Russian-Language School Textbooks with the Word2Vec Model of Natural-Language Semantics (Изучение терминологических подсистем современных школьных учебников на русском языке с помощью модели анализа семантики естественных языков Word2Vec)

Journal of Applied Linguistics and Lexicography, 2021

The aim of the study, whose first results are presented in this article, is to analyse the inventory and functioning of terminological vocabulary in textbooks for secondary schools of the Russian Federation using the methods and tools of computational linguistics. The number of terms from different fields of knowledge that a pupil must master during secondary school has never been assessed. According to preliminary counts based on the Model Basic Educational Programme for General and Secondary Education of 2015, for the subject Russian Language alone, a pupil in grades 5–11 must understand, recognise and be able to use about 1,000 terms and terminological combinations from this field. Thus, taking into account the number of school subjects, the total number of specialised vocabulary items studied in general education schools runs into the thousands. At the same time, the comparative characteristics of the inventory and functioning of terms in textbooks for different school subjects have not been studied and remain unknown. The correlation between the terminological density of textbook text in different subjects and the place these subjects occupy in curricula is also unclear. The traditional way of extracting terms from specialised texts is to read through them and compile the corresponding lists by hand. Although this method is reliable in that the selection criteria are applied intelligently, it scales poorly to large datasets and reflects neither the frequency of terms, nor the specifics of their syntagmatic relations, nor the systemic relations between terms formed by their collocational behaviour.
The project envisages building a full-text corpus of the school textbooks for grades 5–11 included in the Federal List of the Ministry of Education of the Russian Federation, automatically extracting and stratifying terms with the methods of distributional semantics, and building and training a deep neural network that, given a group of vector representations of terms as input, can determine the school subject, the level of study and the topic. The results of the study may be of theoretical interest for the development of terminology studies and may have practical applications in the creation of school textbooks of various types.

Research paper thumbnail of Russian prefixed verbs as constructional schemas

Russian Linguistics, 2021

This study tests the morphological gradience theory on Russian prefixed verbs. With the help of a specially designed experiment, in which participants were asked to evaluate the semantic transparency of a prefixed nonce verb given in minimal context and to semanticise it by suggesting an existing Russian verb with the same prefix, we offer evidence that these verbs can be analysed as constructional schemas and that the degree of their morphological decomposition depends on the different levels of activation of their sequential and lexical links. We show that speakers of Russian are very sensitive to the etymological connection between verb prefixes and the prepositions they are related to. Thus, prefix-stem constructions with prefixes that correspond to prepositions are more likely to be morphologically decomposed, while prefix-stem constructions with prefixes unrelated to prepositions tend to be regarded as single lexical units. Moreover, the general, highly abstract semantics of Russian prefix-stem constructions, especially of those that retain their 'prepositional' meaning, is accessible to language users, as confirmed by the fact that the interpretability of these constructions is affected by priming.

Research paper thumbnail of Early detection of internet trolls: Introducing an algorithm based on word pairs / single words multiple repetition ratio

PLOS ONE, 2020

Troll internet messages, especially those posted on Twitter, have recently been recognised as a very powerful weapon in hybrid warfare. Hence, an important task for the academic community is to provide a tool for identifying internet troll accounts as quickly as possible. At the same time, this tool must be highly accurate so that its use does not violate people's rights or affect freedom of speech. Though such a task can be effectively fulfilled on purely linguistic grounds, as yet very little work has been done that could help to explain the discourse-specific features of this type of writing. In this paper, we suggest a quantitative measure for identifying troll messages that takes into account certain sociolinguistic limitations of troll speech, and we discuss two algorithms that both require as few as 50 tweets to establish whether the tweets are 'genuine' or 'troll-like'.

Research paper thumbnail of Understanding Troll Writing as a Linguistic Phenomenon

Intelligent Systems and Applications, 2021

The current study yielded a number of important findings. We built a neural network that achieved an accuracy of 91% in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more susceptible to correct labelling and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in a skewed distribution of the language parameters of troll messages. Taking as an example the distribution of topics and of the vocabulary associated with them, we showed some very pronounced distributional anomalies, thus confirming our prediction.

Research paper thumbnail of One mechanism of Russian poetic language

Journal of Applied Linguistics and Lexicography, 2020

Traditionally, the phenomenon of the semantic aura of the verse metre was regarded exclusively as historically determined; the question of a potential synaesthesia (the imitative potential possessed by the rhythmic structure of a poetic text) was essentially disregarded. This paper aims to approach the problem of "metre and meaning" from the perspective of the possible actualisation of certain language forms in the metrical structures of binary and ternary metres; in other words, to analyse how the metrical nature of verse determines its basic semantic model. We have come to the conclusion that the fundamental difference between Russian binary and ternary metres lies in the level of rhythmic prominence of metrically dual words, the majority of which are pronouns. The very structure of binary metres suggests a constant possibility for pronouns to be in proximity to an unstressed syllable and to receive more or less heavy stress. In ternary metres, pronouns find themselves inside the circle of metrical stresses and, being inevitably adjacent to either the preceding or the following one, lose their accent and are swallowed in pronunciation. This, in turn, weakens the deictic and anaphoric functions of language and undermines the established logic of textual development. That is where different, i.e., poetic, mechanisms of creating meaning come to the fore. Ternary metres put rhythmic stress on notional words, creating (in accordance with the law of poetic analogy and via omission of intermediary elements) linguistically unpredictable associations between them; binary metres emphasise semi-notional and functional words, stressing the logical and grammatical order of text development.

Research paper thumbnail of English and Russian Genitive Alternations: A Study in Construction Typology

Russian Journal of Linguistics, 2020

There is little doubt that one of the most important areas of future research within the framework of Construction Grammar will be the comparative study of constructions in different languages of the world. One significant gain that modern Construction Grammar can make thanks to the cross-linguistic perspective is finding a clue to some contradictory cases of construction alternation. The aim of the present paper is to communicate the results of a case study of two pairs of alternating constructions in English and Russian: the s-genitive (SG) and of-genitive (OG) in English, and noun + noun in genitive case (NNG) and relative adjective derived from noun + noun (ANG) in Russian. Long years of elaborate scientific analysis have not yielded any universally accepted view on the problem of English genitive alternation. There are at least five different accounts of this problem: the hypotheses of the animacy hierarchy, the given-new hierarchy, the topic-focus hierarchy, the end-weight principle, and two semantically distinct constructions. We hypothesised that in this case a comparison of the distributions of the two English and the two Russian genitives could be insightful. The analysis involved two consecutive steps. First, we established the inter-language comparability of the two pairs of constructions in English and Russian. Second, we tested the similarity of the intra-language distribution of each pair of constructions from the perspective of the animacy hierarchy. For these two purposes, two types of corpora were used: (1) a translation corpus consisting of original texts in one language and their translations into one or more languages; and (2) national corpora consisting of original texts in the two respective languages. It was established that in both languages, the choice between members of an alternating pair is governed by the rules of animacy hierarchisation.
Additionally, it was possible to disprove the idea that the animacy hierarchy is necessarily based on the linearisation hierarchy. The two Russian constructions are typologically aligned with their English counterparts not on the grounds of the linear order of head and modifier but on the grounds of structural similarity. The English SG and Russian NNG constructions are diametrically opposed in terms of word order. However, they reveal the same underlying structure of the inflectional genitive, as contrasted with the analytical genitive of the Russian ANG and the English OG. These findings speak strongly in favour of the animacy hierarchy account of English genitive alternation.

Research paper thumbnail of Горе от ума (Woe from Wit) by Alexander Griboedov in the Light of Debates about Russian Ballads in the 1810–1820s

In his poetic practice, Griboedov successfully followed the principles he set out in his critical response to Gnedich. To be precise, he used the entire available literary context as a resource for describing characters and situations. The connection between the different ballad plots represented in this comedy is of particular interest. Griboedov takes the image of a sentimental man from «Людмила» (Lyudmila), the motif of a dead man in a dream from «Светлана» (Svetlana), the motif of playing music at night from «Эолова арфа» (The Aeolian Harp), and the motifs of a «ball of the dead» from a wide range of sources. This leads to the conclusion that the scheme of the opposition between the Archaists and the Innovators offered by Tynianov is an example of extreme simplification that does more harm than good. Facts are presented so as to give the impression that the debate at the beginning of the 19th century was about different genres and words: the Archaists liked one genre, while the Innovators liked another. Eventually, one gets the impression that the development of Russian literature was determined by Katenin’s «летучая сволочь» [flying rabble], which is as laughable as, according to Leo Tolstoi, thinking that the outcome of the Battle of Borodino was predetermined by a valet who forgot to give Napoleon his warm boots. In fact, as we can see, it was not certain genres or expressions that Griboedov objected to. He objected to the very practice of separating the word from the subject it defined by some sort of stylistic prism.

Research paper thumbnail of The Poetics of Disharmonious Imprecision: N. A. Nekrasov's Cycle of Satires "On the Weather" (Поэтика дисгармонической неточности: цикл сатир Н. А. Некрасова «О погоде»)

The article focuses on one characteristic feature of N. A. Nekrasov's poetics that most attracted the attention of Formalist scholars, who saw in it a peculiar reinterpretation of the stylistic forms of the poetry of the first third of the 19th century. As our observations on the cycle of satires «О погоде» (On the Weather) show, Nekrasov's parodying of the templates of earlier literary traditions is merely a consequence of the thematic-expressive principle of text construction characteristic of the poet. The need to return again and again, as the lyrical plot unfolds, to such words while avoiding outright repetition forces Nekrasov to search for synonymous and periphrastic constructions, including ones that belong to different stylistic layers. It is easy to see why Nekrasov's verse is strewn with fragments of different stylistic formations. Whereas in the literary systems preceding Nekrasov's the text unfolds vertically, through the selection of words and turns of phrase assigned to a particular style or genre, in Nekrasov the structure is organised horizontally, along the line of the expressive colouring of expressions. As a result, Nekrasov's text cuts across, as it were, the separate verticals of earlier poetic epochs, presenting them in a single cross-section: the negative expressive colouring of the word.


B. V. Tomashevsky. Selected Works on Verse [Commentary]

The aim of this commentary is to help students and graduate students of university philology faculties, school and gymnasium teachers, and the wider readership interested in the history and theory of verse to gain an accurate understanding of the positions set out in B. V. Tomashevsky's works on versification. For this reason, primary attention is given to the scholarly context of the 1910s–1920s in which the author's views on verse took shape and his method for describing the rhythmic structure of literary language was developed. Where necessary, brief summaries are given of the positions of the researchers Tomashevsky has in mind, whether he builds on his predecessors and contemporaries or argues against them.

Number and Markedness in Dolakha Newar

This paper, dedicated to markedness in Dolakha Newar (Sino-Tibetan; Himalayish; Newar; Dolakha, Nepal), adopts Elšík and Matras's (2006) matter-of-fact approach. We are not interested in labelling some values as marked and others as unmarked. Instead, we aim to explore the interplay of values within different categories and to look for possible reasons behind it. For our analysis, we chose the grammatical category of number, which Dolakha Newar codes in nouns, pronouns, and verbs.

P-alignment in Dolakha Newar

This paper discusses the arguments of ditransitive constructions in Dolakha Newar (Sino-Tibetan; Himalayish; Newar; Dolakha, Nepal) with regard to alignment types, coding patterns, and behavioural patterns (ditransitive alternations and lexical splits are not attested in this language). The study's data comprise all ditransitive constructions with the prototypical verbs ‘give’, ‘show’, ‘teach’, ‘tell’, ‘send’, ‘ask’, and ‘bring’.

Acquisition of basic colour terms by English-speaking children

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Scientific Data, 2020

Advances in computer-assisted linguistic research are greatly influencing and reshaping linguistic investigation. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, place high demands on rigour in preparing and curating datasets. In this work we present CLICS, a Database of Cross-Linguistic Colexifications, which aims both to tackle interdisciplinary and interconnected research questions and to showcase best practices in preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and supplying an updated version, CLICS3, which substantially increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates. PREPRINT: This draft has not been peer-reviewed; it is currently under review. Download (free access): http://dx.