Anastasiya Lopukhina | National Research University Higher School of Economics

Other by Anastasiya Lopukhina

Word Sense Induction Methods: Which One Is Better for Russian

The topic of this study is word sense induction (WSI), that is, the automatic discovery of the possible senses of a word in text corpora. WSI is a challenging task, as there are few examples of WSI being successfully deployed in end-user applications. Our aim is to apply WSI to Russian lexicography as a supporting tool for linguists. For this purpose, we compared the methods previously applied to English: Adaptive Skip-gram (Adagram), Latent Dirichlet Allocation (LDA), as well as several clustering techniques based on word2vec: clustering of contexts, clustering of context words, and clustering of synonyms. In this study we quantitatively and qualitatively evaluated the aforementioned WSI methods for Russian nouns and verbs. For the quantitative evaluation, we measured the similarity of the suggested clustering to the existing dictionary senses with Adjusted Rand Index (ARI) and V-measure scores, using labeled contexts. For the qualitative evaluation, we assessed the interpretability of the derived senses, the number of duplicate senses, the number of mixed senses, and the derivation of rare senses. The study was performed on 15 nouns using the RuWac Internet corpus.
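The quantitative evaluation described above compares an induced clustering of contexts with dictionary sense labels. As a minimal sketch of how the two scores can be computed, the following pure-Python functions implement the standard definitions of ARI and V-measure. The function names and the toy labels are hypothetical and are not taken from the study, which may have used a different implementation.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(gold, pred):
    """ARI between two labelings of the same items (1.0 = identical partitions)."""
    n = len(gold)
    if n < 2:
        return 1.0
    # Pairs of items that share a cell of the contingency table, a gold sense, or a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(gold, pred)).values())
    sum_gold = sum(comb(c, 2) for c in Counter(gold).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_gold * sum_pred / comb(n, 2)
    max_index = (sum_gold + sum_pred) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    given_counts = Counter(given)
    joint = Counter(zip(target, given))
    return -sum(c / n * log(c / given_counts[g]) for (_, g), c in joint.items())


def v_measure(gold, pred):
    """Harmonic mean of homogeneity and completeness."""
    h_gold, h_pred = entropy(gold), entropy(pred)
    homogeneity = 1.0 if h_gold == 0 else 1 - conditional_entropy(gold, pred) / h_gold
    completeness = 1.0 if h_pred == 0 else 1 - conditional_entropy(pred, gold) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy data: dictionary senses of six contexts and two induced clusterings.
gold_senses = ["s1", "s1", "s1", "s2", "s2", "s2"]
same_partition = ["b", "b", "b", "a", "a", "a"]   # cluster names differ, grouping is identical
mixed_partition = ["a", "a", "b", "a", "b", "b"]  # each cluster mixes both senses

for name, clusters in [("same", same_partition), ("mixed", mixed_partition)]:
    print(name, adjusted_rand_index(gold_senses, clusters), v_measure(gold_senses, clusters))
```

Both scores ignore the cluster names themselves, so an induced clustering is not penalized for numbering senses differently from the dictionary.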

Papers by Anastasiya Lopukhina

Can Heritage Speakers predict lexical and morphosyntactic information in reading?

Monolingual and Bilingual Reading Processes in Russian: An Exploratory Scanpath Analysis

Reading Research Quarterly, 2021

Can Heritage Speakers Predict Lexical and Morphosyntactic Information in Reading?

Languages, 2022

Ample evidence suggests that monolingual adults can successfully generate lexical and morphosyntactic predictions in reading and that correct predictions facilitate sentence comprehension. In this eye-tracking corpus reading study, we investigate whether the same is true for reading in a heritage language. Specifically, we ask whether heritage speakers (HSs) of Russian are able to anticipate lexical and/or morphosyntactic information of the upcoming words in the sentence and whether their predictions differ from those of monolingual children and L2 learners. We are also interested in whether the literacy level (i.e., Russian literacy experience or reading fluency in English) influences lexical and morphosyntactic prediction. Our results indicate that HSs as well as other groups were able to anticipate the specific lexical item, and the ability was contingent on the Russian literacy experience and reading fluency in dominant English as evident in some of the early and late eye-tracking me...

Receptive Language in Primary-School-Aged Children with Autism Spectrum Disorder

Клиническая и специальная психология, 2021

The objective of the present study is to investigate the relationship between receptive language, the index of non-verbal intelligence, and the level of severity of autistic disorders in primary-school-aged children with autism spectrum disorder. One of the main areas influenced by autistic disorders is communication. Therefore, the study of the language abilities of such children and of the factors that affect them provides a better approach to therapy and education. The sample included 50 children aged 7–11 years diagnosed with autism spectrum disorders. Children were tested using the KORABLIK method (basic linguistic skills), the Kaufman Assessment Battery for Children (KABC-II) or the Wechsler Intelligence Scale for Children, Third Edition (WISC-III) (non-verbal intelligence), and the Autism Diagnostic Observation Schedule, Second Edition (ADOS-II) (autistic traits). The results support the hypothesis of the relationship between receptive language skills, the index of non-verba...

Russian Child Language Assessment Battery (RuCLAB) and its Application in Primary School Children with ASD

Autism and Developmental Disorders, 2021

In speech-language pathology practice, standardized language assessment tools are used to evaluate the level of language development and to specify the details of language impairment. For the Russian language, a novel Russian Child Language Assessment Battery (RuCLAB) was developed. The RuCLAB provides an assessment of phonology, vocabulary, morphosyntax, and discourse in production and comprehension. The present study aims to describe RuCLAB in detail and to report its application in 7- to 11-year-old children with Autism Spectrum Disorder (ASD). The results revealed between-group differences in children with and without ASD and highlighted some individual features in the group of children with ASD: for example, expressive and receptive patterns differed depending on the linguistic level and non-verbal IQ; also, children with ASD (like children with complex language disorders) acquired nouns better than verbs, and word frequency influenced the accuracy in sentence repet...

The Impact of Phonological and Orthographic Processing on Reading Speed in Russian-Speaking Children

The Russian Journal of Cognitive Science, 2021

Phonological and orthographic processing are reported to be among the strongest predictors of reading development across different Indo-European languages. The relative impact of these factors can be modulated by cross-linguistic script and orthographic differences, as evidenced by many studies in European languages. The present study investigates the effect of phonological and orthographic processing on reading speed in 6- to 12-year-old (grades 1–5) Russian-speaking children (N = 117), also taking age into account as a factor. Phonological and orthographic processing were assessed with behavioral tests. The results revealed that both skills predict reading speed in Russian. Moreover, the age of young readers can also be a non-linguistic predictor of reading speed in Russian, especially in children between 6 and 10 years old. Children aged 10 to 12 also demonstrated some variability in reading speed, although an increase in reading speed was no longer observed.

Metaphor Is Between Metonymy and Homonymy: Evidence From Event-Related Potentials

Frontiers in Psychology, 2020

Phonological Neighbourhood Density in Russian Word Production: Evidence from Children and Adults

SSRN Electronic Journal, 2018

Phonological neighbourhood density (PND) refers to the number of words which can be formed from a given word by substituting, adding or deleting one phoneme. Thus, a word with many similar-sounding neighbours has a dense neighbourhood, whereas a word with few or no neighbours has a sparse neighbourhood. Previous studies have shown that dense and sparse neighbourhoods influence word production in different ways. Research in English-speaking adults demonstrated that words with dense neighbourhoods are produced faster than words with sparse neighbourhoods, that is, a dense neighbourhood facilitates lexical access. At the same time, a sparse neighbourhood inhibits word production. Interestingly, studies in Spanish-speaking adults showed the reverse effect: a dense neighbourhood inhibits word production whereas a sparse neighbourhood facilitates it. This cross-linguistic difference in the PND pattern was explained in terms of the morphological complexity of Spanish in comparison to English. Although there are numerous studies of the PND effect in adults, some questions remain open. For example, how does PND influence word production in a language that is morphologically more complex than Spanish? Or, how does the PND pattern develop in children? The present paper aims to explore these questions.
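The definition of PND given above (one phoneme substituted, added, or deleted) can be illustrated with a short sketch. It assumes, purely for illustration, that each character of a string stands for one phoneme, and it uses a hypothetical toy English lexicon. A real study would work with phonemic transcriptions and a full lexical database.

```python
def is_neighbour(a, b):
    """True if b can be formed from a by substituting, adding, or deleting one phoneme.

    Assumption for this sketch: each character represents exactly one phoneme.
    """
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Addition or deletion: removing one phoneme from the longer form yields the shorter one.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighbourhood_density(word, lexicon):
    """Number of lexicon entries that are phonological neighbours of `word`."""
    return sum(is_neighbour(word, other) for other in lexicon)


# Hypothetical toy lexicon (orthography stands in for phonemes here).
LEXICON = ["cat", "bat", "hat", "cut", "cast", "cats", "at", "dog"]

print(neighbourhood_density("cat", LEXICON))  # dense: bat, hat, cut, cast, cats, at
print(neighbourhood_density("dog", LEXICON))  # sparse: no neighbours in this lexicon
```

In this toy lexicon "cat" has a dense neighbourhood and "dog" a sparse one, which is the contrast the production studies manipulate.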

Meaning Relatedness in Polysemous and Homonymous Words: An ERP Study in Russian

SSRN Electronic Journal, 2018

Previous research showed that polysemous and homonymous words are processed differently. However, the mechanisms underlying the processing of ambiguous words are still unclear. The goal of the present study was to investigate the comprehension of metonymies, metaphors, and homonyms using the priming paradigm and the method of event-related potentials (ERPs). We asked participants to read two-word phrases with ambiguous words and make a sensicality judgement. The results demonstrated a difference between metonymic and metaphorical senses of polysemous words in the amount of priming for the literal sense. The priming effect between metonymic and literal senses supports the idea that these senses share a single representation in the mental lexicon. In contrast, metaphorical senses of polysemous words showed a very limited priming effect on literal senses of the same words. Similar results were observed for different meanings of homonymous words. We conclude that metaphorical senses should have separate representations in the mental lexicon, similarly to homonyms. JEL Classification: Z.

Word Sense Frequency Estimation for Russian: Verbs, Adjectives, and Different Dictionaries

In this paper we investigate several extensions to our prior work on sense frequency estimation for Russian. Our method is based on semantic vectors and is able to achieve good accuracy for sense frequency estimation trained on dictionary entries from the Active Dictionary of Russian and unannotated corpora. We apply our method to verbs and adjectives to obtain sense frequencies for 329 verbs and 256 adjectives in an academic corpus and a web-based corpus. We compare frequency distributions against dictionary sense ordering and between two corpora and find that the first dictionary sense is not the most frequent for almost half of the words we studied. Evaluation of verbs and adjectives shows that frequency estimation error is lower than 15%. We investigate the effect of sense granularity, evaluating how the accuracy of our method changes when applied to more coarse-grained senses. We also investigate if our method can be applied to other dictionaries with less elaborate sense desc...

Global reading processes in children with high risk of dyslexia: a scanpath analysis

Annals of Dyslexia, 2022

The study presents the first systematic comparison of the global reading processes via scanpath analysis in Russian-speaking children with and without reading difficulties. First, we compared basic eye-movement characteristics in reading sentences in two groups of children in grades 1 to 5 (N = 72 in the group at high risk of developmental dyslexia and N = 72 in the control group). Next, using the scanpath method, we investigated which global reading processes these children adopt to read the entire sentence and how these processes differ between the groups. Finally, we were interested in the timeframe of the change in the global reading processes from the 1st to the 5th grades for both groups. We found that the main difference in word-level measures between groups was the reading speed reflected in fixation durations. However, the examination of the five identified global reading processes revealed qualitative similarities in reading patterns between groups. Children in the control group ...

Morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments

PLOS ONE, 2021

During reading or listening, people can generate predictions about the lexical and morphosyntactic properties of upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Our study investigated an alternative corpus-based approach for estimating predictability via language predictability models. We obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Importantly, we estimated how much variance in eye movements registered while reading the same sentences was explained by each of the two probabilities and whether the two probabilities explain the same variance. Along with lexical predictability (the activation of a particular word form), we analyzed morphosyntactic predictability (the activation of morphological features of words)...

Good-enough language processing in speech perception: the impact of age and noise

The good-enough approach to language comprehension assumes that people do not always engage in full detailed processing of linguistic input. Rather, the parser forms shallow representations when confronted with some difficulty such as complex syntactic structure or noisy input. Although the good-enough approach has been studied for some time, we still do not know what factors trigger this type of processing. In this study, we investigate two factors that might influence the reliance on the good-enough language processing strategy in oral speech perception: age and noise.

Phonological and orthographic parafoveal processing during silent reading in Russian children and adults

Studies on German and English showed that children and adults can rely on phonological and orthographic information from the parafovea during reading, but this reliance differs between ages and languages. In the present study, we investigated the development of phonological and orthographic parafoveal processing during silent reading in Russian-speaking 8-year-old children, 10-year-old children, and adults using the gaze-contingent boundary paradigm. The participants read sentences with embedded nouns which were presented in original, pseudohomophone, control for pseudohomophone, transposed-letter, and control for transposed-letter conditions in the parafoveal area to assess phonological and orthographic preview benefit effects. The results revealed that 8-year-old children already relied on orthographic information, which was stable in 10-year-old children and adults. The evidence for phonological parafoveal processing was found only in adults, which indicates the development of phonolog...

Global reading processes in children with high-risk of dyslexia: A scanpath analysis

Corpus-Based Probabilities Can Substitute for Cloze Probabilities in Reading Experiments

During reading or listening, people can generate predictions about lexical and morphosyntactic properties of the upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Despite its ubiquitous use, the cloze task is criticized for its lexical biases and the lack of information about very improbable continuations. The present study investigated an alternative corpus-based approach for estimating predictability via language predictability models. First, we compared 5-gram and LSTM predictability models trained on three text corpora and found that the LSTM performed better. Then we obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Finally, we estimated how much variance in eye movements registered while reading the same 144 sen...
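To make the contrast between the two kinds of probability concrete, the sketch below computes a cloze probability as the share of participants who produced a given continuation, and a corpus-based probability from raw n-gram counts. The bigram model is a deliberately simplified stand-in for the 5-gram and LSTM models named in the abstract, and the function names, responses, and corpus are all hypothetical toy data.

```python
from collections import Counter


def cloze_probability(responses, target):
    """Share of participants who produced `target` as the next word in a cloze task."""
    return Counter(responses)[target] / len(responses)


def bigram_probability(corpus_tokens, prev, target):
    """Maximum-likelihood estimate of P(target | prev) from raw bigram counts.

    A bigram model is used only to keep the sketch short; it is not the model
    used in the study.
    """
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    prev_count = sum(count for (first, _), count in bigrams.items() if first == prev)
    return bigrams[(prev, target)] / prev_count if prev_count else 0.0


# Hypothetical cloze responses to the context "the cat drinks ...".
responses = ["milk", "milk", "water", "milk"]

# Hypothetical toy corpus.
corpus = "the cat drinks milk the dog drinks water the cat drinks milk".split()

print(cloze_probability(responses, "milk"))
print(bigram_probability(corpus, "drinks", "milk"))
```

A continuation that no participant produces receives a cloze probability of exactly zero, which illustrates the lack of information about very improbable continuations mentioned in the abstract. A corpus-based model with smoothing (not shown here) can still assign such words a small nonzero probability.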

The effects of phonological neighborhood density in childhood word production and recognition in Russian are opposite to English

Studies with English-speaking adults showed that phonological neighborhood density (PND) influenced both word production and recognition: a dense neighborhood facilitated production but inhibited recognition. Importantly, these effects are not universal across languages. A strong reverse PND pattern was found in Spanish-speaking adults: a dense neighborhood inhibited production but facilitated recognition, indicating that PND effects depend on morphological complexity. Although there are investigations on how PND influences word production and recognition in adults, effects are largely unknown for young children, especially in languages with a rich morphological system. This study aims to explore how PND affects word production and recognition in four-to-six-year-old Russian children in comparison to adults. Our results are in line with the study in Spanish: Russian pre-schoolers show a tendency to the adult-like positive PND effect in word production and a strong adult-like PND eff...

Meaning structure of cognate words in English and Russian: comparing word sense frequency

Polysemy is a key issue in theoretical semantics and lexicography as well as in computational linguistics. When words have several senses, it is important to describe them properly in the dictionary (a lexicographic task) and to be able to distinguish between them in a given context (a computational linguistics task, known as WSD). Recently attention has been drawn to the fact that different senses normally have different frequencies in corpora. Elsewhere we reported on our research into that issue and introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency may enrich language learning resources and help lexicographers order senses within a word according to frequency, if needed. When learning a foreign language, a student may encounter a word that exists in his/her native language (as a borrowing or an international word), and is tempted to assume that the foreign word and i...

Regular polysemy: from sense vectors to sense patterns

Regular polysemy has been extensively investigated in lexical semantics, but this phenomenon has been very little studied in distributional semantics. We propose a model for regular polysemy detection that is based on sense vectors and makes it possible to work directly with senses in semantic vector space. Our method is able to detect polysemous words that have the same regular sense alternation as in a given example (a word with two automatically induced senses that represent one polysemy pattern, such as ANIMAL / FOOD). The method works equally well for nouns, verbs, and adjectives and achieves an average recall of 0.55 and an average precision of 0.59 for ten different polysemy patterns.

Research paper thumbnail of Word Sense Induction Methods: Which One Is Better for Russian

The topic of this study is word sense induction (WSI), that is the automatic discovery of the pos... more The topic of this study is word sense induction (WSI), that is the automatic discovery of the possible senses of a word in text corpora. WSI is a challenging task as there are few examples of WSI being successfully deployed in end-user applications. Our aim is to apply WSI to Russian lexicography as a supporting tool for linguists. For this purpose, we compared the methods previously applied to English: Adaptive Skip-gram (Adagram), Latent Dirichlet Allocation (LDA) as well as several clustering techniques based on word2vec-clustering of contexts, clustering of context words and clustering of synonyms. In this study we quantitatively and a qualitatively evaluated the aforemen-tioned WSI methods for Russian nouns and verbs. For the quantitative evaluation, we measured the similarity of the suggested clustering to the existing dictionary senses with Adjusted Rand Index (ARI) and V-measure scores, using labeled contexts. For the qualitative evaluation, we assessed the interpretability of the derived senses, the number of duplicate senses, the number of mixed senses and derivation of rare senses. The study was performed on 15 nouns using RuWac Internet corpus.

Research paper thumbnail of Can Heritage Speakers predict lexical and morphosyntactic in-formation in reading?

Research paper thumbnail of Monolingual and Bilingual Reading Processes in Russian: An Exploratory Scanpath Analysis

Reading Research Quarterly, 2021

Research paper thumbnail of Can Heritage Speakers Predict Lexical and Morphosyntactic Information in Reading?

Languages, 2022

Ample evidence suggests that monolingual adults can successfully generate lexical and morphosynta... more Ample evidence suggests that monolingual adults can successfully generate lexical and morphosyntactic predictions in reading and that correct predictions facilitate sentence comprehension. In this eye-tracking corpus reading study, we investigate whether the same is true for reading in heritage language. Specifically, we ask whether heritage speakers (HSs) of Russian are able to anticipate lexical and/or morphosyntactic information of the upcoming words in the sentence and whether they differ in the predictions from monolingual children and L2 learners. We are also interested in whether the literacy level (i.e., Russian literacy experience or reading fluency in English) influences lexical and morphosyntactic prediction. Our results indicate that HSs as well as other groups were able to anticipate the specific lexical item, and the ability was contingent on the Russian literacy experience and reading fluency in dominant English as evident in some of the early and late eye-tracking me...

Research paper thumbnail of Receptive Language in Primary-School-Aged Children with Autism Spectrum Disorder

Клиническая и специальная психология, 2021

The objective of the present study is to investigate the relationship between the receptive langu... more The objective of the present study is to investigate the relationship between the receptive language, and the index of non-verbal intelligence and the level of severity of autistic disorders in primary-school-aged children with Autism spectrum disorder. One of the main areas influenced by autistic disorders is communication. Therefore, the study of the language abilities of such children and factors that affect them provides a better approach to the therapy and education. The sample included 50 children aged 7–11 years diagnosed with autism spectrum disorders. Children were tested using the KORABLIK method (basic linguistic skills), the Kaufman Assessment Battery for Children (KABC-II) or the Wechsler Intelligence Scale for Children ― Third Edition (WISC-III) (non-verbal intelligence), the Autism Diagnosis Observation Schedule ― Second Edition (ADOS-II) (autistic traits). The results support the hypothesis of the relationship between receptive language skills, the index of non-verba...

Research paper thumbnail of Russian Child Language Assessment Battery (RuCLAB) and its Application in Primary School Children with ASD

Autism and Developmental Disorders, 2021

In speech-language pathology practice, standardized language assessment tools are used to evaluat... more In speech-language pathology practice, standardized language assessment tools are used to evaluate the level of language development and to specify the details of language impairment. For Russian language, a novel Russian Child Language Assessment Battery (RuCLAB) was developed. The RuCLAB provides the assessment of phonology, vocabulary, morphosyntax, and discourse in production and comprehension. Present study aims to describe RuCLAB in detail and to report its application in 7—11 years-old children with Autism Spectrum Disorder (ASD). The results revealed between-group differences in children with and without ASD as well as highlighted some individual features in the group of children with ASD: for example, expressive and receptive patterns differed depending on the linguistic level and non-verbal IQ; also children with ASD (as children with complex language disorders) better acquired nouns in comparison to verbs, and the words’ frequency influenced the accuracy in sentence repet...

Research paper thumbnail of The Impact of Phonological and Orthographic Processing on Reading Speed in Russian-Speaking Children

The Russian Journal of Cognitive Science, 2021

Phonological and orthographic processing are reported to be among the strongest predictors of rea... more Phonological and orthographic processing are reported to be among the strongest predictors of reading development across different Indo-European languages. The relative impact of these factors can be modulated by cross-linguistic script and orthographic differences, as evidenced by many studies in European languages. The present study investigates the effect of phonological and orthographic processing on reading speed in 6- to 12-year-old (1 – 5 grades) Russian-speaking children (N = 117), taking into account age as a factor as well. Phonological and orthographic processing were assessed with behavioral tests. The results revealed that both skills predict reading speed in Russian. Moreover, the age of young readers can also be a non-linguistic predictor of reading speed in Russian, especially in children between 6 and 10 years old. Children aged 10 to 12 also demonstrated some variability in reading speed, although an increase in reading speed was no longer observed.

Research paper thumbnail of Metaphor Is Between Metonymy and Homonymy: Evidence From Event-Related Potentials

Frontiers in Psychology, 2020

Research paper thumbnail of Phonological Neighbourhood Density in Russian Word Production: Evidence from Children and Adults

SSRN Electronic Journal, 2018

Phonological neighbourhood density (PND) refers to the number of words which can be formed from a... more Phonological neighbourhood density (PND) refers to the number of words which can be formed from a given word by substituting, adding or deleting one phoneme. Thus, word with many similar sounding neighbours has a dense neighbourhood, whereas a word with few neighbours or without neighbors has a sparse neighbourhood. Previous studies have shown that dense and sparse neighbourhoods influence word production in different ways. Research in English-speaking adults demonstrated that words with dense neighbourhood are produced faster than words with sparse neighbourhood, facilitating lexical access. At the same time, sparse neighbourhood inhibits word production. Interestingly, studies in Spanish adults showed the reverse effect: dense neighbourhood inhibits word production whereas sparse neighbourhood facilitates it. This cross-linguistic difference in the PND pattern was explained in terms of morphological complexity of Spanish in comparison to English. Although there are numerous studies of the PND effect in adults, some questions remain unknown. For example, how does PND influence word production in morphologically more complex language than Spanish? Or, how does the PND pattern develop in children? The present paper aims to explore these questions.

Research paper thumbnail of Meaning Relatedness in Polysemous and Homonymous Words: An Erp Study in Russian

SSRN Electronic Journal, 2018

Previous research showed that polysemous and homonymous words are processed differently. However,... more Previous research showed that polysemous and homonymous words are processed differently. However, mechanisms underlying processing of ambiguous words are still unclear. The goal of the present study was to investigate comprehension of metonymies, metaphors, and homonyms using priming paradigm and the method of event-related potentials (ERPs). We asked participants to read two-word phrases with ambiguous words and make a sensicality judgement. The results demonstrated the difference between metonymic and metaphorical senses of polysemous words in the amount of priming for the literal sense. The priming effect between metonymic and literal senses supports the idea that these senses share a single representation in the mental lexicon. In contrast, metaphorical senses of polysemous words showed a very limited priming effect on literal senses of the same words. Similar results were observed for different meanings of homonymous words. We conclude that metaphorical senses should have separate representations in the mental lexicon similarly to homonyms. JEL Classification: Z.

Research paper thumbnail of Word Sense Frequency Estimation for Russian: Verbs, Adjectives, and Different Dictionaries

In this paper we investigate several extensions to our prior work on sense frequency estimation f... more In this paper we investigate several extensions to our prior work on sense frequency estimation for Russian. Our method is based on semantic vectors and is able to achieve good accuracy for sense frequency estimation traine d on dictionary entries from the Active Dictionary of Russian and unannotated corpora. We apply our method to verbs and adjectives to obtain sense frequencies for 329 verbs and 256 adjectives in an academic corpus and a web-based corpus. We compare frequency distributions against dictionary sense ordering and between two corpora and find that the first dictionary sense is not the most frequent for almost half of the words we studied. Evaluation of verbs and adjectives shows that frequency estimation error is lower than 15%. We investigate the effect of sense granularity, evaluating how the accuracy of our method changes when applied to more coarse-grained senses. We also investigate if our method can be applied to other dictionaries with less elaborate sense desc...

Research paper thumbnail of Global reading processes in children with high risk of dyslexia: a scanpath analysis

Annals of Dyslexia, 2022

The study presents the first systematic comparison of the global reading processes via scanpath a... more The study presents the first systematic comparison of the global reading processes via scanpath analysis in Russian-speaking children with and without reading difficulties. First, we compared basic eye-movement characteristics in reading sentences in two groups of children in grades 1 to 5 (N = 72 in high risk of developmental dyslexia group and N = 72 in the control group). Next, using the scanpath method, we investigated which global reading processes these children adopt to read the entire sentence and how these processes differ between the groups. Finally, we were interested in the timeframe of the change in the global reading processes from the 1st to the 5th grades for both groups. We found that the main difference in word-level measures between groups was the reading speed reflected in fixation durations. However, the examination of the five identified global reading processes revealed qualitative similarities in reading patterns between groups. Children in the control group ...

Morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments

PLOS ONE, 2021

During reading or listening, people can generate predictions about the lexical and morphosyntactic properties of upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Our study investigated an alternative corpus-based approach for estimating predictability via language predictability models. We obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Importantly, we estimated how much variance in eye movements registered while reading the same sentences was explained by each of the two probabilities and whether the two probabilities explain the same variance. Along with lexical predictability (the activation of a particular word form), we analyzed morphosyntactic predictability (the activation of morphological features of words)...
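
To make the two predictability measures concrete, here is a minimal Python sketch. It assumes that cloze probability is the share of participants who produced the target word as the continuation, and it uses a toy add-one-smoothed bigram model as a stand-in for a corpus-based estimate. The study's own language models, corpora, and data are not reproduced here; all function names and example data are hypothetical.

```python
from collections import Counter


def cloze_probability(responses, target):
    """Share of cloze-task responses that match the target word."""
    if not responses:
        return 0.0
    return Counter(responses)[target] / len(responses)


def bigram_probability(corpus_sentences, previous, target):
    """Add-one-smoothed estimate of P(target | previous) from a tokenized corpus."""
    vocabulary = {word for sentence in corpus_sentences for word in sentence}
    if not vocabulary:
        return 0.0
    bigrams = Counter()
    contexts = Counter()
    for sentence in corpus_sentences:
        for first, second in zip(sentence, sentence[1:]):
            bigrams[(first, second)] += 1
            contexts[first] += 1
    return (bigrams[(previous, target)] + 1) / (contexts[previous] + len(vocabulary))


# Hypothetical toy data, for illustration only.
responses = ["milk", "milk", "water", "milk"]
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "drinks", "milk"],
]

cloze = cloze_probability(responses, "milk")                 # human-based estimate
corpus_based = bigram_probability(corpus, "drinks", "milk")  # corpus-based estimate
```

Computed for every word in a sentence set, the two lists of probabilities could then be correlated with each other, which is the kind of comparison the abstract describes.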

Good-enough language processing in speech perception: the impact of age and noise

The good-enough approach to language comprehension assumes that people do not always engage in full detailed processing of linguistic input. Rather, the parser forms shallow representations when confronted with some difficulty such as complex syntactic structure or noisy input. Although the good-enough approach has been studied for some time, we still do not know what factors trigger this type of processing. In this study, we investigate two factors that might influence the reliance on the good-enough language processing strategy in oral speech perception: age and noise.

Phonological and orthographic parafoveal processing during silent reading in Russian children and adults

Studies on German and English showed that children and adults can rely on phonological and orthographic information from the parafovea during reading, but this reliance differs between ages and languages. In the present study, we investigated the development of phonological and orthographic parafoveal processing during silent reading in Russian-speaking 8-year-old children, 10-year-old children and adults using the gaze-contingent boundary paradigm. The participants read sentences with embedded nouns which were presented in original, pseudohomophone, control for pseudohomophone, transposed-letter and control for transposed-letter conditions in the parafoveal area to assess phonological and orthographic preview benefit effects. The results revealed that 8-year-old children already relied on orthographic information, which was stable in 10-year-old children and adults. The evidence for phonological parafoveal processing was found only in adults, which indicates the development of phonolog...

Corpus-Based Probabilities Can Substitute for Cloze Probabilities in Reading Experiments

During reading or listening, people can generate predictions about lexical and morphosyntactic properties of the upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Despite its ubiquitous use, the cloze task is criticized for its lexical biases and the lack of information about very improbable continuations. The present study investigated the alternative corpus-based approach for estimating predictability via language predictability models. First, we compared 5-gram and LSTM predictability models trained on three text corpora and found that the LSTM performed better. Then we obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Finally, we estimated how much variance in eye movements registered while reading the same 144 sen...

The effects of phonological neighborhood density in childhood word production and recognition in Russian are opposite to English

Studies with English-speaking adults showed that phonological neighborhood density (PND) influenced both word production and recognition: a dense neighborhood facilitated production but inhibited recognition. Importantly, these effects are not universal across languages. A strong reverse PND pattern was found in Spanish-speaking adults: a dense neighborhood inhibited production but facilitated recognition, indicating that PND effects depend on morphological complexity. Although there are investigations on how PND influences word production and recognition in adults, effects are largely unknown for young children, especially in languages with a rich morphological system. This study aims to explore how PND affects word production and recognition in four-to-six-year-old Russian children in comparison to adults. Our results are in line with the study in Spanish: Russian pre-schoolers show a tendency to the adult-like positive PND effect in word production and a strong adult-like PND eff...
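
As an illustration of the measure itself, the sketch below counts phonological neighbors under the commonly used definition: words that differ from the target by exactly one phoneme substitution, deletion, or addition. The abstract does not say how PND was computed in this study, so the definition, the one-character-per-phoneme transcription, and the toy lexicon are all assumptions made for the example.

```python
def is_neighbor(a, b):
    """True if phoneme strings a and b differ by exactly one
    substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Lengths differ by one: dropping one phoneme from the longer
    # string must yield the shorter one (deletion or addition).
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(word, lexicon):
    """Number of words in the lexicon that are phonological neighbors of word."""
    return sum(is_neighbor(word, other) for other in lexicon)


# Hypothetical toy lexicon; each character stands for one phoneme.
lexicon = ["kat", "bat", "kot", "at", "skat", "dog"]
density = neighborhood_density("kat", lexicon)
```

Here "bat" and "kot" are substitution neighbors of "kat", "at" is a deletion neighbor, and "skat" is an addition neighbor, while "dog" and the word itself are not counted.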

Meaning structure of cognate words in English and Russian: comparing word sense frequency

Polysemy is a key issue in theoretical semantics and lexicography as well as in computational linguistics. When words have several senses, it is important to describe them properly in the dictionary (a lexicographic task) and to be able to distinguish between them in a given context (a computational linguistics task, known as WSD). Recently attention has been drawn to the fact that different senses normally have different frequencies in corpora. Elsewhere we reported on our research into that issue and introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency may enrich language learning resources and help lexicographers order senses within a word according to frequency, if needed. When learning a foreign language, a student may encounter a word that exists in his/her native language (as a borrowing or an international word), and is tempted to assume that the foreign word and i...

Regular polysemy: from sense vectors to sense patterns

Regular polysemy has been extensively investigated in lexical semantics, but this phenomenon has been very little studied in distributional semantics. We propose a model for regular polysemy detection that is based on sense vectors and allows working directly with senses in semantic vector space. Our method is able to detect polysemous words that have the same regular sense alternation as in a given example (a word with two automatically induced senses that represent one polysemy pattern, such as ANIMAL / FOOD). The method works equally well for nouns, verbs and adjectives and achieves an average recall of 0.55 and an average precision of 0.59 for ten different polysemy patterns.
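
For readers unfamiliar with the reported metrics, the following sketch shows how precision and recall could be computed for one polysemy pattern and then averaged across patterns. The word lists are invented for illustration and are not taken from the paper; the function name is hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted word set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical output for a single pattern such as ANIMAL / FOOD.
predicted_words = ["chicken", "lamb", "salmon", "table"]
gold_words = ["chicken", "lamb", "salmon", "turkey", "duck"]

precision, recall = precision_recall(predicted_words, gold_words)

# Averaging over several patterns (two invented result pairs here).
per_pattern = [(precision, recall), (0.5, 0.5)]
average_precision = sum(p for p, _ in per_pattern) / len(per_pattern)
average_recall = sum(r for _, r in per_pattern) / len(per_pattern)
```

In this toy case three of the four predicted words are correct and three of the five gold words are found.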

Automated Word Sense Frequency Estimation for Russian Nouns

According to Zipf's observation there is a strong correlation between word frequency and polysemy, and yet word sense frequency distribution is a neglected area in computational linguistics. Furthermore, the study of sense frequency has theoretical interest and practical applications for lexicography and word sense disambiguation. Though WordNet and SemCor contain some information about sense frequency in English, it is not enough for either practical or research purposes. For Russian, even this information is lacking. To fill this gap, we develop and test an automated system based on semantic vectors that deals with the problem of sense frequency for Russian nouns. The model is first trained unsupervised on large corpora and then supplied with contexts and collocations from the Active Dictionary of Russian. Dictionary examples are used either for supervised post-training, or for automatic labeling of clusters that are learnt unsupervised. This allows us to reach a frequency estimation error of 11-15% on different corpora without any additional labeled data. Word sense frequency distributions for 440 nouns are available online.
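
The sketch below illustrates what a sense frequency distribution and a frequency estimation error could look like in code. The abstract does not define the error measure, so the maximum absolute difference between estimated and reference frequencies is used here purely as an assumption; the sense labels, counts, and function names are likewise hypothetical.

```python
from collections import Counter


def sense_frequencies(labels):
    """Relative frequency of each sense among the labeled contexts of one word."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {sense: count / total for sense, count in counts.items()}


def max_frequency_error(estimated, reference):
    """Largest absolute difference between two sense frequency distributions.

    This particular error measure is an assumption made for illustration.
    """
    senses = set(estimated) | set(reference)
    return max(
        (abs(estimated.get(s, 0.0) - reference.get(s, 0.0)) for s in senses),
        default=0.0,
    )


# Hypothetical sense labels assigned automatically to ten contexts of one noun.
predicted_labels = ["s1"] * 6 + ["s2"] * 3 + ["s3"]
estimated = sense_frequencies(predicted_labels)

# Hypothetical reference distribution from manually labeled contexts.
reference = {"s1": 0.5, "s2": 0.4, "s3": 0.1}

error = max_frequency_error(estimated, reference)
```

With these invented numbers the estimate is off by about 0.1 (ten percentage points) for the two most frequent senses and exact for the third.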

Automatic word sense induction: semantic vectors instead of lexicographers?

The question of how to determine the meaning of a word has been, and remains, one of the most difficult and contested in linguistics. One possible solution builds on the principles of distributional semantics: “you shall know a word by the company it keeps” (Firth 1957). These principles underlie methods for automatically extracting word senses from text corpora using semantic vectors (word sense induction).
This talk addresses automatic word sense induction for Russian. In the first part, we compare four methods: word2vec neighbours and context clustering, which rely on vector representations of words and contexts; AdaGram, which uses vector representations of word senses; and latent Dirichlet allocation. We also compare the automatically induced senses against dictionary data and the results of psycholinguistic experiments. In the second part, we describe research on automatically extracting words with regular polysemy and show how sense vectors make it possible to solve this task in a simpler and more natural way.