Claudia Marzi - Academia.edu
Papers by Claudia Marzi
2020 6th IEEE Congress on Information Science and Technology (CiSt)
The enormous potential of Information and Communication Technologies (ICT) for addressing critical educational issues is generally acknowledged, but its use in the assessment of the complex skills of reading and understanding a text has been very limited to date. The paper contrasts traditional reading assessment protocols with ReadLet, an ICT platform with a tablet front-end, designed to support online monitoring of silent and oral reading abilities in early graders. ReadLet makes use of cloud computing and mobile technology for large-scale data collection and allows the time alignment of the child's reading behaviour with texts tagged using Natural Language Processing (NLP) tools. Initial findings replicate established benchmarks from the psycholinguistic literature on reading in both typically and atypically developing children, making the application a new ground-breaking approach in the evaluation of reading skills.
Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA 2014 9-11 December 2014, Pisa, 2014
The dynamic nature of modern human social interactions, and the increasing capability of wireless and mobile devices for creating and sharing contents, open up the opportunity for a wide dissemination of information through complex knowledge sharing systems. As the shared knowledge components build cognitive ties, there is no real sharing of knowledge without a common understanding of it. In this article, particular emphasis is laid on technologies in Natural Language understanding and knowledge management for providing structured, intelligent access to the continuously evolving content, generated on-line in a pervasive collaborative environment. In detail, robust automated techniques for term extraction and knowledge acquisition are used to tap the information density and the global coherence of text excerpts sampled from both general-purpose and subject-specific social networks. We show empirically that the two sources may exhibit considerable differences in terms of content acces...
The conventionally accepted definition of Grey Literature, as Information produced and distributed by non-commercial publishing, does not take into consideration either the increasing availability of forms of grey knowledge, or the growing importance of computer-based encoding and management as the standard mode of creating and developing grey literature. Semi-automated terminological analysis of almost twenty years of terminological creativity in the proceedings of eleven GL International Conferences offers the opportunity to pave the way to a bottom-up redefinition of Grey Literature stemming from attested terminological creativity and lexical innovation. In this paper, we focus on a set of automatically-acquired terms obtained by subjecting our reference Corpus to a number of pre-processing steps of automated text analysis, such as concordances, frequency lists and lexical association scores. Acquired terms allow us to throw into sharp relief developing trends and important shifts o...
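The pre-processing steps named in this abstract (frequency lists and lexical association scores) can be illustrated with a minimal sketch. It is not the authors' pipeline: the choice of pointwise mutual information (PMI) over adjacent word pairs as the association score, the function names `frequency_list` and `pmi_scores`, the `min_count` threshold and the toy sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def frequency_list(tokens):
    """Return (token, count) pairs sorted by decreasing count."""
    return Counter(tokens).most_common()


def pmi_scores(tokens, min_count=2):
    """Pointwise mutual information (in bits) for adjacent word pairs.

    Pairs seen fewer than min_count times are skipped, since PMI is
    unreliable for very rare events. This is an illustrative choice of
    association score, not necessarily the one used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus, invented for this example.
tokens = (
    "grey literature is produced outside commercial publishing "
    "and grey literature includes reports and working papers"
).split()

freqs = frequency_list(tokens)
scores = pmi_scores(tokens)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

On this toy input only the pair ("grey", "literature") recurs, so it is the only candidate term that survives the `min_count` filter; a real corpus would also need tokenisation, lemmatisation and a stop-word policy.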
2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019
Current research in the emotion recognition field is exploring the possibility of merging the information from physiological signals, behavioural data, and speech. Electrodermal activity (EDA) is amongst the main psychophysiological arousal indicators. Nonetheless, it is quite difficult to analyze in ecological scenarios, for instance when the subject is speaking. On the other hand, speech carries relevant information about the subject's emotional state, and its potential in the field of affective computing is still to be fully exploited. In this work, we aim at exploring the possibility of merging the information from electrodermal activity (EDA) and speech to improve the recognition of human arousal level during the pronunciation of single affective words. Unlike the majority of studies in the literature, we focus on speakers’ arousal rather than the emotion conveyed by the spoken word. Specifically, a support vector machine with recursive feature elimination strategy (SVM-RFE) is trained and tested on three datasets, i.e., using the two channels (speech and EDA) separately and then jointly. The results show that the merging of EDA and speech information significantly improves the marginal classifier (+11.64%). The six features selected by the RFE procedure will be used for the development of a future multivariate model of emotions.
Biomedical Signal Processing and Control, 2021
In this paper, we explore the possibility of building a model of subject arousal by exploiting the acquisition and the analysis of speech and electrodermal activity (EDA). Several issues have to be addressed to reach this goal, such as the estimation of the relationship between arousal and behavioral measures and the reliability of the EDA signal during speech production. To accomplish this task, we will investigate the relation among EDA, speech activity and subject arousal, during isolated affective word pronunciation. Our results show that significant information on subject arousal can be obtained by analyzing EDA during the processing of out-of-context words with an emotional content in a reading aloud task. Based on a sample of eighteen Italian participants, we observed a significant relation between EDA features and self-reported arousal scores. Quantitative models relating EDA- and speech-derived features are proposed and discussed. We found that increasing values of tonic and phasic components of EDA signals correspond to increasing self-assessed arousal scores; Mel-frequency cepstral analysis of speech was also shown to carry relevant information about subject arousal, with a significant inverse relation to self-assessed scores. Our results suggest how the analysis of concurrent acquisition of EDA and speech features may offer a valid approach for the prediction of subject arousal during speech production, as well as a method for validating self-assessment ratings themselves.
Word Knowledge and Word Usage
What is inflection? Is it part of language morphology, syntax or both? What are the basic units of inflection and how do speakers acquire and process them? How do they vary across languages? Are some inflection systems somewhat more complex than others, and does inflectional complexity affect the way speakers process words? This chapter addresses these and other related issues from an interdisciplinary perspective. Our main goal is to map out the place of inflection in our current understanding of the grammar architecture. In doing that, we will embark on an interdisciplinary tour, which will touch upon theoretical, psychological, typological, historical and computational issues in morphology, with a view to looking for points of methodological and substantial convergence from a rather heterogeneous array of scientific approaches and theoretical perspectives. The main upshot is that we can learn more from this than just an additive medley of domain-specific results. In the end, a cross-domain survey can help us look at traditional issues in a surprisingly novel light.
Word Knowledge and Word Usage
Over the last decades, a growing body of evidence on the mechanisms governing lexical storage, access, acquisition and processing has questioned traditional models of language architecture and word usage based on the hypothesis of a direct correspondence between modular components of grammar competence (lexicon vs. rules), processing correlates (memory vs. computation) and neuro-anatomical localizations (prefrontal vs. temporo-parietal perisylvian areas of the left hemisphere). In the present chapter, we explore the empirical and theoretical consequences of a distributed, integrative model of the mental lexicon, whereby words are seen as emergent properties of the functional interaction between basic, language-independent processing principles and the language-specific nature and organization of the input. From this perspective, language learning appears to be inextricably related to the way language is processed and internalized by the speakers, and key to an interdisciplinary understanding of such a way, in line with Tomaso Poggio's suggestion that the development of a cognitive skill is causally and ontogenetically prior to its execution (and sits "on top of it"). In particular, we discuss conditions, potential and prospects of the epistemological continuity between psycholinguistic and computational modelling of word learning, and illustrate the yet largely untapped potential of their integration. We use David Marr's hierarchy to clarify the complementarity of the two viewpoints. Psycholinguistic models are informative about how speakers learn to use language (interfacing Marr's levels 1 and 2).
When we move from the psycholinguistic analysis of the functional operations involved in language learning to an algorithmic description of how they are computed, computer simulations can help us explore the relation between speakers' behavior and general learning principles in more detail. In the end, psycho-computational
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
Italian Journal of Computational Linguistics
Italian Journal of Computational Linguistics
Paradigm-based approaches to word processing/learning assume that word forms are not acquired in isolation, but through associative relations linking members of the same word family (e.g. a paradigm, or a set of forms filling the same paradigm cell). Principles of correlative learning offer a set of equations that are key to modelling this complex dynamic at a considerable level of detail. We use these equations to simulate acquisition of Modern Greek conjugation, and we compare the results with evidence from German and Italian. Simulations show that different Greek verb classes are processed and acquired differentially, as a function of their degrees of formal transparency and predictability. We relate these results to psycholinguistic evidence of Modern Greek word processing, and interpret our findings as supporting a view of the mental lexicon as an emergent integrative system.
Information
The paper focuses on what two different types of Recurrent Neural Networks, namely a recurrent Long Short-Term Memory and a recurrent variant of self-organizing memories, a Temporal Self-Organizing Map, can tell us about speakers’ learning and processing a set of fully inflected verb forms selected from the top-frequency paradigms of Italian and German. Both architectures, due to the re-entrant layer of temporal connectivity, can develop a strong sensitivity to sequential patterns that are highly attested in the training data. The main goal is to evaluate learning and processing dynamics of verb inflection data in the two neural networks by focusing on the effects of morphological structure on word production and word recognition, as well as on word generalization for untrained verb forms. For both models, results show that production and recognition, as well as generalization, are facilitated for verb forms in regular paradigms. However, the two models are differently influenced by...
Journal of King Saud University - Computer and Information Sciences, 2016
The aim of the present study is to model the human mental lexicon, by focussing on storage and processing dynamics, as lexical organisation relies on the process of input recoding and adaptive strategies for long-term memory organisation. A fundamental issue in word processing is represented by the emergence of the morphological organisation level in the lexicon, based on paradigmatic relations between fully-stored word forms. Morphology induction can be defined as the task of perceiving and identifying morphological formatives within morphologically complex word forms, as a function of the dynamic interaction between lexical representations and distribution and degrees of regularity in lexical data. In the computational framework we propose here (TSOMs), based on Self-Organising Maps with Hebbian connections defined over a temporal layer, the identification/perception of surface morphological relations involves the alignment of recoded representations of morphologically-related input words. Facing a non-concatenative morphology such as the Arabic inflectional system prompts a reappraisal of morphology induction through adaptive organisation strategies, which affect both lexical representations and long-term storage. We will show how a strongly adaptive self-organisation during training is conducive to emergent relations between word forms, which are concurrently, redundantly and competitively stored in the human mental lexicon, and to generalising knowledge of stored words to unknown forms.
Frontiers in Communication
Due to the typological diversity of their inflectional processes, some languages are intuitively more difficult than other languages. Yet, finding a single measure to quantitatively assess the comparative complexity of an inflectional system proves an exceedingly difficult endeavor. In this paper we propose to investigate the issue from a processing-oriented standpoint, using data processed by a type of recurrent neural network to quantitatively model the dynamic of word processing and learning in different input conditions. We evaluate the relative complexity of a set of typologically different inflectional systems (Greek, Italian, Spanish, German, English and Standard Modern Arabic) by training a Temporal Self-Organizing Map (TSOM), a recurrent variant of Kohonen's Self-Organizing Maps, on a fixed set of verb forms from top-frequency verb paradigms, with no information about the morphosemantic and morphosyntactic content conveyed by the forms. After training, the behavior of each language-specific TSOM is assessed on different tasks, looking at self-organizing patterns of temporal connectivity and functional responses. Our simulations show that word processing is facilitated by maximally contrastive inflectional systems, where verb forms exhibit the earliest possible point of lexical discrimination. Conversely, word learning is favored by a maximally generalizable system, where forms are inferred from the smallest possible number of their paradigm companions. Based on evidence from the literature and our own data, we conjecture that the resulting balance is the outcome of the interaction between form frequency and morphological regularity. Big families of stem-sharing, regularly inflected forms are the productive core of an inflectional system. Such a core is easier to learn but slower to discriminate.
In contrast, less predictable verb forms, based on alternating and possibly suppletive stems, are easier to process but are learned by rote. Inflection systems thus strike a balance between these conflicting processing and communicative requirements, while staying within tight learnability bounds, in line with Ackerman and Malouf's Low Conditional Entropy Conjecture. Our quantitative investigation supports a discriminative view of morphological inflection as a collective, emergent system, whose global self-organization rests on a surprisingly small handful of language-independent principles of word coactivation and competition.
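The Low Conditional Entropy Conjecture mentioned in this abstract concerns how predictable one paradigm cell is from another. As a rough illustration of the underlying quantity only (it is not the TSOM method used in the paper), the sketch below computes the conditional entropy H(B | A) in bits from paired observations. The function name `conditional_entropy` and the two toy "inflection systems" are hypothetical and invented for this example.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(B | A) in bits for a list of (a, b) observations.

    Here a and b stand for the exponents a lexeme shows in two
    paradigm cells; low values mean cell B is easy to guess from A.
    """
    joint = Counter(pairs)
    marginal_a = Counter(a for a, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / total              # joint probability p(a, b)
        p_b_given_a = n_ab / marginal_a[a]  # conditional p(b | a)
        entropy -= p_ab * math.log2(p_b_given_a)
    return entropy


# Hypothetical toy systems: each tuple is one verb's exponents in two cells.
predictable = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
unpredictable = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

h_predictable = conditional_entropy(predictable)
h_unpredictable = conditional_entropy(unpredictable)
print(h_predictable, h_unpredictable)
```

In the first toy system the exponent in cell A fully determines the exponent in cell B, so the conditional entropy is zero; in the second, knowing cell A leaves two equally likely options for cell B, giving one bit of uncertainty.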
The Grey Journal, 2010
The Grey Journal is a flagship journal for the grey literature community. It crosses continents, disciplines, and sectors both public and private. The Grey Journal not only deals with the topic of grey literature but is itself a document type classified as grey literature. It is akin to other grey serial publications, such as conference proceedings, reports, working papers, etc.