Chuyuan Li | Wuhan University of Technology
Papers by Chuyuan Li
HAL (Le Centre pour la Communication Scientifique Directe), Sep 7, 2022
Dialogue & Discourse
The main aim of this paper is to provide a characterization of the response space for questions using a taxonomy grounded in a dialogical formal semantics. As a starting point we take the typology for responses in the form of questions provided in (Łupkowski and Ginzburg, 2016). This work develops a wide-coverage taxonomy for question/question sequences observable in corpora including the BNC, CHILDES, and BEE, as well as formal modeling of all the postulated classes. Our aim is to extend this work to cover all responses to questions. We present the extended typology of responses to questions based on corpus studies of BNC, BEE, Maptask, and CornellMovie, which include 506, 262, 467, and 678 question/response pairs respectively. We compare the data for English with data from Polish using the Spokes corpus (694 question/response pairs). We discuss annotation reliability and disagreement analysis. We sketch how each class can be formalized using a dialogical semantics appropriate for dialog...
HAL (Le Centre pour la Communication Scientifique Directe), Jun 20, 2022
This paper describes the continuation of a project that aims at establishing an interoperable annotation scheme for quantification phenomena as part of the ISO suite of standards for semantic annotation, known as the Semantic Annotation Framework. After a break caused by the Covid-19 pandemic, the project was relaunched in early 2022 with a second working draft, which deals with certain issues in the annotation of quantification in a more satisfactory way than the original first working draft.
We present experiments aimed at automatically identifying patients showing symptoms of schizophrenia in controlled conversations between patients and psychotherapists. We merge all the speech turns of each speaker and train classification models using lexical, morphological, and syntactic information. This study is the first of its kind on French and obtains results comparable to those on English. Our first experiments tend to show that the speech of people with schizophrenia differs from that of controls: the best model reaches an accuracy of 93.66%. Richer information will nevertheless be needed to arrive at a robust model.
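The merging step described in this abstract can be illustrated with a minimal sketch. It assumes a dialogue is available as (speaker, text) pairs and uses plain bag-of-words counts as the lexical representation; the function names, the toy dialogue, and the whitespace tokenization are all hypothetical and are not taken from the paper, which does not specify its preprocessing here.

```python
from collections import Counter, defaultdict


def merge_turns_by_speaker(turns):
    """Concatenate every speech turn of a speaker into one document."""
    merged = defaultdict(list)
    for speaker, text in turns:
        merged[speaker].append(text)
    return {speaker: " ".join(parts) for speaker, parts in merged.items()}


def lexical_features(document):
    """Bag-of-words counts over lowercased, whitespace-separated tokens."""
    return Counter(document.lower().split())


# Hypothetical toy dialogue; the clinical corpus itself is not reproduced.
turns = [
    ("therapist", "How are you today"),
    ("patient", "I am fine"),
    ("therapist", "How was the week"),
    ("patient", "the week was fine"),
]

docs = merge_turns_by_speaker(turns)
features = {speaker: lexical_features(doc) for speaker, doc in docs.items()}
```

Each speaker ends up with a single feature vector, which is the kind of input a standard classifier would take; morphological or syntactic features would require a tagger or parser and are left out of this sketch.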
We propose the annotation of 7 sentences out of the 31 provided in the ISA-17 shared task, according to our understanding of the guidelines. We include here several remarks to improve the annotation and provide some tools to make the task easier.
Proceedings of the 2nd Workshop on Computational Approaches to Discourse, 2021
We investigate linguistic markers associated with schizophrenia in clinical conversations by detecting predictive features among French-speaking patients. Dealing with human-human dialogues makes for a realistic situation, but it calls for strategies to represent the context and face data sparsity. We compare different approaches for data representation, from individual speech turns to entire conversations, and for data modeling, using lexical, morphological, syntactic, and discourse features, dimensions presumed to be tightly connected to the language of schizophrenia. Previous English models were mostly lexical and reached high performance, here replicated (93.7% acc.). However, our analysis reveals that these models are heavily biased, which probably concerns most datasets on this task. Our new delexicalized models are more general and robust, with the best accuracy score at 77.9%.
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
The main aim of this paper is to provide a characterization of the response space for questions using a taxonomy grounded in a dialogical formal semantics. As a starting point we take the typology for responses in the form of questions provided in (Łupkowski and Ginzburg, 2016). This work develops a wide-coverage taxonomy for question/question sequences observable in corpora including the BNC, CHILDES, and BEE, as well as formal modelling of all the postulated classes. Our aim is to extend this work to cover all responses to questions. We present the extended typology of responses to questions based on corpus studies of BNC, BEE, and Maptask, which include 506, 262, and 467 question/response pairs respectively. We compare the data for English with data from Polish using the Spokes corpus (205 question/response pairs). We discuss annotation reliability and disagreement analysis. We sketch how each class can be formalized using a dialogical semantics appropriate for dialogue management.
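Annotation reliability of the kind mentioned in this abstract is often reported with a chance-corrected agreement score. The sketch below computes Cohen's kappa for two annotators; the choice of kappa is an assumption, since the abstract does not name the measure used, and the response-class labels in the example are hypothetical placeholders rather than classes from the paper's taxonomy.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four question/response pairs.
annotator_1 = ["answer", "answer", "clarification", "answer"]
annotator_2 = ["answer", "clarification", "clarification", "answer"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four items (0.75) against a chance level of 0.5, giving a kappa of 0.5. Listing the items where the two label lists differ would be a natural starting point for a disagreement analysis.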