Claire Gardent - Academia.edu
Papers by Claire Gardent
In this paper, we introduce SEMTAG, a toolbox for TAG-based parsing and generation. This environment supports the development of wide-coverage grammars and differs from existing environments for TAG, such as XTAG (XTAG Research Group, 2001), in that it includes a semantic dimension. SEMTAG is open-source and freely available.
In this article, we present a free and open software architecture for developing Tree Adjoining Grammars equipped with semantics. This architecture uses a metagrammar compiler to ease the extension and maintenance of the grammar, and integrates a semantic construction module which makes it possible to check the grammar's coverage, both syntactic and semantic. This module uses a tabular parser generated automatically from the grammar by the DyALog system. We also present the results of evaluating a French grammar developed with this architecture.
Proceedings of the COLING/ACL Main Conference Poster Sessions, 2006
We claim that existing specification languages for tree-based grammars fail to adequately support identifier management. We then show that XMG (eXtensible Meta-Grammar) provides a sophisticated treatment of identifiers which is effective in supporting linguist-friendly grammar design.
Claire Gardent. Integrating a unification-based semantics in a large scale Lexicalised Tree Adjoining Grammar.
We present GENSEM, a tool for generating input semantic representations for two sentence generators based on the same reversible Tree Adjoining Grammar. We then show how GENSEM can be used to produce large, controlled benchmarks and to test the relative performance of these generators.
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. SemTAG: a platform for specifying Tree Adjoining Grammars and
We developed a general framework for the development of a symbolic (hand-written) feature-based lexicalised tree-adjoining grammar (FB-LTAG). We chose natural language generation, surface realisation in particular, to test the capabilities of the grammar in terms of both accuracy and robustness. Our framework combines an optimised surface realiser with efficient error mining techniques. Generating from a large data set provided by the Generation Challenge Surface Realisation task, we significantly improve both the accuracy and the robustness of our grammar.
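The error mining mentioned above can be illustrated with a toy sketch (an assumption on my part, not the authors' implementation): compare how often each lexical item occurs in inputs the realiser fails on against how often it occurs overall, and rank items by that ratio so likely grammar gaps surface first.

```python
# Illustrative error-mining sketch: items that appear mostly in failed
# inputs get a high suspicion score (failure frequency / total frequency).
from collections import Counter

def mine_suspects(all_inputs, failed_inputs, min_count=2):
    """Return (item, suspicion) pairs, most suspicious first."""
    total = Counter(item for inp in all_inputs for item in inp)
    failed = Counter(item for inp in failed_inputs for item in inp)
    suspects = {
        item: failed[item] / total[item]
        for item in total
        if total[item] >= min_count
    }
    return sorted(suspects.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: "sleeps" occurs only in failed inputs, so it tops the ranking.
corpus = [["john", "sleeps"], ["john", "runs"],
          ["mary", "sleeps"], ["mary", "runs"]]
failures = [["john", "sleeps"], ["mary", "sleeps"]]
ranking = mine_suspects(corpus, failures)
print(ranking[0])  # ('sleeps', 1.0)
```

Real error-mining algorithms (e.g. over n-grams of lexical items) refine this idea with smoothing and iterative re-estimation; the ratio above is only the core intuition.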
Transactions of the Association for Computational Linguistics, 2021
The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency tree into a well-formed sentence, we propose a framework for error analysis which permits identifying which features of the input affect the models' results. This framework consists of two main components: (i) correlation analyses between a wide range of syntactic metrics and standard performance metrics and (ii) a set of techniques to automatically identify syntactic constructs that often co-occur with low performance scores. We demonstrate the advantages of our framework by performing error analysis on the results of 174 system runs submitted to the Multilingual SR shared tasks; we show that dependency edge accuracy correlates with automatic metrics, thereby providing a more interpretable basis for evaluation; and we sug…
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined together through an efficient intersection algorithm to form a soft guide ("background") to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.
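The "soft guide" idea can be caricatured as rescoring: in this loose stand-in for the paper's intersection algorithm (the real method intersects the WCFG and the automata rather than summing independent scores), each knowledge source contributes a log-score, and a candidate the grammar rules out is eliminated outright. All names and numbers below are invented.

```python
# Minimal rescoring sketch: combine RNN, grammar, and automaton log-scores.
# A grammar log-prob of -inf (ill-formed logical form) removes the candidate.
import math

def guided_score(rnn_logp, wcfg_logp, wfsa_logp,
                 w_grammar=1.0, w_entities=1.0):
    """Weighted log-linear combination of model and prior-knowledge scores."""
    return rnn_logp + w_grammar * wcfg_logp + w_entities * wfsa_logp

candidates = {
    "answer(capital(texas))":   guided_score(-2.0, -0.5, -0.1),
    "answer(capital(capital))": guided_score(-1.5, -math.inf, -0.1),
}
best = max(candidates, key=candidates.get)
print(best)  # the well-formed candidate wins despite a lower RNN score
```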
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016), 2016
Recent deep learning approaches to Natural Language Generation mostly rely on sequence-to-sequence models. In these approaches, the input is treated as a sequence, whereas the input to generation is in most cases a tree or a graph. In this paper, we describe an experiment showing how enriching a sequential input with structural information improves results and helps support the generation of paraphrases.
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016), 2016
Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1,050 sentences aligned with 1,885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.
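The first step of the pipeline above can be sketched with invented KB entries (the real system annotates sentences with KB concepts and instances using more than plain string matching): keep as candidate facts only those triples whose subject and object are both mentioned in the target sentence.

```python
# Rough sketch of candidate-fact extraction: a triple survives only if
# both its subject and its object occur (as surface strings) in the sentence.
triples = [
    ("John_Ford", "directed", "The_Searchers"),
    ("John_Ford", "born_in", "Cape_Elizabeth"),
]

def candidate_facts(sentence, triples):
    text = sentence.lower()
    def mentioned(entity):
        return entity.replace("_", " ").lower() in text
    return [t for t in triples if mentioned(t[0]) and mentioned(t[2])]

facts = candidate_facts("John Ford directed The Searchers in 1956.", triples)
print(facts)  # only the fully mentioned triple survives
```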
Proceedings of the Seventh International Natural Language Generation Conference, May 30, 2012
While in Computer Science, grammar engineering has led to the development of various tools for checking grammar coherence, completion, and under- and over-generation, in Natural Language Processing, most approaches developed to improve a grammar have focused on detecting under-generation and, to a much lesser extent, over-generation. We argue that generation can be exploited to address other issues relevant to grammar engineering, in particular detecting grammar incompleteness, identifying sources of overgeneration and analysing the linguistic coverage of the grammar. We present an algorithm that implements these functionalities and we report on experiments using this algorithm to analyse a Feature-Based Lexicalised Tree Adjoining Grammar consisting of roughly 1500 elementary trees.
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), 2015
We present a method for automatically generating descriptions of biological events encoded in the KB BIO 101 knowledge base. We evaluate our approach on a corpus of 336 event descriptions, provide a qualitative and quantitative analysis of the results obtained and discuss possible directions for further work.
Computational Linguistics, 2015
In parsing with Tree Adjoining Grammar (TAG), independent derivations have been shown by Schabes and Shieber (1994) to be essential for correctly supporting syntactic analysis, semantic interpretation, and statistical language modeling. However, the parsing algorithm they propose is not directly applicable to Feature-Based TAGs (FB-TAG). We provide a recognition algorithm for FB-TAG that supports both dependent and independent derivations. The resulting algorithm combines the benefits of independent derivations with those of Feature-Based grammars. In particular, we show that it accounts for a range of interactions between dependent vs. independent derivations on the one hand, and syntactic constraints, linear ordering, and scopal vs. nonscopal semantic dependencies on the other hand.
Human Language Technology Challenges for Computer Science and Linguistics, 2014
Lecture Notes in Computer Science, 2005
We present an intelligent natural-language interface, Quelo NLI, for querying and exploring semantic data. Its intelligence lies in the use of reasoning services over an ontology. These support the intentional navigation of the underlying datasource and the formulation of queries that are consistent with respect to it. Its Natural Language Generation (NLG) module masks the formulation of queries as the composition of English text and generates descriptions of query answers. An important feature of Quelo NLI is that it is portable, as it is not bound to a domain-specific ontology. We describe Quelo NLI's functionality and present a grammar-based natural language generation approach that better supports the domain-independent generation of fluent queries and extends naturally to the generation of answer descriptions. We concentrate on describing the generation resources, namely a domain-independent handwritten grammar and a lexicon that is automatically extracted from the concepts and relations of the underlying ontology.
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms - TAGRF '06, 2006
Surface realisation from flat semantic formulae is known to be exponential in the length of the input. In this paper, we argue that TAG naturally supports the integration of three main ways of reducing this complexity: polarity filtering, delayed adjunction and empty semantic item elimination. We support these claims by presenting some preliminary results of the TAG-based surface realiser GenI.
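Polarity filtering, the first of the three techniques, admits a compact toy sketch (an assumption-laden illustration, not the GenI implementation): each lexical entry publishes resources (+) and requirements (-) for syntactic categories, and a combination of entries is only worth exploring if, for every category, the polarities sum to the target (here, exactly one S and nothing left over).

```python
# Toy polarity filter: prune lexical-item combinations whose category
# polarities cannot cancel out, before the expensive combination phase.
from collections import Counter

def polarity_filter(entries, target={"s": 1}):
    """True iff the summed polarities match the target exactly."""
    total = Counter()
    for entry in entries:
        total.update(entry)
    cats = set(total) | set(target)
    return all(total[cat] == target.get(cat, 0) for cat in cats)

sleeps = {"s": +1, "np": -1}   # provides an S, requires an NP
john   = {"np": +1}            # provides an NP
mary   = {"np": +1}

print(polarity_filter([sleeps, john]))        # True: balanced, worth trying
print(polarity_filter([sleeps, john, mary]))  # False: surplus NP, pruned
```

Because the check is a cheap integer sum per category, it can discard most combinations without ever attempting tree combination.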
Proceedings of the Eleventh European Workshop on Natural Language Generation - ENLG '07, 2007
We present a method for quickly spotting overgeneration suspects (i.e., likely causes of overgeneration) in hand-coded grammars. The method is applied to a medium-sized Tree Adjoining Grammar (TAG) for French and is shown to help reduce the number of outputs by 70%, almost all of the reduction being overgeneration.
Proceedings of the 2nd Workshop on Text Meaning and Interpretation - TextMean '04, 2004
Arguably, grammars which associate natural language expressions not only with a syntactic but also with a semantic representation should do so in a way that captures paraphrasing relations between sentences whose core semantics are equivalent. Yet existing semantic grammars fail to do so. In this paper, we describe an ongoing project whose aim is the production of a "paraphrastic grammar", that is, a grammar which associates paraphrases with identical semantic representations. We begin by proposing a typology of paraphrases. We then show how this typology can be used to simultaneously guide the development of a grammar and of a test suite designed to support the evaluation of this grammar.