Albert Gatt | University of Malta
Papers by Albert Gatt
Abstract. Many real-world applications that reason about events obtained from raw data must deal with the problem of temporal uncertainty, which arises due to error or inaccuracy in data. Uncertainty also compromises reasoning where relationships between events need to be inferred.
Abstract A substantial amount of recent work in natural language generation has focused on the generation of 'one-shot' referring expressions whose only aim is to identify a target referent. Dale and Reiter's Incremental Algorithm (IA) is often thought to be the best algorithm for maximizing the similarity to referring expressions produced by people. We test this hypothesis by eliciting referring expressions from human subjects and computing the similarity between the expressions elicited and the ones generated by algorithms.
Welcome to the Seventh International Natural Language Generation Conference (INLG 2012). INLG 2012 is the biennial meeting of the ACL Special Interest Group on Natural Language Generation (SIGGEN). The INLG conference provides the premier forum for the discussion, dissemination, and archiving of research and results in the field of Natural Language Generation. Previous INLG conferences have been held in Ireland, the USA, Australia, the UK and Israel.
We are pleased to introduce the technical program of the Fifth International Natural Language Generation Conference (INLG 2008), the Biennial Meeting of SIGGEN, the ACL Special Interest Group in Natural Language Generation. INLG is the leading international conference on research into natural language generation. It has been held in Sydney (Australia) in 2006, at Brockenhurst (UK) in 2004, in Harriman (New York, USA) in 2002, and in Mitzpe Ramon (Israel) in 2000.
Abstract This paper discusses the ongoing development of a new Maltese spell checker, highlighting the methodologies which would best suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS) which now includes a representative corpus of c.
Abstract This paper explores the role of semantic similarity in content selection and aggregation of expressions referring to sets. Similarity plays a role in ensuring that a referring expression corresponds to a coherent conceptual gestalt. On the basis of corpus-based and experimental evidence we propose an algorithm which (a) separates content selection and aggregation to avoid a combinatorial explosion; (b) uses similarity between entities to prioritise among search alternatives.
Abstract In referring to a target referent, speakers need to choose a set of properties that jointly distinguish it from its distractors. Current computational models view this as a search process in which the decision to include a property requires checking how many distractors it excludes. Thus, these models predict that identifying descriptions should take longer to produce the larger the distractor set is, independent of how many properties are required to identify a target.
Abstract Evaluations of NLG systems generally are quantitative, that is, based on corpus comparison statistics and/or results of experiments with people. Outcomes of such evaluations are important in demonstrating whether or not an NLG system is successful, but leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge on where a system needs to be improved.
User Modeling and User-Adapted Interaction, Jan 1, 2006
Reference production and generation by Albert Gatt
Maltese noun phrases exhibit a form of 'definiteness agreement' between head noun and modifier. When the noun is definite, an adjectival modifier is often overtly marked as definite as well. However, the status of this phenomenon as a case of true morphosyntactic agreement has been disputed, given its apparent optionality. Not all definite NPs have modifiers which are overtly marked as definite. Some authors have argued that definiteness marking on the adjective is in fact pragmatically licensed. The present paper presents a corpus-based study of the distribution of adjectives with and without definite marking, and then tests the pragmatic licensing claim through a production study. Speakers were found to be more likely to use definite adjectives in referential noun phrases when the adjectives had a specifically contrastive function. This result is discussed in the context of both theoretical and psycholinguistic work on the pragmatics of referentiality.
Proceedings of ACL-08: HLT, Short Papers, Jan 1, 2008
Proceedings of the Fifth International Natural …, Jan 1, 2008
Unpublished PhD thesis, University of Aberdeen, Jan 1, 2007
What constitutes an adequate reference to a set of objects? Despite intensive research on the Generation of Referring Expressions (GRE), many GRE algorithms either lack empirical backing, or are motivated by concerns which arguably shift their focus away from the crucial problem, which is to generate natural descriptions, much as a person would generate them in a comparable situation. This problem becomes much more pronounced in the case of plural reference, where even psycholinguistic research is lacking. This thesis focuses on ...
Journal of Logic, Language and Information, Jan 1, 2007
Proceedings of the pre- …, Jan 1, 2009
Empirical methods in natural language …, Jan 1, 2011
Proceedings of the COLING/ACL on Main …, Jan 1, 2006
Proc. International Conference on …, Jan 1, 2007
Proceedings of the Fifth International Natural …, Jan 1, 2008
Proceedings of the 11th Meeting of the EACL, Jan 1, 2006
Proceedings of the 12th European Workshop …, Jan 1, 2009
Proceedings of the Fifth International Natural …, Jan 1, 2008
Proceedings of UCNLG+ MT: Language Generation …, Jan 1, 2007
Proceedings of the …, Jan 1, 2007
Proceedings of the Fourth …, Jan 1, 2006
Abstract Referring expressions (such as the red chair facing right) often show evidence of preferences (Pechmann, 1989; Belke & Meyer, 2002), with some attributes (e.g. colour) being more frequent and more often included when they are not required, leading to overspecified references. This observation underlies many computational models of Referring Expression Generation, especially those influenced by Dale & Reiter's (1995) Incremental Algorithm.
The past decade has witnessed renewed interest in the Generation of Referring Expressions (GRE) [23, 24, 8, 9, 10, 12, 22]. Broadening the scope beyond earlier work [3, 4, 5], recent proposals involve algorithms that refer to sets as well as individuals, using operations such as set union ('the cat and the dogs') and complementation ('the dog that is not black'). As a consequence, it has become more difficult for a generator to choose among alternative expressions that may be coextensive.
In this paper, we discuss the evaluation measures proposed in a number of recent papers associated with the TUNA project, which have become an important component of the First NLG Shared Task and Evaluation Campaign (STEC) on attribute selection for referring expressions generation. Focusing on reference to individual objects, we discuss what such evaluation measures should be expected to achieve, and what alternative measures merit consideration.
Generation of Referring Expressions (GRE), e.g., Dale and Reiter (1995), is one of the core tasks of Natural Language Generation (NLG) systems. Usually it is formulated as an identification problem: given a domain representing entities and their properties, construct a referring expression for a target referent or set of target referents which singles it out from its distractors.
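The identification problem described in this abstract can be illustrated with a minimal sketch. It is a simplified greedy attribute-selection loop, loosely in the spirit of incremental approaches to GRE, and not a reimplementation of any published algorithm. The function name `incremental_description`, the dictionary-based domain representation, and the example attributes are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, str]  # hypothetical representation: attribute -> value


def incremental_description(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Select properties of `target` until every distractor is excluded.

    Attributes are tried in `preference_order`; a property is kept only
    if it rules out at least one remaining distractor. Returns None when
    the target cannot be distinguished with the available attributes.
    """
    description: List[Tuple[str, str]] = []
    remaining = list(distractors)
    for attr in preference_order:
        if not remaining:
            break  # target already uniquely identified
        value = target.get(attr)
        if value is None:
            continue
        # Keep the property only if it excludes some distractor.
        if any(d.get(attr) != value for d in remaining):
            description.append((attr, value))
            remaining = [d for d in remaining if d.get(attr) == value]
    return description if not remaining else None


# Toy domain (invented for this example).
target = {"type": "chair", "colour": "red", "orientation": "right"}
distractors = [
    {"type": "chair", "colour": "blue", "orientation": "right"},
    {"type": "desk", "colour": "red", "orientation": "left"},
]
result = incremental_description(
    target, distractors, ["colour", "type", "orientation"]
)
```

With this toy domain, colour excludes the blue chair and type excludes the desk, so orientation is never needed. Changing the preference order changes which properties are selected, which is one reason such orderings matter in this kind of model.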
This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of NLP, with an emphasis on different evaluation methods and the relationships between them.
Abstract Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created.
AI …, Jan 1, 2009
Contemporary Neonatal Intensive Care Units collect vast amounts of patient data in various formats, making efficient processing of information by medical professionals difficult. Moreover, different stakeholders in the neonatal scenario, which include parents as well as staff occupying different roles, have different information requirements. This paper describes recent and ongoing work on building systems that automatically generate textual summaries of
Proceedings of the Fifth …, Jan 1, 2008
Proceedings of the 12th European Workshop on …, Jan 1, 2009
Artificial Intelligence, Jan 1, 2009
Abstract. Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STECs, which focused on Referring Expression Generation (REG), have been part of this development since its inception. This chapter looks back on the experience of organising the three TUNA Challenges, which came to an end in 2009.
Abstract An important question in the evaluation of Natural Language Generation systems concerns the relationship between textual characteristics and task performance. If the results of task-based evaluation can be correlated to properties of the text, there are better prospects for improving the system.
Abstract This paper investigates the relationship between the results of an extrinsic, task-based evaluation of an NLG system and various metrics measuring both surface and deep semantic textual properties, including relevance. The latter rely heavily on domain knowledge. We show that they correlate systematically with some measures of performance. The core argument of this paper is that more domain knowledge-based metrics shed more light on the relationship between deep semantic properties of a text and task performance.
BT-Nurse: Computer Generation of Natural Language Shift Summaries from Complex Heterogeneous Medical Data. 1. James Hunter, Department of Computing Science, University of Aberdeen, DPhil 2. Yvonne Freer, Simpson Centre for Reproductive Health, Royal Infirmary of Edinburgh, PhD 3. Albert Gatt, Institute of Linguistics, Centre for Communication Technology, University of Malta, PhD 4.
INTRODUCTION: Our objective was to determine whether and how a computer system could automatically generate helpful natural language nursing shift summaries solely from an electronic patient record system, in a neonatal intensive care unit (NICU). METHODS: A system was developed which automatically generates partial NICU shift summaries (for the respiratory and cardiovascular systems), using data-to-text technology. It was evaluated for 2 months in the NICU at the Royal Infirmary of Edinburgh, under supervision.
Abstract. In the drive to improve patient safety, patients in modern intensive care units are closely monitored with the generation of very large volumes of data. Unless the data are further processed, it is difficult for medical and nursing staff to assimilate what is important. It has been demonstrated that data summarization in natural language has the potential to improve clinical decision making; we have implemented and evaluated a prototype system which generates such textual summaries automatically.
Abstract Temporal uncertainty in raw data can impede the inference of temporal and causal relationships between events and compromise the output of data-to-text NLG systems. In this paper, we introduce a framework to reason with and represent temporal uncertainty from the raw data to the generated text, in order to provide a faithful picture to the user of a particular situation. The model is grounded in experimental data from multiple languages, shedding light on the generality of the approach.
Abstract It has been shown that summarizing complex multichannel physiological and discrete data in natural language (text) can lead to better decision-making in the intensive care unit (ICU). As part of the BabyTalk project, we describe a prototype system (BT-45) which can generate such textual summaries automatically.
Abstract Our society generates an ever-growing mass of information, whether in medicine, meteorology, etc. The most widely used method for analysing these data is to summarise them in graphical form. However, it has been shown that a textual summary is also an effective mode of presentation.
Third Arabic Natural Language Processing Workshop (WANLP'17)
Maltese is a morphologically rich language with a hybrid morphological system which features both concatenative and non-concatenative processes. This paper analyses the impact of this hybridity on the performance of machine learning techniques for morphological labelling and clustering. In particular, we analyse a dataset of morphologically related word clusters to evaluate the difference in results for concatenative and non-concatenative clusters. We also describe research carried out in morphological labelling, with a particular focus on the verb category. Two evaluations were carried out, one using an unseen dataset, and another one using a gold standard dataset which was manually labelled. The gold standard dataset was split into concatenative and non-concatenative to analyse the difference in results between the two morphological systems.
Among the derivational processes that have been adopted into Maltese based on the Romance model, there are processes to derive nouns from verbs which are relatively recent developments. Examples include the use of the suffix -ar (e.g., spara/sparar 'shoot'/'(the) shooting'), and the use of -(z)zjoni (e.g., spjega/spjegazzjoni 'explain'/'explanation'). This paper discusses these processes in the context of Maltese derivation in general. After a brief theoretical exposition and an overview of Maltese derivation, we present a corpus-based analysis of the productivity of -Var and -(z)zjoni derivations, followed by an analysis of the evidence for indirect borrowing in these two cases, based on the work of Seifart (2015). We show that, while there is evidence that both are productive, the statistical evidence suggests that -Var processes are more likely to result in novel forms. By the same token, -Var nominalisations more clearly represent cases of indirect borrowing, as evidenced by the greater number of types which have corresponding simplex forms, and by the greater probability that the simplex forms are more frequent than the nominalisations.
Labile verbs in Maltese. Albert Gatt (L-Università ta' Malta), Michael Spagnol (Universität Konstanz). Background. The causative-inchoative alternation: 1. Ħija kisser il-vażun 'My brother broke the vase' 2.
Abstract Recent research using the rapid serial visual presentation (RSVP) paradigm with English sentences that included words with letter transpositions (e.g., jugde) has shown that participants can readily reproduce the correctly spelled sentences with little cost; in contrast, there is a dramatic reading cost with root-derived Hebrew words (Velan & Frost, Psychonomic Bulletin & Review 14: 913–918, 2007, Cognition 118: 141–156, 2011).