Aurelie Herbelot | Universidad Pompeu Fabra

Papers by Aurelie Herbelot

Underquantification: an application to mass terms

This paper claims that all subject noun phrases can be given a unique underspecified formalisation in terms of quantification (which we term underquantification). The claim has consequences for our ontological and linguistic understanding of mass terms. We suggest that it is possible to categorise mass terms in relation to divisibility: whether we can find elementary parts for them which are of the same nature as their whole (a drop of water is water, so water is divisible; it is perhaps less clear what a part of progress is). We argue for an underspecified logical form which can be applied to both divisible and indivisible noun phrases (although the domain of quantification is not the same in both cases), satisfying our requirement for a unique quantificational formalisation of count nouns and mass terms.

Predictability of Distributional Semantics in Derivational Word Formation

Compositional distributional semantic models (CDSMs) have successfully been applied to the task of predicting the meaning of a range of linguistic constructions. Their performance on the semi-compositional word formation process of (morphological) derivation, however, has been extremely variable, with no large-scale empirical investigation to date. This paper fills that gap, performing an analysis of CDSM predictions on a large dataset (over 30,000 German derivationally related word pairs). We use linear regression models to analyze CDSM performance and obtain insights into the linguistic factors that influence how predictable the distributional context of a derived word is going to be. We identify various such factors, notably part of speech, argument structure, and semantic regularity.

'Calling on the classical phone': a distributional model of adjective-noun errors in learners' English

In this paper we discuss three key points related to error detection (ED) in learners' English. We focus on content word ED as one of the most challenging tasks in this area, illustrating our claims on adjective–noun (AN) combinations. In particular, we (1) investigate the role of context in accurately capturing semantic anomalies and implement a system based on distributional topic coherence, which achieves state-of-the-art accuracy on a standard test set; (2) thoroughly investigate our system's performance across individual adjective classes, concluding that a class-dependent approach is beneficial to the task; (3) discuss the data size bottleneck in this area, and highlight the challenges of automatic error generation for content words.

"Look, some green circles!": Learning to quantify from images

In this paper, we investigate whether a neural network model can learn the meaning of natural language quantifiers (no, some and all) from their use in visual contexts. We show that memory networks perform well in this task, and that explicit counting is not necessary to the system's performance, supporting psycholinguistic evidence on the acquisition of quantifiers.

Building a shared world: Mapping distributional to model-theoretic semantic spaces

In this paper, we introduce an approach to automatically map a standard distributional semantic space onto a set-theoretic model. We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers. In order to test our prediction, we learn a model of this relationship over a publicly available dataset of feature norms annotated with natural language quantifiers. Our initial experimental results show that, at least for domain-specific data, we can indeed map between formalisms, and generate high-quality vector representations which encapsulate set overlap information. We further investigate the generation of natural language quantifiers from such vectors.

Mr Darcy and Mr Toad, gentlemen: distributional names and their kinds

The semantics of poetry: a distributional reading

Measuring semantic content in distributional vectors

What is in a text, what isn’t, and what this has to do with lexical semantics

Formalising and specifying underquantification

Aurelie Herbelot, University of Cambridge, ah433@cam.ac.uk ... Instead of talking of ambiguous quantification, we will talk of underspecified quantification, or underquantification. ...

"Distributional techniques for philosophical enquiry" (with Aurélie Herbelot and Johanna Müller) in: Proceedings of the 6th European Association for Computational Linguistics Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Avignon 2012.

This paper illustrates the use of distributional techniques, as investigated in computational semantics, for supplying data from large-scale corpora to areas of the humanities which focus on the analysis of concepts. We suggest that the distributional notion of ‘characteristic context’ can be seen as evidence for some representative tendencies of general discourse. We present a case study where distributional data is used by philosophers working in the areas of gender studies and intersectionality as confirmation of certain trends described in previous work. Further, we highlight that different models of phrasal distributions can be compared to support the claim of intersectionality theory that ‘there is more to a phrase than the intersection of its parts’.

Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap

This paper deals with the task of finding generally applicable substitutions for a given input term. We show that the output of a distributional similarity system baseline can be filtered to obtain terms that are not simply similar but frequently substitutable. Our filter relies on the fact that when two terms are in a common entailment relation, it should be possible to substitute one for the other in their most frequent surface contexts. Using the Google 5-gram corpus to find such characteristic contexts, we show that for the given task, our filter improves the precision of a distributional similarity system from 41% to 56% on a test set comprising common transitive verbs.

Acquiring Ontological Relationships from Wikipedia using RMRS

We investigate the extraction of ontologies from biological text using a semantic representation derived from a robust parser. The use of a semantic representation avoids the problems that traditional pattern-based approaches have with complex syntactic constructions and long-distance dependencies. The discovery of taxonomic relationships is explored in a corpus consisting of 12,200 animal-related articles from the online encyclopaedia Wikipedia. The semantic representation used is Robust Minimal Recursion Semantics (RMRS). Initial experiments show good results in systematising extraction across a variety of hyponymic constructions.

Book chapters by Aurelie Herbelot

Intuitions and illusions: From explanation and experiment to assessment

This paper pioneers the use of methods and findings from psycholinguistics in experimental philosophy’s ‘sources project’. On this basis, it clarifies the epistemological relevance of empirical findings about intuitions – a key methodological challenge to experimental philosophy. The sources project (aka ‘cognitive epistemology of intuitions’) seeks to develop psychological explanations of philosophically relevant intuitions, which help us assess their evidentiary value. One approach seeks explanations which trace relevant intuitions back to automatic cognitive processes that are generally reliable but predictably generate cognitive illusions under specific vitiating circumstances. The paper develops and experimentally tests such an explanation for intuitions at the root of a historically influential paradox about perception (‘argument from illusion’). The explanation traces these intuitions to stereotype-driven amplification, an automatic process routinely involved in language comprehension (e.g., understanding philosophical case-descriptions). Distributional semantics analysis and a forced-choice plausibility ranking task are employed to establish the relevant verb-associated stereotypes. The paper argues that the inferences facilitated by these stereotypes are generally reliable, but shows that vitiating circumstances obtain in the formulation of the targeted paradox. On this basis, the paper explores two complementary strategies for assessing the evidentiary value of intuitive judgments.

Annotating Genericity: How Do Humans Decide? (A Case Study in Ontology Extraction)

Technical reports by Aurelie Herbelot

Large-scale syntactic processing: Parsing the web

