Roberto Zamparelli | University of Trento
Papers by Roberto Zamparelli
Language Resources and Evaluation, Jan 11, 2016
The SICK data set consists of about 10,000 English sentence pairs, generated starting from two existing sets: the 8K ImageFlickr data set and the SemEval 2012 STS MSR-Video Description data set. We randomly selected a subset of sentence pairs from each of these sources and we applied a 3-step generation process: first, the original sentences were normalized to remove unwanted linguistic phenomena; the normalized sentences were then expanded to obtain up to three new sentences with specific characteristics suitable to CDSM evaluation; as a last step, all the sentences generated in the expansion phase were paired with the normalized sentences in order to obtain the final data set. Each sentence pair was annotated for relatedness and entailment by means of crowdsourcing techniques. The sentence relatedness score (on a 5-point rating scale) provides a direct way to evaluate CDSMs, insofar as their outputs are meant to quantify the degree of semantic relatedn...
Language Resources and Evaluation, 2016
De Gruyter eBooks, Apr 11, 2016
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), 2022
Frontiers in Human Neuroscience, 2019
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2019
The Impact of Pronominal Form on Interpretation, 2016
Language Faculty and Beyond, 2020
Linguistic Issues in Language Technology, 2014
The lexicon of any natural language encodes a huge number of distinct word meanings. Just to understand this article, you will need to know what thousands of words mean. The space of possible sentential meanings is infinite: in this article alone, you will encounter many sentences that express ideas you have never heard before, we hope. Statistical semantics has addressed the issue of the vastness of word meaning by proposing methods to harvest meaning automatically from large collections of text (corpora). Formal semantics in the Fregean tradition has developed methods to account for the infinity of sentential meaning based on the crucial insight of compositionality, the idea that the meaning of sentences is built incrementally by combining the meanings of their constituents. This article sketches a new approach to semantics that brings together ideas from statistical and formal semantics to account, in parallel, for the richness of lexical meaning and the combinatorial power of senten...
The paper addresses the problem of nouns which have frequent mass and count uses. After a review of the literature and a brief classification of possible meaning shifts, it focuses on the multifarious class of abstract nouns (e.g. "fear/fears"), analyzing their relation with kinds and the presence of an extent reading ("a certain speed" = a certain AMOUNT of speed), drawing from corpus searches and the Bochum Countability Lexicon.
Comparing various nominal structures in English, Italian and Chinese and focusing on semantic contrasts occurring in argument and predicative position, we argue that the difference between 'anaphoric' and 'uniqueness-based' definites can be reduced to the existence of two DP projections, one denoting an individual, the other a unique/maximal property. Language tends to insert the smallest projection which (a) has the required semantic type; (b) can host all the lexical material in the numeration. The argument is based on data from possessives, predicate nominals and bare-noun coordination.