PaVeDa - Pavia Verbs Database: Challenges and Perspectives

Introducing PaVeDa - Pavia Verbs Database: Valency Patterns and Pattern Comparison in Ancient Indo-European Languages

The paper introduces PaVeDa (Pavia Verbs Database), a resource that builds on the ValPaL database of verbs' valency patterns and alternations by adding a number of ancient languages (completely absent from ValPaL) and a number of new features that enable direct comparison, both diachronic and synchronic. For each verb, ValPaL contains the basic frame and, ideally, all valency alternations allowed by the verb (e.g. passive, causative, reflexive). To enable comparison among alternations, an additional level has been added, the alternation class, which overcomes the problem of comparing the language-specific alternations added by individual ValPaL contributors. ValPaL's main aim was typological comparison, and its data were collected variously through questionnaires and secondary sources, drawing largely on contributors' native-speaker intuition. Working with ancient languages entails a methodological change, as the data are extracted from corpora. This has led to rethinking the notion of valency as a usage-based feature of verbs and to planning the future addition of corpus data for the modern languages in the database. It further shows the impact of ancient languages on theoretical reflection.
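The layering described in this abstract (verb, basic frame, language-specific alternations, and a shared alternation class) can be pictured with a small data model. The following is only a minimal sketch: the class names, field names, and placeholder records are assumptions made for illustration and do not reflect the actual PaVeDa or ValPaL schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Alternation:
    name: str               # language-specific label (hypothetical)
    alternation_class: str  # shared class used for cross-language comparison
    frame: str              # coding frame derived by the alternation


@dataclass
class VerbEntry:
    language: str
    lemma: str
    basic_frame: str
    alternations: List[Alternation] = field(default_factory=list)


def group_by_class(entries: List[VerbEntry]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each alternation class to the (language, lemma, label) triples in it."""
    grouped = defaultdict(list)
    for entry in entries:
        for alt in entry.alternations:
            grouped[alt.alternation_class].append(
                (entry.language, entry.lemma, alt.name)
            )
    return dict(grouped)


# Placeholder records, not real database content.
entries = [
    VerbEntry("LangA", "verb_a", "1-nom V 2-acc",
              [Alternation("LangA passive", "passive", "2-nom V")]),
    VerbEntry("LangB", "verb_b", "1-nom V 2-acc",
              [Alternation("LangB mediopassive", "passive", "2-nom V"),
               Alternation("LangB causative", "causative", "3-nom V 1-acc 2-acc")]),
]
by_class = group_by_class(entries)
```

Grouping on `alternation_class` rather than on the contributor's label is what lets differently named alternations from different languages line up for comparison.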

Digital databases for the study of verb argument realization in diachrony

The aim of our study is to show how two new and complementary digital databases for Ancient Greek syntax work. To do so, we run a case study on the change in second argument realization (see Levin 1993; Levin and Rappaport Hovav 2005) of three Ancient Greek verbs in diachrony. The first database focuses on Homeric verbs: it is based on the Homeric texts lemmatized and morpho-syntactically annotated within the Perseus Project, and it includes all argument realizations attested in Homer for each verb. The second one is the REGLA (Rección y complementación del griego antiguo y del latín) Ancient Greek and Latin database (Torrego et al. 2007), a work in progress based on a selection of Classical authors, which contains the argument realizations and the semantic classification of some of the most frequent Ancient Greek and Latin verbs. The two databases are the product of two different methodologies. The Homeric database is the result of the semi-automatic extraction of argument realizations; it exploits the information present at the morpho-syntactic layer of annotation, and contains a list of all verbs attested in Homer with the corresponding list of arguments and their realization. The Classical authors database, by contrast, is not based on a morpho-syntactically pre-annotated corpus; it collects all occurrences of some of the most frequent verbs of Greek classical literature in a selection of classical authors. For each verb, all attested argument realizations are given and an attempt at a semantic classification is provided. In addition, this database offers semantic information about the arguments, and a final argument structure template will be available for each verb when the work is completed. Our paper aims to show how, in spite of these differences, the two databases can be used in a complementary way to study verb argument structure and realization in Ancient Greek in diachrony.
We will focus, in particular, on second argument case marking in Homer and its change in diachrony: we will analyze three semantically different verbs (ágamai ‘admire, be jealous’, antiboléō ‘meet, be present’ and methiḗmi ‘dismiss, let go’) which in Homer show case alternation for the second argument, respectively Acc(usative)/Dat(ive), Gen(itive)/Dat(ive) and Acc(usative)/Gen(itive). Starting from these case alternations attested in Homer, we study the syntactic behavior of these verbs in post-Homeric Greek (6th, 5th and 4th century BC) using the second database. Thus, we explore syntactic productivity in diachrony (Barđdal 2008).
References:
Barđdal, J., 2008. Productivity: Evidence from Case and Argument Structure in Icelandic. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Levin, B., 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago: The University of Chicago Press.
Levin, B. and M. Rappaport Hovav, 2005. Argument Realization. Cambridge: Cambridge University Press.
Perseus Digital Library. Ed. Gregory R. Crane. Tufts University. http://www.perseus.tufts.edu (accessed April 17, 2012).
Torrego, M.E., J.M. Baños, C. Cabrillana and J.V. Méndez Dosuna, 2007. Praedicativa II: esquemas de complementación verbal en griego antiguo y en latín. Zaragoza: Universidad de Zaragoza.
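The semi-automatic extraction step mentioned in the abstract above, which reads argument realizations off a morpho-syntactically annotated corpus, might look roughly like the sketch below. It is not the authors' procedure: the token fields (`id`, `head`, `lemma`, `pos`, `case`, `relation`), the `"OBJ"` relation label, and the placeholder sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def second_argument_cases(sentences, object_relation="OBJ"):
    """Count, per verb lemma, the cases of dependents bearing the object relation."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            head = by_id.get(tok["head"])
            # Keep only tokens that depend directly on a verb.
            if head is None or head["pos"] != "verb":
                continue
            if tok["relation"] == object_relation and tok.get("case"):
                counts[head["lemma"]][tok["case"]] += 1
    return {lemma: dict(c) for lemma, c in counts.items()}


# Placeholder annotation, not real treebank data (head 0 marks the root).
sentences = [
    [
        {"id": 1, "head": 0, "lemma": "verb_x", "pos": "verb", "case": None, "relation": "PRED"},
        {"id": 2, "head": 1, "lemma": "noun_a", "pos": "noun", "case": "acc", "relation": "OBJ"},
    ],
    [
        {"id": 1, "head": 0, "lemma": "verb_x", "pos": "verb", "case": None, "relation": "PRED"},
        {"id": 2, "head": 1, "lemma": "noun_b", "pos": "noun", "case": "dat", "relation": "OBJ"},
        {"id": 3, "head": 1, "lemma": "noun_c", "pos": "noun", "case": "nom", "relation": "SBJ"},
    ],
]
case_counts = second_argument_cases(sentences)
```

A verb whose counter holds more than one case is a candidate for the kind of second-argument case alternation the study examines; the counts would still need manual checking, which is why the extraction is described as semi-automatic.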

Argument structures, verb patterns and dictionaries

Atti del XII Congresso Internazionale di Lessicografia, Torino, 6-9 settembre 2006, Vol. 2, ISBN 88-7694-918-6, pp. 1169-1180, 2006

This paper addresses two questions: 1) how relevant is research on the interactions between the syntactic behaviour of verbs and their lexical semantic properties from a lexicographic point of view? 2) how far can the integration of lexicological research and lexicographic practice go in this respect? After pointing out some of the main difficulties that theoretical studies still confront, I discuss concrete problems that arise when valency-based models are adopted in the presentation of specific verb classes in Italian monolingual dictionaries. Through the analysis of these specific cases, I intend to draw conclusions that are valid from a general perspective.

Latin preverbs and verb argument structure: New insights from new methods

This paper presents a corpus-based study on the argument structure of Latin verbs that are prefixed with spatial preverbs. Preverbation involves prefixing verbs, and is therefore a morphological phenomenon; however, studying the argument structure of preverbed verbs offers a good opportunity to explore the syntax-semantics and syntax-lexicon interfaces. Through a diachronic investigation of the interactions between the morpho-syntactic realisations of the arguments of preverbed verbs and their lexical-semantic properties, I aim to demonstrate the merits of an original, corpus-based quantitative approach. The results on preverbs partially support a more general trend from Latin synthetic case-based morpho-syntax to the analytic syntax of the Romance languages, although they also show that this trend is not unidirectional and linear. The source data for the analysis cover Early, Classical and Medieval Latin and are drawn from state-of-the-art computational resources for Latin.

Constructing a Multilingual Database of Verb Valence

2013

We show the initial stage of an incremental on-line multilingual valence pattern demo, presently populated with two languages, Norwegian and Ga. The Norwegian part of the valence database is established by reusing material available in the computational HPSG grammar Norsource, which has a rich array of lexical information, in part developed from earlier existing lexical resources for Norwegian. The procedure used for Ga is based on a Toolbox lexicon for Ga, with a first stage of processing enabling its data to join the conversion strategy used for Norwegian. A common template is used for the valence information display, although neither source fills in the template completely, reflecting their original differences in content. Essential among these is the availability of example sentences illustrating each valence option for each verb: this is available for Ga, but not for Norwegian. The results are implemented but not yet widely published, serving at the mome...

Baisa, V., Moze, S., Renau, I. Multilingual CPA: linking verb patterns across languages

Proceedings of the XVII Euralex International Congress, 2016

This paper presents the results of a pilot study in linking corresponding English and Spanish verb patterns using both automatic and manual procedures. Our work is rooted in Corpus Pattern Analysis (CPA) (Hanks 2004, 2013), a corpus-driven technique that was used in the creation of existing monolingual pattern dictionaries of English and Spanish verbs, which were used in our experiment to design a gold standard of manually annotated verb pattern pairs. Research in CPA has inspired parallel projects in English, Spanish, Italian and German. Our study represents the first attempt to build a multilingual lexical resource by linking verb patterns in these languages. Verbs pose special difficulties related to grammar and argument structure that we do not find in other parts of speech, and for that reason we think that it is necessary to create a specific resource for them. After applying the automatic matching to a set of 87 Spanish verbs linked to 176 English verbs, an evaluation of a random selection of 50 of these pairs shows 80% precision.

LatInfLexi: an Inflected Lexicon of Latin Verbs

Proceedings of the 5th Italian Conference on Computational Linguistics (CLiC-it 2018), 2018

We present a paradigm-based inflected lexicon of Latin verbs built to provide empirical evidence supporting an entropy-based estimation of the degree of uncertainty in inflectional paradigms. The lexicon contains information on the inflected forms that occupy the 254 morphologically possible paradigm cells of 3,348 verbal lexemes extracted from a frequency lexicon of Latin. The resource also includes annotation of vowel length and the frequency of each form in different epochs.
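One common way to make an entropy-based estimate of paradigm uncertainty concrete is the conditional entropy H(B | A): how uncertain the shape of one paradigm cell remains once the shape of another is known. The sketch below computes that quantity over (A-class, B-class) pairs, one pair per lexeme. It is an illustration under assumptions, not the measure used by the authors: the class labels are placeholders and the choice of conditional entropy is ours.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(B | A) in bits for an iterable of (a, b) observations."""
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)
    marginal = Counter(a for a, _ in pairs)
    entropy = 0.0
    for (a, _b), n in joint.items():
        # p(a, b) * log2 p(b | a), summed with a negative sign
        entropy -= (n / total) * math.log2(n / marginal[a])
    return entropy


# Placeholder observations: one (class in cell A, class in cell B) per lexeme.
observations = [("x", "p"), ("x", "q"), ("y", "r"), ("y", "r")]
uncertainty = conditional_entropy(observations)
```

A value of zero means cell A fully determines cell B; higher values mean that knowing A leaves more than one plausible outcome for B.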

Carving verb classes from corpora

In this paper, we discuss some methodological problems arising from the use of corpus data for semantic verb classification. In particular, we present a computational framework to describe the distributional properties of Italian verbs using linguistic data automatically extracted from a large corpus. This information is used to build a distribution-based classification of a set of Italian verbs. Its small scale notwithstanding, this case study will provide evidence for the complex interplay between syntactic and semantic verb features.
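A distribution-based classification of this kind can be illustrated by representing each verb as a vector of context frequencies and comparing vectors. The sketch below uses cosine similarity over syntactic-frame counts; this is a generic technique chosen for illustration, not necessarily the framework of the paper, and the verb names, frame labels, and counts are placeholders rather than corpus data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(target, profiles):
    """Return the verb whose profile is most similar to that of `target`."""
    others = [verb for verb in profiles if verb != target]
    return max(others, key=lambda verb: cosine(profiles[target], profiles[verb]))


# Placeholder frame counts, not extracted from any corpus.
profiles = {
    "verb_a": {"subj+obj": 8, "subj": 2},
    "verb_b": {"subj+obj": 7, "subj": 3},
    "verb_c": {"subj": 9, "subj+pp": 1},
}
closest_to_a = nearest("verb_a", profiles)
```

Verbs that end up close to each other under such a measure share syntactic distributions, which is the starting point for asking how far those groupings match semantic classes.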