Proceedings of The Workshop on Deep Language Processing for Quality Machine Translation (DeepLP4QMT)
Related papers
Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank
2010
Corpora of sentences annotated with grammatical information have been developed by extending basic lexical and morphological data with increasingly complex information, such as phrase constituency, syntactic functions and semantic roles. As these corpora grow in size and the linguistic information to be encoded reaches higher levels of sophistication, the use of annotation tools and, above all, of supporting computational grammars is no longer a matter of convenience but of necessity. In this paper, we report on the design features, development conditions and methodological options of a deep linguistic databank, the CINTIL DeepGramBank. In this corpus, sentences are annotated with fully fledged, linguistically informed grammatical representations produced by a deep linguistic processing grammar, thus consistently integrating morphological, syntactic and semantic information. We also report on how such a corpus makes it straightforward to obtain a whole range of past-generation annotated corpora (POS, NER and morphology), current-generation treebanks (constituency treebanks, dependency banks, propbanks) and next-generation databanks (logical form banks) by means of a very small selection/extraction effort that produces the appropriate "views" exposing the relevant layers of information.
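The "views" idea described above can be illustrated with a minimal sketch: shallower annotated corpora fall out of the deep records as simple projections. The record layout below is hypothetical, not the actual CINTIL format.

```python
# Toy sketch: deriving simpler annotated-corpus "views" from one deep record.
# Field names and layout are invented for illustration only.
record = {
    "tokens": ["the", "dog", "barks"],
    "pos": ["DT", "NN", "VBZ"],
    "deps": [(1, 0, "det"), (2, 1, "subj")],  # (head index, dependent index, label)
}

def pos_view(rec):
    """Project a past-generation POS-tagged corpus from the deep record."""
    return list(zip(rec["tokens"], rec["pos"]))

def dependency_view(rec):
    """Project a dependency-bank view as (head word, dependent word, label) triples."""
    return [(rec["tokens"][h], rec["tokens"][d], lab) for h, d, lab in rec["deps"]]
```

Each "view" here is a few lines of selection/extraction code over the same underlying representation, which is the point the abstract makes about residual effort.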
A robust and hybrid deep-linguistic theory applied to large-scale parsing
Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data - ROMAND '04, 2004
Modern statistical parsers are robust and quite fast, but their output is relatively shallow compared to that of formal grammar parsers. We suggest extending statistical approaches to a deeper linguistic analysis while keeping the speed and low complexity of a statistical parser. The resulting parsing architecture, suggested, implemented and evaluated here, is highly robust and hybrid on a number of levels, combining statistical and rule-based approaches, constituency and dependency grammar, shallow and deep processing, and full and near-full parsing. With a parsing speed of about 300,000 words per hour and state-of-the-art performance, the parser is reliable for a number of large-scale applications discussed in the article.
Data-driven deep-syntactic dependency parsing
Natural Language Engineering, 2015
'Deep-syntactic' dependency structures, which capture the argumentative, attributive and coordinative relations between the full words of a sentence, have great potential for a number of NLP applications. The abstraction degree of these structures lies between the output of a syntactic dependency parser (connected trees defined over all words of a sentence and language-specific grammatical functions) and the output of a semantic parser (forests of trees defined over individual lexemes or phrasal chunks and abstract semantic role labels, which capture the frame structures of predicative elements and drop all attributive and coordinative dependencies). We propose a parser that provides deep-syntactic structures. The parser has been tested on Spanish, English and Chinese.
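The intermediate abstraction level described in this abstract can be sketched as a projection of a surface dependency tree: relations between full (content) words are kept, while purely functional relations are dropped. This is a minimal illustration under assumed relation labels, not the authors' actual algorithm.

```python
# Sketch: collapsing a surface dependency tree toward a "deep-syntactic"
# structure by removing functional relations. Label names are hypothetical.
FUNCTIONAL = {"det", "aux", "case"}

def deep_syntactic(deps):
    """deps: list of (head index, dependent index, label) triples.
    Returns the structure restricted to relations between full words."""
    return [(h, d, lab) for h, d, lab in deps if lab not in FUNCTIONAL]

# Sentence: "the dog has barked" -> tokens 0=the, 1=dog, 2=has, 3=barked
surface = [(1, 0, "det"), (3, 1, "subj"), (3, 2, "aux")]
```

Applying `deep_syntactic(surface)` keeps only the subject relation between "barked" and "dog", dropping the determiner and auxiliary links, which mirrors the abstraction step the abstract describes.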
Lecture Notes in Computer Science, 2012
Deep linguistic grammars provide complex grammatical representations of sentences, capturing, for instance, long-distance dependencies and returning semantic representations, which makes them suitable for advanced natural language processing. However, they lack robustness in that they do not gracefully handle words missing from the grammar's lexicon. Several approaches have been taken to handle this problem, one of which consists in pre-annotating the input to the grammar with shallow-processing machine-learning tools. This is usually done to speed up parsing (supertagging), but it can also be used as a way of handling unknown words in the input. These pre-processing tools, however, must be able to cope with the vast tagset required by a deep grammar. In this paper, we report on the training and evaluation of several supertaggers for a deep linguistic processing grammar.
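A common baseline against which supertaggers like those described here are measured is a most-frequent-supertag lexicon with a global fallback for unknown words. The sketch below shows that baseline only (with invented toy tags), not one of the paper's actual models.

```python
from collections import Counter, defaultdict

# Baseline sketch: assign each word its most frequent supertag seen in
# training, falling back to the globally most frequent supertag for
# unknown words (the robustness problem the abstract discusses).
def train(tagged_corpus):
    by_word = defaultdict(Counter)
    overall = Counter()
    for word, supertag in tagged_corpus:
        by_word[word][supertag] += 1
        overall[supertag] += 1
    fallback = overall.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in by_word.items()}
    return lexicon, fallback

def tag(words, model):
    lexicon, fallback = model
    return [lexicon.get(w, fallback) for w in words]
```

With a deep grammar's vast tagset, per-word ambiguity is much higher than for plain POS tagging, which is why the abstract stresses that the pre-processing tools must cope with that tagset size.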
Verb valency semantic representation for deep linguistic processing
Proceedings of the Workshop on Deep Linguistic Processing - DeepLP '07, 2007
This workshop was conceived with the aim of bringing together the different computational linguistic subcommunities which model language predominantly by way of theoretical syntax, either in the form of a particular theory (e.g. CCG, HPSG, LFG, TAG or the Prague School) or of a more general framework which draws on theoretical and descriptive linguistics. We characterise this style of computational linguistic research as deep linguistic processing, because it aspires to model the complexity of natural language in rich linguistic representations. Aspects of this research have in the past had their own separate fora, such as the ACL 2005 workshop on deep lexical acquisition, as well as TAG+, Alpino, ParGram and DELPH-IN meetings. However, since the fundamental approach of building a linguistically founded system, as well as many of the techniques used to engineer efficient systems, are common across these projects and independent of the specific grammar formalism chosen, we felt the need for a common meeting in which experiences could be shared among a wider community.
Workshop on Deep Learning and Neural Approaches for Linguistic Data - Book of abstracts
2021
This publication is based upon work from COST Action CA18209, the European network for Web-centred linguistic data science, supported by COST (European Cooperation in Science and Technology), a funding agency for research and innovation networks. COST Actions help connect research initiatives across Europe and enable scientists to grow their ideas by sharing them with their peers, boosting their research, careers and innovation.
Fips, a "deep" linguistic multilingual parser
Proceedings of the Workshop on Deep Linguistic Processing - DeepLP '07, 2007
The development of robust "deep" linguistic parsers is known to be a difficult task. Few such systems can claim to satisfy the needs of large-scale NLP applications in terms of robustness, efficiency, granularity or precision. Adapting such systems to more than one language makes the task even more challenging. This paper describes some of the properties of Fips, a multilingual parsing system that has been under development at LATL for a number of years (and still is). Based on Chomsky's generative grammar for its grammatical aspects, and on object-oriented (OO) software engineering techniques for its implementation, Fips is designed to efficiently parse the four Swiss "national" languages (German, French, Italian and English), to which we have also added Spanish and (more recently) Greek.
Parser evaluation over local and non-local deep dependencies in a large corpus
2011
In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-the-art parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
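Evaluations of this kind typically score a parser's output dependencies against the gold-standard targets with set-based precision, recall and F1. The sketch below shows that standard scoring scheme, not the paper's specific mapping procedure.

```python
# Sketch of set-based dependency scoring: compare predicted dependency
# triples against gold-standard target triples.
def prf(gold, predicted):
    """gold, predicted: sets of (head, dependent, label) triples.
    Returns (precision, recall, f1)."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Because the study targets specific phenomena (e.g. non-local dependencies), the gold set can be restricted to just the dependencies instantiating each phenomenon, giving the fine-grained, per-phenomenon scores the abstract describes.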
Combining shallow and deep processing for a robust, fast, deep-linguistic dependency parser
2004
This paper describes Pro3Gres, a fast, robust, broad-coverage parser that delivers deep-linguistic grammatical relation structures as output, which are closer to predicate-argument structures and more informative than pure constituency structures. The parser stays as shallow as possible for each task, combining shallow and deep-linguistic methods by integrating chunking and by expressing the majority of long-distance dependencies in a context-free way. It combines statistical and rule-based approaches, different linguistic grammar theories and different linguistic resources. Preliminary evaluations indicate that the parser's performance is state-of-the-art.