Investigating lexical substitution scoring for subtitle generation

Multi-lingual dependency parsing at NAIST

Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006

CoNLL has turned ten! With a mix of pride and amazement over how time flies, we now celebrate the tenth time that ACL's special interest group on natural language learning, SIGNLL, holds its yearly conference.

The CoNLL-2009 shared task

Proceedings of the Thirteenth Conference on Computational Natural Language Learning Shared Task - CoNLL '09, 2009

For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task. In this paper, we define the shared task, describe how the data sets were created, report and analyze the results and summarize the approaches of the participating systems.

The EVALITA Dependency Parsing Task: From 2007 to 2011

Lecture Notes in Computer Science, 2013

Established in 2007, EVALITA (http://www.evalita.it) is the evaluation campaign of Natural Language Processing and Speech Technologies for the Italian language, organized around shared tasks focusing on the analysis of written and spoken language respectively. EVALITA's shared tasks aim to contribute to the development and dissemination of natural language resources and technologies by proposing a shared context for training and evaluation. Following the success of previous editions, we organized EVALITA 2014, the fourth evaluation campaign, with the aim of continuing to provide a forum for the comparison and evaluation of research outcomes on Italian from both academic institutions and industrial organizations. The event has been supported by the NLP Special Interest Group of the Italian Association for Artificial Intelligence (AI*IA) and by the Italian Association of Speech Science (AISV). The novelty of this year is that the final workshop of EVALITA is co-located with the 1st Italian Conference of Computational Linguistics (CLiC-it, http://clic.humnet.unipi.it/), a new event aiming to establish a reference forum for research on Computational Linguistics of the Italian community, with contributions from a wide range of disciplines, from Computational Linguistics, Linguistics and Cognitive Science to Machine Learning, Computer Science, Knowledge Representation, Information Retrieval and Digital Humanities. The co-location with CLiC-it widens the potential audience of EVALITA. The final workshop, held in Pisa on 11 December 2014 within the context of the XIII AI*IA Symposium on Artificial Intelligence (Pisa, 10-12 December 2014, http://aiia2014.di.unipi.it/), gathers the results of 8 tasks, 4 focusing on written language and 4 on speech technologies. In this EVALITA edition, we received 30 expressions of interest, 55 registrations and 43 actual submissions across the 8 proposed tasks.

2.1 The CoNLL-X Shared Task

2012

This paper addresses the problem of optimizing the training treebank data, since the size and quality of the data have always been a bottleneck for training. In previous studies we observed that the corpora currently used for training machine learning–based dependency parsers contain a significant proportion of redundant information at the syntactic structure level. Since the development of such training corpora involves substantial effort, we argue that an appropriate process for selecting the sentences to be included in them can yield parsing models as accurate as those trained on bigger, non-optimized corpora (or, alternatively, higher accuracy for an equivalent annotation effort). This argument is supported by the results of the study presented in this paper. The paper therefore demonstrates that training corpora contain more information than is needed to train accurate data-driven dependency parsers.
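
The abstract does not spell out the selection procedure, so the following is only a toy illustration of the redundancy idea, not the authors' method: deduplicate training sentences whose dependency trees reduce to the same POS-tag/relation signature, keeping one representative of each. The sentence representation used here is hypothetical.

```python
def signature(sentence):
    """Reduce a parsed sentence to a coarse syntactic pattern:
    the sequence of (POS tag, head offset, relation) triples.
    Sentences sharing a signature carry near-identical
    syntactic training signal."""
    return tuple((tok["pos"], tok["head"] - i, tok["rel"])
                 for i, tok in enumerate(sentence, start=1))

def select_non_redundant(treebank):
    """Keep one representative sentence per syntactic signature."""
    seen, selected = set(), []
    for sentence in treebank:
        sig = signature(sentence)
        if sig not in seen:
            seen.add(sig)
            selected.append(sentence)
    return selected

# Toy treebank: two sentences with identical structure, one distinct.
treebank = [
    [{"pos": "DET", "head": 2, "rel": "det"},
     {"pos": "NOUN", "head": 0, "rel": "root"}],
    [{"pos": "DET", "head": 2, "rel": "det"},
     {"pos": "NOUN", "head": 0, "rel": "root"}],
    [{"pos": "NOUN", "head": 0, "rel": "root"}],
]
print(len(select_non_redundant(treebank)))  # 2
```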

Introduction to the CoNLL-2005 shared task

Proceedings of the Ninth Conference on Computational Natural Language Learning - CoNLL '05, 2005

In this paper we describe the CoNLL-2005 shared task on Semantic Role Labeling. We introduce the specification and goals of the task, describe the data sets and evaluation methods, and present a general overview of the 19 systems that have contributed to the task, providing a comparative description and results.

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 2: Short Papers

The Association for Computational Linguistics, 2014

This tutorial discusses a framework for incremental left-to-right structured prediction, which makes use of global discriminative learning and beam-search decoding. The method has been applied to a wide range of NLP tasks in recent years, and has achieved competitive accuracies and efficiencies. We give an introduction to the algorithms and their efficient implementation, and discuss their application to a range of NLP tasks.
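
The tutorial's own implementation is not reproduced here; the sketch below shows only the generic left-to-right beam-search loop it builds on, with `actions` and `score` as placeholders for a task-specific model, not the tutorial's API.

```python
def beam_search(tokens, actions, score, beam_size=8):
    """Generic left-to-right beam search over action sequences.

    `actions(prefix)` enumerates the legal next actions for a
    partial analysis; `score(prefix)` returns its model score.
    Both are placeholders for a task-specific model.
    """
    beam = [()]  # start from the empty partial analysis
    for _ in tokens:
        candidates = [prefix + (a,) for prefix in beam
                      for a in actions(prefix)]
        # Keep only the beam_size highest-scoring partial analyses.
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
    return beam[0]  # highest-scoring complete analysis

# Toy usage: choose one action per token, preferring SHIFT.
best = beam_search(
    tokens=["a", "b", "c"],
    actions=lambda prefix: ["SHIFT", "REDUCE"],
    score=lambda prefix: prefix.count("SHIFT"),
)
print(best)  # ('SHIFT', 'SHIFT', 'SHIFT')
```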

The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages

2009

For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task.

The CoNLL 2007 shared task on dependency parsing

2007

The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task was devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
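
For reference, the CoNLL shared-task data is distributed in the tab-separated CoNLL-X format: one token per line with ten columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line separating sentences. A minimal reader sketch (the file name in the usage comment is hypothetical):

```python
CONLL_X_COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
                   "feats", "head", "deprel", "phead", "pdeprel"]

def read_conll_x(path):
    """Yield sentences as lists of {column: value} dicts."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:  # a blank line ends the current sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            sentence.append(dict(zip(CONLL_X_COLUMNS, line.split("\t"))))
    if sentence:  # the file may lack a trailing blank line
        yield sentence

# for sent in read_conll_x("train.conll"):  # hypothetical path
#     heads = [int(tok["head"]) for tok in sent]
```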

Mixing and blending syntactic and semantic dependencies

Proceedings of the Twelfth Conference on Computational Natural Language Learning - CoNLL '08, 2008

Our system for the CoNLL 2008 shared task uses a set of individual parsers, a set of stand-alone semantic role labellers, and a joint system for parsing and semantic role labelling, all blended together. The system achieved a macro-averaged labelled F1 score of 79.79 (WSJ 80.92, Brown 70.49) for the overall task. The labelled attachment score for syntactic dependencies was 86.63 (WSJ 87.36, Brown 80.77) and the labelled F1 score for semantic dependencies was 72.94 (WSJ 74.47, Brown 60.18).
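
The labelled attachment score reported above is the standard dependency-parsing metric: the percentage of tokens whose predicted head and dependency label both match the gold standard. A minimal sketch:

```python
def labelled_attachment_score(gold, predicted):
    """gold, predicted: lists of (head, label) pairs, one per token."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)

gold = [(2, "det"), (0, "root"), (2, "punct")]
pred = [(2, "det"), (0, "root"), (1, "punct")]  # wrong head on token 3
print(labelled_attachment_score(gold, pred))  # ~66.67
```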

Experiments with a multilanguage non-projective dependency parser

Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006

In this presentation, I will look back at 10 years of CoNLL conferences and the state of the art of machine learning of language that is evident from this decade of research. My conclusion, intended to provoke discussion, will be that we currently lack a clear motivation or "mission" to survive as a discipline. I will suggest that a new mission for the field could be found in a renewed interest in theoretical work (which learning algorithms have a bias that matches the properties of language? what is the psycholinguistic relevance of learner design issues?), in more sophisticated comparative methodology, and in solving the problem of transfer, reusability, and adaptation of learned knowledge.

Models for improved tractability and accuracy in dependency parsing

2013

Mitch and Sampath were wonderful advisors and I am truly grateful to both of them for taking me on as a student. Mitch's patience and unwavering support allowed me the freedom to find this topic for this thesis. His wide-ranging knowledge of both linguistics and computer science was incredibly valuable. Sampath inspired me to strive for clean solutions to problems. His clarity helped me to identify the crucial pieces of any result and greatly improved the material presented here. I am grateful to my thesis committee: Chris Callison-Burch, Mike Collins, Mark Liberman, and Ben Taskar. I have benefited from Chris's advice throughout graduate school, at both Penn and JHU. I have tried to emulate his clear presentation style in both my talks and my writing. Mike's work in parsing served as inspiration for much of the work in this thesis. His detailed questions have both improved this document and pointed the way towards future directions. Mark's questions at CLunch over the years always brought up subtle but important details. Discussions of treebank representations in this document are largely due to Mark's influence. Ben helped me identify interesting directions at a formative stage in this work. Throughout the process, Ben encouraged me to simultaneously investigate both theoretical and empirical issues. A large number of people contributed to my undertaking this dissertation and following through. Thanks especially to Jerry Berry and Stephen Rose at Thomas Jefferson High School for Science and Technology for introducing me to programming and computer science; Charles Yang, at Yale and Penn, for introducing me to computational linguistics; Dana Angluin and Brian Scassellati at Yale for encouraging me to pursue graduate school; and Ken Church, Dekang Lin, and Ani Nenkova for guiding me as I started research. I benefited tremendously from two summers with Ken: he patiently taught me how to write papers and how to approach problems from multiple perspectives. Dekang has been very supportive of me and my research ever since working together at a JHU CLSP Summer Workshop in 2009. Dekang introduced me to research in parsing and applications of parsing. Ani Nenkova and Annie Louis were wonderful collaborators. Annie Louis was my constant partner throughout graduate school; it was a pleasure to go through courses, research, internships, and other milestones together. Thanks to Katerina Fragkiadaki, Jenny Gillenwater, Arun Raghavan, and David Weiss for providing feedback on drafts and practice talks.

The ParisNLP entry at the CoNLL UD Shared Task 2017: A Tale of a #ParsingTragedy

We present the ParisNLP entry at the UD CoNLL 2017 parsing shared task. In addition to the UDpipe models provided, we built our own data-driven tokenization models, sentence segmenter and lexicon-based morphological analyzers. All of these were used with a range of different parsing models (neural or not, feature-rich or not, transition- or graph-based, etc.) and the best combination for each language was selected. Unfortunately, a glitch in the shared task's Matrix led our model selector to run generic, weakly lexicalized models, tailored for surprise languages, instead of our dataset-specific models. Because of this #ParsingTragedy we officially ranked 27th, whereas our real models ultimately ranked 6th in the unofficial evaluation.

The Evalita 2011 parsing task: the dependency track

2011

The Evalita Parsing Task aims at defining and extending the Italian state of the art in parsing by encouraging the application of existing models and approaches. As in Evalita'07 and '09, the task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. In this paper, we describe only the Dependency Parsing track, presenting the data sets for development and testing and reporting the test results.

Building a large annotated corpus of English: the Penn Treebank

Computational Linguistics, 1994

There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information about language from very large corpora. Such corpora are beginning to serve as important research tools for investigators in natural language processing, speech recognition, and integrated spoken language systems, as well as in theoretical linguistics. Annotated corpora promise to be valuable for enterprises as diverse as the automatic construction of statistical models for the grammar of the written and the colloquial spoken language, the development of explicit formal theories of the differing grammars of writing and speech, the investigation of prosodic phenomena in speech, and the evaluation and comparison of the adequacy of parsing models.

The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies

Proceedings of the Twelfth Conference on Computational Natural Language Learning - CoNLL '08, 2008

The Conference on Computational Natural Language Learning is accompanied every year by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2008 the shared task was dedicated to the joint parsing of syntactic and semantic dependencies. This shared task not only unifies the shared tasks of the previous four years under a unique dependency-based formalism, but also extends them significantly: this year's syntactic dependencies include more information such as named-entity boundaries; the semantic dependencies model roles of both verbal and nominal predicates. In this paper, we define the shared task and describe how the data sets were created. Furthermore, we report and analyze the results and describe the approaches of the participating systems.