Active learning for deep semantic parsing
Related papers
Active Learning for Natural Language Parsing and Information Extraction
1999
In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.
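The selection idea described above, annotating only the most informative examples, is commonly instantiated as uncertainty sampling. Below is a minimal sketch, not the paper's exact method; the `predict_proba` callback and the example pool are hypothetical stand-ins for a trained model's label distribution.

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(pool, predict_proba, k):
    """Rank unlabeled examples by predictive entropy and return the top k
    for human annotation. `predict_proba` maps an example to the model's
    probability distribution over labels (a hypothetical callback)."""
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:k]
```

Examples on which the model is nearly indifferent between labels score highest and are sent to the annotator first.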
arXiv (Cornell University), 2023
Multilingual semantic parsing aims to leverage knowledge from high-resource languages to improve low-resource semantic parsing, yet it commonly suffers from a data imbalance problem. Prior works propose to utilize translations by either humans or machines to alleviate this issue. However, human translations are expensive, while machine translations are cheap but prone to error and bias. In this work, we propose an active learning approach that exploits the strengths of both human and machine translations by iteratively adding small batches of human translations to the machine-translated training set. In addition, we propose novel aggregated acquisition criteria that help our active learning method select utterances to be manually translated. Our experiments demonstrate that ideal utterance selection can significantly reduce the error and bias in the translated data, resulting in higher parser accuracies than those of parsers trained only on machine-translated data.
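The iterative scheme the abstract describes, repeatedly swapping the highest-scoring machine translations for human ones, can be sketched as the loop below. This is a generic illustration under stated assumptions, not the paper's implementation: `acquire`, `human_translate`, and `train` are hypothetical callbacks standing in for the aggregated acquisition criteria, the human translator, and parser training.

```python
def active_translation_loop(mt_data, untranslated, acquire, human_translate,
                            train, batch_size, rounds):
    """Iteratively add small batches of human translations to a
    machine-translated training set, choosing utterances by an
    acquisition score (all callbacks are hypothetical stand-ins)."""
    train_set = list(mt_data)
    for _ in range(rounds):
        parser = train(train_set)
        # pick the utterances the acquisition criterion values most
        batch = sorted(untranslated,
                       key=lambda u: acquire(u, parser), reverse=True)[:batch_size]
        for u in batch:
            untranslated.remove(u)
            train_set.append(human_translate(u))
    return train(train_set)
```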
Active Learning for Dependency Parsing by A Committee of Parsers
2013
Data-driven dependency parsers need a large annotated corpus to learn how to generate the dependency graph of a given sentence, but annotations on structured corpora are expensive to collect and require labor-intensive work. Active learning is a machine learning approach that selects only informative examples for annotation; it is typically used when unannotated data is abundant but acquiring labels is expensive. We provide a novel framework in which a committee of dependency parsers collaborate to improve their efficiency using active learning techniques. Queries are made up only of uncertain tokens, and the annotations of the remaining tokens of selected sentences are decided by voting among committee members.
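The token-level split the abstract describes, querying the annotator only where the committee disagrees and voting elsewhere, is a query-by-committee scheme. A minimal sketch using vote entropy as the disagreement measure; the data layout and threshold are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def vote_entropy(head_votes):
    """Disagreement among committee parsers on one token's head attachment."""
    counts = Counter(head_votes)
    total = len(head_votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def split_tokens(sentence_votes, threshold):
    """Partition token indices into 'query' (high disagreement, send to a
    human annotator) and 'auto' (low disagreement, take the committee's
    majority vote). `sentence_votes[i]` holds the head index proposed for
    token i by each committee parser."""
    query, auto = [], {}
    for i, votes in enumerate(sentence_votes):
        if vote_entropy(votes) > threshold:
            query.append(i)
        else:
            auto[i] = Counter(votes).most_common(1)[0][0]
    return query, auto
```

Only the `query` tokens cost annotation effort; the rest of the sentence is labeled by majority vote.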
Active Learning for Building a Corpus of Questions for Parsing
2010
This paper describes how we built a dependency treebank for questions. The questions were drawn from the TREC 10 QA task and from Yahoo! Answers. One use of the corpus is to train a dependency parser that achieves good accuracy on parsing questions without hurting its overall accuracy. We also explore active learning techniques to determine a suitable size for a corpus of questions, aiming for adequate accuracy while minimizing annotation effort.
Ensemble-based active learning for parse selection
2004
Supervised estimation methods are widely seen as being superior to semi- and fully unsupervised methods. However, supervised methods crucially rely upon training sets that need to be manually annotated, which can be very expensive, especially when skilled annotators are required. Active learning (AL) promises to help reduce this annotation cost. Within the complex domain of HPSG parse selection, we show that ideas from ensemble learning can help further reduce the cost of annotation. Our main results show that at times, an ensemble model trained with randomly sampled examples can outperform a single model trained using AL. However, converting the single-model AL method into an ensemble-based AL method shows that even this much stronger baseline model can be improved upon. Our best results show a
Active learning for HPSG parse selection
2003
We describe new features and algorithms for HPSG parse selection models and address the task of creating annotated material to train them. We evaluate the ability of several sample selection methods to reduce the number of annotated sentences necessary to achieve a given level of performance. Our best method achieves a 60% reduction in the amount of training material without any loss in accuracy.
MultiTask Active Learning for Linguistic Annotations
2008
We extend the classical single-task active learning (AL) approach. In the multi-task active learning (MTAL) paradigm, we select examples for several annotation tasks rather than for a single one, as is usually done in AL. We introduce two MTAL meta-protocols, alternating selection and rank combination, and propose a method to implement them in practice. We experiment with a two-task annotation scenario that includes named entity and syntactic parse tree annotations on three different corpora. MTAL outperforms random selection and a stronger baseline, one-sided example selection, in which one task is pursued using AL and the selected examples are also provided to the other task.
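Of the two meta-protocols named above, rank combination is the easier one to sketch: each task ranks the candidate examples by its own informativeness score, and the per-task ranks are summed. This is a generic illustration of rank combination, not the paper's exact procedure; the score values are hypothetical.

```python
def rank_combination(scores_per_task, k):
    """Combine per-task informativeness scores by summing ranks.
    `scores_per_task` maps task name -> {example_id: score}; an example
    that is informative for every task receives a low combined rank
    and is selected for annotation."""
    combined = {}
    for task_scores in scores_per_task.values():
        # rank 0 = most informative for this task
        ordered = sorted(task_scores, key=task_scores.get, reverse=True)
        for rank, example in enumerate(ordered):
            combined[example] = combined.get(example, 0) + rank
    return sorted(combined, key=combined.get)[:k]
```

Summing ranks rather than raw scores sidesteps the problem that the tasks' uncertainty scores live on incomparable scales.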
Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Proceedings of the 28th International Conference on Computational Linguistics, 2020
Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model, by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE data set show an advantage in model performance when maximizing the approximate knowledge gain of the model while querying from the pool of unlabeled data. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters, making it more suitable for low-resource settings.
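The pool-based loop underlying this setup can be sketched as follows. This is a schematic under stated assumptions, not the paper's code: `train` and `knowledge_gain` are hypothetical callbacks standing in for BERT fine-tuning and the approximate knowledge-gain criterion.

```python
def pool_based_al(labeled, pool, train, knowledge_gain, batch_size, budget):
    """Pool-based active learning: repeatedly train on the labeled set,
    score the unlabeled pool by an approximate knowledge-gain criterion,
    and move the top-scoring batch into the labeled set until the
    labeling budget is spent. All callbacks are hypothetical stand-ins."""
    while budget > 0 and pool:
        model = train(labeled)
        pool.sort(key=lambda x: knowledge_gain(model, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend(batch)  # in practice these would be sent for labeling
        budget -= len(batch)
    return train(labeled)
```

The `budget` parameter makes the "constant labeling cost" constraint explicit: the loop stops once the allotted number of annotations is used.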
2016
This paper presents TextPro-AL (Active Learning for Text Processing), a platform where human annotators can work efficiently to produce high-quality training data for new domains and new languages by exploiting active learning methodologies. TextPro-AL is a web-based application integrating four components: a machine-learning-based NLP pipeline, an annotation editor for task definition and text annotation, an incremental re-training procedure based on active learning selection from a large pool of unannotated data, and a graphical visualization of the learning status of the system.
Semantic Parsing with Dual Learning
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Semantic parsing converts natural language queries into structured logical forms. The paucity of annotated training samples is a fundamental challenge in this field. In this work, we develop a semantic parsing framework based on the dual learning algorithm, which enables a semantic parser to make full use of data (labeled and even unlabeled) through a dual-learning game. This game between a primal model (semantic parsing) and a dual model (logical form to query) forces them to regularize each other, and yields feedback signals derived from prior knowledge. By utilizing the prior knowledge of logical form structures, we propose a novel reward signal at the surface and semantic levels that tends to generate complete and reasonable logical forms. Experimental results show that our approach achieves new state-of-the-art performance on the ATIS dataset and competitive performance on the OVERNIGHT dataset.