Confidence driven unsupervised semantic parsing
Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing
In this paper, we propose to bring together the semantic web experience and statistical natural language semantic parsing modeling. The idea is that the process of populating knowledge bases by semantically parsing structured web pages can provide very valuable implicit annotation for language understanding tasks. We mine the search queries that hit these web pages in order to semantically annotate them and build statistical unsupervised slot filling models, without even needing a semantic annotation guideline. We present promising results demonstrating this idea by building an unsupervised slot filling model for the movies domain with some representative slots. Furthermore, we employ unsupervised model adaptation for cases where some in-domain unannotated sentences are available. Another key contribution of this work is the use of implicitly annotated natural-language-like queries to test the performance of the models, in a totally unsupervised fashion. We believe such an approach also ensures a consistent semantic representation between the semantic parser and the backend knowledge base.
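To make the implicit-annotation idea concrete, here is a minimal sketch (not the authors' code) of how the structured fields of a web page can silver-annotate the search queries that hit it; the slot names, field values, and queries are invented for illustration:

```python
# Hedged sketch: derive silver slot labels for search queries from the
# structured fields of the movie page they click through to.
# All field names, values, and queries below are invented for illustration.
page_fields = {
    "movie-name": "inception",
    "director": "christopher nolan",
    "release-date": "2010",
}
queries = ["inception 2010 trailer", "movie by christopher nolan"]

def annotate(query, fields):
    """Tag a query token with a slot name if it occurs in that slot's value, else 'O'."""
    tokens = query.lower().split()
    labels = ["O"] * len(tokens)
    for slot, value in fields.items():
        for i, tok in enumerate(tokens):
            if tok in value.split():
                labels[i] = slot
    return list(zip(tokens, labels))

for q in queries:
    print(annotate(q, page_fields))
# [('inception', 'movie-name'), ('2010', 'release-date'), ('trailer', 'O')]
# [('movie', 'O'), ('by', 'O'), ('christopher', 'director'), ('nolan', 'director')]
```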
Dependency-based Hybrid Trees for Semantic Parsing
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018
We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterances into machine-interpretable meaning representations. Unlike previous state-of-the-art models, the semantic information is interpreted as the latent dependency between the natural language words in our joint representation. Such dependency information can capture the interactions between the semantics and natural language words. We integrate a neural component into our model and propose an efficient dynamic-programming algorithm to perform tractable inference. Through extensive experiments on the standard multilingual GeoQuery dataset with eight languages, we demonstrate that our proposed approach is able to achieve state-of-the-art performance across several languages. Analysis also justifies the effectiveness of using our new dependency-based representation.
Context Dependent Semantic Parsing: A Survey
Proceedings of the 28th International Conference on Computational Linguistics
Semantic parsing is the task of translating natural language utterances into machine-readable meaning representations. Currently, most semantic parsing methods are not able to utilize contextual information (e.g. dialogue and comment history), which has great potential to boost semantic parsing performance. To address this issue, context-dependent semantic parsing has recently drawn a lot of attention. In this survey, we investigate progress on methods for context-dependent semantic parsing, together with the current datasets and tasks. We then point out open problems and challenges for future research in this area. The collected resources for this topic are available at: https://github.com/zhuang-li/Contextual-Semantic-Parsing-Paper-List.
Driving semantic parsing from the world's response
2010
Current approaches to semantic parsing, the task of converting text to a formal meaning representation, rely on annotated training data mapping sentences to logical forms. Providing this supervision is a major bottleneck in scaling semantic parsers. This paper presents a new learning paradigm aimed at alleviating the supervision burden. We develop two novel learning algorithms capable of predicting complex structures which only rely on a binary feedback signal based on the context of an external world.
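A rough skeleton of what learning from binary world feedback can look like: candidate logical forms are executed, and the only training signal is whether the execution result matches the expected response. The parser and executor interfaces below are assumptions for illustration, not the paper's algorithms.

```python
# Hedged skeleton of response-driven learning: the only supervision is a binary
# signal indicating whether the executed meaning representation yields the
# expected response. `parser` and `executor` are assumed interfaces.
def train_with_binary_feedback(parser, executor, data, epochs=5):
    """data: iterable of (sentence, expected_response) pairs."""
    for _ in range(epochs):
        for sentence, expected_response in data:
            candidates = parser.nbest(sentence)               # candidate logical forms
            correct = [lf for lf in candidates
                       if executor(lf) == expected_response]  # binary feedback from the "world"
            if correct:
                parser.update(sentence, correct[0], reward=1.0)      # reinforce a feedback-approved parse
            elif candidates:
                parser.update(sentence, candidates[0], reward=-1.0)  # push down the top wrong parse
    return parser
```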
Improved Fully Unsupervised Parsing with Zoomed Learning
2010
We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs (T_i, S_i) of T and S such that, when the unsupervised parser is trained on a training subset T_i, its results on its paired test subset S_i are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm's effect on the leading algorithm for the task of fully unsupervised parsing in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.
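The core selection loop can be sketched as follows; `train` and `quality` stand in for the underlying unsupervised parser and whatever quality measure drives the selection, and are assumptions for illustration rather than the paper's implementation.

```python
# Hedged skeleton of the zoomed-learning idea: keep a "zoomed" model trained on
# a subset T_i whenever it beats the full-data model on its paired subset S_i.
def zoomed_learning(T, subset_pairs, train, quality):
    """subset_pairs: list of (T_i, S_i); quality(parser, sentences) -> estimated parse quality."""
    full_parser = train(T)
    decisions = []
    for T_i, S_i in subset_pairs:
        zoomed_parser = train(T_i)
        better = quality(zoomed_parser, S_i) > quality(full_parser, S_i)
        decisions.append((S_i, zoomed_parser if better else full_parser))
    return decisions  # which model to apply to each test subset
```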
Unsupervised Dependency Parsing: Let's Use Supervised Parsers
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
We present a self-training approach to unsupervised dependency parsing that reuses existing supervised and unsupervised parsing algorithms. Our approach, called 'iterated reranking' (IR), starts with dependency trees generated by an unsupervised parser, and iteratively improves these trees using the richer probability models used in supervised parsing that are in turn trained on these trees. Our system achieves accuracy 1.8% higher than the state-of-the-art parser of Spitkovsky et al. (2013) on the WSJ corpus.
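The iterated-reranking loop can be sketched as below; the parser interfaces (train, kbest, score) are assumed for illustration and are not the exact components used in the paper.

```python
# Hedged skeleton of iterated reranking: trees from an unsupervised parser seed
# a richer supervised model, which then re-parses the corpus to produce the
# training trees for the next round.
def iterated_reranking(sentences, unsupervised_parse, SupervisedParser, rounds=3, k=10):
    trees = [unsupervised_parse(s) for s in sentences]      # initial (noisy) dependency trees
    for _ in range(rounds):
        parser = SupervisedParser()
        parser.train(sentences, trees)                      # treat the current trees as gold
        trees = [max(parser.kbest(s, k), key=parser.score)  # rerank k-best with the richer model
                 for s in sentences]
    return trees
```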
Learning Structured Natural Language Representations for Semantic Parsing
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
We introduce a neural semantic parser which is interpretable and scalable. Our model converts natural language utterances to intermediate, domain-general natural language representations in the form of predicate-argument structures, which are induced with a transition system and subsequently mapped to target domains. The semantic parser is trained end-to-end using annotated logical forms or their denotations. We achieve the state of the art on SPADES and GRAPHQUESTIONS and obtain competitive results on GEOQUERY and WEBQUESTIONS. The induced predicate-argument structures shed light on the types of representations useful for semantic parsing and how these are different from linguistically motivated ones.
Effective self-training for parsing
Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics -, 2006
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
Computational challenges in parsing by classification
2006
This paper presents a discriminative parser that does not use a generative model in any way, yet whose accuracy still surpasses a generative baseline. The parser performs feature selection incrementally during training, as opposed to a priori, which enables it to work well with minimal linguistic cleverness. The main challenge in building this parser was fitting the training data into memory. We introduce gradient sampling, which increased training speed 100-fold. Our implementation is freely available at http://nlp.cs.nyu.
Transformation-based Learning for Semantic parsing
2009
This paper presents a semantic parser that transforms an initial semantic hypothesis into the correct semantics by applying an ordered list of transformation rules. These rules are learnt automatically from a training corpus with no prior linguistic knowledge and no alignment between words and semantic concepts. The learning algorithm produces a compact set of rules which enables the parser to be very efficient while retaining high accuracy. We show that this parser is competitive with respect to the state-of-the-art semantic parsers on the ATIS and TownInfo tasks.
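For illustration, a minimal sketch of how an ordered rule list can rewrite an initial semantic hypothesis; the rule format and the ATIS-flavoured example are simplifying assumptions, not the paper's rule templates.

```python
# Hedged sketch of transformation-based parsing: apply learned rules, in order,
# to an initial hypothesis (one concept tag per word). Rules and data are toy.
def apply_rules(words, tags, rules):
    """A rule (prev_word, old_tag, new_tag) fires when the previous word and current tag match."""
    for prev_word, old_tag, new_tag in rules:   # rules are applied in their learned order
        for i in range(1, len(words)):
            if words[i - 1] == prev_word and tags[i] == old_tag:
                tags[i] = new_tag
    return tags

words = ["flights", "from", "boston", "to", "denver"]
tags  = ["O", "O", "city", "O", "city"]         # initial hypothesis: cities left underspecified
rules = [("from", "city", "from.city"),         # learned: a city right after 'from' is the origin
         ("to", "city", "to.city")]             # learned: a city right after 'to' is the destination
print(apply_rules(words, tags, rules))          # ['O', 'O', 'from.city', 'O', 'to.city']
```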
Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser
2009
The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatic selection of high quality parses created by unsupervised parsers is an important problem.
Using semantic resources to improve a syntactic dependency parser
2012
Probabilistic syntactic parsing has made rapid progress, but is reaching a performance ceiling. More semantic resources need to be included. We exploit a number of semantic resources to improve the parsing accuracy of a dependency parser. We compare semantic lexica on this task, then we extend the back-off chain by penalizing underspecified decisions. Further, a simple distributional semantics approach is tested. Selectional restrictions are employed to boost interpretations that are semantically plausible. We also show that self-training can improve parsing even without a re-ranker, as we can rely on a sufficiently good estimation of parsing accuracy. Parsing large amounts of data and using it in self-training allows us to learn world knowledge from the distribution of syntactic relations. We show that the performance of the parser improves considerably due to our extensions.
Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations
Findings of the Association for Computational Linguistics: ACL 2023
We extend the non-parametric Bayesian model of Titov and Klementiev (2011) to deal with homonymy and polysemy by leveraging distributed contextual word and phrase representations pre-trained on a large collection of unlabelled texts. Unsupervised semantic parsing is then performed by decomposing sentences into fragments, clustering the fragments to abstract away syntactic variations of the same meaning, and predicting predicate-argument relations between the fragments. To better model the statistical dependencies between predicates and their arguments, we further employ a hierarchical Pitman-Yor process. An improved Metropolis-Hastings merge-split sampler is proposed to speed up the mixing and convergence of Markov chains by leveraging pre-trained distributed representations. The experimental results show that the models achieve better accuracy on both question-answering and relation extraction tasks.
Improving dependency parsing with semantic classes
Proceedings of the 49th …, 2011
This paper presents the introduction of WordNet semantic classes in a dependency parser, obtaining improvements on the full Penn Treebank for the first time. We tried different combinations of some basic semantic classes and word sense disambiguation algorithms. Our experiments show that selecting the adequate combination of semantic features on development data is key for success. Given the basic nature of the semantic classes and word sense disambiguation algorithms used, we think there is ample room for future improvements.
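As an illustration of the kind of feature involved, the snippet below maps a word to a coarse WordNet semantic class (its lexicographer file) with NLTK, using the first-sense heuristic in place of a full word sense disambiguation algorithm; this is a simplified sketch, not the paper's feature set.

```python
# Hedged sketch: look up a coarse WordNet semantic class for a word via NLTK.
# Requires: pip install nltk  and  nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def semantic_class(word):
    """Return the lexicographer file of the first synset, e.g. 'noun.person', or None."""
    synsets = wn.synsets(word)
    return synsets[0].lexname() if synsets else None   # first-sense heuristic, no real WSD

for w in ["banker", "acquisition", "yesterday"]:
    print(w, semantic_class(w))   # typically noun.person, noun.act, noun.time
```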
Active learning for deep semantic parsing
ACL 2018, 2018
Semantic parsing requires training data that is expensive and slow to collect. We apply active learning to both traditional and "overnight" data collection approaches. We show that it is possible to obtain good training hyperparameters from seed data which is only a small fraction of the full dataset. We show that uncertainty sampling based on the least confidence score is competitive in traditional data collection but not applicable to overnight collection. We evaluate several active learning strategies for overnight data collection and show that the best example selection strategy differs per domain.
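A minimal sketch of least-confidence selection, the uncertainty sampling criterion mentioned above; the utterances and probabilities are made up, and `top_parse_prob` stands in for the parser's confidence in its best parse.

```python
# Hedged sketch of least-confidence uncertainty sampling for active learning:
# send for annotation the utterances whose best parse has the lowest probability.
def least_confidence_batch(pool, top_parse_prob, k=2):
    """pool: unlabelled utterances; top_parse_prob(u) -> probability of the parser's best parse."""
    return sorted(pool, key=top_parse_prob)[:k]

toy_confidence = {
    "show flights to boston": 0.92,
    "cheapest red eye from sfo arriving before noon": 0.31,
    "list airlines that fly from denver": 0.58,
}
print(least_confidence_batch(list(toy_confidence), toy_confidence.get, k=2))
# the two least-confident utterances are selected for annotation
```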
Improve Parsing Performance by Self-Learning
There are many methods to improve the performance of statistical parsers. Among them, resolving structural ambiguities is a major task. In our approach, the parser produces a set of n-best trees based on a feature-extended PCFG grammar and then selects the best tree structure based on the association strengths of dependency word pairs. However, no Treebank is sufficiently large to produce reliable statistical distributions for all word pairs. This paper provides a self-learning method to resolve this problem. The word association strengths were automatically extracted and learned by parsing a giga-word corpus. Although the automatically learned word associations were not perfect, the resulting structure evaluation model improved the bracketed f-score from 83.09% to 86.59%. We believe that this iterative learning process can improve parsing performance automatically by continuously learning word-dependence knowledge from the web.
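As a sketch of the selection step, each n-best tree can be scored by summing the association strengths of its dependency word pairs; the pair scores here are invented stand-ins for the statistics learned from the giga-word corpus.

```python
# Hedged sketch: choose among n-best trees by the total association strength of
# their dependency word pairs (head, dependent). All numbers are made up.
assoc = {("eat", "sushi"): 2.1, ("eat", "chopsticks"): 1.5,
         ("sushi", "chopsticks"): 0.2, ("eat", "with"): 0.8}

def tree_score(tree):
    return sum(assoc.get(pair, 0.0) for pair in tree)

nbest = [
    [("eat", "sushi"), ("eat", "chopsticks")],    # "with chopsticks" attached to the verb
    [("eat", "sushi"), ("sushi", "chopsticks")],  # "with chopsticks" attached to "sushi"
]
best = max(nbest, key=tree_score)
print(best)   # the attachment supported by stronger word associations wins
```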
Ambiguous Learning from Retrieval: Towards Zero-shot Semantic Parsing
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Current neural semantic parsers mostly take supervised approaches, which require a considerable amount of expensive training data. As a result, minimizing supervision requirements has been one of the key challenges in semantic parsing. In this paper, we propose a Retrieval as Ambiguous Supervision framework, which can effectively collect high-coverage ambiguous supervision (i.e., the parse candidates of an utterance) via a pre-trained language model-based retrieval system. Then, by assuming the candidates will contain the correct one, the zero-shot task can be converted into an ambiguously supervised task. To improve the precision and coverage of such ambiguous supervision, we propose a confidence-driven self-training algorithm, in which a semantic parser is learned and exploited to disambiguate the candidates iteratively. Experimental results show that our approach significantly outperforms the state-of-the-art zero-shot semantic parsing methods.
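The confidence-driven loop can be sketched roughly as below; the Parser interface (train, confidence) and the threshold are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged skeleton of confidence-driven self-training over ambiguous supervision:
# iteratively retrain a parser on its currently most confident candidate parses.
def confidence_driven_self_training(examples, Parser, rounds=3, threshold=0.5):
    """examples: list of (utterance, candidate_parses) produced by the retrieval system."""
    labelled = [(u, cands[0]) for u, cands in examples if cands]   # seed: top retrieved candidate
    parser = Parser()
    for _ in range(rounds):
        parser = Parser()
        parser.train(labelled)
        labelled = []
        for u, cands in examples:
            scored = [(parser.confidence(u, c), c) for c in cands]   # disambiguate with the parser
            conf, best = max(scored, key=lambda x: x[0])
            if conf >= threshold:                                    # keep only confident choices
                labelled.append((u, best))
    return parser
```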
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing
2020
One daunting problem for semantic parsing is the scarcity of annotation. Aiming to reduce nontrivial human labor, we propose a two-stage semantic parsing framework, where the first stage utilizes an unsupervised paraphrase model to convert an unlabeled natural language utterance into the canonical utterance. The downstream naive semantic parser accepts the intermediate output and returns the target logical form. Furthermore, the entire training process is split into two phases: pre-training and cycle learning. Three tailored self-supervised tasks are introduced throughout training to activate the unsupervised paraphrase model. Experimental results on benchmarks Overnight and GeoGranno demonstrate that our framework is effective and compatible with supervised training.
Semantic Parsing with Dual Learning
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Semantic parsing converts natural language queries into structured logical forms. The paucity of annotated training samples is a fundamental challenge in this field. In this work, we develop a semantic parsing framework with a dual learning algorithm, which enables a semantic parser to make full use of data (labeled and even unlabeled) through a dual-learning game. This game between a primal model (semantic parsing) and a dual model (logical form to query) forces them to regularize each other, and provides feedback signals derived from prior knowledge. By utilizing prior knowledge of logical form structures, we propose a novel reward signal at the surface and semantic levels which encourages complete and reasonable logical forms. Experimental results show that our approach achieves new state-of-the-art performance on the ATIS dataset and competitive performance on the OVERNIGHT dataset.
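One step of the dual-learning game can be sketched as follows; the sample/log_prob/reinforce interfaces and the validity check are assumptions for illustration, not the paper's actual reward design.

```python
# Hedged skeleton of one dual-learning step: the parser (query -> logical form)
# and its dual (logical form -> query) reward each other.
def dual_learning_step(query, parser, generator, is_well_formed):
    logical_form = parser.sample(query)                               # primal model: parse the query
    validity_reward = 1.0 if is_well_formed(logical_form) else 0.0    # structural validity of the LF
    reconstruction_reward = generator.log_prob(logical_form, query)   # dual model: recover the query
    reward = validity_reward + reconstruction_reward
    parser.reinforce(query, logical_form, reward)                     # policy-gradient style updates
    generator.reinforce(logical_form, query, reward)
    return reward
```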
Semantic parsing for high-precision semantic role labelling
Proceedings of the Twelfth Conference on Computational Natural Language Learning - CoNLL '08, 2008
In this paper, we report experiments that explore learning of syntactic and semantic representations. First, we extend a state-of-the-art statistical parser to produce a richly annotated tree that identifies and labels nodes with semantic role labels as well as syntactic labels. Secondly, we explore rule-based and learning techniques to extract predicate-argument structures from this enriched output. The learning method is competitive with previous single-system proposals for semantic role labelling, yields the best reported precision, and produces a rich output. In combination with other high recall systems it yields an F-measure of 81%.