Sister Help: Data Augmentation for Frame-Semantic Role Labeling
Related papers
The dependency-parsed framenet corpus
2012
When training semantic role labeling systems, the syntax of example sentences is of particular importance. Unfortunately, for the FrameNet annotated sentences, there is no standard parsed version. The integration of the automatic parse of an annotated sentence with its semantic annotation, while conceptually straightforward, is complex in practice. We present a standard dataset that is publicly available and that can be used in future research. This dataset contains parser-generated dependency structures (with POS tags and lemmas) for all FrameNet 1.5 sentences, with nodes automatically associated with FrameNet annotations.
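The abstract does not say how dependency nodes are matched to annotated spans. The sketch below is for illustration only and may differ from the dataset's actual procedure. It uses one common heuristic: map an annotated character span to the tokens it overlaps, then take as the span's head the single token whose own syntactic head lies outside the span. The function names, the end-exclusive character offsets, and the `heads` encoding (0-based head index, -1 for the root) are all assumptions.

```python
from typing import List, Optional, Tuple

def char_span_to_tokens(offsets: List[Tuple[int, int]],
                        char_start: int, char_end: int) -> Optional[Tuple[int, int]]:
    """Map a character span to a token span [first, last + 1).

    `offsets` holds (start, end) character offsets per token, end-exclusive
    (an assumption; adjust if your annotations use inclusive ends).
    """
    idx = [i for i, (s, e) in enumerate(offsets) if s < char_end and e > char_start]
    return (idx[0], idx[-1] + 1) if idx else None

def span_head(heads: List[int], start: int, end: int) -> Optional[int]:
    """Return the token in [start, end) whose head lies outside the span.

    Returns None when there is not exactly one such token, i.e. the span
    does not form a single connected subtree of the dependency tree.
    """
    candidates = [i for i in range(start, end) if not (start <= heads[i] < end)]
    return candidates[0] if len(candidates) == 1 else None

if __name__ == "__main__":
    # "The cat chased a mouse ." with a hypothetical parse rooted at "chased".
    offsets = [(0, 3), (4, 7), (8, 14), (15, 16), (17, 22), (23, 24)]
    heads = [1, 2, -1, 4, 2, 2]
    span = char_span_to_tokens(offsets, 15, 22)  # the annotated span "a mouse"
    print(span, span_head(heads, *span))
```

Returning `None` for spans that are not a single subtree keeps such cases visible, so they can be handled separately rather than silently attached to an arbitrary node.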
Semantic role labeling via FrameNet, VerbNet and PropBank
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06, 2006
This article describes a robust semantic parser that uses a broad knowledge base created by interconnecting three major resources: FrameNet, VerbNet and PropBank. The FrameNet corpus contains examples annotated with semantic roles, whereas the VerbNet lexicon provides knowledge about the syntactic behavior of verbs. We connect VerbNet and FrameNet by mapping the FrameNet frames to the VerbNet Intersective Levin classes. The PropBank corpus, which is tightly connected to the VerbNet lexicon, is used to increase the verb coverage and also to test the effectiveness of our approach. The results indicate that our model is an interesting step towards the design of more robust semantic parsers.
Augmenting Linguistic Semi-Structured Data for Machine Learning - A Case Study using Framenet
Semantic Role Labelling (SRL) is the process of automatically finding the semantic roles of terms in a sentence. It is an essential step towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences; these annotations link sentence fragments to semantic frames. However, while the annotations across all the documents in the dataset link to most of the frames, a large group of frames has no document annotations pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.
Evaluating FrameNet-style semantic parsing: the role of coverage gaps in FrameNet
International Conference on Computational Linguistics, 2010
Supervised semantic role labeling (SRL) systems are generally claimed to have accuracies of 80% or higher. These numbers, though, are the result of highly restricted evaluations, typically on hand-picked lemmas for which training data is available. In this paper we consider the performance of such systems when evaluated at the document level rather than at the lemma level. While it is well known that coverage gaps exist in the resources available for training supervised SRL systems, what has been lacking until now is an understanding of the precise nature of this coverage problem and its impact on the performance of SRL systems. We present a typology of five different types of coverage gaps in FrameNet. We then analyze the impact of these coverage gaps on the performance of a supervised semantic role labeling system on full texts, showing an average oracle upper bound of 46.8%.
A comparative study on generalization of semantic roles in FrameNet
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09, 2009
A number of studies have presented machine-learning approaches to semantic role labeling, made possible by the availability of corpora such as FrameNet and PropBank. These corpora define the semantic roles of predicates for each frame independently. Thus, it is crucial for a machine-learning approach to generalize semantic roles across different frames and to increase the number of training instances. This paper explores several criteria for generalizing semantic roles in FrameNet: the role hierarchy, human-understandable descriptors of roles, semantic types of filler phrases, and mappings from FrameNet roles to the thematic roles of VerbNet. We also propose feature functions that naturally combine and weight these criteria based on the training data. In role classification experiments, the approach yields a 19.16% error reduction rate and a 7.42% improvement in macro-averaged F1 score. We also provide in-depth analyses of the proposed criteria.
Semantic parsing based on framenet
2004
This paper describes our method, based on Support Vector Machines, for automatically assigning semantic roles to constituents of English sentences. The method employs four different feature sets, one of which is reported here for the first time. The combination of features, together with the extended training data we considered, produced F1-scores of 92.5% for the unrestricted case and 76.3% for the restricted case in the Senseval-3 experiments. Semantic parsing was defined as two distinct classification tasks by Gildea and Jurafsky (2002); the second of these tasks is the detection of role boundaries.
The effect of syntactic representation on semantic role labeling
Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08, 2008
Almost all automatic semantic role labeling (SRL) systems rely on a preliminary parsing step that derives a syntactic structure from the sentence being analyzed. This makes the choice of syntactic representation an essential design decision. In this paper, we study the influence of syntactic representation on the performance of SRL systems. Specifically, we compare constituent-based and dependency-based representations for SRL of English in the FrameNet paradigm. Contrary to previous claims, our results demonstrate that systems based on dependencies perform roughly as well as those based on constituents: for the argument classification task, dependency-based systems perform slightly better on average, while the opposite holds for the argument identification task. This is remarkable because dependency parsers are still in their infancy while constituent parsing is more mature. Furthermore, the results show that dependency-based semantic role classifiers rely less on lexicalized features, which makes them more robust to domain changes and lets them learn more efficiently with respect to the amount of training data.
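The abstract does not list the features used. As an illustration of what a dependency-based representation offers an SRL classifier, here is a minimal sketch of one widely used feature, the syntactic path from the predicate up to the lowest common ancestor and down to the argument, written as relation labels with arrows. The function names, the `heads`/`labels` encoding, and the label set in the example are hypothetical.

```python
from typing import List

def path_to_root(heads: List[int], i: int) -> List[int]:
    """Nodes from token i up to the root (heads[root] == -1)."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path

def dependency_path(heads: List[int], labels: List[str], pred: int, arg: int) -> str:
    """Relation-label path from the predicate to the argument.

    labels[n] is the relation between token n and its head, so each step
    up from n (or down into n) is written with labels[n] plus an arrow.
    """
    up = path_to_root(heads, pred)
    down = path_to_root(heads, arg)
    down_set = set(down)
    # Lowest common ancestor: first node on the predicate's root path
    # that also lies on the argument's root path.
    lca = next(n for n in up if n in down_set)
    steps = [labels[n] + "↑" for n in up[: up.index(lca)]]
    steps += [labels[n] + "↓" for n in reversed(down[: down.index(lca)])]
    return "".join(steps)

if __name__ == "__main__":
    # "The cat chased a mouse ." with a hypothetical labeled parse.
    heads = [1, 2, -1, 4, 2, 2]
    labels = ["NMOD", "SBJ", "ROOT", "NMOD", "OBJ", "P"]
    print(dependency_path(heads, labels, 2, 1))  # predicate "chased", argument "cat"
```

Because the path consists of relation labels rather than words, features of this kind are unlexicalized, which is consistent with the abstract's observation that dependency-based classifiers rely less on lexical features.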
Generalization of Semantic Roles in Automatic Semantic Role Labeling
Journal of Natural Language Processing, 2014
Numerous studies have applied machine-learning approaches to semantic role labeling, made possible by the availability of corpora such as FrameNet and PropBank. These corpora define frame-specific semantic roles for each frame, which are problematic for a machine-learning approach because the corpus contains a number of infrequent roles that hinder efficient learning. This paper focuses on the generalization problem of semantic roles in a semantic role labeling task. We compare existing generalization criteria with our novel criteria and clarify the characteristics of each criterion. We also show that using multiple generalization criteria in a single model improves the performance of semantic role classification. In experiments on FrameNet, we achieved a 19.16% error reduction in terms of total accuracy and 7.42% in macro-averaged F1. On PropBank, we reduced errors by 24.07% in total accuracy and by 26.39% in the evaluation on unseen verbs.
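The figures above are reported as error reductions. The abstract does not give the exact formulas, so the sketch below assumes the standard definitions: relative error reduction is (baseline error minus new error) divided by baseline error, and macro-averaged F1 is the unweighted mean of per-role F1 scores. Function names and the example numbers are hypothetical, not results from the paper.

```python
from typing import Dict, Tuple

def relative_error_reduction(base_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new model."""
    base_err = 1.0 - base_acc
    new_err = 1.0 - new_acc
    if base_err <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (base_err - new_err) / base_err

def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-role F1.

    counts maps each role to (true positives, false positives, false negatives).
    """
    scores = []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical: accuracy rising from 80% to 85% removes a quarter of the errors.
    print(relative_error_reduction(0.80, 0.85))
    # A rare role ("Theme") pulls the macro average down as much as a frequent one.
    print(macro_f1({"Agent": (8, 2, 2), "Theme": (1, 1, 3)}))
```

Macro averaging gives every role equal weight regardless of frequency, which is why it is a natural metric for the infrequent-role problem this abstract describes.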
Putting FrameNet data into the ISO linguistic annotation framework
Proceedings of the ACL 2003 Workshop on Linguistic Annotation: Getting the Model Right, 2003
This paper describes FrameNet, an online lexical resource for English based on the principles of frame semantics, and considers the FrameNet database in reference to the proposed ISO model for linguistic annotation of language resources (ISO TC37 SC4). We provide a data category specification for frame semantics and FrameNet annotations in an RDF-based language. More specifically, we provide a DAML+OIL markup for lexical units, defined as a relation between a lemma and a semantic frame, and for frame-to-frame relations, namely Inheritance and Subframes. The paper includes simple examples of FrameNet annotated sentences in an XML/RDF format that references the project-specific data category specification.
Towards Robust Semantic Role Labeling
Computational Linguistics, 2008
Most research on semantic role labeling (SRL) has focused on training and evaluating on the same corpus in order to develop the technology. This strategy, while appropriate for initiating research, can lead to over-training to the particular corpus. The work presented in this paper focuses on analyzing the robustness of an SRL system when trained on one genre of data and used to label a different genre. Our state-of-the-art semantic role labeling system, while performing well on WSJ test data, shows significant performance degradation when applied to data from the Brown corpus. We present a series of experiments designed to investigate the source of this lack of portability. These experiments are based on comparisons of performance using PropBanked WSJ data and PropBanked Brown corpus data. Our results indicate that while syntactic parses and argument identification port relatively well to a new genre, argument classification does not. We present an analysis of the reasons for this, which generally points to lexical/semantic features dominating the classification task and general structural features dominating the argument identification task.