Lexical Sense Alignment using Weighted Bipartite b-Matching

Steps Toward the Alignment of Complementary Lexical Resources and Knowledge Databases
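Concretely, weighted bipartite b-matching links senses from two resources so that each sense keeps at most b links while maximizing total similarity. Below is a minimal greedy sketch; exact formulations are usually solved as an ILP or min-cost flow, and the sense IDs and scores here are invented:

```python
from collections import defaultdict

def greedy_b_matching(edges, b_left=1, b_right=1):
    """Greedy approximation of weighted bipartite b-matching.

    edges: iterable of (left_sense, right_sense, weight).
    Each left sense may keep at most b_left edges and each right
    sense at most b_right. Edges are taken in descending weight
    order while capacity remains; side tags keep the two node
    sets disjoint even if ids collide.
    """
    deg = defaultdict(int)
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if deg[("L", u)] < b_left and deg[("R", v)] < b_right:
            matching.append((u, v, w))
            deg[("L", u)] += 1
            deg[("R", v)] += 1
    return matching

# Hypothetical similarity scores between senses of two resources.
candidates = [
    ("bank#1", "wiki:Bank_(finance)", 0.9),
    ("bank#1", "wiki:Bank_(geography)", 0.2),
    ("bank#2", "wiki:Bank_(geography)", 0.8),
    ("bank#2", "wiki:Bank_(finance)", 0.6),
]
print(greedy_b_matching(candidates, b_left=1, b_right=1))
```

Raising b_right above 1 allows many-to-one links, which is how b-matching accommodates differing sense granularities across resources.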

Since 1997, the FrameNet project at the International Computer Science Institute has been developing a uniquely detailed lexicon of English based on Frame Semantics and manually annotated examples from a balanced corpus, and has distributed copies of the lexicon and the annotations to a wide variety of researchers in natural language processing. This contract funded a meeting to evaluate the status of the project and suggest ways in which it could be more useful to the government and other clients. Based on the suggestions of the evaluators, the FrameNet team used the opportunity to increase the coverage of the lexicon, to rapidly add new frame relations and semantic types, to develop software and techniques for full-text annotation, to improve the public website, documentation, data consistency, and data distribution system, and to plan for a more automated workflow and closer connections to WordNet and other knowledge resources, depending on future funding and collaboration wi...

WordNet–Wikipedia–Wiktionary: Construction of a three-way alignment

Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), 2014

The coverage and quality of conceptual information contained in lexical semantic resources is crucial for many tasks in natural language processing. Automatic alignment of complementary resources is one way of improving this coverage and quality; however, past attempts have always been between pairs of specific resources. In this paper we establish some set-theoretic conventions for describing concepts and their alignments, and use them to describe a method for automatically constructing n-way alignments from arbitrary pairwise alignments. We apply this technique to the production of a three-way alignment from previously published WordNet–Wikipedia and WordNet–Wiktionary alignments. We then present a quantitative and informal qualitative analysis of the aligned resource. The three-way alignment was found to have greater coverage, an enriched sense representation, and coarser sense granularity than both the original resources and their pairwise alignments, though this came at the cost of accuracy. An evaluation of the induced word sense clusters in a word sense disambiguation task showed that they were no better than random clusters of equivalent granularity. However, use of the alignments to enrich a sense inventory with additional sense glosses did significantly improve the performance of a baseline knowledge-based WSD algorithm.
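One way to build an n-way alignment from pairwise alignments is to treat each pairwise link as an edge and take connected components, i.e. a transitive closure over alignment edges. This is only a sketch of the general idea (the paper's set-theoretic construction may differ), and the sense IDs are invented:

```python
def merge_pairwise(alignments):
    """Build n-way sense clusters from pairwise alignments.

    alignments: iterable of (sense_a, sense_b) pairs, possibly
    drawn from different pairwise alignments. Returns the
    connected components as sets, using a small union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in alignments:
        union(a, b)
    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())

# Invented sense IDs from two separate pairwise alignments.
pairs = [
    ("wn:bank.n.01", "wiki:Bank_(finance)"),   # WordNet-Wikipedia
    ("wn:bank.n.01", "wikt:bank:1"),           # WordNet-Wiktionary
    ("wn:bank.n.09", "wiki:Bank_(geography)"),
]
print(merge_pairwise(pairs))
```

Note how transitive merging is also where the coarser granularity (and the accuracy cost) reported above comes from: one noisy edge fuses two otherwise distinct clusters.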

Monolingual Word Sense Alignment as a Classification Problem

2021

Words are defined based on their meanings in various ways in different resources. Aligning word senses across monolingual lexicographic resources increases domain coverage and enables the integration and incorporation of data. In this paper, we explore the application of classification methods using manually extracted features, along with representation learning techniques, to the task of word sense alignment and semantic relationship detection. We demonstrate that the performance of classification methods varies dramatically with the type of semantic relationship, owing to the nature of the task, but outperforms previous experiments.
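To illustrate what manually extracted features for a candidate sense pair can look like, the sketch below computes a tiny invented feature set over two glosses and scores it with a hand-set linear rule; neither the features nor the weights are the paper's:

```python
def pair_features(gloss_a, gloss_b):
    """Hand-crafted features for a candidate sense pair
    (illustrative set): Jaccard word overlap, length ratio,
    and whether the glosses share their first word."""
    a, b = set(gloss_a.split()), set(gloss_b.split())
    jaccard = len(a & b) / len(a | b)
    length_ratio = min(len(a), len(b)) / max(len(a), len(b))
    same_head = float(gloss_a.split()[0] == gloss_b.split()[0])
    return [jaccard, length_ratio, same_head]

def classify(features, weights=(4.0, 0.5, 0.5), bias=-1.0):
    """A tiny linear scorer over the features; in practice the
    weights would be learned by a classifier, not set by hand."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return "exact" if score > 0 else "none"

f = pair_features("sloping land beside water",
                  "land beside a body of water")
print(f, classify(f))
```

A real system would predict the full relation inventory (equivalence, broader, narrower, related) rather than this binary decision.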

From Word Alignment to Word Senses, via Multilingual Wordnets

2006

Most successful commercial applications in language processing (text and/or speech) dispense with any explicit treatment of semantics, the usual motivation being the high computational cost of dealing with semantics over large volumes of data. With recent advances in corpus linguistics and statistical methods in NLP, revealing useful semantic features of linguistic data is becoming cheaper and cheaper, and the accuracy of this process is steadily improving. Lately, there seems to be growing acceptance of the idea that multilingual lexical ontologies might be the key to aligning different views on the atomic semantic units used to characterize the general meaning of varied multilingual documents. Depending on the granularity at which semantic distinctions are necessary, the accuracy of basic semantic processing (such as word sense disambiguation) can be very high at relatively low computational complexity. The paper substantiates this claim by presenting a statistics-based system for word alignment and word sense disambiguation in parallel corpora. We describe a word alignment platform which provides the text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence and word alignment) required for accurate word sense disambiguation.
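The statistical word alignment such systems build on can be illustrated with a minimal IBM Model 1 EM loop. This is the textbook baseline, not the paper's actual platform, and the toy bitext is invented:

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Minimal IBM Model 1 EM for word translation probabilities.

    bitext: list of (source_tokens, target_tokens) pairs.
    Returns t[(e, f)] ~ p(f | e), estimated by expectation-
    maximization with uniform initialization.
    """
    src_vocab = {w for s, _ in bitext for w in s}
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)
        for src, tgt in bitext:
            for f in tgt:
                z = sum(t[(e, f)] for e in src)  # normalization
                for e in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[e] += c
        for (e, f), c in count.items():
            t[(e, f)] = c / total[e]
    return t

# Invented toy parallel corpus (English-German).
bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = ibm_model1(bitext)
print(round(t[("the", "das")], 2))
```

Because "house" explains "haus" and "book" explains "buch", EM pushes most of the probability mass for "the" onto "das"; the resulting links are then usable for projecting sense annotations across languages.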

Automatic Domain Assignment for Word Sense Alignment.

This paper reports on the development of a hybrid and simple method based on a machine learning classifier (Naive Bayes), Word Sense Disambiguation and rules, for the automatic assignment of WordNet Domains to nominal entries of a lexicographic dictionary, the Senso Comune De Mauro Lexicon. The system obtained an F1 score of 0.58, with a Precision of 0.70. We further used the automatically assigned domains to filter out word sense alignments between MultiWordNet and Senso Comune. This has led to an improvement in the quality of the sense alignments, showing the validity of the approach for domain assignment and the importance of domain information for achieving good sense alignments.
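A minimal multinomial Naive Bayes of the kind used for domain assignment can be sketched as follows; the domain labels and glosses are invented examples, not Senso Comune or WordNet Domains data:

```python
import math
from collections import Counter, defaultdict

class TinyNB:
    """Multinomial Naive Bayes with add-one smoothing.

    A toy stand-in for a domain-assignment classifier: it labels
    a gloss with a domain from its bag of words.
    """

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        def log_posterior(y):
            n = sum(self.word_counts[y].values())
            s = math.log(self.label_counts[y])  # prior
            for w in doc.split():
                # add-one smoothed likelihood
                s += math.log((self.word_counts[y][w] + 1)
                              / (n + len(self.vocab)))
            return s
        return max(self.label_counts, key=log_posterior)

# Invented training glosses with invented domain labels.
nb = TinyNB().fit(
    ["money deposit interest loan", "river shore water flood",
     "money currency account"],
    ["economy", "geography", "economy"],
)
print(nb.predict("interest on a loan"))
```

Predicted domains like these can then act as a filter: a candidate sense alignment whose two senses receive incompatible domains is discarded.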

A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment

2020

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by enabling new solutions, particularly data-hungry approaches such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

Corpus Alignment for Word Sense Disambiguation

Machine translation converts text from one language into another. Anusaaraka is an English-to-Indian-language machine translation system, developed as a Natural Language Processing (NLP) research and development project by the Chinmaya International Foundation (CIF). Any such system needs a large parallel corpus from which rules can be derived and many senses disambiguated. Anusaaraka follows a hybrid approach, but we work on its rule-based component, which requires a large aligned parallel corpus. In this paper we discuss how we collected this parallel corpus with the help of shell scripts, programs, toolkits, and other resources.

Ontology Matching using BabelNet Dictionary and Word Sense Disambiguation Algorithms

Indonesian Journal of Electrical Engineering and Computer Science, 2017

Ontology matching is a discipline whose name denotes two things: first, the process of discovering correspondences between two different ontologies, and second, the result of this process, that is, the expression of those correspondences. It is a crucial task for merging and evolving heterogeneous ontologies in Semantic Web applications. The domain poses several challenges, among them the selection of appropriate similarity measures for discovering correspondences. In this article, we study algorithms that compute semantic similarity between two ontologies, namely the Adapted Lesk, Wu & Palmer, Resnik, and Leacock & Chodorow algorithms and similarity flooding, with BabelNet as the reference ontology; we implement them and compare them experimentally. Overall, the most effective methods are Wu & Palmer and Adapted Lesk, which is widely used for Word Sense Disambiguation (WSD) in the field of Automatic Natural ...
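Two of the compared measures can be sketched compactly: simplified Lesk picks the sense whose gloss overlaps most with the context, and Wu & Palmer scores two taxonomy nodes as 2 * depth(LCS) / (depth(a) + depth(b)). The glosses and depths below are made up:

```python
def simplified_lesk(context_words, sense_glosses):
    """Simplified Lesk: choose the sense whose gloss shares the
    most words with the context. sense_glosses: id -> gloss."""
    ctx = set(context_words)
    return max(sense_glosses,
               key=lambda s: len(ctx & set(sense_glosses[s].split())))

def wu_palmer(depth_a, depth_b, depth_lcs):
    """Wu & Palmer similarity from taxonomy depths:
    2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth_lcs / (depth_a + depth_b)

# Invented gloss stubs for two senses of "bank".
glosses = {
    "bank.n.01": "financial institution that accepts deposits",
    "bank.n.09": "sloping land beside a body of water",
}
print(simplified_lesk("sat on the land beside the water".split(),
                      glosses))
print(wu_palmer(depth_a=7, depth_b=8, depth_lcs=5))
```

Lesk needs only glosses, while Wu & Palmer needs a taxonomy with depths, which is one reason the two behave differently across matching tasks.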

UBY – a Large-scale Unified Lexical-semantic Resource

LREC 2012 Workshop on Language Resource Merging, 2012

The talk will present UBY, a large-scale resource integration project based on the Lexical Markup Framework (LMF, ISO 24613:2008). Currently, nine lexicons in two languages (English and German) have been integrated: WordNet, GermaNet, FrameNet, VerbNet, Wikipedia (DE/EN), Wiktionary (DE/EN), and OmegaWiki. All resources have been mapped to the LMF-based model and imported into an SQL-DB. The UBY-API, a common Java software library, provides access to all data in the database. The nine lexicons are densely interlinked using monolingual and cross-lingual sense alignments. These sense alignments yield enriched sense representations and increased coverage. A sense alignment framework has been developed for automatically aligning any pair of resources mono- or cross-lingually. As an example, the talk will report on the automatic alignment of WordNet and Wiktionary. Further information on UBY and UBY-API is available at: http://www.ukp.tu-darmstadt.de/data/lexical-resources/uby/.
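A crude stand-in for one pairwise sense alignment step of this kind: score every cross-resource sense pair by gloss similarity and keep pairs above a threshold. The glosses and the 0.3 threshold are invented, and real frameworks add far richer evidence:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two bags of words (Counters)."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def align_senses(res_a, res_b, threshold=0.3):
    """Align two sense inventories by gloss similarity.

    res_a, res_b: dicts mapping sense_id -> gloss. Every pair
    scoring at or above the threshold is kept.
    """
    pairs = []
    for sa, ga in res_a.items():
        va = Counter(ga.lower().split())
        for sb, gb in res_b.items():
            sim = cosine(va, Counter(gb.lower().split()))
            if sim >= threshold:
                pairs.append((sa, sb, round(sim, 2)))
    return pairs

# Invented miniature inventories.
wordnet = {"bank.n.01": "a financial institution that accepts deposits"}
wiktionary = {
    "bank:1": "an institution that accepts deposits of money",
    "bank:2": "an edge of a river",
}
print(align_senses(wordnet, wiktionary))
```

The surviving pairs are exactly the cross-resource links that give an integrated resource its enriched sense representations.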

A Methodology for Large-Scale, Disambiguated and Unbiased Lexical Knowledge Acquisition Based on Multilingual Word Alignment

Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

In order to be concretely effective, many NLP applications require the availability of lexical resources providing varied, broadly shared, and language-unbounded lexical information. However, state-of-the-art knowledge models rarely adopt such a comprehensive and cross-lingual approach to semantics. In this paper, we propose a novel automatable methodology for knowledge modeling based on a multilingual word alignment mechanism that enhances the encoding of unbiased and naturally disambiguated lexical knowledge. Results from a simple implementation of the proposal show relevant outcomes that are not found in other resources.