Knowledge Graph Embedding Methods for Entity Alignment: An Experimental Review

Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies

2022

Entity Alignment (EA) is the task of identifying records across different knowledge bases that refer to the same real-world entity. Embedding-based methods have recently come to dominate EA: they map entities to a low-dimensional space and align them based on their similarities. With the corpus of EA methodologies growing rapidly, this paper presents a comprehensive analysis of existing EA methods, elaborating on their applications and limitations. Further, we categorize the methods by their underlying algorithms and the information they incorporate to learn entity representations. Motivated by challenges in industrial datasets, we bring forward four research questions (RQs) that empirically analyse the algorithms from the perspectives of hubness, degree distribution, non-isomorphic neighbourhoods, and name bias. For hubness, where one entity turns up as the nearest neighbour of many other entities, we define an h-score to quantify its effect on the performance of various algorithms. Additionally, we try to level the playing field for algorithms that rely primarily on the name bias present in open-source benchmark datasets by creating a low-name-bias dataset. We further release an open-source repository of 14 embedding-based EA methods and present our analysis to motivate further research in the field of EA.
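
The abstract does not define the h-score, but the underlying quantity, how often an entity shows up as a nearest neighbour, is straightforward to measure. Below is a minimal NumPy sketch, with illustrative names, that computes the k-occurrence of target entities and its skewness, one standard way to quantify hubness in an embedding space; it is not the paper's exact metric.

```python
import numpy as np

def k_occurrence(src_emb, tgt_emb, k=10):
    """Count how often each target entity appears among the k nearest
    neighbours (by cosine similarity) of the source entities.
    A heavily skewed count distribution signals hub formation."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                           # (n_src, n_tgt) similarities
    topk = np.argsort(-sim, axis=1)[:, :k]      # k nearest targets per source
    return np.bincount(topk.ravel(), minlength=tgt.shape[0])

rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(1000, 64)), rng.normal(size=(1000, 64))
occ = k_occurrence(src, tgt, k=10)
# Skewness of the k-occurrence distribution: strongly positive values
# mean a few hub entities absorb most nearest-neighbour slots.
skew = ((occ - occ.mean()) ** 3).mean() / (occ.std() ** 3 + 1e-12)
print(f"max k-occurrence: {occ.max()}, skewness: {skew:.2f}")
```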

Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned

Lecture Notes in Computer Science

In this work, we focus on the problem of entity alignment in Knowledge Graphs (KGs) and report on our experiences applying a Graph Convolutional Network (GCN) based model to this task. Variants of GCN are used in multiple state-of-the-art approaches, so it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper, and after a thorough audit of the code provided by the authors, we concluded that their implementation differs from the architecture described in the paper. In addition, several tricks are required to make the model work, and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects these tricks and architectural changes have on final performance. Furthermore, we examine current evaluation approaches and systematize the available benchmark datasets. We believe that people interested in KG matching, as well as novices entering the field, can profit from our work.
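
As context for the kind of model being audited, here is a minimal PyTorch sketch of a GCN-based aligner in the style the paper discusses: a shared two-layer GCN embeds the entities of both KGs (assumed here to share one index space), and seed alignment pairs are trained with a margin loss. This is a generic reconstruction, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: normalised neighbour averaging followed
    by a linear map (Kipf & Welling style)."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out, bias=False)

    def forward(self, x, adj_norm):
        # adj_norm: dense, symmetrically normalised adjacency of the
        # combined graph holding both KGs in one entity index space
        return F.relu(self.lin(adj_norm @ x))

class GCNAligner(nn.Module):
    """Embeds all entities with a two-layer GCN; alignment is then
    judged by distances between embeddings across the two KGs."""
    def __init__(self, n_ent, dim):
        super().__init__()
        self.emb = nn.Embedding(n_ent, dim)
        self.gcn1 = GCNLayer(dim, dim)
        self.gcn2 = GCNLayer(dim, dim)

    def forward(self, adj_norm):
        h = self.gcn1(self.emb.weight, adj_norm)
        return self.gcn2(h, adj_norm)

def margin_loss(z, seeds, negs, gamma=3.0):
    """seeds: (n, 2) aligned entity-index pairs; negs: (n, 2) corrupted
    pairs. Pulls seed pairs together, pushes corrupted pairs apart."""
    pos = (z[seeds[:, 0]] - z[seeds[:, 1]]).norm(p=1, dim=1)
    neg = (z[negs[:, 0]] - z[negs[:, 1]]).norm(p=1, dim=1)
    return F.relu(pos + gamma - neg).mean()
```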

RePS: Relation, Position and Structure aware Entity Alignment

Companion Proceedings of the Web Conference 2022

Entity Alignment (EA) is the task of recognizing the same entity across different knowledge bases. Recently, embedding-based EA techniques have established dominance, aligning entities based on closeness in a latent space. Graph Neural Networks (GNNs) gained popularity as the embedding module due to their ability to learn entity representations from local sub-graph structures. Although GNNs show promising results, few works have aimed to capture relations, their global importance, and entities' relative positions during EA. This paper presents Relation, Position and Structure aware Entity Alignment (RePS), a multi-faceted representation-learning-based EA method that encodes local, global, and relation information for aligning entities. To capture relations and neighborhood structure, we propose a relation-based aggregation technique, the Graph Relation Network (GRN), that incorporates relation importance during aggregation. To capture an entity's position, we propose the Relation-aware Position Aggregator (RPA), which capitalizes on entities' positions in a non-Euclidean space using training labels as anchors, providing a global view of entities. Finally, we introduce Knowledge Aware Negative Sampling (KANS), which generates harder-to-distinguish negative samples for the model to learn optimal representations. We perform exhaustive experiments on four cross-lingual datasets and report an ablation study demonstrating the effectiveness of GRN, KANS, and the position encodings.
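
The abstract does not spell out GRN's aggregation rule; the following toy PyTorch sketch shows one way relation importance can enter aggregation, by scaling each neighbour message with a learned per-relation weight. All module and parameter names are illustrative, not RePS's actual design.

```python
import torch
import torch.nn as nn

class RelationWeightedAgg(nn.Module):
    """Aggregate neighbour features scaled by a learned scalar
    importance per relation type (a toy stand-in for a GRN layer)."""
    def __init__(self, n_rel, dim):
        super().__init__()
        self.rel_weight = nn.Embedding(n_rel, 1)   # global relation importance
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, edges, rels):
        # x: (N, dim) entity features; edges: (E, 2) as (src, dst);
        # rels: (E,) relation id of each edge
        w = torch.sigmoid(self.rel_weight(rels))   # (E, 1) importance in (0, 1)
        msg = w * x[edges[:, 0]]                   # weighted neighbour messages
        out = torch.zeros_like(x)
        out.index_add_(0, edges[:, 1], msg)        # sum messages at destinations
        return torch.relu(self.lin(out) + x)       # residual update
```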

Representation learning over multiple knowledge graphs for knowledge graphs alignment

Neurocomputing, 2018

The goal of representation learning for knowledge graphs is to encode both entities and relations into a low-dimensional embedding space. Most current works have demonstrated the benefits of knowledge graph embedding for single-KG completion tasks such as relation extraction. The most significant distinction between embedding multiple knowledge graphs and embedding a single one is that the former must consider the alignments between the graphs, which is very helpful to applications built on multiple KGs, such as KB-QA and KG integration. In this paper, we propose a new automatic representation learning model over Multiple Knowledge Graphs (MGTransE) that adopts a bootstrapping method. More specifically, MGTransE

A Survey of Knowledge Graph Embedding and Their Applications

ArXiv, 2021

Knowledge graph embedding provides a versatile technique for representing knowledge. These techniques can be used in a variety of applications, such as knowledge graph completion to predict missing information, recommender systems, question answering, and query expansion. Although the information in a knowledge graph is structured, it is challenging to consume in real-world applications; knowledge graph embedding enables applications to consume this information and improve their performance. Knowledge graph embedding is an active research area. Most embedding methods focus on structure-based information; recent research has extended the boundary to include text-based and image-based information in entity embeddings, and efforts have been made to enrich representations with context information. This paper traces the growth of the field of KG embedding from simple translation-based models to enrichment-based models. This paper includes the utility of the Kn...

InterHT: Knowledge Graph Embeddings by Interaction between Head and Tail Entities

ArXiv, 2022

Knowledge graph embedding (KGE) models learn representations of the entities and relations in a knowledge graph. Distance-based methods show promising performance on the link prediction task, where the result is predicted from the distance between two entity representations. However, most of these methods represent the head entity and tail entity separately, which limits model capacity. We propose a novel distance-based method named InterHT that allows the head and tail entities to interact and thereby obtain better entity representations. Experimental results show that our proposed method achieves the best results on the ogbl-wikikg2 dataset.
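
The abstract does not give InterHT's exact score function; the sketch below illustrates the stated idea, where each entity carries an auxiliary vector that modulates the other end of the triple via an elementwise product before a TransE-style distance is taken. The precise form is an assumption, not the paper's equation.

```python
import torch
import torch.nn as nn

class InterHTStyleScorer(nn.Module):
    """Distance-based scorer in which head and tail interact through
    auxiliary vectors before the translation distance is computed."""
    def __init__(self, n_ent, n_rel, dim):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)       # main entity vectors
        self.ent_aux = nn.Embedding(n_ent, dim)   # auxiliary interaction vectors
        self.rel = nn.Embedding(n_rel, dim)

    def forward(self, h, r, t):
        # head modulated by the tail's auxiliary vector, and vice versa;
        # the +1 keeps the identity mapping reachable at initialisation
        h_i = self.ent(h) * (self.ent_aux(t) + 1.0)
        t_i = self.ent(t) * (self.ent_aux(h) + 1.0)
        # higher score = more plausible triple
        return -(h_i + self.rel(r) - t_i).norm(p=1, dim=-1)
```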

Leveraging Entity-Type Properties in the Relational Context for Knowledge Graph Embedding

IEICE Transactions on Information and Systems, 2020

Knowledge graph embedding aims to embed the entities and relations of multi-relational data in low-dimensional vector spaces. Knowledge graphs are useful for numerous artificial intelligence (AI) applications; however, KGs are far from complete, and hence KG embedding models have quickly gained massive attention. Nevertheless, state-of-the-art KG embedding models ignore the category-specific projection of entities and the impact of entity types in the relational aspect. For example, the entity "Washington" could belong to the person or location category depending on the relation in which it appears. In a KG, an entity usually holds many type properties, which leads us to a very interesting question: are all the type properties of an entity meaningful for a specific relation? In this paper, we propose a KG embedding model, TPRC, that leverages entity-type properties in the relational context. To show the effectiveness of our model, we apply our idea to TransE, TransR and TransD. Our approach outperforms state-of-the-art approaches such as TransE, TransD, DistMult and ComplEx. Another important observation is that introducing entity-type properties in the relational context can improve the performance of the original translation-distance-based models.
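
TPRC's projection is not specified in this abstract; as a rough illustration of "type properties in the relational context", the sketch below gates an entity's type properties with a relation-specific attention and adds the resulting type context to a TransE-style score. The gating form and all names are assumptions.

```python
import torch
import torch.nn as nn

class TypeContextTransE(nn.Module):
    """TransE-style scorer where each entity embedding is augmented by
    the type properties relevant to the current relation -- a
    simplified stand-in for a relational type context."""
    def __init__(self, n_ent, n_rel, n_type, dim, ent_types):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        self.type_emb = nn.Embedding(n_type, dim)
        # ent_types: (n_ent, n_type) 0/1 matrix of each entity's type properties
        self.register_buffer("ent_types", ent_types.float())
        self.rel_type_att = nn.Embedding(n_rel, n_type)  # relation-to-type affinity

    def project(self, e, r):
        # keep only the type properties this relation cares about
        att = torch.softmax(self.rel_type_att(r), dim=-1) * self.ent_types[e]
        ctx = att @ self.type_emb.weight          # relation-aware type context
        return self.ent(e) + ctx

    def forward(self, h, r, t):
        return -(self.project(h, r) + self.rel(r)
                 - self.project(t, r)).norm(p=1, dim=-1)
```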

End-to-End Entity Linking and Disambiguation leveraging Word and Knowledge Graph Embeddings

ArXiv, 2020

Entity linking, i.e., connecting entity mentions in a natural language utterance to knowledge graph (KG) entities, is a crucial step for question answering over KGs. It is often based on measuring the string similarity between an entity label and its mention in the question. The relation referred to in the question can help to disambiguate between entities with the same label, but this can be misleading if an incorrect relation has been identified in the relation-linking step. However, an incorrect relation may still be semantically similar to the relation in which the correct entity forms a triple within the KG, and this similarity can be captured by their KG embeddings. Based on this idea, we propose the first end-to-end neural network approach that employs KG as well as word embeddings to perform joint relation and entity classification of simple questions, while implicitly performing entity disambiguation with the help of a novel gating mechanism. An empirical evaluation shows that ...
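
The gating mechanism itself is not described in the abstract; a generic fusion gate over the two embedding views might look like the following sketch, with all names and dimensions assumed.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse a word-embedding view and a KG-embedding view of a
    candidate with a learned gate -- a generic sketch of combining
    the two signals, not the paper's exact mechanism."""
    def __init__(self, dim_word, dim_kg, dim):
        super().__init__()
        self.w = nn.Linear(dim_word, dim)
        self.k = nn.Linear(dim_kg, dim)
        self.gate = nn.Linear(dim_word + dim_kg, dim)

    def forward(self, x_word, x_kg):
        # the gate decides, per dimension, how much each view contributes
        g = torch.sigmoid(self.gate(torch.cat([x_word, x_kg], dim=-1)))
        return g * self.w(x_word) + (1 - g) * self.k(x_kg)
```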

Transition-based Knowledge Graph Embedding with Relational Mapping Properties

Many knowledge repositories nowadays contain billions of triplets, i.e. (head entity, relationship, tail entity), as relation instances. These triplets form a directed graph with entities as nodes and relationships as edges. However, this symbolic and discrete storage structure makes it difficult to exploit the knowledge in other intelligent applications (e.g. question answering systems), as many AI-related algorithms prefer to compute on continuous data. Therefore, a series of approaches have emerged that facilitate knowledge computing by encoding the knowledge graph into a low-dimensional embedding space. TransE is the most recent and most promising approach among them: it achieves higher performance with fewer parameters by modeling the relationship as a translation vector from the head entity to the tail entity. Unfortunately, it is not flexible enough to handle the various mapping properties of triplets well, even though its authors noted the harm to performance. In this paper, we therefore propose a superior model called TransM, which leverages the structure of the knowledge graph by pre-calculating a distinct weight for each training triplet according to its relational mapping property. In this way, the objective function treats each triplet according to its own weight. We carry out extensive experiments comparing TransM with the state-of-the-art TransE and other prior art. Each approach is evaluated in two different application scenarios on several benchmark datasets. Results show that our proposed model significantly outperforms the former ones while keeping parameter complexity as low as TransE's.
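
The abstract says TransM pre-computes a per-triplet weight from the relation's mapping property. One plausible reading (an assumption, not a quoted formula) is to weight each relation by how close it is to 1-to-1, using average tails-per-head and heads-per-tail statistics:

```python
import math
from collections import defaultdict

def relation_weights(triples):
    """Pre-compute a weight per relation from its mapping property:
    relations close to 1-to-1 get larger weights, many-to-many
    relations are down-weighted. The form w_r = 1 / log(tph + hpt)
    is one plausible instantiation of the abstract's description."""
    heads, tails = defaultdict(set), defaultdict(set)
    counts = defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        counts[r] += 1
    weights = {}
    for r, n in counts.items():
        tph = n / len(heads[r])   # average tails per head
        hpt = n / len(tails[r])   # average heads per tail
        weights[r] = 1.0 / math.log(tph + hpt)  # tph + hpt >= 2, so log > 0
    return weights

# Each training triplet then contributes w_r * ||h + r - t|| to the loss,
# so flexible (many-to-many) relations impose a softer translation constraint.
```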

Improving Knowledge Graph Embeddings with Inferred Entity Types

2019

In this paper we study techniques to improve the performance of bilinear embedding methods for knowledge graph completion on large datasets, where at each epoch the model sees only a very small percentage of the training data and the number of negative examples generated for each positive example is limited to a small portion of the entire entity set. We first present a heuristic method for inferring the types and type constraints of entities and relations. We then use this method to construct both a joint learning model and a straightforward method for increasing the quality of sampled negatives during training. We show that when these two techniques are combined, they improve performance by up to 5.6% on Hits@1. The improvement is especially significant when the batch size and the number of generated negative examples are small relative to the size of the dataset.
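
The paper's type-inference heuristic is not detailed here, but the downstream use is clear: restrict negative sampling to entities that satisfy a relation's (inferred) type constraints, so negatives are plausible and therefore more informative. A minimal sketch, using observed slot fillers as a crude stand-in for inferred types:

```python
import random
from collections import defaultdict

def infer_domains(triples):
    """Heuristic: treat the set of entities observed in a relation's
    head/tail slot as that slot's 'type' (a crude stand-in for the
    paper's inferred type constraints)."""
    head_dom, tail_dom = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        head_dom[r].add(h)
        tail_dom[r].add(t)
    return head_dom, tail_dom

def typed_negative(triple, head_dom, tail_dom, rng=random):
    """Corrupt the head or tail, drawing the replacement only from
    entities that satisfy the relation's inferred type constraint."""
    h, r, t = triple
    if rng.random() < 0.5:
        pool = list(head_dom[r] - {h})
        return (rng.choice(pool), r, t) if pool else (h, r, t)
    pool = list(tail_dom[r] - {t})
    return (h, r, rng.choice(pool)) if pool else (h, r, t)
```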