Russa Biswas | Saarland University
Papers by Russa Biswas
arXiv (Cornell University), Jul 28, 2022
The entity type information in Knowledge Graphs (KGs) such as DBpedia, Freebase, etc. is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents GRAND, a novel approach for entity typing leveraging different graph walk strategies in RDF2vec together with textual entity descriptions. RDF2vec first generates graph walks and then uses a language model to obtain embeddings for each node in the graph. This study shows that the walk generation strategy and the embedding model have a significant effect on the performance of the entity typing task. The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs for both fine-grained and coarse-grained classes. The results show that the combination of order-aware RDF2vec variants together with the contextual embeddings of the textual entity descriptions achieves the best results.
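The walk-generation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain uniform random walks over a toy set of triples; the entity and predicate names, the function names, and the parameters `depth` and `n_walks` are hypothetical and are not taken from the paper or from any RDF2vec implementation. The resulting token sequences are what an embedding model would subsequently be trained on.

```python
import random
from collections import defaultdict

# Toy triples (subject, predicate, object); invented for illustration only.
TRIPLES = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "EU"),
    ("Berlin", "locatedIn", "Europe"),
    ("EU", "headquarteredIn", "Brussels"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(adj, start, depth, n_walks, seed=0):
    """Generate n_walks uniform random walks of at most `depth` hops.

    Each walk is a token sequence: entity, predicate, entity, ...
    A walk stops early when it reaches a node without outgoing edges.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adj.get(node)
            if not edges:
                break
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.append(walk)
    return walks


ADJ = build_adjacency(TRIPLES)
WALKS = random_walks(ADJ, "Berlin", depth=2, n_walks=3)
```

Because predicates are kept in the sequence, the order of tokens in each walk carries structural information, which is the kind of signal an order-aware embedding model can exploit.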
Lecture Notes in Computer Science, 2022
Knowledge Graphs (KGs) have become the backbone of various machine learning based applications over the past decade. However, KGs are often incomplete and inconsistent. Several representation learning based approaches have been introduced to complete the missing information in KGs. In addition, Neural Language Models (NLMs) have gained huge momentum in NLP applications. However, exploiting contextual NLMs to tackle the Knowledge Graph Completion (KGC) task is still an open research problem. In this paper, a GPT-2 based KGC model is proposed and evaluated on two benchmark datasets. The initial results obtained from fine-tuning the GPT-2 model for triple classification strengthen the importance of using NLMs for KGC. The impact of contextual language models for KGC is also discussed.
Wikipedia has emerged as the largest multilingual, web-based general reference work on the Internet. A huge amount of human effort has been invested in the creation and update of Wikipedia articles, which are ideally complemented by so-called infobox templates defining the type of the underlying article. It has been observed that the Wikipedia infobox type information is often incomplete and inconsistent for various reasons. However, the Wikipedia infobox type information plays a fundamental role for the RDF type information of Wikipedia-based Knowledge Graphs such as DBpedia. This creates the need for correct and complete infobox type information at all times. In this work, we propose an approach to predict Wikipedia infobox types by using word embeddings on categories of Wikipedia articles, and analyze the impact of using minimal information from the Wikipedia articles in the prediction process.
Knowledge Graphs (KGs) comprise interlinked information in the form of entities and relations between them in a particular domain and provide the backbone for many applications. However, KGs are often incomplete because links between the entities are missing. Link Prediction is the task of predicting these missing links in a KG based on the existing links. Recent years have witnessed many studies on link prediction using KG embeddings, which is one of the mainstream tasks in KG completion. To do so, most of the existing methods learn the latent representation of the entities and relations, whereas only a few of them consider contextual information as well as the textual descriptions of the entities. This paper introduces an attentive encoder-decoder based link prediction approach considering both structural information of the KG and the textual entity descriptions. A path selection method is used to encapsulate the contextual information of an entity in a KG. The model explores ...
Written text can be understood as a means to acquire insights into the nature of past and present cultures and societies. Numerous projects have been devoted to digitizing and publishing historical textual documents in digital libraries which scientists can utilize as valuable resources for research. However, the extent of textual data available exceeds humans’ abilities to explore the data efficiently. In this paper, a framework is presented which combines unsupervised machine learning techniques and natural language processing, using the example of historical text documents on the 19th-century USA. Named entities are extracted from semi-structured text, which is enriched with complementary information from Wikidata. Word embeddings are leveraged to enable further analysis of the text corpus, which is visualized in a web-based application.
The Semantic Web: ESWC 2021 Satellite Events, 2021
The entity type information in a Knowledge Graph (KG) plays an important role in a wide range of applications in Natural Language Processing such as entity linking, question answering, relation extraction, etc. However, the available entity types are often noisy and incomplete. Entity Typing is a non-trivial task if not enough information is available for the entities in a KG. In this work, neural language models and a character embedding model are exploited to predict the type of an entity from only the name of the entity without any other information from the KG. The model has been successfully evaluated on a benchmark dataset.
Child obesity is a serious problem in our modern world and has increased by 60% since 1990. Due to the time and cost intensity of traditional therapy programs, scientists started to focus on IT-based interventions. Our paper focuses on measuring biosignals (e.g. heart rate) of obese children during fitness tests that include different physical activities (e.g. running). We investigate whether it is possible to predict the performance of obese children during running tests based on static (e.g. BMI) as well as dynamic (e.g. heart rate) parameters. Here, we focused on heart rate-related parameters from the inverted U-shaped heart rate response of obese children during running tests. For future research, we plan to consider physical activity (e.g. step count) of the children at home. Our approach is a NeuroIS service which uses low-cost devices to make predictions on an individual’s future development and is also applicable to other domains (e.g. business information systems).
ArXiv, 2020
One of the grand challenges discussed during the Dagstuhl Seminar “Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web” [24] and described in its report is that of a: Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. For example, Amazon is creating a knowledge graph of all products in the world and Google and Apple have both created knowledge graphs of all locations in the world. This grand challenge extends this further by asking if we can create a knowledge graph of “everything” ranging from common sense concepts to location based entities. This knowledge graph should be “open to the public” in a FAIR manner democratizing this mass amount of knowledge. Although linked open data (LOD) is one knowledge graph, it is the closest realisation (and probably the only one) to a public FAIR Knowledge Graph (KG) of everything. Surely, LOD provides a ...
Wikipedia, the multilingual, free-content encyclopedia, has evolved into the largest and most popular general reference work on the Internet. Since the inception of Wikipedia, crowdsourcing of articles has been one of the most salient features of this open encyclopedia. An enormous amount of work and expertise goes into the creation of a self-contained article. However, it has been observed that the infobox type information in Wikipedia articles is often incomplete, incorrect, or missing. This is due to the human intervention in creating Wikipedia articles. Moreover, the type of the infoboxes in Wikipedia plays a vital role in RDF type inference in Knowledge Graphs such as DBpedia. Hence, there arises a necessity to have the correct infobox type information in the Wikipedia articles. In this paper, we propose an approach for predicting Wikipedia infobox type information using both word and network embeddings. Furthermore, the impact ...
Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020
Knowledge Graphs (KGs) have recently gained attention for representing knowledge about a particular domain. Since its advent, the Linked Open Data (LOD) cloud has been growing constantly and contains many KGs about many different domains such as government, scholarly data, the biomedical domain, etc. Apart from facilitating the inter-connectivity of datasets in the LOD cloud, KGs have been used in a variety of machine learning and Natural Language Processing (NLP) based applications. However, the information present in the KGs is sparse and often incomplete. Predicting the missing links between the entities is necessary to overcome this issue. Moreover, in the LOD cloud, information about the same entities is available in multiple KGs in different forms, but the information that these entities are the same across KGs is missing. The main focus of this thesis is Knowledge Graph Completion, tackling the link prediction tasks within a KG as well as across different KGs. To do so, the latent representation of KGs in a low-dimensional vector space has been exploited to predict the missing information in order to complete the KGs.
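To make the idea of predicting missing links from a low-dimensional vector representation concrete, here is a small sketch. The abstract does not name a specific embedding model, so the example swaps in a TransE-style translation score (head + relation should lie close to tail) purely as an illustration. The 2-d embedding values, entity names, and function names are hypothetical and hand-picked, not learned and not taken from the thesis.

```python
import math

# Hand-picked 2-d toy embeddings; real models learn these from the KG.
ENTITY = {
    "Berlin": (1.0, 0.0),
    "Germany": (1.0, 1.0),
    "Paris": (3.0, 0.0),
    "France": (3.0, 1.0),
}
RELATION = {"capitalOf": (0.0, 1.0)}


def score(head, relation, tail):
    """TransE-style distance ||h + r - t||; lower means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank all entities as candidate tails for the query (head, relation, ?)."""
    return sorted(ENTITY, key=lambda tail: score(head, relation, tail))
```

With these toy values, `rank_tails("Paris", "capitalOf")` places `"France"` first, since adding the relation vector to the embedding of `"Paris"` lands exactly on the embedding of `"France"`.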
Open Knowledge Graphs (KGs) such as DBpedia and Wikidata have been recognized as the foundations for diverse applications in the field of data mining and information retrieval. Each of these KGs follows a different knowledge organization and is based on differently structured ontologies. Moreover, it has been observed that type information is often noisy, incomplete, or even incorrect. In general, there is a need for well-defined and comparable type information for the entities of the KGs. In this paper, we propose an isomorphism-based approach to infer subsumption relations to RDF type information in Wikidata by exploiting the RDF type information from DBpedia.
Knowledge Graphs are organized to describe entities from any discipline and the interrelations between them. Apart from facilitating the inter-connectivity of datasets in the LOD cloud, KGs have been used in a variety of applications such as Web search or entity linking, and have recently become part of popular search systems and Q&A applications. However, KG applications suffer from high computational and storage costs. Hence, there arises the need for representation learning that maps the high-dimensional KGs into low-dimensional spaces while preserving structural as well as relational information. In this study, we conduct a comprehensive survey of KG embedding models which consider the structured information of the graph as well as the unstructured information in the form of literals such as text, numerical values, etc. Furthermore, we address the challenges in these embedding models, followed by a discussion of different application scenarios.
ArXiv, 2019
Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global-scale knowledge potentially available to any human as well as artificial intelligence that may want to benefit from it as background knowledge for supporting their tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al., "Linked Data might be outdated, imprecise, or simply wrong": there arises a necessity to investigate the problem of linked data validity. This work reports a collaborative effort performed by nine teams of students, guided by an equal number of seni...
Semantic Web, 2021
Knowledge Graphs (KGs) are composed of structured information about a particular domain in the form of entities and relations. In addition to the structured information, KGs help in facilitating interconnectivity and interoperability between different resources represented in the Linked Data Cloud. KGs have been used in a variety of applications such as entity linking, question answering, recommender systems, etc. However, KG applications suffer from high computational and storage costs. Hence, there arises the necessity for a representation able to map the high-dimensional KGs into low-dimensional spaces, i.e., embedding space, preserving structural as well as relational information. This paper conducts a survey of KG embedding models which not only consider the structured information contained in the form of entities and relations in a KG but also its unstructured information represented as literals such as text, numerical values, images, etc. Along with a theoretical analysis and ...
Companion Proceedings of the Web Conference 2021, 2021
Proceedings of the 11th on Knowledge Capture Conference, 2021
The Semantic Web: ESWC 2020 Satellite Events