Mohamed Morsey - Academia.edu
Papers by Mohamed Morsey
Purpose – DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the Web using Linked Data and SPARQL. However, the DBpedia release process is heavy-weight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. Design/methodology/approach – Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Findings – During the realization of DBpedia-Live we learned that it is crucial to process Wikipedia updates in a priority queue. Recently-updated Wikipedia articles should have ...
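As a rough illustration of the priority-queue idea described in this abstract, the following Python sketch processes recently updated article titles in priority order; the priority values, the extract_rdf() helper and the store wrapper are hypothetical placeholders, not the DBpedia-Live implementation.

    import heapq

    # Assumed priorities: pages just edited on Wikipedia are served before pages
    # that merely need re-extraction because a mapping or the framework changed.
    PRIORITY_EDITED = 1
    PRIORITY_UNMODIFIED = 2

    class UpdateQueue:
        def __init__(self):
            self._heap = []
            self._counter = 0  # tie-breaker keeps FIFO order within one priority

        def __len__(self):
            return len(self._heap)

        def push(self, priority, article_title):
            heapq.heappush(self._heap, (priority, self._counter, article_title))
            self._counter += 1

        def pop(self):
            _, _, article_title = heapq.heappop(self._heap)
            return article_title

    def extract_rdf(article_title):
        # Placeholder for the extraction step that turns a Wikipedia article
        # into RDF; returns (added_triples, deleted_triples).
        return [], []

    def drain(queue, store):
        # 'store' stands for any triple-store wrapper with insert/delete methods.
        while len(queue) > 0:
            title = queue.pop()
            added, deleted = extract_rdf(title)
            store.delete(deleted)
            store.insert(added)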
Lecture Notes in Computer Science
Semantic Web
The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base, which is extracted from the English edition of Wikipedia, consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a worldwide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes regular releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and thus make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.
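For reference, SPARQL access of the kind mentioned above can be exercised with a few lines of Python; the sketch below assumes the SPARQLWrapper library and the public endpoint at https://dbpedia.org/sparql, and the query itself is only an example.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?city ?population WHERE {
            ?city a dbo:City ;
                  dbo:populationTotal ?population .
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["city"]["value"], row["population"]["value"])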
2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2016
Proceedings of the 6th International Conference on Cloud Computing and Services Science, 2016
Cloud Computing has several provisioning models, namely Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). However, cloud users (tenants) have limited or no control over the underlying network resources and services. Network as a Service (NaaS) is emerging as a novel model to bridge this gap. However, NaaS requires an approach capable of modeling the underlying network resources and capabilities in an abstracted and vendor-independent form. In this paper, we elaborate on SemNaaS, a Semantic Web based approach for supporting network management in NaaS systems. Our contribution is threefold. First, we adopt and improve the Network Markup Language (NML) ontology for describing NaaS infrastructures. Second, based on that ontology, we develop a network modeling system that is integrated with the existing OpenNaaS framework. Third, we demonstrate the benefits that the Semantic Web adds to the Network as a Service paradigm by applying SemNaaS operations to a specific NaaS use case.
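To give a flavour of NML-based modelling, the sketch below describes a switch and one of its ports as RDF with rdflib; the nml: namespace URI and terms are taken as assumptions from the OGF NML schema, and the example is illustrative rather than actual SemNaaS code.

    from rdflib import Graph, Namespace, RDF

    NML = Namespace("http://schemas.ogf.org/nml/2013/05/base#")  # assumed NML base namespace
    EX = Namespace("http://example.org/naas/")                   # illustrative resource namespace

    g = Graph()
    g.bind("nml", NML)

    switch = EX["switch-1"]
    port = EX["switch-1-port-24"]

    g.add((switch, RDF.type, NML.Node))
    g.add((port, RDF.type, NML.Port))
    g.add((switch, NML.hasOutboundPort, port))

    print(g.serialize(format="turtle"))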
Lecture Notes in Computer Science, 2015
Proceedings of the 10th EAI International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities, 2015
The Internet remains an unfinished work. There are several approaches to enhancing it that have been experimentally validated within federated testbed environments. To best gain scientific knowledge from these studies, reproducibility and automation are needed in all areas of the experiment life cycle. Within the GENI and FIRE context, several architectures and protocols have been developed for this purpose. However, a major open research issue remains, namely the description and discovery of the heterogeneous resources involved. To remedy this, we propose a semantic information model that can be used to allow declarative interoperability, build dependency graphs, validate requests, infer knowledge and conduct complex queries. The requirements for such an information model have been extracted from current international Future Internet research projects and the practicality of the model is being evaluated through initial implementations. The main outcome of this work is the definition of the Open-Multinet Upper Ontology and related sub-ontologies, which can be used to describe and manage federated infrastructures and their resources.
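A minimal sketch of how such a model might be used to build and check a dependency graph is given below; the omn: namespace URI, the dependsOn property and the resource names are assumptions for illustration, not the normative Open-Multinet terms.

    from rdflib import Graph, Namespace, RDF, Literal

    OMN = Namespace("http://open-multinet.info/ontology/omn#")  # assumed namespace
    EX = Namespace("http://example.org/testbed/")

    g = Graph()
    g.bind("omn", OMN)

    vm = EX["vm-42"]
    host = EX["host-7"]

    g.add((vm, RDF.type, OMN.Resource))
    g.add((host, RDF.type, OMN.Resource))
    g.add((vm, OMN.dependsOn, host))                        # assumed dependency property
    g.add((vm, OMN.hasComponentName, Literal("small-vm")))  # assumed annotation

    # A request could be validated by checking that every dependency resolves
    # to a known resource in the graph.
    unresolved = [s for s, _, dep in g.triples((None, OMN.dependsOn, None))
                  if (dep, RDF.type, OMN.Resource) not in g]
    print("unresolved dependencies:", unresolved)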
Proceedings of the 9th International Conference on Semantic Systems - I-SEMANTICS '13, 2013
Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources according to the quality problem taxonomy via crowdsourcing. This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia. We identified 17 data quality problem types, and 58 users assessed a total of 521 resources. Overall, 11.93% of the evaluated DBpedia triples were identified to have some quality issues. Applying the semi-automatic component yielded a total of 222,982 triples that have a high probability of being incorrect. In particular, we found that problems such as object values being incorrectly extracted and irrelevant extraction of information ...
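The semi-automatic part, i.e. checking extracted values against schema axioms, can be pictured with a small sketch like the one below; the axiom table, properties and ranges are hypothetical examples rather than the paper's tooling.

    from rdflib import Literal, Namespace
    from rdflib.namespace import XSD

    DBO = Namespace("http://dbpedia.org/ontology/")

    # Hand-written axioms: property -> (expected XSD datatype, allowed numeric range)
    AXIOMS = {
        DBO.populationTotal: (XSD.nonNegativeInteger, (0, 10**10)),
        DBO.birthYear: (XSD.gYear, None),
    }

    def violations(graph):
        # Return triples that break a datatype or range axiom.
        found = []
        for prop, (dtype, value_range) in AXIOMS.items():
            for s, p, o in graph.triples((None, prop, None)):
                if not isinstance(o, Literal) or o.datatype != dtype:
                    found.append((s, p, o, "unexpected datatype"))
                elif value_range is not None:
                    low, high = value_range
                    if not (low <= o.toPython() <= high):
                        found.append((s, p, o, "value out of range"))
        return found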
Lecture Notes in Computer Science, 2012
One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them in order to ensure correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact.
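A highly simplified picture of this workflow, i.e. verbalising a triple, searching the Web for evidence and aggregating a confidence score, is sketched below; the search_web() and trust() helpers are placeholders, not actual DeFacto components.

    def verbalise(subject, predicate, obj):
        # e.g. ("Jamaica Inn", "director", "Alfred Hitchcock")
        return f'"{subject}" "{predicate}" "{obj}"'

    def search_web(query, limit=10):
        # Placeholder: return (url, text_snippet) pairs from any search API.
        return []

    def trust(url):
        # Placeholder trustworthiness score for the source, in [0, 1].
        return 0.5

    def confidence(subject, predicate, obj):
        query = verbalise(subject, predicate, obj)
        evidence = search_web(query)
        if not evidence:
            return 0.0
        # Count snippets mentioning both subject and object, weighted by source trust.
        support = sum(trust(url) for url, snippet in evidence
                      if subject.lower() in snippet.lower()
                      and obj.lower() in snippet.lower())
        return min(1.0, support / len(evidence))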
Abstract. Linked Open Data (LOD) comprises an unprecedented volume of structured data available on the Web. However, these datasets are of highly varying quality, ranging from extensively curated datasets to crowd-sourced and even extracted data of relatively low quality. We present a methodology for crowd-sourcing the quality assessment of linked data resources. The first step of the methodology comprises the detection of common quality problems and their representation in a quality problem taxonomy. In a second phase, the ...
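One way to picture the quality problem taxonomy is as a small nested data structure that an assessment tool can present to crowd workers; the dimension and problem names below are illustrative, not the paper's exact taxonomy.

    # Illustrative taxonomy: quality dimension -> list of concrete problem types.
    QUALITY_PROBLEM_TAXONOMY = {
        "accuracy": [
            "object value incorrectly extracted",
            "datatype problem",
        ],
        "relevancy": [
            "irrelevant information extracted",
        ],
        "interlinking": [
            "external link broken or wrong",
        ],
    }

    def flatten(taxonomy):
        # Return (dimension, problem) pairs for building an assessment form.
        return [(dim, problem)
                for dim, problems in taxonomy.items()
                for problem in problems]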
A central component in many applications is the underlying data management layer. In Data-Web applications, the central component of this layer is the triple store. It is thus evident that finding the most adequate store for the application under development is of crucial importance for individual projects as well as for data integration on the Data Web in general. In this paper, we propose a generic benchmark creation procedure for SPARQL, which we apply to the DBpedia knowledge base. In contrast to previous approaches, our benchmark ...
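The measurement side of such a benchmark can be sketched as a simple timing harness; the endpoint URL, queries and run count below are assumptions, and the sketch does not reproduce the paper's query-generation procedure.

    import time
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://localhost:8890/sparql"   # assumed local triple-store endpoint
    QUERIES = [
        "SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }",
        "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/City> } LIMIT 100",
    ]

    def run_benchmark(endpoint, queries, runs=5):
        # Execute each query several times and report the mean wall-clock time.
        client = SPARQLWrapper(endpoint)
        client.setReturnFormat(JSON)
        timings = {}
        for query in queries:
            samples = []
            for _ in range(runs):
                client.setQuery(query)
                start = time.perf_counter()
                client.query().convert()
                samples.append(time.perf_counter() - start)
            timings[query] = sum(samples) / len(samples)
        return timings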
Data
A standardized descriptive ontology supports efficient querying and manipulation of data from heterogeneous sources across the boundaries of distributed infrastructures, particularly in federated environments. In this article, we present the Open-Multinet (OMN) set of ontologies, which were designed specifically for this purpose as well as to support the management of the life cycles of infrastructure resources. We present their initial application in Future Internet testbeds, their use for representing and requesting available resources, and our experimental performance evaluation of the ontologies in terms of querying and translation times. Our results highlight the value and applicability of Semantic Web technologies in managing resources of federated cyber-infrastructures.
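The querying-time aspect of this evaluation can be illustrated with a few lines of rdflib; the file name and the omn: terms in the sketch are assumptions rather than the actual evaluation setup.

    import time
    from rdflib import Graph

    g = Graph()
    # Hypothetical file containing OMN-style resource descriptions in Turtle.
    g.parse("testbed-resources.ttl", format="turtle")

    query = """
        PREFIX omn: <http://open-multinet.info/ontology/omn#>
        SELECT ?resource WHERE { ?resource a omn:Resource }
    """

    start = time.perf_counter()
    rows = list(g.query(query))
    elapsed = time.perf_counter() - start
    print(f"{len(rows)} resources found in {elapsed:.4f} s")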