Hala Skaf-Molli - Academia.edu
Papers by Hala Skaf-Molli
Processing SPARQL queries over large federations of SPARQL endpoints is crucial for keeping the Semantic Web decentralized. Despite the existence of hundreds of SPARQL endpoints, current federation engines only scale to dozens. One major issue comes from the current definition of the source selection problem, i.e., finding the minimal set of SPARQL endpoints to contact per triple pattern. Even if such a source selection is minimal, only a few combinations of sources may return results. Consequently, most of the query processing time is wasted evaluating combinations that return no results. In this paper, we introduce the concept of Result-Aware query plans. This concept ensures that every subquery of the query plan effectively contributes to the result of the query. To compute a Result-Aware query plan, we propose FedUP, a new federation engine able to produce Result-Aware query plans by tracking the provenance of query results. However, getting query results requires computing source selection, and computing source selection requires query results. To break this vicious cycle, FedUP computes results and provenances on tiny quotient summaries of federations at the cost of source selection accuracy. Experimental results on federated benchmarks demonstrate that FedUP outperforms state-of-the-art federation engines by orders of magnitude in the context of large-scale federations.
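The following toy sketch illustrates the intuition behind Result-Aware source selection, assuming an in-memory federation where each endpoint is summarized by a handful of triples; the names (`summaries`, `result_aware_assignments`) are illustrative, not FedUP's actual API. Only endpoint combinations that actually produce results on the summaries survive.

```python
from itertools import product

# Toy quotient summaries: each endpoint is summarized by a small set of triples.
summaries = {
    "endpointA": {("alice", "knows", "bob"), ("alice", "type", "Person")},
    "endpointB": {("bob", "type", "Person")},
    "endpointC": {("carol", "likes", "dave")},
}

def matches(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                # variable
            if b.get(p, t) != t:
                return None
            b[p] = t
        elif p != t:                         # constant mismatch
            return None
    return b

def result_aware_assignments(patterns):
    """Keep only endpoint combinations that yield results on the summaries."""
    plans = set()
    for combo in product(summaries, repeat=len(patterns)):
        bindings = [{}]
        for pattern, endpoint in zip(patterns, combo):
            bindings = [b2 for b in bindings
                        for triple in summaries[endpoint]
                        if (b2 := matches(pattern, triple, b)) is not None]
            if not bindings:
                break
        if bindings:                         # this combination contributes results
            plans.add(combo)
    return plans

query = [("?x", "knows", "?y"), ("?y", "type", "Person")]
print(result_aware_assignments(query))      # {('endpointA', 'endpointB')}
```

The enumeration is exponential in the number of triple patterns, which is why such a computation only makes sense on tiny summaries rather than on the full federation.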
Lecture Notes in Computer Science, 2023
HAL (Le Centre pour la Communication Scientifique Directe), Oct 23, 2022
Processing top-k queries on public online SPARQL endpoints often runs into fair use policy quotas and does not complete. Indeed, existing endpoints mainly follow the traditional materialize-and-sort strategy. Although restricted SPARQL servers ensure the termination of top-k queries without quota enforcement, they also follow the materialize-and-sort approach, resulting in high data transfer and poor performance. In this paper, we propose to extend the Web preemption model with a preemptable partial top-k operator. This operator drastically reduces data transfer and significantly improves query execution time. Experimental results show a reduction in data transfer by a factor of 100 and a reduction of up to 39% in Wikidata query execution time.
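A minimal sketch of what a preemptable partial top-k operator could look like, assuming a server that processes solutions in fixed-size quanta; the class and function names are hypothetical. The key point is that the suspended state is a k-sized heap rather than a fully materialized result, which is where the data-transfer savings come from.

```python
import heapq

class PartialTopK:
    """Keep only the current top-k items; the tiny heap is the whole
    state to save and restore across preemptions."""
    def __init__(self, k, state=None):
        self.k = k
        self.heap = state or []            # min-heap of (score, item)

    def push(self, score, item):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, item))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, item))

    def suspend(self):
        return list(self.heap)             # O(k) state, not O(n) materialization

def server_quantum(solutions, k, state, quantum=3):
    """Process at most `quantum` solutions, then preempt and return the state."""
    op = PartialTopK(k, state)
    batch, rest = solutions[:quantum], solutions[quantum:]
    for score, item in batch:
        op.push(score, item)
    return op.suspend(), rest

# The client keeps resuming until the server has seen every solution.
solutions = [(5, "a"), (1, "b"), (9, "c"), (7, "d"), (3, "e"), (8, "f")]
state, remaining = None, solutions
while remaining:
    state, remaining = server_quantum(remaining, k=2, state=state)
print(sorted(state, reverse=True))          # [(9, 'c'), (8, 'f')]
```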
HAL (Le Centre pour la Communication Scientifique Directe), Jul 5, 2021
Lecture Notes in Computer Science, 2014
Springer eBooks, 2018
The term Linked Open Data refers to all data that is published on the Web according to a set of best practices, the Linked Data Principles. The idea behind these principles is, on the one hand, to use standards for the representation of and access to data on the Web. On the other hand, the principles advocate setting hyperlinks between data from different sources. These hyperlinks connect all Linked Data into a single global data graph, just as the hyperlinks on the classic Web connect all HTML documents into a single global information space.
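As a small illustration of the two sides of these principles (standard representation and cross-source hyperlinks), here is a sketch using the rdflib Python library; the URIs are made up for the example.

```python
from rdflib import Graph, URIRef, Literal, Namespace, RDF

# Hypothetical namespaces standing in for two independent data sources.
EX = Namespace("http://example.org/people/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.bind("foaf", FOAF)

# Standard representation: data about a resource identified by an HTTP URI.
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))

# The link that makes it *Linked* Data: a typed hyperlink into another dataset.
g.add((EX.alice, FOAF.knows, URIRef("http://other.example.net/people/bob")))

print(g.serialize(format="turtle"))
```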
Huge repositories of pedagogical resources, such as the French initiative of digital thematic libraries, are now accessible to both students and professors. This high quantity of data makes access difficult for students, as they cannot easily find pertinent pedagogical resources that fit their needs. One way to ease this access is to add annotations to these resources and to exploit these annotations to find pertinent answers. Of course, these annotations can be semantic. Semantic wikis are a new approach that automatically processes semantic annotations and that can be used to find adequate resources given the requests of students. However, semantically annotating resources is not an easy task for humans. Despite their high potential, semantic wikis suffer from a lack of human-provided semantic annotations, resulting in a loss of efficiency. We propose a system (called HCA) that automatically suggests computed annotations to users in semantic wikis; in this paper, users are students and professors. Users only have to validate, complete, modify, refuse or ignore these suggested annotations. The annotation task therefore becomes easier, and we assume that more users will provide annotations, leading to an improvement of the system and easier access to pertinent pedagogical resources. The HCA system is based on collaborative filtering recommender systems: it does not exploit the content of the pages but the usage made of these pages by the students and the professors. The resulting semantic wikis contain several kinds of annotations with different statuses: human-provided, computer-provided, or human-validated computed annotations.
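A minimal sketch of the usage-based (rather than content-based) recommendation idea, with a toy user-page usage matrix and hypothetical annotation labels; HCA's actual model is richer than this.

```python
from math import sqrt

# Toy usage data: which annotated pages each user worked with (1 = used).
usage = {
    "student1":  {"pageA": 1, "pageB": 1},
    "student2":  {"pageA": 1, "pageC": 1},
    "professor": {"pageB": 1, "pageC": 1, "pageD": 1},
}
# Annotations already validated on some pages.
annotations = {"pageC": {"topic:algebra"}, "pageD": {"level:master"}}

def cosine(u, v):
    """Cosine similarity between two sparse usage vectors."""
    common = set(u) & set(v)
    num = sum(u[p] * v[p] for p in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def suggest(user):
    """Suggest annotations from pages used by similar users, weighted by similarity."""
    scores = {}
    for other, vec in usage.items():
        if other == user:
            continue
        sim = cosine(usage[user], vec)
        for page in vec:
            for ann in annotations.get(page, ()):
                scores[ann] = scores.get(ann, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(suggest("student1"))   # ['topic:algebra', 'level:master']
```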
HAL (Le Centre pour la Communication Scientifique Directe), Jun 8, 2016
Scientific workflow management systems have been largely adopted by data-intensive science communities. Many efforts have been dedicated to the representation and exploitation of provenance to improve reproducibility in data-intensive sciences. However, few works address the mining of provenance graphs to annotate the produced data with domain-specific context for better interpretation and sharing of results. In this paper, we propose PoeM, a lightweight framework for mining provenance in scientific workflows. PoeM produces linked in silico experiment reports based on workflow runs. PoeM leverages semantic web technologies and reference vocabularies (PROV-O, P-Plan) to generate provenance mining rules and finally assemble linked scientific experiment reports (Micropublications, Experimental Factor Ontology). Preliminary experiments demonstrate that PoeM enables the querying and sharing of Galaxy-processed genomic data as 5-star linked datasets.
During collaborative writing, shared documents are replicated on geographically distant sites. Each user works on an individual copy, which results in divergent copies. Merging techniques such as those proposed by the Operational Transformation (OT) approach reconcile the differences among the replicas and ensure their convergence. Although these merging techniques resolve syntactic conflicts, they do not help preserve coherence, an important aspect of an effective document. We therefore investigate the use of ideas from narrative-based writing to improve the coherence of the document during collaborative editing. Narrative-based writing is a new technique for planning documents that enhances the implicit story a document conveys to its readers, thereby improving coherence. This paper presents a discussion of this investigation.
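For context, the kind of syntactic merge that OT does solve can be sketched in a few lines. This is the textbook inclusion transformation for concurrent character insertions (with a site-id tie-break), not the paper's contribution; coherence is precisely what such transforms cannot see.

```python
def apply_insert(doc, op):
    """Apply an insert operation (position, char, site) to a string."""
    pos, ch, _site = op
    return doc[:pos] + ch + doc[pos:]

def transform(op1, op2):
    """Shift op1 so it can be applied after op2 (inclusion transformation)."""
    p1, c1, s1 = op1
    p2, _, s2 = op2
    if p1 < p2 or (p1 == p2 and s1 < s2):   # op1 keeps its position
        return op1
    return (p1 + 1, c1, s1)                  # otherwise shift right past op2

doc = "abc"
op1 = (1, "X", "site1")   # site1 inserts X at position 1
op2 = (1, "Y", "site2")   # site2 concurrently inserts Y at position 1

# Each site applies its own op, then the transformed remote op.
site1 = apply_insert(apply_insert(doc, op1), transform(op2, op1))
site2 = apply_insert(apply_insert(doc, op2), transform(op1, op2))
print(site1, site2)       # both 'aXYbc': the copies converge
```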
HAL (Le Centre pour la Communication Scientifique Directe), May 28, 2017
PROV has been adopted by a number of workflow systems for encoding the traces of workflow executions. Exploiting these provenance traces is hampered by two main impediments. Firstly, workflow systems extend PROV differently to cater for system-specific constructs. The difference between the adopted PROV extensions yields heterogeneity in the generated provenance traces. This heterogeneity diminishes the value of such traces, e.g. when combining and querying provenance traces of different workflow systems. Secondly, the provenance recorded by workflow systems tends to be large, and as such difficult to browse and understand by a human user. In this paper, we propose SHARP, a Linked Data approach for harmonizing cross-workflow provenance. The harmonization is performed by chasing tuple-generating and equality-generating dependencies defined for workflow provenance. This results in a provenance graph that can be summarized using domain-specific vocabularies. We experimentally evaluate the effectiveness of SHARP using a real-world omic experiment involving workflow traces generated by the Taverna and Galaxy systems.
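A minimal sketch of chasing one tuple-generating dependency over triple-shaped provenance, assuming made-up Taverna- and Galaxy-style predicates; the rule set below is illustrative, not SHARP's actual dependencies.

```python
# Provenance triples using two hypothetical system-specific vocabularies.
triples = {
    ("run1", "taverna:wasOutputOf", "stepA"),
    ("run2", "galaxy:producedBy", "stepB"),
}

# One tuple-generating dependency per vocabulary, mapping into plain PROV:
#   x system:P y  ->  x prov:wasGeneratedBy y
TGDS = [
    ("taverna:wasOutputOf", "prov:wasGeneratedBy"),
    ("galaxy:producedBy",  "prov:wasGeneratedBy"),
]

def chase(triples, tgds):
    """Apply the dependencies until no new triple is derived (a fixpoint)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(facts):
            for src, dst in tgds:
                if p == src and (s, dst, o) not in facts:
                    facts.add((s, dst, o))
                    changed = True
    return facts

harmonized = chase(triples, TGDS)
# Both runs are now queryable through the shared prov: vocabulary.
print({t for t in harmonized if t[1] == "prov:wasGeneratedBy"})
```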
In this thesis, we are interested in mechanisms to support cooperative work in software development environments. In such environments, it is essential to manage concurrent access to shared data and to maintain data consistency, which allows cooperation under good conditions. Our work takes place within the COO cooperative development environment. Currently, COO relies on a transactional model to solve the problems caused by concurrent access, but it provides no guarantee on the quality of the products. We propose to define semantic constraints on the products and their production processes in order to obtain such a guarantee. However, classical constraint-checking mechanisms are not compatible with the cooperative nature of development activities. We therefore propose a new constraint-checking mechanism for cooperative environments. Our results are the following: the hybrid Maizena approach, which uses semantics to restrict the executions accepted by COO's syntactic interaction-correctness protocol and thus confines cooperation within a safety sphere; an algorithm for managing the consistency of software products, which checks constraints expressed as temporal logic formulas and is validated by an implementation within COO; and a forward-recovery mechanism that allows an activity to be split into several activities and exploits cooperation to validate the constraints.
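The second result checks constraints expressed as temporal-logic formulas over an activity's history. A minimal sketch of that idea on a finite event trace, with hypothetical event names, looks as follows.

```python
def always(pred, trace):
    """[] pred : pred holds at every state of the (finite) trace."""
    return all(pred(e) for e in trace)

def eventually(pred, trace):
    """<> pred : pred holds at some state of the trace."""
    return any(pred(e) for e in trace)

def precedes(p, q, trace):
    """Every q-state is preceded by some p-state (e.g. tested before released)."""
    seen_p = False
    for e in trace:
        if p(e):
            seen_p = True
        if q(e) and not seen_p:
            return False
    return True

# Hypothetical history of one cooperative development activity.
trace = ["edit", "compile", "test", "release"]
print(always(lambda e: e != "delete", trace))                            # True
print(eventually(lambda e: e == "test", trace))                          # True
print(precedes(lambda e: e == "test", lambda e: e == "release", trace))  # True
```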
Proceedings of the 9th International Conference on Computer Supported Education, 2017
Professional development presents many difficulties related to the speed of change and the explosion of knowledge, which require people to learn at many intervals throughout their lives. This study proposes a combined Self-Regulated Learning process together with functional and technical architectures in a Lifelong Learning perspective. The Self-Regulated Learning is carried out using Semantic Open Learner Models. We illustrate our process through some example services. This work is dedicated to the active Lifelong Learning community, and more specifically to researchers in Technology Enhanced Learning, pedagogical engineers, and learners who encounter difficulties in integrating multidisciplinary expertise, technology, and know-how throughout their lives.
Companion of the The Web Conference 2018 on The Web Conference 2018 - WWW '18, 2018
ACM Reference Format: Serge Garlatti, Jean Marie Gilliot, Sacha Kieffer, Jérôme Eneau, Geneviève Lameul, Patricia Serrena-Alvadaro, Hala Skaf-Molli, and Emmanuel Desmontils. 2018. Open Learner Models, Trust and Knowledge Management for Life Long Learning. In WWW ’18 Companion: The 2018 Web Conference Companion, April 23–27, 2018, Lyon, France, Jennifer B. Sartor, Theo D’Hondt, and Wolfgang De Meuter (Eds.). ACM, New York, NY, USA, Article 4, 2 pages. https://doi.org/10.1145/3184558.3193129
The “SElf-Data for Enhancing Lifelong learning Autonomy” (SEDELA) project started in September 2017. It gathers researchers from both the IT and education science fields. The project aims to enhance learners’ autonomy skills in a lifelong learning perspective, as well as to develop, experiment with, and implement an innovative self-data management approach (El Mawas et al., 2017). Autonomy in adult education is defined as “the ability to take charge in one’s learning” (Holec, 1981, p.3), meaning specifically “determining the objectives; defining the contents and progressions; selecting methods and techniques to be used; monitoring the procedure of acquisition properly speaking (rhythm, time, place, etc.); evaluating what has been acquired” (Holec, 1981, p.3). Autonomous learners must have the capacity for critical reflection, decision making, and independent action (Little, 1991). But independence does not mean isolation, as others often constitute resources for autonomous learners. Autonomy is co...
Semantic Web, 2020
While workflow systems have improved the repeatability of scientific experiments, the value of the processed (intermediate) data has been overlooked so far. In this paper, we argue that the intermediate data products of workflow executions should be seen as first-class objects that need to be curated and published. Not only can this save the time and resources needed when re-executing workflows, but more importantly, it improves the reuse of data products by the same or peer scientists in the context of new hypotheses and experiments. To assist curators in annotating (intermediate) workflow data, in this work we exploit multiple sources of information, namely: i) the provenance information captured by the workflow system, and ii) domain annotations provided by tool registries, such as Bio.Tools. Furthermore, we show, on a concrete bioinformatics scenario, how summarisation techniques can be used to reduce the machine-generated provenance information of such data products into concise human- and machine-readable annotations.
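A minimal sketch of combining the two information sources named above, workflow provenance and a tool registry in the style of Bio.Tools, into concise annotations for an intermediate dataset; the registry entries and field names are made up.

```python
# (i) Provenance captured by the workflow system: which tool made which data.
provenance = {
    "dataset42": {"generated_by": "bwa_mem", "run": "run7"},
}
# (ii) Hypothetical domain annotations from a tool registry (a la Bio.Tools).
registry = {
    "bwa_mem": {"operation": "Sequence alignment", "output_format": "BAM"},
}

def annotate(dataset):
    """Derive concise annotations for an intermediate data product."""
    prov = provenance[dataset]
    tool = registry.get(prov["generated_by"], {})
    return {
        "dataset": dataset,
        "produced_in": prov["run"],
        "data_format": tool.get("output_format", "unknown"),
        "derived_from_operation": tool.get("operation", "unknown"),
    }

print(annotate("dataset42"))
```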
Lecture Notes in Computer Science, 2018
Following the Triple Pattern Fragments (TPF) approach, intelligent clients are able to improve the availability of Linked Data. However, data availability is still limited by the availability of TPF servers. Although some existing TPF servers belonging to different organizations already replicate the same datasets, existing intelligent clients are not able to take advantage of replicated data to provide fault tolerance and load balancing. In this paper, we propose Ulysses, an intelligent TPF client that takes advantage of replicated datasets to provide fault tolerance and load balancing. By reducing the load on a server, Ulysses improves the overall Linked Data availability and reduces data hosting costs for organizations. Ulysses relies on an adaptive client-side load balancer and a cost model to distribute the load among heterogeneous replicated TPF servers. Experiments demonstrate that Ulysses reduces the load of TPF servers, tolerates failures, and improves query execution time under heavy server load.
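A minimal sketch of an adaptive client-side load balancer with a latency-based cost model, in the spirit of (but not identical to) Ulysses: servers are picked with probability inversely proportional to their smoothed observed response time, and a failed server is effectively removed from rotation.

```python
import random

class AdaptiveBalancer:
    """Pick replicated TPF servers with probability inversely
    proportional to their observed response time."""
    def __init__(self, servers):
        self.cost = {s: 1.0 for s in servers}   # smoothed latency estimate (s)

    def pick(self):
        weights = [1.0 / self.cost[s] for s in self.cost]
        return random.choices(list(self.cost), weights=weights)[0]

    def record(self, server, latency, alpha=0.5):
        # Exponential smoothing keeps the cost model adaptive to load changes.
        self.cost[server] = alpha * latency + (1 - alpha) * self.cost[server]

    def on_failure(self, server):
        self.cost[server] = float("inf")        # fault tolerance: stop using it

balancer = AdaptiveBalancer(["replicaA", "replicaB"])
balancer.record("replicaA", 0.1)   # fast server
balancer.record("replicaB", 2.0)   # slow (loaded) server
picks = [balancer.pick() for _ in range(1000)]
print(picks.count("replicaA"), "vs", picks.count("replicaB"))
```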
The growing importance of exchanges and collaborations in all business areas calls for fast, efficient, and flexible models of computer-supported cooperation. As the traditional client-server paradigm is hampered by its structural limitations, the interest of the information systems community is aroused by the promises of alternatives known as peer-to-peer approaches. This paper introduces a generic architecture designed for the execution of collaborative business processes. The basic motivations of the model are discussed, as well as its major contributions, notably in terms of service-oriented routing and failure handling.
We are interested in collaborative writing. In this report, we take Wikipedia as an example of collaborative writing and study it in order to learn lessons from it. Wikipedia is a collaborative project whose goal is to create a free, multilingual encyclopedia on the web. Wikipedia is based on web servers that use wiki technology. A wiki is software that allows users to create, edit, and link web pages easily. Wikipedia is written through collaboration between volunteers. It has a fundamental principle, the neutral point of view. This principle recommends representing fairly, and as far as possible without bias, all significant views that have been published by reliable sources. Articles in Wikipedia are published under the GNU Free Documentation License: the content of a page can be freely copied, modified, and redistributed. Every new version is published under the same licence and must indicate Wikipedia as the source. The simplicity of editing in Wikipedia is a powerful point leading to its growing success. In the first section, we present the principal functionalities of Wikipedia, in order to become familiar with this environment. In section two, we are interested in the community of Wikipedia and in the different statuses of its users. In the third section, we define the processes of Wikipedia with their associated activities. In the last section, we present different types of editing conflicts in Wikipedia and their resolution.