Vassilis Christophides - Academia.edu
Papers by Vassilis Christophides
HAL (Le Centre pour la Communication Scientifique Directe), Mar 15, 2022
HAL (Le Centre pour la Communication Scientifique Directe), Jan 18, 2018
arXiv (Cornell University), Sep 13, 2022
Journal of the ACM, Feb 12, 2016
The VLDB Journal
Real-time detection of anomalies in streaming data is receiving increasing attention as it allows us to raise alerts, predict faults, and detect intrusions or threats across industries. Yet, little attention has been given to comparing the effectiveness and efficiency of anomaly detectors for streaming data (i.e., of online algorithms). In this paper, we present a qualitative, synthetic overview of major online detectors from different algorithmic families (i.e., distance-, density-, tree-, or projection-based) and highlight their main ideas for constructing, updating and testing detection models. Then, we provide a thorough analysis of the results of a quantitative experimental evaluation of online detection algorithms along with their offline counterparts. The behavior of the detectors is correlated with the characteristics of different datasets (i.e., meta-features), thereby providing a meta-level analysis of their performance. Our study addresses several missing insights from the literature, such as (a) how reliable detectors are against a random classifier and what dataset characteristics make them perform randomly; (b) to what extent online detectors approximate the performance of their offline counterparts; (c) which sketch strategy and update primitives of detectors are best for detecting anomalies visible only within a feature subspace of a dataset; (d) what the tradeoffs are between the effectiveness and the efficiency of detectors belonging to different algorithmic families; (e) which specific characteristics of datasets lead an online algorithm to outperform all others.
2021 IEEE International Joint Conference on Biometrics (IJCB), 2021
Liveness Detection (LivDet)-Face is an international competition series open to academia and industry. The competition's objective is to assess and report the state of the art in liveness / Presentation Attack Detection (PAD) for face recognition. Impersonation and presentation of false samples to the sensors can be classified as presentation attacks, and the ability of the sensors to detect such attempts is known as PAD. LivDet-Face 2021 will be the first edition of the face liveness competition. This competition serves as an important benchmark in face presentation attack detection, offering (a) an independent assessment of the current state of the art in face PAD, and (b) a common evaluation protocol, with Presentation Attack Instruments (PAI) and a live face image dataset made available through the Biometric Evaluation and Testing (BEAT) platform. Researchers can easily follow the competition after it is closed, on a platform where participants can compare their solutions against the LivDet-Face winners.
Procedia Computer Science, 2016
Ubiquitous smart technologies gradually transform modern homes into an Intranet of Things, where a multitude of connected devices allow for novel home automation services (e.g., energy or bandwidth savings, comfort enhancement, etc.). Optimizing and enriching the Quality of Experience (QoE) of residential users emerges as a critical differentiator for Internet and Communication Service Providers (ISPs and CSPs, respectively) and heavily relies on the analysis of various kinds of data (connectivity, performance, usage) gathered from home networks. In this paper, we are interested in new Machine-to-Machine data analysis techniques that go beyond the binary association rule mining for traditional market basket analysis considered by previous works, in order to analyze individual device logs of home gateways. Based on a multidimensional pattern mining framework, we extract complex device co-usage patterns of 201 residential broadband users of an ISP, subscribed to a triple-play service. Such fine-grained device usage patterns provide valuable insights for emerging use cases such as adaptive usage of home devices and "things" recommendation.
2015 IEEE International Conference on Big Data (Big Data), 2015
In the Web of data, entities are described by interlinked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-processing step. A blocking technique places similar entity descriptions into blocks and executes comparisons only between descriptions within the same block. We experimentally evaluate blocking techniques proposed for the Web of data and present dataset characteristics that determine the effectiveness and efficiency of such methods. Furthermore, we analyze the characteristics of the missed matching entity descriptions and examine different types of links that blocking techniques can potentially identify.
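As a rough illustration of the blocking step described in this abstract, the sketch below uses token blocking, one simple schema-agnostic scheme. It is not necessarily one of the techniques evaluated in the paper; the function names and the toy entity descriptions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(descriptions):
    """Group entity ids by every token that appears in any attribute value.

    `descriptions` maps an entity id to a dict of attribute -> value.
    Attribute names are ignored, so descriptions with different schemas
    can still end up in the same block.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in str(value).lower().split():
                blocks[token].add(entity_id)
    # A block with a single description yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Return the distinct pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy descriptions coming from two differently structured sources.
descriptions = {
    "a1": {"name": "Eiffel Tower", "city": "Paris"},
    "b7": {"label": "Tour Eiffel", "location": "Paris France"},
    "c3": {"name": "Big Ben", "city": "London"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
print(sorted(blocks))  # tokens shared by more than one description
print(pairs)           # pairs that would be compared after blocking
```

On this toy input, only one of the three possible pairs survives blocking, which is the reduction in pairwise comparisons the abstract refers to. A description that shares no token with its true match would be missed, which corresponds to the missed matches the abstract analyzes.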
Lecture Notes in Computer Science
We present the architecture of a largely distributed Digital Library that is based on the Peer-to-Peer computing paradigm. The three goals of the architecture are: (i) increased node autonomy, (ii) flexible location of data, and (iii) efficient query evaluation. To satisfy these goals we propose a solution based on schema mappings and query reformulation. We identify the problems involved in developing a system based on the proposed architecture and present ways of tackling them. A prototype implementation provides encouraging ...
We consider the problem of storing and accessing documents (SGML and HTML, in particular) using database technology. To specify the database image of documents, we use structuring schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows querying documents without precise knowledge of their structure, using in particular generalized path expressions and ...
ieeexplore.ieee.org
Rama Akkiraju, IBM TJ Watson Research, USA; Grigoris Antoniou, University of Crete/Institute of Computer Science FORTH, Greece; Mikio Aoyama, Nanzan University, Japan; Ali Arsanjani, IBM Global Services, USA; Malcolm Atkinson, University of Edinburgh, UK; Boualem Benatallah, University of New South Wales, Australia; Elisa Bertino, Purdue University, USA; Ken Birman, Cornell University, USA; Athman Bouguettaya, Virginia Tech., USA; Paul Buhler, College of Charleston, Charleston, SC, USA; Christoph Bussler, Cisco Systems, USA; Jorge ...
ACM Transactions on the Web, 2011
The ability to compute the differences that exist between two RDF/S Knowledge Bases (KB) is an important step to cope with the evolving nature of the Semantic Web (SW). In particular, RDF/S deltas can be employed to reduce the amount of data that need to be exchanged and managed over the network in order to build SW synchronization and versioning services. By considering deltas as sets of change operations, in this article we introduce various RDF/S differential functions which take into account inferred knowledge from an RDF/S knowledge base. We first study their correctness in transforming a source to a target RDF/S knowledge base in conjunction with the semantics of the employed change operations (i.e., with or without side-effects on inferred knowledge). Then we formally analyze desired properties of RDF/S deltas such as size minimality, semantic identity, redundancy elimination, reversibility, and composability, as well as identify those RDF/S differential functions that satisf...
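To make the notion of a delta as a set of change operations concrete, here is a minimal sketch of the simplest case: a difference over explicitly stated triples only. It deliberately ignores inferred knowledge, which the differential functions in the article do take into account, so it should not be read as one of the article's functions. The triples and function names are hypothetical.

```python
def explicit_delta(source, target):
    """Change operations that turn `source` into `target`.

    Both knowledge bases are plain sets of (subject, predicate, object)
    triples. Only explicit triples are compared; no RDF/S inference is done.
    """
    return {"add": target - source, "delete": source - target}


def apply_delta(kb, delta):
    """Apply deletions first, then additions, and return the new triple set."""
    return (kb - delta["delete"]) | delta["add"]


def invert(delta):
    """Reverse a delta by swapping its additions and deletions."""
    return {"add": delta["delete"], "delete": delta["add"]}


# Hypothetical toy knowledge bases.
source = {
    ("Painter", "rdfs:subClassOf", "Artist"),
    ("picasso", "rdf:type", "Painter"),
}
target = {
    ("Painter", "rdfs:subClassOf", "Artist"),
    ("picasso", "rdf:type", "Artist"),
}

delta = explicit_delta(source, target)
print(delta)
print(apply_delta(source, delta) == target)          # delta reaches the target
print(apply_delta(target, invert(delta)) == source)  # and can be undone
```

In this example the added triple `("picasso", "rdf:type", "Artist")` was already inferable from the source through the subclass relation, which hints at why accounting for inferred knowledge, as the article does, can lead to different and sometimes smaller deltas.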
Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, 2018
Page load time (PLT) is still the most common application Quality of Service (QoS) metric to estimate the Quality of Experience (QoE) of Web users. Yet, recent literature abounds with interesting proposals for alternative metrics (e.g., Above The Fold, SpeedIndex and variants) that aim at closely capturing how users perceive the Webpage rendering process. However, these novel metrics are typically computationally expensive, as they require monitoring and post-processing videos of the rendering process, and have failed to be widely deployed. In this demo, we show our implementation of an open-source Chrome extension that implements a practical and lightweight method to measure the approximated Above-the-Fold (AATF) time, as well as other Web performance metrics. The idea is, instead of accurately monitoring the rendering output, to track the download time of the last visible object on screen (i.e., "above the fold"). Our plugin also has options to save detailed reports for later analysis, a functionality ideally suited for researchers wanting to gather data from Web experiments.
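The idea of tracking the download time of the last visible object can be sketched as follows. This is only an offline approximation under stated assumptions, not the extension's actual code: the `Resource` record, its field names, and the sample values are hypothetical, and the visibility test is a plain rectangle intersection with the viewport.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical record for one downloaded page object (pixels and ms)."""
    url: str
    left: float
    top: float
    width: float
    height: float
    download_end_ms: float  # time since navigation start when download finished


def is_above_fold(res, viewport_width, viewport_height):
    """True if the object's box intersects the initially visible viewport."""
    return (
        res.width > 0
        and res.height > 0
        and res.left < viewport_width
        and res.top < viewport_height
        and res.left + res.width > 0
        and res.top + res.height > 0
    )


def approximate_atf(resources, viewport_width, viewport_height):
    """Download end time of the last visible object, or None if none is visible."""
    visible = [
        r.download_end_ms
        for r in resources
        if is_above_fold(r, viewport_width, viewport_height)
    ]
    return max(visible) if visible else None


# Hypothetical page with two visible images and one far below the fold.
resources = [
    Resource("logo.png", left=0, top=0, width=200, height=80, download_end_ms=320.0),
    Resource("hero.jpg", left=0, top=100, width=1280, height=500, download_end_ms=870.0),
    Resource("footer.png", left=0, top=2400, width=1280, height=200, download_end_ms=1500.0),
]

print(approximate_atf(resources, viewport_width=1280, viewport_height=720))
```

Here the footer image finishes last but lies outside the 1280x720 viewport, so it does not count toward the approximated above-the-fold time.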
Online social networking sites carry a lot of information organized around the people who actually submit it. They rely heavily on the interconnections between the actors of the network, and they relate the generated information in terms of connections or ties among those actors. On the other hand, social networks carry a lot of information in real time or near real time, since people "report" information as they see it unfolding before their eyes. We target this information and try to understand when it can be related to a single event, which has both a spatial and a temporal dimension. These social sensors can alert us to what is happening in society as it happens, by aggregating the different reports from people who are on the scene. To achieve that, we suggest and describe a set of services that can be used for collecting the information, identifying the discussion topics, and finally providing an alert if the discussi...
Passive and Active Measurement, 2018
Page load time (PLT) is still the most common application Quality of Service (QoS) metric to estimate the Quality of Experience (QoE) of Web users. Yet, recent literature abounds with proposals for alternative metrics (e.g., Above The Fold, SpeedIndex and variants) that aim at better estimating user QoE. The main purpose of this work is thus to thoroughly investigate a mapping between established and recently proposed objective metrics and user QoE. We obtain ground truth QoE via user experiments where we collect and analyze 3,400 Web accesses annotated with QoS metrics and explicit user ratings on a scale of 1 to 5, which we make available to the community. In particular, we contrast domain expert models (such as ITU-T and IQX) fed with a single QoS metric with models trained using our ground-truth dataset over multiple QoS metrics as features. Results of our experiments show that, albeit very simple, expert models have an accuracy comparable to that of machine learning approaches. Furthermore, the model accuracy improves considerably when building per-page QoE models, which may raise scalability concerns, as we discuss.
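For readers unfamiliar with single-metric expert models, the sketch below shows the exponential form usually associated with the IQX hypothesis, QoE = alpha * exp(-beta * QoS) + gamma, applied to a page load time. The parameter values are placeholders chosen so that ratings fall on the 1 to 5 scale mentioned in the abstract; they are assumptions for illustration and are not fitted to, or taken from, the paper's dataset.

```python
import math


def iqx_qoe(qos, alpha, beta, gamma):
    """Exponential QoS-to-QoE mapping: alpha * exp(-beta * qos) + gamma.

    With positive alpha and beta, the predicted rating starts at
    alpha + gamma for qos = 0 and decays toward gamma as qos grows.
    """
    return alpha * math.exp(-beta * qos) + gamma


# Placeholder parameters (assumptions): best rating 5, worst rating 1.
ALPHA, BETA, GAMMA = 4.0, 0.5, 1.0

for plt_seconds in (0.0, 1.0, 2.0, 5.0, 10.0):
    rating = iqx_qoe(plt_seconds, ALPHA, BETA, GAMMA)
    print(f"PLT = {plt_seconds:4.1f} s -> predicted rating {rating:.2f}")
```

In practice the three parameters would be fitted to user ratings, for example per page, which is where the scalability concern raised in the abstract comes from.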
Proceedings of the 2016 workshop on QoE-based Analysis and Management of Data Communication Networks, 2016
The relationship between the user-perceived Quality of Experience (QoE) with Internet applications and the Quality of Service (QoS) of the underlying network and applications is complex. Unveiling statistical relations between QoE and QoS can boost the prediction and diagnosis of QoE. In this paper, we shed light on the relationship between QoE and QoS for a popular application: YouTube video streaming. We conducted a controlled study where we asked users to rate their perceived quality of YouTube videos under different network conditions. During these experiments, we also captured network QoS and application QoS. We then analyzed the resulting dataset with SES, a feature selection algorithm that identifies minimal-size, statistically equivalent signatures with maximal predictive power for a target variable (e.g., QoE). We found that we can build optimal QoE predictors using a minimal signature of only three features from application or network QoS metrics, compared to four when we consider features from both layers.
International Journal of Web-Based Learning and Teaching Technologies, 2007
This article elaborates on scenarios for collaborative knowledge creation in the spirit of the trialogical learning paradigm. According to these scenarios, the group knowledge base is formed by combining the knowledge bases of the participants, according to various methods. The provision of flexible methods for defining various aspects of the group knowledge is expected to enhance synergy in the knowledge creation process and could lead to the development of tools that overcome the inelasticities of the current knowledge creation practices. Subsequently, these scenarios are projected to various knowledge representation frameworks and for each one of them, we analyze and discuss related techniques and identify issues that are worth further research.