Sioutas Spyros | Ionian University

Papers by Sioutas Spyros

Large Scale Sentiment Analysis on Twitter with Spark

Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key features is the immediacy of communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions on a wide variety of topics. This amount of information offers huge potential and can be harnessed to gauge the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time reading through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments. Thus, they can process at most a few thousand tweets. Such a sample is not representative enough to define the sentiment polarity towards a topic, given the massive number of tweets published daily. In this paper, we go one step further and develop a novel method for sentiment le...

New Dynamic Balanced Search Trees with Worst-Case Constant Update Time

J. Autom. Lang. Comb., 2003

We present new search trees with worst-case O(1) update time and O(log n) search time, storing n elements in linear space, in the Pointer Machine (PM) model of computation. In addition, these trees can easily support finger searches in time O(log d) and update operations in worst-case O(log* n) time. The parameter d represents the number of elements (distance) between the search element and an element pointed to by a pointer termed finger. Our data structure is based on a previous result by Fleischer that exhibits the same asymptotic time and space complexities for simple search trees. We improve on this result by handling deletions in an explicit way without using the standard trick of global rebuilding. This is the first search tree that combines worst-case update times with a local rebalancing scheme without using global rebuilding to tackle deletions. In addition, insight is acquired from the construction of these trees as to why deletions are considered more difficult than inse...

Rectangle Enclosure Reporting in Linear Space Revisited

J. Autom. Lang. Comb., 2003

We present a new algorithm for reporting all the enclosures in a set of plane rectangles in O(n log n log log n + k log log n) time and linear space (k denotes the output size). The result is already known (it has been achieved by two previous papers); however, the proposed algorithm follows a different approach.
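To make the problem statement concrete, here is a minimal sketch of what "reporting all the enclosures" means. It is a quadratic brute-force check, not the algorithm of the paper, and the rectangle representation `(x_min, y_min, x_max, y_max)` and the function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed representation: axis-parallel rectangle as (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def encloses(outer: Rect, inner: Rect) -> bool:
    """Return True if `outer` contains `inner` (boundaries may touch)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def report_enclosures(rects: List[Rect]) -> List[Tuple[int, int]]:
    """Report every index pair (i, j), i != j, where rects[i] encloses rects[j].

    Brute force: O(n^2) comparisons, far slower than the bound quoted above.
    Two identical rectangles are reported in both directions.
    """
    pairs = []
    for i, a in enumerate(rects):
        for j, b in enumerate(rects):
            if i != j and encloses(a, b):
                pairs.append((i, j))
    return pairs
```

The length of the returned list corresponds to the output size k in the bound above.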

Mobile Data Fusion from Multiple Tracking Sensors to Augment Maritime Safety: Mobile Detection, Early Identification, and Tracking of Moving Objects

2015 IEEE International Conference on Mobile Services, 2015

We present a novel mobile solution to provide efficient services for data fusion towards augmenting maritime safety. It is the first, to the best of the authors' knowledge, complex and integrated fusion of data coming from multiple mobile and typical tracking sensors (e.g. low-weight/high-performance radar, position transmission mechanisms, electro-optic systems and hyper-spectral sensors). The key aim is to assist the detection, early identification and tracking of moving targets (e.g. with moving target indication and data fusion/correlation capabilities), as well as to provide methods for obstacle detection and maritime surveillance. Maritime mobile services have to exchange real-time data with multiple maritime/aerial assets no matter where the operation is conducted (close to harbors or on the open sea) and independently of the existing surveillance infrastructure. The proposed approach is also useful for the detection of marine pollution incidents. This innovative single-window solution offers high efficiency and low operational costs, and possibly contributes to standardization in construction, as it utilizes typical tracking infrastructure and typical mobile devices (smartphones and tablets/iPads) and PCs.

An information system for the effective management of ambulances

Proceedings 13th IEEE Symposium on Computer-Based Medical Systems. CBMS 2000

In this paper, we describe a system offering a solution to the problem of ambulance management and emergency incident handling in the prefecture of Attica in Greece. It is based on a Geographic Information System (GIS) coupled with Global Positioning System (GPS) and Global System for Mobile Communication (GSM) technologies. The system's operation is expected to minimize the ambulances' response time. Consequently, there will be a drastic improvement in the way emergency incidents are being handled. This fact will thus significantly affect the quality of health services offered to citizens.

On-line consistent ranking on e-recruitment: seeking the truth behind a well-formed CV

Artificial Intelligence Review, 2013

In this work we present a novel approach for evaluating job applicants in online recruitment systems, using machine learning algorithms to solve the candidate ranking problem and performing semantic matching techniques. An application of our approach is implemented in the form of a prototype system, whose functionality is showcased and evaluated in a real-world recruitment scenario. The proposed system extracts a set of objective criteria from the applicants' LinkedIn profiles, and compares them semantically to the job's prerequisites. It also infers their personality characteristics using linguistic analysis on their blog ...

Author names appear in alphabetical order.

Taxonomy Development and Its Impact on a Self-learning e-Recruitment System

IFIP Advances in Information and Communication Technology, 2012

In this work we present a novel approach for evaluating job applicants in online recruitment systems, using machine learning algorithms to solve the candidate ranking problem and performing semantic matching techniques. An application of our approach is implemented in the form of a prototype system, whose functionality is showcased and evaluated in a real-world recruitment scenario. The proposed system extracts a set of objective criteria from the applicants' LinkedIn profiles, and compares them semantically to the job's prerequisites. It also infers their personality characteristics using linguistic analysis on their blog posts. Our system was found to perform consistently compared to human recruiters; thus, it can be trusted for the automation of applicant ranking and personality mining.

Mining the Conceptual Model of Open Source CMS Using a Reverse Engineering Approach

Communications in Computer and Information Science, 2013

Model-driven engineering has become the emerging standard for software development, focusing on the use of models as first-class citizens. One possible field of application of such model-driven approaches is the open source Content Management Systems (CMS) domain. Typically, CMS are built using a source-code-oriented software development process, raising issues related to usability, performance and other qualities of service in an application's lifecycle. To overcome these issues, the use of model-driven approaches in the development of CMS-based web applications (WAs) can be particularly beneficial. To this end, we propose a model-driven reverse engineering approach for automatic mining of the conceptual model of existing WAs developed using the widely used CMS Joomla! by applying data mining techniques. This methodology can form the cornerstone of an evaluation framework for Joomla!-based WAs in either the design or the maintenance process.

TECHNICAL REPORT No. TR99/10/04

2-D Spatial Indexing Scheme in Optimal Time

Lecture Notes in Computer Science, 2000

J. Štuller et al. (Eds.): ADBIS-DASFAA 2000, LNCS 1884, pp. 107–116, 2000. © Springer-Verlag Berlin Heidelberg 2000. Authors: Nectarios Kitsios, Christos Makris, Spyros Sioutas, Athanassios Tsakalidis, John Tsaknakis, ...

Personalized selection of web services for mobile environments

Proceedings of the International Conference on Management of Emergent Digital EcoSystems - MEDES '09, 2009

In this paper we discuss the integration of QoS-aware search and personalization algorithms in order to discover effective Web Services for mobile web users. We present a number of novel ranking algorithms specially designed for mobile web architectures. To validate and evaluate the proposed algorithms, we developed a fully working prototype for mobile PDA devices. The prototype enables PDA users to access electronic shops and retrieve information about their products while shopping. Comparative experiments have produced encouraging results, indicating that m-scroutz is effective.

Indexing Techniques for Spatiotemporal Databases

Encyclopedia of Information Science and Technology, Second Edition

We can define as spatiotemporal any database that maintains objects with geometric properties that change over time, where the usual geometric properties are the spatial position and spatial extent of an object in a specific d-dimensional space. The need for spatiotemporal databases appears in a variety of applications such as intelligent transportation systems, cellular communications, and meteorology monitoring. This field of database research collaborates closely with other research areas such as mobile telecommunications, and is harmoniously integrated with other disciplines such as CAD/CAM, GIS, environmental science, and bioinformatics. Spatiotemporal databases stand at the crossroads of two other database research areas: spatial databases (Güting, 1994; Gaede & Gunther, 1998) and temporal databases (Salzberg & Tsotras, 1999). The efficient implementation of spatiotemporal databases needs new data models and query languages and novel access structures for storing and accessing i...

Dynamic Interpolation Search Revisited

Lecture Notes in Computer Science, 2006

A new dynamic Interpolation Search (IS) data structure is presented that achieves O(log log n) search time with high probability on unknown continuous or even discrete input distributions with measurable probability of key collisions, including power law and Binomial distributions. No such previous result holds for IS when the probability of key collisions is measurable. Moreover, our data structure exhibits O(1) expected search time with high probability for a wide class of input distributions that contains all those for which o(log log n) expected search time was previously known.
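For readers unfamiliar with the underlying technique, the following is a minimal sketch of classic interpolation search on a static sorted list of integers. It is not the dynamic data structure of the paper; it only illustrates the probing idea, which guesses a key's position from its value instead of halving the range. The function name is an assumption.

```python
def interpolation_search(keys, target):
    """Return an index of `target` in the sorted integer list `keys`, or -1.

    Classic static interpolation search: the probe position is estimated by
    linear interpolation between the smallest and largest key in the range.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            # All keys in the range are equal, so the loop condition
            # guarantees that they equal the target.
            return lo
        # Estimate where the target would sit if keys were evenly spread.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spread keys the first probe often lands on or next to the target, which is the intuition behind the doubly-logarithmic bounds quoted above; on badly skewed inputs this simple static version can degrade to a linear number of probes.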

Temporal Selection Queries in Video Databases

The paper is concerned with the effective and efficient processing of temporal selection queries in Video Database and, more generally, Temporal Database Management Systems (TDBMS). Based on both a general spatio-temporal retrieval framework (3) and recent versions of internal-external Priority Search Trees, we present an algorithm, optimal in time and space, that answers certain temporal content queries invoking video functions. We prove the optimality by giving both a new theoretical comparison method based on microanalysis and its experimental verification.

Dynamic 3-sided planar range queries with expected doubly-logarithmic time

Theoretical Computer Science, 2014

We consider the problem of dynamically maintaining a set of points in the plane and supporting range queries of the type [a, b] × (−∞, c]. We assume that the inserted points have their x-coordinates drawn from a class of smooth distributions, whereas the y-coordinates are arbitrarily distributed. The points to be deleted are selected uniformly at random among the inserted points. For the RAM model, we present a linear space data structure that supports queries in O(log log n + t) expected time with high probability and updates in O(log log n) expected amortized time, where n is the number of points stored and t is the size of the output of the query. For the I/O model we support queries in O(log log_B n + t/B) expected I/Os with high probability and updates in O(log_B log n) expected amortized I/Os using linear space, where B is the disk block size. The data structures are deterministic and the expectation is with respect to the input distribution.

Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.

... and constants c_2 > 1 and c_1 ≥ 2^{3/(c_2−1)} (see Lem. 1). Note that w_{i+1} = w_i^{c_2}. The root does not need to satisfy the lower bound of this range. The tree has height Θ(log_{c_2} log_{c_1} n).
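As an illustration of the query type [a, b] × (−∞, c] only, here is a minimal baseline sketch. It is not the data structure described in the abstract: it assumes a static list of points already sorted by x-coordinate, narrows the x-range by binary search, and then filters on y. The function name and the point representation `(x, y)` are assumptions.

```python
from bisect import bisect_left, bisect_right


def three_sided_query(points_sorted_by_x, a, b, c):
    """Return all points (x, y) with a <= x <= b and y <= c.

    Baseline only: extracting the x-coordinates already costs O(n), so this
    shows the semantics of a 3-sided query, not an efficient index.
    """
    xs = [p[0] for p in points_sorted_by_x]
    lo = bisect_left(xs, a)    # first position with x >= a
    hi = bisect_right(xs, b)   # first position with x > b
    return [p for p in points_sorted_by_x[lo:hi] if p[1] <= c]
```

The query is called 3-sided because the range is bounded on the left, right, and top, but unbounded below.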

Locating Maximal Multirepeats in Multiple Strings Under Various Constraints

The Computer Journal, 2006

A multirepeat in a string is a substring (factor) that appears a predefined number of times. A multirepeat is maximal if it cannot be extended either to the right or to the left and produce a multirepeat. In this paper, we present algorithms for two different versions of the problem of finding maximal multirepeats in a set of strings. In the case of arbitrary gaps, we propose an algorithm with O(sN^2 n + a) time complexity. When the gap is bounded in a small range c, we propose an algorithm with O((c^2 + s^2) m N^2 n log(Nn) + a) time complexity. Here, N is the number of strings, n the mean length of each string, m the multiplicity of the multirepeat and a the number of reported occurrences. Our results extend previous work by considering sets of strings as well as by generalizing pairs to multirepeats.

Canonical Polygon Queries on the Plane: A New Approach

Journal of Computers, 2009

The polygon retrieval problem on points is the problem of preprocessing a set of n points on the plane, so that given a polygon query, the subset of points lying inside it can be reported efficiently. It is of great interest in areas such as Computer Graphics, CAD applications, Spatial Databases and GIS development tasks. In this paper we study the problem of canonical k-vertex polygon queries on the plane. A canonical k-vertex polygon query always meets the following specific property: a point retrieval query can be transformed into a linear number (with respect to the number of vertices) of point retrievals for orthogonal objects such as rectangles and triangles (throughout this work we call a triangle orthogonal iff two of its edges are axis-parallel). We present two new algorithms for this problem. The first one requires O(n log^2 n) space and O(k log^3 n log log n + A) query time. A simple modification of the first algorithm leads us to a second solution, which consumes O(n^2) space and O(k log n log log n + A) query time, where A denotes the size of the answer and k is the number of vertices. The best previous solution for the general polygon retrieval problem uses O(n^2) space and answers a query in O(k log n + A) time, where k is the number of vertices. It is also very complicated and difficult to implement in a standard imperative programming language such as C or C++.

Applying robust multibit watermarks to digital images

Journal of Computational and Applied Mathematics, 2009

This work focuses on the implementation of a robust multibit watermarking algorithm for digital images, which is based on an innovative spread spectrum technique analysis. The paper presents the watermark embedding and detection algorithms, which use both wavelets and the Discrete Cosine Transform, and analyzes the issues that arise.

Integrating GIS, GPS and GSM technologies for the effective management of ambulances

Computers, Environment and Urban Systems, 2001

In this paper, we describe a system offering a solution to the problem of ambulance management and emergency incident handling in the prefecture of Attica in Greece. It is based on the integration of geographic information system (GIS), global positioning system (GPS) and global system for mobile communication (GSM) technologies. The design of the system was the result of a project funded by the Greek Secretariat of Research and Technology. A significant operation for the handling of emergency incidents is the routing of ambulances to incident sites and then to the closest appropriate hospitals. The response time of a real-time system like ours to such queries is of vital significance. By using efficient data structures for the implementation of the graph representing the road network, the time performance of the shortest-path algorithm can be enhanced. Incorporating the efficient algorithm within the GIS will increase our system's viability.
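The remark about efficient graph data structures can be illustrated with a minimal sketch. The abstract does not name the shortest-path algorithm or the data structures used, so the choices here are assumptions: Dijkstra's algorithm, an adjacency-list road network, and a binary heap from Python's `heapq` module. The node names in the usage example are hypothetical.

```python
import heapq


def shortest_travel_times(graph, source):
    """Return the minimum travel time from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, travel_time) pairs with
    non-negative travel times (assumed adjacency-list representation).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical road network: travel times in minutes between junctions.
roads = {
    "station": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("hospital", 7)],
    "a": [("hospital", 3)],
}
times = shortest_travel_times(roads, "station")
```

With an adjacency list and a binary heap, each road segment is examined once and each heap operation costs logarithmic time, which is one way the choice of data structure affects the response time discussed above.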

Graph DBs vs. Column-Oriented Stores: A Pure Performance Comparison

Algorithmic Aspects of Cloud Computing, 2016

Cloud Computing has brought a great change in the way information is stored and applications run. In order for one or more clusters to work as a cloud, we need a middleware framework, such as Apache Hadoop [17], that provides reliability, scalability and distributed computing. Once the infrastructure has been established, a software framework can be installed that runs on top of it and serves as the connection to the applications developed by the users. The software, in this regard, is a NoSQL database. This paper deals with the problem of searching data in some widespread NoSQL databases used in cloud computing. Two categories of NoSQL databases are compared: a column-oriented key-value store, HBase [6], and a highly available graph database, Neo4j [11]. HBase is a distributed, scalable storage system that runs on top of HDFS, and has been designed based on Google's BigTable [4]. Neo4j has been designed and developed to be a reliable database, optimized for graph structures instead of tables, and is a robust, scalable, high-performance and highly available database that supports ACID transactions and queries written in the Cypher language. The aim of this paper is to create a novel system that will decide whether a query should be sent for execution to a key-value store or to a graph database. Thus, an experimental pure performance comparison has been made between Apache HBase and Neo4j for a variety of queries, which were programmed using the systems' APIs and the Java language.

Research paper thumbnail of Large Scale Sentiment Analysis on Twitter with Spark

Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key... more Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key features, is the immediacy in communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide diversity of topics. This amount of information oers huge potential and can be harnessed to receive the sentiment tendency towards these topics. However, since none can invest an innite amount of time to read through these tweets, an automated decision making approach is necessary. Nevertheless, most existing solutions are limited in centralized environments only. Thus, they can only process at most a few thousand tweets. Such a sample, is not representative to dene the sentiment polarity towards a topic due to the massive number of tweets published daily. In this paper, we go one step further and develop a novel method for sentiment le...

Research paper thumbnail of New Dynamic Balanced Search Trees with Worst-Case Constant Update Time

J. Autom. Lang. Comb., 2003

We present new search trees with worst-case O(1) update time and O(log n) search time, storing n ... more We present new search trees with worst-case O(1) update time and O(log n) search time, storing n elements in linear space, in the Pointer Machine (PM) model of computation. In addition, these trees can easily support finger searches in time O(log d) and update operations in worst-case O(log* n) time. The parameter d represents the number of elements (distance) between the search element and an element pointed to by a pointer termed finger. Our data structure is based on a previous result by Fleischer that exhibits the same asymptotic time and space complexities for simple search trees. We improve on this result by handling deletions in an explicit way without using the standard trick of global rebuilding. This is the first search tree that combines worst-case update times with a local rebalancing scheme without using global rebuilding to tackle deletions. In addition, insight is acquired from the construction of these trees as to why deletions are considered more difficult than inse...

Research paper thumbnail of Rectangle Enclosure Reporting in Linear Space Revisited

J. Autom. Lang. Comb., 2003

We present a new algorithm for reporting all the enclosures in a set of plane rectangles in O(n l... more We present a new algorithm for reporting all the enclosures in a set of plane rectangles in O(n log n log log n + k log log n) time and linear space (k denotes the output size). The result is already known (it has already been achieved by two previous papers), however the proposed algorithm follows a different approach.

Research paper thumbnail of Mobile Data Fusion from Multiple Tracking Sensors to Augment Maritime Safety: Mobile Detection, Early Identification, and Tracking of Moving Objects

2015 IEEE International Conference on Mobile Services, 2015

We present a novel mobile solution to provide efficient services for data fusion towards augmenti... more We present a novel mobile solution to provide efficient services for data fusion towards augmenting maritime safety. It is the first, to the best authors knowledge, complex and integrated fusion of data coming from multiple mobile and typical tracking sensors (e.g. Low weight/high performance radar, position transmission mechanisms and electro-optic/systems and hyper-spectral sensors). The key aim is to assist the detection and early identification and tracking of moving targets (e.g. With moving target indication and data fusion/correlation capabilities), as well as methods for obstacle detection and maritime surveillance. Maritime mobile services have to exchange real time data with multiple maritime/aerial assets no matter where the operation is conducted (close to harbors or at the open sea) and independently from the existing surveillance infrastructure. The proposed approach is also useful for the detection of marine pollution incidents. This innovative single window solution presents high efficiency, low operational costs' profiles and possibly contributes to standardization in construction as it utilizes typical tracking infrastructure and typical mobile devices (smartphones and tablets/ipads) & PCs.

Research paper thumbnail of An information system for the effective management of ambulances

Proceedings 13th IEEE Symposium on Computer-Based Medical Systems. CBMS 2000

In this paper, we describe a system offering a solution to the problem of ambulance management an... more In this paper, we describe a system offering a solution to the problem of ambulance management and emergency incident handling in the prefecture of Attica in Greece. It is based on a Geographic Information System (GIS) coupled with Global Positioning System (GPS) and Global System for Mobile Communication (GSM) technologies. The system's operation is expected to minimize the ambulances' response time. Consequently, there will be a drastic improvement in the way emergency incidents are being handled. This fact will thus significantly affect the quality of health services offered to citizens.

Research paper thumbnail of On-line consistent ranking on e-recruitment: seeking the truth behind a well-formed CV

Artificial Intelligence Review, 2013

In this work we present a novel approach for evaluating job applicants in online recruitment syst... more In this work we present a novel approach for evaluating job applicants in online recruitment systems, using machine learning algorithms to solve the candidate ranking problem and performing semantic matching techniques. An application of our approach is implemented in the form of a prototype system, whose functionality is showcased and evaluated in a real-world recruitment scenario. The proposed system extracts a set of objective criteria from the applicants' LinkedIn profile, and compares them semantically to the job's prerequisites. It also infers their personality characteristics using linguistic analysis on their blog Author names appear in alphabetical order.

Research paper thumbnail of Taxonomy Development and Its Impact on a Self-learning e-Recruitment System

IFIP Advances in Information and Communication Technology, 2012

In this work we present a novel approach for evaluating job applicants in online recruitment syst... more In this work we present a novel approach for evaluating job applicants in online recruitment systems, using machine learning algorithms to solve the candidate ranking problem and performing semantic matching techniques. An application of our approach is implemented in the form of a prototype system, whose functionality is showcased and evaluated in a real-world recruitment scenario. The proposed system extracts a set of objective criteria from the applicants' LinkedIn profile, and compares them semantically to the job's prerequisites. It also infers their personality characteristics using linguistic analysis on their blog posts. Our system was found to perform consistently compared to human recruiters, thus it can be trusted for the automation of applicant ranking and personality mining.

Research paper thumbnail of Mining the Conceptual Model of Open Source CMS Using a Reverse Engineering Approach

Communications in Computer and Information Science, 2013

Model-driven engineering has become the emerging standard for software development focusing on th... more Model-driven engineering has become the emerging standard for software development focusing on the use of models as first-class citizens. One possible field of application of such model-driven approaches can be the open source Content Management Systems (CMS) domain. Typically, CMS are built using the source-code-oriented software development process raising issues related to usability, performance and other qualities of service in an application's lifecycle. To overcome these issues, the use of model-driven approaches in the development of CMS-based web applications (WAs) can be particular beneficial. To this end, we propose a model-driven reverse engineering approach for automatic mining of the conceptual model of existing WAs developed using the widely used CMS Joomla! by applying data mining techniques. This methodology can be used to form the cornerstone of an evaluation framework for Joomla!-based WAs either in the design or maintenance process.

Research paper thumbnail of TECHNICAL REPORT No. TR99/10/04

Research paper thumbnail of 2-D Spatial Indexing Scheme in Optimal Time

Lecture Notes in Computer Science, 2000

J. Štuller et al. (Eds.): ADBIS-DASFAA 2000, LNCS 1884, pp. 107–116, 2000. © Springer-Verlag Berl... more J. Štuller et al. (Eds.): ADBIS-DASFAA 2000, LNCS 1884, pp. 107–116, 2000. © Springer-Verlag Berlin Heidelberg 2000 ... 2-D Spatial Indexing Scheme in Optimal Time ... Nectarios Kitsios1, Christos Makris1, Spyros Sioutas1, Athanassios Tsakalidis1,2, John Tsaknakis1, ...

Research paper thumbnail of Personalized selection of web services for mobile environments

Proceedings of the International Conference on Management of Emergent Digital EcoSystems - MEDES '09, 2009

In this paper we discuss the integration of QoS-aware search and personalization algorithms in or... more In this paper we discuss the integration of QoS-aware search and personalization algorithms in order to discover effectual Web Services for the case of mobile web users. We present a number of novel ranking algorithms specially designed for mobile web architectures. To validate and evaluate the proposed algorithms we developed a fully working prototype for mobile-PDA devices. The prototype enables PDA users to access electronic shops and to retrieve information about their products, while going shopping. Comparative experimental results have shown encouraging results that prove m-scroutz to be effective.

Research paper thumbnail of Indexing Techniques for Spatiotemporal Databases

Encyclopedia of Information Science and Technology, Second Edition

We can define as spatiotemporal any database that maintains objects with geometric properties tha... more We can define as spatiotemporal any database that maintains objects with geometric properties that change over time, where usual geometric properties are the spatial position and spatial extent of an object in a specific d-dimensional space. The need to use spatiotemporal databases appears in a variety of applications such as intelligent transportation systems, cellular communications, and meteorology monitoring. This field of database research collaborates tightly with other research areas such as mobile telecommunications, and is harmonically integrated with other disciplines such as CAD/CAM, GIS, environmental science, and bioinformatics. Spatiotemporal databases stand at the crossroad of two other database research areas: spatial databases (Güting, 1994; Gaede & Gunther, 1998) and temporal databases (Salzberg & Tsotras, 1999). The efficient implementation of spatiotemporal databases needs new data models and query languages and novel access structures for storing and accessing i...

Research paper thumbnail of Dynamic Interpolation Search Revisited

Lecture Notes in Computer Science, 2006

A new dynamic Interpolation Search (IS) data structure is presented that achieves O(log log n) search time with high probability on unknown continuous or even discrete input distributions with measurable probability of key collisions, including power law and Binomial distributions. No such previous result holds for IS when the probability of key collisions is measurable. Moreover, our data structure exhibits O(1) expected search time with high probability for a wide class of input distributions that contains all those for which o(log log n) expected search time was previously known.
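For readers unfamiliar with the underlying idea, the following is a minimal sketch of classic, static interpolation search on a sorted array. It is not the dynamic data structure the abstract describes; the function name `interpolation_search` and the integer-key assumption are illustrative choices only.

```python
from typing import Optional, Sequence


def interpolation_search(keys: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence keys, or None if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[lo] == keys[hi]:
            # All remaining keys are equal (a run of colliding keys);
            # handle it directly to avoid dividing by zero below.
            return lo if keys[lo] == target else None
        # Guess the position by linear interpolation between the endpoints.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return None
```

On keys that are spread roughly uniformly, each probe lands close to the target, which is the intuition behind the doubly-logarithmic expected search times quoted for interpolation search.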

Research paper thumbnail of Temporal Selection Queries in Video Databases

The paper is concerned with the effective and efficient processing of temporal selection queries in Video Database and, more generally, Temporal Database Management Systems (TDBMS). Based on both a general spatio-temporal retrieval framework [3] and recent versions of internal-external Priority Search Trees, we present an algorithm, optimal in time and space, that answers certain temporal content queries invoking video functions. We prove the optimality by giving both a new theoretical comparison method based on microanalysis and its experimental verification.

Research paper thumbnail of Dynamic 3-sided planar range queries with expected doubly-logarithmic time

Theoretical Computer Science, 2014

We consider the problem of maintaining dynamically a set of points in the plane and supporting range queries of the type [a, b] × (−∞, c]. We assume that the inserted points have their x-coordinates drawn from a class of smooth distributions, whereas the y-coordinates are arbitrarily distributed. The points to be deleted are selected uniformly at random among the inserted points. For the RAM model, we present a linear space data structure that supports queries in O(log log n + t) expected time with high probability and updates in O(log log n) expected amortized time, where n is the number of points stored and t is the size of the output of the query. For the I/O model we support queries in O(log log_B n + t/B) expected I/Os with high probability and updates in O(log_B log n) expected amortized I/Os using linear space, where B is the disk block size. The data structures are deterministic and the expectation is with respect to the input distribution. Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation. [...] and constants c_2 > 1 and c_1 ≥ 2^(3/(c_2−1)) (see Lem. 1). Note that w_(i+1) = w_i^(c_2). The root does not need to satisfy the lower bound of this range. The tree has height Θ(log_(c_2) log_(c_1) n).
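To make the query type concrete, here is a minimal brute-force baseline for a 3-sided query [a, b] × (−∞, c]. It scans every point in O(n) time and is only meant to pin down the query semantics; it is not the data structure from the paper, and the name `three_sided_query` is hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def three_sided_query(points: Iterable[Point], a: float, b: float, c: float) -> List[Point]:
    """Report every point (x, y) with a <= x <= b and y <= c by a linear scan."""
    return [(x, y) for (x, y) in points if a <= x <= b and y <= c]
```

The query is called 3-sided because the rectangle is bounded on the left, right, and top but unbounded below.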

Research paper thumbnail of Locating Maximal Multirepeats in Multiple Strings Under Various Constraints

The Computer Journal, 2006

A multirepeat in a string is a substring (factor) that appears a predefined number of times. A multirepeat is maximal if it cannot be extended either to the right or to the left and produce a multirepeat. In this paper, we present algorithms for two different versions of the problem of finding maximal multirepeats in a set of strings. In the case of arbitrary gaps, we propose an algorithm with O(sN^2 n + a) time complexity. When the gap is bounded in a small range c, we propose an algorithm with O((c^2 + s^2) m N^2 n log(Nn) + a) time complexity. Here, N is the number of strings, n the mean length of each string, m the multiplicity of the multirepeat and a the number of reported occurrences. Our results extend previous work by considering sets of strings as well as by generalizing pairs to multirepeats.

Research paper thumbnail of Canonical Polygon Queries on the Plane: A New Approach

Journal of Computers, 2009

The polygon retrieval problem on points is the problem of preprocessing a set of n points on the plane, so that given a polygon query, the subset of points lying inside it can be reported efficiently. It is of great interest in areas such as Computer Graphics, CAD applications, Spatial Databases and GIS development tasks. In this paper we study the problem of canonical k-vertex polygon queries on the plane. A canonical k-vertex polygon query always meets the following specific property: a point retrieval query can be transformed into a linear number (with respect to the number of vertices) of point retrievals for orthogonal objects such as rectangles and triangles (throughout this work we call a triangle orthogonal iff two of its edges are axis-parallel). We present two new algorithms for this problem. The first one requires O(n log^2 n) space and O(k log^3 n / log log n + A) query time. A simple modification of the first algorithm leads to a second solution, which consumes O(n^2) space and O(k log n / log log n + A) query time, where A denotes the size of the answer and k is the number of vertices. The best previous solution for the general polygon retrieval problem uses O(n^2) space and answers a query in O(k log n + A) time, where k is the number of vertices. It is also very complicated and difficult to implement in a standard imperative programming language such as C or C++.

Research paper thumbnail of Applying robust multibit watermarks to digital images

Journal of Computational and Applied Mathematics, 2009

This work focuses on the implementation of a robust multibit watermarking algorithm for digital images, which is based on an innovative spread spectrum technique analysis. The paper presents the watermark embedding and detection algorithms, which use both wavelets and the Discrete Cosine Transform, and analyzes the issues that arise.

Research paper thumbnail of Integrating GIS, GPS and GSM technologies for the effective management of ambulances

Computers, Environment and Urban Systems, 2001

In this paper, we describe a system offering a solution to the problem of ambulance management and emergency incident handling in the prefecture of Attica in Greece. It is based on the integration of geographic information system (GIS), global positioning system (GPS) and global system for mobile communication (GSM) technologies. The design of the system was the result of a project funded by the Greek Secretariat of Research and Technology. A significant operation for the handling of emergency incidents is the routing of ambulances to incident sites and then to the closest appropriate hospitals. The response time of a real-time system like ours to such queries is of vital significance. By using efficient data structures for the implementation of the graph representing the road network, the time performance of the shortest-path algorithm can be enhanced. Incorporating the efficient algorithm within the GIS will increase our system's viability.
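The abstract does not say which shortest-path algorithm or data structures the system uses. As one plausible illustration, the sketch below runs Dijkstra's algorithm over an adjacency-list road graph with a binary heap; the function name, the node labels, and the travel-time weights are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, travel time in minutes).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_travel_times(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: least travel time from source to each reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical toy road network (weights are assumed travel times).
roads: Graph = {
    "depot": [("A", 4.0), ("B", 1.0)],
    "B": [("A", 2.0), ("hospital", 7.0)],
    "A": [("hospital", 3.0)],
}
times = shortest_travel_times(roads, "depot")
```

With an adjacency list and a binary heap the running time is O((V + E) log V), which is the kind of gain from data-structure choice the abstract alludes to.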

Research paper thumbnail of Graph DBs vs. Column-Oriented Stores: A Pure Performance Comparison

Algorithmic Aspects of Cloud Computing, 2016

Cloud Computing has brought a great change in the way information is stored and applications run. In order for one or more clusters to work as a cloud we need a middleware framework, such as Apache Hadoop [17], that provides reliability, scalability and distributed computing. Once the infrastructure has been established, a software framework can be installed, which runs on top of it and connects it to the applications developed by the users. The software, in this regard, is a NoSQL database. This paper deals with the problem of searching data in some widespread NoSQL databases used in cloud computing. Two categories of NoSQL databases are compared: a column-oriented key-value store, HBase [6], and a highly available graph database, Neo4j [11]. HBase is a distributed, scalable storage system that runs on top of HDFS, and has been designed based on Google's BigTable [4]. Neo4j has been designed and developed to be a reliable database, optimized for graph structures instead of tables, and is a robust, scalable, high-performance and highly available database that supports ACID transactions and queries written in the Cypher language. The aim of this paper is to create a novel system that will decide whether a query must be sent for execution to a key-value store or to a graph database. Thus, an experimental pure performance comparison has been made between Apache HBase and Neo4j for a variety of queries that were programmed using the systems' APIs and the Java language.