Distributed query processing on the grid

A Distributed Query Execution Engine in a Grid Environment

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), 2007

A Grid is a computational environment in which applications can use multiple distributed computational resources in a safe, coordinated, efficient and transparent way. Data Integration Middleware Systems (DIMS) are inherently distributed systems that can make use of Grid environments to obtain better performance and a more rational use of available resources. This work describes a Distributed Query Execution Engine (DQEE) deployed in a Grid environment for executing sub-queries and operators in a distributed and parallel way. We present an approach that uses the DQEE to obtain efficient distributed query execution with reduced response time.

Service-Based Distributed Querying on the Grid

2003

Service-based approaches (such as Web Services and the Open Grid Services Architecture) have gained considerable attention recently for supporting distributed application development in e-business and e-science. The emergence of a service-oriented view of hardware and software resources raises the question as to how database management systems and technologies can best be deployed or adapted for use in such an environment. This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based Grid. The paper describes in detail the design and implementation of a service-based distributed query processor for the Grid. The query processor is service-based in two orthogonal senses: firstly, it supports querying over data storage and analysis resources that are made available as services, and, secondly, its internal architecture factors out as services the functionalities related to the construction of distributed query plans on the one hand, and to their execution over the Grid on the other. The resulting system both provides a declarative approach to service orchestration in the Grid, and demonstrates how query processing can benefit from dynamic access to computational resources on the Grid.

Parallel Query Processing on the Grid

Database queries offer an easy-to-use declarative manner for describing complex data management tasks. Query processing technologies have been evolving for decades; however the emergence of the Grid creates a new setting in which novel research issues and challenges have arisen. This chapter discusses how Grid-oriented and/or service-based query processors differ from traditional ones, and focuses on three complementary research issues, namely, how to schedule parallel database queries over non-dedicated, distributed resources; how to mitigate the impact of increased data transfer cost; and how to perform load balancing in this new setting. In addition, we discuss how parallel spatio-temporal query processing techniques can be applied to a Grid environment. The discussion revolves around the development of the OGSA-DQP system, which is a pioneer open-source service-based query processing system that enables parallel query execution over Grid resources, and the way some of the most prominent issues about its performance were addressed. The unique characteristics of the scheduling problem of arbitrarily parallel queries over heterogeneous resources have motivated the development of a new hill-climbing algorithm. For the problems of increased data transmission cost and load balancing, due to the highly volatile conditions, techniques founded on control theory are examined. The emphasis of this chapter is on both the description of a real Grid-enabled parallel query processor and the presentation of the different approaches to tackling each of the aforementioned problems including the limitations of the current state-of-the-art solutions.

XG: A Grid-Enabled Query Processing Engine

Lecture Notes in Computer Science, 2006

In [12] we introduce a novel architecture for data processing, based on a functional fusion between a data and a computation layer. In this demo we show how this architecture is leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets. One novel contribution of our solution is its data-driven approach. The computation infrastructure is controlled from within the data layer. Grid compute job submission events are based within the query processor on the DBMS side and in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transferred to/from compute nodes and is in stark contrast to existing Grid solutions that interact with data layers as external (mainly) "storage" components. By integrating scheduling intelligence in the data layer itself we show that it is possible to provide a close to optimal solution to the more general grid trade-off between required data replication costs and computation speed-up benefits. We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics.

Exploiting dynamic deployment in a distributed query processor for the grid

2008

Grid computing has enabled users to perform computationally expensive applications on distributed resources acquired dynamically. It has also allowed users to combine data and analysis components into new applications from sites all over the world. Often such distributed data is structured, and an established way of structuring computations is found in distributed query processing.

Modular adaptive query processing for service-based grids

2006

Distributed and heterogeneous environments present significant challenges to complex software systems, which must operate in the context of continuously changing loads, with partial or out-of-date information on resource capabilities. A distributed query processor (DQP) can be used to access and integrate data from distributed sources, as well as for combining data access with data analysis.

A service-oriented system for distributed data querying and integration on Grids

2009

Data Grids rely on the coordinated sharing of and interaction across multiple autonomous database management systems. They provide transparent access to heterogeneous and autonomous data resources stored on Grid nodes. Data sharing tools for Grids must include both distributed query processing and data integration functionality.

The design and implementation of OGSA-DQP: A service-based distributed query processor

Future Generation Computer Systems, 2009

Service-based approaches are rising to prominence because of their potential to meet the requirements for distributed application development in e-business and e-science. The emergence of a service-oriented view of hardware and software resources raises the question as to how database management systems and technologies can best be deployed or adapted for use in such an environment. This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based architecture suitable for a Grid environment. The paper addresses this by describing in detail the design and implementation of a service-based distributed query processor. The query processor is service-based in two orthogonal senses: firstly, it supports querying over data storage and analysis resources that are made available as services, and, secondly, its internal architecture factors out as services the functionalities related to the construction and execution of distributed query plans. The resulting system both provides a declarative approach to service orchestration, and demonstrates how query processing can benefit from a service-based architecture. As well as describing and motivating the architecture used, the paper also describes usage scenarios, and, using a bioinformatics application, presents performance results that benchmark the system and illustrate the benefits provided by the service-based architecture.

The Grid-DBMS: Towards Dynamic Data Management in Grid Environments

2005

Nowadays, many data grid applications need to manage and process a huge amount of data distributed across multiple grid nodes and stored in heterogeneous databases. Grids encourage the publication of scientific data in a more open manner than is currently the case, and many e-Science projects have an urgent need to interconnect legacy and independently operated databases through a set of data access and integration services. In the data grid area, a set of dynamic and adaptive services could address specific issues related to automatic data management, aiming both to provide high performance and to fully exploit the grid infrastructure. In this paper we introduce the Grid-DBMS concept, a framework for dynamic data management in grid environments, highlighting its requirements, architecture, components and services.