Query Optimization Architecture for Data Grid Environment
Related papers
2002
With the rapid collection of data in a wide variety of fields, ranging from business transactions through medical investigations to scientific research, the demand for data analysis tools is ever growing. Today's challenges lie less in data storage and information retrieval than in the analysis of data on a global scale in a heterogeneous information system: technologies such as On-Line Analytical Processing, Data Mining and Knowledge Discovery in Databases all require the integration of information and efficient query processing. In distributed and heterogeneous datasets, this can only be achieved by efficiently distributing and scheduling subtasks across a distributed computing resource. We propose the use of mobile query optimizations based on agent technology for distributed data warehouse and OLAP applications, so that concurrent query execution adapts dynamically to the computing resource it executes on. This is of particular importance in cluster and grid computing.
A Distributed Query Execution Engine in a Grid Environment
Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), 2007
A Grid is a computational environment in which applications can use multiple distributed computational resources in a safe, coordinated, efficient and transparent way. Data Integration Middleware Systems (DIMS) are inherently distributed systems that can make use of Grid environments to obtain better performance and a rational use of available resources. This work describes a Distributed Query Execution Engine (DQEE) deployed in a Grid environment to execute sub-queries and operators in a distributed and parallel way. We present an approach that uses the DQEE to obtain efficient distributed query execution with reduced response time.
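To illustrate the general idea of executing sub-queries in parallel and merging their results, the sketch below ships one selection sub-query to each of several data partitions and unions the answers. It is a minimal sketch under stated assumptions, not the DQEE itself: the partitions, `run_subquery`, and `distributed_select` are hypothetical, and local threads stand in for Grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions; each stands in for a data source on a different Grid node.
PARTITIONS = [
    [{"id": 1, "price": 10}, {"id": 2, "price": 55}],
    [{"id": 3, "price": 70}, {"id": 4, "price": 5}],
    [{"id": 5, "price": 99}],
]


def run_subquery(partition, min_price):
    """Evaluate the same selection sub-query locally on one partition."""
    return [row for row in partition if row["price"] >= min_price]


def distributed_select(partitions, min_price, max_workers=4):
    """Submit one sub-query per partition, run them concurrently, then union the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_subquery, p, min_price) for p in partitions]
        results = []
        for future in futures:  # iterate in submission order so the merge is deterministic
            results.extend(future.result())
    return results


if __name__ == "__main__":
    print(distributed_select(PARTITIONS, 50))
```

When each sub-query is I/O-bound, as a call to a remote node would be, response time is governed by the slowest partition rather than by the sum over all partitions.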
Query Optimization in Grid Databases
2007 14th International Conference on Mixed Design of Integrated Circuits and Systems, 2007
DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the web. Although many algorithms have been proposed for optimizing query processing in a distributed database system, in order to minimize the costs and/or response time of obtaining the answer to a query, the database grid query optimization problem is fundamentally different from distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of the databases in a database grid. Query optimization in a database grid therefore poses more challenges than in a traditional distributed database. Following this observation, we present the design of a query optimizer in DartGrid II and propose a heuristic, dynamic, and parallel query optimization approach for processing queries in a database grid.
Query Optimization in Database Grid
2005
DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the web. Although many algorithms have been proposed for optimizing query processing in a distributed database system, in order to minimize the costs and/or response time of obtaining the answer to a query, the database grid query optimization problem is fundamentally different from distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of the databases in a database grid. Query optimization in a database grid therefore poses more challenges than in a traditional distributed database. Following this observation, we present the design of a query optimizer in DartGrid II and propose a heuristic, dynamic, and parallel query optimization approach for processing queries in a database grid.
Algorithms for Distributed Database Query Command on the Grid
Research and development activities relating to Grid Computing are leaving the academic field to reach important applications in the corporate world. The real possibility of joining simple computers into a powerful network, increasing processing capability and storage capacity, creates the potential environment for grid application systems. Although this has allowed rapid progress in building various aspects of Grid infrastructure, the integration of different resources, including databases, is fundamental. This paper describes the development of two algorithms for planning the distribution and parallelization of database queries on grid computing, and compares them.
Distributed query processing on the grid
2002
Distributed query processing (DQP) has been widely used in data-intensive applications where data of relevance to users is stored in multiple locations. This paper argues: (i) that DQP can be important in the Grid, as a means of providing high-level, declarative languages for integrating data access and analysis; and (ii) that the Grid provides resource management facilities that are useful to developers of DQP systems.
XG: A Grid-Enabled Query Processing Engine
Lecture Notes in Computer Science, 2006
In [12] we introduce a novel architecture for data processing, based on a functional fusion between a data layer and a computation layer. In this demo we show how this architecture is leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets. One novel contribution of our solution is its data-driven approach: the computation infrastructure is controlled from within the data layer. Grid compute job submission events originate within the query processor on the DBMS side and are in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transferred to/from compute nodes, and stands in stark contrast to existing Grid solutions that interact with data layers as external (mainly) "storage" components. By integrating scheduling intelligence in the data layer itself, we show that it is possible to provide a close-to-optimal solution to the more general grid trade-off between required data replication costs and computation speed-up benefits. We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics.
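The trade-off between data replication cost and computation speed-up can be made concrete with a toy cost model. The sketch below assumes, purely for illustration, that the data node sends a fixed amount of data to each compute node one after another and that the compute work divides evenly across nodes; `completion_time` and `best_node_count` are hypothetical names and do not describe XG's actual scheduler. Shrinking the shipped data, as on-the-fly aggregation would, moves the optimum towards more nodes.

```python
def completion_time(k, shipped_mb, bandwidth_mb_s, work_s):
    """Hypothetical cost model for a job spread over k compute nodes: the data node
    sends shipped_mb to each node one after another, then the work splits evenly."""
    transfer_s = k * shipped_mb / bandwidth_mb_s
    compute_s = work_s / k
    return transfer_s + compute_s


def best_node_count(max_nodes, shipped_mb, bandwidth_mb_s, work_s):
    """Exhaustively pick the node count in 1..max_nodes with the lowest estimated time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda k: completion_time(k, shipped_mb, bandwidth_mb_s, work_s),
    )


if __name__ == "__main__":
    # Raw data: 100 MB per node over a 100 MB/s link, 64 s of single-node work.
    print(best_node_count(32, 100, 100, 64))
    # After on-the-fly aggregation shrinks the shipped data to 25 MB per node.
    print(best_node_count(32, 25, 100, 64))
```

Under these assumed numbers, the raw data favours 8 nodes (an estimated 16 s), while shipping only 25 MB per node raises the optimum to 16 nodes (an estimated 8 s).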
Parallel Query Processing on the Grid
Database queries offer an easy-to-use declarative manner for describing complex data management tasks. Query processing technologies have been evolving for decades; however, the emergence of the Grid creates a new setting in which novel research issues and challenges have arisen. This chapter discusses how Grid-oriented and/or service-based query processors differ from traditional ones, and focuses on three complementary research issues, namely, how to schedule parallel database queries over non-dedicated, distributed resources; how to mitigate the impact of increased data transfer cost; and how to perform load balancing in this new setting. In addition, we discuss how parallel spatio-temporal query processing techniques can be applied to a Grid environment. The discussion revolves around the development of the OGSA-DQP system, which is a pioneer open-source service-based query processing system that enables parallel query execution over Grid resources, and the way some of the most prominent issues about its performance were addressed. The unique characteristics of the scheduling problem of arbitrarily parallel queries over heterogeneous resources have motivated the development of a new hill-climbing algorithm. For the problems of increased data transmission cost and load balancing, due to the highly volatile conditions, techniques founded on control theory are examined. The emphasis of this chapter is on both the description of a real Grid-enabled parallel query processor and the presentation of the different approaches to tackling each of the aforementioned problems, including the limitations of the current state-of-the-art solutions.
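A hill-climbing scheduler of the general kind mentioned here can be sketched as follows: start from the fastest node and keep adding whichever remaining node most reduces the estimated cost, stopping at the first step that brings no improvement. This is a generic illustration, not the OGSA-DQP algorithm; the cost model (a per-node setup charge plus work divided by aggregate node speed), the sample nodes, and the names `estimated_cost` and `hill_climb_schedule` are all assumptions.

```python
# Hypothetical heterogeneous nodes: name -> (speed in tuples/s, setup cost in s).
NODES = {
    "a": (100.0, 1.0),
    "b": (50.0, 1.0),
    "c": (10.0, 1.0),
}


def estimated_cost(selected, nodes, workload):
    """Assumed cost model: setup is paid once per chosen node, and the workload
    (in tuples) is split in proportion to node speed so all nodes finish together."""
    total_speed = sum(nodes[n][0] for n in selected)
    setup = sum(nodes[n][1] for n in selected)
    return setup + workload / total_speed


def hill_climb_schedule(nodes, workload):
    """Greedy hill climbing over node sets: add the best improving node until none helps."""
    current = {max(nodes, key=lambda n: nodes[n][0])}  # start on the fastest node
    current_cost = estimated_cost(current, nodes, workload)
    while True:
        best_node, best_cost = None, current_cost
        for n in nodes:
            if n in current:
                continue
            cost = estimated_cost(current | {n}, nodes, workload)
            if cost < best_cost:
                best_node, best_cost = n, cost
        if best_node is None:  # local optimum: no single addition improves the estimate
            return current, current_cost
        current.add(best_node)
        current_cost = best_cost


if __name__ == "__main__":
    chosen, cost = hill_climb_schedule(NODES, 1000.0)
    print(sorted(chosen), round(cost, 2))
```

With the sample nodes, adding `b` lowers the estimate from 11 s to about 8.67 s, while adding `c` as well would raise it to 9.25 s, so the search stops at `{"a", "b"}`. Like any hill climber, it can stop at a local optimum.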
Service-Based Distributed Querying on the Grid
2003
Service-based approaches (such as Web Services and the Open Grid Services Architecture) have gained considerable attention recently for supporting distributed application development in e-business and e-science. The emergence of a service-oriented view of hardware and software resources raises the question as to how database management systems and technologies can best be deployed or adapted for use in such an environment. This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based Grid. The paper describes in detail the design and implementation of a service-based distributed query processor for the Grid. The query processor is service-based in two orthogonal senses: firstly, it supports querying over data storage and analysis resources that are made available as services, and, secondly, its internal architecture factors out as services the functionalities related to the construction of distributed query plans on the one hand, and to their execution over the Grid on the other. The resulting system both provides a declarative approach to service orchestration in the Grid, and demonstrates how query processing can benefit from dynamic access to computational resources on the Grid.
Data integration and query reformulation in service-based grids
2007
This paper describes the XMAP data integration framework and query reformulation algorithm, and provides insights into the performance of the algorithm and into its use in implementing query processing services. Here we propose an approach for data integration-enabled distributed query processing on Grids by embedding the XMAP reformulation algorithm within the OGSA-DQP distributed query processor.