Adaptively Parallelizing Distributed Range Queries
Related papers
Performance Analysis of a Pull-Based Parallel Video Server
IEEE Transactions on Parallel and Distributed Systems, 2000
In conventional video-on-demand systems, video data are stored in a video server for delivery to multiple receivers over a communications network. The video server's hardware limits the maximum storage capacity as well as the maximum number of video sessions that can be delivered concurrently. These limits will eventually be exceeded by the growing demand for better video quality and a larger user population. This paper studies a parallel video server architecture that exploits server parallelism to achieve incremental scalability. First, unlike data partitioning and replication, the architecture employs data striping at the server level to achieve fine-grain load balancing across multiple servers. Second, a client-pull service model is employed to eliminate the need for interserver synchronization. Third, an admission-scheduling algorithm is proposed to further control the instantaneous load at each server so that linear scalability can be achieved. The paper analyzes the performance of the architecture by deriving bounds for server service delay, client buffer requirement, prefetch delay, and scheduling delay. These performance metrics and design tradeoffs are further evaluated using numerical examples. The results show that the proposed parallel video server architecture can be scaled linearly to more concurrent users simply by adding more servers and redistributing the video data among them.
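To make server-level striping and the client-pull model concrete, here is a minimal sketch assuming plain round-robin striping, where block i of a video is stored on server i mod N. The abstract does not specify the paper's exact layout, so `locate_block`, `pull_schedule`, and `blocks_per_server` are hypothetical names chosen for illustration. The point it shows: because the client can compute each block's location on its own, it can pull blocks directly from the right server without any interserver coordination, and block counts per server never differ by more than one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    server: int  # index of the server that stores the block
    offset: int  # position of the block within that server's share of the stripe


def locate_block(block_index: int, num_servers: int) -> BlockLocation:
    """Assumed round-robin striping: block i lives on server i mod N."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    return BlockLocation(server=block_index % num_servers,
                         offset=block_index // num_servers)


def pull_schedule(first_block: int, count: int, num_servers: int) -> list[tuple[int, int]]:
    """Client-pull: the client itself decides which server to ask for each
    consecutive block, so the servers never synchronize with one another."""
    return [(i, locate_block(i, num_servers).server)
            for i in range(first_block, first_block + count)]


def blocks_per_server(total_blocks: int, num_servers: int) -> list[int]:
    """Count how many blocks each server stores (fine-grain load balance)."""
    counts = [0] * num_servers
    for i in range(total_blocks):
        counts[locate_block(i, num_servers).server] += 1
    return counts


if __name__ == "__main__":
    print(pull_schedule(0, 6, 4))   # blocks 0..5 requested from servers 0,1,2,3,0,1
    print(blocks_per_server(10, 4))  # [3, 3, 2, 2]
```

Adding a server changes N, which is why the abstract notes that scaling up requires redistributing the video data among the servers.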
Dynamic query scheduling in parallel data warehouses
Concurrency and Computation: Practice and Experience, 2003
Parallel processing is a key to high performance in very large data warehouse applications that execute complex analytical queries on huge amounts of data. Although parallel database systems (PDBSs) have been studied extensively in the past decades, the specifics of load balancing in parallel data warehouses have not been addressed in detail.
2009
The rapid growth in the size of databases and advances in query languages have resulted in increasingly complex SQL queries submitted by users, which in turn slows information retrieval from the database. The future of high-performance database systems lies in parallelism. Commercial vendors have introduced parallel database solutions, but these have proved extremely expensive. This paper investigates how networked resources such as workstations can be utilised, via Parallel Virtual Machine (PVM), to optimise database query execution, and it examines the scalability of PVM experimentally. PVM is used to implement parallelism in two separate ways: (i) to remove the workload of deriving and maintaining rules for Semantic Query Optimisation (SQO) from the data server, clearing the way for more widespread use of SQO in databases [1,2]; and (ii) to answer user queries with a proposed Parallel Query Algorithm (PQA) that runs over a network of workstations coupled with a sequential database management system (DBMS), PostgreSQL, on a prototype called the Expandable Server Architecture (ESA) [1,2,3,4]. Experiments were conducted to address problems of parallel and distributed systems such as task scheduling, load balancing and fault tolerance.
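The abstract does not describe PQA's internals, so the following is only a generic sketch of one common way to parallelize a range query over several workers: split the key range into contiguous sub-ranges, let each worker scan its own sub-range, and concatenate the results in key order. Threads stand in for networked workstations, and the in-memory sorted row list stands in for the DBMS; `split_range`, `scan_partition`, and `parallel_range_query` are hypothetical names.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_range(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Split the half-open key range [lo, hi) into `parts` contiguous sub-ranges
    whose sizes differ by at most one key."""
    if parts <= 0 or hi < lo:
        raise ValueError("need parts > 0 and lo <= hi")
    step, extra = divmod(hi - lo, parts)
    bounds, start = [], lo
    for p in range(parts):
        end = start + step + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def scan_partition(rows: list[tuple[int, str]], lo: int, hi: int) -> list[tuple[int, str]]:
    """One worker's job: rows are sorted by key, so binary-search [lo, hi)."""
    start = bisect_left(rows, (lo,))  # first row with key >= lo
    end = bisect_left(rows, (hi,))    # first row with key >= hi
    return rows[start:end]


def parallel_range_query(rows: list[tuple[int, str]], lo: int, hi: int,
                         workers: int) -> list[tuple[int, str]]:
    """Scan each sub-range concurrently; pool.map preserves input order,
    so the merged result stays sorted by key."""
    parts = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: scan_partition(rows, b[0], b[1]), parts)
        return [row for part in results for row in part]


if __name__ == "__main__":
    table = [(k, f"r{k}") for k in range(0, 100, 7)]
    print(parallel_range_query(table, 10, 50, 3))
```

Splitting by key range keeps each worker's output contiguous, which makes the final merge a simple concatenation; a real system would also need to balance sub-ranges by data volume rather than key width.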
Research in mobile database query optimization and processing
Mobile Information Systems, 2005
The emergence of mobile computing provides the ability to access information at any time and place. However, because mobile computing environments are constrained by power, storage, asymmetric communication cost, and bandwidth, efficient query processing and minimal query response time are of great interest. This survey groups a variety of query optimization and processing mechanisms in mobile databases into two main categories: (i) query processing strategy, and (ii) caching management strategy. Query processing includes both pull and push operations (broadcast mechanisms). Push operations are further classified into on-demand broadcast and periodic broadcast. On-demand broadcast concerns techniques that enable the server to accommodate multiple requests so that they can be processed efficiently. Periodic broadcast corresponds to data dissemination strategies; under this scheme, the survey describes several techniques for improving query performance by broadcasting data to a population of mobile users. A caching management strategy defines methods for maintaining cached data items in clients' local storage, and it addresses critical caching issues such as caching granularity, cache coherence strategy and cache replacement policy. Finally, the survey concludes with several open issues in mobile query optimization and processing strategy.
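As one illustration of a cache replacement policy, here is a minimal sketch of least-recently-used (LRU) eviction for a client's local cache. LRU is a standard policy chosen here for illustration; the survey covers several policies, and this is not claimed to be any particular mechanism from it. `LRUCache` is a hypothetical class built on `collections.OrderedDict`, which keeps keys in access order.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity client-side cache that evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None on a miss (the client would then
        have to fetch the item from the server or wait for a broadcast)."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Insert or update an item, evicting the oldest one if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")       # "a" becomes most recently used
    cache.put("c", 3)    # evicts "b"
    print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```

On a constrained mobile device, the capacity would be set by local storage, and the granularity question from the survey decides whether keys identify whole pages, objects, or individual attributes.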
JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010
We present JAWS, a job-aware, data-driven batch scheduler that improves query throughput for data-intensive scientific database clusters. As datasets reach petabyte-scale, workloads that scan through vast amounts of data to extract features are gaining importance in the sciences. However, acute performance bottlenecks result when multiple queries execute simultaneously and compete for I/O resources. Our solution, …
TelegraphCQ: Continuous Dataflow Processing for an Uncertain World
2003
Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary.
Evaluation Performance of Task Scheduling Algorithms in Heterogeneous Environments
A heterogeneous computing environment is a large-scale distributed data-processing environment. Its behavior depends to some extent on the application, and its parameters can be classified into three main categories: hardware, the communication layer, and software. Such a system consists of hardware and software from two or more different manufacturers. Scheduling is one of the most important factors in a heterogeneous environment, and the aim of task scheduling there is to move computation towards the data. To improve performance, increase throughput, and minimize makespan, a scheduler must avoid unnecessary data transmission. Hence, scheduling algorithms designed for heterogeneous computing environments are necessary to provide good performance, and scheduling service resources quickly at the lowest cost is becoming increasingly important. This paper presents an overview and analysis of eighteen scheduling algorithms for heterogeneous computing environments, together with their scheduling issues and problems.
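To show what minimizing makespan on heterogeneous machines looks like in practice, here is a minimal sketch of Min-Min, a well-known heuristic in this literature; it is used only as an example and is not claimed to reproduce the paper's analysis. It assumes an expected-time-to-compute matrix `etc[t][m]` giving the run time of task t on machine m; data-transfer cost could be folded into those entries, which is one way a scheduler can penalize moving data. The function name `min_min` is hypothetical.

```python
def min_min(etc: list[list[float]]) -> tuple[dict[int, int], float]:
    """Min-Min heuristic: repeatedly schedule the task whose best possible
    completion time is smallest, on the machine that achieves it.
    Returns (task -> machine assignment, makespan)."""
    num_tasks = len(etc)
    num_machines = len(etc[0]) if etc else 0
    ready = [0.0] * num_machines  # time at which each machine becomes free
    unscheduled = set(range(num_tasks))
    assignment: dict[int, int] = {}

    while unscheduled:
        best = None  # (completion_time, task, machine)
        for t in sorted(unscheduled):  # sorted for deterministic tie-breaking
            # Machine giving this task its earliest completion time.
            m = min(range(num_machines), key=lambda j: ready[j] + etc[t][j])
            ct = ready[m] + etc[t][m]
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unscheduled.remove(t)

    return assignment, max(ready, default=0.0)


if __name__ == "__main__":
    # 3 tasks x 2 machines; task 2 runs much faster on machine 1.
    etc = [[3, 5], [2, 4], [6, 1]]
    print(min_min(etc))  # ({2: 1, 1: 0, 0: 0}, 5.0)
```

Min-Min favors short tasks first, which tends to give low makespan when task times are similar but can delay long tasks; comparisons like the one surveyed here weigh exactly such trade-offs across heuristics.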