Jeeva Paudel - Academia.edu

Papers by Jeeva Paudel

Hybrid parallel task placement in irregular applications

Journal of Parallel and Distributed Computing, 2015

What are the performance benefits of selectively relaxing the locality preferences of some tasks in parallel applications? Can load-balancing algorithms for a distributed-memory cluster benefit from this relaxation? This work investigates these ideas by employing application-level task locality for selection of tasks rather than hardware memory topology as is the norm in the literature. A prototype designed to evaluate these ideas is implemented in X10, a realization of the asynchronous partitioned global address space programming model. This evaluation reveals the applicability of this new approach to several real-world applications chosen from the Cowichan and the Lonestar suites. On a cluster of 128 processors, the new work-stealing strategy demonstrates a speedup between 12% and 32% over X10’s existing scheduler. Moreover, the new strategy does not degrade the performance of any of the applications studied.

Optimizing shared data accesses in distributed-memory X10 systems

2014 21st International Conference on High Performance Computing (HiPC), 2014

Using the Cowichan problems to investigate the programmability of X10 programming system

Proceedings of the 2011 ACM SIGPLAN X10 Workshop on - X10 '11, 2011

In today's era of multicores and clustered architectures, high performance and high productivity are central concerns in the design of parallel programming languages that aim to solve large computational problems. X10 is a language based on state-of-the-art object-oriented programming ideas and claims to take advantage of their proven flexibility and ease-of-use to solve a wide spectrum of programming problems. The Cowichan problems are a set of computational problems that were designed to stress parallel programming environments and to assess their programmability. This paper uses the Cowichan problems to assess the flexibility of X10.

Stratified Sampling for Even Workload Partitioning

Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14, 2014

Hybrid parallel task placement in X10

Proceedings of the third ACM SIGPLAN X10 Workshop on - X10 '13, 2013

This paper presents a hybrid parallel task-placement strategy that combines work stealing and work dealing to improve workload distribution across nodes in distributed shared-memory machines. Existing work-dealing-based load balancers suffer from large performance penalties resulting from excessive task migration and from excessive communication among the nodes to determine the target node for a migrated task. This work employs a simple heuristic to determine the load status of a node and also to detect a good target for migration of tasks.

On the Merits of Distributed Work-Stealing on Selective Locality-Aware Tasks

2013 42nd International Conference on Parallel Processing, 2013

Improving the performance of work-stealing load-balancing algorithms in distributed shared-memory systems is challenging. These algorithms need to overcome high costs of contention among workers, communication and remote data references between nodes, and their impact on the locality preferences of tasks. Prior research focuses on stealing from a victim that best exploits data locality, and on using special deques that minimize the contention between local and remote workers.

Modularizing error recovery

2009 IEEE International Conference on Software Maintenance, 2009

