Jeeva Paudel - Academia.edu
Papers by Jeeva Paudel
Journal of Parallel and Distributed Computing, 2015
What are the performance benefits of selectively relaxing the locality preferences of some tasks in parallel applications? Can load-balancing algorithms for a distributed-memory cluster benefit from this relaxation? This work investigates these ideas by selecting tasks according to application-level task locality rather than hardware memory topology, as is the norm in the literature. A prototype designed to evaluate these ideas is implemented in X10, a realization of the asynchronous partitioned global address space programming model. The evaluation shows that the new approach applies to several real-world applications chosen from the Cowichan and the Lonestar suites. On a cluster of 128 processors, the new work-stealing strategy demonstrates a speedup between 12% and 32% over X10’s existing scheduler. Moreover, the new strategy does not degrade the performance of any of the applications studied.
2014 21st International Conference on High Performance Computing (HiPC), 2014
Proceedings of the 2011 ACM SIGPLAN X10 Workshop on - X10 '11, 2011
In today's era of multicores and clustered architectures, high performance and high productivity are central concerns in the design of parallel programming languages that aim to solve large computational problems. X10 is a language based on state-of-the-art object-oriented programming ideas and claims to take advantage of their proven flexibility and ease of use to solve a wide spectrum of programming problems. The Cowichan problems are a set of computational problems that were designed to stress parallel programming environments and to assess their programmability. This paper uses the Cowichan problems to assess the flexibility of X10.
Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14, 2014
Proceedings of the third ACM SIGPLAN X10 Workshop on - X10 '13, 2013
This paper presents a hybrid parallel task-placement strategy that combines work stealing and work dealing to improve workload distribution across nodes in distributed shared-memory machines. Existing work-dealing-based load balancers suffer from large performance penalties resulting from excessive task migration and from excessive communication among the nodes to determine the target node for a migrated task. This work employs a simple heuristic to determine the load status of a node and also to detect a good target for migration of tasks.
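To illustrate the distinction the abstract draws between stealing (an idle node pulls work) and dealing (a busy node pushes work), here is a minimal sketch in Python. It is not the paper's heuristic, which the abstract does not specify: the queue-length thresholds `low` and `high`, the function names, and the shared `loads` table are all assumptions made for illustration. A real distributed implementation would not have a free global view of every node's load.

```python
# Hypothetical sketch: classify a node by queue length, then decide whether
# it should deal (push) work, steal (pull) work, or do nothing.

def load_status(queue_length, low=2, high=8):
    """Classify a node's load; the thresholds are illustrative assumptions."""
    if queue_length <= low:
        return "underloaded"
    if queue_length >= high:
        return "overloaded"
    return "balanced"


def rebalance_step(node_id, loads, low=2, high=8):
    """Return (action, peer) for one node, given a table of queue lengths.

    `loads` maps node id -> queue length. Using a shared table is a
    simplification for the sketch only.
    """
    others = {n: length for n, length in loads.items() if n != node_id}
    if not others:
        return ("none", None)

    status = load_status(loads[node_id], low, high)
    if status == "overloaded":
        # Dealing: push a task to the least-loaded peer, unless that peer
        # is itself overloaded (migration would not help).
        target = min(others, key=others.get)
        if load_status(others[target], low, high) != "overloaded":
            return ("deal", target)
    elif status == "underloaded":
        # Stealing: pull a task from the most-loaded peer, if it has
        # more than a trivial amount of work.
        victim = max(others, key=others.get)
        if others[victim] > low:
            return ("steal", victim)
    return ("none", None)
```

With `loads = {"a": 10, "b": 1, "c": 5}`, node `a` would deal to `b`, node `b` would steal from `a`, and node `c` would take no action.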
2013 42nd International Conference on Parallel Processing, 2013
Improving the performance of work-stealing load-balancing algorithms in distributed shared-memory systems is challenging. These algorithms need to overcome the high costs of contention among workers, of communication and remote data references between nodes, and their impact on the locality preferences of tasks. Prior research focuses on stealing from a victim that best exploits data locality, and on using special deques that minimize the contention between local and remote workers.
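For readers unfamiliar with the deques the abstract mentions, the basic work-stealing discipline can be sketched as follows. This is a generic textbook-style illustration in Python, not the paper's data structure: the class and function names are hypothetical, and a single lock stands in for the contention-reducing techniques that specialized deques use.

```python
from collections import deque
from threading import Lock


class WorkStealingDeque:
    """Per-worker task queue: the owner works at one end, thieves at the other."""

    def __init__(self):
        self._tasks = deque()
        # One lock for simplicity; it serializes owner and thieves, which is
        # exactly the contention that specialized deques try to avoid.
        self._lock = Lock()

    def push(self, task):
        """Owner adds a newly spawned task."""
        with self._lock:
            self._tasks.append(task)

    def pop(self):
        """Owner takes the newest task (LIFO), or None if empty."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):
        """Thief takes the oldest task (FIFO), or None if empty."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


def steal_from_any(victims):
    """Try each victim deque in order; return the first stolen task or None."""
    for victim in victims:
        task = victim.steal()
        if task is not None:
            return task
    return None
```

The owner popping the newest task tends to keep recently touched data local, while thieves taking the oldest task keeps them away from the end the owner is using.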
2009 IEEE International Conference on Software Maintenance, 2009