Model-Based Optimization of EULAG Kernel on Intel Xeon Phi Through Load Imbalancing (original) (raw)
IEEE Transactions on Parallel and Distributed Systems, 2017
Abstract
Load balancing is a widely accepted technique for performance optimization of scientific applications on parallel architectures. Indeed, balanced applications do not waste processor cycles on waiting at points of synchronization and data exchange, maximizing this way the utilization of processors. In this paper, we challenge the universality of the load-balancing approach to optimization of the performance of parallel applications. First, we formulate conditions that should be satisfied by the performance profile of an application in order for the application to achieve its best performance via load balancing. Then we use a real-life scientific application, EULAG MPDATA kernel, to demonstrate that its performance profile on a modern parallel architecture, Intel Xeon Phi, significantly deviates from these conditions. Based on this observation, we propose a method of performance optimization of scientific applications through load imbalancing. In the case of data parallel application, the method uses functional performance models of the application to find partitioning that minimizes its computation time but not necessarily balances the load of processors. We apply this method to optimization of MPDATA on Intel Xeon Phi. Experimental results demonstrate that the performance of this carefully optimized load-balanced application can be further improved by 15percent using the proposed load-imbalancing technique.
Alexey Lastovetsky hasn't uploaded this paper.
Let Alexey know you want this paper to be uploaded.
Ask for this paper to be uploaded.