Alexey Lastovetsky | University College of Dublin
Papers by Alexey Lastovetsky
Scientific Programming, 2013
DOAJ (DOAJ: Directory of Open Access Journals), 2013
... Page: 12. A Hybrid Randomized Initialization Protocol for TDMA in Single-Hop Wireless Networks. Aleksandar Micic, Ivan Stojmenovic. Page: 13. A Limited-Global Fault Information Model for Dynamic Routing in 2-D Meshes. Zhen Jiang, Jie Wu. Page: 14. ...
John Wiley & Sons, Inc. eBooks, Jan 21, 2004
John Wiley & Sons, Inc. eBooks, Aug 10, 2009
International Journal of High Performance Computing Applications, Nov 1, 2008
The Journal of Supercomputing, Mar 28, 2016
Journal of Parallel and Distributed Computing, Oct 1, 2012
Energy is a scarce resource in Wireless Sensor Networks (WSN). Some studies show that more than 70% of energy is consumed in data transmission in WSN. Since most of the time, the sensed information is redundant due to geographically collocated sensors, ...
arXiv (Cornell University), May 7, 2012
Concurrency and Computation: Practice and Experience, Sep 2, 2022
Performance and energy are the two most important objectives for optimization on heterogeneous high performance computing platforms. This work studies a mathematical problem motivated by the bi‐objective optimization of data‐parallel applications on such platforms for performance and energy. First, we formulate the problem and present an exact algorithm of polynomial complexity solving the problem where all the application profiles of objective type one are continuous and strictly increasing, and all the application profiles of objective type two are linearly increasing. We then apply the algorithm to develop solutions for two related optimization problems of parallel applications on heterogeneous hybrid platforms, one for performance and dynamic energy and the other for performance and total energy. Our proposed solution methods are then employed to solve the two bi‐objective optimization problems for two data‐parallel applications, matrix multiplication and gene sequencing, on a hybrid platform employing five heterogeneous processors, namely, two different Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi.
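The bi-objective problem described in this abstract can be illustrated with a small sketch. It is not the exact polynomial algorithm from the paper: it simply enumerates every workload distribution and keeps the Pareto-optimal ones. All names (`compositions`, `pareto_front`) and the per-unit time and energy coefficients are hypothetical, and both profiles are assumed to be linear for simplicity. Execution time is taken as the maximum over processors (they run in parallel) and dynamic energy as the sum.

```python
from typing import Iterator, List, Tuple


def compositions(n: int, p: int) -> Iterator[List[int]]:
    """Yield every way to split n workload units among p processors."""
    if p == 1:
        yield [n]
        return
    for x in range(n + 1):
        for rest in compositions(n - x, p - 1):
            yield [x] + rest


def dominates(a: Tuple[int, int, List[int]], b: Tuple[int, int, List[int]]) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(n: int, time_per_unit: List[int], energy_per_unit: List[int]):
    """Brute-force Pareto front of (time, energy, distribution) triples."""
    candidates = []
    for dist in compositions(n, len(time_per_unit)):
        # Processors work in parallel, so time is the slowest one's time.
        t = max(a * x for a, x in zip(time_per_unit, dist))
        # Dynamic energy adds up over all processors.
        e = sum(b * x for b, x in zip(energy_per_unit, dist))
        candidates.append((t, e, dist))
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: (c[0], c[1]))


# Hypothetical platform: processor 0 is fast but power-hungry,
# processor 1 is slow but energy-efficient.
front = pareto_front(6, time_per_unit=[1, 2], energy_per_unit=[3, 1])
for t, e, dist in front:
    print(f"time={t:2d} energy={e:2d} distribution={dist}")
```

With these made-up coefficients, each point on the front trades execution time against dynamic energy; distributions that give processor 0 more than four units are dominated and dropped. The enumeration is exponential in the number of processors, so it is only suitable for illustrating the problem statement.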
Concurrency and Computation: Practice and Experience, Aug 30, 2018
Lecture Notes in Computer Science, 2016
The communication layer of modern HPC platforms is getting increasingly heterogeneous and hierarchical. As a result, even on platforms with homogeneous processors, the communication cost of many parallel applications will significantly vary depending on the mapping of their processes to the processors of the platform. The optimal mapping, minimizing the communication cost of the application, will strongly depend on the network structure and performance as well as the logical communication flow of the application. In our previous work, we proposed a general approach and two approximate heuristic algorithms aimed at minimization of the communication cost of data parallel applications that have a two-dimensional symmetric communication pattern on heterogeneous hierarchical networks, and tested these algorithms in the context of the parallel matrix multiplication application. In this paper, we develop a new algorithm that is built on top of one of these heuristic approaches in the context of a real-life application, MPDATA, which is one of the major parts of the EULAG geophysical model. We carefully study the communication flow of MPDATA and discover that even under the assumption of a perfectly homogeneous communication network, the logical communication links of this application will have different bandwidths, which makes the optimization of its communication cost particularly challenging. We propose a new algorithm that is based on cost functions of one of our general heuristic algorithms and apply it to optimization of the communication cost of MPDATA, which has an asymmetric heterogeneous communication pattern. We also present experimental results demonstrating performance gains due to this optimization.
IEEE Transactions on Parallel and Distributed Systems, Mar 1, 2017
Load balancing is a widely accepted technique for performance optimization of scientific applications on parallel architectures. Indeed, balanced applications do not waste processor cycles on waiting at points of synchronization and data exchange, thereby maximizing the utilization of processors. In this paper, we challenge the universality of the load-balancing approach to optimization of the performance of parallel applications. First, we formulate conditions that should be satisfied by the performance profile of an application in order for the application to achieve its best performance via load balancing. Then we use a real-life scientific application, the EULAG MPDATA kernel, to demonstrate that its performance profile on a modern parallel architecture, Intel Xeon Phi, significantly deviates from these conditions. Based on this observation, we propose a method of performance optimization of scientific applications through load imbalancing. In the case of a data-parallel application, the method uses functional performance models of the application to find a partitioning that minimizes its computation time but does not necessarily balance the load of processors. We apply this method to optimization of MPDATA on Intel Xeon Phi. Experimental results demonstrate that the performance of this carefully optimized load-balanced application can be further improved by 15 percent using the proposed load-imbalancing technique.
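The idea that the fastest partitioning need not be the balanced one can be shown with a toy sketch. It is not the method from the paper: it replaces the functional performance models with two small hypothetical lookup tables (`TIME_A`, `TIME_B`) and finds both partitionings by exhaustive search. Processor A's table is deliberately non-monotonic, an assumption standing in for a performance profile that deviates from the conditions under which load balancing is optimal.

```python
from typing import Callable, Iterator, List

# Hypothetical profiles: TIME_X[x] is the time for processor X to handle
# x workload units. Processor A is slow at exactly 3 units (non-monotonic).
TIME_A = [0, 2, 4, 9, 5, 7, 8]
TIME_B = [0, 3, 6, 9, 12, 15, 18]


def compositions(n: int, p: int) -> Iterator[List[int]]:
    """Yield every way to split n workload units among p processors."""
    if p == 1:
        yield [n]
        return
    for x in range(n + 1):
        for rest in compositions(n - x, p - 1):
            yield [x] + rest


def times(dist: List[int], profiles: List[List[int]]) -> List[int]:
    """Per-processor execution times for a given distribution."""
    return [profile[x] for profile, x in zip(profiles, dist)]


def search(n: int, profiles: List[List[int]],
           score: Callable[[List[int]], int]) -> List[int]:
    """Return the distribution whose per-processor times minimize score."""
    return min(compositions(n, len(profiles)),
               key=lambda d: score(times(d, profiles)))


profiles = [TIME_A, TIME_B]
n = 6

# Load balancing: make per-processor times as equal as possible.
balanced = search(n, profiles, score=lambda t: max(t) - min(t))
# Load imbalancing: minimize the parallel computation time directly.
fastest = search(n, profiles, score=max)

balanced_time = max(times(balanced, profiles))
fastest_time = max(times(fastest, profiles))
print("balanced:", balanced, "time", balanced_time)
print("fastest: ", fastest, "time", fastest_time)
```

On these made-up tables the perfectly balanced split finishes later than an unbalanced one, because the balanced point happens to fall on the slow spot of processor A's profile. Exhaustive search is used only to keep the sketch short; it does not scale beyond a few processors and small workloads.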
Lecture Notes in Computer Science, 2022
Supercomputing frontiers and innovations, Dec 1, 2017
Springer eBooks, 2020
Energy is one of the most important objectives for optimization on modern heterogeneous high performance computing (HPC) platforms. The tight integration of multicore CPUs with accelerators in these platforms presents several challenges to optimization of multithreaded data-parallel applications for dynamic energy.