Abal-Kassim Cheik Ahamed - Academia.edu
Papers by Abal-Kassim Cheik Ahamed
2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016
In this paper, we present, evaluate and analyse the performance of parallel synchronous Jacobi algorithms under different partitioning procedures, including band-row splitting, band-row sparsity pattern splitting and substructuring splitting, when solving large sparse linear systems. Numerical experiments performed on a set of academic 3D Laplace problems and on real gravity matrices arising from the Chicxulub crater are presented, and show the impact of the splitting on parallel synchronous iterations. The numerical results clearly show the advantage of substructuring methods over band-row splitting strategies.
Advances in computing power have driven many developments in science and its applications. The solution of linear systems arises frequently in scientific computing, for example when solving partial differential equations by the finite element method. The solution time then follows directly from the performance of the algebraic operations involved. This thesis aims to develop innovative parallel algorithms for solving large sparse linear systems. We study and propose ways to compute linear algebra operations efficiently on heterogeneous multi-core/GPU computing platforms, in order to optimise the solution of these systems and make it robust. We propose new acceleration techniques based on the automatic distribution (auto-tuning) of threads on the GPU grid according to the characteristics of the problem and the ...
2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016
In this paper, we present and analyze parallel substructuring methods based on the conjugate gradient method, an iterative Krylov method, for solving sparse linear systems on GPUs. Numerical experiments performed on a set of matrices coming from the finite element analysis of large-scale engineering problems show the efficiency and robustness of substructuring methods based on an iterative Krylov method for solving sparse linear systems in a hybrid multi-core-GPU context.
Advances in Engineering Software, 2016
Computational performance depends on the machine architecture, the operating system, the problem studied and, obviously, the implementation. Solving partial differential equations by numerical methods such as the finite element method requires the solution of large sparse linear systems. Graphics processing units (GPUs) are now commonly used to accelerate numerical simulations, and most supercomputers provide a large number of GPUs to their users. This paper compares the CUDA and OpenCL GPU languages with the aim of getting the highest performance out of multi-GPU clusters. We analyse, evaluate and compare their respective performance for computing linear algebra operations and for solving large sparse linear systems with the conjugate gradient iterative method on multi-GPU clusters.
Concurrency and Computation: Practice and Experience, 2015
To answer the question ‘How much energy is consumed for a numerical simulation running on a Graphics Processing Unit?’, an experimental protocol is established here. The current supplied to a graphics processing unit (GPU) during computation is measured directly using amperometric clamps. Signal processing on the intensity of the current supplied to the GPU, combined with a noise reduction technique, gives precise timing of the GPU states, which allows an energy consumption model of the GPU to be established. The energy consumption of each operation (memory copy, vector addition and element-wise product) is precisely measured to tune and validate the energy consumption model. The accuracy of the proposed model compared to measurements is finally illustrated on a conjugate gradient method for a problem discretized by a finite element method. Copyright © 2015 John Wiley & Sons, Ltd.
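As a rough illustration of how a sampled current signal can be turned into an energy figure, the sketch below integrates power over time with the trapezoidal rule. It is not the protocol of the paper: the constant 12 V supply voltage, the function name `energy_joules` and the sample values are all assumptions made up for this example.

```python
def energy_joules(times_s, currents_a, voltage_v=12.0):
    """Integrate P(t) = V * I(t) over time with the trapezoidal rule.

    times_s    -- sample times in seconds (increasing)
    currents_a -- measured current in amperes at each sample time
    voltage_v  -- supply voltage, assumed constant (hypothetical value)
    """
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        # Mean power over the interval between two consecutive samples.
        mean_power = voltage_v * 0.5 * (currents_a[k] + currents_a[k - 1])
        energy += mean_power * dt
    return energy


# Invented samples: idle, then a compute phase, then idle again.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
currents = [2.0, 10.0, 10.0, 10.0, 2.0]
total_energy = energy_joules(times, currents)  # 192.0 J for these samples
```

In practice the sampling rate and the noise filtering applied before integration would dominate the accuracy; neither is modelled here.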
The Journal of Supercomputing, 2016
In this paper, an original Jacobi implementation is considered for the solution of sparse linear systems of equations. The proposed algorithm helps to optimize the parallel implementation on GPU. The performance of the GPU-based implementation (using CUDA) is compared to that of the corresponding serial CPU-based algorithm. Numerical experiments performed on a set of matrices arising from the finite element discretization of various equations (3D Laplace equation, 3D gravitational potential equation, 3D heat equation) with different meshes illustrate the performance, robustness and efficiency of our algorithm, with a speed-up of up to 23× in double-precision arithmetic.
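For readers unfamiliar with the method, a minimal serial sketch of the classical Jacobi iteration follows. It is the textbook algorithm on a small dense matrix, not the GPU implementation described in the abstract; the function name, tolerance and test system are assumptions chosen for illustration.

```python
def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with the Jacobi iteration (A given as a list of rows)."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Each new component depends only on the previous iterate x, so the
        # n updates are independent of one another and can run in parallel.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when two successive iterates agree to within tol.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    return x, max_iter


# Small diagonally dominant system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iterations = jacobi(A, b)
```

The independence of the component updates is what makes the method attractive on massively parallel hardware, at the cost of slower convergence than Krylov methods on many problems.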
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
In this paper, we aim to introduce a new perspective when comparing highly parallelized algorithms on GPU: the energy consumption of the GPU. We give an analysis of the performance of linear algebra operations, including addition of vectors, element-wise product, dot product and sparse matrix-vector product, in order to validate our experimental protocol. We also analyze their use within the conjugate gradient method for solving the gravity equations on a Graphics Processing Unit (GPU). The Cusp library is considered and compared to our own implementation on a set of real matrices arising from the Chicxulub crater and obtained by the finite element discretization of the gravity equations. The experiments demonstrate the performance and robustness of our implementation in terms of energy efficiency.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
The main objective of this work is to analyze a sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations by methods such as finite elements, finite volumes and finite differences. Building on the success of general-purpose processing on graphics processing units (GPGPU), we develop a hybrid multi-GPU and CPU sub-structuring algorithm. GPU computing, with CUDA, is used to accelerate the operations performed on each processor. Numerical experiments have been performed on a set of matrices arising from engineering problems. We compare a C+MPI implementation on a classical CPU cluster with a C+MPI+CUDA implementation on a cluster of GPUs. The performance comparison shows a speed-up of up to 19 times in double precision for the sub-structuring method when using CUDA.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
In this paper, the authors propose an analysis of the frequency response function in a car compartment subject to a fluctuating pressure distribution along the open cavity of the sun-roof at the top of the car. The coupling of a computational fluid dynamics code and a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned for the Graphics Processing Unit (GPU), are considered to solve the acoustic problem in double-precision complex arithmetic. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
This paper gives an analysis and an evaluation of linear algebra operations on a Graphics Processing Unit (GPU) in double-precision complex arithmetic. Knowing the performance of these operations, iterative Krylov methods are considered to solve the acoustic problem efficiently. Numerical experiments carried out on a set of acoustic matrices arising from the modelling of acoustic phenomena within a cylinder and a car compartment are presented, exhibiting the performance, robustness and efficiency of our algorithms, with a ratio of up to 27× for the dot product and 10× for the sparse matrix-vector product and the solvers in double-precision complex arithmetic.
International Journal of High Performance Computing Applications, 2015
Direct and iterative methods are often used to solve linear systems in engineering. The matrices involved can be large, which leads to heavy computations on the central processing unit. A graphics processing unit can be used to accelerate these computations. In this paper, we propose a new library, named Alinea, for advanced linear algebra. This library is implemented in C++, CUDA and OpenCL. It includes several linear algebra operations and numerous algorithms for solving linear systems. For both central processing unit and graphics processing unit devices, it supports different matrix storage formats, and real and complex arithmetic in single and double precision. The CUDA version includes self-tuning of the grid, i.e. of the thread distribution, depending on the hardware configuration and the size of the problem. Numerical experiments and comparisons with existing libraries illustrate the efficiency, accuracy and robustness of the proposed library.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
Due to their highly parallel multi-core architecture, GPUs are increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can accelerate program execution further, and in an energy-efficient way. GPGPU computing is therefore useful for high performance computing applications and in many fields of scientific research. To improve performance further, GPU clusters are increasingly adopted. The energy consumed by GPUs cannot be neglected, however. An energy-efficient scheduling of the programs to be executed on the parallel GPUs, based on their deadlines and assigned priorities, could therefore be deployed to address their energy appetite. For this reason, we present in this paper a model for measuring the power consumption and the execution time of some elementary operations running on a single GPU, using a newly developed energy measurement protocol. Using our methodology, the energy needs of a program could thus be predicted, allowing better task scheduling.
Pollack Periodica, 2015
Engineering problems involve the solution of large sparse linear systems, and therefore require fast, high-performance algorithms for algebra operations such as the dot product and matrix-vector multiplication. During the last decade, graphics processing units have been widely used. In this paper, linear algebra operations on graphics processing units in single and double precision (with real and complex arithmetic) are analyzed in order to make iterative Krylov algorithms efficient compared to central processing unit implementations. The performance of the proposed method is evaluated for the Laplace and the Helmholtz equations. Numerical experiments clearly show the robustness and effectiveness of the graphics processing unit tuned algorithms for compressed sparse row data storage.
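To make the compressed sparse row (CSR) storage mentioned above concrete, here is a small serial sketch of a CSR matrix-vector product. The array names `values`, `col_idx` and `row_ptr` follow common convention and are assumptions, not names taken from the paper; the GPU-tuned kernels themselves are not reproduced.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in CSR format.

    values  -- nonzero entries, stored row by row
    col_idx -- column index of each entry in values
    row_ptr -- row_ptr[i]:row_ptr[i + 1] is the slice of values for row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# CSR form of the 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]].
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])  # [7.0, 6.0, 17.0]
```

Each row is processed independently, which is why a common GPU mapping assigns one thread (or one group of threads) per row; the irregular access to `x` through `col_idx` is typically the performance bottleneck.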
This paper presents the performance of linear algebra operations together with their use within iterative Krylov methods for solving the gravity equations on a Graphics Processing Unit (GPU). Numerical experiments performed on a set of real gravity matrices arising from the Chicxulub crater are presented, showing the performance, robustness and efficiency of our algorithms, with a speed-up of up to thirty in
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, 2012
Finite element analysis involves the solution of linear systems described by large sparse matrices. Iterative Krylov methods are well suited to this type of problem. These methods require linear algebra operations, including sparse matrix-vector multiplication, which can be computationally expensive for large matrices. In this paper, we present the best way to perform these operations, in double precision,
International Journal of Computer Mathematics, 2015
Many engineering and scientific problems require the solution of boundary value problems for partial differential equations or systems of them. In most cases, the only practical way to obtain the solution with the desired precision and in acceptable time is to harness the power of parallel processing. In this paper we present some effective applications of parallel processing based on a hybrid CPU/GPU domain decomposition method. Within the family of domain decomposition methods, the so-called optimised Schwarz methods have been shown to have good convergence behaviour compared to classical Schwarz methods. The price for this feature is the need to transfer more physical information between subdomain interfaces. For solving the large systems of linear algebraic equations resulting from the finite element discretisation of the subproblem on each subdomain, a Krylov method is often a good choice. Since the overall efficiency of such methods depends on an efficient sparse matrix-vector product, approaches that use the Graphics Processing Unit (GPU) instead of the Central Processing Unit (CPU) for this task look very promising. In this paper we discuss the effective implementation of algebraic operations for iterative Krylov methods on GPU. In order to ensure good performance for the non-overlapping Schwarz method, we propose to use optimised conditions obtained by a stochastic technique based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The performance, robustness and accuracy of the proposed approach are demonstrated on the solution of the gravitational potential equation for data acquired from the geological survey of the Chicxulub crater.
Lecture Notes in Computational Science and Engineering, 2014
2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science, 2013
In this paper, we solve the gravity equations on hybrid multi-CPU/GPU architectures using high-order finite elements. Domain decomposition methods are inherently parallel algorithms, making them excellent candidates for implementation on hybrid architectures. Here, we propose a new stochastic-based optimization procedure for the optimized Schwarz domain decomposition method, which is implemented and tuned for graphics processing units. To obtain high speed-up,
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, 2012
Many engineering and science problems require a computational effort to solve large sparse linear systems. Krylov subspace based iterative solvers have been widely used for this purpose. Iterative Krylov methods involve linear algebra operations such as summation of vectors, dot product, norm, and matrix-vector multiplication. Since these operations could be very costly in computation time on Central Processing Unit (CPU),
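The operations listed above (vector updates, dot product, norm, matrix-vector product) are exactly the building blocks of the conjugate gradient method, a standard Krylov solver for symmetric positive definite systems. The sketch below is the textbook serial algorithm on a dense matrix, given only to show where each operation occurs; the helper names and the 2x2 test system are assumptions, and nothing here reflects the authors' GPU code.

```python
def dot(u, v):
    """Dot product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:        # norm of the residual
            break
        Ap = matvec(A, p)              # matrix-vector product
        alpha = rs_old / dot(p, Ap)    # dot product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]     # vector update
        r = [ri - alpha * api for ri, api in zip(r, Ap)]  # vector update
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite system with solution (1/11, 7/11).
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Per iteration the cost is one matrix-vector product, two dot products and three vector updates, which is why the performance of these kernels largely determines the performance of the solver.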
2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016
In this paper, we present, evaluate and analyse the performance of parallel synchronous Jacobi al... more In this paper, we present, evaluate and analyse the performance of parallel synchronous Jacobi algorithms by different partitioned procedures including band-row splitting, band-row sparsity pattern splitting and substructuring splitting, when solving sparse large linear systems. Numerical experiments performed on a set of academic 3D Laplace equation and on a real gravity matrices arising from the Chicxulub crater are exhibited, and show the impact of splitting on parallel synchronous iterations when solving sparse large linear systems. The numerical results clearly show the interest of substructuring methods compared to band-row splitting strategies.
Les progres en termes de puissance de calcul ont entraine de nombreuses evolutions dans le domain... more Les progres en termes de puissance de calcul ont entraine de nombreuses evolutions dans le domaine de la science et de ses applications. La resolution de systemes lineaires survient frequemment dans le calcul scientifique, comme par exemple lors de la resolution d'equations aux derivees partielles par la methode des elements finis. Le temps de resolution decoule alors directement des performances des operations algebriques mises en jeu.Cette these a pour but de developper des algorithmes paralleles innovants pour la resolution de systemes lineaires creux de grandes tailles. Nous etudions et proposons comment calculer efficacement les operations d'algebre lineaire sur plateformes de calcul multi-coeur heterogenes-GPU afin d'optimiser et de rendre robuste la resolution de ces systemes. Nous proposons de nouvelles techniques d'acceleration basees sur la distribution automatique (auto-tuning) des threads sur la grille GPU suivant les caracteristiques du probleme et le ni...
2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016
In this paper, we present and analyze parallel substructuring methods based on conjugate gradient... more In this paper, we present and analyze parallel substructuring methods based on conjugate gradient method, a iterative Krylov method, for solving sparse linear systems on GPUs. Numerical experiments performed on a set of matrices coming from the finite element analysis of large scale engineering problems, show the efficiency and robustness of substructuring methods based on iterative Krylov method for solving sparse linear systems in a context of a hybrid multi-core-GPU.
Advances in Engineering Software, 2016
Abstract Performance computations depend on the machine architecture, the operating system, the p... more Abstract Performance computations depend on the machine architecture, the operating system, the problem studied and obviously on the programming implementation. Solving partial differential equations by numerical methods such as the finite element method requires the solution of large sparse linear systems. Graphics processing unit (GPU) is now commonly used to accelerate numerical simulations and most supercomputers provide large number of GPUs to their users. This paper proposes a comparison of both CUDA and OpenCL GPU languages to take the highest performance of multi-GPUs clusters. We analyse, evaluate and compare their respective performances for computing linear algebra operations and for solving large sparse linear systems with the conjugate gradient iterative method on multi-GPUs clusters.
Concurrency and Computation: Practice and Experience, 2015
To answer the question ‘How much energy is consumed for a numerical simulation running on Graphic... more To answer the question ‘How much energy is consumed for a numerical simulation running on Graphic Processing Unit?’, an experimental protocol is here established. The current provided to a graphic processing unit (GPU) during computation is directly measured using amperometric clamps. Signal processing on the intensity of the current of the power supplied to a GPU, with noise reduction technique, gives precise timing of GPU states, which allow establishing an energy consumption model of the GPU. Energy consumption of each operation, memory copy, vector addition, and element wise product is precisely measured to tune and validate the energy consumption model. The accuracy of the proposed energy consumption model compared to measurements is finally illustrated on a conjugate gradient method for a problem discretized by a finite element method. Copyright © 2015 John Wiley & Sons, Ltd.
The Journal of Supercomputing, 2016
In this paper, an original Jacobi implementation is considered for the solution of sparse linear ... more In this paper, an original Jacobi implementation is considered for the solution of sparse linear systems of equations. The proposed algorithm helps to optimize the parallel implementation on GPU. The performance analysis of GPU-based (using CUDA) algorithm of the implementation of this algorithm is compared to the corresponding serial CPU-based algorithm. Numerical experiments performed on a set of matrices arising from the finite element discretization of various equations (3D Laplace equation, 3D gravitational potential equation, 3D Heat equation) with different meshes, illustrate the performance, robustness and efficiency of our algorithm, with a speed up to 23$$\times × in double-precision arithmetics.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
In this paper, we aim to introduce a new perspective when comparing highly parallelized algorithm... more In this paper, we aim to introduce a new perspective when comparing highly parallelized algorithms on GPU: the energy consumption of the GPU. We give an analysis of the performance of linear algebra operations, including addition of vectors, element-wise product, dot product and sparse matrix-vector product, in order to validate our experimental protocol. We also analyze their uses within conjugate gradient method for solving the gravity equations on Graphics Processing Unit (GPU). Cusp library is considered and compared to our own implementation with a set of real matrices arrising from the Chicxulub crater and obtained by the finite element discretization of the gravity equations. The experiments demonstrate the performance and robustness of our implementation in terms of energy efficiency.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
The main objective of this work consists in analyzing sub-structuring method for the parallel sol... more The main objective of this work consists in analyzing sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite volume and finite difference. With the success encountered by the general-purpose processing on graphics processing units (GPGPU), we develop an hybrid multiGPUs and CPUs sub-structuring algorithm. GPU computing, with CUDA, is used to accelerate the operations performed on each processor. Numerical experiments have been performed on a set of matrices arising from engineering problems. We compare C+MPI implementation on classical CPU cluster with C+MPI+CUDA on a cluster of GPU. The performance comparison shows a speed-up for the sub-structuring method up to 19 times in double precision by using CUDA.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
In this paper, the authors propose an analysis of the frequency response function in a car compar... more In this paper, the authors propose an analysis of the frequency response function in a car compartment, subject to some fluctuating pressure distribution along the open cavity of the sun-roof at the top of a car. Coupling of a computational fluid dynamics and of a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned on Graphic Processing Unit (GPU), are considered to solve the acoustic problem with complex number arithmetics with double precision. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
This paper gives an analysis and an evaluation of linear algebra operations on Graphics Processin... more This paper gives an analysis and an evaluation of linear algebra operations on Graphics Processing Unit (GPU) with complex number arithmetics with double precision. Knowing the performance of these operations, iterative Krylov methods are considered to solve the acoustic problem efficiently. Numerical experiments carried out on a set of acoustic matrices arising from the modelisation of acoustic phenomena within a cylinder and a car compartment are exposed, exhibiting the performance, robustness and efficiency of our algorithms, with a ratio up to 27× for dot product, 10× for sparse matrix-vector product and solvers in complex double precision arithmetics.
International Journal of High Performance Computing Applications, 2015
ABSTRACT Direct and iterative methods are often used to solve linear systems in engineering. The ... more ABSTRACT Direct and iterative methods are often used to solve linear systems in engineering. The matrices involved can be large, which leads to heavy computations on the central processing unit. A graphics processing unit can be used to accelerate these computations. In this paper, we propose a new library, named Alinea, for advanced linear algebra. This library is implemented in C++, CUDA and OpenCL. It includes several linear algebra operations and numerous algorithms for solving linear systems. For both central processing unit and graphic processing unit devices, there are different matrix storage formats, and real and complex arithmetics in single- and double-precision. The CUDA version includes a self-tuning of the grid, i.e. threading distribution, depending upon the hardware configuration and the size of the problems. Numerical experiments and comparison with existing libraries illustrates the efficiency, accuracy and robustness of the proposed library.
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014
Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide... more Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs' execution in an energyefficient way. Therefore GPGPU computing is useful for high performance computing applications and in many scientific research fields. In order to bring further performance improvements, GPU clusters are increasingly adopted. The energy consumed by GPUs cannot be neglected. Therefore, an energy-efficient time scheduling of the programs that are going to be executed by the parallel GPUs based on their deadline as well as the assigned priorities could be deployed to face their energetic avidity. For this reason, we present in this paper a model enabling the measure of the power consumption and the time execution of some elementary operations running on a single GPU using a new developed energy measurement protocol. Consequently, using our methodology, energy needs of a program could be predicted, allowing a better task scheduling.
Pollack Periodica, 2015
Engineering problems involve the solution of large sparse linear systems, and require therefore f... more Engineering problems involve the solution of large sparse linear systems, and require therefore fast and high performance algorithms for algebra operations such as dot product, and matrix-vector multiplication. During the last decade, graphics processing units have been widely used. In this paper, linear algebra operations on graphics processing unit for single and double precision (with real and complex arithmetic) are analyzed in order to make iterative Krylov algorithms efficient compared to central processing units implementation. The performance of the proposed method is evaluated for the Laplace and the Helmholtz equations. Numerical experiments clearly show the robustness and effectiveness of the graphics processing unit tuned algorithms for compressed-sparse row data storage.
This paper presents the performance of linear algebra operations together with their uses within ... more This paper presents the performance of linear algebra operations together with their uses within iterative Krylov methods for solving the gravity equations on Graphics Processing Unit (GPU). Numerical experiments performed on a set of real gravity matrices arising from the Chicxulub crater are exposed, showing the performance, robustness andefficiency of our algorithms, with a speed-up of up to thirty in
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, 2012
Finite element analysis involves the solution of linear systems described by large sparse matrices. Iterative Krylov methods are well suited to this type of problem. These methods require linear algebra operations, including sparse matrix-vector multiplication, which can be computationally expensive for large matrices. In this paper, we present the best way to perform these operations, in double precision,
International Journal of Computer Mathematics, 2015
Many engineering and scientific problems require solving boundary value problems for partial differential equations or systems of them. In most cases, the only practical way to obtain the solution with the desired precision and in acceptable time is to harness the power of parallel processing. In this paper we present some effective applications of parallel processing based on a hybrid CPU/GPU domain decomposition method. Within the family of domain decomposition methods, the so-called optimised Schwarz methods have proven to have good convergence behaviour compared to classical Schwarz methods. The price for this feature is the need to transfer more physical information between subdomain interfaces. For solving the large systems of linear algebraic equations resulting from the finite element discretisation of the subproblem on each subdomain, a Krylov method is often a good choice. Since the overall efficiency of such methods depends on the effective calculation of the sparse matrix-vector product, approaches that use the Graphics Processing Unit (GPU) instead of the Central Processing Unit (CPU) for this task look very promising. In this paper we discuss the effective implementation of algebraic operations for iterative Krylov methods on GPU. In order to ensure good performance for the non-overlapping Schwarz method, we propose to use optimised conditions obtained by a stochastic technique based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The performance, robustness, and accuracy of the proposed approach are demonstrated for the solution of the gravitational potential equation for the data acquired from the geological survey of the Chicxulub crater.
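To make concrete why the matrix-vector product dominates the cost of a Krylov solver, here is a minimal sketch of the conjugate gradient method. Conjugate gradient is picked only as one representative Krylov method for symmetric positive definite systems; the abstract does not say which solver is used. The names `conjugate_gradient`, `axpy`, `dense_matvec` and the 2x2 test system are assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y as a new vector."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) <= tol:   # residual norm small enough
            break
        Ap = matvec(p)                 # the one matrix-vector product per iteration
        alpha = rs_old / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)
        rs_old = rs_new
    return x


def dense_matvec(v):
    """Hypothetical 2x2 SPD test matrix A = [[4, 1], [1, 3]] applied to v."""
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


solution = conjugate_gradient(dense_matvec, [1.0, 2.0])
print(solution)  # approximately [1/11, 7/11]
```

Every iteration consists of one matrix-vector product, two dot products and three vector updates, so the solver only ever touches the matrix through `matvec`. Swapping in a sparse or GPU-backed product therefore leaves the rest of the loop unchanged.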
Lecture Notes in Computational Science and Engineering, 2014
2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science, 2013
In this paper, we solve the gravity equations on hybrid multi-CPU/GPU architectures using high order finite elements. Domain decomposition methods are inherently parallel algorithms, making them excellent candidates for implementation on hybrid architectures. Here, we propose a new stochastic-based optimization procedure for the optimized Schwarz domain decomposition method, which is implemented and tuned for graphics processing units. To obtain high speed-up,
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, 2012
Many engineering and science problems require a computational effort to solve large sparse linear systems. Krylov subspace based iterative solvers have been widely used in that direction. Iterative Krylov methods involve linear algebra operations such as summation of vectors, dot product, norm, and matrix-vector multiplication. Since these operations could be very costly in computation time on Central Processing Unit (CPU),