Chekuri Choudary - Academia.edu
Address: United States
Papers by Chekuri Choudary
2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, 2015
Computational fluid dynamics (CFD) modeling of full-scale monolithic catalytic reactors has remained elusive because of its extreme computational requirements. While simulation of full-scale catalytic reactors would require domain-decomposition-based parallelism across multiple central processing units, significant performance gains can be achieved by fully utilizing the compute resources available within each node of emerging architectures. Here, a serial reacting-flow solver was used as the starting point. Performance was enhanced by using multi-threading to accelerate the surface chemistry, material property calculations, and species equation solvers, and by using graphics processing units (GPUs) to accelerate the linear solvers and preconditioners. Of the two test cases presented, the larger entails steady-state calculations for catalytic methane-air combustion with 22 reaction steps and 19 species in a 13-channel catalytic monolith reactor discretized into 313,872 control volumes. For this test case, a speed-up of about 4.5 over serial calculations is observed.
SIAM Journal on Scientific Computing, 2013
Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference stencil-based discretization of a problem, producing a sparse matrix with regular structure. Knowledge of this structure can be used to exploit parallelism and locality of reference on modern cache-based multi- and many-core architectures, achieving high performance for the computations underlying commonly used iterative linear solvers. In this paper we describe our approach to sparse matrix data structure design and our implementation of the kernels underlying iterative linear solvers in PETSc. We also describe autotuning of CUDA implementations based on high-level descriptions of the stencil-based matrix and vector operations.
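The point that a stencil discretization yields a sparse matrix with regular structure can be illustrated with a small sketch. It assumes a 2D five-point Laplacian on an nx-by-ny grid with zero Dirichlet boundaries (an illustrative choice, not one of the paper's test problems). Because every row couples to the same fixed offsets, the matrix can be stored as a few full-length diagonals, so the matrix-vector product needs no column-index array and walks through contiguous memory. The function names are hypothetical, and this generic diagonal layout is a stand-in, not the PETSc data structure or autotuned CUDA kernels the paper describes.

```python
"""Sketch: a 5-point stencil matrix stored by diagonals (DIA-style).

Assumption (illustration only): 2D Laplacian on an nx-by-ny grid with zero
Dirichlet boundaries. Names are hypothetical, not PETSc APIs.
"""
from typing import List, Tuple


def build_stencil_diagonals(nx: int, ny: int) -> Tuple[List[int], List[List[float]]]:
    """Return (offsets, diagonals) for the 5-point Laplacian.

    Row i = iy*nx + ix couples to i, i-1, i+1, i-nx, i+nx. Couplings that
    would leave the grid are stored as 0.0, so every diagonal has length
    n = nx*ny and no column indices are stored.
    """
    n = nx * ny
    offsets = [-nx, -1, 0, 1, nx]
    diags = [[0.0] * n for _ in offsets]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            diags[2][i] = 4.0            # center coefficient
            if iy > 0:
                diags[0][i] = -1.0       # neighbor one grid row below
            if ix > 0:
                diags[1][i] = -1.0       # west neighbor
            if ix < nx - 1:
                diags[3][i] = -1.0       # east neighbor (no wrap to next row)
            if iy < ny - 1:
                diags[4][i] = -1.0       # neighbor one grid row above
    return offsets, diags


def dia_matvec(offsets: List[int], diags: List[List[float]], x: List[float]) -> List[float]:
    """Compute y = A x one diagonal at a time (unit-stride access to x and y)."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diags):
        # Only rows whose neighbor index i + off lies in [0, n) contribute.
        lo = max(0, -off)
        hi = min(n, n - off)
        for i in range(lo, hi):
            y[i] += diag[i] * x[i + off]
    return y


if __name__ == "__main__":
    offsets, diags = build_stencil_diagonals(3, 3)
    # With x = 1 everywhere, each entry is 4 minus the number of neighbors.
    print(dia_matvec(offsets, diags, [1.0] * 9))
    # → [2.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 2.0]
```

Storing explicit zeros for out-of-domain couplings wastes a little memory near the boundary, but it keeps every diagonal the same length, which is what lets a loop (or a GPU thread per row) run without any index indirection.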