Chekuri Choudary - Academia.edu

Address: United States

Papers by Chekuri Choudary

Autotuning FPGA Design Parameters for Performance and Power

2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Direct numerical simulation of catalytic combustion in a multi-channel monolith reactor using personal computers with emerging architectures

Computational fluid dynamics modeling of full-scale monolithic catalytic reactors has remained elusive due to the extreme computational requirements. While simulation of full-scale catalytic reactors would require domain-decomposition-based parallelism and the use of multiple central processing units, significant performance enhancement can be achieved by fully utilizing the compute resources available within each node in emerging architectures. Here, a serial reacting flow solver was used as a starting point. Performance was enhanced using multi-threading to accelerate surface chemistry, material property calculations, and species equation solvers, and using graphics processing units to accelerate the linear solvers and preconditioners. Of the two test cases presented here, the larger entails steady-state calculations for catalytic methane-air combustion with 22 reaction steps and 19 species within a 13-channel catalytic monolith reactor discretized using 313,872 control volumes. For this test case, a speed-up factor of about 4.5 over serial calculations is noted.

Stencil-Aware GPU Optimization of Iterative Solvers

SIAM Journal on Scientific Computing, 2013

Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference stencil-based discretization of a problem, producing a sparse matrix with regular structure. Knowledge of this structure can be used to exploit parallelism and locality of reference on modern cache-based multi- and manycore architectures, achieving high performance for computations underlying commonly used iterative linear solvers. In this paper we describe our approach to sparse matrix data structure design and our implementation of the kernels underlying iterative linear solvers in PETSc. We also describe autotuning of CUDA implementations based on high-level descriptions of the stencil-based matrix and vector operations.
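To illustrate what "a sparse matrix with regular structure" can mean in practice, here is a minimal sketch of a matrix-vector product for a matrix stored by diagonals, using a 1D three-point stencil. This is an assumption-laden illustration only: it is not the data structure described in the paper, it does not use PETSc or CUDA, and the names `stencil_matvec`, `laplacian_1d`, and `dense_from_diagonals` are hypothetical.

```python
def stencil_matvec(offsets, diagonals, x):
    """Compute y = A @ x for a matrix stored by diagonals.

    offsets[k] is the column offset of the k-th diagonal (0 = main diagonal),
    and diagonals[k][i] holds the entry A[i][i + offsets[k]].
    """
    n = len(x)
    y = [0.0] * n
    for offset, diag in zip(offsets, diagonals):
        # Only rows whose column index i + offset lies inside [0, n) contribute.
        start = max(0, -offset)
        stop = min(n, n - offset)
        for i in range(start, stop):
            y[i] += diag[i] * x[i + offset]
    return y


def laplacian_1d(n):
    """Diagonals of the three-point stencil [-1, 2, -1] on n grid points."""
    offsets = [-1, 0, 1]
    diagonals = [[-1.0] * n, [2.0] * n, [-1.0] * n]
    return offsets, diagonals


def dense_from_diagonals(offsets, diagonals, n):
    """Expand to a dense matrix; used here only as a reference check."""
    a = [[0.0] * n for _ in range(n)]
    for offset, diag in zip(offsets, diagonals):
        for i in range(max(0, -offset), min(n, n - offset)):
            a[i][i + offset] = diag[i]
    return a


n = 5
offsets, diagonals = laplacian_1d(n)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = stencil_matvec(offsets, diagonals, x)
```

Because the nonzeros sit on a few fixed diagonals, no per-entry column indices need to be stored, and each diagonal is traversed contiguously. That regularity is the kind of property a stencil-aware kernel could exploit, although the actual design choices in the paper may differ.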
