Multiple threads and parallel challenges for large simulations to accelerate a general Navier-Stokes CFD code on massively parallel systems

Large-Scale CFD Parallel Computing Dealing with Massive Mesh

Journal of Engineering, 2013

In order to run CFD codes efficiently at large scale, parallel computing has to be employed. For example, industrial-scale simulations usually use tens of thousands of mesh cells to capture the details of complex geometries. Distributing these mesh cells among multiple processors so as to obtain good parallel performance (HPC) is a real challenge. Because of the massive number of mesh cells, CFD codes without parallel optimizations have difficulty handling this kind of large-scale computation. Several open-source mesh partitioning packages, such as Metis, ParMetis, Scotch, PT-Scotch, and Zoltan, are able to handle the distribution of large numbers of mesh cells. They were therefore ported into Code_Saturne, an open-source CFD code, as parallel optimization tools, to test whether they can solve the problem of handling massive numbers of mesh cells in CFD codes. Through these studies, it was found that the mesh partitioning optimization so...
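Metis, ParMetis, Scotch and Zoltan implement far more sophisticated partitioners than can be shown here. As a much simpler stand-in, the sketch below uses recursive coordinate bisection (RCB), a geometric method that Zoltan also offers, to split cell centroids into parts of nearly equal size. The function name `rcb_partition` and the centroid layout are hypothetical. Unlike graph partitioners such as Metis, RCB ignores cell connectivity, so it balances cell counts but does not explicitly minimize the number of faces cut between parts.

```python
def rcb_partition(points, n_parts):
    """Recursive coordinate bisection: assign each point (e.g. a cell centroid)
    to one of n_parts parts of (nearly) equal size. Assumes n_parts <= len(points)."""
    part = [0] * len(points)

    def split(indices, first_part, parts):
        if parts == 1:
            for i in indices:
                part[i] = first_part
            return
        # Cut across the coordinate axis along which this subset is longest.
        dims = len(points[indices[0]])
        extents = [max(points[i][d] for i in indices) - min(points[i][d] for i in indices)
                   for d in range(dims)]
        axis = extents.index(max(extents))
        ordered = sorted(indices, key=lambda i: points[i][axis])
        # Split the cell count in proportion to the number of parts on each side.
        left_parts = parts // 2
        cut = len(ordered) * left_parts // parts
        split(ordered[:cut], first_part, left_parts)
        split(ordered[cut:], first_part + left_parts, parts - left_parts)

    split(list(range(len(points))), 0, n_parts)
    return part


# Usage: 100 cell centroids on a 10 x 10 grid split into 4 parts.
centroids = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
owner = rcb_partition(centroids, 4)
print([owner.count(k) for k in range(4)])  # [25, 25, 25, 25]
```

Because each cut is taken along the longest axis, the resulting parts stay spatially compact, which indirectly keeps inter-processor boundaries short.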

Design of a massively parallel CFD code for complex geometries

Comptes Rendus Mécanique, 2011

A strategy to build the next generation of fluid dynamics solvers able to fully benefit from high-performance computing is discussed. The procedure relies on a domain decomposition of unstructured meshes organized in two levels. At the elementary level, computing cells are gathered into cell groups; at the second level, cell groups are dispatched over processors. Compared with the usual single-level domain decomposition, this double domain decomposition makes it easy to optimize the use of processor memory and, therefore, load balancing in both Eulerian and Lagrangian contexts. Specific communication procedures for handling faces, edges and nodes are associated with this double domain decomposition and strongly reduce the computing cost; input-output times are optimized as well. In addition, multi-level solution techniques, such as the deflated preconditioned conjugate gradient, are well adapted to this mesh decomposition. This approach has been used to develop the YALES2 code, which also benefits from a non-degenerating tessellation algorithm for tetrahedra to automatically generate high-resolution meshes on supercomputers. To illustrate the capabilities of the YALES2 algorithms, an aeronautical burner is fully simulated with a mesh of 2.6 billion cells, followed by a demonstration test over 21 billion cells.
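The two-level idea can be illustrated with a toy sketch that is not YALES2 code. Cells, assumed to be already numbered so that consecutive indices are spatially close, are first packed into fixed-size groups; whole groups, rather than individual cells, are then assigned to processors with a greedy longest-processing-time rule. The function name, `group_size`, the per-cell cost array and the greedy rule are all illustrative assumptions.

```python
def two_level_decomposition(n_cells, group_size, n_procs, cell_cost=None):
    """Level 1: pack consecutive cells into groups of at most group_size cells.
    Level 2: assign whole groups to processors with a greedy
    longest-processing-time rule. Returns (groups, owner_of_group, load_per_proc)."""
    if cell_cost is None:
        cell_cost = [1.0] * n_cells
    groups = [list(range(s, min(s + group_size, n_cells)))
              for s in range(0, n_cells, group_size)]
    group_cost = [sum(cell_cost[c] for c in g) for g in groups]

    load = [0.0] * n_procs
    owner = [0] * len(groups)
    # Place the most expensive groups first, each on the currently lightest processor.
    for g in sorted(range(len(groups)), key=lambda k: -group_cost[k]):
        p = load.index(min(load))
        owner[g] = p
        load[p] += group_cost[g]
    return groups, owner, load


groups, owner, load = two_level_decomposition(1000, 64, 4)
print(len(groups), load)  # 16 [256.0, 256.0, 256.0, 232.0]
```

In this sketch, rebalancing only ever moves whole groups, so the group size sets the granularity of load balancing, while each group is small enough to be processed as a compact block in memory.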

MPI-Parallelization of a Structured Grid CFD Solver including an Integrated Octree Grid Generator

An existing Computational Fluid Dynamics (CFD) solver is parallelized using MPI. The solver includes a dynamic, adaptive grid generator for Cartesian quadtree and octree grids, which therefore also has to be parallelized. The grid generator produces grids that satisfy a specific set of rules, and these rules must also be enforced in parallel. The large sparse matrices resulting from the implicit discretization of the Navier-Stokes equations are assembled in parallel, and the resulting systems are also solved in parallel. The parallel performance of both processes depends heavily on good load balancing if a satisfactory speedup is to be reached. Two versions of load balancing are demonstrated: one based on block swapping, and one using the Metis or ParMetis packages for graph partitioning. Results are presented for the load balancing and for the parallel speedup of solving the linear system of equations.
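The dependence of speedup on load balance can be quantified with two simple metrics: the imbalance factor (maximum load over mean load) and the speedup bound it implies when communication is ignored. The `migrate_blocks` function is a hypothetical greedy heuristic, not the block-swapping scheme of the paper, whose details the abstract does not give; it moves the cheapest block from the most loaded to the least loaded rank for as long as the move helps.

```python
def load_imbalance(loads):
    """Ratio of the maximum to the mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


def speedup_bound(loads):
    """Speedup upper bound ignoring communication: the most loaded rank sets the pace.
    Equivalent to len(loads) / load_imbalance(loads)."""
    return sum(loads) / max(loads)


def migrate_blocks(blocks_per_rank):
    """Hypothetical greedy heuristic: repeatedly move the smallest block from the
    most loaded rank to the least loaded rank while the receiver stays below the
    current maximum. Block costs are assumed to be positive."""
    ranks = [sorted(b) for b in blocks_per_rank]
    while True:
        loads = [sum(b) for b in ranks]
        hi = loads.index(max(loads))
        lo = loads.index(min(loads))
        if not ranks[hi] or ranks[hi][0] <= 0:
            break
        block = ranks[hi][0]
        if loads[lo] + block >= loads[hi]:
            break  # moving this block would not improve the balance
        ranks[hi].pop(0)
        ranks[lo].append(block)
        ranks[lo].sort()
    return ranks


print(round(load_imbalance([100, 100, 100, 140]), 3),
      round(speedup_bound([100, 100, 100, 140]), 3))  # 1.273 3.143
after = migrate_blocks([[10, 10, 10, 10], [10], [10, 10]])
print([sum(b) for b in after])  # [30, 20, 20]
```

With one rank carrying 40% more work than the others, four ranks can deliver at most about 3.14 times the serial speed, even with free communication.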

High performance parallel computing of flows in complex geometries

Comptes Rendus Mécanique, 2011

Keywords: Computer science, algorithmics; Parallel computing; Computational fluid dynamics.

Efficient numerical tools that take advantage of the ever-increasing power of high-performance computers have become key elements in the fields of energy supply and transportation, not only from a purely scientific point of view but also at the design stage in industry. Indeed, flow phenomena that occur in or around industrial applications such as gas turbines or aircraft are still not mastered. In fact, most Computational Fluid Dynamics (CFD) predictions produced today focus on reduced or simplified versions of the real systems and are usually solved under a steady-state assumption. This article shows how recent developments in CFD codes and parallel computer architectures can help overcome this barrier. In this new environment, new scientific and technological challenges can be addressed, provided that thousands of computing cores are used efficiently in parallel. Strategies of modern flow solvers are discussed, with particular emphasis on mesh partitioning, load balancing and communication. These concepts are used in two CFD codes developed by CERFACS: a multi-block structured code dedicated to aircraft and turbomachinery, and an unstructured code for gas turbine flow predictions. Leading-edge computations obtained with these high-end massively parallel CFD codes are illustrated and discussed in the context of aircraft, turbomachinery and gas turbine applications. Finally, future developments of CFD and high-end computers are proposed to provide leading-edge tools and end applications with strong industrial implications at the design stage of the next generation of aircraft and gas turbines.
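The communication pattern behind domain decomposition can be sketched with a generic ghost-cell (halo) exchange; this is not CERFACS code. A 1D field is split among simulated ranks, each local array carries one ghost cell per side, and before every explicit diffusion step the ghosts are filled from neighbouring ranks, as an MPI send/receive pair would do. All names are hypothetical, and zero Dirichlet values are assumed at the physical boundaries.

```python
def decompose(field, n_ranks):
    """Split field among n_ranks; each local array gets one ghost cell per side."""
    n = len(field)
    bounds = [n * r // n_ranks for r in range(n_ranks + 1)]
    return [[0.0] + field[bounds[r]:bounds[r + 1]] + [0.0] for r in range(n_ranks)]


def exchange_halos(local):
    """Fill ghost cells from neighbouring ranks (stands in for MPI messages)."""
    for r in range(len(local)):
        if r > 0:
            local[r][0] = local[r - 1][-2]   # left ghost <- last owned cell of left neighbour
        if r < len(local) - 1:
            local[r][-1] = local[r + 1][1]   # right ghost <- first owned cell of right neighbour


def diffuse_step(local, alpha):
    """One explicit diffusion step on every rank, after a halo exchange."""
    exchange_halos(local)
    new = []
    for arr in local:
        out = arr[:]
        for i in range(1, len(arr) - 1):
            out[i] = arr[i] + alpha * (arr[i - 1] - 2 * arr[i] + arr[i + 1])
        new.append(out)
    return new


def serial_step(u, alpha):
    """Reference serial step with zero boundary values."""
    p = [0.0] + u + [0.0]
    return [p[i] + alpha * (p[i - 1] - 2 * p[i] + p[i + 1]) for i in range(1, len(p) - 1)]


field = [float(i * (19 - i)) for i in range(19)]
local, u = decompose(field, 4), field[:]
for _ in range(5):
    local, u = diffuse_step(local, 0.25), serial_step(u, 0.25)
gathered = [x for arr in local for x in arr[1:-1]]
print(gathered == u)  # True
```

Because every rank sees exactly the neighbour values the serial code would use, the decomposed run reproduces the serial result; only the ghost cells, not whole subdomains, need to be communicated.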

Parallel Computing on the Navier-Stokes Solver with the Multigrid Method

This paper presents the combination of parallel computing and the multigrid method in a Navier-Stokes solver. The combination is based on object-oriented programming (OOP) and consists of four independent modules: Grid Generation, Navier-Stokes Solver, Multigrid Method and Parallel Computing. The multigrid method is implemented with the full approximation storage (FAS) scheme to solve the non-linear Navier-Stokes equations numerically. The overall computation uses parallel computing, in which several computers work concurrently on the same task, each on a different subset of the data. The two-dimensional laminar flow in a cavity at Re = 1,000 is used as a test case. The computational time is found to decrease significantly when the multigrid method and parallel computing are combined.
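The FAS idea can be shown on a much simpler non-linear model problem than Navier-Stokes. The sketch below assumes the 1D equation -u'' + u^3 = f on [0, 1] with u(0) = u(1) = 0, pointwise Newton-Gauss-Seidel smoothing, injection of the solution, full-weighting restriction of the residual and linear interpolation of the correction. The coarse grid solves A_c(u_c) = A_c(R u) + R(f - A(u)) for the full solution, not just the error, which is what distinguishes FAS from the linear correction scheme. All function names and the model problem are illustrative.

```python
import math


def apply_op(u, h):
    """A(u)_i = (-u[i-1] + 2u[i] - u[i+1]) / h^2 + u[i]^3 at interior points."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (-u[i - 1] + 2 * u[i] - u[i + 1]) / h**2 + u[i] ** 3
    return out


def smooth(u, f, h, sweeps):
    """Pointwise Newton-Gauss-Seidel sweeps (updates u in place)."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            r = f[i] - ((-u[i - 1] + 2 * u[i] - u[i + 1]) / h**2 + u[i] ** 3)
            u[i] += r / (2 / h**2 + 3 * u[i] ** 2)
    return u


def restrict(v):
    """Full weighting onto the coarse grid (boundary values set to zero)."""
    nc = (len(v) - 1) // 2
    vc = [0.0] * (nc + 1)
    for j in range(1, nc):
        vc[j] = 0.25 * (v[2 * j - 1] + 2 * v[2 * j] + v[2 * j + 1])
    return vc


def prolong(ec):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        e[2 * j] = ec[j]
    for j in range(nc):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e


def fas_vcycle(u, f, h, pre=2, post=2):
    n = len(u) - 1
    if n <= 2:                      # coarsest grid: one unknown, iterate to convergence
        return smooth(u, f, h, 50)
    smooth(u, f, h, pre)
    Au = apply_op(u, h)
    r = [f[i] - Au[i] for i in range(n + 1)]
    uc0 = u[::2]                    # inject the current solution
    Auc0 = apply_op(uc0, 2 * h)
    rc = restrict(r)
    fc = [Auc0[j] + rc[j] for j in range(len(uc0))]   # FAS coarse right-hand side
    uc = fas_vcycle(list(uc0), fc, 2 * h, pre, post)
    e = prolong([uc[j] - uc0[j] for j in range(len(uc0))])
    for i in range(n + 1):
        u[i] += e[i]
    return smooth(u, f, h, post)


def residual_norm(u, f, h):
    Au = apply_op(u, h)
    return max(abs(f[i] - Au[i]) for i in range(1, len(u) - 1))


if __name__ == "__main__":
    n = 64
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    # Manufactured right-hand side for the exact solution u = sin(pi x).
    f = [math.pi**2 * math.sin(math.pi * x) + math.sin(math.pi * x) ** 3 for x in xs]
    u = [0.0] * (n + 1)
    for k in range(6):
        u = fas_vcycle(u, f, h)
        print(k, residual_norm(u, f, h))
```

Each V-cycle should reduce the residual by a roughly constant factor, independent of the grid size, which is the property that makes multigrid attractive to combine with parallel computing.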

A space-time parallel algorithm with adaptive mesh refinement for computational fluid dynamics

Computing and Visualization in Science, 2020

This paper describes a space-time parallel algorithm with space-time adaptive mesh refinement (AMR). AMR with subcycling is added to multigrid reduction-in-time (MGRIT) in order to provide solution-efficient adaptive grids and to reduce the work performed on coarser grids. The algorithm is obtained by integrating two software libraries: XBraid (Parallel time integration with multigrid. https://computation.llnl.gov/projects/parallel-timeintegration-multigrid) and Chombo (Chombo software package for AMR applications-design document, 2014). The former is a parallel time integration library based on multigrid, and the latter is a massively parallel structured AMR library. This adaptive space-time parallel algorithm is employed in Chord (Comput Fluids 123:202-217, 2015), a computational fluid dynamics (CFD) application code for solving compressible fluid dynamics problems. For the same solution accuracy, speedups from space-time parallelization over time-sequential integration are demonstrated on Couette flow and Stokes' second problem. On a transient Couette flow case, a speedup of at least 1.5× is achieved, and on a time-periodic problem, a speedup of up to 13.7× over the time-sequential case is obtained. In both cases, the speedup is achieved by adding processors and exploiting additional parallelism in time. The numerical experiments show that the algorithm is promising for CFD applications that can take advantage of time parallelism. Future work will focus on improving the parallel performance and providing more tests with complex fluid dynamics to demonstrate the full potential of the algorithm.

Keywords: Time-parallel; Mesh parallel-in-time; Adaptivity; Multigrid; MGRIT; High-order CFD; Finite-volume.

Communicated by Robert Speck.
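MGRIT as implemented in XBraid is too involved for a short example, but its two-level variant with F-relaxation is known to be equivalent to the Parareal algorithm, which is sketched below instead. It assumes the scalar test equation du/dt = lam * u, a coarse propagator G made of one backward Euler step per time slice, and a fine propagator F made of `m_fine` smaller backward Euler steps. This is not the XBraid API; all names are hypothetical.

```python
import math


def backward_euler(u, lam, dt, steps):
    """Integrate du/dt = lam * u with implicit Euler steps."""
    for _ in range(steps):
        u = u / (1.0 - lam * dt)
    return u


def parareal(u0, lam, T, n_slices, m_fine, iters):
    dT = T / n_slices
    fine = lambda u: backward_euler(u, lam, dT / m_fine, m_fine)   # expensive, parallel over slices
    coarse = lambda u: backward_euler(u, lam, dT, 1)               # cheap, sequential
    U = [u0]
    for n in range(n_slices):                  # initial guess from the coarse solver
        U.append(coarse(U[n]))
    for _ in range(iters):
        F = [fine(U[n]) for n in range(n_slices)]        # independent: one processor per slice
        G_old = [coarse(U[n]) for n in range(n_slices)]
        new = [u0]
        for n in range(n_slices):              # sequential coarse sweep with correction
            new.append(coarse(new[n]) + F[n] - G_old[n])
        U = new
    return U


lam, T, N, m = -1.0, 1.0, 8, 20
serial = backward_euler(1.0, lam, (T / N) / m, N * m)
for k in (1, 2, 4):
    U = parareal(1.0, lam, T, N, m, k)
    print(k, abs(U[-1] - serial), abs(U[-1] - math.exp(-1.0)))
```

After k iterations the first k time slices match the serial fine solution exactly (up to rounding), so the method never needs more than `n_slices` iterations; speedup comes from converging in far fewer while the fine solves run concurrently.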

Comparing Performance of Parallelizing Frameworks for Grid-Based Fluid Simulation on the CPU

Proceedings of the 8th Annual ACM India Conference, 2015

In this paper we present a comparison of two widely used CPU parallelization frameworks, OpenMP and Intel Threading Building Blocks (TBB). The problem domain is a grid-based fluid simulation solver. The standard Eulerian grid-based fluid solver discretizes the Navier-Stokes equations on a staggered but regular grid and computes fluid quantities such as velocity and pressure in each grid cell. We use OpenMP and TBB to parallelize this computation and study the behaviour of our implementation with each framework for different numbers of threads and CPU cores. We provide arguments in support of a mixed strategy that uses both frameworks together, which improves performance over using either one in isolation.
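The staggered (MAC) layout and the row-wise work splitting that an OpenMP static schedule or a TBB `parallel_for` would apply can be sketched as follows. The sketch assumes x-velocities stored on vertical cell faces, y-velocities on horizontal faces, and a cell-centred divergence; function names are hypothetical. Python's `ThreadPoolExecutor` only stands in for the thread pool: in CPython the global interpreter lock prevents real speedup for this pure-Python loop, so the point is the chunked decomposition, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def divergence_rows(u, v, dx, dy, nx, j_start, j_end):
    """Cell-centred divergence for rows j_start..j_end-1.
    u[j][i]: x-velocity on vertical faces, ny rows of nx+1 values.
    v[j][i]: y-velocity on horizontal faces, ny+1 rows of nx values."""
    return [[(u[j][i + 1] - u[j][i]) / dx + (v[j + 1][i] - v[j][i]) / dy
             for i in range(nx)]
            for j in range(j_start, j_end)]


def divergence(u, v, dx, dy, nx, ny, n_workers=1):
    # Contiguous blocks of rows, one per worker (like a static schedule).
    chunks = [(ny * k // n_workers, ny * (k + 1) // n_workers) for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: divergence_rows(u, v, dx, dy, nx, *c), chunks))
    return [row for part in parts for row in part]


nx, ny = 8, 6
dx, dy = 1.0 / nx, 1.0 / ny
# Face velocities of the divergence-free field (u, v) = (x, -y).
u = [[i * dx for i in range(nx + 1)] for j in range(ny)]
v = [[-j * dy for i in range(nx)] for j in range(ny + 1)]
div = divergence(u, v, dx, dy, nx, ny, n_workers=3)
print(max(abs(d) for row in div for d in row))
```

Each row of cells reads only its own faces plus the faces directly above it, so the rows can be processed independently without locks, which is why this loop maps naturally onto either framework.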

Running unstructured grid-based CFD solvers on modern graphics hardware

International Journal for Numerical Methods in Fluids, 2011

Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as by using an appropriate numbering scheme and data layout. The applicability of per-block shared memory is also considered. The performance of the solver is demonstrated on two benchmark cases: a missile and the NACA0012 wing. For a variety of mesh sizes, an average speedup factor of roughly 9.5x is observed over the equivalent OpenMP-parallelized code running on a quad-core CPU, and roughly 33x over the equivalent serial code.
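The abstract does not say which numbering scheme is used. One common bandwidth-reducing renumbering for unstructured meshes is reverse Cuthill-McKee (RCM), sketched below as an assumption: a breadth-first traversal from a low-degree cell, visiting neighbours in order of increasing degree, then reversed. Keeping neighbouring cells close in memory is what lets nearby GPU threads make coalesced, cache-friendly accesses. The simplified choice of starting node (minimum degree rather than a pseudo-peripheral node) is also an assumption.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """adj[v] lists the neighbours of cell v. Returns order with order[new] = old."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Handle every connected component, starting each from a low-degree cell.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda x: len(adj[x])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order


def bandwidth(adj, order=None):
    """Largest index distance between neighbouring cells under a numbering."""
    n = len(adj)
    new_index = list(range(n))
    if order is not None:
        for new, old in enumerate(order):
            new_index[old] = new
    return max((abs(new_index[v] - new_index[w]) for v in range(n) for w in adj[v]), default=0)


# A chain of cells 3-0-5-1-4-2 with a scrambled numbering.
adj = [[3, 5], [5, 4], [4], [0], [1, 2], [0, 1]]
perm = reverse_cuthill_mckee(adj)
print(bandwidth(adj), bandwidth(adj, perm))  # 5 1
```

On a real 3D mesh the reduction is less dramatic than on this chain, but the same renumbering applied to cell and face arrays tends to shrink the spread of memory addresses touched by neighbouring threads.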

Analyzing the Parallel Scalability of an Implicit Unstructured Mesh CFD Code

High Performance Computing, 2000

In this paper, we identify the scalability bottlenecks of an unstructured grid CFD code (PETSc-FUN3D) by studying the impact of several algorithmic and architectural parameters and by examining different programming models. We discuss the basic performance characteristics of this PDE code with the help of simple performance models developed in our earlier work, presenting primarily experimental results. In addition to achieving good...

Understanding the Parallel Scalability of an Implicit Unstructured Mesh CFD Code

2000

In this paper, we identify the scalability bottlenecks of an unstructured grid CFD code (PETSc-FUN3D) by studying the impact of several algorithmic and architectural parameters and by examining different programming models. We discuss the basic performance characteristics of this PDE code with the help of simple performance models developed in our earlier work, presenting primarily experimental results. In addition to achieving good per-processor performance (which has been addressed in our cited work and without which scalability claims are suspect) we strive to improve the implementation and convergence scalability of PETSc-FUN3D on thousands of processors.
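The authors' own performance models are not reproduced here. As a hedged illustration of why implicit unstructured solvers lose efficiency at scale, the toy strong-scaling model below adds three terms per iteration: local computation W/p, a halo-exchange term proportional to the surface of a 3D subdomain, (W/p)^(2/3), and a global-reduction term growing like log2(p), such as the inner products of a Krylov solver. Every coefficient is a made-up parameter, and the iteration count is held fixed, so the model ignores the convergence-scalability issue the abstract also mentions.

```python
import math


def model_time(p, work, halo_coeff, reduction_latency, reductions_per_iter, iters):
    """Hypothetical solve time on p processors, in arbitrary units."""
    local = work / p
    halo = halo_coeff * local ** (2.0 / 3.0) if p > 1 else 0.0   # surface-to-volume communication
    allreduce = reductions_per_iter * reduction_latency * math.log2(p)
    return iters * (local + halo + allreduce)


def efficiency(p, **params):
    """Strong-scaling parallel efficiency relative to one processor."""
    return model_time(1, **params) / (p * model_time(p, **params))


params = dict(work=1e6, halo_coeff=2.0, reduction_latency=50.0,
              reductions_per_iter=2, iters=1)
for p in (1, 2, 8, 64, 512):
    print(p, round(efficiency(p, **params), 3))
```

As p grows, the shrinking local work is overtaken first by the halo term and then by the latency-bound reductions, which do not shrink at all; this is the kind of bottleneck that a scalability study has to separate from per-processor performance.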