High performance parallel computing of flows in complex geometries

Development and Validation of a Massively Parallel Flow Solver for Turbomachinery Flows

Journal of Propulsion and Power, 2001

This paper presents the development and validation of the unsteady, three-dimensional, multiblock, parallel turbomachinery flow solver, TFLO. The Unsteady Reynolds Averaged Navier-Stokes (Unsteady RANS) equations are solved using a cell-centered discretization on arbitrary multiblock meshes. The solution procedure is based on efficient explicit Runge-Kutta methods with several convergence acceleration techniques such as multigrid, residual averaging, and local time-stepping. The algebraic Baldwin-Lomax, the one-equation Spalart-Allmaras, and the two-equation Wilcox k-ω turbulence models are implemented. The solver is parallelized using domain decomposition, an SPMD (Single Program Multiple Data) strategy, and the Message Passing Interface (MPI) Standard. A mixing model and a sliding mesh interface approach have been implemented to exchange flow information between blade rows in both steady and unsteady rotor/stator interaction flows. The dual-time stepping technique is applied to advance unsteady computations in time. This paper focuses heavily on the initial validation of the flow solver, TFLO, with emphasis on steady-state calculation of multiple blade-row flows. For validation and verification purposes, results from TFLO are compared with both existing experimental data and computational results from other software used in industry. The large set of cases tested increases our confidence in the ability of TFLO to accurately predict flows inside typical turbomachinery geometries, and sets the stage for the large-scale computation of unsteady, multiple blade-row flows.
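The domain decomposition idea mentioned in this abstract can be illustrated with a minimal sketch. It is not TFLO code: the function names (`smooth_serial`, `split_blocks`, `smooth_decomposed`) are hypothetical, the "solver" is a simple 1D averaging sweep, and the halo exchange that a real SPMD code would perform with MPI messages is emulated here by copying neighbour values inside one process. The point is only that each block, once given one ghost value per neighbour, reproduces the serial result.

```python
from typing import List


def smooth_serial(u: List[float]) -> List[float]:
    """One averaging sweep over a 1D field; the two end values are held fixed."""
    interior = [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def split_blocks(u: List[float], n_blocks: int) -> List[List[float]]:
    """Split the field into equal contiguous blocks (assumes len(u) % n_blocks == 0)."""
    size = len(u) // n_blocks
    return [u[b * size:(b + 1) * size] for b in range(n_blocks)]


def smooth_decomposed(u: List[float], n_blocks: int) -> List[float]:
    """Same sweep, done block by block with one ghost value per neighbour."""
    blocks = split_blocks(u, n_blocks)
    result: List[float] = []
    for b, block in enumerate(blocks):
        has_left = b > 0
        has_right = b < n_blocks - 1
        # Emulated halo exchange: in an MPI code these two values would be
        # received from the neighbouring ranks.
        padded = list(block)
        if has_left:
            padded.insert(0, blocks[b - 1][-1])
        if has_right:
            padded.append(blocks[b + 1][0])
        swept = smooth_serial(padded)
        # Drop the ghost values again so only owned cells are returned.
        start = 1 if has_left else 0
        end = len(swept) - (1 if has_right else 0)
        result.extend(swept[start:end])
    return result


if __name__ == "__main__":
    field = [float(i * i) for i in range(12)]
    print(smooth_decomposed(field, 3) == smooth_serial(field))
```

Every block runs the same function on its own data, which is the essence of the SPMD strategy; only the ghost values cross block boundaries.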

Design of a massively parallel CFD code for complex geometries

Comptes Rendus Mécanique, 2011

A strategy to build the next generation of fluid dynamics solvers able to fully benefit from high-performance computing is discussed. The procedure relies on a domain decomposition of unstructured meshes that is organized in two levels. The computing cells are first gathered at an elementary level into cell groups; at a second level, cell groups are dispatched over processors. Compared to the usual single-level domain decomposition, this double domain decomposition makes it easier to optimize the use of processor memory and therefore load balancing in both Eulerian and Lagrangian contexts. Specific communication procedures to handle faces, edges and nodes are associated with this double domain decomposition; they strongly reduce the computing cost, and input-output times are optimized as well. In addition, multi-level solution techniques, such as the deflated preconditioned conjugate gradient, are well adapted to such a mesh decomposition. This approach has been used to develop the YALES2 code, which also benefits from a non-degenerescent tessellation algorithm for tetrahedra to automatically generate high-resolution meshes on supercomputers. To illustrate the capabilities of the YALES2 algorithms, an aeronautical burner is fully simulated with a mesh of 2.6 billion cells, followed by a demonstration test over 21 billion cells.
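The two-level decomposition described above can be sketched as follows. This is an illustration under simplifying assumptions, not the YALES2 algorithm: cells are grouped by consecutive index rather than by mesh connectivity, and groups are dispatched with a greedy "largest group to least-loaded processor" rule. The names `make_cell_groups` and `dispatch_groups` are hypothetical.

```python
from typing import List, Tuple


def make_cell_groups(n_cells: int, group_size: int) -> List[List[int]]:
    """Level 1: gather cell ids into cell groups of at most group_size cells."""
    return [list(range(start, min(start + group_size, n_cells)))
            for start in range(0, n_cells, group_size)]


def dispatch_groups(groups: List[List[int]],
                    n_procs: int) -> Tuple[List[List[int]], List[int]]:
    """Level 2: assign whole groups to processors, largest groups first.

    Returns (group ids per processor, cell count per processor).
    """
    loads = [0] * n_procs
    assignment: List[List[int]] = [[] for _ in range(n_procs)]
    order = sorted(range(len(groups)), key=lambda g: len(groups[g]), reverse=True)
    for g in order:
        p = loads.index(min(loads))  # least-loaded processor so far
        assignment[p].append(g)
        loads[p] += len(groups[g])
    return assignment, loads


if __name__ == "__main__":
    groups = make_cell_groups(n_cells=1000, group_size=64)
    assignment, loads = dispatch_groups(groups, n_procs=4)
    print(len(groups), loads)
```

Because processors own whole groups, rebalancing only has to move groups rather than individual cells, which is one plausible reading of why the abstract reports easier load balancing. Production codes would typically build the groups with a graph partitioner that respects mesh connectivity.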

Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review

Computational fluid dynamics (CFD) is a rapidly developing field of fluid mechanics used to analyze fluid flow. The analysis is based on simulations carried out on computers. For complex configurations, the number of grid points is so large that the computational time required to obtain results is very high. Parallel computing is adopted to reduce the computational time of CFD by making use of the available computing resources. Parallel computing tools such as OpenMP, MPI, CUDA, combinations of these, and a few others are used to parallelize CFD software. This article provides a comprehensive state-of-the-art review of important CFD areas and of parallelization strategies for the related software. Issues related to computational time complexity and to the parallelization of CFD software are highlighted. The benefits and drawbacks of the various parallel computing tools for parallelizing CFD software are summarized. Open areas of CFD where parallelization has rarely been attempted are identified, and parallel computing tools that can be useful for parallelizing CFD software are highlighted. A few suggestions for future work on parallel computing for CFD software are also provided.

MEGAFLOW: Parallel complete aircraft CFD

Parallel Computing, 2001

As a consequence of the worldwide competition in the aircraft market, requirements for the accurate prediction of aerodynamic performance and optimization of configurations have increased very much. More sophisticated wind tunnel as well as high quality CFD techniques have become necessary and essential tools for aircraft industry aerodynamic development groups. This requires an ongoing struggle for efficiency improvements where parallel computing is one of the major issues. This paper considers the MEGAFLOW activities in the area of parallel flow solvers including their application and use in the industrial framework. It describes the parallelization principles for the structured multi-block Navier-Stokes code FLOWer and the unstructured Navier-Stokes code TAU. Principal results for industrial applications are given with respect to efficiency, speed-up, load balancing, etc. Computational examples demonstrate the quality of the solvers for 3D flow over wing/body configurations as well as complete transport aircraft.
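The metrics named in this abstract (speed-up, efficiency, load balancing) have standard textbook definitions, sketched below. The function names and the timings in the example are hypothetical and are not MEGAFLOW results; the load imbalance measure (maximum load over mean load) is one common choice among several.

```python
from typing import Sequence


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p, equal to 1.0 for ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


def load_imbalance(loads: Sequence[float]) -> float:
    """Ratio of the largest per-processor load to the mean load (1.0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    # Hypothetical timings: 100 s on one process, 12.5 s on 16 processes.
    print(speedup(100.0, 12.5))
    print(efficiency(100.0, 12.5, 16))
    print(load_imbalance([10, 10, 10, 10]))
```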

European Conference on Computational Fluid Dynamics ECCOMAS CFD 2006

2013

Abstract. This paper deals with the implementation and performance analysis of a parallel Algebraic Multigrid Solver (pAMG) for a finite volume unstructured CFD code. The parallelization of the solver is based on the domain decomposition approach using the single program multiple data paradigm. The Message Passing Interface (MPI) library is used for communication of data. An ILU(0) iterative solver is used for smoothing the errors arising within each partition at the different grid levels, and a multi-level synchronization across the computational domain partitions is enforced in order to improve the performance of the parallelized multigrid solver. Two synchronization strategies are evaluated: in the first, the synchronization is applied across the multigrid levels during the restriction step in addition to the base level, while in the second, the synchronization is enforced during the restriction and prolongation steps. To increase robustness, gathering of coefficients across partiti...

Optimised Hybrid Parallelisation of a CFD Code on Many Core Architectures

2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2013

Reliable aerodynamic and aeroelastic design of wind turbines, aircraft wings and turbomachinery blades increasingly relies on the use of high-fidelity Navier-Stokes Computational Fluid Dynamics codes to predict the strongly nonlinear periodic flows associated with structural vibrations and periodically varying farfield boundary conditions. On a single computer core, the harmonic balance solution of the Navier-Stokes equations has been shown to significantly reduce the analysis runtime with respect to the conventional time-domain approach. The problem size of realistic simulations, however, requires high-performance computing. The Computational Fluid Dynamics COSA code features a novel harmonic balance Navier-Stokes solver which has been previously parallelised using both a pure MPI implementation and a hybrid MPI/OpenMP implementation. This paper presents the recently completed optimisation of both parallelisations. The achieved performance improvements of both parallelisations highlight the effectiveness of the adopted parallel optimisation strategies. Moreover, a comparative analysis of the optimal performance of these two architectures in terms of runtime and power consumption using some of the current common HPC architectures highlights the reduction of both aspects achievable by using the hybrid parallelisation with emerging many-core architectures.

Development and implementation of algorithm for the future large-scale computing in CFD

2015

Faculty of Engineering M-house, LTH. Master's degree. Development and implementation of algorithms for the future large-scale computing in CFD, by Robin Qvarfordt. The aim of this thesis is to study how the BCM (Building Cube Method) can improve the performance of CFD simulations given the steadily increasing performance of modern parallel computers. A parallel program is developed and tested on different grid configurations and problems, among them the Navier-Stokes equations. The first part of the thesis covers the theory of the numerical methods and the software used when writing the program. It discusses how a staggered grid is used to avoid the checkerboard effect, the basics of the finite difference method, the PISO algorithm, and software such as Open MPI (Open Message Passing Interface), which is used for the parallelization. This is followed by a description of the implementation and testing. Results are presented throughout Chapter 4 and Chapter 5. Chapter 4 is about stability and convergence and co...
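The checkerboard effect mentioned in this abstract can be shown with a small 1D sketch. It is a textbook illustration, not code from the thesis, and the function names are hypothetical. An alternating pressure field is invisible to a central difference taken at cell centres on a collocated grid, while a difference taken between adjacent pressure nodes, as a staggered arrangement does at the velocity locations, sees it.

```python
from typing import List


def collocated_gradient(p: List[float], dx: float) -> List[float]:
    """Central difference at interior cell centres: (p[i+1] - p[i-1]) / (2 dx)."""
    return [(p[i + 1] - p[i - 1]) / (2.0 * dx) for i in range(1, len(p) - 1)]


def staggered_gradient(p: List[float], dx: float) -> List[float]:
    """Difference across each face between neighbours: (p[i+1] - p[i]) / dx."""
    return [(p[i + 1] - p[i]) / dx for i in range(len(p) - 1)]


if __name__ == "__main__":
    # Checkerboard pressure field: +1, -1, +1, -1, ...
    pressure = [(-1.0) ** i for i in range(8)]
    print(collocated_gradient(pressure, dx=1.0))  # the oscillation goes unnoticed
    print(staggered_gradient(pressure, dx=1.0))   # the oscillation is detected
```

Since the collocated gradient is zero everywhere, the momentum equation cannot damp the oscillation, which is why staggering (or a dedicated interpolation) is needed.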

Multiple threads and parallel challenges for large simulations to accelerate a general Navier–Stokes CFD code on massively parallel systems

2012

Computational fluid dynamics is an increasingly important application domain for computational scientists. In this paper, we propose and analyze optimizations necessary to run CFD simulations consisting of multibillion-cell mesh models on large processor systems. Our investigation leverages the general industrial Navier-Stokes CFD application, Code_Saturne, developed by Electricité de France for incompressible and nearly compressible flows. We outline the main bottlenecks and challenges for massively parallel systems and emerging processor features such as many-core, transactional memory, and thread-level speculation. We also present an approach based on an octree search algorithm to facilitate the joining of mesh parts and to build larger, complex unstructured meshes of several billion grid cells. We describe two parallel strategies for an algebraic multigrid solver, and we detail how to introduce new levels of parallelism based on compiler directives with OpenMP, transactional memory, and thread-level speculation, for a cell-centered finite volume formulation and face-based loops. A renumbering scheme for mesh faces is proposed to enhance thread-level parallelism. Implementations capable of simulating multibillions of cells or particles are beginning to emerge within the research community. Nevertheless, one of the bigger challenges is to reach this capability with general industrial Navier-Stokes CFD software.
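The abstract does not detail the face renumbering scheme, so the sketch below substitutes a common, generic technique: greedy face colouring. Faces that share a cell receive different colours, and faces are then reordered by colour so that each colour forms a contiguous range in which no two faces write to the same cell. This is an assumption for illustration only and should not be read as the scheme used in Code_Saturne; `color_faces` and `renumber_by_color` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def color_faces(face_cells: List[Tuple[int, int]]) -> List[int]:
    """Greedy colouring: each face gets the smallest colour unused by its two cells."""
    colors_at_cell: Dict[int, Set[int]] = {}
    colors: List[int] = []
    for c0, c1 in face_cells:
        used = colors_at_cell.setdefault(c0, set()) | colors_at_cell.setdefault(c1, set())
        color = 0
        while color in used:
            color += 1
        colors.append(color)
        colors_at_cell[c0].add(color)
        colors_at_cell[c1].add(color)
    return colors


def renumber_by_color(colors: List[int]) -> List[int]:
    """New face order: old face ids sorted by colour (stable within a colour)."""
    return sorted(range(len(colors)), key=lambda f: colors[f])


if __name__ == "__main__":
    # A 1D chain of five cells joined by four interior faces.
    faces = [(0, 1), (1, 2), (2, 3), (3, 4)]
    colors = color_faces(faces)
    print(colors)
    print(renumber_by_color(colors))
```

In a threaded face loop, all faces of one colour could be processed concurrently (for example inside one OpenMP parallel loop) without atomic updates, with a synchronization point between colours.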

An Object-Oriented Parallel Finite-Volume CFD Code

Volume 6: Turbomachinery, Parts A, B, and C, 2008

This paper concerns the parallelization and optimization of an in-house three-dimensional unstructured finite-volume computational fluid dynamics (CFD) code. It aims to highlight the use of programming techniques to speed up computation and minimize memory usage. The motivation for developing an in-house solver is that commercial codes are general and their simulations sometimes do not agree with the actual phenomena. Moreover, in-house models can be developed and easily integrated into the solver. The original code was initially written in Fortran 77, though the most recently added subroutines include Fortran 90 features. Due to language restrictions and the initial project objectives, issues such as memory usage minimization were not considered. The new code uses an object-oriented paradigm aiming to enhance code reuse and increase efficiency during application development. The parallel code is fully written in Fortran 90 using MPI and is hence portable to different architectures. Numerical experiments on typical 3D cases, such as a flat plate with uniform incoming flow and a converging-diverging supersonic nozzle, were carried out and showed good parallel efficiency. The serial version of the ported code has shown a considerable reduction in execution time compared to the original code. Converged solutions agree with the solution of the original code.