Integrated flow and stress using an unstructured mesh on distributed memory parallel systems
Related papers
International Journal for Numerical Methods in Engineering, 1993
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm is discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
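As a rough illustration of what automatic mesh partitioning involves, the sketch below applies recursive coordinate bisection to element centroids. This is a generic textbook technique chosen for brevity, not one of the algorithms proposed in the paper above; the function names and the 2D centroid representation are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: one 2D centroid per mesh element


def recursive_coordinate_bisection(
    points: Sequence[Point], ids: List[int], levels: int
) -> List[List[int]]:
    """Split element ids into up to 2**levels parts of near-equal size."""
    if levels == 0 or len(ids) <= 1:
        return [ids]
    # Cut across the axis along which this group of elements is most spread out.
    extents = []
    for axis in range(2):
        values = [points[i][axis] for i in ids]
        extents.append(max(values) - min(values))
    axis = 0 if extents[0] >= extents[1] else 1
    # Sort along that axis (ties broken by id) and split at the median.
    ordered = sorted(ids, key=lambda i: (points[i][axis], i))
    half = len(ordered) // 2
    return recursive_coordinate_bisection(
        points, ordered[:half], levels - 1
    ) + recursive_coordinate_bisection(points, ordered[half:], levels - 1)


def partition_centroids(points: Sequence[Point], levels: int) -> List[List[int]]:
    """Partition all elements, identified by their index in `points`."""
    return recursive_coordinate_bisection(points, list(range(len(points))), levels)
```

Splitting at the median balances the element count per part, which addresses load balance only; it does not by itself minimise the interface between subdomains, which is one reason more elaborate partitioners exist.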
International Journal of High Performance Computing Applications, 2000
Realizing scalable performance on high performance computing systems is not straightforward for single-phenomenon codes (such as computational fluid dynamics [CFD]). This task is magnified considerably when the target software involves the interactions of a range of phenomena that have distinctive solution procedures involving different discretization methods. The problems of addressing the key issues of retaining data integrity and the ordering of the calculation procedures are significant. A strategy for parallelizing this multiphysics family of codes is described for software exploiting finite-volume discretization methods on unstructured meshes using iterative solution procedures. A mesh partitioning-based SPMD approach is used. However, since different variables use distinct discretization schemes, this means that distinct partitions are required; techniques for addressing this issue are described using the mesh-partitioning tool, JOSTLE. In this contribution, the strategy is tested for a variety of test cases under a wide range of conditions (e.g., problem size, number of processors, asynchronous/synchronous communications, etc.) using a variety of strategies for mapping the mesh partition onto the processor topology.
1998
As the complexity of parallel applications increases, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processor will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be pre-determined even at run-time, and so a static partition of the problem returns poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to processors is required to change dynamically, at run-time, in order to maintain reasonable efficiency. The issues of dynamic load balancing are examined in the context of PHYSICA, a three-dimensional unstructured mesh multi-physics continuum mechanics computational modelling code [2].
ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications
Engineering with Computers, 2006
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++'s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.
Parallel Library for Unstructured Mesh Problems
The growing class of applications which solve partial differential equations (PDEs) on unstructured adaptive meshes is considered. Solutions to the resulting sparse, non-symmetric and in most cases ill-conditioned systems are often obtained using iterative methods. The programming complexity of such applications on parallel architectures is well known. The development of a Parallel Library for Unstructured Mesh Problems (PLUMP), which supports the transparent use of parallel machines for such applications, is addressed. PLUMP exploits the common denominators in such problems, provides key kernels such as the matrix-vector product and preconditioners for a wide range of iterative solvers, and supports the parallelization of this class of applications in a clean and concise manner. The PLUMP library is implemented in C and FORTRAN77 using the Message-Passing Interface (MPI) and is available free under copyright for research purposes.
Efficient Parallel Computation of Unstructured Finite Element Reacting Flow Solutions
Parallel Computing, 1997
A parallel unstructured finite element (FE) reacting flow solver designed for message passing MIMD computers is described. This implementation employs automated partitioning algorithms for load balancing unstructured grids, a distributed sparse matrix representation of the global FE equations, and parallel Krylov subspace iterative solvers. In this paper, a number of issues related to the efficient implementation of parallel unstructured mesh applications are presented. These issues include the differences between structured and unstructured mesh parallel applications, major communication kernels for unstructured Krylov iterative solvers, automatic mesh partitioning algorithms, and the influence of mesh partitioning metrics and single-node CPU performance on parallel performance. Results are presented for example FE heat transfer, fluid flow and full reacting flow applications on a 1024 processor nCUBE 2 hypercube and a 1904 processor Intel Paragon. Results indicate that very high computational rates and high scaled efficiencies can be achieved for large problems despite the use of sparse matrix data structures and the required unstructured data communication.
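To make the idea of a distributed sparse matrix with unstructured communication more concrete, here is a minimal single-process sketch of a row-partitioned sparse matrix-vector product. It is not the solver's actual data structure: the dictionary-of-rows storage, the `owner` array and both function names are assumptions for illustration, and the message passing itself is replaced by a plain dictionary of "ghost" values.

```python
from typing import Dict, List, Tuple

# Assumed storage: row index -> list of (column index, value) pairs.
SparseRows = Dict[int, List[Tuple[int, float]]]


def ghost_columns(rows: SparseRows, owner: List[int], rank: int) -> List[int]:
    """Columns touched by this rank's rows but owned by another rank.

    These are the vector entries that would have to be received from
    neighbouring processors before the local product can be formed.
    """
    needed = set()
    for r, entries in rows.items():
        if owner[r] != rank:
            continue
        for c, _ in entries:
            if owner[c] != rank:
                needed.add(c)
    return sorted(needed)


def local_matvec(
    rows: SparseRows,
    owner: List[int],
    rank: int,
    x_local: Dict[int, float],
    x_ghost: Dict[int, float],
) -> Dict[int, float]:
    """Compute y = A x for the rows owned by `rank` only."""
    y = {}
    for r, entries in rows.items():
        if owner[r] != rank:
            continue
        total = 0.0
        for c, v in entries:
            xc = x_local[c] if owner[c] == rank else x_ghost[c]
            total += v * xc
        y[r] = total
    return y
```

The size of the list returned by `ghost_columns` is a direct measure of the communication volume a given partition induces, which is one way partitioning quality feeds into parallel performance.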
Evaluation of the JOSTLE mesh partitioning code for practical multiphysics applications
1996
The use of unstructured mesh codes on parallel machines is one of the most effective ways to solve large computational mechanics problems. Completely general geometries and complex behaviour can be modelled and, in principle, the inherent sparsity of many such problems can be exploited to obtain excellent parallel efficiencies. However, unlike their structured counterparts, the problem of distributing the mesh across the memory of the machine, whilst minimising the amount of interprocessor communication, must be carefully addressed. This process is an overhead that is not incurred by a serial code, but is shown to be rapidly computable at run time and tailored for the machine being used.
Parallel Computational Fluid Dynamics 1997, 1998
A large class of computational problems is characterised by frequent synchronisation and by computational requirements which change as a function of time. When such a problem is solved on a message passing multiprocessor machine [5], the combination of these characteristics leads to system performance which deteriorates over time. As the communication performance of parallel hardware steadily improves, load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved with periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must effectively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load changes throughout the run. The major issues involved in a generic dynamic load balancing scheme will be investigated together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, which is a semi-automatic tool for parallelisation of mesh based FORTRAN codes [2].
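The trade-off between remapping cost and performance benefit can be sketched with a simple accumulated-loss heuristic: rebalance once the time wasted through imbalance since the last remap exceeds the estimated cost of remapping. This is an illustrative policy under assumed inputs, not the decision strategy of the paper above; the function names and the idea of a known, fixed `remap_cost` are assumptions.

```python
from typing import List, Sequence


def imbalance_loss(step_times: Sequence[float]) -> float:
    """Time lost in one synchronised step.

    With a synchronisation at the end of each step, every processor waits
    for the slowest one, so the loss relative to a perfectly balanced step
    is the maximum time minus the mean time.
    """
    return max(step_times) - sum(step_times) / len(step_times)


def should_rebalance(history: List[Sequence[float]], remap_cost: float) -> bool:
    """Decide whether to remap, given per-step timings since the last remap.

    `history` holds one sequence of per-processor times for each step;
    `remap_cost` is an (assumed known) estimate of the time a remap takes.
    """
    accumulated = sum(imbalance_loss(times) for times in history)
    return accumulated > remap_cost
```

For example, with per-step timings of `[1.0, 1.0, 2.0, 1.0]` the loss is 0.75 per step, so a remap estimated to cost 2.0 time units would be triggered after three such steps.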
Partition Alignment in Three Dimensional Unstructured Mesh Multi-Physics Modelling
Parallel Computational Fluid Dynamics 1998, 1999
Unstructured mesh codes for modelling continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Parallelisation of such codes using Single Program Multiple Data (SPMD) domain decomposition techniques implemented with message passing has been demonstrated to provide high parallel efficiency, scalability to large numbers of processors P and portability across a wide range of parallel platforms. High efficiency, especially for large P, requires that load balance is achieved in each parallel loop. For a code in which loops span a variety of mesh entity types, for example, elements, faces and vertices, some compromise is required between load balance for each entity type and the quantity of inter-processor communication required to satisfy data dependence between processors.
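The point about balancing each entity type separately can be illustrated by measuring the imbalance per type. The sketch below uses the common max-over-mean ratio; the metric choice, the function names and the sample counts are assumptions for illustration rather than anything specified in the abstract.

```python
from typing import Dict, Sequence


def load_imbalance(counts: Sequence[int]) -> float:
    """Ratio of the largest per-processor load to the mean load.

    A value of 1.0 means perfect balance; 1.5 means the busiest processor
    carries 50% more than the average.
    """
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def imbalance_by_entity(owned: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Imbalance ratio for each mesh entity type (elements, faces, ...)."""
    return {entity: load_imbalance(counts) for entity, counts in owned.items()}


# Hypothetical counts of owned entities on four processors.
owned = {
    "elements": [100, 100, 100, 100],
    "faces": [150, 50, 100, 100],
}
ratios = imbalance_by_entity(owned)
```

In this made-up case a partition that balances elements perfectly still leaves the face loops noticeably unbalanced, which is the kind of compromise the abstract describes.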