A Parallel Algorithm for the Dynamic Partitioning of Particle-Mesh Computational Systems

Dynamic load balancing in parallel particle methods

Contemporary scientific simulations require vast computing power to solve complex problems in the natural and life sciences. High-performance computing (HPC) enables fast execution of scientific simulation codes by parallelizing the computational workload across processing elements (PEs). This is usually done by decomposing the computational domain into smaller subdomains and assigning each subdomain to a PE. The subdomains encapsulate computational elements such as mesh cells or particles, and the workload of each PE is determined by the number of operations performed on these elements.
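A minimal sketch of this idea (illustrative only; the names and the one-dimensional block decomposition are assumptions, not taken from any of the papers below): contiguous ranges of computational elements are assigned to PEs, and the per-PE workload is simply the count of elements assigned.

```cpp
// Minimal sketch: split `num_cells` computational elements across `num_pes`
// processing elements with a block decomposition, and measure each PE's
// workload as the number of elements it receives. All names are illustrative.
#include <iostream>
#include <vector>

int main() {
    const int num_cells = 1000;  // computational elements (mesh cells or particles)
    const int num_pes   = 4;     // processing elements

    std::vector<int> load(num_pes, 0);
    for (int cell = 0; cell < num_cells; ++cell) {
        // Block decomposition: contiguous ranges of elements per PE.
        int pe = cell * num_pes / num_cells;
        ++load[pe];
    }
    for (int pe = 0; pe < num_pes; ++pe)
        std::cout << "PE " << pe << " workload: " << load[pe] << " elements\n";
}
```

With uniform per-element cost this decomposition is perfectly balanced; the papers below address the harder cases where element costs vary or change over time.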

Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

International Journal for Numerical Methods in Engineering, 1993

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
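One classic strategy in this family of automatic mesh partitioners is recursive coordinate bisection: split the nodes at the median along one axis, then recurse on each half while alternating axes. The sketch below illustrates that generic technique and is not necessarily the algorithm proposed in the paper; all names are illustrative.

```cpp
// Hedged sketch of recursive coordinate bisection (RCB), a classic automatic
// mesh-partitioning strategy; not necessarily the paper's algorithm.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Node { double x, y; int part = 0; };

// Recursively split nodes[lo, hi) into 2^depth parts by median cuts,
// alternating the coordinate axis at each level of the recursion.
void rcb(std::vector<Node>& nodes, int lo, int hi, int depth, int part, int axis) {
    if (depth == 0) {
        for (int i = lo; i < hi; ++i) nodes[i].part = part;
        return;
    }
    int mid = lo + (hi - lo) / 2;
    std::nth_element(nodes.begin() + lo, nodes.begin() + mid, nodes.begin() + hi,
                     [axis](const Node& a, const Node& b) {
                         return (axis == 0 ? a.x < b.x : a.y < b.y);
                     });
    rcb(nodes, lo, mid, depth - 1, 2 * part,     1 - axis);
    rcb(nodes, mid, hi, depth - 1, 2 * part + 1, 1 - axis);
}

int main() {
    std::vector<Node> mesh;
    for (int i = 0; i < 16; ++i)  // a small 4x4 grid of mesh nodes
        mesh.push_back({static_cast<double>(i % 4), static_cast<double>(i / 4)});
    rcb(mesh, 0, static_cast<int>(mesh.size()), 2, 0, 0);  // 4 subdomains
    for (const Node& n : mesh)
        std::printf("node (%g, %g) -> subdomain %d\n", n.x, n.y, n.part);
}
```

Median cuts guarantee equal node counts per subdomain; minimizing the number of cut edges (and hence communication) is what the more sophisticated partitioners discussed in the paper target.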

GOTPM: A Parallel Hybrid Particle-Mesh Treecode

2003

We describe a parallel, cosmological N-body code based on a hybrid scheme using the particle-mesh (PM) and Barnes-Hut (BH) oct-tree algorithm. We call the algorithm GOTPM for Grid-of-Oct-Trees-Particle-Mesh. The code is parallelized using the Message Passing Interface (MPI) library and is optimized to run on Beowulf clusters as well as symmetric multi-processors. The gravitational potential is determined on a mesh using a standard PM method with particle forces determined through interpolation. The softened PM force is corrected for short range interactions using a grid of localized BH trees throughout the entire simulation volume in a completely analogous way to P³M methods. This method makes no assumptions about the local density for short range force corrections and so is consistent with the results of the P³M method in the limit that the treecode opening angle parameter, θ → 0. The PM method is parallelized using one-dimensional slice domain decomposition. Particles are dist...
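The one-dimensional slice decomposition mentioned in the abstract can be sketched very simply: each process owns a slab of the simulation box along one axis, and a particle belongs to the process whose slab contains it. The names and box setup below are illustrative assumptions, not GOTPM's actual code.

```cpp
// Sketch of one-dimensional slice domain decomposition: rank p owns the slab
// [p*box/procs, (p+1)*box/procs) along x. Names are illustrative.
#include <cstdio>
#include <vector>

int owner_of(double x, double box_size, int num_procs) {
    int p = static_cast<int>(x / box_size * num_procs);
    return p < num_procs ? p : num_procs - 1;  // clamp a particle at x == box_size
}

int main() {
    const double box = 100.0;   // simulation box size (illustrative units)
    const int procs = 8;        // MPI ranks, one slab each
    std::vector<double> xs = {3.5, 49.9, 50.1, 99.99};
    for (double x : xs)
        std::printf("particle at x = %6.2f -> rank %d\n", x, owner_of(x, box, procs));
}
```

Slice decompositions are simple and map naturally onto slab-decomposed FFTs used by PM solvers, at the cost of coarser load-balancing granularity than higher-dimensional decompositions.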

Design and Implementation of Particle Systems for Meshfree Methods with High Performance

High Performance Parallel Computing [Working Title]

Particle systems, commonly associated with computer graphics, animation, and video games, are an essential component in the implementation of numerical methods ranging from the meshfree methods for computational fluid dynamics and related applications (e.g., smoothed particle hydrodynamics, SPH) to minimization methods for arbitrary problems (e.g., particle swarm optimization, PSO). These methods are frequently embarrassingly parallel in nature, making them a natural fit for implementation on massively parallel computational hardware such as modern graphics processing units (GPUs). However, naive implementations fail to fully exploit the capabilities of this hardware. We present practical solutions to the challenges faced in the efficient parallel implementation of these particle systems, with a focus on performance, robustness, and flexibility. The techniques are illustrated through GPUSPH, the first implementation of SPH to run completely on GPU, and currently supporting multi-GPU clusters, uniform precision independent of domain size, and multiple SPH formulations.
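A key ingredient in efficient parallel SPH implementations of this kind is reordering particles by the uniform-grid cell they occupy, so that each cell's particles are contiguous in memory and neighbor searches touch only nearby cells. The serial sketch below illustrates that standard technique; the data structures and names are assumptions and do not reflect the actual GPUSPH implementation.

```cpp
// Serial sketch of cell-based particle reordering for neighbor search:
// sort particles by their uniform-grid cell index so same-cell particles are
// contiguous. On a GPU this layout also yields coalesced memory accesses.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Particle { float x, y; };

int cell_of(const Particle& p, float cell_size, int cells_per_row) {
    int cx = static_cast<int>(p.x / cell_size);
    int cy = static_cast<int>(p.y / cell_size);
    return cy * cells_per_row + cx;
}

int main() {
    const float h = 1.0f;           // SPH smoothing length ~ cell size
    const int cells_per_row = 4;
    std::vector<Particle> parts = {{3.2f, 0.1f}, {0.4f, 0.5f}, {0.6f, 0.2f}, {3.9f, 3.8f}};

    // Reorder so that particles in the same cell sit next to each other.
    std::sort(parts.begin(), parts.end(), [&](const Particle& a, const Particle& b) {
        return cell_of(a, h, cells_per_row) < cell_of(b, h, cells_per_row);
    });
    for (const Particle& p : parts)
        std::printf("cell %d: particle (%.1f, %.1f)\n",
                    cell_of(p, h, cells_per_row), p.x, p.y);
}
```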

Software Architecture for parallel particle tracking with the distribution of large amount of data

High Performance Computing Symposium (HPC 2018), 2018

This study presents the architecture of a library designed for particle tracking. The library exposes common features used to track particles in large meshes, using parallel algorithms to localize, manage, distribute, and move particles over computing units. The proposed design enables particles to be tracked using multiple heterogeneous parallel paradigms while preserving component reusability. A customized algorithm for distributing particles over processes has been developed; it exploits several features of this architecture and has a strong impact on the execution time of particle localization and movement.
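The abstract names two coupled steps: localizing a particle in the mesh (which cell contains it?) and distributing it (which process owns that cell?). The sketch below illustrates both under simplifying assumptions (a uniform grid and block ownership of cells) that are not the library's actual scheme; all names are illustrative.

```cpp
// Illustrative sketch of particle localization and ownership: map a position
// to a cell, then map the cell to the rank that stores it. Uniform spacing
// and block ownership are simplifying assumptions.
#include <cstdio>

struct Grid {
    double origin, cell_size;
    int num_cells, num_ranks;
    int locate(double x) const {                 // particle position -> cell index
        return static_cast<int>((x - origin) / cell_size);
    }
    int owner(int cell) const {                  // cell index -> owning rank
        int cells_per_rank = (num_cells + num_ranks - 1) / num_ranks;
        return cell / cells_per_rank;
    }
};

int main() {
    Grid g{0.0, 0.25, 64, 4};
    double x = 7.3;
    int cell = g.locate(x);
    std::printf("particle at %.2f lies in cell %d, owned by rank %d\n",
                x, cell, g.owner(cell));
}
```

In unstructured meshes the `locate` step becomes a search (e.g., walking through neighboring elements), which is why efficient localization dominates the tracking cost the paper measures.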

A Language and Development Environment for Parallel Particle Methods

2017

We present the Parallel Particle-Mesh Environment (PPME), a domain-specific language (DSL) and development environment for numerical simulations using particles and hybrid particle-mesh methods. PPME is the successor of the Parallel Particle-Mesh Language (PPML) [1,2], a Fortran-based DSL that provides high-level abstractions for the development of distributed-memory particle-mesh simulations with the parallel particle-mesh library for high-performance computing [3]. The abstractions in PPML allow scientific programmers to write more concise and declarative code in comparison to hand-coded implementations. Essentially, it frees developers from the burden of writing boilerplate code that manages parallelism, synchronization, and data distribution. However, PPML has downsides which we address in PPME [4]: The lightweight embedding of PPML into Fortran, based on language macros, prevents advanced code analysis and complex compile-time computation. This makes debugging PPML programs hard ...

PPM – A highly efficient parallel particle–mesh library for the simulation of continuum systems

Journal of Computational Physics, 2006

This paper presents a highly efficient parallel particle-mesh (PPM) library, based on a unifying particle formulation for the simulation of continuous systems. In this formulation, the grid-free character of particle methods is relaxed by the introduction of a mesh for the reinitialization of the particles, the computation of the field equations, and the discretization of differential operators. The present utilization of the mesh does not detract from the adaptivity, the efficient handling of complex geometries, the minimal dissipation, and the good stability properties of particle methods. The coexistence of meshes and particles allows for the development of a consistent and adaptive numerical method, but it presents a set of challenging parallelization issues that have in the past hindered the broader use of particle methods. The present library solves the key parallelization issues involving particle-mesh interpolations and the balancing of processor particle loading, using a novel adaptive tree for mixed domain decompositions along with a coloring scheme for the particle-mesh interpolation. The high parallel efficiency of the library is demonstrated in a series of benchmark tests on distributed memory and on a shared-memory vector architecture. The modularity of the method is shown by a range of simulations, from compressible vortex rings using a novel formulation of smooth particle hydrodynamics, to simulations of diffusion in real biological cell organelles. The present library enables large scale simulations of diverse physical problems using adaptive particle methods and provides a computational tool that is a viable alternative to mesh-based methods.
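The coloring idea mentioned in the abstract can be illustrated generically: cells are colored so that particles in same-colored cells never update the same mesh nodes, so each color can be processed concurrently without write conflicts. The one-dimensional sketch below shows the idea for linear (cloud-in-cell) interpolation; it is a generic illustration under assumed names, not the PPM library's actual implementation.

```cpp
// Sketch of a coloring scheme for concurrent particle-to-mesh interpolation.
// Linear interpolation from a particle in cell i touches nodes i and i+1, so
// two colors (even/odd cells) make same-color updates disjoint.
#include <cstdio>
#include <vector>

int main() {
    const int num_nodes = 9;                  // 1-D mesh nodes, unit spacing
    std::vector<double> mesh(num_nodes, 0.0);
    std::vector<double> particles = {0.3, 1.7, 2.5, 5.9, 6.2};

    for (int color = 0; color < 2; ++color) {
        // Within one color this loop could run fully in parallel: no two
        // same-colored cells write to the same mesh node.
        for (double x : particles) {
            int cell = static_cast<int>(x);
            if (cell % 2 != color) continue;
            double w = x - cell;              // weight for the right-hand node
            mesh[cell]     += 1.0 - w;        // deposit unit mass per particle
            mesh[cell + 1] += w;
        }
    }
    for (int i = 0; i < num_nodes; ++i)
        std::printf("node %d: %.2f\n", i, mesh[i]);
}
```

In higher dimensions and with wider interpolation kernels more colors are needed, but the principle is the same: trade a few sequential color sweeps for conflict-free parallelism within each sweep.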

Dynamic re-allocation of meshes for parallel Finite Element applications

This project aims to bring together the developments in parallel partitioning and parallel FE applications to ensure that the potential of scalable computing can be achieved for fully functional industrial simulation, including efficient adaptive meshing (and re-meshing) options. The parallel dynamic re-partitioning routines should also be able to handle the full complexity and range of finite elements used in industrial structural mechanics codes, as exemplified by the applications within the project.

A Software Framework for Portable Parallelization of Particle-Mesh Simulations

Lecture Notes in Computer Science, 2006

We present a software framework for the transparent and portable parallelization of simulations using particle-mesh methods. Particles are used to transport physical properties, and a mesh is required in order to reinitialize the distorted particle locations, ensuring the convergence of the method. Field quantities are computed on the particles using fast multipole methods or by discretizing and solving the governing equations on the mesh. This combination of meshes and particles presents a challenging set of parallelization issues. The present library addresses these issues for a wide range of applications, and it enables orders-of-magnitude increases in the number of computational elements employed in particle methods. We demonstrate the performance and scalability of the library on several problems, including the first-ever billion particle simulation of diffusion in real biological cell geometries.
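The remeshing step this abstract describes can be sketched in one dimension: particle quantities are interpolated onto mesh nodes, and the distorted particle set is then replaced by fresh, regularly spaced particles at the node locations. Linear interpolation, the 1-D mesh, and all names below are illustrative simplifications, not the framework's actual interface.

```cpp
// Hedged sketch of particle remeshing: deposit each particle's quantity onto
// the two bracketing mesh nodes, then reinitialize one regularly spaced
// particle per node that received mass.
#include <cstdio>
#include <vector>

struct Particle { double x, q; };  // position and transported quantity

int main() {
    const int num_nodes = 8;                       // mesh nodes at x = 0, 1, ..., 7
    std::vector<double> node_q(num_nodes, 0.0);
    std::vector<Particle> old_parts = {{0.4, 1.0}, {2.8, 0.5}, {3.1, 2.0}};

    // Particle-to-mesh: linear interpolation onto the bracketing nodes.
    for (const Particle& p : old_parts) {
        int i = static_cast<int>(p.x);
        double w = p.x - i;
        node_q[i]     += (1.0 - w) * p.q;
        node_q[i + 1] += w * p.q;
    }
    // Reinitialize: one fresh particle per mesh node carrying the deposited quantity.
    std::vector<Particle> new_parts;
    for (int i = 0; i < num_nodes; ++i)
        if (node_q[i] != 0.0) new_parts.push_back({static_cast<double>(i), node_q[i]});

    for (const Particle& p : new_parts)
        std::printf("new particle at x = %.1f with q = %.2f\n", p.x, p.q);
}
```

Remeshing restores the regular particle spacing that convergence proofs for particle methods assume, which is why hybrid particle-mesh frameworks treat it as a core, rather than optional, operation.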