A Parallel Variational Mesh Quality Improvement Method for Tetrahedral Meshes
Related papers
A parallel variational mesh quality improvement method for distributed memory machines
2020
There are numerous scientific applications which require large and complex meshes. Given the explosive growth in parallel computer architectures, ranging from supercomputers to hybrid CPU/GPU machines, there has been a corresponding increase in interest in parallel computational simulations. Such simulations require algorithms that generate the mesh and manipulate it in parallel; in particular, parallel mesh quality improvement is required whenever meshes of low quality arise. In this talk, we describe our parallel variational mesh quality improvement method designed for distributed memory machines. The method is based on the sequential variational mesh quality improvement method of Huang and Kamenski. Whereas most mesh quality improvement methods directly minimize an objective function that explicitly specifies the mesh quality, Huang and Kamenski use the Moving Mesh PDE (MMPDE) method to discretize and fin...
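To make the contrast concrete, here is a minimal sketch of the "direct minimization" approach that the abstract distinguishes from the MMPDE method: smooth free vertices by gradient descent on a mean-ratio quality objective. All choices below (the quality measure, the finite-difference gradient, the step size) are illustrative stand-ins, not the authors' method.

```python
import numpy as np

def tet_quality(p):
    # mean-ratio quality: 1 for a regular tetrahedron, -> 0 as it degenerates
    v = abs(np.linalg.det(p[1:] - p[0])) / 6.0              # volume
    s = sum(np.dot(p[i] - p[j], p[i] - p[j])
            for i in range(4) for j in range(i))            # sum of squared edges
    return max(12.0 * (3.0 * v) ** (2.0 / 3.0) / s, 1e-12)

def objective(verts, tets):
    # sum of inverse qualities: large when any element is badly shaped
    return sum(1.0 / tet_quality(verts[list(t)]) for t in tets)

def smooth(verts, tets, free, steps=50, lr=1e-2, h=1e-6):
    # gradient descent on the free vertices, central-difference gradient
    for _ in range(steps):
        grad = np.zeros_like(verts)
        for i in free:
            for d in range(3):
                verts[i, d] += h; fp = objective(verts, tets)
                verts[i, d] -= 2 * h; fm = objective(verts, tets)
                verts[i, d] += h
                grad[i, d] = (fp - fm) / (2 * h)
        verts[free] -= lr * grad[free]
    return verts

# toy usage: pull one free vertex of a badly shaped tet toward better quality
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0.9, 0.9, 0.1]], float)
tets = [(0, 1, 2, 3)]
print(tet_quality(verts[[0, 1, 2, 3]]), "->")
verts = smooth(verts, tets, free=[3])
print(tet_quality(verts[[0, 1, 2, 3]]))
```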
Effective large scale computing software for parallel mesh generation
2011
Scientists commonly turn to supercomputers or clusters of workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long wait, far surpassing the time it takes to solve the problem. These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes of a shared computing resource) can be addressed by using out-of-core algorithms: algorithms that keep most of the dataset out-of-core (i.e., outside of memory, on disk) and load only a portion in-core (i.e., into memory) at a time. We explored two approaches to out-of-core comp...
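The out-of-core pattern described here, keeping the dataset on disk and streaming fixed-size slabs through memory, can be sketched in a few lines. The NumPy memory map, file name, slab size, and bounding-box reduction below are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

n, chunk = 10_000_000, 250_000          # total rows vs. rows allowed in-core
data = np.memmap("coords.dat", dtype=np.float64, mode="w+", shape=(n, 3))

# populate the on-disk array slab by slab; only `chunk` rows are resident at once
for lo in range(0, n, chunk):
    hi = min(lo + chunk, n)
    data[lo:hi] = np.random.default_rng(lo).random((hi - lo, 3))
data.flush()

# out-of-core pass: accumulate a bounding box without ever loading everything
box_lo, box_hi = np.full(3, np.inf), np.full(3, -np.inf)
for lo in range(0, n, chunk):
    block = np.asarray(data[lo:min(lo + chunk, n)])   # one slab in-core
    box_lo = np.minimum(box_lo, block.min(axis=0))
    box_hi = np.maximum(box_hi, block.max(axis=0))
print(box_lo, box_hi)
```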
Improving the performance of Uintah: A large-scale adaptive meshing computational framework
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
Uintah is a highly parallel and adaptive multiphysics framework created by the Center for Simulation of Accidental Fires and Explosions in Utah. Uintah, which is built upon the Common Component Architecture, has facilitated the simulation of a wide variety of fluid-structure interaction problems using both adaptive structured meshes for the fluid and particles to model solids. Uintah was originally designed for, and has performed well on, about a thousand processors. Evolving Uintah to use tens of thousands of processors has required improvements in memory usage, data structure design, load balancing algorithms, and cost estimation in order to improve strong and weak scalability up to 98,304 cores, both for cases in which the mesh varies adaptively and for cases in which particles that represent the solids move from mesh cell to mesh cell.
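As one generic illustration of cost-estimation-driven load balancing (not Uintah's actual algorithm), per-patch cost estimates can be fed to a greedy "largest patch onto the least-loaded rank" assignment; the cost model below is hypothetical.

```python
import heapq

def balance(costs, nprocs):
    # greedy LPT assignment: largest estimated cost first, least-loaded rank wins
    heap = [(0.0, r) for r in range(nprocs)]
    heapq.heapify(heap)
    assign = {}
    for patch, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)
        assign[patch] = rank
        heapq.heappush(heap, (load + cost, rank))
    return assign

# hypothetical cost model: weighted sum of mesh cells and resident particles
costs = {f"patch{i}": cells + 0.1 * parts
         for i, (cells, parts) in enumerate([(800, 5000), (640, 100),
                                             (512, 9000), (256, 300)])}
print(balance(costs, nprocs=2))
```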
SIAM Journal on Scientific Computing, 2012
In this paper we describe a general adaptive finite element framework for unstructured tetrahedral meshes without hanging nodes, suitable for large-scale parallel computations. Our framework is designed to scale linearly to several thousand processors, using fully distributed and efficient algorithms. The key components of our implementation, local mesh refinement and load balancing algorithms, are described in detail. Finally, we present a theoretical and experimental performance study of our framework, used in a large-scale computational fluid dynamics computation, and we compare the scaling and complexity of different algorithms on different massively parallel architectures.
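For intuition about conforming local refinement (no hanging nodes), here is a toy 2D analogue, Rivara-style longest-edge bisection with closure; the framework in the paper does this for distributed tetrahedral meshes, which this sketch makes no attempt to reproduce.

```python
import numpy as np

def longest_edge(verts, tri):
    # local index pair spanning the triangle's longest edge
    pairs = [(0, 1), (1, 2), (2, 0)]
    return max(pairs, key=lambda e: np.linalg.norm(verts[tri[e[0]]] - verts[tri[e[1]]]))

def refine(verts, tris, marked):
    # conforming longest-edge bisection: a split propagates to neighbors
    # until no hanging nodes remain (Rivara-style closure)
    verts = [np.asarray(v, float) for v in verts]
    tris, marked = set(map(tuple, tris)), set(map(tuple, marked))
    mid = {}                                    # edge -> midpoint vertex index
    edge = lambda a, b: (min(a, b), max(a, b))
    changed = True
    while changed:
        changed = False
        for t in list(tris):
            hanging = any(edge(t[i], t[j]) in mid
                          for i, j in ((0, 1), (1, 2), (2, 0)))
            if t not in marked and not hanging:
                continue
            i, j = longest_edge(verts, t)
            e = edge(t[i], t[j])
            if e not in mid:
                verts.append(0.5 * (verts[e[0]] + verts[e[1]]))
                mid[e] = len(verts) - 1
            m, k = mid[e], t[3 - i - j]         # k: vertex opposite the split edge
            tris.remove(t); marked.discard(t)
            tris.update([(t[i], m, k), (m, t[j], k)])
            changed = True                       # rescan: the split may propagate
    return verts, sorted(tris)

# usage: marking one triangle of a unit square forces its neighbor to split too
V, T = [(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 1, 2), (0, 2, 3)]
V2, T2 = refine(V, T, marked=[(0, 1, 2)])
print(len(T2), "triangles, no hanging nodes")   # 4 triangles
```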
Parallel multilevel diffusion algorithms for repartitioning of adaptive meshes
1997
Graph partitioning has been shown to be an effective way to divide a large computation over an arbitrary number of processors. A good partitioning can ensure load balance and minimize the communication overhead of the computation by partitioning an irregular mesh into p equal parts while minimizing the number of edges cut by the partition. For a large class of irregular mesh applications, the structure of the graph changes from one phase of the computation to the next. Eventually, as the graph evolves, the adapted mesh has to be repartitioned to ensure good load balance. Failure to do so will lead to higher parallel run time. This repartitioning needs to maintain a low edge-cut in order to minimize communication overhead in the follow-on computation. It also needs to minimize the time for physically migrating data from one processor to another since this time can dominate overall run time. Finally, it must be fast and scalable since it may be necessary to repartition frequently. Partitioning the adapted mesh again from scratch with an existing graph partitioner can be done quickly and will result in a low edge-cut. However, it will lead to an excessive migration of data among processors. In this paper, we present new parallel algorithms for robustly computing repartitionings of adaptively refined meshes. These algorithms perform diffusion of vertices in a multilevel framework and minimize data movement without compromising the edge-cut. Furthermore, our parallel repartitioners include parameterized heuristics to specifically optimize edge-cut, total data migration, or the maximum amount of data migrated into and out of any one processor. Our results on a variety of synthetic meshes show that our parallel multilevel diffusion algorithms are highly robust schemes for repartitioning adaptive meshes. The resulting edge-cuts are close to those resulting from partitioning from scratch with a state-of-the-art graph partitioner, while data migration is substantially reduced. Furthermore, repartitioning can be done very fast. Our experiments show that meshes with around eight million vertices can be repartitioned on a 256-processor Cray T3D in only a couple of seconds.
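To make the two competing quantities concrete, the fragment below computes the edge-cut and performs one naive diffusion sweep in which overloaded parts shed boundary vertices to lighter neighboring parts. It is single-level and unparameterized, so it only gestures at the multilevel, heuristic-rich algorithms of the paper.

```python
import numpy as np
from collections import defaultdict

def edge_cut(edges, part):
    # edges whose endpoints live on different processors
    return sum(part[u] != part[v] for u, v in edges)

def diffusion_step(edges, part, nparts):
    # one diffusive sweep: overloaded parts shed boundary vertices to lighter
    # neighboring parts, preferring the destination holding most neighbors
    load = np.bincount(part, minlength=nparts).astype(float)
    target = load.mean()
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    for u in sorted(adj):
        p = part[u]
        if load[p] <= target:
            continue
        light = {part[v] for v in adj[u]
                 if part[v] != p and load[part[v]] < target}
        if light:
            q = max(light, key=lambda q: sum(part[v] == q for v in adj[u]))
            part[u] = q
            load[p] -= 1; load[q] += 1
    return part

# usage: a 6-vertex ring with a chord, initially imbalanced 4-vs-2
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
part = np.array([0, 0, 0, 0, 1, 1])
print("cut", edge_cut(edges, part), "loads", np.bincount(part))
part = diffusion_step(edges, part, nparts=2)
print("cut", edge_cut(edges, part), "loads", np.bincount(part))
```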
ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications
Engineering with Computers, 2006
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++'s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.
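Charm++'s message-driven execution is its own programming model, but the computation/communication overlap it enables can be sketched in more familiar MPI terms (mpi4py below, with an invented one-dimensional halo exchange): post the messages, compute everything that needs no remote data while they are in flight, then finish the boundary.

```python
import numpy as np
from mpi4py import MPI   # run with e.g.: mpirun -n 4 python overlap.py

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

u = np.full(1000, float(rank))   # this rank's slab of a 1D field
ghost = np.empty(2)              # halo values from both neighbors

# post communication first ...
reqs = [comm.Isend(u[:1], dest=left), comm.Isend(u[-1:], dest=right),
        comm.Irecv(ghost[:1], source=left), comm.Irecv(ghost[1:], source=right)]

# ... and overlap it with interior work that needs no remote data
interior = 0.5 * (u[:-2] + u[2:])

MPI.Request.Waitall(reqs)        # halos have arrived; finish the two boundary cells
boundary = np.array([0.5 * (ghost[0] + u[1]), 0.5 * (u[-2] + ghost[1])])
```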
Scandalously Parallelizable Mesh Generation
2011
We propose a novel approach which employs random sampling to generate an accurate non-uniform mesh for numerically solving Partial Differential Equation Boundary Value Problems (PDE-BVP's). From a uniform probability distribution U over a 1D domain, we sample M discretizations of size N where M ≫ N. The statistical moments of the solutions to a given BVP on each of the M ultra-sparse meshes provide insight into identifying highly accurate non-uniform meshes. Essentially, we use the pointwise mean and variance of the coarse-grid solutions to construct a mapping Q(x) from uniformly to non-uniformly spaced mesh-points. The error convergence properties of the approximate solution to the PDE-BVP on the non-uniform mesh are superior to a uniform mesh for a certain class of BVP's. In particular, the method works well for BVP's with locally non-smooth solutions. We present a framework for studying the sampled sparse-mesh solutions and provide numerical evidence for the utility of this approach as applied to a set of example BVP's. We conclude with a discussion of how the near-perfect parallelizability of our approach suggests that these strategies have the potential for highly efficient utilization of massively parallel multi-core technologies such as General Purpose Graphics Processing Units (GPGPU's). We believe that the proposed algorithm is beyond embarrassingly parallel; implementing it on anything but a massively multi-core architecture would be scandalous.
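For a 1D model problem the construction reduces to a short script: solve the BVP on many random ultra-sparse meshes, take the pointwise variance of the solutions, and use it as a mesh density whose inverse CDF gives the mapping Q(x). The forcing term, the values of M and N, and the interpolation details below are assumed stand-ins for the paper's actual choices.

```python
import numpy as np

def solve_bvp(x, f):
    # -u'' = f on the (possibly non-uniform) grid x, with u(0) = u(1) = 0,
    # using the standard 3-point finite-difference scheme
    n = len(x)
    A, b = np.zeros((n, n)), f(x)
    A[0, 0] = A[-1, -1] = 1.0
    b[0] = b[-1] = 0.0
    for i in range(1, n - 1):
        hl, hr = x[i] - x[i - 1], x[i + 1] - x[i]
        A[i, i - 1] = -2.0 / (hl * (hl + hr))
        A[i, i] = 2.0 / (hl * hr)
        A[i, i + 1] = -2.0 / (hr * (hl + hr))
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
f = lambda x: 1.0 / np.sqrt(np.abs(x - 0.5) + 1e-3)    # locally non-smooth forcing
M, N = 2000, 9                                         # M meshes of size N, M >> N
fine = np.linspace(0.0, 1.0, 401)

# sample M ultra-sparse meshes from U(0, 1), solve, and interpolate to a fine grid
sols = [np.interp(fine, x, solve_bvp(x, f))
        for x in (np.sort(np.concatenate(([0.0, 1.0], rng.random(N - 2))))
                  for _ in range(M))]
var = np.array(sols).var(axis=0)

# Q maps uniform points through the inverse CDF of a variance-based density,
# clustering mesh points where the coarse-grid solutions disagree the most
cdf = np.cumsum(var + 1e-12)
cdf /= cdf[-1]
Q = np.interp(np.linspace(0.0, 1.0, 33), cdf, fine)    # a 33-point adapted mesh
print(Q)
```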
Towards Distributed Semi-speculative Adaptive Anisotropic Parallel Mesh Generation
arXiv, 2023
This paper presents the foundational elements of a distributed memory method for mesh generation that is designed to leverage the concurrency offered by large-scale computing. To achieve this goal, meshing functionality is separated from performance aspects by utilizing a separate entity for each: a shared memory mesh generation code, CDT3D, and PREMA for parallel runtime support. Although CDT3D is designed for scalability, lessons are presented regarding the additional measures that were taken to enable the code's integration into the distributed memory method as a black box. In the presented method, an initial mesh is data-decomposed and its subdomains are distributed amongst the nodes of a high-performance computing (HPC) cluster. Meshing operations within CDT3D utilize a speculative execution model, enabling the strict adaptation of subdomains' interior elements. Interface elements undergo several iterations of shifting so that they are adapted once their data dependencies are resolved. PREMA aids in this endeavor by providing asynchronous message passing between encapsulations of data, workload balancing, and migration capabilities, all within a globally addressable namespace. PREMA also assists in establishing data dependencies between subdomains, thus enabling "neighborhoods" of subdomains to work independently of each other in performing interface shifts and adaptation. Preliminary results show that the presented method is able to produce meshes of comparable quality to those generated by the original shared memory CDT3D code. Given the costly overhead of collective communication seen in existing state-of-the-art software, the relative communication performance of the presented distributed memory method also suggests that its emphasis on avoiding global synchronization is a potentially viable route to scalability on large configurations of cores.
Lecture Notes in Computational Science and Engineering
Parallel mesh generation is a relatively new research area at the boundary of two scientific computing disciplines: computational geometry and parallel computing. In this chapter we present a survey of parallel unstructured mesh generation methods. Parallel mesh generation methods decompose the original mesh generation problem into smaller subproblems which are meshed in parallel. We organize the parallel mesh generation methods in terms of two basic attributes: (1) the sequential technique used for meshing the individual subproblems and (2) the degree of coupling between the subproblems. This survey shows that, without compromising the stability of parallel mesh generation methods, it is possible to develop parallel meshing software using off-the-shelf sequential meshing codes. However, more research is required for the efficient use of state-of-the-art codes which can scale from emerging chip multiprocessors (CMPs) to clusters built from CMPs.
Distributed high-performance parallel mesh generation with ViennaMesh
2013
The ever-growing demand for higher accuracy in scientific simulations based on the discretization of equations given on physical domains is typically coupled with an increase in the number of mesh elements. Conventional mesh generation tools struggle to keep up with the increased workload, as they do not scale with the availability of, for example, multi-core CPUs. We present a parallel mesh generation approach for multi-core and distributed computing environments based on our generic meshing library ViennaMesh and on the Advancing Front mesh generation algorithm. Our approach is discussed in detail and performance results are shown.