A parallel SSOR preconditioner for lattice QCD
Related papers
Parallel SSOR preconditioning for lattice QCD
Parallel Computing, 1999
The locally lexicographic symmetric successive overrelaxation algorithm (ll-SSOR) is the most effective parallel preconditioner known for iterative solvers used in lattice gauge theory. After reviewing the basic properties of ll-SSOR, the focus of this contribution is put on its parallel aspects: the administrative overhead of the parallel implementation of ll-SSOR, which is due to many conditional operations, decreases its efficiency by a factor of up to one third. A simple generalization of the algorithm is proposed that allows the application of the lexicographic ordering along specified axes, while along the other dimensions odd-even preconditioning is used. In this way one can tune the preconditioner towards optimal performance by balancing ll-SSOR effectivity and administrative overhead.
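As a rough illustration of the SSOR step that ll-SSOR builds on, the following sketch applies a generic SSOR preconditioner to a small dense real matrix. It is not the paper's parallel implementation: the function name `ssor_apply`, the list-of-lists storage, and the plain row ordering are assumptions made for illustration, and the locally lexicographic site ordering is not modeled.

```python
def ssor_apply(A, r, omega=1.0):
    """Return z with M z = r for the SSOR preconditioner
    M = (D + omega*L) D^{-1} (D + omega*U) / (omega * (2 - omega)),
    where A = D + L + U (diagonal, strictly lower, strictly upper parts).

    Assumption: A is a small dense square matrix given as a list of rows
    with a nonzero diagonal. Hypothetical helper, for illustration only.
    """
    n = len(A)
    scale = omega * (2.0 - omega)

    # Forward sweep: solve (D + omega*L) v = omega*(2 - omega) * r.
    v = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * v[j] for j in range(i))
        v[i] = (scale * r[i] - omega * s) / A[i][i]

    # Backward sweep: solve (D + omega*U) z = D v.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * v[i] - omega * s) / A[i][i]
    return z
```

Both sweeps visit the unknowns in a fixed order, which is why the ordering of lattice sites matters for parallel execution: each update needs the already-updated values of earlier sites.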
SSOR preconditioning in simulations of the QCD Schrödinger functional
Computer Physics Communications, 2000
We report on a parallelized implementation of SSOR preconditioning for O(a) improved lattice QCD with Schrödinger functional boundary conditions. Numerical simulations in the quenched approximation at parameters in the light quark mass region demonstrate that a performance gain of a factor ∼ 1.5 over even-odd preconditioning can be achieved.
Multigrid preconditioning for the overlap operator in lattice QCD
Numerische Mathematik, 2015
The overlap operator is a lattice discretization of the Dirac operator of quantum chromodynamics, the fundamental physical theory of the strong interaction between the quarks. As opposed to other discretizations it preserves the important physical property of chiral symmetry, at the expense of requiring much more effort when solving systems with this operator. We present a preconditioning technique based on another lattice discretization, the Wilson-Dirac operator. The mathematical analysis precisely describes the effect of this preconditioning in the case that the Wilson-Dirac operator is normal. Although this is not exactly the case in realistic settings, we show that current smearing techniques indeed drive the Wilson-Dirac operator towards normality, thus providing a motivation why our preconditioner works well in computational practice. Results of numerical experiments in physically relevant settings show that our preconditioning yields accelerations of up to one order of magnitude.
1998
The enormous computing resources that large-scale simulations in Lattice QCD require will continue to test the limits of even the largest supercomputers into the foreseeable future. The efficiency of such simulations will therefore concern practitioners of lattice QCD for some time to come. I begin with an introduction to those aspects of lattice QCD essential to the remainder of the thesis, and follow with a description of the Wilson fermion matrix M, an object which is central to my theme. The principal bottleneck in Lattice QCD simulations is the solution of linear systems involving M, and this topic is treated in depth. I compare some of the more popular iterative methods, including Minimal Residual, Conjugate Gradient on the Normal Equation, Bi-Conjugate Gradient, QMR, BiCGSTAB and BiCGSTAB2, and then turn to a study of block algorithms, a special class of iterative solvers for systems with multiple right-hand sides. Included in this study are two block algorithms which had ...
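To make one of the listed methods concrete, here is a minimal sketch of Conjugate Gradient on the normal equations (often called CGNR), which solves A^T A x = A^T b so that plain CG can be used on a non-symmetric matrix. It is a textbook variant, not code from the thesis. The name `cgnr`, the dense real storage, and the default tolerance are assumptions; for the complex Wilson matrix the transpose would be replaced by the conjugate transpose.

```python
def cgnr(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by running CG on A^T A x = A^T b.

    Assumption: A is a small dense real matrix (list of rows) with full
    column rank. Hypothetical helper, for illustration only.
    """
    rows, cols = len(A), len(A[0])

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(x):   # A x
        return [dot(row, x) for row in A]

    def rmatvec(y):  # A^T y
        return [sum(A[i][j] * y[i] for i in range(rows)) for j in range(cols)]

    x = [0.0] * cols
    r = list(b)              # residual b - A x for the start vector x = 0
    z = rmatvec(r)           # residual of the normal equations, A^T r
    p = list(z)              # first search direction
    zz = dot(z, z)

    for _ in range(max_iter):
        if zz ** 0.5 <= tol:
            break
        w = matvec(p)
        alpha = zz / dot(w, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        z = rmatvec(r)
        zz_new = dot(z, z)
        p = [zi + (zz_new / zz) * pi for zi, pi in zip(z, p)]
        zz = zz_new
    return x
```

The price of this approach is that the condition number of A^T A is the square of that of A, which is one reason solvers such as BiCGSTAB that work on A directly are compared against it.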
Adaptive Multigrid Algorithm for Lattice QCD
Physical Review Letters, 2008
We present a new multigrid solver that is suitable for the Dirac operator in the presence of disordered gauge fields. The key behind the success of the algorithm is an adaptive projection onto the coarse grids that preserves the near null space. The resulting algorithm has weak dependence on the gauge coupling and exhibits very little critical slowing down in the chiral limit. Results are presented for the Wilson Dirac operator of the 2d U(1) Schwinger model. PACS numbers: 11.15.Ha, 12.38.Gc The most demanding computational task in lattice QCD simulations consists of the calculation of quark propagators, which are needed both for generating gauge field configurations with the appropriate measure and for the evaluation of most observables. The calculation of a quark propagator, which in the course of a simulation must be carried out innumerous times with varying sources and gauge backgrounds, consists in turn of solving a very large system of linear equations,
Fine-grained parallelization of lattice QCD kernel routine on GPUs
Journal of Parallel and Distributed Computing, 2008
Simulation time for the classical problem of Lattice Quantum Chromodynamics (Lattice QCD) is dominated by one kernel routine responsible for computing the actions of a Dirac operator. This paper describes an experience in parallelizing this kernel routine. We explore parallelization granularities for this kernel routine on Graphical Processing Units (GPUs). We show that fine-grained parallelism can outperform coarse-grained parallelization, given that control-flow and communication effects are minimized. We propose two techniques for transforming control-flow-based code to control-free code. We also show how to reduce the communication effect by optimizing for commonly used sequences of calls to this routine. In our implementation on NVIDIA 8800 GTX, we were able to achieve an 8.3x speedup over an SSE2 optimized version on 2.8 GHz Intel Xeon CPU.
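The abstract does not say which two transformations the authors use, so the sketch below shows only one generic way to replace a per-site branch with branch-free indexing: the periodic-boundary test is moved into a neighbor table that is built once. The names `neighbor_branchy`, `build_forward_table`, and `shift_forward` are hypothetical, and a one-dimensional lattice stands in for the four-dimensional one.

```python
def neighbor_branchy(x, length):
    """Forward neighbor on a periodic 1D lattice, using a branch per site."""
    if x == length - 1:
        return 0
    return x + 1


def build_forward_table(length):
    """Precompute the forward neighbor of every site once, outside the kernel."""
    return [(x + 1) % length for x in range(length)]


def shift_forward(field, table):
    """Gather each site's forward neighbor through the table, with no
    boundary test inside the loop body."""
    return [field[table[x]] for x in range(len(field))]
```

A short usage example under the same assumptions:

```python
table = build_forward_table(4)
shifted = shift_forward([10, 20, 30, 40], table)
```

The table costs extra memory traffic, so whether it beats the branch depends on the hardware; it is shown here only to make the idea of control-free code tangible.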
Chiral Fermions Algorithms In Lattice QCD
East European Journal of Physics, 2019
The theory that explains the strong interactions of elementary particles, as part of the Standard Model, is Quantum Chromodynamics (QCD). In the low-energy regime this theory is formulated and solved on a four-dimensional lattice using numerical simulations; this method is called lattice QCD. The quark propagator is the most important quantity to compute because it contains the physical information of lattice QCD. Computing the quark propagator of chiral fermions on the lattice means inverting the chiral Dirac operator, which is computationally expensive. In the standard Krylov subspace inversion algorithms used in these simulations, the inversion time scales with the inverse of the quark mass. In lattice QCD simulations with chiral fermions, this phenomenon is known as the critical slowing-down problem. The purpose of this work is to show that the preconditioned GMRESR algorithm, developed i...
Optimized Lattice QCD kernels for a Pentium 4 Cluster
2001
Soon, a new cluster of parallel Pentium 4 machines will be set up at JLAB to run Lattice QCD calculations. I discuss the rationale for optimized Lattice QCD routines, and how the features of the Pentium 4 enable new optimized routines to run much faster than normal C routines. I describe the optimization strategies used in SU(3) linear algebra routines, and in both single-node and parallel implementations of the Wilson-Dirac Operator. Finally, I show single node performance timings for the parallel version of the Wilson-Dirac operator.
Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
Computer Physics Communications, 2000
We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eotvos Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total