US QCD computational performance studies with PERI

Performance of lattice QCD programs on CP-PACS

Parallel Computing, 1999

The CP-PACS is a massively parallel MIMD computer with a theoretical peak speed of 614 GFLOPS, developed for computational physics applications at the University of Tsukuba, Japan. We report on the performance of the CP-PACS computer measured during recent production runs using our Quantum Chromodynamics code for the simulation of quarks and gluons in particle physics. With the full 2048 processing nodes, our code shows a sustained speed of 237.5 GFLOPS for the heat-bath update of gluon variables, 264.6 GFLOPS for the over-relaxation update, and 325.3 GFLOPS for quark matrix inversion with an even-odd preconditioned minimal residual algorithm.
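The sustained speeds quoted above can be put in context as fractions of the 614 GFLOPS theoretical peak. The following is a minimal sketch of that arithmetic, using only the numbers from the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the CP-PACS abstract (GFLOPS, full 2048-node machine).
PEAK_GFLOPS = 614.0
NODES = 2048
SUSTAINED_GFLOPS = {
    "heat-bath update": 237.5,
    "over-relaxation update": 264.6,
    "quark matrix inversion": 325.3,
}


def fraction_of_peak(sustained, peak=PEAK_GFLOPS):
    """Return sustained speed as a fraction of the theoretical peak."""
    return sustained / peak


# Efficiency of each kernel relative to peak (hypothetical helper names).
efficiency = {name: fraction_of_peak(v) for name, v in SUSTAINED_GFLOPS.items()}

# Theoretical peak per processing node, assuming all nodes are identical.
peak_per_node_gflops = PEAK_GFLOPS / NODES

if __name__ == "__main__":
    for name, frac in efficiency.items():
        print(f"{name}: {100 * frac:.1f}% of peak")
    print(f"peak per node: {peak_per_node_gflops:.3f} GFLOPS")
```

Under these figures the three kernels run at roughly 39%, 43%, and 53% of peak, respectively.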

Enabling the ATLAS Experiment at the LHC for High Performance Computing

2017

In this thesis, I studied the feasibility of running computer data analysis programs from the Worldwide LHC Computing Grid, in particular large-scale simulations of the ATLAS experiment at the CERN LHC, on current general purpose High Performance Computing (HPC) systems. An approach for integrating HPC systems into the Grid is proposed, which has been implemented and tested on the "Todi" HPC machine at the Swiss National Supercomputing Centre (CSCS). Over the course of the test, more than 500,000 CPU-hours of processing time have been provided to ATLAS, which is roughly equivalent to the combined computing power of the two ATLAS clusters at the University of Bern. This showed that current HPC systems can be used to efficiently run large-scale simulations of the ATLAS detector and of the detected physics processes. As a first conclusion of my work, one can argue that, in perspective, running large-scale tasks on a few large machines might be more cost-effective than running on relativ...

The QCDOC Project

Nuclear Physics B-proceedings Supplements, 2005

The QCDOC project has developed a supercomputer optimised for the needs of Lattice QCD simulations. It provides a very competitive price to sustained performance ratio of around $1 per sustained Megaflop/s in combination with outstanding scalability. Thus very large systems delivering over 5 TFlop/s of performance on the evolution of a single lattice are possible. Large prototypes have been built and are functioning correctly. The software environment raises the state of the art in such custom supercomputers. It is based on a lean custom node operating system that eliminates many unnecessary overheads that plague other systems. Despite its custom nature, the operating system implements a standards-compliant UNIX-like programming environment, easing the porting of software from other systems. The SciDAC QMP interface adds internode communication in a fashion that provides a uniform cross-platform programming environment.

Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

Computer Physics Communications, 2000

We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total

QCDOC: A 10 Teraflops Computer for Tightly-Coupled Calculations

2004

Numerical simulations of the strong nuclear force, known as quantum chromodynamics or QCD, have proven to be a demanding, forefront problem in high-performance computing. In this report, we describe a new computer, QCDOC (QCD On a Chip), designed for optimal price/performance in the study of QCD. QCDOC uses a six-dimensional, low-latency mesh network to connect processing nodes, each of which includes a single custom ASIC, designed by our collaboration and built by IBM, plus DDR SDRAM. Each node has a peak speed of 1 Gigaflops, and two 12,288-node, 10+ Teraflops machines are to be completed in the fall of 2004. Currently, a 512-node machine is running, delivering efficiencies as high as 45% of peak on the conjugate gradient solvers that dominate our calculations, and a 4096-node machine with a cost of $1.6M is under construction. This should give us a price/performance of less than $1 per sustained Megaflops. ... the propagation of an electron in a background photon field. Standard Krylov space solvers work well to produce the solution and dominate the calculational time for QCD simulations.
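The price/performance claim for the 4096-node machine can be checked from the numbers in the abstract. The sketch below assumes, optimistically, that the "as high as 45% of peak" efficiency measured on the 512-node machine carries over unchanged to larger machines; that assumption and all names in the code are illustrative, not taken from the report.

```python
# Figures quoted in the QCDOC abstract.
PEAK_MFLOPS_PER_NODE = 1000.0   # 1 Gigaflops peak per node
BEST_EFFICIENCY = 0.45          # "as high as 45% of peak" (assumed to scale)
COST_USD_4096 = 1.6e6           # cost of the 4096-node machine


def sustained_mflops(nodes, efficiency=BEST_EFFICIENCY):
    """Sustained Mflops for a machine of `nodes` nodes at a given efficiency."""
    return nodes * PEAK_MFLOPS_PER_NODE * efficiency


def usd_per_sustained_mflops(cost_usd, nodes, efficiency=BEST_EFFICIENCY):
    """Price/performance in dollars per sustained Mflops."""
    return cost_usd / sustained_mflops(nodes, efficiency)


# 4096-node machine: dollars per sustained Mflops at the best-case efficiency.
price_4096 = usd_per_sustained_mflops(COST_USD_4096, 4096)

# 12,288-node machine: peak and best-case sustained speed in Teraflops.
peak_tflops_12288 = 12288 * PEAK_MFLOPS_PER_NODE / 1e6
sustained_tflops_12288 = sustained_mflops(12288) / 1e6

if __name__ == "__main__":
    print(f"4096 nodes: ${price_4096:.2f} per sustained Mflops")
    print(f"12,288 nodes: {peak_tflops_12288:.2f} Tflops peak, "
          f"{sustained_tflops_12288:.2f} Tflops sustained at 45%")
```

With these inputs the 4096-node machine comes out near $0.87 per sustained Mflops, consistent with the stated target of less than $1, and the 12,288-node machine has a peak of about 12.3 Teraflops.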

Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

Computer Physics Communications, 2003

We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost-effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements, this type of communication results in a communication time fraction of around 40% for lattices up to 48^3 · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations. There are obvious advantages of PC-based systems. Single PC hardware usually has excellent price/performance ratios for both single and double precision applications. In most cases the operating system (Linux), compiler (gcc) and other software are free. Another advantage of using PC/Linux-based systems is that lattice codes remain portable. Furthermore, due to their price they are available for a broader community working on lattice gauge theory. For recent review papers and benchmarks see .
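The aggregate figures in this abstract follow from the node count and the per-node rates. The sketch below reproduces that arithmetic and, as a rough illustration, applies the quoted 40% communication time fraction. It assumes that communication is not overlapped with computation, so only 60% of wall-clock time runs at the no-communication rate; that model and the names used are assumptions for illustration, not the authors' method.

```python
# Figures quoted in the abstract.
NODES = 137
MFLOPS_PER_NODE = {"Wilson": 1510.0, "staggered": 970.0}  # without communication
COMM_TIME_FRACTION = 0.40  # "around 40%" for lattices up to 48^3 x 96


def aggregate_gflops(nodes, mflops_per_node):
    """Total no-communication performance of the cluster in Gflops."""
    return nodes * mflops_per_node / 1000.0


def effective_gflops(total_gflops, comm_fraction=COMM_TIME_FRACTION):
    """Throughput if a fraction of wall-clock time goes to communication.

    Assumes no overlap between communication and computation.
    """
    return total_gflops * (1.0 - comm_fraction)


totals = {k: aggregate_gflops(NODES, v) for k, v in MFLOPS_PER_NODE.items()}
effective = {k: effective_gflops(v) for k, v in totals.items()}

if __name__ == "__main__":
    for action in MFLOPS_PER_NODE:
        print(f"{action}: {totals[action]:.1f} Gflops without communication, "
              f"about {effective[action]:.1f} Gflops at 40% communication time")
```

The products come to about 206.9 Gflops for Wilson and 132.9 Gflops for staggered fermions, close to the quoted 208 and 133 Gflops; the small Wilson difference is presumably due to rounding of the per-node figure, which the abstract does not spell out.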

Lattice QCD with commodity hardware and software

2000

Large scale QCD Monte Carlo calculations have typically been performed on either commercial supercomputers or specially built massively parallel computers such as Fermilab's ACPMAPS. Commodity computer systems offer impressive floating point performance-to-cost ratios which exceed those of commercial supercomputers. As high-performance networking components approach commodity pricing, it becomes reasonable to assemble a massively parallel supercomputer from commodity parts. We describe the work and progress to date of a collaboration working on this problem.