QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine

Massively parallel quantum chromodynamics

IBM Journal of Research and Development, 2008

Quantum chromodynamics (QCD), the theory of the strong nuclear force, can be numerically simulated on massively parallel supercomputers using the method of lattice gauge theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the supercomputer hardware architectures for which LQCD is best suited. We demonstrate these methods on the IBM Blue Gene/L (BG/L) massively parallel supercomputer and argue that the BG/L architecture is very well suited for LQCD studies. This suitability arises from the fact that LQCD is a regular discretization of space into lattice sites, while the BG/L supercomputer is a discretization of space into compute nodes. Both LQCD and the BG/L architecture are constrained by the requirement of short-distance exchanges. This simple relation is technologically important and theoretically intriguing. We demonstrate a computational speedup of LQCD using up to 131,072 CPUs on the largest BG/L supercomputer available in 2007. As the number of CPUs is increased, the speedup scales linearly, with sustained performance of about 20% of the maximum possible hardware speed, corresponding to a maximum of 70.5 sustained teraflops. At these speeds, LQCD on the BG/L supercomputer can produce theoretical results for the next generation of strong-interaction physics.
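The short-distance-exchange correspondence above can be made concrete with a small sketch (the lattice and node-grid sizes below are hypothetical, chosen only for illustration): block-decompose the lattice over a mesh of compute nodes and count the sites that must cross the network each iteration.

```python
from math import prod

# Sketch (hypothetical sizes): why a mesh of nodes suits a regular lattice.
# The 4D lattice is block-decomposed, each node keeps one block, and only
# the block faces cross the network -- the same nearest-neighbour pattern
# that the machine's interconnect provides in hardware.

def decompose(global_lat, node_grid):
    """Local block shape when lattice axis i is split over node_grid[i]
    nodes (axes beyond len(node_grid) stay entirely on-node)."""
    local = list(global_lat)
    for i, n in enumerate(node_grid):
        assert global_lat[i] % n == 0, "lattice axis must divide evenly"
        local[i] = global_lat[i] // n
    return tuple(local)

def surface_fraction(local, n_split):
    """Fraction of a block's sites lying on a face of a split axis:
    the data exchanged with neighbouring nodes every iteration."""
    volume = prod(local)
    interior = prod(d - 2 if i < n_split else d for i, d in enumerate(local))
    return (volume - interior) / volume

local = decompose((32, 32, 32, 64), (4, 4, 8))
print(local, surface_fraction(local, 3))  # -> (8, 8, 4, 64) 0.71875
```

The point of the sketch is that all communication is to nearest neighbours only, so a machine whose network links nodes the same way loses nothing to long-distance traffic.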

Lattice gauge theory on a multi-core processor, Cell/B.E

Procedia Computer Science, 2011

We report our experience implementing a lattice gauge theory code on the Cell Broadband Engine, a new heterogeneous multi-core processor. As a typical operation we take SU(3) matrix multiplication, one of the most important kernels of lattice gauge theories. Taking full advantage of the Cell/B.E., including its SIMD operations and many registers, which allow the arithmetic units to be kept fully busy through loop unrolling, we obtain about 200 GFLOPS with 16 SPEs, corresponding to around 80% of the theoretical peak. To our knowledge, this is the fastest rate reported for this operation on the Cell/B.E. so far. However, when we measure the whole time including the data supply, the speed drops to about 13 GFLOPS. We found that the bandwidth of the data transfer between main memory and the EIB, 25 GB/s, is the bottleneck. In other words, it is possible to run the arithmetic units of the Cell/B.E. at 200 GFLOPS, but the current socket structure of the Cell/B.E. prevents it. We discuss several techniques that partially alleviate the problem by reducing the amount of transferred data.
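The bandwidth bottleneck described above can be retraced with a back-of-the-envelope sketch (in Python/NumPy rather than SPE intrinsics): counting the flops and bytes of a single-precision SU(3) multiply gives its arithmetic intensity, and multiplying by the quoted 25 GB/s gives a memory-bound ceiling of the same order as the observed 13 GFLOPS.

```python
import numpy as np

# Sketch: arithmetic intensity of an SU(3) x SU(3) product, and the
# roofline it implies when 25 GB/s feeds the arithmetic units.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
C = A @ B  # each of the 9 outputs: 3 complex mults + 2 complex adds

flops = 9 * (3 * 6 + 2 * 2)   # 198 real flops per matrix product
bytes_moved = 3 * 9 * 8       # two inputs + one output; models 8-byte
                              # single-precision complex, as on the SPEs
                              # (NumPy itself defaults to double here)
intensity = flops / bytes_moved
print(round(intensity, 2), round(25e9 * intensity / 1e9, 1))  # -> 0.92 22.9
```

With under one flop per byte, streaming matrices from main memory caps the kernel near 23 GFLOPS regardless of the 200 GFLOPS the arithmetic units can sustain in-core, which is why reducing transferred data is the lever the paper pursues.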

The APE computer: An array processor optimized for lattice gauge theory simulations

Computer Physics Communications, 1987

The APE computer is a high-performance processor designed to provide massive computational power for intrinsically parallel and homogeneous applications. APE is a linear array of processing elements and memory boards that execute in parallel in SIMD mode under the control of a CERN/SLAC 3081/E. Processing elements and memory boards are connected by a 'circular' switchnet. The hardware and software architecture of APE, as well as its implementation, are discussed in this paper. Some physics results obtained in the ...

QCDOC: A 10 Teraflops Computer for Tightly-Coupled Calculations

2004

Numerical simulations of the strong nuclear force, known as quantum chromodynamics or QCD, have proven to be a demanding, forefront problem in high-performance computing. In this report, we describe a new computer, QCDOC (QCD On a Chip), designed for optimal price/performance in the study of QCD. QCDOC uses a six-dimensional, low-latency mesh network to connect processing nodes, each of which includes a single custom ASIC, designed by our collaboration and built by IBM, plus DDR SDRAM. Each node has a peak speed of 1 Gigaflops, and two 12,288-node, 10+ Teraflops machines are to be completed in the fall of 2004. Currently, a 512-node machine is running, delivering efficiencies as high as 45% of peak on the conjugate gradient solvers that dominate our calculations, and a 4096-node machine with a cost of $1.6M is under construction. This should give us a price/performance of less than $1 per sustained Megaflops.
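The conjugate gradient solvers mentioned above can be sketched in a few lines. This minimal version solves a small symmetric positive-definite test system rather than the actual lattice Dirac equation, but the iteration structure that dominates QCDOC's runtime is the same: one operator application, two inner products, and three vector updates per step.

```python
import numpy as np

# Minimal conjugate-gradient solver: the Krylov iteration whose
# matrix-vector product and nearest-neighbour data pattern dominate
# lattice QCD calculations.

def cg(apply_A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - apply_A(x)        # initial residual
    p = r.copy()              # initial search direction
    rr = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rr / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if rr_new < tol**2:          # converged: ||r|| < tol
            break
        p = r + (rr_new / rr) * p    # new conjugate direction
        rr = rr_new
    return x

# Small symmetric positive-definite test system (stand-in for the
# Wilson-Dirac normal equations).
n = 50
M = np.random.default_rng(1).standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = np.ones(n)
x = cg(lambda v: A @ v, b)
print(np.linalg.norm(A @ x - b))  # tiny residual
```

In a real LQCD code `apply_A` is the lattice Dirac operator, whose stencil touches only nearest-neighbour sites; that is exactly the access pattern the low-latency mesh network is built to serve.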

FPGA Implementation of a Lattice Quantum Chromodynamics Algorithm Using Logarithmic Arithmetic

2005

In this paper, we discuss the implementation of a lattice Quantum Chromodynamics (QCD) application on a Xilinx Virtex-II FPGA device on an Alpha Data ADM-XRC-II board using Handel-C and logarithmic arithmetic. The specific algorithm implemented is the Wilson Dirac Fermion Vector times Matrix Product operation. QCD is the scientific theory that describes the interactions of various types of sub-atomic particles. Lattice QCD is the use of computer simulations to probe aspects of this theory. The research described in this paper aims to investigate whether FPGAs and logarithmic arithmetic are a viable compute platform for high-performance computing by implementing lattice QCD for this platform. We have achieved competitive performance of at least 936 MFlops per node, executing 14.2 floating-point-equivalent operations per cycle, which is far higher than previous solutions proposed for lattice QCD simulations.
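The logarithmic arithmetic mentioned above can be sketched in software (the FPGA implements the addition correction as a lookup table; the functions below are illustrative, not the paper's actual design): a value is stored as its base-2 logarithm plus a sign, so multiplication becomes cheap addition of exponents, while addition becomes the expensive operation.

```python
import math

# Sketch of a logarithmic number system (LNS): x is represented as
# (sign, log2|x|). Multiply/divide reduce to adding/subtracting the
# logs; addition needs a correction term f(d) = log2(1 + 2**d), which
# hardware implementations serve from a table.

def to_lns(x):
    return (math.copysign(1.0, x), math.log2(abs(x)))

def from_lns(v):
    s, e = v
    return s * 2.0 ** e

def lns_mul(a, b):
    # the cheap operation: multiply signs, add exponents
    return (a[0] * b[0], a[1] + b[1])

def lns_add(a, b):
    # the expensive operation; this sketch handles same-sign values only
    (sa, ea), (sb, eb) = (a, b) if a[1] >= b[1] else (b, a)
    assert sa == sb, "sketch handles same-sign addition only"
    d = eb - ea                               # d <= 0 by construction
    return (sa, ea + math.log2(1.0 + 2.0 ** d))

x, y = to_lns(3.0), to_lns(4.0)
print(round(from_lns(lns_mul(x, y)), 6), round(from_lns(lns_add(x, y)), 6))
# -> 12.0 7.0
```

For a kernel like the Wilson-Dirac product, which is dominated by multiply-accumulate chains, trading cheap multiplies for table-assisted adds is what makes the approach attractive on FPGA fabric.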

QPACE -- a QCD parallel computer based on Cell processors

2009

QPACE is a novel parallel computer that has been developed primarily for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor used in the PlayStation 3. The QPACE nodes are interconnected by a custom, application-optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack, a new water-cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.

Investigating how to simulate lattice gauge theories on a quantum computer

PhD thesis, 2023

Quantum computers have the potential to expand the utility of lattice gauge theory to investigate non-perturbative particle physics phenomena that cannot be accessed with standard Monte Carlo methods due to the sign problem. Because qubits encode quantum states directly, quantum computers can represent the Hilbert space far more efficiently than classical computers. This makes the Hamiltonian approach computationally feasible, and that approach is entirely free of the sign problem. What current noisy intermediate-scale quantum (NISQ) hardware can achieve, however, is still under investigation; we therefore study the energy spectrum and the time evolution of an SU(2) theory on two kinds of quantum hardware: the D-Wave quantum annealer and IBM gate-based quantum hardware.
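The storage advantage behind this claim is easy to quantify: a classical simulator must hold all 2^n complex amplitudes of an n-qubit state, whereas n physical qubits hold that state directly. A small sketch of the classical memory cost:

```python
# Sketch: classical cost of storing a full n-qubit state vector,
# assuming 8-byte (single-precision complex) amplitudes.

def classical_state_bytes(n_qubits, bytes_per_amplitude=8):
    """Bytes needed to store all 2**n amplitudes classically."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (10, 30, 50):
    print(n, "qubits ->", classical_state_bytes(n) / 2**30, "GiB")
```

At 30 qubits the state vector already needs 8 GiB; at 50 qubits it needs about 8 PiB, which is why Hamiltonian-based simulation of gauge theories is only feasible when the state lives on quantum hardware itself.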

Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

Computer Physics Communications, 2003

We study the feasibility of a PC-based parallel computer for medium- to large-scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM each. The 32-bit, single-precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine, we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost-effective: only 30% of the hardware cost is spent on communication. According to our benchmark measurements, it results in a communication-time fraction of around 40% for lattices up to 48³ · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that fits in our parallel computer. The communication software is freely available upon request for non-profit organizations.

There are obvious advantages of PC-based systems. Single-PC hardware usually has excellent price/performance ratios for both single- and double-precision applications. In most cases the operating system (Linux), compiler (gcc) and other software are free. Another advantage of PC/Linux-based systems is that lattice codes remain portable. Furthermore, due to their price they are available to a broader community working on lattice gauge theory.
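The price/performance arithmetic above can be retraced from the quoted figures (the per-node budget computed at the end is an inference from the claimed $1/Mflops target, not a price stated in the paper):

```python
# Sketch, using the abstract's figures: sustained full-QCD throughput
# after the measured communication share, and the per-node budget
# implied by the $1/Mflops claim (the budget is inferred, not quoted).

nodes = 137
per_node_mflops = 1510        # Wilson, 32-bit, without communication
comm_fraction = 0.40          # measured communication-time share, full QCD
target = 1.0                  # claimed price/performance, $/Mflops

sustained = nodes * per_node_mflops * (1 - comm_fraction)   # Mflops
budget_per_node = target * sustained / nodes                # $/node implied
print(round(sustained / 1e3, 1), "Gflops;", round(budget_per_node), "$/node")
```

The sketch shows why the design works: even after losing 40% of wall-clock time to Gigabit Ethernet, commodity P4 nodes priced around the implied budget keep the whole machine under the $1/Mflops line for Wilson fermions.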