IBM Blue Gene

From Wikipedia, the free encyclopedia

Series of supercomputers by IBM


A Blue Gene/P supercomputer at Argonne National Laboratory
Developer: IBM
Type: Supercomputer platform
Release date: BG/L: Feb 1999; BG/P: June 2007; BG/Q: Nov 2011
Discontinued: 2015
CPU: BG/L: PowerPC 440; BG/P: PowerPC 450; BG/Q: PowerPC A2
Predecessor: IBM RS/6000 SP; QCDOC
Successor: Summit, Sierra

Hierarchy of Blue Gene processing units

Blue Gene was an IBM project aimed at designing supercomputers that can reach operating speeds in the petaFLOPS (PFLOPS) range, with relatively low power consumption.

The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, and Blue Gene/Q. During their deployment, Blue Gene systems often led the TOP500[1] and Green500[2] rankings of the most powerful and most power-efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list.[3] The project was awarded the 2009 National Medal of Technology and Innovation.[4]

After Blue Gene/Q, IBM focused its supercomputer efforts on the OpenPower platform, using accelerators such as FPGAs and GPUs to address the diminishing returns of Moore's law.[5][6]

A video presentation of the history and technology of the Blue Gene project was given at the Supercomputing 2020 conference.[7]

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding.[8] The research and development was pursued by a large multi-disciplinary team at the IBM T. J. Watson Research Center, initially led by William R. Pulleyblank.[9] The project had two main goals: to advance understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost through novel machine architectures.

The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. In parallel, Alan Gara had started working on an extension of the QCDOC architecture into a more general-purpose supercomputer. The US Department of Energy started funding the development of this system and it became known as Blue Gene/L (L for Light). Development of the original Blue Gene architecture continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

Architecture and chip logic design for the Blue Gene systems was done at the IBM T. J. Watson Research Center, chip design was completed and chips were manufactured by IBM Microelectronics, and the systems were built at IBM Rochester, MN.

In November 2004 a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list, with a LINPACK benchmark performance of 70.72 TFLOPS.[1] It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007 the Blue Gene/L installation at LLNL[10] gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until in June 2008 it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 petaFLOPS mark.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. The November 2006 TOP500 list showed 27 computers with the eServer Blue Gene Solution architecture. For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD), simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.

In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox).[11] At Supercomputing 2006,[12] Blue Gene/L was awarded the winning prize in all HPC Challenge Classes of awards.[13] In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at 1/10 of normal speed for 10 seconds).[14]

The name Blue Gene comes from what it was originally designed to do: help biologists understand the processes of protein folding and gene development.[15] "Blue" is a traditional moniker that IBM uses for many of its products and the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light", as that design's original name was "Blue Light". The "P" version was designed to be a petascale design. "Q" is just the letter after "P".[16]

The Blue Gene/L supercomputer was unique in the following aspects:[17]

The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L Compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline-double-precision Floating-Point Unit (FPU), a cache sub-system with built-in DRAM controller and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.
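The 5.6 GFLOPS figure follows from the clock rate and the floating-point width of the node. The sketch below reproduces it under one assumption that is not stated above: each of the two FPU pipelines per core completes one fused multiply-add (counted as two floating-point operations) per cycle. The function name is illustrative only.

```python
def node_peak_mflops(cores: int, clock_mhz: int, flops_per_cycle: int) -> int:
    """Theoretical peak of one node in MFLOPS: cores x clock x flops per cycle."""
    return cores * clock_mhz * flops_per_cycle

# Assumption: 2 FPU pipelines per core, each completing one fused
# multiply-add (2 floating-point operations) per cycle.
FLOPS_PER_CYCLE = 2 * 2

bgl_peak = node_peak_mflops(cores=2, clock_mhz=700, flops_per_cycle=FLOPS_PER_CYCLE)
print(bgl_peak / 1000, "GFLOPS")  # 5.6 GFLOPS
```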

Compute nodes were packaged two per compute card, with 16 compute cards (thus 32 nodes) plus up to 2 I/O nodes per node board. A cabinet/rack contained 32 node boards.[18] By the integration of all essential sub-systems on a single chip, and the use of low-power logic, each Compute or I/O node dissipated about 17 watts (including DRAMs). The low power per node allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits on electrical power supply and air cooling. The system performance metrics, in terms of FLOPS per watt, FLOPS per m² of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.
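The packaging hierarchy can be checked with a few multiplications. The following is a rough sketch that counts compute nodes only; it ignores I/O nodes, power conversion losses, and fans, so the power figure is a lower bound rather than a measured rack draw. The constant names are illustrative.

```python
NODES_PER_COMPUTE_CARD = 2
COMPUTE_CARDS_PER_NODE_BOARD = 16
NODE_BOARDS_PER_RACK = 32
WATTS_PER_NODE = 17       # approximate, includes DRAM
NODE_PEAK_GFLOPS = 5.6    # theoretical peak per node

nodes_per_board = NODES_PER_COMPUTE_CARD * COMPUTE_CARDS_PER_NODE_BOARD
nodes_per_rack = nodes_per_board * NODE_BOARDS_PER_RACK

# Compute nodes only: no I/O nodes, power supplies, or cooling.
rack_compute_power_kw = nodes_per_rack * WATTS_PER_NODE / 1000
rack_peak_tflops = nodes_per_rack * NODE_PEAK_GFLOPS / 1000

print(nodes_per_board, nodes_per_rack)   # 32 1024
print(f"{rack_compute_power_kw:.1f} kW, {rack_peak_tflops:.2f} TFLOPS peak per rack")
```

At roughly 5.73 TFLOPS peak per rack, 104 racks give about 596 TFLOPS, which agrees with the peak quoted for the full LLNL installation.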

Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which run the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. A separate and private Ethernet management network provided access to any node for configuration, booting and diagnostics.
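To make the torus topology concrete, the sketch below computes the nearest neighbors of a node and the minimum hop count between two nodes in a 3D torus, where every dimension wraps around at its edges. The partition shape and function names are hypothetical; this illustrates the topology only, not IBM's routing logic.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the +1/-1 neighbors of `node` along each axis of a 3D torus.

    The six results are distinct when every dimension has at least 3 nodes.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]  # wrap around the edge
            neighbors.append(tuple(coord))
    return neighbors

def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link hops between two nodes, using wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

dims = (8, 8, 16)  # hypothetical shape of a 1,024-node partition
print(torus_neighbors((0, 0, 0), dims))
print(torus_hops((0, 0, 0), (7, 4, 15), dims))  # 1 + 4 + 1 = 6
```

The wraparound links are what keep the worst-case distance at half of each dimension's extent, rather than the full extent as in a plain mesh.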

To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2⁵ = 32 nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.
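The size rule for partitions (a power of two, with at least 32 nodes) can be expressed as a short check. This is a minimal sketch of that one rule; the function name is hypothetical, and a real scheduler would also have to respect constraints not described here, such as the physical wiring of the machine.

```python
MIN_PARTITION_NODES = 32  # 2**5, the smallest partition size

def is_valid_partition_size(nodes: int) -> bool:
    """True if `nodes` is a power of two and at least MIN_PARTITION_NODES."""
    # A positive integer is a power of two exactly when it has a single set bit.
    is_power_of_two = nodes > 0 and (nodes & (nodes - 1)) == 0
    return is_power_of_two and nodes >= MIN_PARTITION_NODES

for size in (16, 32, 48, 512, 1024):
    print(size, is_valid_partition_size(size))
```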

Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode—or one process per CPU in virtual mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages such as Ruby[19] and Python[20] have been ported to the compute nodes.
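Green threads are tasks scheduled cooperatively by the program itself rather than by the operating system. The sketch below shows the general technique with Python generators and a round-robin scheduler. It illustrates the idea only: Blue Gene/L applications were typically written in C, C++, or Fortran, and all names here are hypothetical.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]

def worker(name: str, steps: int, log: List[str]) -> Task:
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks: List[Task]) -> None:
    """Run all tasks in a single OS thread, switching at each yield."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)  # not finished: go to the back of the queue
        except StopIteration:
            pass  # task completed; drop it

log: List[str] = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Because a task gives up control only at an explicit yield, no locking is needed between tasks, but a task that never yields would starve the others.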

IBM published BlueMatter, the application developed to exercise Blue Gene/L, as open source.[21] This serves to document how the torus and collective interfaces were used by applications, and may serve as a base for others to exercise the current generation of supercomputers.

A Blue Gene/P node card

A schematic overview of a Blue Gene/P supercomputer

In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers and designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.[22]

The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P Compute chip contains four PowerPC 450 processor cores, running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L, but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. 32 Compute cards are plugged into an air-cooled node board. A rack contains 32 node boards (thus 1024 nodes, 4096 processor cores).[23] By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007–2008.[2]

The following is an incomplete list of Blue Gene/P installations. As of November 2009, the TOP500 list contained 15 Blue Gene/P installations of two racks (2048 nodes, 8192 processor cores, 23.86 TFLOPS Linpack) and larger.[1]
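The Linpack figure for a two-rack system can be compared against its theoretical peak. The sketch below derives the 13.6 GFLOPS node peak from the core count and clock rate, assuming 4 floating-point operations per core per cycle (a dual-pipeline FPU with fused multiply-add, which is not stated explicitly above), and then computes the fraction of peak that Linpack achieved.

```python
CORES_PER_NODE = 4
CLOCK_MHZ = 850
FLOPS_PER_CYCLE = 4     # assumption: dual-pipeline FPU with fused multiply-add
NODES_PER_RACK = 1024

node_peak_gflops = CORES_PER_NODE * CLOCK_MHZ * FLOPS_PER_CYCLE / 1000

racks = 2
system_peak_tflops = racks * NODES_PER_RACK * node_peak_gflops / 1000
linpack_tflops = 23.86  # two-rack Linpack result quoted above
efficiency = linpack_tflops / system_peak_tflops

print(f"node peak {node_peak_gflops} GFLOPS")
print(f"system peak {system_peak_tflops:.2f} TFLOPS, Linpack efficiency {efficiency:.1%}")
```

Under these assumptions a two-rack system reaches roughly 86% of its theoretical peak on Linpack.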

The IBM Blue Gene/Q installation Mira at the Argonne National Laboratory, near Chicago, Illinois

The third design in the Blue Gene series, Blue Gene/Q, significantly expanded and enhanced the Blue Gene/L and /P architectures.

The Blue Gene/Q "compute chip" is based on the 64-bit IBM A2 processor core. The A2 processor core is 4-way simultaneously multithreaded and was augmented with a SIMD quad-vector double-precision floating-point unit (IBM QPX). Each Blue Gene/Q compute chip contains 18 such A2 processor cores, running at 1.6 GHz. Sixteen cores are used for application computing and a 17th core is used for handling operating system assist functions such as interrupts, asynchronous I/O, MPI pacing, and RAS. The 18th core is a redundant manufacturing spare, used to increase yield. The spared-out core is disabled prior to system operation. The chip's processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache, operating at half core speed. The L2 cache is multi-versioned (supporting transactional memory and speculative execution) and has hardware support for atomic operations.[39] L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links. The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm. It delivers a peak performance of 204.8 GFLOPS while drawing approximately 55 watts. The chip measures approximately 19×19 mm (359.5 mm²) and comprises 1.47 billion transistors. Completing the compute node, the chip is mounted on a compute card along with 16 GB DDR3 DRAM (i.e., 1 GB for each user processor core).[40]
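The chip-level figures above can be tied together with a short calculation. This sketch assumes the 4-wide QPX unit completes one fused multiply-add per lane per cycle (8 floating-point operations per core per cycle), which is not stated in the text, and counts only the 16 application cores.

```python
USER_CORES = 16           # cores available to applications
CLOCK_MHZ = 1600
# Assumption: the 4-wide QPX unit completes one fused multiply-add per lane
# per cycle, i.e. 4 lanes x 2 operations = 8 flops per cycle per core.
FLOPS_PER_CYCLE = 4 * 2
CHIP_POWER_W = 55         # approximate
NODE_MEMORY_GB = 16

chip_peak_gflops = USER_CORES * CLOCK_MHZ * FLOPS_PER_CYCLE / 1000
chip_gflops_per_watt = chip_peak_gflops / CHIP_POWER_W
memory_per_core_gb = NODE_MEMORY_GB / USER_CORES

print(chip_peak_gflops)                # 204.8
print(round(chip_gflops_per_watt, 2))  # 3.72
print(memory_per_core_gb)              # 1.0
```

The per-watt value here is a chip-only peak figure; it excludes memory, networking, and cooling, and uses theoretical peak rather than measured Linpack performance, so it is not comparable to system-level Green500 results.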

A Q32[41] "compute drawer" contains 32 compute nodes, each water cooled.[42] A "midplane" (crate) contains 16 Q32 compute drawers for a total of 512 compute nodes, electrically interconnected in a 5D torus configuration (4×4×4×4×2). Beyond the midplane level, all connections are optical. Racks have two midplanes, thus 32 compute drawers, for a total of 1024 compute nodes, 16,384 user cores, and 16 TB RAM.[42]
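The midplane and rack totals follow directly from the torus shape. The sketch below checks them and maps a linear node index to 5D torus coordinates. The index ordering (last dimension varying fastest) is an arbitrary convention chosen for illustration, not the mapping used by the actual system software, and the function name is hypothetical.

```python
from math import prod
from typing import Tuple

MIDPLANE_SHAPE = (4, 4, 4, 4, 2)
MIDPLANES_PER_RACK = 2
USER_CORES_PER_NODE = 16
MEMORY_GB_PER_NODE = 16

def rank_to_coords(rank: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Map a linear node index to torus coordinates (last dimension fastest)."""
    if not 0 <= rank < prod(shape):
        raise ValueError("rank out of range")
    coords = []
    for extent in reversed(shape):
        rank, c = divmod(rank, extent)  # peel off one coordinate per dimension
        coords.append(c)
    return tuple(reversed(coords))

nodes_per_midplane = prod(MIDPLANE_SHAPE)
nodes_per_rack = MIDPLANES_PER_RACK * nodes_per_midplane
cores_per_rack = nodes_per_rack * USER_CORES_PER_NODE
memory_tb_per_rack = nodes_per_rack * MEMORY_GB_PER_NODE / 1024

print(nodes_per_midplane, nodes_per_rack, cores_per_rack, memory_tb_per_rack)
# 512 1024 16384 16.0
print(rank_to_coords(511, MIDPLANE_SHAPE))  # (3, 3, 3, 3, 1)
```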

Separate I/O drawers, placed at the top of a rack or in a separate rack, are air cooled and contain 8 compute cards and 8 PCIe expansion slots for InfiniBand or 10 Gigabit Ethernet networking.[42]

At the time of the Blue Gene/Q system announcement in November 2011,[43] an initial 4-rack Blue Gene/Q system (4096 nodes, 65536 user processor cores) achieved #17 in the TOP500 list[1] with 677.1 TFLOPS Linpack, outperforming the original 2007 104-rack Blue Gene/L installation described above. The same 4-rack system achieved the top position in the Graph500 list[3] with over 250 GTEPS (giga traversed edges per second). Blue Gene/Q systems also topped the Green500 list of most energy-efficient supercomputers with up to 2.1 GFLOPS/W.[2]
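The generational gain is easiest to see per rack. The sketch below divides each system's Linpack result by its rack count, using 478.2 TFLOPS for the 104-rack Blue Gene/L installation; the helper name is illustrative.

```python
def tflops_per_rack(linpack_tflops: float, racks: int) -> float:
    """Average Linpack performance delivered by one rack."""
    return linpack_tflops / racks

bgq = tflops_per_rack(677.1, 4)     # initial 4-rack Blue Gene/Q system
bgl = tflops_per_rack(478.2, 104)   # 104-rack Blue Gene/L installation

print(f"Blue Gene/Q: {bgq:.1f} TFLOPS per rack")
print(f"Blue Gene/L: {bgl:.1f} TFLOPS per rack")
print(f"ratio: {bgq / bgl:.1f}x")
```

By this measure, a Blue Gene/Q rack delivered roughly 37 times the Linpack performance of a Blue Gene/L rack.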

In June 2012, Blue Gene/Q installations took the top positions in all three lists: TOP500,[1] Graph500[3] and Green500.[2]

The following is an incomplete list of Blue Gene/Q installations. As of June 2012, the TOP500 list contained 20 Blue Gene/Q installations of half a rack (512 nodes, 8192 processor cores, 86.35 TFLOPS Linpack) and larger.[1] At a (size-independent) power efficiency of about 2.1 GFLOPS/W, all these systems also populated the top of the June 2012 Green500 list.[2]

Record-breaking science applications have been run on the BG/Q, the first to cross 10 petaflops of sustained performance. The cosmology simulation framework HACC achieved almost 14 petaflops with a 3.6 trillion particle benchmark run,[64] while the Cardioid code,[65][66] which models the electrophysiology of the human heart, achieved nearly 12 petaflops with a near real-time simulation, both on Sequoia. A fully compressible flow solver has also achieved 14.4 PFLOP/s (originally 11 PFLOP/s) on Sequoia, 72% of the machine's nominal peak performance.[67]

  1. ^ a b c d e f g h i "November 2004 - TOP500 Supercomputer Sites". Top500.org. Retrieved 13 December 2019.
  2. ^ a b c d e "Green500 - TOP500 Supercomputer Sites". Green500.org. Archived from the original on 26 August 2016. Retrieved 13 October 2017.
  3. ^ a b c "The Graph500 List". Archived from the original on 2011-12-27.
  4. ^ Harris, Mark (September 18, 2009). "Obama honours IBM supercomputer". Techradar.com. Retrieved 2009-09-18.
  5. ^ "Supercomputing Strategy Shifts in a World Without BlueGene". Nextplatform.com. 14 April 2015. Retrieved 13 October 2017.
  6. ^ "IBM to Build DoE's Next-Gen Coral Supercomputers - EE Times". EETimes. Archived from the original on 30 April 2017. Retrieved 13 October 2017.
  7. ^ Supercomputing 2020 conference, Test of Time award video presentation
  8. ^ "Blue Gene: A Vision for Protein Science using a Petaflop Supercomputer" (PDF). IBM Systems Journal. 40 (2). 2017-10-23.
  9. ^ "A Talk with the Brain behind Blue Gene", BusinessWeek, November 6, 2001, archived from the original on December 11, 2014
  10. ^ "BlueGene/L". Archived from the original on 2011-07-18. Retrieved 2007-10-05.
  11. ^ "hpcwire.com". Archived from the original on September 28, 2007.
  12. ^ "SC06". sc06.supercomputing.org. Retrieved 13 October 2017.
  13. ^ "HPC Challenge Award Competition". Archived from the original on 2006-12-11. Retrieved 2006-12-03.
  14. ^ "Mouse brain simulated on computer". BBC News. April 27, 2007. Archived from the original on 2007-05-25.
  15. ^ "IBM100 - Blue Gene". 03.ibm.com. 7 March 2012. Archived from the original on April 3, 2012. Retrieved 13 October 2017.
  16. ^ Kunkel, Julian M.; Ludwig, Thomas; Meuer, Hans (12 June 2013). Supercomputing: 28th International Supercomputing Conference, ISC 2013, Leipzig, Germany, June 16-20, 2013. Proceedings. Springer. ISBN 9783642387500. Retrieved 13 October 2017 – via Google Books.
  17. ^ "Blue Gene". IBM Journal of Research and Development. 49 (2/3). 2005.
  18. ^ Kissel, Lynn. "BlueGene/L Configuration". asc.llnl.gov. Archived from the original on 17 February 2013. Retrieved 13 October 2017.
  19. ^ "Compute Node Ruby for Bluegene/L". www.ece.iastate.edu. Archived from the original on February 11, 2009.
  20. ^ William Scullin (March 12, 2011). Python for High Performance Computing. Atlanta, GA.
  21. ^ Blue Matter source code, retrieved February 28, 2020
  22. ^ "IBM Triples Performance of World's Fastest, Most Energy-Efficient Supercomputer". 2007-06-27. Archived from the original on July 8, 2007. Retrieved 2011-12-24.
  23. ^ "Overview of the IBM Blue Gene/P project". IBM Journal of Research and Development. 52: 199–220. Jan 2008. doi:10.1147/rd.521.0199.
  24. ^ "Supercomputing: Jülich Amongst World Leaders Again". IDG News Service. 2007-11-12.
  25. ^ "IBM Press room - 2009-02-10 New IBM Petaflop Supercomputer at German Forschungszentrum Juelich to Be Europe's Most Powerful". 03.ibm.com. 2009-02-10. Archived from the original on February 12, 2009. Retrieved 2011-03-11.
  26. ^ "Argonne's Supercomputer Named World's Fastest for Open Science, Third Overall". Mcs.anl.gov. Archived from the original on 8 February 2009. Retrieved 13 October 2017.
  27. ^ "Rice University, IBM partner to bring first Blue Gene supercomputer to Texas". news.rice.edu. Archived from the original on 2012-04-05. Retrieved 2012-04-01.
  28. ^ Вече си имаме и суперкомпютър Archived 2009-12-23 at the Wayback Machine, Dir.bg, 9 September 2008
  29. ^ "IBM Press room - 2010-02-11 IBM to Collaborate with Leading Australian Institutions to Push the Boundaries of Medical Research - Australia". 03.ibm.com. 2010-02-11. Archived from the original on July 16, 2012. Retrieved 2011-03-11.
  30. ^ "Rutgers Gets Big Data Weapon in IBM Supercomputer - Hardware -". Archived from the original on 2013-03-06. Retrieved 2013-09-07.
  31. ^ "University of Rochester and IBM Expand Partnership in Pursuit of New Frontiers in Health". University of Rochester Medical Center. May 11, 2012. Archived from the original on 2012-05-11.
  32. ^ "IBM and Universiti Brunei Darussalam to Collaborate on Climate Modeling Research". IBM News Room. 2010-10-13. Archived from the original on December 12, 2010. Retrieved 18 October 2012.
  33. ^ Ronda, Rainier Allan. "DOST's supercomputer for scientists now operational". Philstar.com. Retrieved 13 October 2017.
  34. ^ "Topalov training with super computer Blue Gene P". Players.chessdo.com. Archived from the original on 19 May 2013. Retrieved 13 October 2017.
  35. ^ Kaku, Michio. Physics of the Future (New York: Doubleday, 2011), 91.
  36. ^ "Project Kittyhawk: A Global-Scale Computer". Research.ibm.com. Retrieved 13 October 2017.
  37. ^ Appavoo, Jonathan; Uhlig, Volkmar; Waterland, Amos. "Project Kittyhawk: Building a Global-Scale Computer" (PDF). Yorktown Heights, NY: IBM T.J. Watson Research Center. Archived from the original on 2008-10-31. Retrieved 2018-03-13.
  38. ^ "Rutgers-led Experts Assemble Globe-Spanning Supercomputer Cloud". News.rutgers.edu. 2011-07-06. Archived from the original on 2011-11-10. Retrieved 2011-12-24.
  39. ^ "Memory Speculation of the Blue Gene/Q Compute Chip". Retrieved 2011-12-23.
  40. ^ "The Blue Gene/Q Compute chip" (PDF). Archived from the original (PDF) on 2015-04-29. Retrieved 2011-12-23.
  41. ^ "IBM Blue Gene/Q supercomputer delivers petascale computing for high-performance computing applications" (PDF). 01.ibm.com. Retrieved 13 October 2017.
  42. ^ a b c "IBM uncloaks 20 petaflops BlueGene/Q super". The Register. 2010-11-22. Retrieved 2010-11-25.
  43. ^ "IBM announces 20-petaflops supercomputer". Kurzweil. 18 November 2011. Retrieved 13 November 2012. IBM has announced the Blue Gene/Q supercomputer, with peak performance of 20 petaflops
  44. ^ Feldman, Michael (2009-02-03). "Lawrence Livermore Prepares for 20 Petaflop Blue Gene/Q". HPCwire. Archived from the original on 2009-02-12. Retrieved 2011-03-11.
  45. ^ B Johnston, Donald (2012-06-18). "NNSA's Sequoia supercomputer ranked as world's fastest". Archived from the original on 2014-09-02. Retrieved 2012-06-23.
  46. ^ "TOP500 Press Release". Archived from the original on June 24, 2012.
  47. ^ "MIRA: World's fastest supercomputer - Argonne Leadership Computing Facility". Alcf.anl.gov. Retrieved 13 October 2017.
  48. ^ "Mira - Argonne Leadership Computing Facility". Alcf.anl.gov. Retrieved 13 October 2017.
  49. ^ "Vulcan—decommissioned". hpc.llnl.gov. Retrieved 10 April 2019.
  50. ^ "HPC Innovation Center". hpcinnovationcenter.llnl.gov. Retrieved 13 October 2017.
  51. ^ "Lawrence Livermore's Vulcan brings 5 petaflops computing power to collaborations with industry and academia to advance science and technology". Llnl.gov. 11 June 2013. Archived from the original on 9 December 2013. Retrieved 13 October 2017.
  52. ^ "Ibm-Fermi | Scai". Archived from the original on 2013-10-30. Retrieved 2013-05-13.
  53. ^ "DiRAC BlueGene/Q". epcc.ed.ac.uk.
  54. ^ "Rensselaer at Petascale: AMOS Among the World's Fastest and Most Powerful Supercomputers". News.rpi.edu. Retrieved 13 October 2017.
  55. ^ Michael Mullaney. "AMOS Ranks 1st Among Supercomputers at Private American Universities". News.rpi.edi. Retrieved 13 October 2017.
  56. ^ "World's greenest supercomputer comes to Melbourne - The Melbourne Engineer". Themelbourneengineer.eng.unimelb.edu.au/. 16 February 2012. Archived from the original on 2 October 2017. Retrieved 13 October 2017.
  57. ^ "Melbourne Bioinformatics - For all researchers and students based in Melbourne's biomedical and bioscience research precinct". Melbourne Bioinformatics. Retrieved 13 October 2017.
  58. ^ "Access to High-end Systems - Melbourne Bioinformatics". Vlsci.org.au. Retrieved 13 October 2017.
  59. ^ "University of Rochester Inaugurates New Era of Health Care Research". Rochester.edu. Retrieved 13 October 2017.
  60. ^ "Resources - Center for Integrated Research Computing". Circ.rochester.edu. Retrieved 13 October 2017.
  61. ^ "EPFL BlueGene/L Homepage". Archived from the original on 2007-12-10. Retrieved 2021-03-10.
  62. ^ Utilisateur, Super. "À propos". Cadmos.org. Archived from the original on 10 January 2016. Retrieved 13 October 2017.
  63. ^ "A*STAR Computational Resource Centre". Acrc.a-star.edu.sg. Archived from the original on 2016-12-20. Retrieved 2016-08-24.
  64. ^ S. Habib; V. Morozov; H. Finkel; A. Pope; K. Heitmann; K. Kumaran; T. Peterka; J. Insley; D. Daniel; P. Fasel; N. Frontiere & Z. Lukic (2012). "The Universe at Extreme Scale: Multi-Petaflop Sky Simulation on the BG/Q". arXiv:1211.4864 [cs.DC].
  65. ^ "Cardioid Cardiac Modeling Project". Researcher.watson.ibm.com. 25 July 2016. Archived from the original on 21 May 2013. Retrieved 13 October 2017.
  66. ^ "Venturing into the Heart of High-Performance Computing Simulations". Str.llnl.gov. Archived from the original on 14 February 2013. Retrieved 13 October 2017.
  67. ^ Rossinelli, Diego; Hejazialhosseini, Babak; Hadjidoukas, Panagiotis; Bekas, Costas; Curioni, Alessandro; Bertsch, Adam; Futral, Scott; Schmidt, Steffen J.; Adams, Nikolaus A.; Koumoutsakos, Petros (17 November 2013). "11 PFLOP/S simulations of cloud cavitation collapse". Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. SC '13. pp. 1–13. doi:10.1145/2503210.2504565. ISBN 9781450323789. S2CID 12651650.
Records
World's most powerful supercomputer: Blue Gene/L (70.72 - 478.20 teraflops), November 2004 – November 2007. Preceded by NEC Earth Simulator (35.86 teraflops). Succeeded by IBM Roadrunner (1.026 petaflops).
World's most powerful supercomputer: Blue Gene/Q (16.32 petaflops), June 2012 – November 2012. Preceded by Fujitsu K computer (10.51 petaflops). Succeeded by Cray Titan (17.59 petaflops).