Portable Hardware Locality (hwloc)
Upgrading to v2.0 API: Guide for Porting your Code
XML topology database: Repository of XML topologies
The Best of lstopo: Best lstopo graphical outputs
The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes (DRAM, HBM, non-volatile memory, CXL, etc.), processor packages, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs.
hwloc primarily aims to help applications gather information about increasingly complex parallel computing platforms so that they can exploit them accordingly and efficiently. For instance, two tasks that tightly cooperate should probably be placed onto cores sharing a cache. However, two independent memory-intensive tasks are better spread out onto different processor packages so as to maximize their memory throughput. As described in this paper, OpenMP threads have to be placed according to their affinities and to the hardware characteristics. MPI implementations apply similar techniques while also adapting their communication strategies to the network locality, as described in this paper or this one.
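The two placement rules above can be illustrated with a small Python sketch. It does not use hwloc or its bindings: the `Core` record, the `toy_machine` layout (2 packages, 2 shared L3 caches per package, 2 cores per cache) and the two `place_*` helpers are hypothetical names invented for this illustration. A real application would obtain the topology from hwloc instead of building it by hand.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Core:
    """One core of a toy machine description (hypothetical, not an hwloc type)."""
    index: int
    package: int  # processor package (socket) that holds the core
    l3: int       # identifier of the shared L3 cache above the core


def toy_machine(packages=2, l3_per_package=2, cores_per_l3=2):
    """Build a flat list of cores for an assumed, perfectly regular machine."""
    cores = []
    for p in range(packages):
        for c in range(l3_per_package):
            for _ in range(cores_per_l3):
                cores.append(Core(len(cores), p, p * l3_per_package + c))
    return cores


def place_cooperating(cores):
    """Return the first pair of cores sharing a cache (tightly coupled tasks)."""
    for a, b in combinations(cores, 2):
        if a.l3 == b.l3:
            return a, b
    raise ValueError("no two cores share a cache")


def place_independent(cores):
    """Return the first pair of cores in different packages (memory-bound tasks)."""
    for a, b in combinations(cores, 2):
        if a.package != b.package:
            return a, b
    raise ValueError("machine has a single package")


machine = toy_machine()
near = place_cooperating(machine)
far = place_independent(machine)
print("cooperating tasks ->", [c.index for c in near])
print("independent tasks ->", [c.index for c in far])
```

The sketch only picks the first matching pair. A realistic policy would also account for the load already placed on each cache or package and for NUMA memory locality.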
hwloc may also help many applications just by providing a portable CPU and memory binding API and a reliable way to find out how many cores and/or hardware threads are available.
Portability and support
hwloc supports the following operating systems:
- Linux (with knowledge of cgroups, heterogeneous memory, hybrid CPUs, offline CPUs, ScaleMP vSMP, and NumaScale NumaConnect) on all supported hardware.
- Solaris, AIX and HP-UX
- NetBSD, FreeBSD and kFreeBSD/GNU
- Darwin / OS X
- Microsoft Windows (either using MinGW, Cygwin, CMake, or a native Visual Studio solution)
- IBM BlueGene/Q Compute Node Kernel (CNK)
- Android
Additionally, hwloc can detect the locality of PCI devices as well as software devices to manipulate accelerators (OpenCL, NVIDIA CUDA, AMD ROCm, Intel LevelZero, NEC Vector Engine, etc.), network and InfiniBand interfaces, etc. See the Best of lstopo for more examples of supported platforms. The topologies of many existing platforms are also available in the XML topology database for testing your software on architectures you don't have access to.
hwloc may display the topology in multiple convenient formats (see v2.12.2 examples and the Best of lstopo). It also offers a powerful programming interface to gather information about the hardware, bind processes, and much more.
Since it uses standard Operating System information, hwloc's support is almost always independent from the processor type (x86, ARM, RISC-V, POWER, etc), and just relies on the Operating System support. Whenever the OS does not support topology information (e.g. some BSDs), hwloc uses an x86-only CPUID-based backend.
To check whether hwloc works on a particular machine, just try to build it and run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus or missing cache information), see Questions and bugs below.
Documentation
More details are available in the Documentation (in both PDF and HTML). The documentation for each version contains examples of outputs and an API interface example (these links are for v2.12.2).
The materials from several hwloc tutorials are available online.
Getting and using hwloc
hwloc is open-source, available under the BSD license.
The latest **hwloc releases are available on the download page.** The Git repository is also accessible for online browsing or checkout.
The version string of the latest release is available from the latest_release.txt link. For the latest release or release candidate, rather use the latest_snapshot.txt link. For the latest on a specific series, replace "current" with "v2.9" in these links for instance.
hwloc is already available as official packages for many Linux distributions (at least Debian/Ubuntu, Fedora/RHEL, SUSE, ArchLinux, Slackware, Gentoo and their derivatives), as well as NetBSD, FreeBSD, Cygwin, Mac OS X ports (Homebrew), Windows vcpkg and HP-UX. It is also available as EasyBuild and Spack packages. The lstopo Android app is available in the Play Store and in F-Droid.
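The link substitution described above can be scripted. The Python sketch below rests on several assumptions: `EXAMPLE_LINK` is a placeholder and not the real download location, "current" is assumed to appear as a path segment of the link, and the version string is assumed to start with dotted numeric fields (a suffix such as a release-candidate tag is ignored). The helper names `series_url` and `parse_version` are hypothetical.

```python
import re

# Placeholder only: substitute the actual latest_release.txt link from the site.
EXAMPLE_LINK = "https://example.org/hwloc/current/latest_release.txt"


def series_url(url, series):
    """Point a 'current' link at a specific release series such as 'v2.9'."""
    if "/current/" not in url:
        raise ValueError("expected a link containing '/current/'")
    return url.replace("/current/", f"/{series}/", 1)


def parse_version(text):
    """Turn a version string such as '2.12.2' into a comparable tuple of ints.

    Only the leading major.minor[.patch] fields are read; anything after
    them is ignored, and a missing patch field counts as 0.
    """
    match = re.match(r"\s*v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(field) for field in match.groups(default="0"))


print(series_url(EXAMPLE_LINK, "v2.9"))
print(parse_version("2.12.2") > parse_version("2.9"))
```

Comparing tuples of integers rather than raw strings matters here: as plain text "2.9" sorts after "2.12.2", whereas the parsed tuples order the releases correctly.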
The following languages also have dedicated bindings:
- Julia on GitHub (thanks to Erik Schnetter).
- Perl on CPAN (thanks to Bernd Kallies).
- Python git tree (thanks to Guy Streeter).
- Rust on GitHub (thanks to Michael Nitschinger).
The following software already benefits from hwloc or is being ported to it:
- MPI implementations:
  - Open MPI.
  - The MPICH process launcher Hydra.
  - MVAPICH2.
  - CEA and Paratools' Multiprocessor Computing framework (MPC).
  - The Newmadeleine communication library.
- Runtime systems and compilers:
  - The Chapel Parallel Programming Language.
  - The Legion Programming System.
  - The StarPU runtime system for heterogeneous multicore architectures.
  - The Parallel Runtime Scheduling and Execution Controller (PaRSEC) project.
  - The Nanos++ runtime library for OmpSs.
  - The High Performance ParalleX (HPX) runtime system.
  - LLVM's OpenMP runtime.
  - The Kokkos C++ Performance Portability Programming EcoSystem.
  - Intel's oneTBB project.
  - The memkind heap manager.
  - The Qthreads project.
  - The Rose compiler.
  - The Portable Computing Language (POCL).
  - The H2M runtime for managing heterogeneous memory.
  - The former ForestGOMP OpenMP platform for hierarchical architectures.
- Parallel scientific applications, libraries and toolkits:
  - The Gromacs software suite for high-performance molecular dynamics.
  - The CP2K quantum chemistry and solid state physics software package.
  - The Feel++ library for partial differential equations.
  - The Parallel Sparse matriX (PaStiX) package.
  - The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project.
  - The Portable Extensible Toolkit for Scientific Computation (PETSc).
  - The librsb sparse linear algebra library.
  - The Magma dense linear algebra library.
- Resource managers and job schedulers:
  - The SLURM workload manager.
  - The Open Grid Scheduler.
  - The TORQUE resource manager.
  - Univa Grid Engine.
  - Altair's PBS Professional solution (part of PBS Works).
  - The Ceph distributed storage system.
- Performance analysis and debugging tools:
  - The performance-oriented tool suite LIKWID.
  - The Modular Assembly Quality Analyzer and Optimizer (MAQAO).
  - The interactive process viewer htop for Linux.
  - The parallel job inspector Padb.
  - The EasyPAP parallel-programming learning environment.
- and even more!
  - The TensorFlow library for numerical computation using data flow graphs.
  - Apache Traffic Server.
  - The Scylla NoSQL server.
  - Global Energy Optimization Power Management (GEOPM).
  - The Reference Implementation of the HPC Power API.
  - The OpenPMIx implementation of the Process Management Interface Exascale (PMIx) standard.
  - The aircrack-ng WiFi network security assessor.
Questions and bugs
Bugs should be reported in the tracker. Opening a new issue automatically displays lots of hints about how to debug and report issues.
See also the wiki page about Linux kernel bugs (or BIOS bugs) affecting locality information in hwloc.
Questions may be sent to the users or developers mailing lists.
There is also a #hwloc IRC channel on Libera Chat (irc.libera.chat) and Freenode (irc.freenode.net).
Publications
For a general-purpose hwloc citation, please use the following one. This paper introduces hwloc, its goals and its implementation. It then shows how hwloc may be used by MPI implementations and OpenMP runtime systems as a way to carefully place processes and adapt communication strategies to the underlying hardware.
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), Pisa, Italia, February 2010. IEEE Computer Society Press. https://hal.inria.fr/inria-00429889
For citing how hwloc deals with new heterogeneous memory hierarchies (Knights Landing's MCDRAM, high-bandwidth memory (HBM), non-volatile memory (NVDIMM), etc), use this paper:
Brice Goglin and Andrès Rubio Proaño. Using Performance Attributes for Managing Heterogeneous Memory in HPC Applications. In Proceedings of the 23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2022), held in conjunction with IPDPS 2022, Lyon, France, May 2022. https://hal.inria.fr/hal-03599360
When discussing the overhead of topology discovery and why XML or synthetic topologies are useful, use this paper:
Brice Goglin. On the Overhead of Topology Discovery for Locality-aware Scheduling in HPC. In Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2017), St Petersburg, Russia, March 2017. https://hal.inria.fr/hal-01402755
About the memory footprint of hwloc and the new shmem topology API in hwloc 2.0:
Brice Goglin. Memory Footprint of Locality Information on Many-Core Platforms. In Proceedings of the 6th Workshop on Runtime and Operating Systems for the Many-core Era (ROME 2018), held in conjunction with IPDPS, Vancouver, BC, Canada, May 2018. https://hal.inria.fr/hal-01644087
For citing hwloc's I/O device locality and cluster/multi-node support, please use the following one instead. This paper explains how I/O locality is managed in hwloc, how device details are represented, how hwloc interacts with other libraries, and how multiple nodes such as a cluster can be efficiently managed.
Brice Goglin. Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc). In Proceedings of 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 2014. https://hal.inria.fr/hal-00985096
For citing hwloc's hierarchical modeling of computing, memory and I/O resources as well as multi-node support, use this paper:
Brice Goglin. Towards the Structural Modeling of the Topology of next-generation heterogeneous cluster Nodes with hwloc. Inria, November 2016. https://hal.inria.fr/hal-01400264
History / credits
hwloc is the evolution and merger of the libtopology and Portable Linux Processor Affinity (PLPA) projects. Because of functional and ideological overlap, these two code bases and ideas were merged and released under the name "hwloc" as an Open MPI sub-project. hwloc is now mostly developed by the TADaaM team at Inria (Bordeaux, France).
libtopology was initially developed by the Inria Runtime team-project as a way to discover hardware affinities inside the Marcel threading library. With the advent of multicore machines, this work became interesting for much more than multithreading. So libtopology was extracted from Marcel and became an independent library.
Portability tests are performed thanks to the Inria Continuous Integration platform.
How do you pronounce "hwloc"?
When in doubt, say "hardware locality."
Some of the core developers say "H. W. Loke"; others say "H. W. Lock". We've heard several other pronunciations as well. We don't really have a strong preference for how you say it; we chose the name for its Google-ability, not its pronunciation.
But now at least you know how we pronounce it. :-)
