Benchmarking Highly Parallel Hardware for Spiking Neural Networks in Robotics (original) (raw)

Brian2GeNN: accelerating spiking neural network simulations with graphics hardware

Scientific Reports, 2020

is a popular Python-based simulator for spiking neural networks, commonly used in computational neuroscience. GeNN is a C++-based meta-compiler for accelerating spiking neural network simulations using consumer or high performance grade graphics processing units (GPUs). Here we introduce a new software package, Brian2GeNN, that connects the two systems so that users can make use of GeNN GPU acceleration when developing their models in Brian, without requiring any technical knowledge about GPUs, C++ or GeNN. The new Brian2GeNN software uses a pipeline of code generation to translate Brian scripts into C++ code that can be used as input to GeNN, and subsequently can be run on suitable NVIDIA GPU accelerators. From the user's perspective, the entire pipeline is invoked by adding two simple lines to their Brian scripts. We have shown that using Brian2GeNN, two non-trivial models from the literature can run tens to hundreds of times faster than on CPU. GPU acceleration emerged when creative academics discovered that modern graphics processing units (GPUs) could be used to execute general purpose algorithms, e.g. for neural network simulations 1,2. The real revolution occurred when NVIDIA corporation embraced the idea of GPUs as general purpose computing accelerators and developed the CUDA application programming interface 3 in 2006. Since then, GPU acceleration has become a major factor in high performance computing and has fueled much of the recent renaissance in artificial intelligence. One of the remaining challenges when using GPU acceleration is the high degree of insight into GPU computing architecture and careful optimizations needed in order to achieve good acceleration, in spite of the abstractions that CUDA offers. A number of simulators have used GPUs to accelerate spiking neural network simulations, but the majority do not allow for easily defining new models, relying instead on a fixed set of existing models 4-8. Since 2010 we have been developing the GPU enhanced neuronal networks (GeNN) framework 9 that uses code generation techniques 10,11 to simplify the use of GPU accelerators for the simulation of spiking neural networks. GPUs, and in particular GeNN, have been shown to enable efficient simulations compared to CPUs and even compared to dedicated neuromorphic hardware 12. Other simulators that have taken this code generation approach are Brian2CUDA 13 (currently under development) and ANNarchy 14 (Linux only). Brian is a general purpose simulator for spiking neural networks written in Python, with the aim of simplifying the process of developing models 15-17. Version 2 of Brian 18 introduced a code generation framework 10,19 to allow for higher performance than was possible in pure Python. The design separates the Brian front-end (written in Python) from the back-end computational engine (multiple possibilities in different languages, including C++), and allows for the development of third party packages to add new back-ends. Here, we introduce the Brian2GeNN software interface we have developed to allow running Brian models on a GPU via GeNN. We analysed the performance for some typical models and find that-depending on the CPU and GPU used-performance can be tens to hundreds of times faster. Results We benchmarked Brian2GeNN on two model networks that we named "COBAHH" and "Mbody". COBAHH is an implementation of a benchmark network described by Brette et al. 20 (for details, see Methods). Essentially, this benchmark model consists of N Hodgkin-Huxley-type neurons, modified from the model by Traub and Miles 21 , 80% of which form excitatory synapses and 20% inhibitory synapses. All neurons were connected to all

Efficient simulation of large-scale Spiking Neural Networks using CUDA graphics processors

2009 International Joint Conference on Neural Networks, 2009

Neural network simulators that take into account the spiking behavior of neurons are useful for studying brain mechanisms and for engineering applications. Spiking Neural Network (SNN) simulators have been traditionally simulated on large-scale clusters, super-computers, or on dedicated hardware architectures. Alternatively, Graphics Processing Units (GPUs) can provide a low-cost, programmable, and highperformance computing platform for simulation of SNNs. In this paper we demonstrate an efficient, Izhikevich neuron based large-scale SNN simulator that runs on a single GPU. The GPU-SNN model (running on an NVIDIA GTX-280 with 1GB of memory), is up to 26 times faster than a CPU version for the simulation of 100K neurons with 50 Million synaptic connections, firing at an average rate of 7Hz. For simulation of 100K neurons with 10 Million synaptic connections, the GPU-SNN model is only 1.5 times slower than real-time. Further, we present a collection of new techniques related to parallelism extraction, mapping of irregular communication, and compact network representation for effective simulation of SNNs on GPUs. The fidelity of the simulation results were validated against CPU simulations using firing rate, synaptic weight distribution, and inter-spike interval analysis. We intend to make our simulator available to the modeling community so that researchers will have easy access to large-scale SNN simulations.

Brian2GeNN: a system for accelerating a large variety of spiking neural networks with graphics hardware

2018

"Brian" is a popular Python-based simulator for spiking neural networks, commonly used in computational neuroscience. GeNN is a C++-based meta-compiler for accelerating spiking neural network simulations using consumer or high performance grade graphics processing units (GPUs). Here we introduce a new software package, Brian2GeNN, that connects the two systems so that users can make use of GeNN GPU acceleration when developing their models in Brian, without requiring any technical knowledge about GPUs, C++ or GeNN. The new Brian2GeNN software uses a pipeline of code generation to translate Brian scripts into C++ code that can be used as input to GeNN, and subsequently can be run on suitable NVIDIA GPU accelerators. From the user's perspective, the entire pipeline is invoked by adding two simple lines to their Brian scripts. We have shown that using Brian2GeNN, typical models can run tens to hundreds of times faster than on CPU.

Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores

This short note regards a comparison of instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of a spiking neural network simulator (DPSNN-STDP) distributed on MPI processes when executed either on an embedded platform (based on a dual socket quad-core ARM platform) or a server platform (INTEL-based quad-core dual socket platform). We also compare the measure with those reported by leading custom and semi-custom designs: TrueNorth and SpiNNaker. In summary, we observed that: 1- we spent 2.2 micro-Joule per simulated event on the "embedded platform", approx. 4.4 times lower than what was spent by the "server platform"; 2- the instantaneous power consumption of the "embedded platform" was 14.4 times better than the "server" one; 3- the server platform is a factor 3.3 faster. The "embedded platform" is made of NVIDIA Jetson TK1 boards, interconnected by Ethernet, each mounting a Tegra K1 chi...

Comparing Neuromorphic Solutions in Action: Implementing a Bio-Inspired Solution to a Benchmark Classification Task on Three Parallel-Computing Platforms

Frontiers in Neuroscience, 2016

Neuromorphic computing employs models of neuronal circuits to solve computing problems. Neuromorphic hardware systems are now becoming more widely available and "neuromorphic algorithms" are being developed. As they are maturing toward deployment in general research environments, it becomes important to assess and compare them in the context of the applications they are meant to solve. This should encompass not just task performance, but also ease of implementation, speed of processing, scalability, and power efficiency. Here, we report our practical experience of implementing a bio-inspired, spiking network for multivariate classification on three different platforms: the hybrid digital/analog Spikey system, the digital spike-based SpiNNaker system, and GeNN, a meta-compiler for parallel GPU hardware. We assess performance using a standard hand-written digit classification task. We found that whilst a different implementation approach was required for each platform, classification performances remained in line. This suggests that all three implementations were able to exercise the model's ability to solve the task rather than exposing inherent platform limits, although differences emerged when capacity was approached. With respect to execution speed and power consumption, we found that for each platform a large fraction of the computing time was spent outside of the neuromorphic device, on the host machine. Time was spent in a range of combinations of preparing the model, encoding suitable input spiking data, shifting data, and decoding spike-encoded results. This is also where a large proportion of the total power was consumed, most markedly for the SpiNNaker and Spikey systems. We conclude that the simulation efficiency advantage of the assessed specialized hardware systems is easily lost in excessive host-device communication, or non-neuronal parts of the computation. These results emphasize the need to optimize the host-device communication architecture for scalability, maximum throughput, and minimum latency. Moreover, our results indicate that special attention should be paid to minimize host-device communication when designing and implementing networks for efficient neuromorphic computing.

GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model

Frontiers in Neuroscience, 2018

While neuromorphic systems may be the ultimate platform for deploying spiking neural networks (SNNs), their distributed nature and optimization for specific types of models makes them unwieldy tools for developing them. Instead, SNN models tend to be developed and simulated on computers or clusters of computers with standard von Neumann CPU architectures. Over the last decade, as well as becoming a common fixture in many workstations, NVIDIA GPU accelerators have entered the High Performance Computing field and are now used in 50 % of the Top 10 super computing sites worldwide. In this paper we use our GeNN code generator to re-implement two neo-cortex-inspired, circuit-scale, point neuron network models on GPU hardware. We verify the correctness of our GPU simulations against prior results obtained with NEST running on traditional HPC hardware and compare the performance with respect to speed and energy consumption against published data from CPU-based HPC and neuromorphic hardware. A full-scale model of a cortical column can be simulated at speeds approaching 0.5× real-time using a single NVIDIA Tesla V100 accelerator-faster than is currently possible using a CPU based cluster or the SpiNNaker neuromorphic system. In addition, we find that, across a range of GPU systems, the energy to solution as well as the energy per synaptic event of the microcircuit simulation is as much as 14× lower than either on SpiNNaker or in CPU-based simulations. Besides performance in terms of speed and energy consumption of the simulation, efficient initialization of models is also a crucial concern, particularly in a research context where repeated runs and parameter-space exploration are required. Therefore, we also introduce in this paper some of the novel parallel initialization methods implemented in the latest version of GeNN and demonstrate how they can enable further speed and energy advantages.

POETS: A Parallel Cluster Architecture for Spiking Neural Network

International Journal of Machine Learning and Computing

Spiking Neural Networks (SNNs) are known as a branch of neuromorphic computing and are currently used in neuroscience applications to understand and model the biological brain. SNNs could also potentially be used in many other application domains such as classification, pattern recognition, and autonomous control. This work presents a highly-scalable hardware platform called POETS, and uses it to implement SNN on a very large number of parallel and reconfigurable FPGA-based processors. The current system consists of 48 FPGAs, providing 3072 processing cores and 49152 threads. We use this hardware to implement up to four million neurons with one thousand synapses. Comparison to other similar platforms shows that the current POETS system is twenty times faster than the Brian simulator, and at least two times faster than SpiNNaker.

A Large Scale Digital Simulation of Spiking Neural Networks (SNN) on Fast SystemC Simulator

UKSIM, 2012

Since biological neural systems contain big number of neurons working in parallel, simulation of such dynamic system is a real challenge. The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level using the high degree of parallelism afforded by graphics processors (GPUs) for large scale SNN with proposed structure in pattern classification field. Simulation results show 100 times speedup for the proposed SNN structure on the GPU compared with the CPU version. In addition, CPU memory has problems when trained for more than 120K cells but GPU can simulate up to 40 million neurons.

Spiking Neurons Computing Platform

Lecture Notes in Computer Science, 2005

A computing platform is described for simulating arbitrary networks of spiking neurons in real time. A hybrid computing scheme is adopted that uses both software and hardware components. We focus on conductance-based models for neurons that emulate the temporal dynamics of the synaptic integration process. We have designed an efficient computing architecture using reconfigurable hardware in which the different stages of the neuron model are processed in parallel (using a customized pipeline structure). Further improvements occur by computing multiple neurons in parallel using multiple processing units. The computing platform is described and its scalability and performance evaluated. The goal is to investigate biologically realistic models for the control of robots operating within closed perception-action loops.