Graham Riley | The University of Manchester
Papers by Graham Riley
FPGAs have been around for over 30 years and are a viable accelerator for compute-intensive workloads on HPC systems. The adoption of FPGAs for scientific applications has been stimulated recently by the emergence of better programming environments such as High-Level Synthesis (HLS) and OpenCL, available through the Xilinx SDSoC design tool. Mapping the multi-level concurrency available within applications onto HPC systems with FPGAs is a challenge. OpenCL and HLS provide different mechanisms for exploiting concurrency within a node, leading to a concurrency-mapping design problem. In addition to the performance of different mappings, there are also questions of resource usage, programmability (development effort), ease of use and robustness. This paper examines the concurrency levels available in a case-study kernel from a shallow water model and explores the programming options available in OpenCL and HLS. We conclude that the use of SDSoC Dataflow over functions ...
On June 30, 2020, the Workshop on Emerging Technologies for Weather and Climate Modelling was held as a virtual event within the framework of ESiWACE2, the Centre of Excellence in Simulation of Weather and Climate in Europe. The workshop, organized by Giovanni Aloisio (CMCC), Graham Riley (UNIMAN), Carlos Osuna (METEOSWISS) and Sandro Fiore (CMCC), was hosted by DKRZ with local support from Dela Spickermann and Florian Ziemen under the supervision of the ESiWACE2 Coordinator Joachim Biercamp. The workshop was funded by the Horizon 2020 project ESiWACE2. Due to the COVID-19 situation, the event was held as a virtual conference with approximately 143 participants, mainly from Europe and the US but also from Brazil, India and Israel. The workshop brought together scientists from the fields of earth system modeling, machine learning, exascale hardware/computing, and programming mode...
Proceedings of the Computing Frontiers Conference, 2017
2019 International Conference on Robotics and Automation (ICRA), 2019
The Time Reversal Mirror (TRM) typically consists of an array of transducers. These transducers record propagations reflected from a source, which are then processed by, and re-emitted from, the transducers back to the source location; this is known as the time-reversed propagation. The TRM technique has been studied in acoustics, water and electromagnetic waves for various applications, commonly in medical imaging [1], nondestructive testing [2], underwater target detection [3] and seismic source localization [4]. Although TRM is a well-studied method, it is difficult to achieve perfect time-reversed propagations converging to the source with high resolution, due to the limited spatial sampling available [5]. Additionally, in medical applications, the heterogeneous, lossy character of human tissue causes attenuation and dispersion, reducing the potential of obtaining perfect spatial resolution and accuracy at the source [6].
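The core of the time-reversal idea can be sketched in a few lines: signals that arrive late at a transducer are emitted early on replay, so all contributions re-converge at the source. The following is a minimal illustrative sketch (not from the paper; the sample rate, pulse shape and delays are invented for illustration):

```python
import numpy as np

# Hypothetical recordings: three transducers receive the same pulse
# from a source, each with a different propagation delay.
fs = 1000                                    # sample rate (Hz), illustrative
t = np.arange(0, 1, 1 / fs)
pulse = np.exp(-((t - 0.1) ** 2) / 1e-4)     # short Gaussian pulse, peak at t=0.1s

delays = [50, 120, 200]                      # per-transducer delays in samples
recordings = np.stack([np.roll(pulse, d) for d in delays])

# Time reversal: flip each recording in time before re-emission.
# A contribution that arrived late is now emitted early, so all
# re-emitted wavefronts converge at the source simultaneously.
reemitted = recordings[:, ::-1]

# Later original arrivals now peak earlier in the re-emitted signals.
peaks = reemitted.argmax(axis=1)
print(peaks)
```

Note how the peak ordering is inverted relative to the recorded delays, which is exactly the mechanism that refocuses energy at the source.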
Energy use is a key concern when deploying deep learning models on mobile and embedded platforms. Current studies develop energy-predictive models based on application-level features to give researchers a way to estimate the energy consumption of their deep learning models. This information is useful for building resource-aware models that can make efficient use of the hardware resources. However, previous work on predictive modelling provides little insight into the trade-offs involved in the choice of features on the final predictive-model accuracy and model complexity. To address this issue, we provide a comprehensive analysis of building regression-based predictive models for deep learning on mobile devices, based on empirical measurements gathered from the SyNERGY framework. Our predictive modelling strategy is based on two types of predictive models used in the literature: individual layers and layer-type. Our analysis of predictive models shows that simple layer-type feature...
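The kind of regression-based energy model described above can be sketched as a least-squares fit from per-layer features to measured energy. This is a minimal sketch under stated assumptions: the feature choice (MAC count, parameter count) follows common practice, and all numbers are invented for illustration, not SyNERGY measurements:

```python
import numpy as np

# Hypothetical per-layer measurements: each row is one layer,
# features are [MAC count (millions), parameter count (thousands)].
X = np.array([
    [10.0,   5.0],
    [40.0,  20.0],
    [90.0,  45.0],
    [160.0, 80.0],
])
y = np.array([2.1, 8.0, 18.2, 32.1])   # measured energy per layer (mJ), made up

# Fit a simple linear model  y ~ X.w + b  via least squares,
# analogous to a layer-type predictive model.
A = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the energy of an unseen layer from its features.
new_layer = np.array([60.0, 30.0, 1.0])      # [MACs, params, bias]
pred = float(new_layer @ w)
print(round(pred, 1))
```

A model this simple trades accuracy for interpretability and cheap fitting, which is exactly the accuracy/complexity trade-off the abstract refers to.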
Scientific Programming, 2019
In recent years, there has been renewed interest in the use of field-programmable gate arrays (FPGAs) for high-performance computing (HPC). In this paper, we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High-Level Synthesis to implement a matrix-vector kernel from the LFRic code on a Xilinx UltraScale+ development board containing an XCZU9EG multiprocessor system-on-chip. We describe the porting of the code, discuss the optimization decisions, and report performance of 5.34 Gflop/s with double precision and 5.58 Gflop/s with single precision. We discuss sources of inefficiency, comparisons with peak performance, comparisons with CPU and GPU performance (taking into account power and price), comparisons with published techniques, and comparisons with published p...
Journal of Parallel and Distributed Computing, 2019
Geoscientific Model Development, 2012
Geoscientific Model Development Discussions, 2017
We present an approach, which we call PSyKAl, that is designed to achieve portable performance for parallel, finite-difference ocean models. In PSyKAl, the code related to the underlying science is formally separated from code related to parallelisation and single-core optimisations. This separation of concerns allows scientists to code their science independently of the underlying hardware architecture, and allows optimisation specialists to tailor the code for a particular machine independently of the science code. We have taken the free-surface part of the NEMO ocean model and created a new, shallow-water model named NEMOLite2D. In doing this we have a code which is of a manageable size and yet incorporates elements of full ocean models (input/output, boundary conditions, etc.). We have then manually constructed a PSyKAl version of this code and investigated the transformations that must be applied to the middle/PSy layer in order to achieve good perf...
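The three-layer separation of concerns above can be sketched as follows. This is a toy illustration of the PSyKAl structure, not the real PSyclone API or NEMOLite2D code; the kernel, its stencil and all names are invented:

```python
import numpy as np

# Kernel layer: the science for ONE grid point, written with no
# knowledge of the iteration space or the target hardware.
def continuity_kernel(h, u, v, i, j):
    """Toy finite-difference update at point (i, j) (illustrative only)."""
    return h[i, j] - 0.1 * ((u[i + 1, j] - u[i, j]) + (v[i, j + 1] - v[i, j]))

# PSy layer: owns the loops. This is the layer that transformations
# (loop fusion, blocking, OpenMP/MPI parallelisation) would target,
# leaving the kernel and algorithm layers untouched.
def invoke_continuity(h_new, h, u, v):
    ni, nj = h_new.shape
    for i in range(ni):
        for j in range(nj):
            h_new[i, j] = continuity_kernel(h, u, v, i, j)

# Algorithm layer: expresses WHAT to compute, never HOW it is iterated.
n = 4
h = np.ones((n, n))
u = np.zeros((n + 1, n))      # staggered velocity fields
v = np.zeros((n, n + 1))
h_new = np.empty((n, n))
invoke_continuity(h_new, h, u, v)
print(h_new.sum())
```

Because only `invoke_continuity` knows the loop structure, an optimisation specialist can rewrite that middle layer for a given machine while the science in `continuity_kernel` stays byte-for-byte the same.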
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the ACM International Conference on Computing Frontiers - CF '16, 2016
Computational Statistics & Data Analysis, Oct 28, 1999
A portable parallel implementation of MLn, a multilevel modelling package, for shared-memory parallel machines is described. Particular attention is paid to cross-classified and multiple-membership models, which are more computationally demanding than those with simple hierarchical structure. Performance results are presented for a range of shared-memory parallel architectures, demonstrating a significant increase in the size of models which can ...
Concurrency and Computation: Practice and Experience (Eds. J. Gurd, T. Hey, J. Papay and G. Riley), 17(2-4), 95-98, 2005
To develop a good parallel implementation requires understanding of where run-time is spent and comparing this to some realistic best possible time.