NMPO: Near-Memory Computing Profiling and Offloading

Platform Independent Software Analysis for Near Memory Computing

Euromicro Conference on Digital System Design (DSD), 2019

Near-memory Computing (NMC) promises improved performance for applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, finding such applications is not trivial, and specialized tools are needed to identify them. In this paper, we present PISA-NMC, which extends a state-of-the-art hardware-agnostic profiling tool with memory and parallelism metrics relevant for NMC. The metrics include memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism. By profiling a set of representative applications and correlating the metrics with each application's performance on a simulated NMC system, we verify the importance of those metrics. Finally, we demonstrate which metrics are useful in identifying applications suitable for NMC architectures.
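To make the memory-related metrics concrete, the sketch below computes a memory-entropy score over a trace of addresses and contrasts a small reused working set with a scattered access pattern. This is a minimal illustration of the idea, not the PISA-NMC implementation; the function name, block granularity, and toy traces are assumptions.

```python
import math
import random
from collections import Counter

def memory_entropy(addresses, block_bits=6):
    """Shannon entropy (in bits) of a memory-address stream.

    Addresses are truncated to cache-block granularity
    (block_bits = 6 -> 64-byte blocks). Lower entropy means the
    stream concentrates on fewer distinct blocks, i.e. more reuse.
    """
    blocks = [addr >> block_bits for addr in addresses]
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(blocks).values())

# Toy traces: a small reused working set versus near-unique random blocks.
reused = [random.choice(range(0, 4096, 64)) for _ in range(1024)]   # <= 64 blocks
scattered = [random.randrange(1 << 32) for _ in range(1024)]        # ~1024 blocks
print(memory_entropy(reused))     # low entropy (at most ~6 bits)
print(memory_entropy(scattered))  # high entropy (close to 10 bits)
```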

A Review of Near-Memory Computing Architectures: Opportunities and Challenges

The conventional approach of moving stored data to the CPU for computation has become a major performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, advances in integration technologies have made the decade-old concept of coupling compute units close to the memory (called Near-Memory Computing) more viable. Processing right at the "home" of data can significantly diminish the data movement problem of data-intensive applications. This paper analyzes and organizes the extensive body of literature on near-memory computing along several dimensions, from the level of the memory hierarchy where this paradigm is applied to the granularity of the applications that can be executed on the near-memory units. We highlight the challenges as well as the critical need for evaluation methodologies that can be employed in designing these special architectures. Using a case study, we present our methodology and identify topics for future research to unlock the full potential of near-memory computing.

Near-Memory Computing: Past, Present, and Future

Microprocessors and Microsystems, 2019

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory, called near-memory computing (NMC), more viable. Processing right at the "home" of data can significantly diminish the data movement problem of data-intensive applications. In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues, along with future research directions. We also provide a glimpse of our approach to near-memory computing, which includes (i) NMC-specific, microarchitecture-independent application characterization, (ii) a compiler framework to offload the NMC kernels onto our target NMC platform, and (iii) an analytical model to evaluate the potential of NMC.
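The abstract does not spell out the analytical model's equations, so the snippet below is only a roofline-style first-order sketch under assumed hardware parameters, illustrating how a data-movement-bound kernel can be screened for offloading. The function name, throughput, and bandwidth figures are all placeholders, not the paper's model.

```python
def estimate_speedup(flops, bytes_moved,
                     host_gflops=100.0, host_bw_gbs=20.0,
                     nmc_gflops=25.0, stack_bw_gbs=160.0):
    """Roofline-style first-order estimate: each side is limited either by
    its compute throughput or by the bandwidth it sees to the data."""
    t_host = max(flops / (host_gflops * 1e9), bytes_moved / (host_bw_gbs * 1e9))
    t_nmc = max(flops / (nmc_gflops * 1e9), bytes_moved / (stack_bw_gbs * 1e9))
    return t_host / t_nmc

# A memory-bound kernel (low flop/byte ratio) benefits from stack bandwidth;
# a compute-bound kernel is instead penalized by the weaker near-memory cores.
print(estimate_speedup(flops=1e9, bytes_moved=8e9))   # > 1: offloading pays off
print(estimate_speedup(flops=1e12, bytes_moved=1e8))  # < 1: keep it on the CPU
```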

Memory and Parallelism Analysis Using a Platform-Independent Approach

22nd ACM International Workshop on Software and Compilers for Embedded Systems (SCOPES '19), 2019

Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend a state-of-the-art platform-independent software analysis tool with NMC-related metrics such as memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism. These metrics help to identify the applications more suitable for NMC architectures.
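As a companion to the entropy sketch above, here is one simple way spatial locality can be quantified from an address trace: the fraction of consecutive accesses that fall on the same or an adjacent cache block. This is an illustrative stand-in rather than the tool's actual definition; the distance threshold and block size are assumptions.

```python
def spatial_locality(addresses, block_bits=6, near=1):
    """Share of consecutive accesses whose cache-block distance is <= `near`
    (block_bits = 6 -> 64-byte blocks). Values close to 1.0 indicate a
    streaming-friendly, spatially local access pattern."""
    blocks = [addr >> block_bits for addr in addresses]
    if len(blocks) < 2:
        return 1.0
    close = sum(1 for prev, cur in zip(blocks, blocks[1:])
                if abs(cur - prev) <= near)
    return close / (len(blocks) - 1)

print(spatial_locality(range(0, 1 << 16, 8)))     # sequential sweep -> 1.0
print(spatial_locality(range(0, 1 << 20, 4096)))  # 4 KiB strides   -> 0.0
```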

Predicting Runtime in HPC Environments for an Efficient Use of Computational Resources

Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2021)

Understanding the runtime behavior of scientific applications on a given architecture should guide the use of computational resources in high-performance computing systems. In this work, we analyze Machine Learning (ML) algorithms that gather knowledge about the performance of these applications from hardware events and derived performance metrics. Nine NAS benchmarks were executed and their hardware events collected. These experimental results were used to train a neural network, a decision tree regressor, and a linear regression model to predict the runtime of scientific applications from the performance metrics.
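A minimal sketch of this modeling step is shown below, assuming each sample pairs hardware-counter features from one benchmark run with its measured runtime. The synthetic features, feature names, and hyperparameters are placeholders, not the paper's data or setup; only the choice of the three model families follows the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Stand-in dataset: 200 runs x 4 counters (e.g. instructions, cache misses,
# branch misses, FLOPs) with a synthetic runtime as the target.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = X @ [3.0, 5.0, 2.0, 1.0] + rng.normal(0, 0.1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (LinearRegression(),
              DecisionTreeRegressor(random_state=0),
              MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                           random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__,
          mean_absolute_error(y_te, model.predict(X_te)))
```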