Hoda Naghibijouybari - Profile on Academia.edu

Papers by Hoda Naghibijouybari

Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems

2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021

Graphics Processing Units (GPUs) are ubiquitous components used across the range of today's computing platforms, from phones and tablets, through personal computers, to high-end server class platforms. With the increasing importance of graphics and video workloads, recent processors are shipped with GPU devices that are integrated on the same chip. Integrated GPUs share some resources with the CPU and as a result, there is a potential for microarchitectural attacks from the GPU to the CPU or vice versa. We consider the potential for covert channel attacks that arise either from shared microarchitectural components (such as caches) or through shared contention domains (e.g., shared buses). We illustrate these two types of channels by developing two reliable covert channel attacks. The first covert channel uses the shared LLC cache in Intel's integrated GPU architectures. The second is a contention based channel targeting the ring bus connecting the CPU and GPU to the LLC. This is the first demonstrated microarchitectural attack crossing the component boundary (GPU to CPU or vice versa). Cross-component channels introduce a number of new challenges that we had to overcome since they occur across heterogeneous components that use different computation models and are interconnected using asymmetric memory hierarchies. We also exploit GPU parallelism to increase the bandwidth of the communication, even without relying on a common clock. The LLC based channel achieves a bandwidth of 120 kbps with a low error rate of 2%, while the contention based channel delivers up to 400 kbps with a 0.8% error rate. We also demonstrate a proof-of-concept prime-and-probe side channel attack that probes the full LLC from the GPU.
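
The LLC-based channel relies on the GPU observing CPU-induced evictions through load latency. The sketch below is a minimal, hypothetical illustration of the GPU-side probe loop only; the buffer size, stride, interval count, and threshold-based decoding are assumptions for illustration, not the paper's implementation, and the CPU-side sender and eviction-set construction are omitted.

```cuda
// Hypothetical GPU-side probe: times strided loads over a buffer that is
// assumed to map onto the shared LLC; elevated latency in an interval would
// suggest CPU-side activity evicted the primed lines (a logical "1").
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probe_kernel(const unsigned int *buf, int n, int stride,
                             long long *cycles)
{
    unsigned int sink = 0;
    long long start = clock64();
    for (int i = 0; i < n; i += stride)
        sink += buf[i];                      // loads travel through the shared LLC
    long long end = clock64();
    if (threadIdx.x == 0 && blockIdx.x == 0)
        *cycles = end - start + (sink & 1);  // keep `sink` live, record latency
}

int main()
{
    const int n = 1 << 20;      // assumed buffer covering the target LLC sets
    const int stride = 16;      // assumed stride of one 64-byte cache line (16 ints)
    unsigned int *buf;
    long long *cycles;
    cudaMalloc(&buf, n * sizeof(unsigned int));
    cudaMallocManaged(&cycles, sizeof(long long));
    cudaMemset(buf, 0, n * sizeof(unsigned int));

    for (int interval = 0; interval < 32; ++interval) {
        probe_kernel<<<1, 1>>>(buf, n, stride, cycles);
        cudaDeviceSynchronize();
        // A real receiver would compare each sample to a calibrated threshold
        // to decode one bit per interval sent by the CPU-side trojan.
        printf("interval %2d: %lld cycles\n", interval, *cycles);
    }
    cudaFree(buf);
    cudaFree(cycles);
    return 0;
}
```

In a full channel, the CPU-side trojan would either touch or avoid the shared LLC lines during each interval to encode a bit, and the measured latencies would be thresholded to decode it.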

Security of Graphics Processing Units (GPUs) in Heterogeneous Systems

Author(s): Naghibijouybari, Hoda | Advisor(s): Abu-Ghazaleh, Nael | Abstract: Modern computing platforms are becoming increasingly heterogeneous, combining a main processor with accelerators/co-processors to perform data-intensive computations. As the most common accelerator, Graphics Processing Units (GPUs) are widely integrated in all computing devices to enhance the performance of both graphics and computational workloads. GPUs, as new components in heterogeneous systems, introduce potential vulnerabilities and other security problems. This dissertation studies the security of modern GPUs in terms of micro-architectural covert and side channel attacks and defenses. In micro-architectural attacks, information leakage arises from the interactions of processes through the shared hardware resources on a processor. The first contribution of my dissertation is a study of covert channel attacks on General Purpose GPUs (GPGPUs). I first reverse engineer the hardware scheduler to create co-...

Securing Machine Learning Architectures and Systems

Machine learning (ML), and deep learning in particular, have become a critical workload as they are increasingly applied at the core of a wide range of application spaces. Computer systems, from the architecture up, have been impacted by ML in two primary directions: (1) ML is an increasingly important computing workload, with new accelerators and systems targeted to support both training and inference at scale; and (2) ML supporting computer system decisions, both during design and run times, with new machine learning based algorithms controlling systems to optimize their performance, reliability and robustness. In this paper, we will explore the intersection of security, ML and computing systems, identifying both security challenges and opportunities. Machine learning systems are vulnerable to new attacks including adversarial attacks crafted to fool a classifier to the attacker's advantage, membership inference attacks attempting to compromise the privacy of the trai...

GPUGuard

Proceedings of the ACM International Conference on Supercomputing, 2019

Graphics processing units (GPUs) are moving towards supporting concurrent kernel execution, where multiple kernels may be co-executed on the same GPU and even on the same streaming multiprocessor (SM) core. While concurrent kernel execution improves hardware resource utilization, it opens up vulnerabilities to covert-channel and side-channel attacks. These attacks exploit information leakage across kernels that results from contention on shared resources; they have been shown to be a dangerous threat on CPUs, and are starting to be demonstrated on GPUs. The unique micro-architectural features of GPUs, such as specialized cache structures and massive parallel thread support, create opportunities for GPU-specific channels to be formed. In this paper, we propose GPUGuard, a decision tree based detection and hierarchical defense framework that can reliably close the covert channels. Our results show that GPUGuard can detect contention with 100% sensitivity and a small (8.5%) false positive rate. The timing channels are mitigated through Tangram, a GPU-specific contention channel elimination scheme, with only 8% to 23% overhead when there is an attack and zero performance overhead when no attacks are detected. Compared to temporal partitioning, GPUGuard is 69%-96% faster in various architectures even when active, showing that it is possible to gain substantial performance from executing concurrent kernels on a single SM while securing GPUs against these attacks. • Computer systems organization → Single instruction, multiple data; • Security and privacy → Artificial immune systems; Side-channel analysis and countermeasures.
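
As a rough intuition for the detection stage, the toy sketch below hand-codes a depth-2 decision tree over per-interval contention features. The feature names, thresholds, and tree shape are invented for illustration only and do not reproduce GPUGuard's trained model or feature set.

```cuda
// Toy host-side illustration (not GPUGuard's model): a tiny hand-written
// decision tree classifies per-interval hardware-counter features as benign
// or suspicious. All features and thresholds are hypothetical.
#include <cstdio>

struct IntervalFeatures {
    double l2_miss_rate;        // hypothetical: misses per access in the interval
    double bank_conflict_rate;  // hypothetical: shared-memory conflicts per warp
    double issue_stall_ratio;   // hypothetical: fraction of cycles stalled on issue
};

// A depth-2 tree; a real detector would be trained offline on labeled traces
// of co-executing kernels and use the framework's own feature set.
bool suspicious(const IntervalFeatures &f)
{
    if (f.l2_miss_rate > 0.6)
        return f.issue_stall_ratio > 0.4;   // cache-contention path
    return f.bank_conflict_rate > 0.5;      // shared-memory-contention path
}

int main()
{
    IntervalFeatures benign = {0.2, 0.1, 0.3};
    IntervalFeatures attack = {0.8, 0.2, 0.7};
    printf("benign -> %s\n", suspicious(benign) ? "suspicious" : "ok");
    printf("attack -> %s\n", suspicious(attack) ? "suspicious" : "ok");
    return 0;
}
```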

Rendered Insecure

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018

Graphics Processing Units (GPUs) are commonly integrated with computing devices to enhance the performance and capabilities of graphical workloads. In addition, they are increasingly being integrated in data centers and clouds such that they can be used to accelerate data intensive workloads. Under a number of scenarios the GPU can be shared between multiple applications at a fine granularity allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, allowing an attacker to interleave the use of the GPU to measure the side-effects of the victim computation through performance counters or other resource tracking APIs. We demonstrate the vulnerability using two applications. First, we show that an OpenGL based spy can fingerprint websites accurately, track user activities within the website, and even infer the keystroke timings for a password text box with high accuracy. The second application demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application, illustrating these threats on the cloud. To counter these attacks, the paper suggests mitigations based on limiting the rate of the calls, or limiting the granularity of the returned information. • Security and privacy → Side-channel analysis and countermeasures;
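
One ingredient of these attacks is that a spy process can poll coarse resource-tracking APIs without any special privileges. The sketch below illustrates such a measurement loop using the standard CUDA runtime call cudaMemGetInfo; the sampling period, trace length, and the idea of feeding the trace to a classifier are illustrative assumptions rather than the paper's exact spy.

```cuda
// Hedged sketch of a memory-API sampling loop: a CUDA process periodically
// records free GPU memory; changes in the trace reflect the victim's
// allocations as it renders frames. Parameters below are illustrative.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

int main()
{
    std::vector<size_t> trace;
    const int samples = 1000;                             // assumed trace length
    const auto period = std::chrono::microseconds(500);   // assumed sampling period

    for (int i = 0; i < samples; ++i) {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);        // unprivileged query
        trace.push_back(free_bytes);
        std::this_thread::sleep_for(period);
    }

    // In an attack, the time series of allocation events (sizes and timing)
    // would be fed to a classifier; here we only summarize the window.
    printf("min free: %zu bytes, max free: %zu bytes\n",
           *std::min_element(trace.begin(), trace.end()),
           *std::max_element(trace.begin(), trace.end()));
    return 0;
}
```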

Constructing and characterizing covert channels on GPGPUs

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

General Purpose Graphics Processing Units (GPGPUs) are present in most modern computing platforms. They are also increasingly integrated as a computational resource on clusters, data centers, and cloud infrastructure, making them possible targets for attacks. We present a first study of covert channel attacks on GPGPUs. GPGPU attacks offer a number of attractive properties relative to CPU covert channels. These channels also have characteristics different from their counterparts on CPUs. To enable the attack, we first reverse engineer the hardware block scheduler as well as the warp-to-warp scheduler to characterize how co-location is established. We exploit this information to manipulate the scheduling algorithms to create co-residency between the trojan and the spy. We study contention on different resources including caches, functional units and memory, and construct operational covert channels on all these resources. We also investigate approaches to increase the bandwidth of the channel including: (1) using synchronization to reduce the communication cycle and increase robustness of the channel; (2) exploiting the available parallelism on the GPU to increase the bandwidth; and (3) exploiting the scheduling algorithms to create exclusive co-location to prevent interference from other possible applications. We demonstrate operational versions of all channels on three different Nvidia GPGPUs, obtaining error-free bandwidth of over 4 Mbps, making it the fastest known microarchitectural covert channel under realistic conditions.
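
To give a flavor of the contention channels, the sketch below shows only the spy side of a hypothetical functional-unit channel: it times a fixed arithmetic loop per bit interval, expecting the loop to slow down when a co-resident trojan floods the same units. The iteration counts, launch geometry, and decode threshold are placeholders, and the co-residency setup and trojan side from the paper are omitted.

```cuda
// Hypothetical spy kernel for a functional-unit contention channel: one warp
// times a fixed special-function workload per bit interval; higher cycle
// counts would indicate a co-resident trojan contending for the same units.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spy_kernel(int bits, long long *per_bit_cycles)
{
    for (int b = 0; b < bits; ++b) {
        float x = threadIdx.x * 1.0f + 1.0f;
        long long start = clock64();
        for (int i = 0; i < 100000; ++i)          // fixed workload per bit interval
            x = __sinf(x) + 1.0f;                 // stresses the special function units
        long long end = clock64();
        if (threadIdx.x == 0)
            per_bit_cycles[b] = end - start + (x > 1e30f ? 1 : 0); // keep x live
    }
}

int main()
{
    const int bits = 16;
    long long *cycles;
    cudaMallocManaged(&cycles, bits * sizeof(long long));
    spy_kernel<<<1, 32>>>(bits, cycles);          // one warp, assumed co-resident SM
    cudaDeviceSynchronize();
    for (int b = 0; b < bits; ++b)
        printf("bit %2d: %lld cycles\n", b, cycles[b]); // compare to a threshold
    cudaFree(cycles);
    return 0;
}
```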

Side Channel Attacks on GPUs

IEEE Transactions on Dependable and Secure Computing, 2019

Graphics Processing Units (GPUs) are commonly integrated with computing devices to enhance the performance and capabilities of graphical workloads. In addition, they are increasingly being integrated in data centers and clouds such that they can be used to accelerate data intensive workloads. Under a number of scenarios the GPU can be shared between multiple applications at a fine granularity allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, allowing an attacker to interleave the use of the GPU to measure the side-effects of the victim computation through performance counters or other resource tracking APIs. We demonstrate the vulnerability by implementing three end-to-end attacks. We show that an OpenGL or CUDA based spy can fingerprint websites accurately (attack I), track user activities within the website, and even infer the keystroke timings for a password text box (attack II) with high accuracy. The third attack demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application on the cloud. To counter these attacks, the paper suggests mitigations based on limiting the rate of the calls, or limiting the granularity of the returned information.

Covert Channels on GPGPUs

IEEE Computer Architecture Letters, 2017

GPUs are increasingly used to accelerate the performance of not only graphics workloads, but also data intensive applications. In this paper, we explore the feasibility of covert channels in General Purpose Graphics Processing Units (GPGPUs). We consider the possibility of two colluding malicious applications using the GPGPU as a covert channel to communicate, in the absence of a direct channel between them. Such a situation may arise in cloud environments, or in environments employing containment mechanisms such as dynamic information flow tracking. We reverse engineer the block placement algorithm to understand co-residency of blocks from different applications on the same Streaming Multiprocessor (SM) core, or on different SMs concurrently. In either mode, we identify the shared resources that may be used to create contention. We demonstrate the bandwidth of two example channels: one that uses the L1 constant memory cache to enable communication on the same SM, and another that uses the L2 constant memory caches to enable communication between different SMs. We also examine the possibility of increasing the bandwidth of the channel by using the available parallelism on the GPU, achieving a bandwidth of over 400 Kbps. This study demonstrates that GPGPUs are a feasible medium for covert communication.
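
As an illustration of the same-SM variant, the sketch below times a sweep over a __constant__ array from a single thread; in a real channel, a co-resident trojan's accesses to conflicting constant-memory lines would turn some of these hits into misses during a bit interval. The array size, launch configuration, and threshold are assumptions, not the measured parameters from the paper.

```cuda
// Rough sketch of the receive side of a constant-cache channel: the spy times
// accesses to a __constant__ array; higher latency would suggest the trojan
// evicted lines from the constant cache during the interval.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ int c_data[2048];                 // assumed to span the L1 constant cache

__global__ void const_probe(long long *cycles, int *sink)
{
    int s = 0;
    long long start = clock64();
    for (int i = 0; i < 2048; ++i)
        s += c_data[i];                        // hits stay in the constant cache
    long long end = clock64();
    if (threadIdx.x == 0) {
        *cycles = end - start;
        *sink = s;                             // keep the loads live
    }
}

int main()
{
    int host_data[2048] = {0};
    cudaMemcpyToSymbol(c_data, host_data, sizeof(host_data));

    long long *cycles; int *sink;
    cudaMallocManaged(&cycles, sizeof(long long));
    cudaMallocManaged(&sink, sizeof(int));

    const_probe<<<1, 1>>>(cycles, sink);       // same-SM (L1) variant of the channel
    cudaDeviceSynchronize();
    printf("probe latency: %lld cycles (compare against a calibrated threshold)\n",
           *cycles);
    cudaFree(cycles);
    cudaFree(sink);
    return 0;
}
```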

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

International Journal of Computer Applications, 2015

Three-dimensional Network-On-Chips (3D NOC) are the most efficient communication structures for complex multi-processor System-On-Chips (SOC). Such structures combine the short vertical interconnects of 3D ICs with the scalability of NOCs to improve the performance of communication in SOCs. With scaling trends in 3D integration, the probability of fault occurrence increases, leading to low yield of links, especially TSV-based vertical links in 3D NOCs. In this paper, FT-Z-OE (Fault Tolerant Z Odd-Even) routing, a distributed routing algorithm that tolerates permanent faults on the vertical links of 3D NOCs, is proposed. FT-Z-OE has low overhead because it requires neither routing tables nor global information about faults in the network. The proposed routing is evaluated using a cycle-accurate network simulator and compared to planar-adaptive routing for a 3D mesh-based network. It is shown that FT-Z-OE significantly outperforms planar-adaptive routing in terms of latency and throughput under synthetic traffic patterns.

Beyond the CPU: Side Channel Attacks on GPUs

IEEE Design & Test, 2021

Editor’s notes: This article demonstrates that architectural side-channel attacks are also possible on applications that use GPUs. —Rosario Cammarota, Intel Labs —Francesco Regazzoni, University of Amsterdam and Università della Svizzera Italiana
