Sahil Suneja - Academia.edu

Papers by Sahil Suneja

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow easily and unwittingly within and across software repositories like GitHub. Across such millions of lines of code, traditional static and dynamic approaches struggle to scale. Although existing machine-learning-based approaches look promising in such a setting, most work detects vulnerable code at a higher granularity (at the method or file level). Thus, developers still need to inspect a significant amount of code to locate the vulnerable statement(s) that need to be fixed. This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph and effectively understand code semantics and vulnerable patterns. To study VELVET's effectiveness, we use an off-the-shelf synthetic dataset and a recently published real-world dataset. In the static analysis setting, where vulnerable functions are not detected in advance, VELVET achieves 4.5× better performance than the baseline static analyzers on the real-world data. For the isolated vulnerability localization task, where we assume the vulnerability of a function is known while the specific vulnerable statement is unknown, we compare VELVET with several neural networks that also attend to local and global context of code. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively, outperforming the baseline deep learning models by 5.3-29.0%.
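
To make the ensemble idea concrete, here is a minimal, hypothetical sketch of fusing per-statement scores from a graph-based and a sequence-based model to rank suspicious statements. The model classes, their score() methods, the dummy scoring, and the fusion weight are illustrative assumptions, not VELVET's actual architecture or API.

```python
# Hypothetical sketch of the ensemble idea: two models score each statement of a function,
# and their scores are fused to rank likely-vulnerable lines. Everything here is a placeholder.
from typing import List

class GraphModel:
    """Stand-in for a graph neural network scoring statements via the program graph."""
    def score(self, statements: List[str]) -> List[float]:
        return [0.1 * len(s) % 1.0 for s in statements]  # dummy scores

class SequenceModel:
    """Stand-in for a sequence model scoring statements as a token stream."""
    def score(self, statements: List[str]) -> List[float]:
        return [0.05 * len(s) % 1.0 for s in statements]  # dummy scores

def ensemble_localize(statements: List[str], w_graph: float = 0.5) -> List[int]:
    """Fuse per-statement scores from both models and rank statements by suspicion."""
    g = GraphModel().score(statements)
    s = SequenceModel().score(statements)
    fused = [w_graph * gi + (1 - w_graph) * si for gi, si in zip(g, s)]
    return sorted(range(len(statements)), key=lambda i: fused[i], reverse=True)

if __name__ == "__main__":
    code = ["int n = read_input();", "char buf[8];", "strcpy(buf, user_data);", "return 0;"]
    ranking = ensemble_localize(code)
    print("Statements ranked by (dummy) suspicion:", [code[i] for i in ranking])
```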

ConfAdvisor: A Performance-centric Configuration Tuning Framework for Containers on Kubernetes

2019 IEEE International Conference on Cloud Engineering (IC2E)

OpVis

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on Industrial Track - Middleware '17

Operational visibility is an important administrative capability and a critical factor in deciding the success or failure of a cloud service. It is becoming increasingly complex along many dimensions including the need to track both persistent and volatile system state across heterogeneous endpoints, as well as provide higher level services such as log analytics, software discovery, anomaly detection, and drift analysis. In this paper we present OpVis, our unified monitoring and analytics framework to provide operational visibility, which overcomes the limitations of traditional monitoring solutions and provides a uniform platform as opposed to requiring the configuration, installation, and maintenance of multiple isolated solutions. We highlight our framework's extensibility model, enabling custom data collection and analytics based on the cloud user's requirements, describe its monitoring and analytics capabilities, present performance measurements, and discuss our experiences while supporting operational visibility in our cloud.

Accelerating the cloud with heterogeneous computing

IEEE International Conference on Cloud Computing Technology and Science, Jun 14, 2011

Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are poised to become commonplace in the market. As seen recently from the high performance computing community, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in VMMs. This is only becoming possible now that the GPU is moving on-chip, since the latency across the PCIe bus was too great to make fast, informed decisions about the state of a system at any given point. We explore various examples of cloud management tasks that can greatly benefit from GPU acceleration. We also tackle tough questions of how to manage this hardware in a multi-tenant system. Finally, we present a case study that explores a common cloud operation, memory deduplication, and show that GPU acceleration can improve the performance of its hashing component by a factor of over 80.

EnVi: Energy Efficient Video Player for Mobiles

CellNet 2013 Workshop, Jun 25, 2013

Data-Driven AI Model Signal-Awareness Enhancement and Introspection

AI modeling for source code understanding tasks has been making significant progress, and is being adopted in production development pipelines. However, reliability concerns, especially whether the models are actually learning task-related aspects of source code, are being raised. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e., models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal-awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified signal-preserving programs and augmenting the training dataset with them. With our techniques, we achieve up to 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel model lear...
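
As a rough sketch of the second technique, the code below mimics a Delta Debugging-style reduction that drops chunks of a program while a signal-preserving oracle still holds, then augments the training set with the reduced, label-preserving variants. The reduce_program and augment functions, the keeps_signal oracle, and the data format are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of augmenting a training set with simplified, signal-preserving
# program variants, in the spirit of Delta Debugging. `keeps_signal` is a hypothetical
# oracle, e.g. "the known vulnerable line is still present in the reduced program".
from typing import Callable, List, Tuple

def reduce_program(lines: List[str], keeps_signal: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the reduced program still carries the signal."""
    chunk = max(1, len(lines) // 2)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and keeps_signal(candidate):
                lines = candidate          # keep the simplification
            else:
                i += chunk                 # move past this chunk
        chunk //= 2
    return lines

def augment(dataset: List[Tuple[List[str], int]],
            keeps_signal: Callable[[List[str]], bool]) -> List[Tuple[List[str], int]]:
    """Add a reduced variant of every vulnerable sample, preserving its label."""
    extra = [(reduce_program(code, keeps_signal), label)
             for code, label in dataset if label == 1]
    return dataset + extra

if __name__ == "__main__":
    sample = (["int x = 0;", "char b[4];", "gets(b);", "return x;"], 1)
    oracle = lambda ls: "gets(b);" in ls        # toy signal check
    print(augment([sample], oracle)[-1][0])     # reduced, label-preserving variant
```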

Towards Non-Intrusive Software Introspection and Beyond

2020 IEEE International Conference on Cloud Engineering (IC2E), 2020

Continuous verification and security analysis of software systems are of paramount importance to many organizations. The state-of-the-art for such operations implements agent-based approaches to inspect the provisioned software stack for security and compliance issues. However, this approach, which runs agents on the systems being analyzed, is vulnerable to some attacks, can incur substantial performance impact, and can introduce significant complexity. In this paper, we present the design and prototype implementation of a general-purpose approach for Non-intrusive Software Introspection (NSI). By adhering to NSI, organizations hosting in the cloud can also control the software introspection workflow with reduced trust in the provider. Experimental analysis of real-world applications demonstrates that NSI presents a lightweight and scalable approach, and has a negligible impact on the performance of applications running on the instance being introspected.

Unified Monitoring and Analytics in the Cloud

Modern cloud applications are distributed across a wide range of instances of multiple types, including virtual machines, containers, and baremetal servers. Traditional approaches to monitoring and analytics fail in these complex, distributed and diverse environments. They are too intrusive and heavy-handed for short-lived, lightweight cloud instances, and cannot keep up with the rapid pace of change in the cloud with continuous dynamic scheduling, provisioning and auto-scaling. We introduce a unified monitoring and analytics architecture designed for the cloud. Our approach leverages virtualization and containerization to decouple monitoring from instance execution and health. Moreover, it provides a uniform view of systems regardless of instance type, and operates without interfering with the end-user context. We describe an implementation of our approach in an actual deployment, and discuss our experiences and observed results.

Can Container Fusion Be Securely Achieved?

Proceedings of the 5th International Workshop on Container Technologies and Container Clouds, 2019

Linux containers are key enablers for building microservices. The application's microservices fall broadly under two categories: the core-microservices implementing the business logic, and the utility-microservices implementing middleware functionalities. Such functionalities include vulnerability scanning, monitoring, telemetry, etc. Segregating the utility-microservices in separate containers from the core-microservice containers may prevent them from achieving their functionality. This is due to the strong isolation between containers. By diffusing the boundaries between containers we can fuse them together and enable close collaboration. However, this raises several security concerns, especially since the utility-microservices may include vulnerabilities that threaten the entire application. In this paper, we analyze the different techniques to enhance the security of container fusion and present an automated solution based on Kubernetes to configure utility-microservices cont...

Learning to map source code to software vulnerability using code-as-a-graph

ArXiv, 2020

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a source code sample into a Code Property Graph. The extracted graph is then vectorized in a manner which preserves its semantic information. A Gated Graph Neural Network is then trained using several such graphs to automatically extract templates differentiating the graph of a vulnerable sample from a healthy one. Our model outperforms static analyzers, classic machine learning, as well as CNN and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct...
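
The pipeline described above can be caricatured in a few self-contained lines: build a graph over a code sample, vectorize its nodes, and propagate information along edges. This sketch substitutes a plain statement chain for a real Code Property Graph and simple neighbour averaging for a trained Gated Graph Neural Network; all function names and encodings are assumptions for illustration only.

```python
# Toy sketch of the code-as-graph pipeline: code -> graph -> node vectors -> message passing.
# Real systems would use a Code Property Graph extractor and a trained GGNN.
from collections import defaultdict
from typing import Dict, List, Tuple

def build_toy_graph(statements: List[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Nodes are statements; edges chain consecutive statements (a stand-in for CFG edges)."""
    edges = [(i, i + 1) for i in range(len(statements) - 1)]
    return statements, edges

def vectorize(nodes: List[str], dim: int = 8) -> List[List[float]]:
    """Hash each node's text into a fixed-size bag-of-characters style vector."""
    vecs = []
    for text in nodes:
        v = [0.0] * dim
        for ch in text:
            v[ord(ch) % dim] += 1.0
        vecs.append(v)
    return vecs

def message_pass(vecs: List[List[float]], edges: List[Tuple[int, int]], rounds: int = 2):
    """Average each node's vector with its neighbours' - a crude stand-in for GNN updates."""
    neigh: Dict[int, List[int]] = defaultdict(list)
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    for _ in range(rounds):
        vecs = [[(vi + sum(vecs[n][k] for n in neigh[i])) / (1 + len(neigh[i]))
                 for k, vi in enumerate(v)] for i, v in enumerate(vecs)]
    return vecs

if __name__ == "__main__":
    nodes, edges = build_toy_graph(["char b[4];", "gets(b);", "return 0;"])
    print(message_pass(vectorize(nodes), edges)[1])   # context-mixed vector of the middle node
```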

RECap: Run-Escape Capsule for On-demand Managed Service Delivery in the Cloud

Application runtimes are undergoing a fundamental transformation in the cloud, from general-purpose operating systems (OSes) in virtual machines (VMs) to lightweight, minimal OSes in microcontainers. On the one hand, such transformation is helping reduce application footprint in the cloud to increase agility and density, and to minimize attack surface. On the other hand, it makes it challenging to implement system and application management tasks. Inspired by the on-demand Function as a Service (FaaS) model in serverless computing, in RECap we are designing a cloud-native solution to deliver systems and application management tasks through specially-managed Capsule containers. Capsule containers are dynamically attached to the running containers for the duration of their implemented function and are safely removed from the application context afterwards. More generally, the RECap framework allows us to design disaggregated on-demand managed service delivery for containers in the cloud. In this pap...

Secure Extensibility for System State Extraction via Plugin Sandboxing

ArXiv, 2019

We introduce a new mechanism to securely extend systems data collection software with potentially untrusted third-party code. Unlike existing tools which run extension modules or plugins directly inside the monitored endpoint (the guest), we run plugins inside a specially crafted sandbox, so as to protect the guest as well as the software core. To get the right mix of accessibility and constraints required for systems data extraction, we create our sandbox by combining multiple features exported by an unmodified kernel. We have tested its applicability by successfully sandboxing plugins of open-source data collection software for containerized guest systems. We have also verified its security posture in terms of successful containment of several exploits, which would have otherwise directly impacted a guest, if shipped inside third-party plugins.
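
As a rough illustration of confining a plugin with only stock kernel features, the sketch below launches the plugin in fresh Linux namespaces via the util-linux unshare tool and applies resource limits in the child process. This is an assumption-laden approximation, not the paper's sandbox design; run_plugin_sandboxed and its parameters are made up for illustration, and a production sandbox would layer on further restrictions (e.g., seccomp filters, capability drops, cgroup limits).

```python
# Rough, Linux-only illustration (not the paper's design) of combining unmodified-kernel
# features to confine a plugin: fresh namespaces via `unshare`, plus process resource limits.
import resource
import subprocess
from typing import List

def run_plugin_sandboxed(plugin_cmd: List[str], timeout_s: int = 10) -> subprocess.CompletedProcess:
    """Execute a plugin command inside fresh mount/UTS/IPC/net/PID namespaces."""
    def limit_resources():
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))   # cap CPU seconds
        resource.setrlimit(resource.RLIMIT_NOFILE, (64, 64))              # cap open files
    cmd = ["unshare", "--mount", "--uts", "--ipc", "--net", "--pid", "--fork",
           "--map-root-user"] + plugin_cmd
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_s, preexec_fn=limit_resources)

if __name__ == "__main__":
    # Inside its own network namespace the plugin sees no interfaces besides loopback.
    result = run_plugin_sandboxed(["ip", "link"])
    print(result.stdout or result.stderr)
```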

Cryptomining Detection in Container Clouds Using System Calls and Explainable Machine Learning

IEEE Transactions on Parallel and Distributed Systems, 2021

The use of containers in cloud computing has been steadily increasing. With the emergence of Kubernetes, the management of applications inside containers (or pods) is simplified. Kubernetes allows automated actions like self-healing, scaling, rolling back, and updates for application management. At the same time, security threats have also evolved, with attacks on pods to perform malicious actions. Out of several recent malware types, cryptomining has emerged as one of the most serious threats with its hijacking of server resources for cryptocurrency mining. During application deployment and execution in the pod, a cryptomining process, started by a hidden malware executable, can be run in the background, and a method to detect malicious cryptomining software running inside Kubernetes pods is needed. One feasible strategy is to use machine learning (ML) to identify and classify pods based on whether or not they contain a running process of cryptomining. In addition to such detection, the system administrator will need an explanation as to the reason(s) for the ML's classification outcome. The explanation will justify and support disruptive administrative decisions such as pod removal or its restart with a new image. In this article, we describe the design and implementation of an ML-based detection system for anomalous pods in a Kubernetes cluster by monitoring Linux-kernel system calls (syscalls). Several types of cryptominer images are used as containers within an anomalous pod, and several ML models are built to detect such pods in the presence of numerous healthy cloud workloads. Explainability is provided using SHAP, LIME, and a novel auto-encoding-based scheme for LSTM models. Seven evaluation metrics are used to compare and contrast the explainable models of the proposed ML cryptomining detection engine.
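
A minimal sketch of the detection strategy: represent each pod by a syscall-frequency vector and train a classifier to separate cryptomining pods from healthy workloads. The syscall list, the synthetic traces, and the use of scikit-learn feature importances as a stand-in for SHAP/LIME explanations are all assumptions for illustration, not the system described in the paper.

```python
# Minimal sketch: syscall frequency vectors + a classifier to flag cryptomining pods.
# Traces are synthetic; the paper's system collects real kernel syscall traces and
# explains predictions with SHAP/LIME, which feature importances only approximate.
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

SYSCALLS = ["read", "write", "futex", "sched_yield", "sendto", "recvfrom"]

def featurize(trace):
    """Map a list of syscall names to a fixed-order frequency vector."""
    counts = Counter(trace)
    return [counts.get(s, 0) for s in SYSCALLS]

# Toy traces: miners spin on CPU-heavy loops and pool traffic (purely illustrative).
miner_traces = [["futex", "sched_yield", "sendto", "recvfrom"] * 20 for _ in range(10)]
web_traces = [["read", "write", "read", "write", "futex"] * 20 for _ in range(10)]

X = [featurize(t) for t in miner_traces + web_traces]
y = [1] * len(miner_traces) + [0] * len(web_traces)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
suspect = featurize(["sched_yield", "futex", "sendto", "recvfrom"] * 25)
print("cryptomining probability:", clf.predict_proba([suspect])[0][1])
print("most influential syscalls:",
      sorted(zip(SYSCALLS, clf.feature_importances_), key=lambda p: -p[1])[:3])
```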

Paracloud: Bringing Application Insight into Cloud Operations

Applications have commonly been oblivious to their cloud runtimes. This is primarily because they started their journey in IaaS clouds, running on a guestOS inside VMs. Then, to increase performance, many guestOSes have been paravirtualized, making them virtualization-aware, so that they can bypass some of the virtualization layers, as in virtio. This approach still kept applications unmodified. Recently, we are witnessing a rapid adoption of containers due to their packaging benefits, high density, fast start-up and low overhead. Applications are increasingly being on-boarded to PaaS clouds in the form of application containers or appc, where they are run directly on a cloud substrate like Kubernetes or Docker Swarm. This shift in deployment practices presents an opportunity to make applications aware of their cloud. In this paper, we present the Paracloud framework for application containers and discuss the Paracloud interface (PaCI) for three cloud operations, namely migration, auto-scal...

Non-intrusive Virtual Systems Monitoring

Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are far from fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.
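
The "synthesis of multiple representations" idea can be illustrated with a toy fusion step: embed a program once per parsed graph view (e.g., AST, CFG, DFG), pool each view into a single vector, concatenate, and classify. The pooling, fusion, and linear head below are illustrative assumptions, not the paper's trained GNN.

```python
# Hedged sketch of fusing several graph views of one program into a single prediction.
# The per-graph node embeddings are given as plain lists; real encoders would produce them.
import math
from typing import Dict, List

def pool(node_vectors: List[List[float]]) -> List[float]:
    """Mean-pool node embeddings into one fixed-size vector per graph view."""
    dim = len(node_vectors[0])
    return [sum(v[k] for v in node_vectors) / len(node_vectors) for k in range(dim)]

def fuse(per_graph_nodes: Dict[str, List[List[float]]]) -> List[float]:
    """Concatenate pooled embeddings from every available graph view of the program."""
    fused: List[float] = []
    for view in sorted(per_graph_nodes):          # fixed order: ast, cfg, dfg, ...
        fused.extend(pool(per_graph_nodes[view]))
    return fused

def predict(fused: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Toy linear head producing a vulnerability probability."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    views = {"ast": [[0.1, 0.4], [0.3, 0.2]],
             "cfg": [[0.0, 0.9]],
             "dfg": [[0.5, 0.5], [0.7, 0.1]]}
    fused = fuse(views)                            # length 6: three views x two dims
    print(predict(fused, weights=[0.2] * len(fused)))
```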

Towards Reliable AI for Source Code Understanding

Proceedings of the ACM Symposium on Cloud Computing, 2021

Cloud maturity and popularity have resulted in open source software (OSS) proliferation. In turn, managing OSS code quality has become critical in ensuring sustainable Cloud growth. On this front, AI modeling has gained popularity in source code understanding tasks, promoted by the ready availability of large open codebases. However, we have been observing certain peculiarities with these black-boxes, motivating a call for their reliability to be verified before they offset traditional code analysis. In this work, we highlight and organize different reliability issues affecting AI-for-code into three stages of an AI pipeline: data collection, model training, and prediction analysis. We highlight the need for concerted efforts from the research community to ensure credibility, accountability, and traceability for AI-for-code. For each stage, we discuss unique opportunities afforded by the source code and software engineering setting to improve AI reliability.

Probing model signal-awareness via prediction-preserving input minimization

Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

This work explores the signal awareness of AI models for source code understanding. Using a software vulnerability detection use case, we evaluate the models' ability to capture the correct vulnerability signals to produce their predictions. Our prediction-preserving input minimization (P2IM) approach systematically reduces the original source code to a minimal snippet which a model needs to maintain its prediction. The model's reliance on incorrect signals is then uncovered when the vulnerability in the original code is missing in the minimal snippet, both of which the model however predicts as being vulnerable. We measure the signal awareness of models using a new metric we propose, Signal-aware Recall (SAR). We apply P2IM on three different neural network architectures across multiple datasets. The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric, highlighting that the models are presumably picking up a lot of noise or dataset nuances while learning their vulnerability detection logic. Although the drop in model performance may be perceived as an adversarial attack, this is not P2IM's objective. The idea is rather to uncover the signal-awareness of a black-box model in a data-driven manner via controlled queries. SAR's purpose is to measure the impact of task-agnostic model training, and not to suggest a shortcoming in the Recall metric. The expectation, in fact, is for SAR to match Recall in the ideal scenario where the model truly captures task-specific signals.
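
A simplified sketch of P2IM and the SAR metric follows: shrink an input while the model's prediction holds, then check whether the ground-truth vulnerable line survived. The greedy one-line-at-a-time reduction and the SAR formula (signal-aware true positives over all vulnerable samples) are simplifications under stated assumptions, not the paper's exact procedure; the toy model is deliberately keyed on the wrong token.

```python
# Simplified sketch of prediction-preserving input minimization and a SAR-style metric.
# `model` is any callable over a list of code lines returning 1 for "vulnerable".
from typing import Callable, List, Tuple

def minimize(lines: List[str], model: Callable[[List[str]], int]) -> List[str]:
    """Remove one line at a time as long as the model keeps predicting 'vulnerable'."""
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if candidate and model(candidate) == 1:
                lines, changed = candidate, True
                break
    return lines

def signal_aware_recall(samples: List[Tuple[List[str], str]],
                        model: Callable[[List[str]], int]) -> float:
    """samples: (lines, vulnerable_line) pairs. Counts TPs whose minimal snippet keeps the line."""
    aware, total = 0, 0
    for lines, vuln_line in samples:
        total += 1
        if model(lines) == 1 and vuln_line in minimize(lines, model):
            aware += 1
    return aware / total if total else 0.0

if __name__ == "__main__":
    def noisy_model(ls):            # deliberately keys on the wrong token ("printf")
        return 1 if any("strcpy" in l or "printf" in l for l in ls) else 0
    sample = (["char b[8];", "strcpy(b, src);", "printf(\"done\");"], "strcpy(b, src);")
    print(signal_aware_recall([sample], noisy_model))
```

Because the toy model latches onto printf rather than strcpy, its minimal snippet drops the real vulnerability and SAR comes out 0.0, even though the plain prediction on the original sample was correct.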

Usable declarative configuration specification and validation for applications, systems, and cloud

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on Industrial Track - Middleware '17, 2017

Diagnosing misconfiguration across modern software stacks is increasingly difficult. These stacks comprise multiple microservices which are deployed across a combination of containers and hosts (VMs, physical machines) in a cloud or a data center. The existing approaches for detecting misconfiguration, whether rule-based or inference-based, are highly specialized (e.g., security only), cumbersome to write and maintain, geared towards a host (instead of container images), and can result in false positives or false negatives. This paper introduces the configuration validation language (CVL), a declarative language for writing rules to detect misconfigurations that can, for instance, impact security, performance, or functionality. We have built a system, ConfigValidator, which applies the CVL rules across a multitude of environments such as Docker images, running containers, hosts, and the cloud. The system is running in production and has scanned thousands of Docker images and running containers for identifying misconfigurations.
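
To illustrate the rule-checking idea, here is a tiny validator over a flattened configuration dictionary. The rule format, keys, and values are made-up stand-ins, not actual CVL syntax or ConfigValidator's implementation.

```python
# Illustrative rule checker in the spirit of a declarative misconfiguration validator.
# Each (hypothetical) rule names a configuration key, a condition, and a violation message.
from typing import Any, Dict, List

RULES = [
    {"key": "ssh.PermitRootLogin", "expect": "no",  "msg": "root SSH login should be disabled"},
    {"key": "nginx.worker_processes", "min": 1,     "msg": "nginx needs at least one worker"},
    {"key": "app.debug", "expect": False,           "msg": "debug mode must be off in production"},
]

def validate(config: Dict[str, Any], rules: List[Dict[str, Any]]) -> List[str]:
    """Return the messages of all violated rules for one scanned environment."""
    findings = []
    for rule in rules:
        value = config.get(rule["key"])
        if "expect" in rule and value != rule["expect"]:
            findings.append(f"{rule['key']}={value!r}: {rule['msg']}")
        elif "min" in rule and (value is None or value < rule["min"]):
            findings.append(f"{rule['key']}={value!r}: {rule['msg']}")
    return findings

if __name__ == "__main__":
    container_config = {"ssh.PermitRootLogin": "yes", "nginx.worker_processes": 4, "app.debug": False}
    for finding in validate(container_config, RULES):
        print("MISCONFIGURATION:", finding)
```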

Security Analysis of Container Images Using Cloud Analytics Framework

Web Services – ICWS 2018, 2018
