James Joshi - Academia.edu

Papers by James Joshi

Research paper thumbnail of FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

arXiv (Cornell University), Mar 5, 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate its applicability for multiple types of ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer with respect to state-of-the-art approaches. CCS CONCEPTS: • Security and privacy → Privacy-preserving protocols; • Computing methodologies → Distributed artificial intelligence; Cooperation and coordination.
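The core setting — gradients computed while features are split column-wise across parties — can be illustrated with a toy, unencrypted sketch. This is not FedV's protocol: in FedV the per-party partial scores below would be aggregated under functional encryption, so no party sees another's raw contribution; all function names here are hypothetical.

```python
# Toy vertically partitioned gradient step for a linear model.
# Party A holds features x_a, party B holds x_b; only party B holds label y.
# In FedV, the partial scores would be combined under functional encryption
# rather than exchanged in the clear as done here.

def partial_score(features, weights):
    """One party's contribution to the model's linear score."""
    return sum(f * w for f, w in zip(features, weights))

def vertical_gradient_step(x_a, w_a, x_b, w_b, y, lr=0.1):
    # Step 1: aggregate partial scores (FE-protected aggregation in FedV).
    score = partial_score(x_a, w_a) + partial_score(x_b, w_b)
    # Step 2: the label-holding party computes the prediction error.
    error = score - y
    # Step 3: each party updates only its own shard of the weights.
    w_a = [w - lr * error * f for w, f in zip(w_a, x_a)]
    w_b = [w - lr * error * f for w, f in zip(w_b, x_b)]
    return w_a, w_b

# One training sample whose features are split between two parties.
w_a, w_b = vertical_gradient_step([1.0, 2.0], [0.0, 0.0], [3.0], [0.0], y=1.0)
```

Note that no peer-to-peer exchange of raw features is needed: each party only contributes a scalar score and receives a scalar error, which is what makes a non-interactive secure aggregation of those scalars sufficient.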

Research paper thumbnail of NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets

arXiv (Cornell University), Dec 18, 2020

Training complex neural network models using third-party cloud-based infrastructure among multiple data sources is a promising approach in machine learning. However, privacy concerns over large-scale data collection and recent regulations have restricted the availability and use of privacy-sensitive data in third-party infrastructure. To address such privacy issues, a promising emerging approach is to train a neural network model over an encrypted dataset. Specifically, the model training process can be outsourced to a third party such as a cloud service that is backed by significant computing power, while the encrypted training data keeps the data confidential from the third party. Compared to training a traditional machine learning model over encrypted data, however, it is extremely challenging to train a deep neural network (DNN) model over encrypted data, for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, for enhanced performance of a DNN model, we also need huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework's performance with regard to training time and model accuracy on the MNIST dataset. We show that compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources.
Furthermore, the depth and complexity of the neural network do not affect the training time in the privacy-preserving NN-EMD setting. Index Terms: secure computation, neural networks, deep learning, privacy-preserving, functional encryption. 1 INTRODUCTION: Deep neural networks (DNN), also known as deep learning, have been increasingly used in many fields such as computer vision, natural language processing, and speech/audio recognition [1]. Such DNN-based solutions usually consist of two phases: the training phase and the inference phase. In the training phase, a well-designed neural network is given a training dataset as input and an appropriate optimization algorithm generates optimal parameters for the neural network; then, in the inference phase, the generated model (i.e., the optimal parameters) is used for inference tasks, namely, predicting a label for an input sample. One of the critically needed components in DNN-based applications is a powerful computing infrastructure with high-performance CPUs and GPUs, large memory and storage, etc. [2]. The volume of training data is another critical component. For instance, existing commercial machine learning (ML) service providers such as Google, Microsoft, and IBM have devoted significant efforts toward building infrastructure-as-a-service (IaaS) platforms for clients that do not have such powerful computing resources. The clients can employ these ML-related IaaS to manage a large-scale

Research paper thumbnail of CryptoNN: Training Neural Networks over Encrypted Data

arXiv (Cornell University), Apr 15, 2019

Emerging neural-network-based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanism. However, neither of these cryptographic approaches provides an efficient solution for constructing a privacy-preserving machine learning model that supports both the training and inference phases. To tackle the above issue, we propose CryptoNN, a framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirements of the proposed CryptoNN framework. We present a performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy similar to that of the baseline neural network models on the MNIST dataset.
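Functional encryption differs from SMC and HE in what the decryptor learns: a key tied to a function reveals only that function's output on the plaintext. The mock below conveys the inner-product interface such schemes expose (no real cryptography here — a production scheme would use, e.g., a DDH-based inner-product FE construction; the class and method names are illustrative):

```python
# Mock of the inner-product functional-encryption interface a framework like
# CryptoNN can build on. NOT cryptographically secure: it only models which
# role learns what.

class MockInnerProductFE:
    def encrypt(self, x):
        # A real scheme would output ciphertexts that hide x.
        return {"ct": list(x)}

    def keygen(self, w):
        # The authority issues a functional key bound to a specific vector w.
        return {"w": list(w)}

    def decrypt(self, fk, ct):
        # The evaluator learns ONLY the inner product <w, x>,
        # never the underlying feature vector x.
        return sum(wi * xi for wi, xi in zip(fk["w"], ct["ct"]))

fe = MockInnerProductFE()
ct = fe.encrypt([1.0, 2.0, 3.0])   # data owner encrypts its features
fk = fe.keygen([0.5, 0.5, 0.5])    # key for the current weight vector
score = fe.decrypt(fk, ct)         # server obtains a neuron's pre-activation
```

An inner product per neuron is exactly the plaintext quantity a forward pass needs, which is why this primitive suffices for training without revealing the inputs.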

Research paper thumbnail of Blockchain-based Transparency Framework for Privacy Preserving Third-party Services

IEEE Transactions on Dependable and Secure Computing

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security- and privacy-related regulations and have eroded trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in crypto-based privacy-preserving applications and services. To address such trust issues, various transparency frameworks and approaches have recently been proposed in the literature. This paper proposes the TAB framework, which provides transparency and trustworthiness of third-party authorities and third-party facilities using blockchain techniques for emerging crypto-based privacy-preserving applications. TAB employs the Ethereum blockchain as the underlying public ledger and includes a novel smart contract to automate accountability with an incentive mechanism that motivates users to participate in auditing and punishes unintentional or malicious behaviors. We implement TAB and show through experimental evaluation in the official Ethereum test network, Rinkeby, that the framework is efficient. We also formally show the security guarantee provided by TAB, and analyze the privacy guarantee and trustworthiness it provides.
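The accountability-with-incentives idea can be sketched abstractly: authorities' operations are committed to an append-only public log, auditors compare an authority's later claims against the log, and mismatches trigger rewards and penalties. The sketch below is a plain-Python stand-in for what TAB realizes as an Ethereum smart contract; the class, amounts, and method names are all illustrative, not the paper's contract.

```python
import hashlib

# Toy model of transparency-log auditing with incentives. In TAB this logic
# lives in a smart contract on a public ledger; here a dict plays that role.

class ToyTransparencyLog:
    def __init__(self):
        self.entries = []    # append-only log of committed operations
        self.balances = {}   # incentive accounting (illustrative units)

    def record(self, authority, operation):
        # The authority commits a digest of each key-service operation.
        digest = hashlib.sha256(operation.encode()).hexdigest()
        self.entries.append({"authority": authority, "digest": digest})
        return digest

    def audit(self, auditor, index, authority_claim):
        # The auditor checks the authority's claimed operation against
        # the digest committed earlier in the log.
        entry = self.entries[index]
        honest = hashlib.sha256(authority_claim.encode()).hexdigest() == entry["digest"]
        if not honest:
            # Mismatch: reward the auditor, penalize the authority.
            self.balances[auditor] = self.balances.get(auditor, 0) + 10
            self.balances[entry["authority"]] = self.balances.get(entry["authority"], 0) - 50
        return honest

log = ToyTransparencyLog()
log.record("tpa-1", "issue-key:user-42")
ok = log.audit("alice", 0, "issue-key:user-42")    # claim matches the log
bad = log.audit("alice", 0, "issue-key:user-99")   # mismatch fires incentives
```

The append-only property and the public visibility of the log are what the blockchain provides; the incentive arithmetic is what the smart contract automates.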

Research paper thumbnail of Privacy-Preserving Machine Learning: Methods, Challenges and Directions

arXiv (Cornell University), Aug 10, 2021

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into the ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and applications design, as well as security and privacy; hence, there is a critical need to understand the state-of-the-art research, related challenges, and a research roadmap for future work in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security, and privacy.

Research paper thumbnail of Enabling Attribute Based Encryption as an Internet Service

2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC), 2016

Research paper thumbnail of NN-EMD: Efficiently Training Neural Networks Using Encrypted Multi-Sourced Datasets

IEEE Transactions on Dependable and Secure Computing, 2021

Training complex neural network models using third-party cloud-based infrastructure among multiple data sources is a promising approach in existing machine learning solutions. However, privacy concerns over large-scale data collection and recent regulations have restricted the availability and use of privacy-sensitive data in third-party infrastructure. To address the privacy issue, a promising emerging approach is to train a neural network model over an encrypted dataset. Specifically, the model training process can be outsourced to a third party such as a cloud service that is backed by significant computing power, while the encrypted training data keeps the data confidential from the third party. Compared to training a traditional machine learning model over encrypted data, however, it is extremely challenging to train a deep neural network (DNN) model over encrypted data, for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, for enhanced performance of a DNN model, we also need huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework's performance with regard to training time and model accuracy on the MNIST dataset. We show that compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources.
Furthermore, the depth and complexity of the neural network do not affect the training time in the privacy-preserving NN-EMD setting.

Research paper thumbnail of Insider Threat Mitigation in Attribute based Encryption

Recent advances in computing have enabled cloud storage services, among others, that collect and provide efficient long-term storage of huge amounts of data that may include users' privacy-sensitive information. Concerns about the security and privacy of the sensitive data stored in the cloud are one key obstacle to the success of these cloud-based applications and services. To tackle these issues, Attribute-based Encryption (ABE) approaches, especially Ciphertext-Policy Attribute-based Encryption (CP-ABE), have been shown to be very promising. ABE helps provide access control solutions to protect the privacy-sensitive information stored in cloud storage centers. However, use of an ABE approach in such cases suffers from two key insider threats: the insider threat due to colluding users, and that due to a potentially malicious or compromised authority center. Even though users' collusion has been addressed in the literature, to the best of our knowledge, the authority center as an insider...

Research paper thumbnail of T3AB: Transparent and Trustworthy Third-party Authority using Blockchain

ArXiv, 2021

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities or are supported by service providers. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security- and privacy-related regulations and have eroded trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in Internet-enabled applications and services. To address such trust issues, various transparency frameworks and approaches have recently been proposed in the literature. In this paper, we propose a Transparent and Trustworthy TPA using Blockchain...

Research paper thumbnail of FedV

Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties while keeping their data private; only model updates are shared. Most existing approaches have focused on horizontal FL, while many real scenarios follow a vertically partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes and works for larger and changing sets of parties. We empirically demonstrate its applicability for multiple ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer compared to state-of-the-art approaches. CCS CONCEPTS: • Security and privacy → Privacy-preserving protocols; • Computing methodologies → Distributed artificial intelligence.

Research paper thumbnail of An Integrated Privacy Preserving Attribute-Based Access Control Framework Supporting Secure Deduplication

IEEE Transactions on Dependable and Secure Computing, 2019

Recent advances in information technologies have facilitated applications that generate, collect, or process large amounts of sensitive personal data. Emerging cloud storage services provide a better paradigm to support the needs of such applications. Such cloud-based solutions introduce additional security and privacy challenges when dealing with outsourced data, including that of supporting fine-grained access control over data stored in the cloud. In this paper, we propose an integrated, privacy-preserving, user-centric attribute-based access control framework to ensure the security and privacy of users' data outsourced to and stored by a cloud service provider (CSP). The core component of the proposed framework is a novel privacy-preserving, revocable ciphertext-policy attribute-based encryption (PR-CP-ABE) scheme. To support advanced access control features like write access on encrypted data and privacy-preserving access policy updates, we propose an extended Path-ORAM access protocol that also prevents privacy disclosure of access patterns. We also propose an integrated secure deduplication approach to improve the storage efficiency of CSPs while protecting data privacy. Finally, we evaluate the proposed framework and compare it with other existing solutions with regard to security and performance.
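Why deduplication over encrypted data is possible at all is worth spelling out: if the encryption key is derived from the content itself (convergent encryption, a standard construction — not necessarily the paper's exact integrated scheme), identical plaintexts from different users yield identical ciphertexts, so the CSP can deduplicate by ciphertext without reading anything. A minimal sketch, with a hash-derived keystream standing in for a real block cipher:

```python
import hashlib

# Convergent-encryption sketch: key = H(content), so two users who upload
# the same file independently produce the same ciphertext, and the provider
# deduplicates by ciphertext hash without learning the plaintext.

def convergent_key(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keystream_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR with a SHA-256-derived keystream; applying it twice decrypts.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def upload(store: dict, data: bytes) -> str:
    ct = keystream_encrypt(convergent_key(data), data)
    tag = hashlib.sha256(ct).hexdigest()   # dedup label visible to the CSP
    store.setdefault(tag, ct)              # identical content stored once
    return tag

store = {}
t1 = upload(store, b"quarterly-report")
t2 = upload(store, b"quarterly-report")    # same content, different user
# t1 == t2, and the store holds a single copy
```

The known trade-off is that convergent encryption leaks content equality, which is precisely the signal deduplication needs; an integrated framework must weigh that leakage against storage savings.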

Research paper thumbnail of k-Trustee: Location injection attack-resilient anonymization for location privacy

Computers & Security, 2018

Cloaking-based location privacy preserving mechanisms have been widely adopted to protect users' location privacy when using location-based services. A fundamental limitation of such mechanisms is that users and their location information in the system are inherently trusted by the Anonymization Server without any verification. In this paper, we show that this issue can lead to a new class of attacks, called location injection attacks, which can successfully violate users' indistinguishability (guaranteed by k-Anonymity) among a set of users. We propose and characterize location injection attacks by presenting a set of attack models and quantifying the costs associated with them. We then propose and evaluate k-Trustee, a trust-aware location cloaking mechanism that is resilient to location injection attacks and guarantees a lower bound on the user's indistinguishability. k-Trustee guarantees that each user in a given cloaked region can achieve the required privacy level of k-Anonymity by including at least k-1 other trusted users in the cloaked region. We demonstrate the effectiveness of k-Trustee through extensive experiments on a real-world geographic map, and our experimental results show that the proposed cloaking algorithm is effective against various location injection attacks.
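The defining condition — that only trusted users count toward the anonymity set, so injected fake identities cannot dilute it — can be stated in a few lines. This is a toy check of that condition only, not the paper's cloaking algorithm; the names are illustrative.

```python
# Toy check of the k-Trustee condition: a cloaked region protects a user only
# if it contains at least k-1 OTHER users whom the requester trusts, so that
# injected (fake) identities cannot dilute the anonymity set.

def satisfies_k_trustee(region_users, trusted, requester, k):
    trusted_others = [u for u in region_users
                      if u != requester and u in trusted]
    return len(trusted_others) >= k - 1

region = {"alice", "bob", "carol", "sybil-1", "sybil-2"}
trusted = {"bob", "carol"}

# With k = 3, the two injected "sybil" users do not count: only bob and
# carol contribute, exactly k-1, so the region is acceptable.
ok = satisfies_k_trustee(region, trusted, "alice", k=3)
# With k = 4, the region must grow until another trusted user is included.
not_ok = satisfies_k_trustee(region, trusted, "alice", k=4)
```

Plain k-Anonymity would count all five users and accept both cases, which is exactly the weakness a location injection attack exploits.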

Research paper thumbnail of Towards active detection of identity clone attacks on online social networks

Proceedings of the first ACM conference on Data and application security and privacy, 2011

Online social networks (OSNs) are becoming increasingly popular, and Identity Clone Attacks (ICAs) that aim at creating fake identities for malicious purposes on OSNs are a significant and growing concern. Such attacks severely affect the trust relationships a victim has built with other users if no active protection is applied. In this paper, we first analyze and characterize the behaviors of ICAs. Then we propose a detection framework that focuses on discovering suspicious identities and then validating them. Toward detecting suspicious identities, we propose two approaches, based on attribute similarity and similarity of friend networks. The first approach addresses a simpler scenario where mutual friends in friend networks are considered; the second captures the scenario where similar friend identities are involved. We also present experimental results to demonstrate the flexibility and effectiveness of the proposed approaches. Finally, we discuss some feasible solutions to validate suspicious identities.
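The two suspicion signals — attribute similarity and friend-network overlap — can be sketched with Jaccard similarity. The thresholds below are illustrative placeholders, not the paper's tuned values, and the profile fields are hypothetical.

```python
# Minimal sketch of clone suspicion scoring: a candidate profile is flagged
# when both its attributes and its friend network closely overlap a victim's.
# Thresholds are illustrative, not the paper's values.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def is_suspected_clone(profile, candidate, attr_thresh=0.6, friend_thresh=0.4):
    attr_sim = jaccard(profile["attrs"], candidate["attrs"])
    friend_sim = jaccard(profile["friends"], candidate["friends"])
    return attr_sim >= attr_thresh and friend_sim >= friend_thresh

victim = {"attrs": {"alice", "nyc", "1990", "engineer"},
          "friends": {"bob", "carol", "dan"}}
clone = {"attrs": {"alice", "nyc", "1990"},       # copied attributes
         "friends": {"bob", "carol"}}             # partially rebuilt network

suspect = is_suspected_clone(victim, clone)
```

Requiring both signals to fire is what separates a clone (copied attributes plus a rebuilt friend circle) from an unrelated user who merely shares a name or a few friends.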

Research paper thumbnail of Security and Privacy Challenges of a Digital Government

Advances in Digital Government, 2002

A digital government can be viewed as an amalgam of heterogeneous information systems that exchange high-volume information among government agencies and the public and private sectors engaged in government business. This gives rise to several daunting multidomain security challenges as well as concerns for citizen privacy. The success of a digital government infrastructure depends on how well it meets these challenges and its preparedness against numerous potential threats, ranging from simple acts of hacking to cyber-terrorism. In this paper, we outline these crucial security and privacy issues and present various solutions that are available and need to be further investigated.

Research paper thumbnail of A collaborative k-anonymity approach for location privacy in location-based services

Proceedings of the 5th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing, 2009

Considering the growth of wireless communication and mobile positioning technologies, location-based services (LBSs) have been generating increasing research interest in recent years. One of the critical issues for the deployment of LBS applications is how to reconcile their quality of service with privacy concerns. Location privacy based on k-anonymity is a very common way to hide the real locations of users from the LBS provider. Several k-anonymity approaches have been proposed in the literature, each with some drawbacks: they need either a trusted third party or, in collaborative approaches, the users (or providers) to trust each other. In this paper, we propose a collaborative approach that provides k-anonymity in a distributed manner and requires neither a trusted third party nor mutual trust among the users (or providers). Furthermore, our approach integrates well with the existing communication infrastructure. A user's location is known only to his/her location provider (e.g., cell phone operator). By using cryptographic schemes, a user, with the help of the location providers, determines whether the k-anonymity property is satisfied in a query area. We start with a simple scenario where the user and location providers are honest-but-curious, and then progressively extend our protocol to deal with scenarios where entities may collude with each other. Moreover, we analyze possible threats and discuss how our proposed approach defends against them.
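The quantity the protocol ultimately computes is simple: does the number of users inside the query area, summed across providers, reach k? The sketch below shows that plaintext computation; in the actual protocol the per-provider counts are combined under cryptographic schemes so no single party learns another provider's count. All names here are illustrative.

```python
# Toy version of the k-anonymity test: each location provider counts its own
# users inside the query area, and the counts are combined to decide whether
# at least k users are present. In the real protocol the combination happens
# under encryption, hiding each provider's individual count.

def in_area(loc, area):
    (x, y), (x0, y0, x1, y1) = loc, area
    return x0 <= x <= x1 and y0 <= y <= y1

def provider_count(users, area):
    return sum(1 for loc in users.values() if in_area(loc, area))

def k_anonymous(providers, area, k):
    return sum(provider_count(users, area) for users in providers) >= k

area = (0, 0, 10, 10)                          # query rectangle
operator_a = {"u1": (1, 1), "u2": (5, 5)}      # one cell operator's users
operator_b = {"u3": (9, 9), "u4": (20, 20)}    # u4 lies outside the area

safe = k_anonymous([operator_a, operator_b], area, k=3)   # 3 users inside
```

Because only the final threshold comparison needs to be revealed, this sum is a natural fit for additively homomorphic or secret-sharing-style aggregation across providers.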

Research paper thumbnail of Policy Management in Cloud

Concepts, Methodologies, Tools, and Applications

Research paper thumbnail of A Requirements-Driven Trust Framework for Secure Interoperation in Open Environments

Lecture Notes in Computer Science, 2006

A key challenge in emerging multi-domain open environments is the need to establish trust-based, loosely coupled partnerships between previously unknown domains. An efficient trust framework is essential to facilitate trust negotiation based on the service requirements of the partner domains. While several trust mechanisms have been proposed, none address the issue of integrating trust mechanisms with the process of integrating the access control policies of partner domains to facilitate secure interoperation. In this paper, we propose a requirements-driven trust framework for secure interoperation in open environments. Our framework tightly integrates game-theory-based trust negotiation with service negotiation and policy mapping to ensure secure interoperation.

Research paper thumbnail of SecureCloud: Towards a Comprehensive Security Framework for Cloud Computing Environments

2010 IEEE 34th Annual Computer Software and Applications Conference Workshops, 2010

Cloud computing has recently gained tremendous momentum but is still in its infancy. It has the potential for significant cost reduction and increased operating efficiency in computing. Although security issues are delaying its fast adoption, cloud computing is an unstoppable force, and we need to provide security mechanisms to ensure its secure adoption. In this paper, we propose a comprehensive security framework for cloud computing environments. We also discuss challenges, existing solutions, approaches, and future work needed to provide a trustworthy cloud computing environment.

Research paper thumbnail of Mobile Cloud Computing and Its Security and Privacy Challenges

Concepts, Methodologies, Tools, and Applications

Mobile cloud computing has grown out of two hot technology trends: mobility and cloud. The emergence of cloud computing and its extension into the mobile domain creates the potential for a global, interconnected mobile cloud computing environment that will allow the entire mobile ecosystem to enrich its services across multiple networks. We can utilize the significant optimization and increased operating power offered by cloud computing to enable seamless and transparent use of cloud resources that extend the capability of resource-constrained mobile devices. However, in order to realize mobile cloud computing, we need to develop mechanisms that achieve interoperability among heterogeneous and distributed devices. We need solutions to discover the best available resources in the cloud servers based on user demands, and approaches to deliver desired resources and services efficiently and in a timely fashion to mobile terminals. Furthermore, while mobile cloud computing has tremendous potential to give mobile terminals access to powerful and reliable computing resources anywhere and anytime, we must consider several issues, including privacy, security, and reliability, in realizing mobile cloud computing. In this chapter, the authors first explore the architectural components required to realize a mobile cloud computing infrastructure. They then discuss mobile cloud computing features with their unique privacy and security implications. They present unique issues of mobile cloud computing that exacerbate privacy and security challenges. They also discuss various approaches to address these challenges and explore the future work needed to provide a trustworthy mobile cloud computing environment.

Research paper thumbnail of Security and Privacy in Location-Based Services

Advanced Location-Based Technologies and Services, 2013

Location-based services (LBS) have gained popularity as a result of advances in mobile and communication technologies. LBS provide users with relevant information based on their location. In spite of the desirable features provided by LBS, the geographic locations of users are not adequately protected. Location privacy is one of the major challenges in vehicular and mobile networks. In this article, we analyze the security and privacy requirements for LBS in vehicular and mobile networks. Specifically, this paper covers privacy-enhancing technologies and cryptographic approaches that provide location privacy in vehicular and mobile networks. The different approaches proposed in the literature are compared, and open research areas are identified.

Research paper thumbnail of NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets

arXiv (Cornell University), Dec 18, 2020

Training complex neural network models using third-party cloud-based infrastructure among multiple data sources is a promising approach among existing machine learning solutions. However, privacy concerns over large-scale data collection and recent regulations have restricted the availability and use of privacy-sensitive data in third-party infrastructure. To address such privacy issues, a promising emerging approach is to train a neural network model over an encrypted dataset. Specifically, the model training process can be outsourced to a third party such as a cloud service that is backed by significant computing power, while the encrypted training data keeps the data confidential from the third party. Compared to training a traditional machine learning model over encrypted data, however, it is extremely challenging to train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, for enhanced performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over encrypted multiple datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework for performance with regard to training time and model accuracy on the MNIST datasets. We show that, compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time, while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources. Furthermore, the depth and complexity of neural networks do not affect the training time despite introducing a privacy-preserving NN-EMD setting.
Index Terms: secure computation, neural networks, deep learning, privacy-preserving, functional encryption

Research paper thumbnail of CryptoNN: Training Neural Networks over Encrypted Data

arXiv (Cornell University), Apr 15, 2019

Emerging neural-network-based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanism. However, neither of these cryptographic approaches provides an efficient solution for constructing a privacy-preserving machine learning model that supports both the training and inference phases. To tackle the above issue, we propose CryptoNN, a framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirements of the proposed CryptoNN framework. We present a performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy similar to that of the baseline neural network models on the MNIST dataset.

Research paper thumbnail of Blockchain-based Transparency Framework for Privacy Preserving Third-party Services

IEEE Transactions on Dependable and Secure Computing

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security- and privacy-related regulations and have eroded trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in crypto-based privacy-preserving applications and services. To address such trust issues, various transparency frameworks and approaches have recently been proposed in the literature. This paper proposes the TAB framework, which provides transparency and trustworthiness of third-party authorities and third-party facilities using blockchain techniques for emerging crypto-based privacy-preserving applications. TAB employs the Ethereum blockchain as the underlying public ledger and also includes a novel smart contract that automates accountability with an incentive mechanism that motivates users to participate in auditing and punishes unintentional or malicious behaviors. We implement TAB and show through experimental evaluation in the Ethereum official test network, Rinkeby, that the framework is efficient. We also formally show the security guarantee provided by TAB, and analyze the privacy guarantee and trustworthiness it provides.

Research paper thumbnail of Privacy-Preserving Machine Learning: Methods, Challenges and Directions

arXiv (Cornell University), Aug 10, 2021

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for, and the use of, huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into the ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and applications design, as well as security and privacy areas; hence, there is a critical need to understand the state-of-the-art research, related challenges, and a research roadmap for future work in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage, as well as benefit, multiple research communities such as ML, distributed systems, and security and privacy.

Research paper thumbnail of Enabling Attribute Based Encryption as an Internet Service

2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC), 2016

Research paper thumbnail of NN-EMD: Efficiently Training Neural Networks Using Encrypted Multi-Sourced Datasets

IEEE Transactions on Dependable and Secure Computing, 2021

Training complex neural network models using third-party cloud-based infrastructure among multiple data sources has become a promising approach among existing machine learning solutions. However, privacy concerns over large-scale data collection and recent regulations have restricted the availability and use of privacy-sensitive data in third-party infrastructure. To address the privacy issue, a promising emerging approach is to train a neural network model over an encrypted dataset. Specifically, the model training process can be outsourced to a third party such as a cloud service that is backed by significant computing power, while the encrypted training data keeps the data confidential from the third party. Compared to training a traditional machine learning model over encrypted data, however, it is extremely challenging to train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, for enhanced performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework for performance with regard to training time and model accuracy on the MNIST datasets. We show that, compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time, while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources. Furthermore, the depth and complexity of neural networks do not affect the training time despite introducing a privacy-preserving NN-EMD setting.

Research paper thumbnail of Insider Threat Mitigation in Attribute based Encryption

Recent advances in computing have enabled cloud storage services, among others, that collect and provide efficient long-term storage of huge amounts of data that may include users' privacy-sensitive information. Concerns about the security and privacy of the sensitive data stored in the cloud are a key obstacle to the success of these cloud-based applications and services. To tackle these issues, Attribute-based Encryption (ABE) approaches, especially Ciphertext-Policy Attribute-based Encryption (CP-ABE), have been shown to be very promising. ABE helps provide access control solutions to protect the privacy-sensitive information stored in cloud storage centers. However, use of an ABE approach in such cases suffers from two key insider threats: the insider threat due to colluding users, and that due to a potentially malicious or compromised authority center. Even though users' collusion has been addressed in the literature, to our best knowledge, the authority center as an insid...

Research paper thumbnail of T3AB: Transparent and Trustworthy Third-party Authority using Blockchain

ArXiv, 2021

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities or are supported by service providers. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security- and privacy-related regulations and have eroded trust in cyberspace. In particular, security-related services and infrastructures such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services are critical components for establishing trust in Internet-enabled applications and services. To address such trust issues, various transparency frameworks and approaches have recently been proposed in the literature. In this paper, we propose a Transparent and Trustworthy TPA using Blockchain...

Research paper thumbnail of FedV

Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties, where each party keeps its data private and only model updates are shared. Most existing approaches have focused on horizontal FL, while many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes and works for larger and changing sets of parties. We empirically demonstrate its applicability for multiple ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer compared to state-of-the-art approaches. CCS CONCEPTS • Security and privacy → Privacy-preserving protocols; • Computing methodologies → Distributed artificial intelligence.
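
The horizontal/vertical distinction the abstract relies on can be made concrete with a small sketch: in the horizontal case parties split the samples, while in the vertical case they split the features and only one party holds the labels. The toy data and party names below are hypothetical illustrations, not FedV's protocol.

```python
import numpy as np

# Hypothetical toy dataset: 6 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = (X.sum(axis=1) > 0).astype(int)

# Horizontal FL: each party holds different SAMPLES with all features + labels.
party_a_h = (X[:3], y[:3])
party_b_h = (X[3:], y[3:])

# Vertical FL: each party holds different FEATURES for the SAME samples;
# only one party (here B) holds the labels.
party_a_v = X[:, :2]          # features 0-1, no labels
party_b_v = (X[:, 2:], y)     # features 2-3 plus labels

# The complete feature set exists only when the vertical parts are joined
# on a shared sample ordering (entity resolution is assumed already done).
X_joined = np.hstack([party_a_v, party_b_v[0]])
assert np.array_equal(X_joined, X)
```

This is why vertical training needs a secure way to combine per-party partial gradients: no single party can compute a full gradient from its slice alone.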

Research paper thumbnail of An Integrated Privacy Preserving Attribute-Based Access Control Framework Supporting Secure Deduplication

IEEE Transactions on Dependable and Secure Computing, 2019

Recent advances in information technologies have facilitated applications that generate, collect, or process large amounts of sensitive personal data. Emerging cloud storage services provide a better paradigm to support the needs of such applications. Such cloud-based solutions introduce additional security and privacy challenges when dealing with outsourced data, including that of supporting fine-grained access control over such data stored in the cloud. In this paper, we propose an integrated, privacy-preserving, user-centric attribute-based access control framework to ensure the security and privacy of users' data outsourced to and stored by a cloud service provider (CSP). The core component of the proposed framework is a novel privacy-preserving, revocable ciphertext-policy attribute-based encryption (PR-CP-ABE) scheme. To support advanced access control features such as write access on encrypted data and privacy-preserving access policy updates, we propose an extended Path-ORAM access protocol that also prevents privacy disclosure of access patterns. We also propose an integrated secure deduplication approach to improve the storage efficiency of CSPs while protecting data privacy. Finally, we evaluate the proposed framework and compare it with other existing solutions with regard to security and performance.

Research paper thumbnail of k-Trustee: Location injection attack-resilient anonymization for location privacy

Computers & Security, 2018

Cloaking-based location privacy preserving mechanisms have been widely adopted to protect users' location privacy when using location-based services. A fundamental limitation of such mechanisms is that users and their location information in the system are inherently trusted by the Anonymization Server without any verification. In this paper, we show that this issue can lead to a new class of attacks, called location injection attacks, which can successfully violate users' indistinguishability (guaranteed by k-Anonymity) among a set of users. We propose and characterize location injection attacks by presenting a set of attack models and quantifying the costs associated with them. We then propose and evaluate k-Trustee, a trust-aware location cloaking mechanism that is resilient to location injection attacks and guarantees a lower bound on the user's indistinguishability. k-Trustee guarantees that each user in a given cloaked region can achieve the required privacy level of k-Anonymity by including at least k-1 other trusted users in the cloaked region. We demonstrate the effectiveness of k-Trustee through extensive experiments on a real-world geographic map, and our experimental results show that the proposed cloaking algorithm guaranteeing k-Trustee is effective against various location injection attacks.
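
The basic cloaking step described above, growing a region around the querying user until it contains at least k-1 other trusted users, can be sketched as follows. This is a minimal illustration under assumed planar coordinates; k-Trustee's trust model and the paper's actual algorithm are not reproduced here, and `trusted` is assumed to be a pre-computed list.

```python
# Minimal k-anonymity cloaking sketch: expand a square around the querying
# user until at least k-1 other trusted users fall inside it.
def cloak(user, trusted, k, step=0.001, max_half=1.0):
    x, y = user
    half = step
    while half <= max_half:
        inside = [p for p in trusted
                  if abs(p[0] - x) <= half and abs(p[1] - y) <= half]
        if len(inside) >= k - 1:
            # The cloaked region (not the exact point) is sent to the LBS.
            return (x - half, y - half, x + half, y + half)
        half += step
    return None  # required privacy level not achievable in this area

# Hypothetical nearby trusted users (planar coordinates).
others = [(0.001, 0.0), (0.002, 0.001), (-0.003, 0.002), (0.05, 0.05)]
region = cloak((0.0, 0.0), others, k=4)
```

A location injection attack targets exactly this loop: by injecting fake identities near the victim, an attacker can make the region close around users it controls, collapsing the victim's effective anonymity set.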

Research paper thumbnail of Towards active detection of identity clone attacks on online social networks

Proceedings of the first ACM conference on Data and application security and privacy, 2011

Online social networks (OSNs) are becoming increasingly popular, and Identity Clone Attacks (ICAs), which aim at creating fake identities for malicious purposes on OSNs, are a significantly growing concern. Such attacks severely affect the trust relationships a victim has built with other users if no active protection is applied. In this paper, we first analyze and characterize the behaviors of ICAs. We then propose a detection framework focused on discovering suspicious identities and then validating them. Toward detecting suspicious identities, we propose two approaches based on attribute similarity and similarity of friend networks. The first approach addresses a simpler scenario where mutual friends in friend networks are considered; the second captures the scenario where similar friend identities are involved. We also present experimental results to demonstrate the flexibility and effectiveness of the proposed approaches. Finally, we discuss some feasible solutions to validate suspicious identities.
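
The two detection signals named above, attribute similarity and friend-network (mutual-friend) similarity, can be sketched with simple Jaccard-style scores. The profiles, weights, and threshold below are hypothetical illustrations; the paper's actual similarity metrics differ.

```python
# Sketch of the two similarity signals used to flag a suspected clone of a
# victim profile. Profiles and weights are hypothetical.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def suspicion(p1, p2, w_attr=0.5, w_friends=0.5):
    attr_sim = jaccard(p1["attrs"].items(), p2["attrs"].items())
    friend_sim = jaccard(p1["friends"], p2["friends"])  # mutual friends
    return w_attr * attr_sim + w_friends * friend_sim

victim = {"attrs": {"name": "alice", "city": "pgh", "job": "prof"},
          "friends": {"bob", "carol", "dave"}}
clone  = {"attrs": {"name": "alice", "city": "pgh", "job": "dr"},
          "friends": {"bob", "carol", "eve"}}

score = suspicion(victim, clone)   # high score => flag for validation
```

Profiles whose combined score exceeds a tuned threshold would then enter the validation stage rather than being blocked outright, since high similarity alone is not proof of cloning.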

Research paper thumbnail of Security and Privacy Challenges of a Digital Government

Advances in Digital Government, 2002

A digital government can be viewed as an amalgam of heterogeneous information systems that exchange high-volume information among government agencies and the public and private sectors engaged in government business. This gives rise to several daunting multidomain security challenges as well as concerns for citizen privacy. The success of a digital government infrastructure depends on how well it meets these challenges and on its preparedness against numerous potential threats, ranging from simple acts of hacking to cyber-terrorism. In this paper, we outline these crucial security and privacy issues and present various solutions that are available and need to be further investigated.

Research paper thumbnail of A collaborative k-anonymity approach for location privacy in location-based services

Proceedings of the 5th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing, 2009

Considering the growth of wireless communication and mobile positioning technologies, location-based services (LBSs) have been generating increasing research interest in recent years. One of the critical issues for the deployment of LBS applications is how to reconcile their quality of service with privacy concerns. Location privacy based on k-anonymity is a very common way to hide the real locations of users from the LBS provider. Several k-anonymity approaches have been proposed in the literature, each with some drawbacks: they need either a trusted third party or, in collaborative approaches, the users (or providers) to trust each other. In this paper, we propose a collaborative approach that provides k-anonymity in a distributed manner and requires neither a trusted third party nor mutual trust among the users (or providers). Furthermore, our approach integrates well with the existing communication infrastructure. A user's location is known only to his/her location provider (e.g., cell phone operator). Using cryptographic schemes, the user, with the help of location providers, determines whether the k-anonymity property is satisfied in a query area. We start with a simple scenario where the user and location providers are honest-but-curious, and then progressively extend our protocol to deal with scenarios where entities may collude with each other. Moreover, we analyze possible threats and discuss how our proposed approach defends against them.

Research paper thumbnail of Policy Management in Cloud

Concepts, Methodologies, Tools, and Applications

Research paper thumbnail of A Requirements-Driven Trust Framework for Secure Interoperation in Open Environments

Lecture Notes in Computer Science, 2006

A key challenge in emerging multi-domain open environments is the need to establish trust-based, loosely coupled partnerships between previously unknown domains. An efficient trust framework is essential to facilitate trust negotiation based on the service requirements of the partner domains. While several trust mechanisms have been proposed, none address the issue of integrating the trust mechanisms with the process of integrating the access control policies of partner domains to facilitate secure interoperation. In this paper, we propose a requirements-driven trust framework for secure interoperation in open environments. Our framework tightly integrates game-theory-based trust negotiation with service negotiation and policy mapping to ensure secure interoperation.

Research paper thumbnail of SecureCloud: Towards a Comprehensive Security Framework for Cloud Computing Environments

2010 IEEE 34th Annual Computer Software and Applications Conference Workshops, 2010

Cloud computing has recently gained tremendous momentum but is still in its infancy. It has the potential for significant cost reduction and increased operating efficiencies in computing. Although security issues are delaying its fast adoption, cloud computing is an unstoppable force, and we need to provide security mechanisms to ensure its secure adoption. In this paper, we propose a comprehensive security framework for cloud computing environments. We also discuss challenges, existing solutions and approaches, and the future work needed to provide a trustworthy cloud computing environment.

Research paper thumbnail of Mobile Cloud Computing and Its Security and Privacy Challenges

Concepts, Methodologies, Tools, and Applications

Mobile cloud computing has grown out of two hot technology trends: mobility and cloud. The emergence of cloud computing and its extension into the mobile domain creates the potential for a global, interconnected mobile cloud computing environment that will allow the entire mobile ecosystem to enrich its services across multiple networks. We can utilize the significant optimization and increased operating power offered by cloud computing to enable seamless and transparent use of cloud resources to extend the capabilities of resource-constrained mobile devices. However, in order to realize mobile cloud computing, we need to develop mechanisms to achieve interoperability among heterogeneous and distributed devices. We need solutions to discover the best available resources in the cloud servers based on user demands, and approaches to deliver desired resources and services efficiently and in a timely fashion to mobile terminals. Furthermore, while mobile cloud computing has tremendous potential to enable mobile terminals to access powerful and reliable computing resources anywhere and anytime, we must consider several issues, including privacy, security, and reliability, in realizing mobile cloud computing. In this chapter, the authors first explore the architectural components required to realize a mobile cloud computing infrastructure. They then discuss mobile cloud computing features with their unique privacy and security implications. They present unique issues of mobile cloud computing that exacerbate privacy and security challenges. They also discuss various approaches to address these challenges and explore the future work needed to provide a trustworthy mobile cloud computing environment.

Research paper thumbnail of Security and Privacy in Location-Based Services

Advanced Location-Based Technologies and Services, 2013

Location-based Services (LBS) have gained popularity as a result of the advances in mobile and communication technologies. LBS provide users with relevant information based on their location. In spite of the desirable features provided by LBS, the geographic locations of users are not adequately protected. Location privacy is one of the major challenges in vehicular and mobile networks. In this article, we analyse the security and privacy requirements for LBS in vehicular and mobile networks. Specifically, this paper covers privacy-enhancing technologies and cryptographic approaches that provide location privacy in vehicular and mobile networks. The different approaches proposed in the literature are compared and open research areas are identified.