Shantanu Sharma - Profile on Academia.edu

Papers by Shantanu Sharma

arXiv, 2020

arXiv, 2015

MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing have become the standard venue for computing. The federation of cloud and big-data activities is the next challenge, where MapReduce should be modified to avoid (big) data migration across remote (cloud) sites. This is precisely our scope of research: only the data essential for obtaining the result is transmitted, reducing communication and processing, and preserving data privacy as much as possible. In this work, we propose an algorithmic technique for MapReduce algorithms, called Meta-MapReduce, that decreases the communication cost by processing and moving metadata to clouds and from the map phase to the reduce phase. In Meta-MapReduce, the reduce phase fetches only the required data at the required iterations, which, in turn, helps preserve data privacy.
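The metadata-first idea can be illustrated with a minimal sketch of a two-way equijoin. Reducers first receive only per-tuple metadata (here just the join key, costed at a hypothetical `META_SIZE` unit per tuple), decide which keys can produce output, and then fetch full payloads only for those keys. The cost model (payload length as communication cost) and the function names are assumptions for illustration, not the paper's exact algorithm.

```python
from collections import defaultdict

META_SIZE = 1  # hypothetical cost of shipping one tuple's metadata (key + id)

def standard_cost(x, y):
    """Plain MapReduce join: every full tuple is shipped to the reducers."""
    return sum(len(p) for _, p in x) + sum(len(p) for _, p in y)

def meta_join(x, y):
    """Metadata-first join of x(key, payload) and y(key, payload).
    Returns (joined tuples, communication cost)."""
    # Phase 1: reducers receive only metadata for every tuple.
    cost = META_SIZE * (len(x) + len(y))
    needed = {k for k, _ in x} & {k for k, _ in y}  # keys that can yield output
    # Phase 2: reducers fetch full payloads only for joining keys.
    fetched_x = [(k, p) for k, p in x if k in needed]
    fetched_y = defaultdict(list)
    for k, p in y:
        if k in needed:
            fetched_y[k].append(p)
    cost += sum(len(p) for _, p in fetched_x)
    cost += sum(len(p) for ps in fetched_y.values() for p in ps)
    joined = [(k, px, py) for k, px in fetched_x for py in fetched_y[k]]
    return joined, cost
```

With two 100-byte tuples per side and only one matching key, the plain join ships 400 units while the metadata-first join ships 4 units of metadata plus the 200 bytes of the single matching pair. Non-matching payloads never leave their site, which is also where the privacy benefit comes from.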

24th International Conference on Extending Database Technology (EDBT), 2021

This paper proposes a system, called Concealer, that allows sharing time-varying spatial data (e.g., as produced by sensors) in encrypted form with an untrusted third-party service provider, which offers location-based applications (involving aggregation queries over selected regions and time windows) to users. Concealer exploits carefully selected encryption techniques so that it can use the indexes supported by database systems, and combines ways of adding fake tuples, in order to realize an efficient system that protects against output-size-based leakage. Thus, the design of Concealer overcomes two limitations of existing symmetric searchable encryption (SSE) techniques: (i) it avoids the need for specialized data structures that limit the usability and practicality of SSE in large-scale deployments, and (ii) it avoids output-size-based information leakage, which may reveal data distributions. Experimental results validate the efficiency of the proposed algorithms over a spatial time-series dataset (collected from a smart space) and TPC-H datasets, each of 136 million rows, a size to which prior approaches have not scaled.
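The output-size protection can be pictured with a minimal sketch of fake-tuple padding, assuming data is bucketed by (location, time window) and every bucket is padded up to the size of the largest one. The function names and the plain `True`/`False` real-tuple marker are hypothetical. In a real deployment the marker would have to be hidden from the server (e.g., encrypted together with the tuple). Concealer's actual padding strategy is more refined than "pad to the maximum".

```python
def pad_buckets(tuples, keys, bucket_of, make_fake):
    """Group tuples into buckets (e.g., per location and time window) and pad
    every bucket with fake tuples up to the size of the largest bucket."""
    buckets = {k: [] for k in keys}
    for t in tuples:
        buckets[bucket_of(t)].append((t, True))  # True marks a real tuple
    target = max((len(v) for v in buckets.values()), default=0)
    for k, items in buckets.items():
        items.extend((make_fake(k), False) for _ in range(target - len(items)))
    return buckets

def real_count(bucket):
    """Client side, after decryption: count only the real tuples."""
    return sum(1 for _, is_real in bucket if is_real)
```

Because every bucket returns the same number of rows, an observer of query answers cannot tell a busy region from an empty one. The cost is extra storage and transfer for the fake tuples.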

Contact tracing has emerged as one of the main mitigation strategies to prevent the spread of pandemics such as COVID-19. Recently, several efforts have been initiated to track individuals, their movements, and their interactions using technologies such as Bluetooth beacons, cellular data records, and smartphone applications. Such solutions are often intrusive, potentially violating individual privacy rights, and are often subject to regulations (e.g., GDPR and CCPA) that mandate opt-in policies for gathering and using personal information. In this paper, we introduce QUEST, a system that empowers organizations to observe individuals and spaces in order to implement social-distancing and contact-tracing policies using WiFi connectivity data in a passive and privacy-preserving manner. The goal is to ensure the safety of employees and occupants at an organization while protecting the privacy of all parties. QUEST incorporates computationally and information-theoretically secure protocols that prevent adversaries from gaining knowledge of an individual's location history (based on WiFi data); it supports accurately identifying users who were in the vicinity of a confirmed patient and then informing them via opt-in mechanisms. QUEST supports a range of privacy-enabled applications to ensure adherence to social distancing, monitor the flow of people through spaces, identify potentially impacted regions, and raise exposure alerts. We describe the architecture, design choices, and implementation of the proposed security and privacy techniques in QUEST. We also validate the practicality of QUEST and evaluate it thoroughly via an actual campus-scale deployment at UC Irvine over a dataset of over 50M tuples.
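The underlying "who was near a confirmed patient" question over WiFi connectivity data can be written as a plaintext sketch. It assumes each log entry records a device, the access point it connected to, and a connection interval. QUEST answers this query with secure protocols that are not modeled here. `Session`, `exposed_devices`, and the `min_overlap` threshold are hypothetical names.

```python
from typing import NamedTuple

class Session(NamedTuple):
    device: str
    ap: str     # WiFi access point the device was associated with
    start: int  # connection start (e.g., minutes since midnight)
    end: int    # connection end

def exposed_devices(sessions, patient, min_overlap=1):
    """Plaintext co-location query: devices that shared an access point with
    `patient` for at least `min_overlap` time units."""
    patient_sessions = [s for s in sessions if s.device == patient]
    exposed = set()
    for s in sessions:
        if s.device == patient:
            continue
        for p in patient_sessions:
            overlap = min(s.end, p.end) - max(s.start, p.start)
            if s.ap == p.ap and overlap >= min_overlap:
                exposed.add(s.device)
                break
    return exposed
```

For example, a device associated with the same access point from minute 20 to 40, while the patient was there from minute 0 to 30, overlaps for 10 minutes and is reported as exposed.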

ACM Transactions on Management Information Systems, 2020

Despite extensive research on cryptography, secure and efficient query processing over outsourced data remains an open challenge. This paper continues the emerging trend in secure data processing that recognizes that the entire dataset may not be sensitive, and hence, the non-sensitivity of data can be exploited to overcome limitations of existing encryption-based approaches. We first provide a new security definition, called partitioned data security, which guarantees that the joint processing of non-sensitive data (in cleartext) and sensitive data (in encrypted form) does not lead to any leakage. Then, this paper proposes a new secure approach, called query binning (QB), that allows secure execution of queries over the non-sensitive and sensitive parts of the data. QB maps a query to a set of queries over the sensitive and non-sensitive data such that no leakage occurs due to their joint processing. In particular, we propose secure algorithms for selection, range, and join queries executed over encrypted sensitive and cleartext non-sensitive datasets. Interestingly, in addition to improving performance, we show that QB actually strengthens the security of the underlying cryptographic technique by preventing size, frequency-count, and workload-skew attacks.
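The binning idea behind QB can be illustrated with a simplified grid sketch. It assumes the same attribute values occur in both the sensitive and the non-sensitive partitions, and that the values are laid out row-major in a grid with ⌈√n⌉ columns. Rows act as non-sensitive bins (queried in cleartext) and columns act as sensitive bins (queried over encrypted data). A selection on one value then fetches one whole row and one whole column, and the answer lies at their intersection. The function names are hypothetical, and the paper's construction has additional conditions on how bins are formed.

```python
import math

def make_bins(values):
    """Lay out values row-major in a grid with ceil(sqrt(n)) columns.
    Rows are non-sensitive bins; columns are sensitive bins."""
    width = math.ceil(math.sqrt(len(values)))
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    cols = [values[j::width] for j in range(width)]
    return rows, cols, width

def bins_for(value, values):
    """Return the (non-sensitive bin, sensitive bin) a query for `value` fetches."""
    rows, cols, width = make_bins(values)
    r, c = divmod(values.index(value), width)
    # Fetch the whole cleartext row and the whole encrypted column; the
    # requested value is the only one that lies in both.
    return rows[r], cols[c]
```

For values `0..8`, a query for `4` fetches the non-sensitive bin `[3, 4, 5]` and the sensitive bin `[1, 4, 7]`. An observer sees which bins were touched but not which of the roughly √n values in each bin was actually requested.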

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020

Despite exciting progress on cryptography, secure and efficient query processing over outsourced data remains an open challenge. We develop a communication-efficient and information-theoretically secure system, called Obscure, for aggregation queries with conjunctive or disjunctive predicates, using secret-sharing. Obscure is strongly secure (i.e., secure regardless of the computational capabilities of an adversary) and prevents the network, as well as the (adversarial) servers, from learning the user's queries, results, or the database. Obscure also provides further security features, such as hiding access patterns (i.e., hiding the identity of the tuples satisfying a query) and hiding query patterns (i.e., hiding whether two queries are identical). Moreover, Obscure does not require any communication between any two servers that store the secret-shared data before, during, or after query execution. Our techniques handle secret-shared data outsourced by a single or multiple database owners, and allow a user, who may not be the database owner, to execute queries over the secret-shared data. We further develop (non-mandatory) privacy-preserving result verification algorithms that detect malicious behaviors, and experimentally validate the efficiency of Obscure on large datasets, the size of which prior approaches of secret-sharing or multi-party computation systems have not scaled to.
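A conjunctive count query over secret-shared data can be sketched as follows. Each attribute value and each query term is one-hot encoded, and every bit is Shamir-shared with a degree-1 polynomial. Each server computes, locally and without talking to the others, the dot products for both predicates, multiplies them, and sums over all tuples. The result is a share on a degree-4 polynomial, so five servers suffice to reconstruct the count. This is a simplified illustration of the general secret-sharing approach, not Obscure's exact encoding or protocol. The modulus `P` and all function names are assumptions.

```python
import random

P = 2_147_483_647  # prime field modulus (hypothetical choice)
N = 5              # servers: degree-1 shares, product of two dot products has degree 4

def share(secret):
    """Shamir-share `secret` with a random degree-1 polynomial f(x) = secret + a*x."""
    a = random.randrange(P)
    return [(secret + a * x) % P for x in range(1, N + 1)]

def reconstruct(ys):
    """Lagrange interpolation at x = 0 from the values at x = 1..N."""
    total = 0
    for i, yi in enumerate(ys, start=1):
        num = den = 1
        for j in range(1, N + 1):
            if j != i:
                num = num * -j % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def one_hot(value, domain):
    return [int(v == value) for v in domain]

def share_vector(bits):
    """Share every bit; element k of the result is server k's share vector."""
    per_bit = [share(b) for b in bits]
    return [[s[k] for s in per_bit] for k in range(N)]

def outsource(rows, dom_a, dom_b):
    """Database owner: one-hot encode both attributes of each row and share them once."""
    return [(share_vector(one_hot(a, dom_a)), share_vector(one_hot(b, dom_b)))
            for a, b in rows]

def server_count(k, db, qa, qb):
    """Server k: its share of COUNT(*) WHERE A = qa AND B = qb. Every tuple is
    touched, so the access pattern reveals nothing about which tuples match."""
    total = 0
    for sa, sb in db:
        match_a = sum(x * y for x, y in zip(sa[k], qa[k])) % P  # degree-2 share
        match_b = sum(x * y for x, y in zip(sb[k], qb[k])) % P  # degree-2 share
        total = (total + match_a * match_b) % P                  # degree-4 share
    return total

dom_a, dom_b = ["HR", "IT"], ["NY", "LA"]
db = outsource([("HR", "NY"), ("IT", "NY"), ("HR", "LA"), ("HR", "NY")], dom_a, dom_b)
qa, qb = share_vector(one_hot("HR", dom_a)), share_vector(one_hot("NY", dom_b))
print(reconstruct([server_count(k, db, qa, qb) for k in range(N)]))  # expected: 2
```

The query itself is also secret-shared, so no single server learns which values are being counted. The price of each extra multiplication is a higher polynomial degree and therefore more servers.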

ACM Transactions on Cyber-Physical Systems (TCPS), 2020

This paper focuses on the new privacy challenges that arise in smart homes. Specifically, the paper focuses on inferring the user's activities, which may, in turn, compromise the user's privacy, through device activities and network traffic analysis. We develop techniques based on cryptographically secure token circulation in a ring network of smart home devices to prevent inferences from device activities via the device workflow, i.e., inferences from a coordinated sequence of device actuations. The solution hides device activities and the corresponding channel activities, and thus protects the individual's activities. We also extend our solution to handle a large number of devices, and devices that produce large-sized data, by implementing parallel rings. Our experiments evaluate the communication overheads of the proposed approach and the privacy it achieves.
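The effect of token circulation on what an observer sees can be shown with a minimal sketch. A token visits the devices in fixed ring order, and the holder always emits exactly one message of a fixed size: a real one if it has pending data, a dummy one otherwise. Encryption, padding, and the token's cryptographic security are assumed rather than implemented. `MSG_SIZE` and `circulate` are hypothetical names.

```python
from collections import deque

MSG_SIZE = 64  # hypothetical fixed size (bytes) every encrypted message is padded to

def circulate(devices, pending, hops):
    """Pass a token around a ring of `devices` for `hops` steps.

    `pending` maps a device name to a deque of real payloads waiting to be sent.
    Returns (trace, delivered): what a network observer sees, and the real
    payloads that were actually delivered."""
    trace, delivered = [], []
    for hop in range(hops):
        holder = devices[hop % len(devices)]  # token holder in ring order
        queue = pending.get(holder)
        if queue:
            delivered.append((holder, queue.popleft()))  # send a real message
        # Otherwise the holder sends a dummy message. Either way the observer
        # sees the same sender, in the same order, with the same size.
        trace.append((holder, MSG_SIZE))
    return trace, delivered
```

Two completely different activity patterns (say, only the light switching on versus the lock opening and the camera detecting motion) produce identical traces. The overhead is the dummy traffic, which the paper's parallel rings aim to keep manageable as the number of devices grows.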

Proceedings of the Sixth International Workshop on Security and Privacy Analytics, 2020

10th ACM Conference on Data and Application Security and Privacy (CODASPY), 2020

The growing deployment of Internet of Things (IoT) systems aims to ease the daily life of end-users by providing several value-added services. However, IoT systems may capture and store sensitive, personal data about individuals in the cloud, thereby jeopardizing user privacy. Emerging legislation, such as California's CalOPPA and the GDPR in Europe, provides strong privacy protections for an individual's data in the cloud. One such requirement is strict enforcement of data retention policies. This paper proposes a framework, called IoT Expunge, that allows sensor data providers to store data in cloud platforms that ensure enforcement of retention policies. Additionally, the cloud provider produces verifiable proofs of its adherence to the retention policies. Experimental results on a real-world smart building testbed show that IoT Expunge imposes minimal overhead on the user to verify the data against data retention policies.
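What a retention policy requires of the cloud can be illustrated with a toy audit. The cloud drops records whose age has reached the retention period. The data provider, which keeps only `(record id, timestamp)` metadata, checks that the set of records the cloud reports as retained is exactly the set the policy allows. This is a hypothetical illustration, not IoT Expunge's proof mechanism. A simple set comparison cannot prove that deleted bytes are really gone, which is what the paper's verifiable proofs address.

```python
from typing import NamedTuple

class Record(NamedTuple):
    rec_id: str
    ts: int         # capture time (e.g., seconds)
    payload: bytes

def expunge(store, now, retention):
    """Cloud: keep only records younger than the retention period."""
    return [r for r in store if now - r.ts < retention]

def audit(metadata, reported_ids, now, retention):
    """Provider: using only (rec_id, ts) metadata kept locally, check that the
    cloud reports exactly the records the policy allows it to retain."""
    allowed = {rid for rid, ts in metadata if now - ts < retention}
    return set(reported_ids) == allowed
```

A cloud that still reports an expired record, or that silently dropped a record it should have kept, fails the check.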

10th ACM Conference on Data and Application Security and Privacy, 2020

IEEE Transactions on Dependable and Secure Computing, 2019

Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant, distributed data processing is MapReduce, which was developed for trusted private clouds. This paper presents algorithms for data outsourcing based on Shamir's secret-sharing scheme and for executing privacy-preserving SQL queries such as count, selection (including range selection), projection, and join, using MapReduce as the underlying programming model. Our proposed algorithms prevent an adversary from learning the database or the query, while also preventing output-size and access-pattern attacks. Interestingly, our algorithms do not involve the database owner, which only creates and distributes secret-shares once, in answering any query, and hence, the database owner also cannot learn the query. We evaluate the efficiency of the algorithms, analytically and experimentally, on the following parameters: (i) the number of communication rounds (between a user and a server), (ii) the total amount of bit flow (between a user and a server), and (iii) the computational load at the user and the server.
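A count query can be phrased as a MapReduce job over Shamir shares in a minimal sketch. The owner shares a one-hot (unary) encoding of each value once. The user shares the query in the same way. At each cloud, the mapper emits one share of the "tuple matches" indicator per tuple, and the reducer adds them up. The user then interpolates the three reducer outputs. The modulus, the three-cloud setting, and all function names are assumptions for illustration, not the paper's exact algorithm.

```python
import random

P = 2_147_483_647  # prime field modulus (hypothetical choice)
N = 3              # clouds; one multiplication of degree-1 shares needs 3 points

def share(secret):
    """Shamir share with a random degree-1 polynomial f(x) = secret + a*x."""
    a = random.randrange(P)
    return [(secret + a * x) % P for x in range(1, N + 1)]

def reconstruct(ys):
    """User side: Lagrange interpolation at x = 0 from the clouds' outputs."""
    total = 0
    for i, yi in enumerate(ys, start=1):
        num = den = 1
        for j in range(1, N + 1):
            if j != i:
                num = num * -j % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def one_hot(value, domain):
    return [int(v == value) for v in domain]

def map_phase(k, shared_column, shared_query):
    """Mapper at cloud k: emit the share of 'tuple matches query' for each tuple."""
    for tuple_bits in shared_column:
        yield "count", sum(t[k] * q[k] for t, q in zip(tuple_bits, shared_query)) % P

def reduce_phase(pairs):
    """Reducer at cloud k: add the per-tuple shares into a share of the count."""
    return sum(v for _, v in pairs) % P

domain = ["a", "b", "c"]
column = ["a", "c", "a", "b"]
shared_column = [[share(bit) for bit in one_hot(v, domain)] for v in column]  # owner, once
shared_query = [share(bit) for bit in one_hot("a", domain)]                   # user
outputs = [reduce_phase(map_phase(k, shared_column, shared_query)) for k in range(N)]
print(reconstruct(outputs))  # expected: 2
```

The owner does nothing after the initial sharing, and the query costs one round: the user sends query shares and receives one number per cloud.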

PVLDB, 2019

Despite extensive research on cryptography, secure and efficient query processing over outsourced data remains an open challenge. We develop communication-efficient and information-theoretically secure algorithms for privacy-preserving aggregation queries using multi-party computation (MPC). Specifically, we develop query processing techniques over secret-shared data outsourced by single or multiple database owners. These algorithms allow a user to execute queries on the secret-shared database and also prevent the network and the (adversarial) clouds from learning the user's queries, results, or the database. We further develop (non-mandatory) privacy-preserving result verification algorithms that detect malicious behaviors, and experimentally validate the efficiency of our approach over large datasets, the size of which prior approaches to secret-sharing or MPC systems have not scaled to.

ACM Conference on Data and Application Security and Privacy (CODASPY), 2019

Advances in sensing, networking, and actuation technologies have resulted in the IoT wave that is expected to revolutionize all aspects of modern society. This paper focuses on the new privacy challenges that arise in IoT in the context of smart homes. Specifically, the paper focuses on preventing inferences, drawn from channel and in-home device activities, that compromise the user's privacy. We propose a method for securely scheduling the devices while decoupling device and channel activities. The proposed solution defends against attacks that may reveal the coordinated schedule of the devices, and hence also ensures that device- and channel-level activities do not leak inferences that may compromise an individual's privacy. Our experiments validate the proposed approach, showing that an adversary cannot infer device and channel activities by merely observing the network traffic.

IEEE Transactions on Big Data

Hadoop and Spark are widely used distributed frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many companies, e.g., Google, Facebook, and Amazon, to solve a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems share a major drawback: they are designed for locally distributed computation, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industry and academia to rethink current big-data processing systems. New frameworks, going beyond the architectures and technologies of current systems, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss the challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms, along with their overhead issues.
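The idea of processing data "at their locations" can be illustrated with a minimal sketch of one general pattern: aggregate inside each data center and ship only the partial results to a coordinating site. The site names and data are hypothetical, and this is not a specific framework from the survey.

```python
from collections import Counter

def local_aggregate(records):
    """Runs inside one data center: reduce raw records to per-key partial counts."""
    return Counter(records)

def global_merge(partials):
    """Runs at the coordinating site: combine the partial counts from all sites."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical raw event logs kept in two geographically separate sites.
sites = {"eu-site": ["a", "b", "a"], "us-site": ["a", "c", "c", "c"]}
partials = [local_aggregate(recs) for recs in sites.values()]
result = global_merge(partials)
raw_items = sum(len(recs) for recs in sites.values())  # shipped if raw data moved
partial_items = sum(len(p) for p in partials)          # shipped with local aggregation
print(dict(result), raw_items, partial_items)
```

Here 4 partial entries cross the wide-area network instead of 7 raw records. The gap grows with the ratio of raw records to distinct keys. This only works for operations that decompose into partial results, which is one of the overhead issues such frameworks must address.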

A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers such that, for each required output, there exists a reducer that receives all the inputs that participate in the computation of that output. Reducers have a capacity, which limits the sets of inputs that can be assigned to them. Moreover, individual inputs may vary in size. We consider, for the first time, mapping schemas where input sizes are part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is the communication cost between the map and reduce phases. The communication cost can be optimized by minimizing the number of copies of inputs sent to the reducers. It is closely related to the number of reducers of constrained capacity that are used to accommodate the inputs so that the requirement of how the inputs must meet in a reducer is satisfied. In this work, we consider a family of problems where each input is required to meet every other input in at least one reducer. We also consider a slightly different family of problems in which each input of a list, X, is required to meet each input of another list, Y, in at least one reducer. We prove that finding an optimal mapping schema for these families of problems is NP-hard, and present a bin-packing-based approximation algorithm for finding a near-optimal mapping schema.
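A bin-packing-based mapping schema for the all-to-all case can be sketched as follows, assuming every input size is at most q/2 for reducer capacity q. First-fit decreasing packs the inputs into bins of capacity q/2. Then one reducer is created per pair of bins, so every reducer holds at most q/2 + q/2 = q, and any two inputs meet either inside their shared bin or in the reducer for their two bins. This illustrates the general approach under that assumption, not necessarily the paper's exact algorithm or approximation bound. The function names are hypothetical.

```python
from itertools import combinations

def first_fit_decreasing(sizes, capacity):
    """Pack input indices into bins of the given capacity (first-fit decreasing)."""
    bins, loads = [], []
    for i in sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:  # no existing bin had room: open a new one
            bins.append([i])
            loads.append(sizes[i])
    return bins

def a2a_mapping_schema(sizes, q):
    """Assign inputs to reducers of capacity q so every pair of inputs meets
    in at least one reducer. Assumes each input size is at most q/2."""
    if any(s > q / 2 for s in sizes):
        raise ValueError("this sketch requires every input size <= q/2")
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [bins[0]]
    # One reducer per pair of bins: load <= q/2 + q/2 = q.
    return [b1 + b2 for b1, b2 in combinations(bins, 2)]
```

With m bins, each bin is sent to m - 1 reducers, so the communication cost is driven by how tightly the inputs are packed. This is why the quality of the bin packing matters.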

Data outsourcing allows data owners to keep their data in public clouds, which do not ensure the privacy of data and computations. One fundamental and useful framework for processing data in a distributed fashion is MapReduce. In this paper, we investigate and present techniques for executing MapReduce computations in the public cloud while preserving privacy. Specifically, we propose a technique to outsource a database to public clouds using Shamir's secret-sharing scheme, and then provide privacy-preserving algorithms for performing search and fetch, equijoin, and range queries using MapReduce. Consequently, in our proposed algorithms, the public cloud cannot learn the database or the computations. All the proposed algorithms eliminate the role of the database owner, which only creates and distributes secret-shares once, and minimize the role of the user, which only needs to perform a simple operation to reconstruct the result. We evaluate efficiency by (i) the number of communication rounds (between a user and a cloud), (ii) the total amount of bit flow (between a user and a cloud), and (iii) the computational load at the user side and the cloud side.

MapReduce is a programming system for processing large-scale data in a distributed, efficient, and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is used extensively every day around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and analysis of social networks. Security and privacy of data and MapReduce computations are essential concerns when a MapReduce computation is executed in public or hybrid clouds. In order to execute a MapReduce job in public and hybrid clouds, authentication of mappers and reducers, confidentiality of data and computations, integrity of data and computations, and correctness and freshness of the outputs are required. Satisfying these requirements shields the operation from several types of attacks on data and MapReduce computations. In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities and characteristics, in the scope of MapReduce. We also provide a review of existing security and privacy protocols for MapReduce and discuss their overhead issues.

Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant, distributed data processing is MapReduce, which was developed for trusted private clouds. This paper presents algorithms for data outsourcing based on Shamir's secret-sharing scheme and for executing privacy-preserving SQL queries such as count, selection (including range selection), projection, and join, using MapReduce as the underlying programming model. The proposed algorithms prevent the untrusted cloud from learning the database or the query, while also preventing output-size and access-pattern attacks. Interestingly, our algorithms do not require the database owner, which only creates and distributes secret-shares once, to be involved in answering any query, and hence, the database owner also cannot learn the query. We evaluate the efficiency of the algorithms on the following parameters: (i) the number of communication rounds (between a user and a cloud), (ii) the total amount of bit flow (between a user and a cloud), and (iii) the computational load at the user side and the cloud side.

IEEE Big Data, 2019

Over the last decade, public and private clouds have emerged as the de facto platforms for big-data analytical workloads. Outsourcing one's data to the cloud, however, comes with multiple security and privacy challenges. In a world where service providers can be located anywhere, fall under varying legal jurisdictions (i.e., be subject to different laws governing the privacy and confidentiality of one's data), and be the target of well-funded (sometimes even government-sponsored) security attacks, protecting data in a cloud is far from trivial. This tutorial focuses on two principal lines of research (cryptographic and hardware-based) aimed at providing secure processing of big data in a modern cloud. First, we focus on cryptographic (encryption- and secret-sharing-based) techniques developed over the last two decades and compare them based on efficiency and information leakage. We demonstrate that despite extensive research on cryptography, secure query processing over outsourced data remains an open challenge. We then survey the landscape of emerging secure hardware, i.e., recent hardware extensions such as Intel's Software Guard Extensions (SGX) aimed at securing third-party computations in the cloud. Unfortunately, despite being designed to provide a secure execution environment, existing SGX implementations suffer from a range of side-channel attacks that require careful software techniques to make them practically secure. Taking SGX as an example, we will discuss representative classes of side-channel attacks and the security challenges involved in constructing hardware-based data processing systems. We conclude that neither cryptographic techniques nor secure hardware alone is sufficient. To provide efficient and secure large-scale data processing in the cloud, a new line of work that combines software and hardware mechanisms is required.
We discuss an orthogonal approach designed around the concept of data partitioning, i.e., splitting the data processing into cryptographically secure and non-secure parts. Finally, we will discuss some open questions in designing secure cryptographic techniques that can process large-sized data efficiently.

IEEE International Conference on Data Engineering (ICDE), 2019

Despite extensive research on cryptography, secure and efficient query processing over outsourced data remains an open challenge. This paper continues the emerging trend in secure data processing that recognizes that the entire dataset may not be sensitive, and hence, the non-sensitivity of data can be exploited to overcome limitations of existing encryption-based approaches. We propose a new secure approach, called query binning (QB), that allows non-sensitive parts of the data to be outsourced in cleartext while guaranteeing that no information is leaked by the joint processing of non-sensitive data (in cleartext) and sensitive data (in encrypted form). QB maps a query to a set of queries over the sensitive and non-sensitive data such that no leakage occurs due to their joint processing. Interestingly, in addition to improving performance, we show that QB actually strengthens the security of the underlying cryptographic technique by preventing size, frequency-count, and workload-skew attacks.