Secure Metric-Based Index for Similarity Cloud

Privacy-Preserving Outsourced Similarity Search

Journal of Database Management, 2014

The general trend in data management is to outsource data to third-party systems that provide data retrieval as a service. This approach naturally raises privacy concerns about the (potentially sensitive) data. Recently, extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, little attention has been paid to outsourcing similarity search, which is essential for content-based retrieval of current multimedia, sensor, and scientific data. In this paper, the authors propose a scheme for outsourcing similarity search. They define evaluation criteria for such systems with an emphasis on usability, privacy, and efficiency in real applications. These criteria can serve as a general guideline for practical system analysis, and the authors use them to survey and compare existing approaches. As the main result, the authors propose a novel dynamic similarity index, EM-Index, that works for an arbitrary metric space ...
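The abstract states only that EM-Index works for an arbitrary metric space. To illustrate why a metric index needs nothing beyond a distance function, here is a minimal sketch of generic pivot-based range search, which skips objects using only the triangle inequality. This is not the EM-Index algorithm, and it has no privacy protection. The helper names `build_pivot_table` and `range_search`, the choice of pivots, and the Euclidean example metric are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]

def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms can be used."""
    return math.dist(a, b)

def build_pivot_table(data: Sequence[Point], pivots: Sequence[Point],
                      d: Metric) -> List[List[float]]:
    """Hypothetical helper: precompute d(o, p) for every object o and pivot p."""
    return [[d(o, p) for p in pivots] for o in data]

def range_search(query: Point, radius: float, data: Sequence[Point],
                 pivots: Sequence[Point], table: List[List[float]],
                 d: Metric) -> Tuple[List[int], int]:
    """Return (indices within `radius` of `query`, number of d(query, o) calls)."""
    q_to_p = [d(query, p) for p in pivots]
    hits: List[int] = []
    computed = 0
    for i, o in enumerate(data):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
        lower_bound = max((abs(qp - op) for qp, op in zip(q_to_p, table[i])),
                          default=0.0)
        if lower_bound > radius:
            continue  # pruned without computing d(query, o)
        computed += 1
        if d(query, o) <= radius:
            hits.append(i)
    return hits, computed

# Usage with toy 2-D data (assumed values).
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
pivots = [(0.0, 0.0), (10.0, 10.0)]
table = build_pivot_table(data, pivots, euclidean)
hits, computed = range_search((0.5, 0.0), 1.0, data, pivots, table, euclidean)
print(hits, computed)
```

In this toy run, the two distant objects are pruned by their pivot distances alone, so only two of the four query-to-object distances are evaluated.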

Efficient Similarity Search over Encrypted Data

In recent years, because of the appealing features of cloud computing, large amounts of data have been stored in the cloud. Although cloud-based services offer many advantages, the privacy and security of sensitive data are a major concern. To mitigate these concerns, it is desirable to outsource sensitive data in encrypted form. Encrypted storage protects the data against illegal access, but it complicates some basic yet important functionality, such as searching the data. To support search over encrypted data without compromising privacy, a considerable number of searchable encryption schemes have been proposed in the literature. However, almost all of them handle exact query matching but not similarity matching, a crucial requirement for real-world applications. Although some sophisticated cryptographic techniques based on secure multi-party computation are available for similarity tests, they are computationally intensive and do not scale to large data sources.

Efficiency and security in similarity cloud services

Proceedings of the VLDB Endowment, 2013

With the growing popularity of cloud services, the trend in industry is to outsource data to a third-party system that provides search over the data as a service. This approach naturally raises privacy concerns about the (potentially sensitive) data. Recently, extensive research has been done on outsourcing classic exact-match or keyword search. However, little attention has been paid to outsourcing similarity search, which is becoming increasingly important in information retrieval applications. In this work, we propose to the research community a model for outsourcing similarity search to the cloud environment (a so-called similarity cloud). We establish the privacy and efficiency requirements that a similarity cloud should meet, with an emphasis on practical use of the system in real applications; this list of requirements can be used as a general guideline for practical system analysis, and we use it to analyze existing approaches. We propose two new similarit...

Secure Metadata based Search over Encrypted Cloud data supporting Similarity Ranking

Cloud storage has become increasingly popular because it provides various benefits over traditional storage solutions. However, allowing a cloud service provider (CSP) to take custody of sensitive data raises security and privacy issues, which makes data encryption inevitable. Although encryption strengthens data security, it degrades data efficiency because it reduces searchability. A major challenge for researchers, especially in the cloud computing environment, is to store sensitive data in hidden form while still providing efficient retrieval of the outsourced encrypted data. In this paper we present a scheme for secure Search over Encrypted Cloud Data (SECD) that improves data discovery and the user's search experience by searching metadata instead of the original outsourced data. It supports multi-keyword queries and extends keywords to include synonyms and related terms by incorporating the concept of a domain dictionary. The necessary keyword preprocessing is done at the user end to reduce overhead and processing time at the cloud end. By incorporating an XML-based index file, we substantially reduced index storage requirements and search time. The security analysis shows that data security is improved by searching the cloud with trapdoors instead of keywords. The performance analysis of the proposed scheme on the dataset shows that the time taken to update the index file over the outsourced encrypted cloud data is reduced quite significantly.
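The abstract describes three ideas: user-side keyword preprocessing, synonym expansion through a domain dictionary, and matching on trapdoors rather than plaintext keywords. The sketch below, under stated assumptions, shows how these fit together. The trapdoor here is an HMAC-SHA256 token, an assumption rather than SECD's actual construction. The index is an in-memory dictionary instead of the paper's XML file, and `DOMAIN_DICTIONARY`, `expand`, `make_trapdoor`, `build_index` and `search` are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Set

# Hypothetical domain dictionary: keyword -> synonyms and related terms.
DOMAIN_DICTIONARY: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}

def expand(keywords: Iterable[str]) -> Set[str]:
    """User-side preprocessing: normalize keywords and add dictionary synonyms."""
    expanded: Set[str] = set()
    for kw in keywords:
        kw = kw.strip().lower()
        expanded.add(kw)
        expanded.update(DOMAIN_DICTIONARY.get(kw, []))
    return expanded

def make_trapdoor(key: bytes, keyword: str) -> str:
    """Deterministic keyed token; the server sees the token, not the keyword."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, metadata: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Data-owner side: map each keyword trapdoor to the ids of matching documents."""
    index: Dict[str, Set[str]] = {}
    for doc_id, words in metadata.items():
        for w in words:
            index.setdefault(make_trapdoor(key, w.lower()), set()).add(doc_id)
    return index

def search(index: Dict[str, Set[str]], trapdoors: Iterable[str]) -> Dict[str, int]:
    """Server side: count how many query trapdoors each document matches."""
    scores: Dict[str, int] = {}
    for t in trapdoors:
        for doc_id in index.get(t, set()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return scores

# Usage with assumed metadata and key.
key = b"secret-key-shared-by-owner-and-users"
meta = {"d1": ["automobile", "insurance"], "d2": ["physician", "clinic"],
        "d3": ["car", "vehicle"]}
idx = build_index(key, meta)
trapdoors = [make_trapdoor(key, w) for w in expand(["Car"])]
scores = search(idx, trapdoors)
print(scores)
```

Because the tokens are deterministic, the server can still tell when the same keyword is queried twice (search-pattern leakage). Schemes with stronger threat models typically add randomization to avoid this.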

[Formula: see text]: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain

PloS one, 2017

Public cloud storage services are becoming prevalent, and a myriad of data sharing, archiving, and collaborative services have emerged that harness the pay-as-you-go business model of the public cloud. To ensure privacy and confidentiality, data is often encrypted before it is outsourced to such services, which further complicates accessing relevant data through search queries. Search over encrypted data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and the search criteria. The few schemes that extend exact matching to similarity-based search lack realism because they rely on trusted third parties or incur increased storage and computational complexity. In this paper we propose Oblivious Similarity based Search ([Formula: see text]) for encrypted data. It enables authorized users to model their own e...

Privacy-Preserving Multi-keyword Top-k Ranked Similarity Search Over Encrypted Data

International Journal of Advance Research in Computer Science and Management Studies [IJARCSMS] ijarcsms.com, 2020

Cloud computing makes it possible to store and manage data remotely. The volume of information grows every day, and owners choose to store sensitive data in cloud storage. To protect the data from unauthorized access, it must be uploaded in encrypted form. A large amount of information is stored in the cloud, and the relationships between documents are hidden once the documents are encrypted. A search technique is therefore needed that returns results based on the similarity values of the encrypted documents. In this paper, a cosine similarity clustering method is proposed that groups similar documents into clusters based on the cosine values of the document vectors. We also propose an MRSE-CSI model for searching documents in encrypted form. The proposed search technique searches only the cluster of documents with the highest similarity value instead of the whole dataset. Running the two algorithms on the dataset shows that the proposed method needs less time to form the clusters. As the number of documents in the dataset increases, the time needed to form clusters also increases. The search results likewise show that the search time of the proposed method grows with the number of documents.
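Cluster-then-search can be sketched on plaintext vectors as follows. Documents are grouped by cosine similarity, and a query is ranked only against the members of the cluster whose centroid is most similar to it. The abstract does not specify the clustering algorithm. The threshold-based "leader" clustering used here is an assumption, as are the names `cluster_documents` and `search_best_cluster` and the `threshold` parameter. The encryption layer of MRSE-CSI is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_documents(docs: List[Vector], threshold: float) -> List[List[int]]:
    """Leader clustering (assumed): a document joins the first cluster whose
    first member has cosine >= threshold with it, otherwise it starts a new one."""
    clusters: List[List[int]] = []
    for i, d in enumerate(docs):
        for members in clusters:
            if cosine(docs[members[0]], d) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def centroid(docs: List[Vector], members: List[int]) -> List[float]:
    """Component-wise mean of the member vectors."""
    dim = len(docs[members[0]])
    return [sum(docs[m][k] for m in members) / len(members) for k in range(dim)]

def search_best_cluster(query: Vector, docs: List[Vector],
                        clusters: List[List[int]], k: int) -> List[Tuple[int, float]]:
    """Pick the cluster most similar to the query, then rank only its members."""
    best = max(clusters, key=lambda c: cosine(centroid(docs, c), query))
    ranked = sorted(((m, cosine(docs[m], query)) for m in best),
                    key=lambda t: t[1], reverse=True)
    return ranked[:k]

# Usage with assumed term-weight vectors.
docs = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.9, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 1.0, 0.8]]
clusters = cluster_documents(docs, threshold=0.8)
result = search_best_cluster([0.0, 0.0, 1.0, 1.0], docs, clusters, k=1)
print(clusters, result)
```

Restricting the ranking to one cluster is what saves work compared with scanning the whole dataset. The trade-off is that relevant documents placed in a different cluster are missed.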

Privacy-Preserving Multi-keyword Ranked Search over Encrypted Cloud Data

With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to commercial public cloud for great flexibility and economic savings. But for protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. Thus, enabling an encrypted cloud data search service is of paramount importance. Considering the large number of data users and documents in cloud, it is crucial for the search service to allow multi-keyword query and provide result similarity ranking to meet the effective data retrieval need. Related works on searchable encryption focus on single keyword search or Boolean keyword search, and rarely differentiate the search results. In this paper, for the first time, we define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted cloud data (MRSE), and establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. Among various multi-keyword semantics, we choose the efficient principle of "coordinate matching", i.e., as many matches as possible, to capture the similarity between search query and data documents, and further use "inner product similarity" to quantitatively formalize such principle for similarity measurement. We first propose a basic MRSE scheme using secure inner product computation, and then significantly improve it to meet different privacy requirements in two levels of threat models. Thorough analysis investigating privacy and efficiency guarantees of proposed schemes is given, and experiments on the real-world dataset further show proposed schemes indeed introduce low overhead on computation and communication.
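"Coordinate matching" with inner-product similarity means encoding documents and queries as binary vectors over a keyword dictionary, so that their inner product equals the number of shared keywords. The sketch below shows that encoding, followed by the core property that a secure inner-product computation relies on. If the index vector is transformed as $M^{T}p$ and the query as $M^{-1}q$ with a secret invertible matrix $M$, then $(M^{T}p)^{T}(M^{-1}q) = p^{T}MM^{-1}q = p^{T}q$. This single-matrix version is an assumed simplification, not the paper's scheme: a full scheme adds randomization (such as vector splitting and random padding) that is omitted here, so this sketch alone is not secure. All function names are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Optional, Sequence, Set, Tuple

Matrix = List[List[Fraction]]
Vec = List[Fraction]

def to_binary_vector(keywords: Set[str], dictionary: Sequence[str]) -> Vec:
    """Coordinate matching: entry is 1 if the dictionary keyword is present."""
    return [Fraction(1 if w in keywords else 0) for w in dictionary]

def invert(m: Matrix) -> Optional[Matrix]:
    """Gauss-Jordan inversion with exact fractions; None if m is singular."""
    n = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_invertible(n: int, rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Draw random integer matrices until one is invertible; return (M, M^-1)."""
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
        inv = invert(m)
        if inv is not None:
            return m, inv

def mat_vec(m: Matrix, v: Vec) -> Vec:
    return [sum((a * b for a, b in zip(row, v)), Fraction(0)) for row in m]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def dot(a: Vec, b: Vec) -> Fraction:
    return sum((x * y for x, y in zip(a, b)), Fraction(0))

# Usage with an assumed five-word dictionary.
dictionary = ["cloud", "privacy", "search", "index", "metric"]
doc = to_binary_vector({"cloud", "search", "index"}, dictionary)
query = to_binary_vector({"search", "index", "metric"}, dictionary)
M, M_inv = random_invertible(len(dictionary), random.Random(7))
enc_doc = mat_vec(transpose(M), doc)   # stored on the server
enc_query = mat_vec(M_inv, query)      # sent as the trapdoor
print(dot(enc_doc, enc_query), dot(doc, query))
```

Exact `Fraction` arithmetic is used so that the preserved inner product can be checked for equality without floating-point error.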

A Unified Framework for Secure Search Over Encrypted Cloud Data

IACR Cryptol. ePrint Arch., 2017

This paper presents a unified framework that supports different types of privacy-preserving search queries over encrypted cloud data. In the framework, users can perform multi-keyword search, range search, and k-nearest neighbor search in a privacy-preserving manner. All three query types are transformed into predicate-based search by leveraging bucketization, locality-sensitive hashing, and homomorphic encryption techniques. The proposed framework is implemented using Hadoop MapReduce, and its efficiency and accuracy are evaluated on publicly available real data sets. The implementation results show that the framework can be used effectively on moderate-sized data sets and scales to much larger data sets provided that the number of computers in the Hadoop cluster is increased. To the best of our knowledge, the proposed framework is the first privacy-preserving solution in which three different types of search queries are effectively applie...
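The abstract says range queries are turned into predicate-based search through bucketization. A minimal sketch of that idea, under stated assumptions, follows. Values are assigned to fixed-width buckets, and each bucket gets an opaque keyed tag. A range query then becomes the predicate "tag is in this set", and the client discards false positives after decryption. The fixed `BUCKET_WIDTH`, the HMAC-SHA256 tags and all function names are hypothetical choices, not the framework's actual design, which also uses locality-sensitive hashing and homomorphic encryption (not shown).

```python
import hashlib
import hmac
from typing import Dict, List, Set

BUCKET_WIDTH = 10  # hypothetical fixed bucket width chosen by the data owner

def bucket_of(value: int) -> int:
    """Map a value to its bucket number."""
    return value // BUCKET_WIDTH

def tag(key: bytes, bucket: int) -> str:
    """Keyed bucket tag, so the server sees only opaque tokens."""
    return hmac.new(key, str(bucket).encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, records: Dict[str, int]) -> Dict[str, List[str]]:
    """Data-owner side: group record ids by the tag of their value's bucket."""
    index: Dict[str, List[str]] = {}
    for rid, value in records.items():
        index.setdefault(tag(key, bucket_of(value)), []).append(rid)
    return index

def range_query_tags(key: bytes, lo: int, hi: int) -> Set[str]:
    """Rewrite the range [lo, hi] as the predicate 'bucket tag in this set'."""
    return {tag(key, b) for b in range(bucket_of(lo), bucket_of(hi) + 1)}

def server_match(index: Dict[str, List[str]], tags: Set[str]) -> Set[str]:
    """Server side: plain set-membership test on tags."""
    return {rid for t in tags for rid in index.get(t, [])}

# Usage with assumed records.
key = b"owner-key"
records = {"r1": 5, "r2": 14, "r3": 27, "r4": 33}
idx = build_index(key, records)
candidates = server_match(idx, range_query_tags(key, 15, 28))
# Client side, after decryption: drop records in boundary buckets outside the range.
results = {rid for rid in candidates if 15 <= records[rid] <= 28}
print(sorted(candidates), sorted(results))
```

Wider buckets reveal less about individual values but return more false positives for the client to filter. The bucket width is the knob that trades privacy against efficiency.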

Secure Multi-Keyword Top K Similarity Search Using Asymmetric Encryption Over Encrypted Cloud Data

2020

Cloud computing provides facilities for both massive data storage and management. A cloud has a large capacity to store data from many users and individuals and offers an easy way to fetch data at any time. Many organizations are now moving their data storage and retrieval to the cloud because it offers complex data management at economical cost along with great flexibility. This raises the issue of security, so data is encrypted before being outsourced to the cloud server, which in turn reduces data utility. Existing systems use symmetric key encryption, which is less secure than asymmetric key encryption. Furthermore, symmetric key encryption supports only a limited number of clusters for accessing the secured data. Hence, to improve query efficiency, data security, and data utility, the scheme named Group Multi-Keyword Top-K Search is...