Cache-Oblivious and Data-Oblivious Sorting and Applications

Cache Complexity of Cache-Oblivious Approaches: A Review and Extension

International Journal of Advanced Computer Science and Applications

The latest direction in cache-aware and cache-efficient algorithms is to use cache-oblivious algorithms based on the cache-oblivious model, which improves on the external-memory model. The cache-oblivious model exploits memory hierarchies without knowing the memory parameters in advance, because algorithms in this model adapt automatically to the actual parameters. As a result, cache-oblivious algorithms are particularly suited to multi-level caches with changing parameters and to environments in which the amount of memory available to an algorithm can fluctuate. This paper surveys the state of the art in cache-oblivious algorithms and data structures, each with its complexity in terms of cache misses, called its cache complexity. Additionally, this paper introduces an extension that minimizes the cache complexity of neural networks by applying an appropriate cache-oblivious approach to them.
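To make the idea of a parameter-free algorithm concrete, here is a minimal sketch of a recursive divide-and-conquer matrix transpose, a standard textbook example of the cache-oblivious style. It is an illustration and is not taken from the paper above. The function names and the `base` cutoff are assumptions; `base` only limits Python recursion overhead and is not a cache parameter.

```python
def transpose_rec(a, b, r0, r1, c0, c1, base=8):
    """Write the transpose of the block a[r0:r1][c0:c1] into b."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer side in half; no block size or cache size is consulted.
        mid = r0 + rows // 2
        transpose_rec(a, b, r0, mid, c0, c1, base)
        transpose_rec(a, b, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        transpose_rec(a, b, r0, r1, c0, mid, base)
        transpose_rec(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the transpose of a non-empty rectangular list-of-lists matrix."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    transpose_rec(a, b, 0, n, 0, m)
    return b
```

Because the recursion keeps halving the larger dimension, sub-blocks eventually fit in every level of the hierarchy, whatever its sizes are. Python lists do not give control over memory layout, so this sketch shows only the recursion pattern.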

Algorithms and Data Structures: Efficient and Cache-Oblivious

Computer architectures include a memory hierarchy whose levels range from fast and expensive to slow and cheap. As processor speeds increase, transferring data between memory levels takes longer than processing it. To alleviate this, cache-oblivious algorithms and data structures have been developed. In this paper we discuss various cache-oblivious data structures, such as B-trees and hash tables implementing cache-oblivious hashing, and cache-oblivious algorithms, such as integer multiplication and string sorting, with improvements.
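One building block commonly used for cache-oblivious search trees is the van Emde Boas layout, which stores a complete binary tree so that recursively defined subtrees are contiguous in memory. The sketch below computes that storage order. It is an illustration under assumptions and not the construction discussed in the paper: nodes are named by 1-based breadth-first indices (children of `v` are `2v` and `2v + 1`), and `veb_order` is a hypothetical helper name.

```python
def veb_order(root, height):
    """Return the BFS indices of a complete binary subtree in van Emde Boas order.

    `root` is the BFS index of the subtree root; `height` is its number of levels.
    """
    if height == 1:
        return [root]
    top_h = height // 2        # levels in the top subtree
    bot_h = height - top_h     # levels in each bottom subtree
    order = veb_order(root, top_h)
    # Roots of the bottom subtrees are the descendants of `root` at depth top_h.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order += veb_order(r, bot_h)
    return order
```

For example, `veb_order(1, 4)` returns `[1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]`: the top two levels come first, followed by each height-2 bottom subtree as a contiguous run.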

Resource Oblivious Sorting on Multicores

Lecture Notes in Computer Science, 2010

We present a deterministic sorting algorithm, SPMS (Sample, Partition, and Merge Sort), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts $n$ elements in $O(n \log n)$ time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is $O(\log n \cdot \log\log n)$, which improves on previous bounds for optimal cache-oblivious sorting. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and $p$ cores, each having a cache of size $M$ organized in blocks of size $B$, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of $O(S \cdot M/B)$ and $O(S \cdot B)$ cache misses respectively, where $S$ is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance, and not within the algorithm itself.

Exponential Structures for Efficient Cache-Oblivious Algorithms

Lecture Notes in Computer Science, 2002

We present cache-oblivious data structures based upon exponential structures. These data structures perform well on a hierarchical memory but do not depend on any parameters of the hierarchy, including the block sizes and number of blocks at each level. The problems we consider are searching, partial persistence and planar point location. On a hierarchical memory where data is transferred in blocks of size $B$, some of the results we achieve are:

- We give a linear-space data structure for dynamic searching that supports searches and updates in optimal $O(\log_B N)$ worst-case I/Os, eliminating amortization from the result of Bender, Demaine, and Farach-Colton (FOCS '00). We also consider finger searches and updates and batched searches.
- We support partially-persistent operations on an ordered set, namely, we allow searches in any previous version of the set and updates to the latest version of the set (an update creates a new version of the set). All operations take an optimal $O(\log_B (m + N))$ amortized I/Os, where $N$ is the size of the version being searched/updated, and $m$ is the number of versions.
- We solve the planar point location problem in linear space, taking optimal $O(\log_B N)$ I/Os for point location queries, where $N$ is the number of line segments specifying the partition of the plane. The pre-processing requires $O((N/B) \log_{M/B} N)$ I/Os, where $M$ is the size of the 'inner' memory.

Oblivious Network RAM and Leveraging Parallelism to Achieve Obliviousness

Lecture Notes in Computer Science, 2015

Oblivious RAM (ORAM) is a cryptographic primitive that allows a trusted CPU to securely access untrusted memory, such that the access patterns reveal nothing about sensitive data. ORAM is known to have broad applications in secure processor design and secure multiparty computation for big data. Unfortunately, due to a logarithmic lower bound by Goldreich and Ostrovsky (J ACM 43(3):431-473, 1996), ORAM is bound to incur a moderate cost in practice. In particular, with the latest developments in ORAM constructions, we are quickly approaching this limit, and the room for performance improvement is small. In this paper, we consider new models of computation in which the cost of obliviousness can be fundamentally reduced in comparison with the standard ORAM model. We propose the oblivious network RAM model of computation, where a CPU communicates with multiple memory banks, such that the adversary observes only which bank the CPU is communicating with, but not the address offset within each memory bank. In other words, obliviousness within each bank comes for free (either because the architecture prevents a malicious party from observing the address accessed within a bank, or because another solution is used to obfuscate memory accesses within each bank), and hence we only need to obfuscate communication patterns between the CPU and the memory banks. We present new constructions for obliviously simulating general or parallel programs in the network RAM model. We describe applications of our new model in distributed storage applications with a network adversary.
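To illustrate what "access patterns reveal nothing" means, the following sketch shows the trivial linear-scan baseline: every read touches every cell, so the sequence of addresses is the same for any secret index. This is not the construction from the paper, and its cost is linear in the memory size per access. The function name and the optional `trace` list (used here to record the observable addresses) are assumptions, and Python itself gives no timing guarantees.

```python
def oblivious_read(memory, secret_index, trace=None):
    """Read memory[secret_index] while touching every cell in a fixed order.

    `memory` is a list of ints. If `trace` is a list, the visited addresses
    are appended to it, modelling what an observer of the memory bus sees.
    """
    result = 0
    for i, cell in enumerate(memory):
        if trace is not None:
            trace.append(i)
        # mask is -1 (all ones) on the matching cell and 0 elsewhere.
        mask = -int(i == secret_index)
        # Branch-free select: result becomes `cell` when mask is -1, else unchanged.
        result ^= (result ^ cell) & mask
    return result
```

Reading index 0 and reading index 2 from the same memory produce identical traces, which is the property that practical ORAM constructions aim to achieve at much lower cost.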

The Pyramid Scheme: Oblivious RAM for Trusted Processors

ArXiv, 2017

Modern processors, e.g., Intel SGX, allow applications to isolate secret code and data in encrypted memory regions called enclaves. While encryption effectively hides the contents of memory, the sequence of address references issued by the secret code leaks information. This is a serious problem because these leaks can easily break the confidentiality guarantees of enclaves. In this paper, we explore Oblivious RAM (ORAM) designs that prevent these information leaks under the constraints of modern SGX processors. Most ORAMs are a poor fit for these processors because they have high constant overhead factors or require large private memories, which are not available in these processors. We address these limitations with a new hierarchical ORAM construction, the Pyramid ORAM, that is optimized towards online bandwidth cost and small blocks. It uses a new hashing scheme that circumvents the complexity of previous hierarchical schemes. We present an efficient x64-optimized implementation...

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

Theory of Computing Systems, 2009

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (Pn)^{1/3} n^2)$ cache misses when executed by the Cilk scheduler on a machine with $P$ processors, each with a cache of size $Z$, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring $O(n^2/Z + n + \sqrt{P n^{3+\epsilon}})$ cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring $O(n^2/Z + \sqrt{P n^{3.58}})$ cache misses with high probability.
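The recursion pattern behind cache-oblivious matrix multiplication can be sketched as follows. This is a sequential illustration only: the multithreaded version would spawn the independent recursive calls in parallel under a scheduler such as Cilk, which is not modelled here. The function names and the `base` cutoff (a recursion-overhead limit, not a cache parameter) are assumptions.

```python
def matmul_rec(a, b, c, i0, i1, j0, j1, k0, k1, base=16):
    """Accumulate a[i0:i1, k0:k1] times b[k0:k1, j0:j1] into c[i0:i1, j0:j1]."""
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if max(di, dj, dk) <= base:
        for i in range(i0, i1):
            for k in range(k0, k1):
                aik = a[i][k]
                for j in range(j0, j1):
                    c[i][j] += aik * b[k][j]
    elif di >= dj and di >= dk:
        # Split the rows of a and c; the two halves write disjoint parts of c.
        m = i0 + di // 2
        matmul_rec(a, b, c, i0, m, j0, j1, k0, k1, base)
        matmul_rec(a, b, c, m, i1, j0, j1, k0, k1, base)
    elif dj >= dk:
        # Split the columns of b and c; again the halves are independent.
        m = j0 + dj // 2
        matmul_rec(a, b, c, i0, i1, j0, m, k0, k1, base)
        matmul_rec(a, b, c, i0, i1, m, j1, k0, k1, base)
    else:
        # Split the shared dimension; both halves accumulate into the same c block.
        m = k0 + dk // 2
        matmul_rec(a, b, c, i0, i1, j0, j1, k0, m, base)
        matmul_rec(a, b, c, i0, i1, j0, j1, m, k1, base)


def cache_oblivious_matmul(a, b):
    """Multiply two non-empty list-of-lists matrices with compatible shapes."""
    n, p, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    matmul_rec(a, b, c, 0, n, 0, m, 0, p)
    return c
```

Always halving the largest of the three dimensions keeps sub-problems close to square, so they fit in cache at some recursion depth without the code ever referring to $Z$.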

An Evaluation Framework for Fastest Oblivious RAM

2018

Oblivious RAM (ORAM) is a provably secure approach to hiding memory access patterns. However, since ORAM incurs high computational overhead due to repeated shuffling of data blocks in memory, numerous constructions have been proposed to reduce it. While these constructions have lowered the computational cost compared with early ones, it remains expensive from a practical point of view. Specifically, in applications to IoT devices, a lower computational cost is needed to avoid high energy consumption. We thus focus on an ORAM construction proposed by Nakano et al. in 2012, which we call the fastest ORAM. The computational cost of this construction is much lower than that of other conventional ORAM constructions. However, its security has not been analyzed sufficiently, due to the lack of practical security definitions. Therefore, we formulate a new security definition for the fastest ORAM on the basis of average min-entropy, and propose a framework for evaluating its security.

Data Oblivious ISA Extensions for Side Channel-Resistant and High Performance Computing

Proceedings 2019 Network and Distributed System Security Symposium, 2019

Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is both insecure and slow. First, writing programs obliviously assumes certain instructions in today's ISAs will not leak private data, whereas today's ISAs and hardware provide no such guarantees. Second, writing programs to avoid data-dependent behavior inherently incurs high performance overhead. This paper tackles both the security and performance aspects of this problem by proposing a Data Oblivious ISA extension (OISA). On the security side, we present ISA design principles to block microarchitectural side channels, and embody these ideas in a concrete ISA capable of safely executing existing data oblivious programs. On the performance side, we design the OISA with support for efficient memory oblivious computation, and with safety features that allow modern hardware optimizations, e.g., out-of-order speculative execution, to remain enabled in the common case. We provide a complete hardware prototype of our ideas, built on top of the RISC-V out-of-order, speculative BOOM processor, and prove that the OISA can provide the advertised security through a formal analysis of an abstract BOOM-style machine. We evaluate the area overhead of the hardware mechanisms needed to support our prototype, and provide performance experiments showing how the OISA speeds up a variety of existing data oblivious codes (including "constant time" cryptography and memory oblivious data structures), in addition to improving their security and portability.
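As a small illustration of the data-oblivious programming style described above, the sketch below sorts integers with an odd-even transposition network built from a branch-free compare-exchange. The sequence of positions touched depends only on the input length, never on the values. This is a generic example and not code from the paper; the function names are assumptions, and Python offers no constant-time guarantee at the instruction level, which is exactly the gap the abstract says today's ISAs leave open.

```python
def compare_exchange(x, y):
    """Return (min, max) of two ints without a data-dependent branch."""
    # mask is -1 (all ones) when the pair is out of order, else 0.
    mask = -int(x > y)
    diff = (x ^ y) & mask
    # XOR with diff swaps the values when mask is -1 and leaves them when mask is 0.
    return x ^ diff, y ^ diff


def oblivious_sort(values):
    """Sort a sequence of ints with a fixed, input-independent comparison schedule."""
    a = list(values)
    n = len(a)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            a[i], a[i + 1] = compare_exchange(a[i], a[i + 1])
    return a
```

This network performs on the order of $n^2$ compare-exchanges; a bitonic sorting network keeps the same input-independent structure while reducing that to $O(n \log^2 n)$.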

Oblivious Network RAM

IACR Cryptology ePrint Archive, 2015

Oblivious RAM (ORAM) is a cryptographic primitive that allows a trusted CPU to securely access untrusted memory, such that the access patterns reveal nothing about sensitive data. ORAM is known to have broad applications in secure processor design and secure multi-party computation for big data. Unfortunately, due to a logarithmic lower bound by Goldreich and Ostrovsky (Journal of the ACM, '96), ORAM is bound to incur a moderate cost in practice. In particular, with the latest developments in ORAM constructions, we are quickly approaching this limit, and the room for performance improvement is small. In this paper, we consider new models of computation in which the cost of obliviousness can be fundamentally reduced in comparison with the standard ORAM model. We propose the Oblivious Network RAM model of computation, where a CPU communicates with multiple memory banks, such that the adversary observes only which bank the CPU is communicating with, but not the address offset within each memory bank. In other words, obliviousness within each bank comes for free (either because the architecture prevents a malicious party from observing the address accessed within a bank, or because another solution is used to obfuscate memory accesses within each bank), and hence we only need to obfuscate communication patterns between the CPU and the memory banks. We present new constructions for obliviously simulating general or parallel programs in the Network RAM model. We describe applications of