A Faster Algorithm for Cuckoo Insertion and Bipartite Matching in Large Graphs
Related papers
On the Insertion Time of Cuckoo Hashing
SIAM Journal on Computing, 2013
Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one of k different hash functions. In this paper we investigate the random walk heuristic for inserting new items into the hash table in an online fashion. Provided that k ≥ 3 and that the number of items in the table is below (but arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic bound for the maximum insertion time that holds with probability 1 − o(1) as the size of the table grows large.
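The random walk heuristic analyzed above is simple to state in code. Below is a minimal sketch of random-walk cuckoo insertion, assuming a dict-backed table and caller-supplied hash functions; the names and the max_steps cutoff are illustrative, not from the paper.

```python
import random

def insert_random_walk(table, key, hash_fns, max_steps=1000):
    """Random-walk cuckoo insertion with k = len(hash_fns) choices.

    `table` maps location -> key. If all k candidate locations of the
    current item are full, a uniformly random occupant is evicted and
    becomes the item to reinsert; the walk repeats until a free slot
    is found or the step budget runs out.
    """
    current = key
    for _ in range(max_steps):
        slots = [h(current) for h in hash_fns]
        for s in slots:
            if s not in table:        # free location: place and stop
                table[s] = current
                return True
        s = random.choice(slots)      # evict a uniformly random occupant
        current, table[s] = table[s], current
    return False                      # step budget exhausted
```

For k ≥ 3 and loads below the threshold, the paper's result says such a walk terminates within polylogarithmically many steps with probability 1 − o(1), so a generous max_steps rarely matters in that regime.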
Sharp load thresholds for cuckoo hashing
2012
The paradigm of many choices has significantly influenced the design of efficient data structures and, most notably, hash tables. Cuckoo hashing is a technique that extends this concept. There, we are given a table with n locations, and we assume that each location can hold one item. Each item to be inserted chooses k ≥ 2 locations at random and has to be placed in one of them. How much load can cuckoo hashing handle before collisions prevent the successful assignment of the available items to the chosen locations? Practical evaluations of this method have shown that one can allocate a number of elements that is a large proportion of the size of the table, very close to 1 even for small values of k such as 4 or 5.
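The load question can be probed experimentally, as the abstract mentions. The sketch below, with illustrative parameters and ad hoc hash choices, fills a table of n cells using random-walk insertions until one fails and reports the load reached; it is a rough simulation, not the analytic threshold the paper derives.

```python
import random

def achievable_load(n=10_000, k=4, max_steps=500):
    """Empirically estimate the cuckoo load threshold: insert random
    keys, each with k pseudo-random candidate locations, until a
    random-walk insertion fails, then report the fraction of the n
    locations filled."""
    table = {}
    items = 0
    while True:
        current = random.getrandbits(64)
        for _ in range(max_steps):
            slots = [hash((current, i)) % n for i in range(k)]
            free = [s for s in slots if s not in table]
            if free:
                table[free[0]] = current
                break
            s = random.choice(slots)          # evict a random occupant
            current, table[s] = table[s], current
        else:
            return items / n   # walk gave up: report load at failure
        items += 1
```

Runs with k = 4 typically stop at loads of roughly 0.97 to 0.98, consistent with the known threshold of about 0.977 for k = 4 and with the "very close to 1" observation above.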
Optimising large hash tables for lookup performance
Proceedings of the IADIS International Conference …, 2008
Hash tables can provide fast mapping between keys and values even for voluminous data sets. Our main goal is to find a suitable implementation with a compact structure and an efficient collision avoidance method. Our attention is focused on maximizing lookup performance when handling several millions of data items. This paper suggests a new memory-consumption-oriented way of comparing the significantly different approaches and analyses various types of hash table implementations in order to answer which structure should be used, and how its parameters must be chosen, to achieve maximal lookup performance with the lowest possible memory consumption.
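The kind of comparison the paper proposes pairs a memory measurement with a lookup benchmark. A minimal sketch of such a measurement harness, using Python's built-in dict as a stand-in for the implementations compared in the paper; names and sizes are illustrative.

```python
import random
import time
import tracemalloc

def benchmark_lookups(n=1_000_000, probes=100_000):
    """Measure peak memory footprint and lookup throughput of one
    hash table implementation (here: Python's built-in dict)."""
    tracemalloc.start()
    table = {random.getrandbits(64): i for i in range(n)}
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    keys = random.sample(list(table), probes)  # keys known to be present
    start = time.perf_counter()
    for k in keys:
        _ = table[k]
    elapsed = time.perf_counter() - start
    print(f"peak memory: {peak / 2**20:.1f} MiB, "
          f"{probes / elapsed / 1e6:.2f} M lookups/s")
```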
HashGraph—Scalable Hash Tables Using a Sparse Graph Data Structure
ACM Transactions on Parallel Computing (TOPC), 2021
In this article, we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations—hence, the name HashGraph. HashGraph introduces a new way to deal with hash collisions that uses neither "open addressing" nor "separate chaining," yet has the benefits of both these approaches. HashGraph currently works for static inputs. Recent progress with dynamic graph data structures suggests that HashGraph might be extendable to dynamic inputs as well. We show that HashGraph can deal with a large number of hash values per entry without loss of performance. Last, we show a new querying algorithm for value lookups. We experimentally compare HashGraph to several state-of-the-art implementations and find that it outperforms them on average by 2× when the inputs are unique and by as much as 40× when the input contains duplicates. The implementation of HashGraph in this article is for NVIDIA GPUs. HashGraph can build a hash table at a r...
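Although the paper's implementation targets NVIDIA GPUs, the core idea resembles building a CSR (compressed sparse row) index over hash values. Below is a minimal CPU-side sketch of that idea for a static key set, with hypothetical names: one counting pass sizes each bin, a prefix sum yields bin offsets, and a scatter pass fills one flat array, so lookups scan a contiguous range with no chains and no probing.

```python
def build_hashgraph(keys, num_bins):
    """CSR-style static hash table: count, prefix-sum, scatter."""
    counts = [0] * num_bins
    for k in keys:
        counts[hash(k) % num_bins] += 1
    offsets = [0] * (num_bins + 1)          # exclusive prefix sums
    for b in range(num_bins):
        offsets[b + 1] = offsets[b] + counts[b]
    slots = [None] * len(keys)
    cursor = offsets[:-1].copy()            # next free index per bin
    for k in keys:
        b = hash(k) % num_bins
        slots[cursor[b]] = k
        cursor[b] += 1
    return offsets, slots

def lookup(offsets, slots, num_bins, key):
    """Scan the key's bin, a contiguous slice of the flat array."""
    b = hash(key) % num_bins
    return any(slots[i] == key for i in range(offsets[b], offsets[b + 1]))
```

The contiguous per-bin layout is what makes duplicate-heavy inputs cheap: all entries sharing a hash value sit next to each other in memory.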
Cuckoo Hashing with Pages
Lecture Notes in Computer Science, 2011
Although cuckoo hashing has significant applications in both theoretical and practical settings, a relevant downside is that it requires lookups to multiple locations. In many settings, where lookups are expensive, cuckoo hashing becomes a less compelling alternative. One such standard setting is when memory is arranged in large pages, and a major cost is the number of page accesses. We propose the study of cuckoo hashing with pages, advocating approaches where each key has several possible locations, or cells, on a single page, and additional choices on a second backup page. We show experimentally that with k cell choices on one page and a single backup cell choice, one can achieve nearly the same loads as when each key has k + 1 random cells to choose from, with most lookups requiring just one page access, even when keys are placed online using a simple algorithm. While our results are currently experimental, they suggest several interesting new open theoretical questions for cuckoo hashing with pages.
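A sketch of the lookup pattern the abstract describes, assuming a table keyed by (page, cell) pairs; the page size, hash choices, and names are placeholders rather than the authors' scheme.

```python
PAGE_CELLS = 512   # cells per page (assumed layout)

def locations(key, k, num_pages):
    """k candidate cells on one primary page plus a single backup
    cell on a second page."""
    primary = hash(("page", key)) % num_pages
    cells = [(primary, hash((key, i)) % PAGE_CELLS) for i in range(k)]
    backup_page = hash(("backup", key)) % num_pages
    cells.append((backup_page, hash((key, "backup")) % PAGE_CELLS))
    return cells

def lookup(table, key, k, num_pages):
    """Check the k primary cells first; touch the backup page only
    on a miss."""
    cells = locations(key, k, num_pages)
    for page, cell in cells[:k]:            # one page access
        if table.get((page, cell)) == key:
            return True
    page, cell = cells[k]                   # rare second page access
    return table.get((page, cell)) == key
```

Since the k primary cells share one page, a successful lookup usually costs a single page access, which is exactly the cost model the paper optimizes for.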
Optimal Hashing in External Memory
2018
Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşcu established an update/query tradeoff curve for external hash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires Ω(log_λ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a hashing data structure that meets this curve for λ that is Ω(log log M + log_M N). Their data structure, which we call an IP hash table, is complicated and, to the best of our knowledge, has not been implemented. In this paper, we present a new and much simpler optimal external memory hash table, the Bundle of Arrays Hash Table (BO...
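As a worked instance of the tradeoff curve (the numbers here are hypothetical, chosen only to make the arithmetic visible):

```latex
% Iacono--Patrascu tradeoff: O(\lambda/B) amortized insertion IOs
% force \Omega(\log_\lambda N) expected query IOs.
\text{insert: } O\!\left(\tfrac{\lambda}{B}\right) \text{ IOs}
\quad\Longrightarrow\quad
\text{query: } \Omega\!\left(\log_\lambda N\right) \text{ IOs}.
% Example: N = 2^{40}, B = 2^{10}, \lambda = 2^{5} gives an
% insert cost of O(2^{5}/2^{10}) = O(2^{-5}) IOs amortized and a
% query cost of \Omega(\log_{2^5} 2^{40}) = \Omega(40/5) = \Omega(8) IOs.
```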
M-N Hashing: Search Time Optimization with Collision Resolution Using Balanced Tree
Futuristic Trends in Networks and Computing Technologies, 2020
In the field of networking, storing large amounts of data and performing fast lookups over them are two important measures. Hashing is one of the well-known techniques for indexing and retrieving data from a database efficiently. It is mostly used for fast lookups in fields that require quick results from the database. As the size of the database increases, the number of collisions in the hash table also increases; these are handled through collision resolution techniques. Traditional algorithms such as Separate Chaining, Linear Probing, and Quadratic Probing take linear search time. The tremendous increase of data in recent years requires a more refined hash table implementation. In this paper, we propose a new way of implementing a hash table, M-N Hashing, that handles collisions more efficiently, in logarithmic time. To handle collisions with a scalably sized hash table, the proposed algorithm uses the concept of the AVL tree. The performance of the proposed algorithm is analyzed using a continuous integer dataset ranging from 0 to 100,000. Experiments show that M-N Hashing improves search time by up to 99.97% with respect to contemporary algorithms, i.e., Separate Chaining, Linear Probing, and Quadratic Probing.
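The core data-structural idea, a hash table whose buckets support logarithmic search, can be sketched briefly. Below, sorted lists with bisect stand in for the AVL trees the paper uses: bucket search is O(log b) either way, though the AVL tree also keeps insertion logarithmic, whereas list insertion here is linear. Class and parameter names are illustrative.

```python
import bisect

class TreeBucketHashTable:
    """Hash table whose buckets keep their keys ordered, so collision
    search within a bucket is logarithmic in the bucket size."""

    def __init__(self, num_buckets=1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        i = bisect.bisect_left(bucket, key)     # O(log b) position search
        if i == len(bucket) or bucket[i] != key:
            bucket.insert(i, key)               # O(b) for a list; O(log b) in an AVL tree

    def contains(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        i = bisect.bisect_left(bucket, key)
        return i < len(bucket) and bucket[i] == key
```

Compared with separate chaining, the only change is the ordered bucket; that alone turns the worst-case in-bucket search from linear into logarithmic.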
Unique permutation hashing
Theoretical Computer Science, 2013
We propose a new hash function, the unique-permutation hash function, and a performance analysis of its hash computation. Following the notation of [15], the cost of a hash function h is denoted by C_h(k, N) and stands for the expected number of table entries that are checked when inserting the (k+1)-st key into a hash table of size N, where k out of N table entries are filled by previous insertions. A hash function maps a key to a permutation of the table locations. A hash function h is simple uniform if items are equally likely to be hashed to any table location (in the first trial). A hash function h is random, or strong uniform, if the probability of any permutation being a probe sequence under h is 1/N!, where N is the size of the table. According to [15], the cost of a random hash function, C_0(k, N) = 1 + k/(N − k + 1), is a "lower bound" on hashing performance, in the sense that if some hash function h has cost C_h(k, N) < C_0(k, N) for some k and N, then there exists k′ < k such that C_h(k′, N) > C_0(k′, N). We show that the unique-permutation hash function is not only a simple uniform hash function but also a random hash function, i.e., strong uniform, and therefore has the cost C_0(k, N). Namely, each probe sequence is equally likely to be chosen when the keys are uniformly chosen. Our hash function ensures that each empty table location has the same probability of being assigned a uniformly chosen key. We also show that the expected time for computing the unique-permutation hash function is O(1), and the expected number of table locations that are checked before an empty location is found during insertion (or search) is also O(1) for constant load factors α < 1, where α is the ratio between the number of inserted items and the table size. The unique-permutation hash function has applications in parallel, distributed, and multi-core systems that avoid contention as much as possible.
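The probing idea can be illustrated directly: derive a permutation of the N table locations from the key and probe in that order. The sketch below seeds a Fisher-Yates shuffle with the key, which makes the example easy to read but costs O(N) to materialize the permutation; the paper's construction computes probes in O(1) expected time. Names are illustrative.

```python
import random

def probe_sequence(key, N):
    """Map a key to a permutation of the N table locations by seeding
    a Fisher-Yates shuffle with the key. Every permutation is equally
    likely for a uniformly random key, matching the 'strong uniform'
    property, but materializing it here costs O(N)."""
    rng = random.Random(key)
    perm = list(range(N))
    rng.shuffle(perm)
    return perm

def insert(table, key):
    """Probe the key's permutation until an empty slot is found."""
    for loc in probe_sequence(key, len(table)):
        if table[loc] is None:
            table[loc] = key
            return loc
    raise RuntimeError("table full")
```

For a constant load factor α < 1, the expected number of probes before an empty slot is found is O(1), which is the cost C_0(k, N) discussed above.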
Almost random graphs with simple hash functions
2003
We describe a simple randomized construction for generating pairs of hash functions h_1, h_2 from a universe U to ranges V = [m] = {0, 1, …, m − 1} and W = [m] so that for every key set S ⊆ U with n = |S| ≤ m/(1 + ε) the (random) bipartite (multi)graph with node set V ∪ W and edge set {(h_1(x), h_2(x)) | x ∈ S} exhibits a structure that is essentially random. The construction combines d-wise independent classes, for d a relatively small constant, with the well-known technique of random offsets. While keeping the space needed to store the description of h_1 and h_2 at O(n^ζ), for ζ < 1 fixed arbitrarily, we obtain a much smaller (constant) evaluation time than previous constructions of this kind, which involved Siegel's high-performance hash classes. The main new technique is the combined analysis of the graph structure and the inner structure of the hash functions, as well as a new way of looking at the cycle structure of random (multi)graphs. The construction may be applied to improve on Pagh and Rodler's "cuckoo hashing" (2001), to obtain a simpler and faster alternative to a recent construction of Östlin and Pagh (2002/03) for simulating uniform hashing on a key set S, and to the simulation of shared memory on distributed memory machines. We also describe a novel way of implementing (approximate) d-wise independent hashing without using polynomials.
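The object under study is easy to construct concretely. Below is a sketch that builds the bipartite multigraph from an integer key set, using degree-d polynomial hashing over a prime field plus random offsets as a loose stand-in for the paper's d-wise independent classes; parameters and names are illustrative, not the paper's construction.

```python
import random
from collections import defaultdict

def cuckoo_graph(keys, m, d=8):
    """Build the bipartite (multi)graph with node sets V = [m] and
    W = [m] and one edge (h1(x), h2(x)) per integer key x."""
    p = (1 << 61) - 1                     # Mersenne prime field

    def make_hash():
        coeffs = [random.randrange(p) for _ in range(d)]
        offset = random.randrange(m)      # random offset into the range
        def h(x):
            acc = 0
            for c in coeffs:              # Horner evaluation mod p
                acc = (acc * x + c) % p
            return (acc + offset) % m
        return h

    h1, h2 = make_hash(), make_hash()
    edges = defaultdict(list)             # adjacency: V-side -> W-side
    for x in keys:
        edges[h1(x)].append(h2(x))
    return edges
```

In cuckoo hashing terms, an assignment of all keys to table cells exists exactly when this graph has no overloaded component, which is why the "essentially random" structure guaranteed by the construction matters.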