A Space Lower Bound for Dynamic Approximate Membership Data Structures
Related papers
Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract)
Lecture Notes in Computer Science
The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f : U → {0, 1}^r that has specified values on the elements of a given set S ⊆ U, |S| = n, but may have any value on elements outside S. All known methods (e.g., those based on perfect hash functions) induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^(−k) of optimal, asymptotically for large n. The expected time to construct the data structure is O(n). If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space o(n) to simulate full randomness on a RAM.
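The reduction from retrieval to approximate membership mentioned in this abstract is commonly described as storing an r-bit fingerprint of each key in the retrieval structure and comparing it at query time. The sketch below illustrates that idea only. `ToyRetrieval` is a hypothetical stand-in that keeps the keys in a dictionary (which a real succinct retrieval structure would not do) and returns an arbitrary value outside S; all class and function names are assumptions made for illustration, not the paper's construction.

```python
import hashlib


def _hash_bits(tag: str, key: str, r: int) -> int:
    """Hash `key` to an r-bit integer; `tag` selects an independent function."""
    digest = hashlib.sha256(f"{tag}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << r)


class ToyRetrieval:
    """Stand-in retrieval structure: exact on S, arbitrary value elsewhere."""

    def __init__(self, table: dict, r: int):
        self._table = dict(table)  # assumption: a real structure stores no keys
        self._r = r

    def lookup(self, key: str) -> int:
        if key in self._table:
            return self._table[key]
        # Outside S the answer is unconstrained; model it as unrelated bits.
        return _hash_bits("junk", key, self._r)


class FingerprintFilter:
    """Approximate membership built on top of a retrieval structure."""

    def __init__(self, keys, r: int):
        self._r = r
        fingerprints = {k: _hash_bits("fp", k, r) for k in keys}
        self._f = ToyRetrieval(fingerprints, r)

    def might_contain(self, key: str) -> bool:
        # Members always match; a non-member matches with probability
        # about 2**-r under the random-hash assumption.
        return self._f.lookup(key) == _hash_bits("fp", key, self._r)
```

With r = 8 the expected false positive rate of this sketch is about 1/256, and members are never rejected.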
2019
Dynamic Bloom filters (DBF) were proposed by Guo et al. in 2010 to tackle the situation where the size of the set to be stored compactly is not known in advance or can change during the course of the application. We propose a novel competitor to DBF with an important property that DBF cannot achieve: our structure maintains a bound on the false positive rate for the set membership query across all possible sizes of sets that are stored in it. The new data structure we propose is a dynamic structure that we call Dynamic Partition Bloom filter (DPBF). DPBF is based on our novel concept of a Bloom partition tree, which is a tree structure with standard Bloom filters at the leaves. DPBF is superior to standard Bloom filters because it can efficiently handle a large number of unions and intersections of sets of different sizes while controlling the false positive rate. This makes DPBF the first structure to do so to the best of our knowledge. We provide theor...
Yes-no Bloom filter: A way of representing sets with fewer false positives
ArXiv, 2016
The Bloom filter (BF) is a space-efficient randomized data structure particularly suitable for representing a set that supports approximate membership queries. BFs have been extensively used in many applications, especially in networking, due to their simplicity and flexibility. The performance of BFs mainly depends on query overhead, space requirements and false positives. The aim of this paper is to focus on false positives. Inspired by the recent application of the BF in a novel multicast forwarding fabric for information centric networks, this paper proposes the yes-no BF, a new way of representing a set, based on the BF, but with significantly lower false positives and no false negatives. Although it requires slightly more processing at the stage of its formation, it offers the same processing requirements for membership queries as the BF. After introducing the yes-no BF, we show through simulations that it has better false positive performance than the BF.
Cardinality estimation and dynamic length adaptation for Bloom filters
2010
Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.
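To make the idea of estimating a set's cardinality from the filter alone more concrete, here is a minimal sketch. It uses the widely known estimator that inverts the expected fraction of set bits, n ≈ −(m/k) · ln(1 − X/m), where X is the number of bits set. This is an assumption chosen for illustration and may differ from the estimator and bounds derived in the paper; the class and function names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1


def estimate_cardinality(bf: BloomFilter) -> float:
    """Estimate the number of distinct inserted items from the fill level."""
    set_bits = sum(bf.bits)
    if set_bits == bf.m:
        return math.inf  # a saturated filter carries no usable information
    return -(bf.m / bf.k) * math.log(1 - set_bits / bf.m)
```

For example, after inserting 1000 distinct strings into a filter with m = 16384 and k = 4, `estimate_cardinality` should return a value close to 1000 without access to the original set.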
Blooming Trees: Space-Efficient Structures for Data Representation
2008 IEEE International Conference on Communications, 2008
A Bloom Filter is an efficient randomized data structure for membership queries on a set with a certain known false positive probability. A Counting Bloom Filter (CBF) allows the same operations on dynamic sets that can be updated via insertions and deletions, at the cost of larger memory requirements. This paper presents a novel hierarchical data structure, called Blooming Tree, that replicates the functionalities of a CBF with lower memory consumption and tunable false positive probability. The hierarchical multi-layer design of Blooming Trees allows for distributing the structure in different memory levels, thus exploiting small but fast on-chip memories for the most frequently accessed substructures. The proposed algorithm is compared to existing schemes on a target platform: Intel IXP2XXX Network Processors (NPs).
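For readers unfamiliar with the Counting Bloom Filter that serves as the baseline here, the following minimal sketch shows how replacing bits with counters makes deletion possible. It illustrates the baseline CBF only, not the Blooming Tree; the class name, hashing scheme and parameters are assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with a counter per cell, so items can be deleted."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item: str) -> None:
        # Only remove items that appear present, to avoid negative counters.
        # Removing an item that was never added can still corrupt the filter.
        if self.might_contain(item):
            for pos in self._positions(item):
                self.counters[pos] -= 1
```

The memory cost is visible in the sketch: each cell holds a counter rather than a single bit, which is the overhead that structures such as the Blooming Tree aim to reduce.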
An optimal Bloom filter replacement
2005
This paper considers space-efficient data structures for storing an approximation S′ to a set S such that S ⊆ S′ and any element not in S belongs to S′ with probability at most ε. The Bloom filter data structure, solving this problem, has found widespread use. Our main result is a new RAM data structure that improves Bloom filters in several ways:
On the analysis of Bloom filters
The Bloom filter is a simple random binary data structure which can be efficiently used for approximate set membership testing. When testing for membership of an object, the Bloom filter may give a false positive, whose probability is the main performance figure of the structure. We complete and extend the analysis of the Bloom filter available in the literature by means of the γ-transform approach. Known results are confirmed and new results are provided, including the variance of the number of bits set to 1 in the filter. We consider the choice of bits to be set to 1 when an object is inserted both with and without replacement, in what we call standard and classic Bloom filter, respectively. Simple iterative schemes for the computation of the false positive probability and a new non-iterative approximation, taking into account the variance of bits set to 1, are also provided.
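As background for the false positive probability discussed here, the sketch below computes the two textbook expressions that analyses of this kind refine: (1 − (1 − 1/m)^(kn))^k and its exponential approximation (1 − e^(−kn/m))^k, for a filter of m bits holding n items with k hash functions. These are the widely used approximations, not the paper's γ-transform results; the function names are hypothetical.

```python
import math


def fp_classic(m: int, n: int, k: int) -> float:
    """Textbook false positive estimate, treating bit states as independent."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_exponential(m: int, n: int, k: int) -> float:
    """Exponential approximation of the same quantity."""
    return (1 - math.exp(-k * n / m)) ** k


def suggested_k(m: int, n: int) -> int:
    """Number of hash functions minimizing fp_exponential: (m/n) * ln 2."""
    return max(1, round(m / n * math.log(2)))
```

With m = 10000 and n = 1000, `suggested_k` gives 7 and both functions evaluate to roughly 0.008, that is, slightly under one false positive per hundred non-member queries.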
Beyond bloom filters: from approximate membership checks to approximate state machines
ACM SIGCOMM …, 2006
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that can return false positives, false negatives, or a "don't know" response. We describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation. Our analysis leads us to an extremely efficient hashing-based scheme with several parameters that can be chosen to trade off space, computation, and the impact of errors. Our hashing approach also yields a simple alternative structure with the same functionality as a counting Bloom filter that uses much less space.
fimpera: drastic improvement of Approximate Membership Query data-structures with counts
Motivations: Approximate membership query data structures (AMQ) such as Cuckoo filters or Bloom filters are widely used for representing and indexing large sets of elements. AMQ can be generalized for additionally counting indexed elements; they are then called "counting AMQ". This is for instance the case of the "counting Bloom filters". However, counting AMQs suffer from false positive and overestimated calls. Results: In this work we propose a novel computation method, called fimpera, consisting of a simple strategy for reducing the false-positive rate of any AMQ indexing all k-mers (words of length k) from a set of sequences, along with their abundance information. This method decreases the false-positive rate of a counting Bloom filter by an order of magnitude while reducing the number of overestimated calls, as well as lowering the average difference between the overestimated calls and the ground truth. In addition, it slightly decreases the query run time. fimpera does not require ...
2007
We present the Bitwise Bloom Filter, a data structure for maintaining counts for a large number of items. The bitwise filter is an extension of the Bloom filter, a space-efficient data structure for storing a large set efficiently by discarding the identity of the items being held while still being able to determine whether it is in the set or not, with high probability. We show how this idea can be extended to maintaining counts of items by maintaining a separate Bloom filter for every position in the bit representations of all the counts. We give ...