Weighted bloom filter

Optimizing data popularity conscious bloom filters

2008

Bloom filters are compact set representations that support set membership queries with small, one-sided error probabilities. Standard Bloom filters are oblivious to object popularity in sets and membership queries. However, sets and queries in many distributed applications follow known, stable, highly skewed distributions (e.g., Zipf-like). This paper studies the problem of minimizing the false-positive probability of a Bloom filter by adapting the number of hashes used for each data object to its popularity in sets and membership queries. We model the problem as a constrained nonlinear integer program and propose two polynomial-time solutions with bounded approximation ratios: one is a 2-approximation algorithm with O(N^c) running time (c ≥ 6 in practice); the other is a (2 + ε)-approximation algorithm with running time O(N^2/ε), ε > 0. Here N denotes the total number of distinct data objects that appear in sets or queries. We quantitatively evaluate our proposed approach on two distributed applications (cooperative caching and full-text keyword searching) driven by real-life data traces. Compared to standard Bloom filters, our data popularity-conscious Bloom filters achieve up to 24 and 27 times false-positive probability reduction for the two applications respectively. The quantitative evaluation also validates our solution's bounded approximation ratio to the optimal.
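
The central idea of the abstract above, using a different number of hashes for each data object, can be illustrated with a minimal Python sketch. It is not the paper's optimization algorithm: the per-object hash counts are simply assumed to be given by a hypothetical `hashes_for` callback, and the class name, the salted SHA-256 hashing, and the example counts are illustrative assumptions.

```python
import hashlib
from typing import Callable


class VariableHashBloomFilter:
    """Bloom filter whose number of hash probes depends on the key.

    Illustrative sketch only: `hashes_for` is a hypothetical callback that
    returns how many probes to use for a key (for example, derived from its
    popularity). Choosing those counts optimally is the subject of the
    paper and is not implemented here.
    """

    def __init__(self, num_bits: int, hashes_for: Callable[[str], int]):
        self.num_bits = num_bits
        self.hashes_for = hashes_for
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive the i-th probe position from a salted SHA-256 digest.
        for i in range(self.hashes_for(key)):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # The same per-key probe count is used for insertion and lookup,
        # so inserted keys are always found (no false negatives).
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical policy: two example keys get fewer probes than the rest.
EXAMPLE_COUNTS = {"index.html": 2, "logo.png": 3}
bf = VariableHashBloomFilter(1024, lambda key: EXAMPLE_COUNTS.get(key, 6))
for name in ("index.html", "logo.png", "rare-report.pdf"):
    bf.add(name)
```

Because the probe count is a function of the key alone, insertion and lookup always agree on which bits belong to a key. This keeps the error one-sided, as in a standard Bloom filter.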

A Bloom Filter Survey: Variants for Different Domain Applications

ArXiv, 2021

There is a plethora of data structures, algorithms, and frameworks dealing with major data-stream problems like estimating the frequency of items, answering set membership, association and multiplicity queries, and several other statistics that can be extracted from voluminous data streams. In this survey, we focus on exploring randomized data structures called Bloom filters. This data structure answers whether an item exists or not in a data stream with a false positive probability fpp. Many variants of the Bloom filter are covered, showing the strengths and drawbacks of each structure: some Bloom filters deal with insertions and deletions and others don’t; some variants use memory efficiently but increase the fpp, whereas others make the opposite trade-off. Furthermore, in each Bloom filter structure, the false positive probability will be highlighted alongside the most important technical details showing the improvement it is presenti...

On the false-positive rate of Bloom filters

Information Processing Letters, 2008

Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.
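
To make the gap between the classic formula and the true false-positive rate concrete, here is a small Python sketch. It assumes the usual idealized model (m bits, n inserted items, k independent uniform hash values per item, chosen with replacement) and uses a standard occupancy calculation; it is not the derivation given in the paper, and the function names are illustrative.

```python
def classic_estimate(m: int, n: int, k: int) -> float:
    """Classic formula: (1 - (1 - 1/m)**(k*n))**k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def exact_rate(m: int, n: int, k: int) -> float:
    """Exact false-positive rate under the independent-uniform-hash model."""
    # dist[j] = probability that exactly j bits are set so far.
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    for _ in range(k * n):  # one step per hash probe made during insertion
        nxt = [0.0] * (m + 1)
        for j, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[j] += p * j / m  # probe hits an already-set bit
            if j < m:
                nxt[j + 1] += p * (m - j) / m  # probe sets a new bit
        dist = nxt
    # A query is a false positive when all k of its probes hit set bits.
    return sum(p * (j / m) ** k for j, p in enumerate(dist))
```

For m = 4, n = 1, k = 2 the exact value is 13/64 ≈ 0.203, while the classic formula gives 49/256 ≈ 0.191. The classic formula plugs the expected fraction of set bits into x^k, whereas the exact rate averages x^k over the distribution of that fraction, so the exact value can never be smaller.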

Yes-no Bloom filter: A way of representing sets with fewer false positives

ArXiv, 2016

The Bloom filter (BF) is a space-efficient randomized data structure particularly suitable for representing a set that supports approximate membership queries. BFs have been extensively used in many applications, especially in networking, due to their simplicity and flexibility. The performance of BFs mainly depends on query overhead, space requirements and false positives. The aim of this paper is to focus on false positives. Inspired by the recent application of the BF in a novel multicast forwarding fabric for information-centric networks, this paper proposes the yes-no BF, a new way of representing a set, based on the BF, but with significantly lower false positives and no false negatives. Although it requires slightly more processing at the stage of its formation, it offers the same processing requirements for membership queries as the BF. After introducing the yes-no BF, we show through simulations that it has better false positive performance than the BF.

On the analysis of Bloom filters

The Bloom filter is a simple random binary data structure which can be efficiently used for approximate set membership testing. When testing for membership of an object, the Bloom filter may give a false positive, whose probability is the main performance figure of the structure. We complete and extend the analysis of the Bloom filter available in the literature by means of the γ-transform approach. Known results are confirmed and new results are provided, including the variance of the number of bits set to 1 in the filter. We consider the choice of bits to be set to 1 when an object is inserted both with and without replacement, in what we call standard and classic Bloom filter, respectively. Simple iterative schemes for the computation of the false positive probability and a new non-iterative approximation, taking into account the variance of bits set to 1, are also provided.

Cardinality estimation and dynamic length adaptation for Bloom filters

2010

Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.
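
As a rough illustration of estimating cardinality from a filter alone, the following Python sketch uses a widely known fill-level estimator, n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits. This is an assumption for illustration and may differ from the estimator and bounds developed in the paper; the helper names and the salted SHA-256 hashing are likewise hypothetical.

```python
import hashlib
import math


def probe_positions(item: str, m: int, k: int):
    """Yield k bit positions for item using salted SHA-256 (illustrative)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


def estimate_cardinality(bits_set: int, m: int, k: int) -> float:
    """Estimate the number of distinct inserted items from the fill level."""
    if bits_set >= m:
        return float("inf")  # a saturated filter carries no information
    return -(m / k) * math.log(1.0 - bits_set / m)


# Build a filter from 1000 distinct items, then estimate from the bits only.
m, k = 16384, 4
bits = bytearray(m)  # one byte per bit, for clarity
for i in range(1000):
    for pos in probe_positions(f"item-{i}", m, k):
        bits[pos] = 1
estimate = estimate_cardinality(sum(bits), m, k)
```

The estimate depends only on m, k, and the count of set bits, so a node that receives the filter can compute it without ever seeing the represented set.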

Shed More Light on Bloom Filter's Variants

2019

The Bloom filter is a probabilistic membership data structure that is extensively used for membership queries. It has become the predominant data structure in approximate membership filtering because it greatly improves query response time. A Bloom filter (BF) is used to detect whether an element belongs to a given set or not. It returns a True Positive (TP), False Positive (FP), or True Negative (TN). The Bloom filter is widely adopted in numerous areas to enhance the performance of a system. In this paper, we present a) in-depth insight into the Bloom filter, and b) the prominent variants of the Bloom filter.

scaleBF: A High Scalable Membership Filter using 3D Bloom Filter

International Journal of Advanced Computer Science and Applications, 2018

The Bloom filter has been an extensively deployed data structure in various applications and research domains since its inception. It can reduce space consumption by an order of magnitude, and is therefore used to keep information about very large-scale data. Numerous variants of the Bloom filter are available; however, scalability has been a serious problem for the Bloom filter for years. Diverse variants have been proposed to solve this problem, but time complexity and space complexity then become the key issues again. In this paper, we present a novel Bloom filter, called scaleBF, that addresses the scalability issue without compromising performance. scaleBF deploys many 3D Bloom filters to filter the set of items. We theoretically compare contemporary Bloom filters for scalability, and scaleBF outperforms them in terms of time complexity.

Retouched Bloom filters: allowing networked applications to trade off selected false positives against false negatives

Proceedings of the 2006 ACM …, 2006

Where distributed agents must share voluminous set membership information, Bloom filters provide a compact, though lossy, way for them to do so. Numerous recent networking papers have examined the trade-offs between the bandwidth consumed by the transmission of Bloom filters, and the error rate, which takes the form of false positives, and which rises the more the filters are compressed. In this paper, we introduce the retouched Bloom filter (RBF), an extension that makes the Bloom filter more flexible by permitting the ...

Bloom Filters–Short Tutorial

2001

Bloom filters [2] are compact data structures for probabilistic representation of a set in order to support membership queries (i.e., queries that ask: “Is element X in set Y?”). This compact representation is the payoff for allowing a small rate of false positives in membership queries; that is, queries might incorrectly recognize an element as a member of the set. ... We succinctly present the uses of Bloom filters to date in the next section. In Section 3 we describe Bloom filters in detail, and in Section 4 we give a hopefully precise picture of space/computing time/error rate tradeoffs.
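
The space/computing time/error rate tradeoff mentioned above can be sketched numerically in Python. The sizing rules used here, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, together with the approximation (1 - e^(-kn/m))^k, are the common textbook formulas; they are assumed for illustration and are not quoted from the tutorial, and the function names are hypothetical.

```python
import math


def size_bloom_filter(n: int, p: float) -> tuple[int, int]:
    """Return (m, k): bit count and hash count for n items at target rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


def approx_false_positive_rate(m: int, n: int, k: int) -> float:
    """Common approximation (1 - exp(-k*n/m))**k of the false-positive rate."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Size a filter for 1000 items at a 1% target false-positive rate.
m, k = size_bloom_filter(1000, 0.01)
predicted = approx_false_positive_rate(m, 1000, k)
```

For 1000 items at a 1% target this gives roughly 9.6 bits per item and 7 hash functions. More bits per item lower the error rate, while each additional hash function adds one probe to every insertion and query.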