Improving retouched Bloom filter for trading off selected false positives against false negatives

Retouched Bloom filters: allowing networked applications to trade off selected false positives against false negatives

Proceedings of the 2006 ACM …, 2006


Retouched Bloom filters

Proceedings of the 2006 ACM CoNEXT Conference (CoNEXT '06), 2006

Where distributed agents must share voluminous set membership information, Bloom filters provide a compact, though lossy, way for them to do so. Numerous recent networking papers have examined the trade-off between the bandwidth consumed by transmitting Bloom filters and the error rate, which takes the form of false positives and rises the more the filters are compressed. In this paper, we introduce the retouched Bloom filter (RBF), an extension that makes the Bloom filter more flexible by permitting the removal of selected false positives at the expense of generating random false negatives. We show analytically that RBFs created through a random process maintain an overall error rate, expressed as a combination of the false positive rate and the false negative rate, that is equal to the false positive rate of the corresponding Bloom filters. We further provide some simple heuristics that, when creating RBFs, decrease the false positive rate by more than the corresponding increase in the false negative rate. Finally, we demonstrate the advantages of an RBF over a Bloom filter in a distributed network topology measurement application, where information about large stop sets must be shared among route tracing monitors.
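The basic RBF operation, clearing bits so that a selected false positive is no longer reported, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: salted SHA-256 stands in for the k hash functions, and false positives are removed with the purely random process (reset one of the item's k bits, chosen uniformly). The class and method names (`RetouchedBloomFilter`, `retouch`) are hypothetical, and the paper's heuristics choose which bit to reset more carefully.

```python
import hashlib
import random


class RetouchedBloomFilter:
    """Bloom filter plus a 'retouch' step that resets bits (hypothetical API)."""

    def __init__(self, m, k, seed=None):
        self.m, self.k = m, k
        self.bits = [0] * m
        self.rng = random.Random(seed)  # drives the random choice of bit to reset

    def positions(self, item):
        # k bit positions from salted SHA-256 (an illustrative hash choice)
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self.positions(item))

    def retouch(self, item):
        """Remove a known false positive by resetting one of its bits at random.

        Any true member that also hashes to the reset bit becomes a false negative.
        """
        if item in self:
            self.bits[self.rng.choice(self.positions(item))] = 0


members = [f"m{i}" for i in range(20)]
rbf = RetouchedBloomFilter(m=64, k=3, seed=1)
for x in members:
    rbf.add(x)

# Find a few false positives among non-members and remove them.
false_positives = [q for q in (f"q{j}" for j in range(2000)) if q in rbf][:5]
for q in false_positives:
    rbf.retouch(q)

# Members whose bits were reset are now false negatives.
false_negatives = [x for x in members if x not in rbf]
```

Because retouching only ever resets bits, a removed false positive stays removed; the price is that `false_negatives` may be non-empty, which is exactly the trade-off the abstract describes.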

Yes-no Bloom filter: A way of representing sets with fewer false positives

arXiv, 2016

The Bloom filter (BF) is a space-efficient randomized data structure particularly suited to representing a set for approximate membership queries. BFs have been used extensively in many applications, especially in networking, due to their simplicity and flexibility. The performance of BFs mainly depends on query overhead, space requirements and false positives. This paper focuses on false positives. Inspired by the recent application of the BF in a novel multicast forwarding fabric for information-centric networks, it proposes the yes-no BF, a new way of representing a set, based on the BF but with significantly fewer false positives and no false negatives. Although it requires slightly more processing when it is formed, it has the same processing requirements for membership queries as the BF. After introducing the yes-no BF, we show through simulations that it has better false positive performance than the BF.
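The idea of pairing a "yes" filter with a "no" filter that records known false positives can be sketched in Python. This is a hypothetical, simplified variant, not the paper's construction: it assumes a single "no" filter, salted SHA-256 hashes, and a simple rule that rejects any false positive whose insertion would make a true member match the "no" filter. That rule is what keeps false negatives at zero, and it is why the member list is consulted only while the filter is being formed.

```python
import hashlib


def positions(item, m, k, salt):
    # k positions in [0, m) from salted SHA-256 (illustrative hash choice)
    return [
        int.from_bytes(hashlib.sha256(f"{salt}:{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class YesNoBloomFilter:
    """Simplified yes-no filter: a 'yes' BF for members and a single 'no' BF
    for selected false positives (hypothetical variant for illustration)."""

    def __init__(self, members, m_yes, m_no, k):
        self.m_yes, self.m_no, self.k = m_yes, m_no, k
        self.members = list(members)  # needed only while the filter is formed
        self.yes = [0] * m_yes
        self.no = [0] * m_no
        for x in self.members:
            for p in positions(x, m_yes, k, "yes"):
                self.yes[p] = 1

    def _in_no(self, bits, item):
        return all(bits[p] for p in positions(item, self.m_no, self.k, "no"))

    def add_false_positive(self, item):
        """Record a known false positive in the 'no' filter, unless doing so
        would make some member match the 'no' filter (a false negative)."""
        trial = self.no[:]
        for p in positions(item, self.m_no, self.k, "no"):
            trial[p] = 1
        if any(self._in_no(trial, x) for x in self.members):
            return False  # rejected: it would create a false negative
        self.no = trial
        return True

    def __contains__(self, item):
        # Report membership only if the 'yes' filter matches and the 'no' filter does not.
        in_yes = all(self.yes[p] for p in positions(item, self.m_yes, self.k, "yes"))
        return in_yes and not self._in_no(self.no, item)


members = [f"m{i}" for i in range(20)]
ynbf = YesNoBloomFilter(members, m_yes=64, m_no=64, k=3)
candidates = [q for q in (f"q{j}" for j in range(2000)) if q in ynbf]
accepted = [q for q in candidates[:10] if ynbf.add_false_positive(q)]
```

Query cost stays at two ordinary BF lookups, while the extra work (checking members against each candidate) happens only at formation time, mirroring the trade-off stated in the abstract.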

Theory and Practice of Bloom Filters for Distributed Systems

IEEE Communications Surveys & Tutorials, 2012

Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

Hunting the Pertinency of Bloom Filter in Computer Networking and Beyond: A Survey

A Bloom filter is a probabilistic data structure for testing membership in a set. It answers "true" or "false" with an error tolerance that depends on whether the element is actually in the set: a "true" answer may be wrong, whereas a "false" answer is always correct. Bloom filters boost system performance at a small space overhead, have been used extensively since their inception, and have found a wide range of applications across the whole computing field, irrespective of application and research domain. The Bloom filter offers (i) high adaptability, (ii) low memory overhead compared with hashing algorithms, (iii) high scalability, and (iv) high performance. In this article, we survey the application areas of the Bloom filter in computer networking and related domains.
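The one-sided error described above follows from how a standard Bloom filter works: insertion only sets bits, so a member's bits are never cleared and a member is never reported absent. A minimal Python sketch, assuming salted SHA-256 for the k hash functions (the class name and parameters are illustrative):

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k  # m bits, k hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # k positions derived from salted SHA-256 (an illustrative choice)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1  # bits are only ever set, never cleared

    def __contains__(self, item):
        # A "false" answer is certain; a "true" answer may be a false positive.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
```

The variants listed on this page each change one of these two rules (what insertion does to the bits, or how a query is answered) to trade false positives against other costs.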

Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters

Applied Mathematics & Information Sciences, 2014

A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. Its simplicity and excellent performance make it a standard data structure of great use in many network applications. It is well known that reducing the false positive rate of a Bloom filter requires increasing its size and, accordingly, the number of hash indices. In this paper, we propose a new architecture that reduces the false positive rate of a Bloom filter more efficiently. The proposed architecture uses cross-checking Bloom filters, which are queried whenever the main Bloom filter returns a positive, to cross-check the result. If all cross-checking Bloom filters produce negatives, the positive of the main Bloom filter can be identified as a false positive. The main Bloom filter therefore need not be large to achieve a low false positive rate, since many of its false positives are identified by the cross-checking Bloom filters. Simulation results show that the false positive rate of the proposed scheme converges to zero faster than that of a single Bloom filter architecture, while requiring a smaller total memory for the Bloom filters.
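The cross-checking rule can be sketched in Python. How members are assigned to cross-checking filters is an assumption here: the sketch hashes each member into one of several groups (a hypothetical partition rule), salted SHA-256 stands in for the hash functions, and the class names are illustrative. Because every member is stored in some cross-checking filter, rejecting a main-filter positive that all cross-checking filters reject cannot create a false negative.

```python
import hashlib


def positions(item, m, k, salt):
    # k positions in [0, m) from salted SHA-256 (illustrative hash choice)
    return [
        int.from_bytes(hashlib.sha256(f"{salt}:{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class SimpleBloomFilter:
    def __init__(self, m, k, salt):
        self.m, self.k, self.salt = m, k, salt
        self.bits = [0] * m

    def add(self, item):
        for p in positions(item, self.m, self.k, self.salt):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in positions(item, self.m, self.k, self.salt))


class CrossCheckedBloomFilter:
    def __init__(self, m_main, m_cross, k, groups):
        self.main = SimpleBloomFilter(m_main, k, "main")
        self.cross = [SimpleBloomFilter(m_cross, k, f"cross{g}") for g in range(groups)]

    def _group(self, item):
        # Hypothetical partition rule: hash each member to one cross-checking filter.
        digest = hashlib.sha256(f"group:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.cross)

    def add(self, item):
        self.main.add(item)
        self.cross[self._group(item)].add(item)

    def __contains__(self, item):
        if item not in self.main:
            return False
        # A main-filter positive that every cross-checking filter rejects is a false positive.
        return any(item in cf for cf in self.cross)


members = [f"m{i}" for i in range(40)]
ccbf = CrossCheckedBloomFilter(m_main=128, m_cross=64, k=3, groups=4)
for x in members:
    ccbf.add(x)
queries = [f"q{j}" for j in range(1000)]
```

Every positive of `ccbf` is also a positive of `ccbf.main`, so cross-checking can only remove false positives; comparing the two counts over `queries` shows how many it filters out for a given memory split.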

A Generalized Bloom Filter to secure distributed network applications

2011

Distributed applications use Bloom filters to transmit large sets in a compact form. However, attackers can easily disrupt these applications by using or advertising saturated filters. In this paper we introduce the Generalized Bloom Filter (GBF), a space-efficient data structure to securely represent a set in distributed applications, such as IP traceback, web caching, and peer-to-peer networks. Unlike the standard Bloom filter, the GBF has an upper bound on the false-positive probability, limiting the effect of these attacks. The key idea of the GBF is not only to set, but also to reset bits of the filter at each insertion. This procedure limits the false positives at the expense of introducing false negatives in membership queries. We derive expressions for the false-positive and false-negative rates and show that both are upper-bounded in the GBF. We conduct simulations that validate the derived expressions and explore the trade-offs of this data structure.
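The set-and-reset idea can be illustrated with a short Python sketch. Assumptions, labeled loudly: each insertion resets k0 bits and sets k1 bits chosen by salted SHA-256; a query succeeds only when all k0 reset positions read 0 and all k1 set positions read 1; and the filter starts from all zeros for simplicity (the original design may initialize it differently). The names are hypothetical.

```python
import hashlib


class GeneralizedBloomFilter:
    """Sketch of a GBF: each insertion resets k0 bits and sets k1 bits."""

    def __init__(self, m, k0, k1):
        self.m, self.k0, self.k1 = m, k0, k1
        self.bits = [0] * m  # simplification: start from all zeros

    def _pos(self, item, tag, count):
        # `count` positions from SHA-256 salted with a tag (illustrative choice)
        return [
            int.from_bytes(hashlib.sha256(f"{tag}{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(count)
        ]

    def reset_positions(self, item):
        return self._pos(item, "g", self.k0)

    def set_positions(self, item):
        return self._pos(item, "h", self.k1)

    def add(self, item):
        # Both steps can overwrite bits of earlier members: the source of false negatives.
        for p in self.reset_positions(item):
            self.bits[p] = 0
        for p in self.set_positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # Assumed rule: member iff reset positions read 0 and set positions read 1.
        return all(self.bits[p] == 0 for p in self.reset_positions(item)) and all(
            self.bits[p] == 1 for p in self.set_positions(item)
        )


gbf = GeneralizedBloomFilter(m=256, k0=2, k1=2)
for x in ("a", "b", "c"):
    gbf.add(x)

# A saturated (all-ones) filter, as an attacker might advertise, matches nothing,
# because every query also requires some bits to be 0.
saturated = GeneralizedBloomFilter(m=256, k0=2, k1=2)
saturated.bits = [1] * 256
```

The last lines show why saturation stops working as an attack: with k0 at least 1, an all-ones filter answers "false" to every query instead of "true".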

On the false-positive rate of Bloom filters

Information Processing Letters, 2008

Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.
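To make the gap concrete, here is a small Python calculation, presented as our own derivation under an idealized assumption rather than as a transcription of the paper: every one of the k·n insertion positions and the k query positions is independent and uniform over m bits. Conditioning on the number i of bits that end up set gives p = m^(-k(n+1)) · Σ_{i=1..m} i^k · i! · C(m, i) · S(kn, i), where S denotes Stirling numbers of the second kind. The code compares this with the commonly quoted estimate (1 - (1 - 1/m)^(kn))^k, using exact fractions.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2_row(n):
    """Return [S(n, 0), ..., S(n, n)] (Stirling numbers of the second kind)."""
    row = [1]  # S(0, 0) = 1
    for t in range(1, n + 1):
        new = [0] * (t + 1)
        for i in range(1, t + 1):
            # S(t, i) = i * S(t - 1, i) + S(t - 1, i - 1), with S(t - 1, t) = 0
            new[i] = i * (row[i] if i < len(row) else 0) + row[i - 1]
        row = new
    return row


def exact_fp_rate(m, k, n):
    """False-positive probability for n items, m bits, k hashes, assuming
    all hash positions are independent and uniform over the m bits."""
    kn = k * n
    s = stirling2_row(kn)
    p = Fraction(0)
    for i in range(1, min(m, kn) + 1):
        # P(exactly i distinct bits set): choose the i bits, count surjections onto them.
        p_i = Fraction(comb(m, i) * factorial(i) * s[i], m ** kn)
        # Given i set bits, each of the k query positions hits one w.p. i/m.
        p += p_i * Fraction(i, m) ** k
    return p


def classical_fp_rate(m, k, n):
    """Commonly quoted estimate (1 - (1 - 1/m)^(kn))^k."""
    return (1 - (1 - Fraction(1, m)) ** (k * n)) ** k


for m, k, n in [(16, 1, 4), (16, 3, 4), (32, 4, 5)]:
    print(m, k, n, float(exact_fp_rate(m, k, n)), float(classical_fp_rate(m, k, n)))
```

Since the exact value is the expectation of (X/m)^k, where X is the number of set bits, and x^k is convex, Jensen's inequality means the classical estimate never exceeds it; the two coincide for k = 1 and differ for k ≥ 2.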