Martin Dietzfelbinger - Academia.edu

Papers by Martin Dietzfelbinger

Tight Lower Bounds for Greedy Routing in Higher-Dimensional Small-World Grids

arXiv (Cornell University), May 6, 2013

Towards Optimal Degree Distributions for Left-Perfect Matchings in Random Bipartite Graphs

Theory of computing systems, Oct 4, 2014

Consider a random bipartite multigraph G with n left nodes and m ≥ n ≥ 2 right nodes. Each left node x has d_x ≥ 1 random right neighbors. The average left degree ∆ is fixed, ∆ ≥ 2. We ask whether, for the probability that G has a left-perfect matching, it is advantageous not to fix d_x for each left node x but rather to choose it at random according to some (cleverly chosen) distribution. We show the following, provided that the degrees of the left nodes are independent: If ∆ is an integer, then it is optimal to use a fixed degree of ∆ for all left nodes. If ∆ is non-integral, then an optimal degree distribution has the property that each left node x has two possible degrees, ⌈∆⌉ and ⌊∆⌋, with probability p_x and 1 − p_x, respectively, where p_x is from the closed interval [0, 1] and the average over all p_x equals ∆ − ⌊∆⌋. Furthermore, if n = c · m and ∆ > 2 is constant, then each distribution of the left degrees that meets the conditions above determines the same threshold c*(∆), which has the following property as n goes to infinity: If c < c*(∆), then a left-perfect matching exists with high probability. If c > c*(∆), then no left-perfect matching exists with high probability. The threshold c*(∆) is the same as the known threshold for offline k-ary cuckoo hashing for integral or non-integral k = ∆. (Research supported by DFG grant DI 412/10-2. In the following we use "matching" and "left-perfect matching" synonymously.)
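
The threshold behaviour described above can be checked empirically with a small simulation. The sketch below is not the paper's method: it fixes an integer degree d = 3 for every left node, draws the random multigraph, and tests for a left-perfect matching with Kuhn's augmenting-path algorithm; the parameters n, m and the seed are illustrative.

```python
import random

def has_left_perfect_matching(adj, m):
    """Kuhn's augmenting-path algorithm: can every left node be matched?"""
    match_right = [-1] * m  # right node -> matched left node, or -1

    def augment(u, seen):
        for r in adj[u]:
            if r not in seen:
                seen.add(r)
                if match_right[r] == -1 or augment(match_right[r], seen):
                    match_right[r] = u
                    return True
        return False

    return all(augment(u, set()) for u in range(len(adj)))

def random_left_regular(n, m, degree, rng):
    """Each of n left nodes draws `degree` random right neighbors (multigraph)."""
    return [[rng.randrange(m) for _ in range(degree)] for _ in range(n)]

rng = random.Random(42)
n, m, d = 80, 100, 3   # c = n/m = 0.8, below the 3-ary threshold c*(3) ≈ 0.918
trials = 50
hits = sum(has_left_perfect_matching(random_left_regular(n, m, d, rng), m)
           for _ in range(trials))
print(f"{hits}/{trials} random graphs had a left-perfect matching")
```

At this load, well below the threshold, most trials should find a matching; pushing c above c*(∆) makes the success rate collapse.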

Tight Bounds for Blind Search on the Integers and the Reals

Combinatorics, Probability & Computing, Dec 18, 2009

Counting Zeros in Random Walks on the Integers and Analysis of Optimal Dual-Pivot Quicksort

arXiv (Cornell University), Feb 12, 2016

We present an average-case analysis of two variants of dual-pivot quicksort: one with a non-algorithmic, comparison-optimal partitioning strategy, the other with a closely related algorithmic strategy. For both we calculate the expected number of comparisons exactly as well as asymptotically; in particular, we provide exact expressions for the linear, logarithmic, and constant terms. An essential step is the analysis of zeros of lattice paths in a certain probability model. Along the way a combinatorial identity is proven.
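
Dual-pivot partitioning can be illustrated with a classic Yaroslavskiy-style implementation. This is a generic sketch, not the comparison-optimal strategy analysed in the paper: two pivots p ≤ q split the array into three segments (< p, between, > q).

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """In-place dual-pivot quicksort (classic Yaroslavskiy-style partitioning)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]          # two pivots, p <= q
    lt, gt, i = lo + 1, hi - 1, lo + 1
    while i <= gt:
        if a[i] < p:             # element belongs left of p
            a[i], a[lt] = a[lt], a[i]
            lt += 1
        elif a[i] > q:           # element belongs right of q
            while a[gt] > q and i < gt:
                gt -= 1
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
            if a[i] < p:
                a[i], a[lt] = a[lt], a[i]
                lt += 1
        i += 1
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]  # move pivots to their final positions
    a[hi], a[gt] = a[gt], a[hi]
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)

xs = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
dual_pivot_quicksort(xs)
print(xs)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The analysed strategies differ from this sketch only in which pivot an element is compared against first; that choice is what drives the comparison count.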

07391 Abstracts Collection – Probabilistic Methods in the Design and Analysis of Algorithms

From 23.09.2007 to 28.09.2007, the Dagstuhl Seminar 07391 "Probabilistic Methods in the Design and Analysis of Algorithms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. The seminar brought together leading researchers in probabilistic methods to strengthen and foster collaborations among various areas of Theoretical Computer Science. The interaction between researchers using randomization in algorithm design and researchers studying known algorithms and heuristics in probabilistic models enhanced the research of both groups in developing new complexity frameworks and in obtaining new algorithmic results. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general...

Theory and Applications of Hashing (Dagstuhl Seminar 17181)

Dagstuhl Reports, 2017

This report documents the program and the topics discussed at the 4-day Dagstuhl Seminar 17181 "Theory and Applications of Hashing", which took place May 1-5, 2017. Four long and eighteen short talks covered a wide and diverse range of topics within the theme of the workshop. The program left sufficient space for informal discussions among the 40 participants.

A Subquadratic Algorithm for 3XOR

ArXiv, 2018

Given a set X of n binary words of equal length w, the 3XOR problem asks for three elements a, b, c ∈ X such that a ⊕ b = c, where ⊕ denotes the bitwise XOR operation. The problem can easily be solved on a word RAM with word length w in time O(n^2 log n). Using Han's fast integer sorting algorithm (2002/2004) this can be reduced to O(n^2 log log n). With randomization or a sophisticated deterministic dictionary construction, creating a hash table for X with constant lookup time leads to an algorithm with (expected) running time O(n^2). At present, seemingly no faster algorithms are known. We present a surprisingly simple deterministic, quadratic-time algorithm for 3XOR. Its core is a version of the Patricia trie for X, which makes it possible to traverse the set a ⊕ X in ascending order for arbitrary a ∈ {0, 1}^w in linear time. Furthermore, we describe a randomized algorithm for 3XOR with expected running time O(n^2 \cdot \m...
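
The O(n^2) hash-table baseline mentioned in the abstract is easy to sketch. The code below is that naive quadratic algorithm, not the paper's Patricia-trie or subquadratic method: put X in a hash set and test a ⊕ b for every pair.

```python
def three_xor(X):
    """Naive 3XOR with O(n^2) expected time: hash-set lookups over all pairs."""
    S = set(X)
    for a in X:
        for b in X:
            if (a ^ b) in S:
                return a, b, a ^ b  # a XOR b = c with a, b, c all in X
    return None

print(three_xor([3, 5, 6, 9]))  # 3 ^ 5 == 6, so a triple exists
print(three_xor([1, 2, 4]))     # no pairwise XOR lands back in the set
```

Note that a = b is allowed here, so a set containing 0 always yields the trivial triple (a, a, 0); variants of the problem exclude such degenerate solutions.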

Constant-Time Retrieval with O(log m) Extra Bits

For a set U (the universe), retrieval is the following problem. Given a finite subset S ⊆ U of size m and f : S → {0, 1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f, x) = f(x) for all x ∈ S. For x ∈ U \ S the value query(D_f, x) is arbitrary in {0, 1}^r. The number of bits needed for D_f should be (1 + ε)rm with overhead ε = ε(m) ≥ 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well. We assume fully random hash functions on U with constant evaluation time are available. It is known that with ε ≈ 0.09 one can achieve linear construction time and constant query time, and with overhead ε_k ≈ e^−k it is possible to have O(k) query time and O(m^(1+α)) construction time, for arbitrary α > 0. Furthermore, a theoretical construction with ε = O((log log m)/√(log m)) gives constant query time and linear construction time. Known constructions avoidi...

Dual-Pivot Quicksort: Optimality, Analysis and Zeros of Associated Lattice Paths

Combinatorics, Probability and Computing, 2018

We present an average-case analysis of a variant of dual-pivot quicksort. We show that the algorithmic partitioning strategy used is optimal; that is, it minimizes the expected number of key comparisons. For the analysis, we calculate the expected number of comparisons exactly as well as asymptotically; in particular, we provide exact expressions for the linear, logarithmic and constant terms. An essential step is the analysis of zeros of lattice paths in a certain probability model. Along the way a combinatorial identity is proved.

How Good Is Multi-Pivot Quicksort?

ACM Transactions on Algorithms, 2016

Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step k pivots are used to split the input into k + 1 segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a two-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This article studies what possible advantages multi-pivot quicksort might offer in general. The contributions are as follows: Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal, and these strategies are inferior to simpler strategies such as the well-known median-of-k approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorith...

Tight bounds for blind search on the integers

Symposium on Theoretical Aspects of Computer Science, Dec 18, 2008

Towards Optimal Degree Distributions for Left-Perfect Matchings in Random Bipartite Graphs

Theory of Computing Systems, 2014

Cuckoo Hashing with Pages

Lecture Notes in Computer Science, 2011

Although cuckoo hashing has significant applications in both theoretical and practical settings, a relevant downside is that it requires lookups to multiple locations. In many settings where lookups are expensive, cuckoo hashing becomes a less compelling alternative. One such standard setting is when memory is arranged in large pages, and a major cost is the number of page accesses. We propose the study of cuckoo hashing with pages, advocating approaches where each key has several possible locations, or cells, on a single page, and additional choices on a second backup page. We show experimentally that with k cell choices on one page and a single backup cell choice, one can achieve nearly the same loads as when each key has k + 1 random cells to choose from, with most lookups requiring just one page access, even when keys are placed online using a simple algorithm. While our results are currently experimental, they suggest several interesting new open theoretical questions for cuckoo hashing with pages.
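
For context, plain (unpaged) cuckoo hashing with two choices per key can be sketched as follows; the paged variant studied in the paper adds the page structure on top of this. The class name and parameters are illustrative, not from the paper.

```python
import random

class CuckooHash:
    """Baseline 2-choice cuckoo hashing sketch: each key has two candidate
    cells; insertion evicts occupants and relocates them along a path."""

    def __init__(self, size, seed=0):
        self.size = size
        self.table = [None] * size
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))

    def _cells(self, key):
        # Two candidate cells per key; tuple hashing of ints is deterministic.
        return [hash((key, s)) % self.size for s in self.salts]

    def lookup(self, key):
        return any(self.table[c] == key for c in self._cells(key))

    def insert(self, key, max_kicks=500):
        cell = self._cells(key)[0]
        for _ in range(max_kicks):
            if self.table[cell] is None:
                self.table[cell] = key
                return True
            self.table[cell], key = key, self.table[cell]  # evict the occupant
            c0, c1 = self._cells(key)
            cell = c1 if cell == c0 else c0  # send the evictee to its other cell
        return False  # a real implementation would rehash here

t = CuckooHash(100)
ok = all(t.insert(k) for k in range(30))  # load 0.3, well below the 0.5 threshold
print("all inserted:", ok)
```

Every lookup touches up to two arbitrary cells; the paper's point is that when those cells land on different memory pages, both accesses are expensive, motivating the paged layout.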

Tight Thresholds for Cuckoo Hashing via XORSAT

Automata, Languages and Programming, 2010

Optimal Partitioning for Dual Pivot Quicksort

Lecture Notes in Computer Science, 2013

Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract)

Lecture Notes in Computer Science

The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f : U → {0, 1}^r that has specified values on the elements of a given set S ⊆ U, |S| = n, but may have any value on elements outside S. All known methods (e.g. those based on perfect hash functions) induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^−k of optimal, asymptotically for large n. The time to construct the data structure is O(n), expected. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space o(n) to simulate full randomness on a RAM.
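
The kind of retrieval structure discussed here can be sketched with the standard XOR/linear-algebra approach: give each key a few hash positions in a table and solve for table entries over GF(2) so that the XOR of the key's positions equals its value. This is a simplified 1-bit illustration with assumed parameters (3 positions, roughly 1.23n slots, which suffice with high probability for random 3-position systems), not the paper's construction; the solver is plain Gaussian elimination rather than a linear-time method.

```python
import random

def build_retrieval(keys, values, r=3, m_factor=1.23, seed=0):
    """XOR retrieval sketch: find a bit table T with
    f(x) = T[p1] ^ T[p2] ^ T[p3] for the r hash positions p1..pr of x,
    by solving one GF(2) equation per key."""
    n = len(keys)
    m = max(int(m_factor * n) + 1, r)

    def positions(x):
        rng = random.Random(f"ret-{seed}-{x}")  # deterministic per-key hashes
        return [rng.randrange(m) for _ in range(r)]

    pivots = {}  # pivot column -> (row bitmask over the m slots, right-hand side)
    for x, v in zip(keys, values):
        mask = 0
        for p in positions(x):
            mask ^= 1 << p  # duplicate positions cancel, matching the XOR query
        while mask:
            col = mask.bit_length() - 1
            if col not in pivots:
                pivots[col] = (mask, v)
                break
            pmask, pv = pivots[col]
            mask ^= pmask
            v ^= pv
        else:
            if v:
                raise ValueError("unsolvable system; retry with another seed")
    # Back-substitution: free columns stay 0; solve pivot columns low to high.
    table = [0] * m
    for col in sorted(pivots):
        mask, v = pivots[col]
        acc = v
        for c in range(col):
            if (mask >> c) & 1:
                acc ^= table[c]
        table[col] = acc
    return table, positions

keys = list(range(50))
values = [k % 2 for k in keys]
for seed in range(20):  # a few retries cover the rare unsolvable seed
    try:
        table, pos = build_retrieval(keys, values, seed=seed)
        break
    except ValueError:
        pass
ok = all(table[a] ^ table[b] ^ table[c] == v
         for k, v in zip(keys, values)
         for a, b, c in [pos(k)])
print("all keys retrieved:", ok)
```

Queries for keys outside S return whatever the three table bits happen to XOR to, exactly the "arbitrary value outside S" behaviour the problem statement allows.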

Tight Bounds for Blind Search on the Integers and the Reals

Combinatorics, Probability and Computing, 2009

We analyse a simple random process in which a token is moved in the interval A = {0, . . ., n}. Fix a probability distribution μ over D = {1, . . ., n}. Initially, the token is placed in a random position in A. In round t, a random step size d is chosen according to μ. If the token is in position x ≥ d, then it is moved to position x − d. Otherwise it stays put. Let T_X be the number of rounds until the token reaches position 0. We show tight bounds for the expectation E_μ(T_X) of T_X for varying distributions μ. More precisely, we show that min_μ{E_μ(T_X)} = Θ((log n)^2). The same bounds are proved for the analogous continuous process, where step sizes and token positions are real values in [0, n + 1), and one measures the time until the token has reached a point in [0, 1). For the proofs, a novel potential function argument is introduced. The research is motivated by the problem of approximating the minimum of a continuous function over [0, 1] with a ‘blind’ opt...
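
The discrete process is easy to simulate. The sketch below uses a dyadic "pick a scale, then a uniform step" distribution, a common stand-in for near-optimal harmonic-type distributions; the parameters and sampler are illustrative, not the paper's exact construction.

```python
import math
import random

def blind_search_rounds(n, step_sampler, rng):
    """Run the token process once: start uniform in {0, ..., n}, subtract the
    drawn step whenever possible, and count rounds until the token hits 0."""
    x = rng.randrange(n + 1)
    rounds = 0
    while x != 0:
        d = step_sampler(rng)
        if x >= d:
            x -= d
        rounds += 1
    return rounds

n = 2 ** 10

def dyadic_step(rng):
    # Pick a scale 2^j uniformly over j = 0..10, then a step in {1, ..., 2^j}.
    j = rng.randrange(11)
    return rng.randrange(2 ** j) + 1

rng = random.Random(7)
trials = 200
avg = sum(blind_search_rounds(n, dyadic_step, rng) for _ in range(trials)) / trials
print(f"average rounds ~ {avg:.0f}; (log2 n)^2 = {math.log2(n) ** 2:.0f}")
```

Intuitively, each of the ~log n scales is drawn with probability ~1/log n, and a draw at the token's current scale makes constant-factor progress, which is where the (log n)^2 behaviour comes from.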

A Simple Hash Class with Strong Randomness Properties in Graphs and Hypergraphs

ArXiv, 2016

We study randomness properties of graphs and hypergraphs generated by simple hash functions. Several hashing applications can be analyzed by studying the structure of d-uniform random (d-partite) hypergraphs obtained from a set S of n keys and d randomly chosen hash functions h_1, …, h_d by associating each key x ∈ S with a hyperedge {h_1(x), …, h_d(x)}. Often it is assumed that h_1, …, h_d exhibit a high degree of independence. We present a simple construction of a hash class whose hash functions have small constant evaluation time and can be stored in sublinear space. We devise general techniques to analyze the randomness properties of the graphs and hypergraphs generated by these hash functions, and we show that they can replace other, less efficient constructions in cuckoo hashing (with and without stash), the simulation of a uniform hash function, the construction of a perfect hash function, generalized cuckoo hashing and different load balancing sce...

Design Strategies for Minimal Perfect Hash Functions

Stochastic Algorithms: Foundations and Applications

A minimal perfect hash function h for a set S ⊆ U of size n is a function h : U → {0, …, n − 1} that is one-to-one on S. The complexity measures of interest are storage space for h, evaluation time (which should be constant), and construction time. The talk gives an overview of several recent randomized constructions of minimal...

Dense Peelable Random Uniform Hypergraphs

We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e., of admitting no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ≈ 0.918, f_4 ≈ 0.977, f_5 ≈ 0.992, ...) are well beyond the corresponding thresholds (c_3 ≈ 0.818, c_4 ≈ 0.772, c_5 ≈ 0.702, ...) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on f...
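
The peeling process itself (repeatedly deleting a vertex of degree at most 1 together with its incident edge) can be sketched directly. The code below runs it on a standard 3-uniform random hypergraph at a density below c_3 ≈ 0.818, where peeling succeeds with high probability; it does not model the paper's segmented construction.

```python
import random
from collections import defaultdict

def peel(edges):
    """Peel: repeatedly take a vertex of degree 1 and delete its incident edge.
    Returns True iff every edge is deleted (i.e. the 2-core is empty)."""
    incidence = defaultdict(set)  # vertex -> indices of alive incident edges
    for i, e in enumerate(edges):
        for v in e:
            incidence[v].add(i)
    alive = set(range(len(edges)))
    stack = [v for v, es in incidence.items() if len(es) == 1]
    while stack:
        v = stack.pop()
        if len(incidence[v]) != 1:
            continue  # v's degree changed since it was pushed
        (i,) = incidence[v]
        alive.remove(i)
        for u in edges[i]:
            incidence[u].discard(i)
            if len(incidence[u]) == 1:
                stack.append(u)
    return not alive

rng = random.Random(3)
n_vertices, k, density = 3000, 3, 0.7   # below c_3 ≈ 0.818: peelable whp
edges = [tuple(rng.randrange(n_vertices) for _ in range(k))
         for _ in range(int(density * n_vertices))]
result = peel(edges)
print("fully peeled:", result)
```

Raising the density above the threshold makes a non-empty 2-core appear and the same run report failure, which is exactly the transition the f_k and c_k values quantify.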
