Lower bounds for union-split-find related problems on random access machines

Succinct Dynamic Ordered Sets with Random Access

arXiv, 2020

The representation of a dynamic ordered set of $n$ integer keys drawn from a universe of size $m$ is a fundamental data structuring problem. Many solutions to this problem achieve optimal time but take polynomial space; preserving time optimality in the \emph{compressed} space regime is therefore the problem we address in this work. For a polynomial universe $m = n^{\Theta(1)}$, we give a solution that takes $\textsf{EF}(n,m) + o(n)$ bits, where $\textsf{EF}(n,m) \leq n\lceil \log_2(m/n)\rceil + 2n$ is the cost in bits of the \emph{Elias-Fano} representation of the set, and supports random access to the $i$-th smallest element in $O(\log n/\log\log n)$ time, and updates and predecessor search in $O(\log\log n)$ time. These time bounds are optimal.
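To make the Elias-Fano space bound above concrete, the following is a minimal Python sketch of the static encoding only (not the paper's dynamic structure; all names are ours): each of the n sorted values contributes ⌈log2(m/n)⌉ verbatim low bits plus a unary-coded high part, for roughly n⌈log2(m/n)⌉ + 2n bits in total.

```python
import math

def elias_fano_encode(values, m):
    """Illustrative static Elias-Fano encoding of a sorted list of distinct
    integers in [0, m). Each value's l = ceil(log2(m/n)) low-order bits are
    stored verbatim; the high parts are stored as unary gaps in a bit vector,
    for roughly n*l + 2n bits overall."""
    n = len(values)
    l = max(0, math.ceil(math.log2(m / n)))           # low bits per value
    low_bits = [v & ((1 << l) - 1) for v in values]   # n * l bits
    high_bits = []                                    # at most ~2n bits
    prev_high = 0
    for v in values:
        high = v >> l
        high_bits.extend([0] * (high - prev_high))    # unary gap between buckets
        high_bits.append(1)                           # one 1-bit per element
        prev_high = high
    return low_bits, high_bits, l

def elias_fano_access(i, low_bits, high_bits, l):
    """Return the i-th smallest value (0-indexed). The linear scan is for
    illustration only; with rank/select support this access is much faster."""
    ones_seen, high = 0, 0
    for bit in high_bits:
        if bit == 1:
            if ones_seen == i:
                return (high << l) | low_bits[i]
            ones_seen += 1
        else:
            high += 1
    raise IndexError(i)

# Example: the set {3, 4, 7, 13, 14, 15, 21, 43} over a universe of size 64.
lows, highs, l = elias_fano_encode([3, 4, 7, 13, 14, 15, 21, 43], 64)
assert elias_fano_access(3, lows, highs, l) == 13
```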

Simpler Analyses of Union-Find

arXiv, 2023

We analyze union-find using potential functions motivated by continuous algorithms, and give alternate proofs of the O(log log n), O(log* n), O(log** n), and O(α(n)) amortized cost upper bounds. The proof of the O(log log n) amortized bound goes as follows. Let each node's potential be the square root of its size, i.e., the size of the subtree rooted at it. The overall potential increase is O(n) because the node sizes increase geometrically along any tree path. When compressing a path, each node on the path either has its potential decrease by Ω(1), or its child's size along the path is less than the square root of its own size; the latter can happen at most O(log log n) times along any tree path.
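As a concrete companion to this argument, here is a small illustrative Python sketch (ours, not code from the paper) of union-find with linking by size and path compression, together with the square-root potential the proof sketch above refers to.

```python
import math

class UnionFind:
    """Union-find with linking by size and path compression (illustrative)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # subtree size; frozen once a node stops being a root

    def find(self, x):
        path = []
        while self.parent[x] != x:   # walk to the root...
            path.append(x)
            x = self.parent[x]
        for node in path:            # ...then compress the whole path
            self.parent[node] = x
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # link the smaller tree under the larger
        self.size[ra] += self.size[rb]

    def potential(self):
        # Potential from the O(log log n) analysis sketched above: the sum of
        # sqrt(size) over all nodes (non-root sizes are frozen at link time).
        return sum(math.sqrt(s) for s in self.size)
```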

A lower bound for set intersection queries

We consider the following set intersection reporting problem. We have a collection of initially empty sets and would like to process an intermixed sequence of n updates (insertions into and deletions from individual sets) and q queries (reporting the intersection of two sets). We cast this problem in the arithmetic model of computation of Fredman [Fre81] and Yao [Yao85] and show that any algorithm that fits in this model must take time Ω(q + n√q) to process a sequence of n updates and q queries, ignoring factors that are polynomial in log n. We also show that this bound is tight in this model of computation, again to within a polynomial in log n factor, improving upon a result of Yellin [Yel92]. Furthermore, we consider the case q = O(n) with an additional space restriction: we only allow the use of m memory locations, where m ≤ n^{3/2}. We show a tight bound of Θ(n^2/m^{1/3}) for a sequence of O(n) operations, again ignoring factors that are polynomial in log n.
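For orientation only, a naive baseline for this reporting problem (ours, purely illustrative and far from the bounds above) maintains every set explicitly, giving constant-time updates and queries that cost time proportional to the smaller of the two sets.

```python
class SetIntersectionReporter:
    """Naive baseline for the set intersection reporting problem:
    constant-time updates, queries linear in the smaller set."""

    def __init__(self):
        self.sets = {}  # set identifier -> Python set of elements

    def insert(self, name, x):
        self.sets.setdefault(name, set()).add(x)

    def delete(self, name, x):
        self.sets.get(name, set()).discard(x)

    def intersection(self, a, b):
        sa = self.sets.get(a, set())
        sb = self.sets.get(b, set())
        if len(sa) > len(sb):
            sa, sb = sb, sa
        return sorted(x for x in sa if x in sb)  # report the intersection
```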

The Average Case Complexity of the Parallel Prefix Problem

1994

We analyse the average case complexity of evaluating all prefixes of an input vector over a given semigroup. As the computational model, circuits over the semigroup are used, and a complexity measure for the average delay of such circuits, called time, is introduced. Based on this notion, we then define the average case complexity of a computational problem for arbitrary input distributions. For highly nonuniform distributions the average case complexity turns out to be as large as the worst case complexity. Thus, in order to make the average case analysis meaningful, we also develop a complexity measure for distributions. Using this framework we show that two n-bit numbers can be added with an average delay of order log log n for a large class of distributions. We then give a complete characterization of the average case complexity of the parallel prefix problem with respect to the underlying semigroup. By considering a related reachability problem for finite automata, it is shown that the complexity depends only on a property of the semigroup we will call confluence. Our analysis yields that only two different cases can arise for the reachability question. We show that the parallel prefix problem either can be solved with an average delay of order log log n, that is, with an exponential speedup compared to the worst case, or, for nonconfluent semigroups, that no speedup is possible. Circuit designs are presented that, for confluent semigroups, achieve the optimal double logarithmic delay while keeping the circuit size linear. The analysis and results are illustrated with some concrete functions. For the n-ary Boolean OR, THRESHOLD, and PARITY, for example, the average case circuit delay is determined exactly up to small constant factors for arbitrary distributions. Finally, we determine the complexity of the reachability problem itself and show that it is at most quadratic in the size of the semigroup.
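As a reminder of the underlying problem, here is a short illustrative Python sketch (ours) of the parallel prefix computation over an arbitrary associative operation; the doubling scheme mirrors the O(log n)-depth circuit, whereas the paper's concern is the average, rather than worst-case, delay.

```python
def prefix_scan(xs, op):
    """All prefixes of xs under the associative operation op.

    Sequentially this is a plain left fold; the doubling scheme below mirrors
    the O(log n)-depth circuit: in round k, position i combines with position
    i - 2^k, so after ceil(log2 n) rounds every prefix is complete.
    """
    result = list(xs)
    step = 1
    while step < len(result):
        # One round of the circuit; all positions could be updated in parallel.
        result = [
            result[i] if i < step else op(result[i - step], result[i])
            for i in range(len(result))
        ]
        step *= 2
    return result

# Example: prefix OR over bits, the n-ary Boolean OR instance from the abstract.
assert prefix_scan([0, 0, 1, 0, 1], lambda a, b: a | b) == [0, 0, 1, 1, 1]
```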

A Generalization of a Lower Bound Technique due to Fredman and Saks

Algorithmica, 2001

In a seminal paper of 1989, Fredman and Saks proved lower bounds for some important data structure problems in the cell probe model. In particular, lower bounds were established on worst-case and amortized operation cost for the union-find problem and the prefix sum problem. The goal of this paper is to turn their proof technique into a general tool that can be applied to different problems and computational models. To this end we define two quantities: Output Variability depends only on the model of computation. It indicates how much variation can be found in the results of a program with certain resource bounds; this measures, in some sense, the power of a model. Problem Variability characterizes in a similar sense the difficulty of the problem. Our Main Theorem shows that by comparing a model's output variability to a problem's problem variability, lower bounds on the complexity of solving the problem on the given model may be inferred. The theorem thus shows how to separate the analysis of the model of computation from that of the problem when proving lower bounds. We show how the results of Fredman and Saks fit our framework by computing the output variability of the cell probe model and the problem variability for the problems considered in their paper. This allows us to reprove their lower bound results and slightly extend them. The main purpose of this paper, though, is to present the generalized technique.

The Maximum Size of Dynamic Data Structures

This paper develops two probabilistic methods that allow the analysis of the maximum data structure size encountered during a sequence of insertions and deletions in data structures such as priority queues, dictionaries, linear lists, and symbol tables, and in sweepline structures for geometry and Very-Large-Scale-Integration (VLSI) applications. The notion of the "maximum" is basic to issues of resource preallocation. The methods here are applied to combinatorial models of file histories and probabilistic models, as well as to a non-Markovian process (algorithm) for processing sweepline information in an efficient way, called "hashing with lazy deletion" (HwLD). Expressions are derived for the expected maximum data structure size that are asymptotically exact, that is, correct up to lower-order terms; in several cases of interest the expected value of the maximum size is asymptotically equal to the maximum expected size. This solves several open problems, including longstanding questions in queueing theory. Both of these approaches are robust and rely upon novel applications of techniques from the analysis of algorithms. At a high level, the first method isolates the primary contribution to the maximum and bounds the lesser effects. In the second technique the continuous-time probabilistic model is related to its discrete analog: the maximum slot occupancy in hashing.
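The paper's non-Markovian example, hashing with lazy deletion (HwLD), can be sketched as follows; this Python toy (interface and names are ours) keeps each item until its bucket is next touched after the item has expired, so physical deletions are deferred.

```python
class HashingWithLazyDeletion:
    """Toy sketch of hashing with lazy deletion (HwLD): items carry an
    expiration time and are physically removed only when their bucket is
    next accessed after they expire."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def _purge(self, bucket, now):
        # Lazy deletion: drop every expired item found in this bucket.
        bucket[:] = [(k, v, exp) for (k, v, exp) in bucket if exp > now]

    def insert(self, key, value, expire_time, now):
        bucket = self._bucket(key)
        self._purge(bucket, now)
        bucket.append((key, value, expire_time))

    def lookup(self, key, now):
        bucket = self._bucket(key)
        self._purge(bucket, now)
        for k, v, _ in bucket:
            if k == key:
                return v
        return None
```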

The set union problem with dynamic weighted backtracking

BIT, 1991

We consider an extension of the set union problem, in which dynamic weighted backtracking over sequences of unions is permitted. We present a new data structure which can support each operation in O(log n) time in the worst case. We prove that this bound is tight for pointer-based algorithms. Furthermore, we design a different data structure to achieve better amortized bounds. The space complexity of both our data structures is O(n). Motivations for studying this problem arise in logic programming memory management.
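To make the operation set concrete, the following illustrative Python sketch (ours, not the paper's data structure) supports union, find, and rollback of the most recent union: linking by size without path compression keeps every operation at O(log n) worst case, and each union pushes an undo record. The paper's dynamic weighted backtracking is more general than this single-step rollback.

```python
class BacktrackingUnionFind:
    """Union-find with linking by size, no path compression, and an undo
    stack so the most recent unions can be rolled back (illustrative)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.history = []  # stack of (child_root, parent_root) links, or None

    def find(self, x):
        while self.parent[x] != x:  # O(log n): linking by size bounds tree height
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            self.history.append(None)          # remember a no-op union
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.history.append((rb, ra))

    def backtrack(self):
        """Undo the most recent union, if any."""
        if not self.history:
            return
        record = self.history.pop()
        if record is not None:
            child, parent = record
            self.parent[child] = child
            self.size[parent] -= self.size[child]
```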

Memory management for union-find algorithms

1997

We provide a general tool to improve the real-time performance of a broad class of Union-Find algorithms. This is done by minimizing the random access memory that is used, thereby avoiding the well-known von Neumann bottleneck of synchronizing CPU and memory. A main application to image segmentation algorithms is demonstrated, where the real-time performance is drastically improved.

On the computational complexity of dynamic graph problems

Theoretical Computer Science, 1996

A common way to evaluate the time complexity of an algorithm is to use asymptotic worst-case analysis and to express the cost of the computation as a function of the size of the input. However, for an incremental algorithm this kind of analysis is sometimes not very informative. (By an "incremental algorithm," we mean an algorithm for a dynamic problem.) When the cost of the computation is expressed as a function of the size of the (current) input, several incremental algorithms that have been proposed run in time asymptotically no better, in the worst-case, than the time required to perform the computation from scratch. Unfortunately, this kind of information is not very helpful if one wishes to compare different incremental algorithms for a given problem.