Succinct Data Structures Research Papers (original) (raw)
Given a sequence S of n symbols over some alphabet Σ of size σ, we develop new compression methods that are (i) very simple to implement; (ii) provide O(1) time random access to any symbol (or short substring) of the original sequence.... more
Given a sequence S of n symbols over some alphabet Σ of size σ, we develop new compression methods that are (i) very simple to implement; (ii) provide O(1) time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H0(S) + 1), and H0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve n(Hk(S)+1)+o(n(Hk(S)+1)) bits of space, for k = o(log σ(n)). Several applications are discussed, including text compression, (compressed) full-text indexing and string matching.
We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most state-of-the-art succinct data structures for... more
We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most state-of-the-art succinct data structures for dictionaries, ordinal trees, labeled trees, and text collections. Of particular note is its direct application to XML indexing structures that answer subpath queries [2]. Our framework focuses on achieving information-theoretically optimal space along with near-optimal update/query bounds. As the main part of our work, we consider the following problem central to text indexing: Given a text T over an alphabet Σ, construct a compressed data structure answering the queries char(i), rank s (i), and select s (i) for a symbol s ∈ Σ. Many data structures consider these queries for static text T [5,3,16,4]. We build on these results and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O(n ε ), any static succinct data structure D for T, taking t(n) time for queries, can be converted by our framework into a dynamic succinct data structure that supports rank s (i), select s (i), and char(i) queries in O(t(n) + loglogn) time, for any constant ε> 0. When |Σ| = polylog(n), we achieve O(1) query times. Our update/query bounds are near-optimal with respect to the lower bounds from [13].