Pavel Veselý - Academia.edu
Papers by Pavel Veselý
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: t-digest and ReqSketch. While t-digest is a popular compact summary which works well in a variety of settings, ReqSketch comes with formal accuracy guarantees at the cost of its size growing as new observations are inserted. In this work, we provide insight into which conditions make one preferable to the other. Specifically, we show how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrate that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution. We propose practical improvements to ReqSketch, making it faster than t-digest, while its error stays bounded on any instance. Still, our results confirm that t-digest remains more accurate on the "non-adversarial" data encountered in practice. CCS Concepts: Theory of computation → Sketching and sampling; Streaming models; Data structures and algorithms for data management.
Theory of Computing Systems
Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well studied when the volume of the input is truly massive and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value. We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For Bin Packing, we provide a streaming asymptotic (1 + ε)-approximation with $\widetilde{O}\left(\frac{1}{\varepsilon}\right)$ space, where $\widetilde{O}$ hides logarithmic factors. Moreover, such a space bound is essentially optimal. ...
Annals of Operations Research
In Packet Scheduling with Adversarial Jamming, packets of arbitrary sizes arrive over time to be transmitted over a channel in which instantaneous jamming errors occur at times chosen by the adversary and not known to the algorithm. The transmission taking place at the time of jamming is corrupt, and the algorithm learns this fact immediately. An online algorithm maximizes the total size of packets it successfully transmits and the goal is to develop an algorithm with the lowest possible asymptotic competitive ratio, where the additive constant may depend on packet sizes. Our main contribution is a universal algorithm that works for any speedup and packet sizes and, unlike previous algorithms for the problem, it does not need to know these parameters in advance. We show that this algorithm guarantees 1-competitiveness with speedup 4, making it the first known algorithm to maintain 1-competitiveness with a moderate speedup in the general setting of arbitrary packet sizes. We also prove a lower bound of φ + 1 ≈ 2.618 on the speedup of any 1-competitive deterministic algorithm, showing that our algorithm is close to the optimum. Additionally, we formulate a general framework for analyzing our algorithm locally and use it to show upper bounds on its competitive ratio for speedups in [1, 4) and for several special cases, recovering some previously known results, each of which had a dedicated proof. In particular, our algorithm is 3-competitive without speedup, matching both the (worst-case) performance of the algorithm by Jurdzinski et al.
Journal of Scheduling
Online Bin Stretching is a semi-online variant of bin packing in which the algorithm has to use the same number of bins as an optimal packing, but is allowed to slightly overpack the bins. The goal is to minimize the amount of overpacking, i.e., the maximum size packed into any bin. We give an algorithm for Online Bin Stretching with a stretching factor of 11/8 = 1.375 for three bins. Additionally, we present a lower bound of 45/33 ≈ 1.3636 for Online Bin Stretching on three bins and a lower bound of 19/14 ≈ 1.3571 for four and five bins that were discovered using a computer search.
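To make the objective concrete, the following is a minimal sketch, not taken from the paper, that evaluates the stretching factor of one given packing and compares the bounds quoted in the abstract using exact rational arithmetic. The function name `stretching_factor` and the three-bin instance are illustrative assumptions; item sizes are assumed to be scaled so that an optimal packing fits every bin within capacity 1.

```python
from fractions import Fraction

def stretching_factor(bins):
    """Return the maximum load over all bins.

    Assumes item sizes are scaled so that an optimal packing fits
    every bin within capacity 1; the returned value is then the
    stretching factor of this particular packing.
    """
    return max(sum(items, Fraction(0)) for items in bins)

# Hypothetical instance: the same items fit into three unit bins as
# {1/2, 1/2}, {3/4, 1/4}, {1}, so the optimum uses exactly three bins.
packing = [
    [Fraction(1, 2), Fraction(3, 4)],   # load 5/4 (overpacked)
    [Fraction(1, 2), Fraction(1, 4)],   # load 3/4
    [Fraction(1, 1)],                   # load 1
]
factor = stretching_factor(packing)

# Bounds quoted in the abstract, kept as exact fractions.
upper_three_bins = Fraction(11, 8)
lower_three_bins = Fraction(45, 33)
lower_four_five_bins = Fraction(19, 14)

print("stretching factor of the example packing:", factor)
print("gap between upper and lower bound for three bins:",
      upper_three_bins - lower_three_bins)
```

Exact fractions avoid the rounding ambiguity that appears when bounds this close together are compared as decimals.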
Journal of Combinatorial Optimization
Online Bin Stretching is a semi-online variant of bin packing in which the algorithm has to use the same number of bins as an optimal packing, but is allowed to slightly overpack the bins. The goal is to minimize the amount of overpacking, i.e., the maximum size packed into any bin. We give an algorithm for Online Bin Stretching with a stretching factor of 1.5 for any number of bins. We build on previous algorithms and use a two-phase approach. However, our analysis is technically more complicated and uses amortization over the bins with the help of two weight functions. History. Online Bin Stretching has been proposed by Azar and Regev [3,4]. The original lower bound of 4/3 for three bins has appeared even before that, in [16], ... Supported by the project 14-10003S of GAČR and by the GAUK project 548214.
Journal of Combinatorial Optimization
Lecture Notes in Computer Science, 2016
In the online graph coloring problem, vertices from a graph G, known in advance, arrive in an online fashion and an algorithm must immediately assign a color to each incoming vertex v so that the revealed graph is properly colored. The exact location of v in the graph G is not known to the algorithm, since it sees only previously colored neighbors of v. The online chromatic number of G is the smallest number of colors such that some online algorithm is able to properly color G for any incoming order. We prove that computing the online chromatic number of a graph is PSPACE-complete.
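The online model described above can be illustrated with a minimal sketch. It uses the classical First Fit rule, which is an assumption chosen for illustration and not the algorithm or construction from the paper. First Fit also ignores the advance knowledge of G that the model grants, so it only shows how the arrival order affects the number of colors used. The function name and the path graph are hypothetical.

```python
def first_fit_online_coloring(adjacency, arrival_order):
    """Color vertices one by one in the given arrival order.

    Each vertex receives the smallest color not used by its
    neighbors that have already arrived. Returns vertex -> color.
    """
    color = {}
    for v in arrival_order:
        # Only previously colored neighbors are visible to the algorithm.
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical example graph: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

in_order = first_fit_online_coloring(path, ["a", "b", "c", "d"])
ends_first = first_fit_online_coloring(path, ["a", "d", "b", "c"])

colors_in_order = len(set(in_order.values()))
colors_ends_first = len(set(ends_first.values()))
print(colors_in_order, colors_ends_first)
```

On this path, First Fit needs two colors when vertices arrive along the path but three when both endpoints arrive first, even though the graph is bipartite.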
Lecture Notes in Computer Science, 2015
In the Colored Bin Packing problem a sequence of items of sizes up to 1 arrives to be packed into bins of unit capacity. Each item has one of c ≥ 2 colors and an additional constraint is that we cannot pack two items of the same color next to each other in the same bin. The objective is to minimize the number of bins. In the important special case when all items have size zero, we characterize the optimal value to be equal to color discrepancy. As our main result, we give an (asymptotically) 1.5-competitive algorithm which is optimal. In fact, the algorithm always uses at most 1.5 · OPT bins and we show a matching lower bound of 1.5 · OPT for any value of OPT ≥ 2. In particular, the absolute ratio of our algorithm is 5/3 and this is optimal. For items of unrestricted sizes we give a lower bound of 2.5 and an absolutely 3.5-competitive algorithm. When the items have sizes at most 1/d for a real d ≥ 2 the asymptotic competitive ratio is 1.5 + d/(d − 1). We also show that classical algorithms First Fit, Best Fit and Worst Fit are not constant competitive, which holds already for three colors and small items. In the case of two colors (the Black and White Bin Packing problem), we prove that all Any Fit algorithms have the absolute competitive ratio 3. When the items have sizes at most 1/d for a real d ≥ 2 we show that the Worst Fit algorithm is absolutely (1 + d/(d − 1))-competitive.
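The packing constraint stated above can be checked with a short sketch. It assumes a bin is represented as a list of (size, color) pairs in the order the items were packed; the function name `is_valid_colored_bin` and the sample bins are hypothetical and not part of the paper.

```python
def is_valid_colored_bin(items, capacity=1.0):
    """Check one bin of the Colored Bin Packing problem.

    items: list of (size, color) pairs in packing order.
    The bin is valid if the total size does not exceed the capacity
    and no two consecutive items share a color.
    """
    total = sum(size for size, _ in items)
    if total > capacity:
        return False
    return all(a[1] != b[1] for a, b in zip(items, items[1:]))

# Sizes are chosen as exact binary fractions to avoid rounding issues.
alternating = [(0.25, "black"), (0.5, "white"), (0.25, "black")]
same_color_adjacent = [(0.25, "black"), (0.25, "black")]
overfull = [(0.75, "black"), (0.5, "white")]

print(is_valid_colored_bin(alternating))
print(is_valid_colored_bin(same_color_adjacent))
print(is_valid_colored_bin(overfull))
```

The second bin is rejected although it is half empty, which shows how the color constraint alone can force a new bin to be opened.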
Lecture Notes in Computer Science, 2015
Online Bin Stretching is a semi-online variant of bin packing in which the algorithm has to use the same number of bins as the optimal packing, but is allowed to slightly overpack the bins. The goal is to minimize the amount of overpacking, i.e., the maximum size packed into any bin. We give an algorithm for Online Bin Stretching with a stretching factor of 1.5 for any number of bins. We also show a specialized algorithm for three bins with a stretching factor of 11/8 = 1.375.
Communications in Computer and Information Science, 2014