Daniel Prusa | Czech Technical University in Prague
Papers by Daniel Prusa
Theoretical Computer Science, May 1, 2021
Given a two-dimensional array of symbols and a picture language over a finite alphabet, we investigate how to find rectangular subarrays that belong to the picture language. Two-dimensional pattern matching problems can be formulated by interpreting subarrays as matches and picture languages as patterns. We formulate four particular problems: finding a maximum match, a minimum match, any match, or all matches. We describe algorithms solving these problems for basic classes of picture languages, which include local picture languages and picture languages accepted by various types of deterministic two-dimensional finite automata such as four-way, three-way and two-way automata, on-line tessellation automata, and returning finite automata. We also prove that the pattern matching problems cannot be solved for the class of local picture languages in time linear in the input area unless the well-known problem of triangle finding in a graph is solvable in quadratic time. This shows a fundamental difference between the complexity of one-dimensional and two-dimensional pattern matching.
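The matching problem for local picture languages can be illustrated with a brute-force baseline. The sketch below is not the algorithm from the paper. It assumes the usual definition of a local picture language as a set of allowed 2x2 tiles over the alphabet extended by a border symbol `#`, and it simply tests every rectangular subarray. All function names are hypothetical.

```python
BORDER = "#"

def bordered(picture):
    """Surround a picture (list of equal-length strings) with the border symbol."""
    edge = BORDER * (len(picture[0]) + 2)
    return [edge] + [BORDER + row + BORDER for row in picture] + [edge]

def tiles(picture):
    """Return the set of all 2x2 tiles occurring in the bordered picture."""
    b = bordered(picture)
    return {
        (b[r][c:c + 2], b[r + 1][c:c + 2])
        for r in range(len(b) - 1)
        for c in range(len(b[0]) - 1)
    }

def in_local_language(picture, allowed_tiles):
    """A picture is accepted iff every tile of its bordered form is allowed."""
    return tiles(picture) <= allowed_tiles

def all_matches(array, allowed_tiles):
    """Enumerate every subarray (top, left, bottom, right) in the language."""
    rows, cols = len(array), len(array[0])
    matches = []
    for top in range(rows):
        for bottom in range(top, rows):
            for left in range(cols):
                for right in range(left, cols):
                    sub = [row[left:right + 1] for row in array[top:bottom + 1]]
                    if in_local_language(sub, allowed_tiles):
                        matches.append((top, left, bottom, right))
    return matches

def area(match):
    top, left, bottom, right = match
    return (bottom - top + 1) * (right - left + 1)

# Example language: all non-empty pictures consisting only of the symbol "a".
# Its tile set is obtained from a single 3x3 sample picture.
ALL_A_TILES = tiles(["aaa", "aaa", "aaa"])

matches = all_matches(["ab", "aa"], ALL_A_TILES)
largest = max(matches, key=area)
```

This baseline enumerates a number of subarrays that is quadratic in the input area and re-checks each one from scratch, so it is far from linear time; it only serves to make the four problem variants (maximum, minimum, any, all) concrete.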
IEEE Conference Proceedings, 2016
Lecture Notes in Computer Science, 2015
We study how dimensionality and form of context-free productions affect the power of multi-dimensional context-free grammars over unary alphabets. Attention is paid to the emptiness decision problem. It is an open question whether or not it is decidable for two-dimensional Kolam type context-free grammars of Siromoney. We show that the undecidability can be proved in the three-dimensional setting. For the two-dimensional variant, we present several results revealing that the generating process is still much more complex than that of the classical one-dimensional context-free grammar.
Theoretical Computer Science, 2016
We study succinctness of descriptional systems for picture languages. Basic models of two-dimensional finite automata and generalizations of context-free grammars are considered. They include the four-way automaton of Blum and Hewitt, the two-dimensional online tessellation automaton of Inoue and Nakamura and the context-free Kolam grammar of Siromoney et al. We show that non-recursive trade-offs between the systems are very common. Basically, each separation result proving that one system describes a picture language which cannot be described by another system can usually be turned into a non-recursive trade-off result between the systems. These findings are strongly based on the ability of the systems to simulate Turing machines.
arXiv (Cornell University), Jul 11, 2023
The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches. Preprint. Under review.
We show that solving linear programming (LP) relaxations of many classical NP-hard combinatorial optimization problems is as hard as solving the general LP problem. Precisely, the general LP can be reduced in linear time to the LP relaxation of each of these problems. This result poses a fundamental limitation for designing efficient algorithms to solve the LP relaxations, because finding such an algorithm might improve the complexity of the best known algorithms for the general LP. Besides linear-time reductions, we show that the LP relaxations of the considered problems are P-complete under log-space reduction, therefore also hard to parallelize.
Lecture Notes in Computer Science, 2018
We study the lengths of synchronizing words produced by the classical greedy compression algorithm. We associate a sequence of graphs with every synchronizing automaton and rely on evolution of the independence number to bound the lengths of produced words. By leveraging graph theoretical results we show that in many cases automata with good extension properties have good compression properties as well. More precisely, we show that the compression algorithm will produce a synchronizing word of length \(\mathcal{O}(n^2 \log n)\) on cyclic, regular and strongly-transitive automata with \(n\) states, which is not far from the best possible bound of \((n-1)^2\). Furthermore, we provide a relatively simple proof that every \(n\)-state automaton has a synchronizing word of length at most \(\frac{n^3}{4} + \mathcal{O}(n^2)\).
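A greedy compression procedure of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: at each step it takes a shortest word that merges some pair of states in the current set (found by breadth-first search over unordered state pairs), applies it, and repeats until one state remains. The transition table format (`delta[state][letter]`) and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_merging_word(delta, alphabet, states):
    """Multi-source BFS over unordered pairs of `states`.

    Returns a shortest word that maps some pair to a single state,
    or None if no pair in `states` can be merged.
    """
    start = [frozenset(p) for p in combinations(sorted(states), 2)]
    queue = deque((pair, "") for pair in start)
    seen = set(start)
    while queue:
        pair, word = queue.popleft()
        for a in alphabet:
            image = frozenset(delta[q][a] for q in pair)
            if len(image) == 1:
                return word + a
            if image not in seen:
                seen.add(image)
                queue.append((image, word + a))
    return None

def apply_word(delta, states, word):
    """Image of a set of states under a word."""
    for a in word:
        states = {delta[q][a] for q in states}
    return states

def greedy_synchronizing_word(delta, alphabet):
    """Concatenate shortest compressing words until one state is left."""
    current = set(delta)
    result = ""
    while len(current) > 1:
        w = shortest_merging_word(delta, alphabet, current)
        if w is None:  # the automaton is not synchronizing
            return None
        current = apply_word(delta, current, w)
        result += w
    return result

# Cerny automaton with 4 states: 'a' is a cyclic shift, 'b' maps 3 to 0
# and fixes the other states. Its shortest synchronizing word has length 9.
cerny4 = {q: {"a": (q + 1) % 4, "b": 0 if q == 3 else q} for q in range(4)}
word = greedy_synchronizing_word(cerny4, "ab")
```

The word returned by the greedy procedure is always synchronizing when one exists, but it need not be a shortest one; bounding how much longer it can be is the question the abstract addresses.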
Lecture Notes in Computer Science, 2019
Given a two-dimensional array of symbols and a picture language over a finite alphabet, we study the problem of finding rectangular subarrays of the array that belong to the picture language. We formulate four particular problems: finding a maximum match, a minimum match, any match, or all matches. We describe algorithms solving them for basic classes of picture languages, including local picture languages and picture languages accepted by deterministic on-line tessellation automata or deterministic four-way finite automata. We also prove that the matching problems cannot be solved for the class of local picture languages in linear time unless the problem of triangle finding is solvable in quadratic time. This shows that there is a fundamental difference in pattern matching complexity between the one-dimensional and two-dimensional settings.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Jul 1, 2017
In our recent work, we showed that solving the LP relaxation of the pairwise min-sum labeling problem (also known as MAP inference in graphical models or discrete energy minimization) is not much easier than solving any linear program. Precisely, the general linear program reduces in linear time (assuming the Turing model of computation) to the LP relaxation of the min-sum labeling problem. The reduction is possible, though in quadratic time, even to the min-sum labeling problem with planar structure. Here we prove similar results for the pairwise min-sum labeling problem with attractive Potts interactions (also known as the uniform metric labeling problem).
Springer eBooks, 2012
We present a new model of a two-dimensional computing device called the Sgraffito automaton. In general, the model is quite simple, which allows a clear design of computations. When restricted to one-dimensional inputs, that is, strings, the Sgraffito automaton does not exceed the power of finite-state automata. On the other hand, for two-dimensional inputs, it yields a family of picture languages with good closure properties that strictly includes the class REC of recognizable picture languages. The deterministic Sgraffito automata define a class of picture languages that includes the class of deterministic recognizable picture languages DREC, the class of picture languages that are accepted by four-way alternating automata, those that are accepted by deterministic one-marker automata, and the sudoku-deterministically recognizable picture languages, but the membership problem for the accepted languages is still decidable in polynomial time. In addition, the deterministic Sgraffito automata accept some unary picture languages that are outside of the class REC.
Lecture Notes in Computer Science, 2023
ITAT, 2021
We present two types of pumping patterns that allow a total separation inside the class of LR(0)-grammars. Using the same type of pumping patterns, we obtain a total separation inside the class of linear LR(0)-grammars. This type of study has a long-term motivation from computational linguistics and the area of syntactic error localization. A recent motivation also comes from the field of formal models of neural networks.
ITAT, 2020
We study the task of automatic evaluation of mathematical word problems, which belongs to the category of natural language processing and has become popular in recent years. Since all methods published so far were developed for inputs in English, our goal is to review them and propose solutions able to cope with inputs in the Czech language. We face the question of whether we can achieve competitive accuracy for a natural language with flexible word order, under the assumption that only a relatively small dataset of training and testing data is available. We propose and evaluate two methods. One relies on rule-based processing of dependency trees computed by UDPipe. The other method builds on machine learning. It transforms word problems into numeric vectors and trains an SVM to classify them. We also show that it improves when combined with a search for predefined sequences of words and word classes, achieving 75% accuracy on our dataset of 500 Czech word problems.
Lecture Notes in Computer Science, 2018
It is well known that one-tape Turing machines working in linear time are no more powerful than finite automata, namely they recognize exactly the class of regular languages. We study the costs, in terms of description sizes, of the conversion of nondeterministic finite automata into equivalent linear-time one-tape deterministic machines. We prove a polynomial blowup from two-way nondeterministic finite automata into equivalent weight-reducing one-tape deterministic machines that work in linear time. The blowup remains polynomial if the tape in the resulting machines is restricted to the portion which initially contains the input. However, in this case the machines resulting from our construction are not weight-reducing, unless the input alphabet is unary.
Lecture Notes in Computer Science, 2013
The deterministic sgraffito automaton is a two-dimensional computing device that allows a clear and simple design of important computations. The family of picture languages it accepts has many nice closure properties, but when restricted to one-row inputs (that is, strings), this family collapses to the class of regular languages. Here we compare the deterministic sgraffito automaton to some other two-dimensional models: the two-dimensional deterministic forgetting automaton, the four-way alternating automaton and the sudoku-deterministically recognizable picture languages. In addition, we prove that deterministic sgraffito automata accept some unary picture languages that are outside the class REC of recognizable picture languages.
We experiment with off-line recognition of handwritten flowcharts based on stroke reconstruction and our state-of-the-art on-line diagram recognizer. A simple baseline algorithm for stroke reconstruction is presented and necessary modifications of the original recognizer are identified. We achieve very promising results on a flowchart database created as an extension of our previously published on-line database.
Information & Computation, Jun 1, 2023
* This work contains, in an extended form, some material and results which were previously presented in a preliminary form in conference papers [10] and [2].