Daniel Prusa | Czech Technical University in Prague

Papers by Daniel Prusa

Two-dimensional pattern matching against local and regular-like picture languages

Theoretical Computer Science, May 1, 2021

Given a two-dimensional array of symbols and a picture language over a finite alphabet, we investigate how to find rectangular subarrays that belong to the picture language. Two-dimensional pattern matching problems can be formulated by interpreting subarrays as matches and picture languages as patterns. We formulate four particular problems (finding a maximum match, a minimum match, any match, or all matches) and describe algorithms solving them for basic classes of picture languages, which include local picture languages and picture languages accepted by various types of deterministic two-dimensional finite automata such as four-way, three-way and two-way automata, on-line tessellation automata, and returning finite automata. We also prove that the pattern matching problems cannot be solved for the class of local picture languages in time linear in the input area unless the well-known problem of triangle finding in a graph is solvable in quadratic time. This shows a fundamental difference between the complexity of one-dimensional and two-dimensional pattern matching.
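To make the "all matches" problem concrete, here is a naive brute-force sketch for a local picture language. It is not one of the algorithms from the paper. It assumes the standard tiling definition of locality: a picture belongs to the language when every 2x2 window of the picture, framed by a border symbol, is in a fixed set of allowed tiles. The names `tiles_of`, `in_local_language` and `all_matches`, the border symbol `#`, and the representation of pictures as lists of equal-length strings are choices made for this illustration.

```python
def tiles_of(picture, border="#"):
    """Return the set of 2x2 tiles of `picture` after framing it with `border`."""
    cols = len(picture[0])
    framed = (
        [border * (cols + 2)]
        + [border + row + border for row in picture]
        + [border * (cols + 2)]
    )
    return {
        (framed[i][j:j + 2], framed[i + 1][j:j + 2])
        for i in range(len(framed) - 1)
        for j in range(cols + 1)
    }


def in_local_language(picture, allowed_tiles, border="#"):
    """A picture is accepted when all of its framed 2x2 tiles are allowed."""
    return tiles_of(picture, border) <= allowed_tiles


def all_matches(array, allowed_tiles):
    """Enumerate every rectangular subarray and test it for membership.

    Returns (top, left, bottom, right) index tuples, all inclusive.
    """
    rows, cols = len(array), len(array[0])
    matches = []
    for top in range(rows):
        for left in range(cols):
            for bottom in range(top, rows):
                for right in range(left, cols):
                    sub = [row[left:right + 1] for row in array[top:bottom + 1]]
                    if in_local_language(sub, allowed_tiles):
                        matches.append((top, left, bottom, right))
    return matches


# Example language: non-empty pictures consisting only of the symbol "a".
only_a = tiles_of(["aaa", "aaa", "aaa"])
print(all_matches(["ab", "aa"], only_a))
```

Enumerating every subarray and rechecking it from scratch takes time far above linear in the input area, which is why the question of how fast these problems can be solved is non-trivial.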

Recognition of off-line flowcharts by stroke reconstruction and the use of on-line recognition techniques [Powered by NICT]

IEEE Conference Proceedings, 2016

Review of: "SAT is as hard as solving Homogeneous Diophantine Equation of Degree Two"

(Un)decidability of the Emptiness Problem for Multi-dimensional Context-Free Grammars

Lecture Notes in Computer Science, 2015

We study how dimensionality and form of context-free productions affect the power of multi-dimensional context-free grammars over unary alphabets. Attention is paid to the emptiness decision problem. It is an open question whether or not it is decidable for two-dimensional Kolam type context-free grammars of Siromoney. We show that the undecidability can be proved in the three-dimensional setting. For the two-dimensional variant, we present several results revealing that the process of generating is still much more complex than that of the classical one-dimensional context-free grammar.

Non-recursive trade-offs between two-dimensional automata and grammars

Theoretical Computer Science, 2016

We study succinctness of descriptional systems for picture languages. Basic models of two-dimensional finite automata and generalizations of context-free grammars are considered. They include the four-way automaton of Blum and Hewitt, the two-dimensional online tessellation automaton of Inoue and Nakamura and the context-free Kolam grammar of Siromoney et al. We show that non-recursive trade-offs between the systems are very common. Basically, each separation result proving that one system describes a picture language which cannot be described by another system can usually be turned into a non-recursive trade-off result between the systems. These findings are strongly based on the ability of the systems to simulate Turing machines.

Reject option models comprising out-of-distribution detection

arXiv (Cornell University), Jul 11, 2023

The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches. Preprint. Under review.

On a class of rational functions for pictures

LP Relaxations of Some NP-Hard Problems Are as Hard as Any LP

We show that solving linear programming (LP) relaxations of many classical NP-hard combinatorial optimization problems is as hard as solving the general LP problem. Precisely, the general LP can be reduced in linear time to the LP relaxation of each of these problems. This result poses a fundamental limitation for designing efficient algorithms to solve the LP relaxations, because finding such an algorithm might improve the complexity of the best known algorithms for the general LP. Besides linear-time reductions, we show that the LP relaxations of the considered problems are P-complete under log-space reduction, and therefore also hard to parallelize.

Dynamics of the Independence Number and Automata Synchronization

Lecture Notes in Computer Science, 2018

We study the lengths of synchronizing words produced by the classical greedy compression algorithm. We associate a sequence of graphs with every synchronizing automaton and rely on evolution of the independence number to bound the lengths of produced words. By leveraging graph theoretical results we show that in many cases automata with good extension properties have good compression properties as well. More precisely, we show that the compression algorithm will produce a synchronizing word of length \(\mathcal{O}(n^2 \log n)\) on cyclic, regular and strongly-transitive automata with \(n\) states, which is not far from the best possible bound of \((n-1)^2\). Furthermore, we provide a relatively simple proof that every \(n\)-state automaton has a synchronizing word of length at most \(\frac{n^3}{4} + \mathcal{O}(n^2)\).
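The greedy compression algorithm is usually presented as follows: starting from the full state set, repeatedly append a shortest word that shrinks the current set of reachable states, until a single state remains. The sketch below follows that common textbook description and is not taken from the paper. It assumes states are numbered 0 to n-1 and that `delta[state][letter]` gives the successor state. The names `apply_word`, `shortest_merging_word` and `greedy_sync_word` are hypothetical.

```python
from collections import deque
from itertools import combinations


def apply_word(state, word, delta):
    """Follow `word` letter by letter from `state`."""
    for letter in word:
        state = delta[state][letter]
    return state


def shortest_merging_word(states, letters, delta):
    """Multi-source BFS over unordered pairs of states.

    Returns a shortest word that maps some pair taken from `states`
    to a single state, or None if no pair can be merged.
    """
    sources = [tuple(sorted(p)) for p in combinations(states, 2)]
    parent = {pair: None for pair in sources}
    queue = deque(sources)
    while queue:
        pair = queue.popleft()
        p, q = pair
        for a in letters:
            p2, q2 = delta[p][a], delta[q][a]
            if p2 == q2:
                # Walk the parent links back to a source pair.
                word = [a]
                node = pair
                while parent[node] is not None:
                    node, letter = parent[node]
                    word.append(letter)
                word.reverse()
                return word
            nxt = (min(p2, q2), max(p2, q2))
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return None


def greedy_sync_word(n, letters, delta):
    """Greedily compress the state set; None if the automaton is not synchronizing."""
    current = set(range(n))
    word = []
    while len(current) > 1:
        step = shortest_merging_word(current, letters, delta)
        if step is None:
            return None
        word.extend(step)
        current = {apply_word(s, step, delta) for s in current}
    return word


# Four-state Cerny automaton: "a" is a cyclic shift, "b" sends state 3 to 0.
cerny = [{"a": (i + 1) % 4, "b": 0 if i == 3 else i} for i in range(4)]
print("".join(greedy_sync_word(4, "ab", cerny)))
```

The word returned is synchronizing but not necessarily a shortest one; the bounds quoted in the abstract concern how long such greedily produced words can get.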

Graph-based simplex method for pairwise energy minimization with binary variables

Two-Dimensional Pattern Matching Against Basic Picture Languages

Lecture Notes in Computer Science, 2019

Given a two-dimensional array of symbols and a picture language over a finite alphabet, we study the problem of finding rectangular subarrays of the array that belong to the picture language. We formulate four particular problems (finding a maximum match, a minimum match, any match, or all matches) and describe algorithms solving them for basic classes of picture languages, including local picture languages and picture languages accepted by deterministic on-line tessellation automata or deterministic four-way finite automata. We also prove that the matching problems cannot be solved for the class of local picture languages in linear time unless the problem of triangle finding is solvable in quadratic time. This shows that pattern matching complexity differs fundamentally between the one-dimensional and two-dimensional settings.

LP Relaxation of the Potts Labeling Problem Is as Hard as Any Linear Program

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jul 1, 2017

In our recent work, we showed that solving the LP relaxation of the pairwise min-sum labeling problem (also known as MAP inference in graphical models or discrete energy minimization) is not much easier than solving any linear program. Precisely, the general linear program reduces in linear time (assuming the Turing model of computation) to the LP relaxation of the min-sum labeling problem. The reduction is possible, though in quadratic time, even to the min-sum labeling problem with planar structure. Here we prove similar results for the pairwise min-sum labeling problem with attractive Potts interactions (also known as the uniform metric labeling problem).

Two-Dimensional Sgraffito Automata

Springer eBooks, 2012

We present a new model of a two-dimensional computing device called the Sgraffito automaton. In general, the model is quite simple, which allows a clear design of computations. When restricted to one-dimensional inputs, that is, strings, the Sgraffito automaton does not exceed the power of finite-state automata. On the other hand, for two-dimensional inputs, it yields a family of picture languages with good closure properties that strictly includes the class REC of recognizable picture languages. The deterministic Sgraffito automata define a class of picture languages that includes the class of deterministic recognizable picture languages DREC, the class of picture languages that are accepted by four-way alternating automata, those that are accepted by deterministic one-marker automata, and the sudoku-deterministically recognizable picture languages, but the membership problem for the accepted languages is still decidable in polynomial time. In addition, the deterministic Sgraffito automata accept some unary picture languages that are outside of the class REC.

Consistent and Tractable Algorithm for Markov Network Learning

Lecture Notes in Computer Science, 2023

On Separations of LR(0)-Grammars by Two Types of Pumping Patterns

ITAT, 2021

We present two types of pumping patterns that allow a total separation inside the class of LR(0)-grammars. Using the same type of pumping patterns, we obtain a total separation inside the class of linear LR(0)-grammars. This type of study has a long-term motivation from computational linguistics and the area of syntactic error localization. A recent motivation also comes from the field of formal models of neural networks.

Solvers for Mathematical Word Problems in Czech

ITAT, 2020

We study the task of automatic evaluation of mathematical word problems, which belongs to the category of natural language processing and has become popular in recent years. Since all methods published so far were developed for inputs in English, our goal is to review them and propose solutions able to cope with inputs in the Czech language. We face the question of whether we can achieve competitive accuracy for a natural language with flexible word order, under the assumption that only a relatively small dataset of training and testing data is available. We propose and evaluate two methods. One relies on rule-based processing of dependency trees computed by UDPipe. The other method builds on machine learning. It transforms word problems into numeric vectors and trains an SVM to classify them. We also show that its accuracy improves when it is combined with a search for predefined sequences of words and word classes, achieving 75% accuracy on our dataset of 500 Czech word problems.

Two-Way Automata and One-Tape Machines

Lecture Notes in Computer Science, 2018

It is well known that one-tape Turing machines working in linear time are no more powerful than finite automata, namely they recognize exactly the class of regular languages. We study the costs, in terms of description sizes, of the conversion of nondeterministic finite automata into equivalent linear-time one-tape deterministic machines. We prove a polynomial blowup from two-way nondeterministic finite automata into equivalent weight-reducing one-tape deterministic machines that work in linear time. The blowup remains polynomial if the tape in the resulting machines is restricted to the portion which initially contains the input. However, in this case the machines resulting from our construction are not weight-reducing, unless the input alphabet is unary.

New Results on Deterministic Sgraffito Automata

Lecture Notes in Computer Science, 2013

The deterministic sgraffito automaton is a two-dimensional computing device that allows a clear and simple design of important computations. The family of picture languages it accepts has many nice closure properties, but when restricted to one-row inputs (that is, strings), this family collapses to the class of regular languages. Here we compare the deterministic sgraffito automaton to some other two-dimensional models: the two-dimensional deterministic forgetting automaton, the four-way alternating automaton and the sudoku-deterministically recognizable picture languages. In addition, we prove that deterministic sgraffito automata accept some unary picture languages that are outside the class REC of recognizable picture languages.

Recognizing Off-Line Flowcharts by Reconstructing Strokes and Using On-Line Recognition Techniques

We experiment with off-line recognition of handwritten flowcharts based on stroke reconstruction and our state-of-the-art on-line diagram recognizer. A simple baseline algorithm for stroke reconstruction is presented and necessary modifications of the original recognizer are identified. We achieve very promising results on a flowchart database created as an extension of our previously published on-line database.

Weight-reducing Turing machines

Information & Computation, Jun 1, 2023

This work contains, in an extended form, some material and results which were previously presented in a preliminary form in conference papers [10] and [2].
