Andrea Asperti | Università di Bologna
Papers by Andrea Asperti
ACM Transactions on Computational Logic, Jun 16, 2015
We address computational complexity by writing polymorphic functions between finite types (i.e., types with a finite number of canonical elements), expressing costs in terms of the cardinality of these types. This allows us to rediscover, in a more syntactical setting, the known result that the different levels in the hierarchy of higher-order primitive recursive functions (Gödel's system T), when interpreted over finite structures, precisely capture basic complexity classes: functions of rank 1 characterize LOGSPACE, rank 2 PTIME, rank 3 PSPACE, rank 4 EXPTIME = DTIME(2^poly), and so on.
Edizioni ETS eBooks, 2018
Automatic verification is concerned with the computer-aided production and validation of correctness certificates. The tools in question, usually called proof assistants or interactive provers, provide an interactive environment for the construction of formal certificates whose correctness can be checked in a fully automatic way. These tools have applications both in mathematics, where the certificates are proofs of theorems, and in computer science, where the certificates argue the correctness of a given piece of software with respect to its specification.
Logic in Computer Science, 2013
A new "inductive" approach to standardization for the lambda-calculus has recently been introduced by Xi, allowing him to establish a double-exponential upper bound |M|^(2^|s|) on the length of the standard reduction relative to an arbitrary reduction s originating in M. In this paper we refine Xi's analysis, obtaining much better bounds, especially for computations producing small normal forms. For instance, for terms reducing to a boolean, we are able to prove that the length of the standard reduction is at most a mere factorial of the length of the shortest reduction sequence. The methodological innovation of our approach is that instead of counting the cost of producing something, as is customary, we count the cost of consuming things. The key observation is that the part of a lambda-term that is needed to produce the normal form (or an arbitrary rigid prefix) may grow rapidly along a computation, but can only decrease very slowly (actually, linearly).
arXiv (Cornell University), Dec 11, 2017
Lecture Notes in Computer Science, 2023
arXiv (Cornell University), Feb 23, 2020
Lecture Notes in Computer Science, 2011
Matita is an interactive theorem prover being developed by the Helm team at the University of Bologna. Its stable version 0.5.x may be downloaded at http://matita.cs.unibo.it. The tool originated in the European project MoWGLI as a set of XML-based tools aimed at providing a mathematician-friendly web interface to repositories of formal mathematical knowledge, supporting advanced content-based functionalities for querying, searching and browsing the library. It has since evolved into a fully fledged ITP, specifically designed as a lightweight but competitive system, particularly suited to the assessment of innovative ideas, both at the foundational and the logical level. In this paper, we give an account of the whole system, its peculiarities and its main applications. 1 The System. Matita is an interactive proof assistant, adopting a dependent type theory, the Calculus of (Co)Inductive Constructions (CIC), as its foundational language for describing proofs. It is thus compatible, at the proof term level, with Coq [27], and the two systems are able to check each other's proof objects. Since the two systems do not share a single line of code, but are akin to each other, it is natural to take Coq as the main term of comparison, referring to other systems (most notably, Isabelle and HOL) when some ideas or philosophies characteristic of these latter tools have been imported into our system. Similarly to Coq, Matita follows the so-called De Bruijn principle, stating that proofs generated by the system should be verifiable by a small and trusted component, traditionally called the kernel. Unsurprisingly, the kernel has roughly the same size in the two tools, in spite of a few differences in the encoding of terms: in particular, Matita's kernel handles explicit substitutions to mimic Coq's Section mechanism, and can cope with existential metavariables, i.e., non-linear placeholders that are Curry-Howard isomorphic to holes in the proofs. Metavariables cannot be instantiated by the kernel: they are considered as opaque constants, with a declared type, only equal to themselves. While this extension does not make the kernel sensibly more complex or fragile, it has a beneficial effect on the size of the type inference subsystem, here…
We analyze the inherent complexity of implementing Lévy's notion of optimal evaluation for the λ-calculus, where similar redexes are contracted in one step via so-called parallel β-reduction. Optimal evaluation was finally realized by Lamping, who introduced a beautiful graph reduction technology for sharing evaluation contexts, dual to the sharing of values. His pioneering insights have been modified and improved in subsequent implementations of optimal reduction. We prove that the cost of parallel β-reduction is not bounded by any Kalmár-elementary recursive function. Not merely do we establish that the parallel β-step cannot be a unit-cost operation; we demonstrate that the time complexity of implementing a sequence of n parallel β-steps is not bounded as O(2^n), O(2^(2^n)), O(2^(2^(2^n))), or in general O(K_ℓ(n)), where K_ℓ(n) is a fixed stack of ℓ 2s with an n on top. A key insight, essential to the establishment of this non-elementary lower bound, is that any simply-typed λ-term can be reduced to normal form in a number of parallel β-steps that is only polynomial in the length of the explicitly-typed term. The result follows from Statman's theorem that deciding equivalence of typed λ-terms is not elementary recursive. The main theorem gives a lower bound on the work that must be done by any technology that implements Lévy's notion of optimal reduction. However, in the significant case of Lamping's solution, we make some important remarks addressing how work done by β-reduction is translated into equivalent work carried out…
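To make the tower-of-exponentials notation in the abstract above concrete, the block below spells out the usual definition of a fixed stack of 2s; this is a standard restatement included for readability, not notation quoted verbatim from the paper.

```latex
% K_l(n): a tower of l twos topped by n (standard definition, assumed here).
\[
K_0(n) = n, \qquad K_{\ell+1}(n) = 2^{K_\ell(n)},
\qquad \text{e.g. } K_3(n) = 2^{2^{2^{n}}}.
\]
% The lower bound says: for every fixed l, the cost of simulating
% n parallel beta-steps is not O(K_l(n)).
```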
arXiv (Cornell University), Feb 18, 2020
In the loss function of Variational Autoencoders there is a well-known tension between two components: the reconstruction loss, improving the quality of the resulting images, and the Kullback-Leibler divergence, acting as a regularizer of the latent space. Correctly balancing these two components is a delicate issue, easily resulting in poor generative behaviours. In a recent work [8], a sensible improvement has been obtained by allowing the network to learn the balancing factor during training, according to a suitable loss function. In this article, we show that learning can be replaced by a simple deterministic computation, helping to understand the underlying mechanism, and resulting in a faster and more accurate behaviour. On typical datasets such as CIFAR and CelebA, our technique sensibly outperforms all previous VAE architectures.
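As a rough illustration of the kind of loss balancing discussed above, the sketch below shows a VAE-style objective in which the weight of the KL term is recomputed deterministically from the current reconstruction error instead of being learned. The balancing rule (`gamma = running_mse`) is a simplified assumption for illustration only, not the exact computation proposed in the article.

```python
# Hedged sketch: VAE loss with a deterministically computed KL weight.
# The formula for gamma is an illustrative placeholder, not the paper's rule.
import torch
import torch.nn.functional as F

def vae_loss(x, x_rec, mu, logvar, running_mse):
    # Reconstruction term (mean squared error over the batch).
    rec = F.mse_loss(x_rec, x, reduction="mean")
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1).
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Deterministic balancing: weight the KL term by an estimate of the
    # current reconstruction error (assumed proxy for the learned factor).
    gamma = running_mse.detach()
    return rec + gamma * kl
```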
We prove that the complexity of Lamping's optimal graph reduction technique for the λ-calculus can be exponential in the number of Lévy's family reductions. Starting from this consideration, we propose a new measure for what could be considered as "the intrinsic complexity" of λ-terms.
arXiv (Cornell University), Mar 1, 2021
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks to efficiently solve the generation problem for high-dimensional data. The key insight of VAEs is to learn the latent distribution of data in such a way that new meaningful samples can be generated from it. This approach led to tremendous research and variations in the architectural design of VAEs, nourishing the recent field of research known as unsupervised representation learning. In this article, we provide a comparative evaluation of some of the most successful, recent variations of VAEs. We particularly focus the analysis on the energetic efficiency of the different models, in the spirit of the so-called Green AI, aiming both to reduce the carbon footprint and the financial cost of generative techniques. For each architecture we provide its mathematical formulation, the ideas underlying its design, a detailed model description, a running implementation and quantitative results. Keywords: Generative Modeling · Variational Autoencoders · GreenAI. 1 Introduction. Data generation, that is, the task of generating new realistic samples given a set of training data, is a fascinating problem of AI, with many relevant appli…
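For reference, the mathematical formulation shared by all the VAE variants compared in the survey is the evidence lower bound (ELBO); the block below gives the textbook form, included as context rather than quoted from the article.

```latex
% Evidence Lower BOund (ELBO) maximized by a standard VAE, with encoder
% q_phi(z|x), decoder p_theta(x|z), and prior p(z) = N(0, I).
\[
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
\;-\;
\mathrm{KL}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr).
\]
```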
arXiv (Cornell University), Mar 20, 2022
MicroRacer is a simple, open-source environment inspired by car racing, especially meant for the didactics of Deep Reinforcement Learning. The complexity of the environment has been explicitly calibrated to allow users to experiment with many different methods, networks and hyperparameter settings without requiring sophisticated software or exceedingly long training times. Baseline agents for major learning algorithms such as DDPG, PPO, SAC, TD3 and DSAC are provided too, along with a preliminary comparison in terms of training time and performance.
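The snippet below sketches the usual agent-environment interaction loop one would run in such an environment with a random policy. The class name `Racer` and its `reset`/`step` methods are assumptions modeled on the common gym-style convention, not the documented MicroRacer API; the stand-in environment only serves to make the loop runnable.

```python
# Hedged sketch of a gym-style interaction loop with a random policy.
# Racer, reset and step are hypothetical stand-ins, not MicroRacer's API.
import numpy as np

class Racer:
    def reset(self):
        return np.zeros(5)                      # e.g. lidar-like observation
    def step(self, action):
        obs = np.random.rand(5)
        reward, done = 0.1, np.random.rand() < 0.01
        return obs, reward, done

env = Racer()
obs = env.reset()
episode_return = 0.0
for _ in range(1000):
    action = np.random.uniform(-1.0, 1.0, size=2)   # (acceleration, steering)
    obs, reward, done = env.step(action)
    episode_return += reward
    if done:
        obs = env.reset()
print("return of the random policy:", episode_return)
```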
Convolutionalization of discriminative neural networks, introduced also for segmentation purposes, is a simple technique for generating heat-maps indicating the location of a given object in a larger image. In this article, we apply this technique to automatically crop images at their actual point of interest, fine-tuning them with the final aim of improving the quality of a dataset. The use of an ensemble of fully convolutional nets sensibly reduces the risk of overfitting, resulting in reasonably accurate croppings. The methodology has been tested on a well-known dataset, particularly renowned for containing badly centered and noisy images: the Food-101 dataset, composed of 101K images spread over 101 food categories. The quality of the croppings is attested by a sensible and uniform improvement (3-5%) in the classification accuracy of classifiers, even ones external to the ensemble.
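A minimal sketch of the cropping step described above: given a class-specific heat-map produced by a fully convolutional network, locate its peak and cut a fixed-size window around it. The heat-map here is a placeholder array and the window size is an arbitrary assumed parameter; the actual networks and ensembling are not reproduced.

```python
# Hedged sketch: crop an image around the peak of a heat-map.
# The heat-map below is a random placeholder; in the described pipeline it
# would come from a fully convolutional classifier.
import numpy as np

def crop_at_peak(image, heatmap, crop_size=224):
    # Locate the heat-map maximum and rescale it to image coordinates.
    hy, hx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    cy = int(hy * image.shape[0] / heatmap.shape[0])
    cx = int(hx * image.shape[1] / heatmap.shape[1])
    half = crop_size // 2
    # Clamp the window so it stays inside the image.
    top = min(max(cy - half, 0), max(image.shape[0] - crop_size, 0))
    left = min(max(cx - half, 0), max(image.shape[1] - crop_size, 0))
    return image[top:top + crop_size, left:left + crop_size]

image = np.random.rand(512, 512, 3)     # placeholder image
heatmap = np.random.rand(16, 16)        # placeholder network output
print(crop_at_peak(image, heatmap).shape)   # (224, 224, 3)
```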
arXiv (Cornell University), Jan 16, 2017
There is still a lot of confusion about "optimal" sharing in the lambda calculus, and its actual efficiency. In this article, we shall try to clarify some of these issues.
arXiv (Cornell University), Dec 1, 2018
Working in high-dimensional latent spaces, the internal encoding of data in Variational Autoencoders becomes naturally sparse. We discuss this known but controversial phenomenon, sometimes referred to as overpruning to emphasize the under-use of the model capacity. In fact, it is an important form of self-regularization, with all the typical benefits associated with sparsity: it forces the model to focus on the really important features, highly reducing the risk of overfitting. In particular, it is a major methodological guide for the correct tuning of the model capacity, progressively augmenting it until sparsity appears, or conversely reducing the dimension of the network by removing links to zeroed-out neurons. The degree of sparsity crucially depends on the network architecture: for instance, convolutional networks typically show less sparsity, likely due to the tighter relation of features to different spatial regions of the input.
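The degree of sparsity mentioned above is often quantified by counting latent dimensions whose per-dimension KL contribution is close to zero (inactive, or "zeroed-out", units). The snippet below is a generic diagnostic along those lines; the 0.01 threshold and the placeholder encoder outputs are arbitrary illustrative choices, not values taken from the article.

```python
# Hedged sketch: estimate latent-space sparsity by counting inactive units,
# i.e. dimensions whose average KL contribution falls below a small threshold.
import numpy as np

def inactive_units(mu, logvar, threshold=0.01):
    # Per-dimension KL(N(mu, sigma^2) || N(0, 1)), averaged over the batch.
    kl_per_dim = 0.5 * np.mean(mu**2 + np.exp(logvar) - logvar - 1.0, axis=0)
    return np.sum(kl_per_dim < threshold), kl_per_dim

# Placeholder encoder outputs: batch of 128 samples, 64 latent dimensions.
mu = np.random.randn(128, 64) * 0.05
logvar = np.zeros((128, 64))
n_dead, _ = inactive_units(mu, logvar)
print(f"inactive latent dimensions: {n_dead} / {mu.shape[1]}")
```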