Deep learners benefit more from out-of-distribution examples

Deep Learning of Representations

Intelligent Systems Reference Library, 2013

Unsupervised learning of representations has been found useful in many applications and offers several advantages, e.g., in settings where there are many unlabeled examples and few labeled ones (semi-supervised learning), or where the unlabeled or labeled examples come from a distribution different from, but related to, the one of interest (self-taught learning, multi-task learning, and domain adaptation). Some of these algorithms have successfully been used to learn a hierarchy of features, i.e., to build a deep architecture, either as initialization for a supervised predictor or as a generative model. Deep learning algorithms can yield representations that are more abstract and better disentangle the hidden factors of variation underlying the unknown generating distribution, i.e., that capture invariances and discover non-local structure in that distribution. This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results in this area, and proposes a vision of the challenges and hopes on the road ahead.
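
As a toy illustration of the semi-supervised setting described in this abstract, the sketch below (entirely illustrative: synthetic data, a single tied-weight denoising autoencoder standing in for a deeper unsupervised learner, arbitrary hyperparameters) pretrains a representation on many unlabeled examples and then fits a logistic classifier on a handful of labeled ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: two noisy clusters in 20 dimensions.
d, n_unlabeled, n_labeled = 20, 2000, 20
centers = rng.normal(size=(2, d))

def sample(n):
    y = rng.integers(0, 2, size=n)
    x = centers[y] + 0.5 * rng.normal(size=(n, d))
    return x, y

x_u, _ = sample(n_unlabeled)           # many unlabeled examples
x_l, y_l = sample(n_labeled)           # few labeled examples

# --- Unsupervised stage: tied-weight denoising autoencoder -----------------
k = 8                                  # size of the learned representation
W = 0.1 * rng.normal(size=(d, k))
b = np.zeros(k)
c = np.zeros(d)

for _ in range(500):
    x_noisy = x_u + 0.3 * rng.normal(size=x_u.shape)   # corrupt the input
    h = sigmoid(x_noisy @ W + b)                        # encode
    r = h @ W.T + c                                     # decode (tied weights)
    dr = (r - x_u) / len(x_u)                           # reconstruction error
    dh = (dr @ W) * h * (1 - h)
    grad_W = dr.T @ h + x_noisy.T @ dh                  # decoder + encoder paths
    W -= 0.5 * grad_W
    b -= 0.5 * dh.sum(axis=0)
    c -= 0.5 * dr.sum(axis=0)

# --- Supervised stage: logistic regression on the learned representation ---
h_l = sigmoid(x_l @ W + b)
w, w0 = np.zeros(k), 0.0
for _ in range(500):
    p = sigmoid(h_l @ w + w0)
    g = (p - y_l) / len(y_l)
    w -= 1.0 * h_l.T @ g
    w0 -= 1.0 * g.sum()

x_test, y_test = sample(500)
pred = sigmoid(sigmoid(x_test @ W + b) @ w + w0) > 0.5
print("test accuracy:", (pred == y_test).mean())
```

The tied weights and input corruption are standard denoising-autoencoder choices; any unsupervised representation learner could fill the same slot before the small labeled set is used.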

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

2024

Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional data representations. However, how many training examples are required to learn such representations remains unknown. To quantitatively study this question, we introduce the random hierarchy model: a family of synthetic tasks inspired by the hierarchical structure of language and images. The model is a classification task where each class corresponds to a group of high-level features, chosen among several equivalent groups associated with the same class. In turn, each feature corresponds to a group of subfeatures chosen among several equivalent groups and so on, following a hierarchy of composition rules. We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups. Moreover, the number of data required corresponds to the point where correlations between low-level features and classes become detectable. Overall, our results indicate how deep networks overcome the curse of dimensionality by building invariant representations and provide an estimate of the number of data required to learn a hierarchical task.
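
A minimal sketch of what such a hierarchical generative process could look like (the parameters nc, m, s, L and the uniform sampling of productions are assumptions chosen for illustration, not the paper's exact construction): each symbol is expanded, level by level, into s lower-level symbols using one of m equivalent production rules, until the lowest-level features form the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not the paper's values):
nc = 4    # number of classes / symbols per level
m = 3     # number of equivalent productions ("synonyms") per symbol
s = 2     # branching factor: each symbol expands into s sub-symbols
L = 3     # depth of the hierarchy; inputs have s**L low-level features

# One fixed set of composition rules per level:
# rules[l][symbol] is an (m, s) array of equivalent expansions.
rules = [rng.integers(0, nc, size=(nc, m, s)) for _ in range(L)]

def sample(label):
    """Expand a class label down the hierarchy into low-level features."""
    symbols = [label]
    for level in range(L):
        expanded = []
        for sym in symbols:
            choice = rng.integers(0, m)        # pick one equivalent group
            expanded.extend(rules[level][sym, choice])
        symbols = expanded
    return np.array(symbols)                   # feature string of length s**L

for y in range(nc):
    print("class", y, "->", sample(y), "or", sample(y))  # two equivalent samples
```

The equivalent groups at each level are exactly what the abstract says deep networks must learn to treat as interchangeable when building invariant internal representations.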

Representation Learning: A Review and New Perspectives

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

The difficulty of training deep architectures and the effect of unsupervised pre-training

Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS’09), 2009

Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pretraining. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions ...

Neural networks trained with SGD learn distributions of increasing complexity

arXiv (Cornell University), 2022

The ability of deep neural networks to generalise well even when they interpolate their training data has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first learning simple functions, say a linear classifier, before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a neural network trained on synthetic data. We empirically demonstrate DSB in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
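
A hedged sketch of the kind of control data such an analysis relies on: a per-class "Gaussian clone" that matches each class's sample mean and covariance while discarding all higher-order structure (the toy ring data and the function name are illustrative, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_clone(x, y):
    """Resample each class from a Gaussian with the same mean and covariance."""
    clone = np.empty_like(x)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        mu = x[idx].mean(axis=0)
        cov = np.cov(x[idx], rowvar=False)
        clone[idx] = rng.multivariate_normal(mu, cov, size=len(idx))
    return clone, y

# Toy non-Gaussian data: two classes on noisy rings of different radii.
n = 1000
y = rng.integers(0, 2, size=n)
angles = rng.uniform(0, 2 * np.pi, size=n)
radii = np.where(y == 0, 1.0, 2.0) + 0.1 * rng.normal(size=n)
x = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

x_clone, _ = gaussian_clone(x, y)

# Sample means/covariances of the clone are close to those of the data,
# while the higher-order (ring) structure is gone.
for c in (0, 1):
    print("class", c,
          "mean diff:", np.abs(x[y == c].mean(0) - x_clone[y == c].mean(0)).max().round(3),
          "cov diff:", np.abs(np.cov(x[y == c], rowvar=False)
                              - np.cov(x_clone[y == c], rowvar=False)).max().round(3))
```

A classifier that relies only on first- and second-order input statistics cannot distinguish the original data from its clone, which is the kind of comparison the DSB argument rests on.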

Introducing the structural bases of typicality effects in deep learning

2021

In this paper, we hypothesize that the effects of the degree of typicality in natural semantic categories can be generated based on the structure of artificial categories learned with deep learning models. Motivated by the human approach to representing natural semantic categories, and grounded in the foundations of Prototype Theory, we propose a novel Computational Prototype Model (CPM) to represent the internal structure of semantic categories. Unlike other prototype learning approaches, our mathematical framework offers a first approach to providing deep neural networks with the ability to model abstract semantic concepts such as a category's central semantic meaning, the typicality degree of an object's image, and family resemblance relationships. We propose several methodologies based on the concept of typicality to evaluate our CPM model in image semantic processing tasks such as image classification, global semantic description, and transfer learning. Our experiments on different i...
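
As a deliberately simplified stand-in for the CPM idea (not the paper's actual model), the sketch below represents each category by the mean of its feature vectors and scores an item's typicality by its distance to that prototype; real feature vectors would come from a trained deep network, here they are random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for deep features: in practice these would come from a trained CNN.
n_per_class, dim = 50, 16
classes = ["bird", "chair", "fruit"]
features = {c: rng.normal(loc=i, scale=1.0, size=(n_per_class, dim))
            for i, c in enumerate(classes)}

# Category prototype: the central tendency of the class in feature space.
prototypes = {c: f.mean(axis=0) for c, f in features.items()}

def typicality(x, category):
    """Higher score = closer to the category prototype = more typical member."""
    return -np.linalg.norm(x - prototypes[category])

# Rank members of one category from most to least typical.
scores = [typicality(x, "bird") for x in features["bird"]]
order = np.argsort(scores)[::-1]
print("most typical 'bird' index:", order[0], "score:", round(scores[order[0]], 2))
print("least typical 'bird' index:", order[-1], "score:", round(scores[order[-1]], 2))
```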

Representation Learning: A Statistical Perspective

Annual Review of Statistics and Its Application

Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.
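
To make the statistical origin mentioned in this abstract concrete, here is a brief sketch of classical multidimensional scaling, which recovers vector representations from a matrix of pairwise distances (a textbook construction, not code from the review).

```python
import numpy as np

rng = np.random.default_rng(0)

def classical_mds(dist, k=2):
    """Embed items in k dimensions from an (n, n) matrix of pairwise distances."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Recover 2-D coordinates (up to rotation/reflection) from distances alone.
points = rng.normal(size=(10, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist, k=2)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
print("max distance error:", np.abs(dist - recovered).max().round(6))
```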

Deep learning from a statistical perspective

Stat, 2020

As one of the most rapidly developing artificial intelligence techniques, deep learning has been applied in various machine learning tasks and has received great attention in data science and statistics. Despite their complex structure, deep neural networks can be viewed as nonlinear and nonparametric generalizations of existing statistical models. In this review, we introduce several popular deep learning models, including convolutional neural networks, generative adversarial networks, recurrent neural networks, and autoencoders, with their applications in image data, sequential data, and recommender systems. We review the architecture of each model and highlight their connections to, and differences from, conventional statistical models. In particular, we provide a brief survey of recent work on the unique overparameterization phenomenon, which explains the strengths and advantages of using an extremely large number of parameters in deep learning. In addition, we provide practical guidance on optimization algorithms, hyperparameter tuning, and computing resources.

Encouraging an appropriate representation simplifies training of neural networks

Acta Universitatis Sapientiae, Informatica, 2020

A common assumption about neural networks is that they can learn an appropriate internal representation on their own; see, e.g., end-to-end learning. In this work we challenge this assumption. We consider two simple tasks and show that the state-of-the-art training algorithm fails, although the model itself is able to represent an appropriate solution. We demonstrate that encouraging an appropriate internal representation allows the same model to solve these tasks. While we do not claim that it is impossible to solve these tasks by other means (such as neural networks with more layers), our results illustrate that integrating domain knowledge in the form of a desired internal representation may improve the generalization ability of neural networks.
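
A minimal sketch of the general idea of encouraging an internal representation (the task, target code, and hyperparameters are illustrative assumptions, not the authors' setup): a two-layer network is trained on a task loss plus an auxiliary penalty that pulls its hidden layer toward a desired target representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR of two bits. A hidden code [AND, OR] (in +/-1 coding) makes the
# task linearly solvable, so we use it as the "appropriate" target representation.
n, d, k = 200, 2, 2
x = rng.integers(0, 2, size=(n, d)).astype(float)
y = (x[:, 0] != x[:, 1]).astype(float).reshape(-1, 1)
and_bit = x[:, 0] * x[:, 1]
or_bit = x[:, 0] + x[:, 1] - and_bit
target_repr = np.stack([2 * and_bit - 1, 2 * or_bit - 1], axis=1)

W1 = 0.5 * rng.normal(size=(d, k)); b1 = np.zeros(k)
W2 = 0.5 * rng.normal(size=(k, 1)); b2 = np.zeros(1)
lam, lr = 1.0, 0.5          # weight of the representation penalty, step size

for step in range(3000):
    h = np.tanh(x @ W1 + b1)                    # internal representation
    pred = h @ W2 + b2
    task = ((pred - y) ** 2).mean()
    aux = ((h - target_repr) ** 2).mean()       # distance to the desired code
    # Backpropagation for loss = task + lam * aux.
    d_pred = 2 * (pred - y) / n
    d_h = d_pred @ W2.T + lam * 2 * (h - target_repr) / (n * k)
    d_z = d_h * (1 - h ** 2)
    W2 -= lr * h.T @ d_pred; b2 -= lr * d_pred.sum(axis=0)
    W1 -= lr * x.T @ d_z;    b1 -= lr * d_z.sum(axis=0)

print("task loss:", round(float(task), 4), "representation loss:", round(float(aux), 4))
```

The auxiliary weight lam controls how strongly the desired representation is enforced; setting it to zero recovers ordinary end-to-end training of the same model.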