Efficient Design of Neural Networks with Random Weights

Multi-Activation Hidden Units for Neural Networks with Random Weights

ArXiv, 2020

Single layer feedforward networks with random weights are successful in a variety of classification and regression problems. These networks are known for their non-iterative and fast training algorithms. A major drawback of these networks is that they require a large number of hidden units. In this paper, we propose the use of multi-activation hidden units. Such units increase the number of tunable parameters and enable formation of complex decision surfaces, without increasing the number of hidden units. We experimentally show that multi-activation hidden units can be used either to improve the classification accuracy, or to reduce computations.
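The idea in the abstract above can be illustrated with a minimal sketch. It assumes that each hidden unit passes the same random pre-activation through several fixed activation functions, and that the output weights are fit by ridge-regularized least squares, as is common for random-weight networks. The activation set (tanh, sigmoid, sine), the ridge value, the toy target, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_features(X, W, b, activations):
    """Apply every activation to the same random pre-activations and concatenate."""
    Z = X @ W + b  # shape (n_samples, n_hidden)
    return np.hstack([act(Z) for act in activations])

def fit_output_weights(H, Y, ridge=1e-3):
    """Closed-form ridge least squares: solve (H^T H + ridge * I) beta = H^T Y."""
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ Y)

def compare_single_vs_multi(seed=0, n_hidden=20):
    """Training MSE with one activation per unit vs. three per unit (toy data)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    Y = (np.sin(3.0 * X[:, 0]) * X[:, 1])[:, None]  # hypothetical toy target
    # Random hidden parameters are drawn once and never trained.
    W = rng.uniform(-1.0, 1.0, size=(2, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    configs = {"single": [np.tanh], "multi": [np.tanh, sigmoid, np.sin]}
    errors = {}
    for name, acts in configs.items():
        H = hidden_features(X, W, b, acts)
        beta = fit_output_weights(H, Y)
        errors[name] = float(np.mean((H @ beta - Y) ** 2))
    return errors

if __name__ == "__main__":
    print(compare_single_vs_multi())
```

In this sketch the number of hidden units, and therefore the number of random projections, stays fixed, while the number of trainable output weights grows with the number of activations per unit.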

A Method of Generating Random Weights and Biases in Feedforward Neural Networks with Random Hidden Nodes

2017

Neural networks with random hidden nodes have attracted increasing interest from researchers and in practical applications. This is due to their unique features, such as very fast training and the universal approximation property. In these networks, the weights and biases of the hidden nodes, which determine the nonlinear feature mapping, are set randomly and are not learned. Appropriate selection of the intervals from which the weights and biases are drawn is extremely important. This topic has not yet been sufficiently explored in the literature. In this work, a method of generating random weights and biases is proposed. This method generates the parameters of the hidden nodes in such a way that the nonlinear fragments of the activation functions are located in the input space regions containing data and can be used to construct the surface approximating a nonlinear target function. The weights and biases depend on the input data range and the activation function type. The proposed method allows us to control ...

Speeding up the Training of Neural Networks with the One-Step Procedure

Neural Processing Letters, 2024

In the last decade, research and industry have shown dramatically growing interest in machine learning, mostly due to the performance of deep neural networks. These increasingly complex architectures have solved a wide range of problems. However, training these sophisticated architectures requires substantial computation on advanced hardware. In this paper, we introduce a new approach based on the One-Step procedure that may speed up their training. In this procedure, an initial guess estimator is computed on a subsample and is then improved with only one step of Newton gradient descent on the whole dataset. To show the efficiency of this framework, we consider regression and classification tasks using simulated and real datasets. We consider classic architectures, namely multi-layer perceptrons, and show, on our examples, that the One-Step procedure often halves the computation time needed to train the neural networks while preserving performance.
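The two stages described above (a cheap initial estimate on a subsample, then a single Newton step on the full dataset) can be sketched as follows. Note the substitution: the paper applies the procedure to multi-layer perceptrons, whereas this sketch uses plain logistic regression so that the gradient and Hessian fit in a few lines. The synthetic data, step counts, learning rate, and function names are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_and_hessian(theta, X, y):
    """Gradient and Hessian of the mean logistic negative log-likelihood."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / len(y)
    hess = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)
    return grad, hess

def mean_loss(theta, X, y):
    """Mean logistic loss, computed stably with logaddexp."""
    z = X @ theta
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def fit_gradient_descent(X, y, steps=500, lr=0.5):
    """Initial guess estimator: plain gradient descent (used on the subsample)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad, _ = grad_and_hessian(theta, X, y)
        theta -= lr * grad
    return theta

def one_step(X, y, subsample_size, rng):
    """Fit on a random subsample, then take one Newton step on all the data."""
    idx = rng.choice(len(y), size=subsample_size, replace=False)
    theta0 = fit_gradient_descent(X[idx], y[idx])
    grad, hess = grad_and_hessian(theta0, X, y)
    theta1 = theta0 - np.linalg.solve(hess, grad)
    return theta0, theta1

def demo(seed=0, n=5000, subsample_size=500):
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
    true_theta = np.array([0.5, -1.0, 2.0])  # hypothetical ground truth
    y = (rng.uniform(size=n) < sigmoid(X @ true_theta)).astype(float)
    theta0, theta1 = one_step(X, y, subsample_size, rng)
    return theta0, theta1, X, y

if __name__ == "__main__":
    theta0, theta1, X, y = demo()
    print("loss after subsample fit:", mean_loss(theta0, X, y))
    print("loss after one Newton step:", mean_loss(theta1, X, y))
```

The expensive iterative phase only ever touches the subsample; the full dataset is visited once, to form the gradient and Hessian for the single correction step.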

Improving Randomized Learning of Feedforward Neural Networks by Appropriate Generation of Random Parameters

Lecture Notes in Computer Science, 2019

In this work, a method of generating random parameters for randomized learning of a single-hidden-layer feedforward neural network is proposed. The method first randomly selects the slope angles of the hidden neurons' activation functions from an interval adjusted to the target function, then randomly rotates the activation functions, and finally distributes them across the input space. For complex target functions, the proposed method gives better results than the approach commonly used in practice, where the random parameters are selected from a fixed interval. This is because it introduces the steepest fragments of the activation functions into the input hypercube, avoiding their saturation fragments.

A Novel Technique for Optimizing the Hidden Layer Architecture in Artificial Neural Networks

Artificial neural networks have shown their effectiveness in many real-world problems such as signal processing, pattern recognition, and classification. Although they provide highly generalized solutions, several problems in using artificial neural networks remain unanswered. Determining the most appropriate architecture is one of the major ones. Generally, the performance of a neural network strongly depends on the size of the network. Increasing the number of layers can improve generalization ability; however, this solution may not be computationally optimal. On the other hand, too many hidden neurons may over-train on the data, which causes poor generalization, while too few neurons under-fit the data, so the network may not learn the data properly. Thus, both too many and too few neurons lead to poor generalization, and determining the most suitable architecture is very important. Accordingly, a large amount of research has been carried out to model the hidden layer architecture using various techniques. These techniques can be categorized as pruning techniques and constructive techniques. Pruning algorithms start with an oversized network and remove nodes until the optimal architecture is reached [1],[2],[3],[4],[12]. Constructive algorithms [5],[6],[7],[8] work the other way: they build the appropriate neural network during training by adding hidden layers, nodes, and connection weights to a minimal architecture. However, most of these methods are confined to networks with a small number of neurons or to single-hidden-layer networks. Hence, they have not properly addressed the problem of hidden layer architecture. In this paper, a new pruning algorithm based on backpropagation training [11] is proposed to design the optimal neural network. The optimal solution is obtained in two steps. First, the number of hidden layers in the most efficient network is determined. Then the network approaches the optimal solution by removing all unimportant nodes from each layer. The removable nodes are identified through the delta values of hidden neurons [9],[10]. Delta values were chosen because the delta values of the hidden layers are used to compute the error term of the next training cycle; hence, the delta value is a significant factor in the error term. Thus, the delta values are used to identify the less salient neurons and remove them from the hidden layers, so that the error term tends to the desired limit faster than with backpropagation training alone. The approaches of other researchers are discussed in the next section. Section III describes the new algorithm and how the delta values are used to optimize the hidden layer architecture. The experimental method and results are discussed in Section IV. Finally, Section V presents the conclusions.

Accelerating the learning process of a neural network by predicting the weight coefficient

Viktor O. Speranskyy, Mihail O. Domanciuc, Herald of Advanced Information Technology, Vol. 4 No. 4, 2021

The purpose of this study is to analyze and implement the acceleration of the neural network learning process by predicting the weight coefficients. The relevance of accelerating the learning of neural networks is discussed, as well as the possibility of using prediction models in a wide range of tasks where fast classifiers must be built. When data is received in real time from the sensor array of a chemical unit, it is necessary to be able to predict changes and adjust the operating parameters. After assessment, this should be done as quickly as possible in order to promptly change the current structure and state of the resulting substances. Work on speeding up classifiers usually focuses on speeding up the applied classifier. The predicted values of the weight coefficients are calculated using known prediction models. The possibility of combining prediction models and optimization methods was tested to accelerate the learning process of a neural network. The scientific novelty of the study lies in the analysis of the effectiveness of prediction models in training neural networks. For the experimental evaluation of the effectiveness of prediction models, a classification problem was chosen, and a multilayer perceptron was chosen as the neural network type. The experiment is divided into several stages: initial training of the neural network without a prediction model, and then with prediction models; initial training of the neural network without an optimization method, and then with optimization methods; initial training of the neural network using combinations of prediction models and optimization methods; and measuring the relative error of using prediction models, optimization methods, and their combination. The "Seasonal Linear Regression", "Simple Moving Average", and "Jump" models were used in the experiment. The "Jump" model was proposed and developed based on observations of how the value of a weight coefficient changes with the epoch. The "Adagrad", "Adadelta", and "Adam" methods were chosen for training the neural network and for subsequent verification of the combined use of prediction models with optimization methods. As a result of the study, the effectiveness of prediction models in predicting the weight coefficients of a neural network has been demonstrated. An idea is proposed, and models are used, that can significantly reduce the training time of a neural network. The idea behind using prediction models is that the change of a weight coefficient over the epochs is a time series that tends to a certain value. The study found that prediction models and optimization methods can be combined. Moreover, prediction models do not interfere with optimization methods, since they do not affect the training formula itself, which makes rapid training of the neural network possible. In the practical part of the work, two known prediction models and the proposed model were used. As a result of the experiment, the operating conditions for using prediction models were determined.

High-dimensional neural feature design for layer-wise reduction of training cost

EURASIP Journal on Advances in Signal Processing

We design a rectified linear unit-based multilayer neural network by mapping the feature vectors to a higher dimensional space in every layer. We design the weight matrices in every layer to ensure a reduction of the training cost as the number of layers increases. Linear projection to the target in the higher dimensional space leads to a lower training cost if a convex cost is minimized. An ℓ2-norm convex constraint is used in the minimization to reduce the generalization error and avoid overfitting. The regularization hyperparameters of the network are derived analytically to guarantee a monotonic decrement of the training cost, and therefore, it eliminates the need for cross-validation to find the regularization hyperparameter in each layer. We show that the proposed architecture is norm-preserving and provides an invertible feature vector and, therefore, can be used to reduce the training cost of any other learning method which employs linear projection to estimate the target.

Learning Neural Network Classifiers with Low Model Complexity

ArXiv, 2017

Modern neural network architectures for large-scale learning tasks have substantially higher model complexities, which makes understanding, visualizing and training these architectures difficult. Recent contributions to deep learning techniques have focused on architectural modifications to improve parameter efficiency and performance. In this paper, we derive a continuous and differentiable error functional for a neural network that minimizes its empirical error as well as a measure of the model complexity. The latter measure is obtained by deriving a differentiable upper bound on the Vapnik-Chervonenkis (VC) dimension of the classifier layer of a class of deep networks. Using standard backpropagation, we realize a training rule that tries to minimize the error on training samples, while improving generalization by keeping the model complexity low. We demonstrate the effectiveness of our formulation (the Low Complexity Neural Network - LCNN) across several deep learning algorithms,...

A deterministic algorithm that emulates learning with random weights

Neurocomputing, 2002

The expectation of a function of random variables can be modeled as the value of the function at the mean of the variables plus a penalty term. Here, this penalty term is calculated exactly, and the properties of different approximations are analyzed. Then, a deterministic algorithm for minimizing the expected error of a feedforward network with random weights is presented. Given a particular feedforward network architecture and a training set, this algorithm accurately finds the weight configuration that makes the network response most resistant to a class of weight perturbations. Finally, the study of the most stable configurations of a network reveals some undesirable properties of networks with asymmetric activation functions.
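The "value at the mean plus a penalty term" decomposition can be checked numerically in the simplest case. The sketch below assumes a diagonal quadratic function and independent zero-mean Gaussian perturbations with a common standard deviation, for which the penalty is exactly half the variance times the trace of the Hessian. This is only an illustration of the general identity: the function, the noise model, and the names are assumptions, and the abstract does not state which penalty forms the paper derives.

```python
import random

def quadratic(w, a):
    """f(w) = sum_i a_i * w_i^2, whose Hessian is diag(2 * a_i)."""
    return sum(ai * wi * wi for ai, wi in zip(a, w))

def expected_value_analytic(w, a, sigma):
    """f at the mean plus the penalty (sigma^2 / 2) * trace(Hessian) = sigma^2 * sum(a)."""
    return quadratic(w, a) + sigma ** 2 * sum(a)

def expected_value_monte_carlo(w, a, sigma, n_samples=20000, seed=0):
    """Average f over weights perturbed by independent N(0, sigma^2) noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [wi + rng.gauss(0.0, sigma) for wi in w]
        total += quadratic(noisy, a)
    return total / n_samples

if __name__ == "__main__":
    w, a, sigma = [1.0, -2.0], [1.0, 0.5], 0.3
    print("analytic:   ", expected_value_analytic(w, a, sigma))
    print("monte carlo:", expected_value_monte_carlo(w, a, sigma))
```

Because the penalty grows with the curvature of the function, minimizing the expected value favors weight configurations in flatter regions, which is consistent with the abstract's description of configurations resistant to weight perturbations.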