Spectrum of GMDH algorithms (original) (raw)


Solution of practical problems and GMDH theory design lead to development of broad spectrum of software algorithms. Each of them corresponds to some definite conditions of it application [17]. Algorithms mainly differ one from another by the models-candidates set generator arrangement for given basic function, by the way of models structure complexing and, at last, by the external criteria accepted. Algorithm choice depends on specifics of the problem, noise dispersion level, sufficiency of data sample, and if data sample contains only continuous data.

Most often criteria of accuracy, differential or informative type are used. The work of GMDH algorithms has a straightforward analogy with the work of gardener during selection of a new hybrid plant [11].

GMDH algorithms
Variables Parametric Non-parametric
Continuous - Combinatorial (COMBI) - Multilayered Iterative (MIA) - GN - Objective System Analysis (OSA) - Harmonical - Two-level (ARIMAD) - Multiplicative-Additive (MAA) - Objective Computer Clusterization (OCC); - "Pointing Finger" (PF) clusterization algorithm; - Analogues Complexing (AC)
Discrete or binary - Harmonical Rediscretization - Algorithm on the base of Multilayered Theory of Statistical Decisions (MTSD)

The basic parametric GMDH algorithms listed in table, have been developed for continuous variables. Among the parametric algorithms [1,9] the most known are:

- the basic Combinatorial (COMBI) algorithm. It is based on full or reduced sorting-out of gradually complicated models and evaluation of them by external criterion on separate part of data sample;

- Multilayered Iteration (MIA) algorithm use at each layer of sorting procedure the same partial description (iteration rule). It should be used when it is needed to handle a big number of variables;

- Objective System Analysis (OSA) algorithm. The key feature of it, is that it examine not single equations, but systems of algebraic or difference equations, obtained by implicit templates (without goal function). An advantage of the algorithm is that the number of regressors is increased consequently, the information embedded in the data sample is utilized better;

- Two-level (ARIMAD) algorithm for modeling of long-term cyclic (such as stock or weather) processes. There are used systems of polynomial or difference equations for identification of models on two time scales and then choice of the best pair of models by external criterion value. For this can be used any parametric algorithm from described above [23].

Less known are parametric algorithms, which apply an exhaustive search to difference, harmonic or harmonic-exponential functions, and the Multiplicative-Additive algorithm, in which tested polynomial models are obtained by taking the logarithm of the product of input variables [18,19].
The parametric GMDH algorithms have proved to be highly efficient in cases where one is to model objects with non-fuzzy characteristics, such as engineering objects. In cases, where modeling involves objects with fuzzy characteristics, it is more efficient to use the non-parametric GMDH algorithms, in which polynomial models are replaced by a data sample divided into intervals or clusters. Such type algorithms completely solve the problem of coefficients estimates bias elimination.

Non-parametric algorithms are exemplified by:

- Objective Computer Clusterization (OCC) algorithm that operates with pairs of closely spaced sample points [5]. It finds physical clusterization that would as possible be the same on two subsamples;

- "Pointing Finger" (PF) algorithm for the search of physical clusterization. It is implemented by construction of two hierarchical clustering trees and estimation by the balance criterion [20];

- Analogues Complexing (AC) algorithm, which use the set of analogues instead of models and clusterizations [8]. It is recommended for the most fuzzy objects;

- Probabilistic algorithm, based on the Multilayered Theory of Statistical Decisions [6]. It is recommended for recognition and forecasting of binary objects and for the veritability of input data control to avoid the possible experts errors in it.

Recent developments of the GMDH have led to neuronets with active neurons, which realise twice-multilayered structure: neurons are multilayered and they are connected into multilayered structure. This gives possibility to optimize the set of input variables at each layer, while the accuracy increases. The accuracy of forecasting, approximation or pattern recognition can be increased beyond the limits which are reached by neuronet with single neurons [9,10,34]. In this approach, which corresponds to the actions of human nervous system, the connections between several neurons are not fixed but change depending on the neurons themselves. Such active neurons are able during the learning self-organizing process to estimate which inputs are necessary to minimise the given objective function of neuron. This is possible on the condition that every neuron in its turn is multilayered unit, such as modeling GMDH algorithm. Neuronet with active neurons, is considered as a tool to increase AI problems accuracy and lead-time with the help of regression area extension for inaccurate, noisy data or small data samples.

The GMDH algorithms recently are applied in optimization to solve the problems of normative forecasting (after "what-if-then" scenario) and optimal control of multivariable ill-defined objects. Many ill-defined objects in macroeconomy, ecology, manufacturing etc. can be described accurately enough by static algebraic or by difference equations, which can be transformed into problems of linear programming by nomination of non-linear members by additional variables. GMDH algorithms are applied to evaluate deflections of output variables from their reference optimal values [7,21]. Examples of use of Simplified Linear Programming (SLP) algorithm should be used for expert computer advisers construction, normative forecasting and control optimization of averaged variables. An important example [10] gives the prediction of effects of experiments. The algorithm solves two problems: calculation of effects of a given nuclear experiment and calculation of parameters which are necessary to reach optimal effects. It means, that the realization of experiments can often be replaced by computer experiments.

As already noted, the GMDH algorithms have been developed for continuous variables. In practice, however the sample will often include variables discretized into a small number of levels or even binary values. To extend the GMDH algorithms to discretized or binary variables, the Harmonic Rediscretization algorithm has been developed [22].

The existence of a broad gamut of GMDH algorithms is traceable to the fact, that it is impossible to define the characteristics of the rest or controlled objects exactly in advance. Therefore, it can be good practice to try several GMDH algorithms one after another and to decide which one suits a given type of objects best.

All the questions, which arise during modeling process, are to be solved by the comparison of the criterion values: that variant is better, which leads to more deeper minimum of basic external criteria. In this way, the type of algorithm is chosen objectively, according to the value of the discriminating criterion.

Information about dispersion of noise level is very useful to chose the algorithm type. For small dispersion level we can use the learning networks of GMDH type, based on the ordinary regression analysis using internal criteria. For considerable noise level the GMDH algorithms with external criteria are recommended. And for high level of noise dispersion non-parametric algorithms of clusterization or analogues complexing should be applied [8].