Optimal Model Structure Identification. 2. Nonlinear Regression
2023, ForsChem Research Reports
Nonlinear regression consists of finding the best possible values for the parameters of a given homoscedastic mathematical structure that contains nonlinear functions of those parameters. In this report, the second part of the series, the mathematical structure of models that are nonlinear in their parameters is optimized so as to minimize the estimated model error variance. The uncertainty in the estimated model parameters is evaluated using a linear approximation of the model about the optimal parameter values found. The homoscedasticity of the model residuals must be evaluated to validate this important assumption. The model structure identification procedure is implemented in the R language and listed in the Appendix. Several examples are considered to illustrate the optimization procedure. In many practical situations, the optimal model obtained has heteroscedastic residuals. If the model is only intended to describe the experimental observations, violating the homoscedasticity assumption may not be critical. However, for explanatory or extrapolating models, heteroscedastic residuals may lead to flawed conclusions.
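The linear-approximation idea in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the R procedure from the report's Appendix: the model structure y = a·exp(b·x), the helper names, and the choice of a Gauss-Newton solver are all assumptions made for the example. The model error variance is estimated as SSE/(n − p), and the parameter covariance is approximated by s²(JᵀJ)⁻¹, where J is the Jacobian of the model evaluated at the fitted parameters.

```python
import math

def model(x, a, b):
    # Hypothetical nonlinear structure (assumption): y = a * exp(b * x)
    return a * math.exp(b * x)

def jacobian_row(x, a, b):
    # Partial derivatives of the model with respect to (a, b)
    e = math.exp(b * x)
    return (e, a * x * e)

def solve2(m, v):
    # Solve the 2x2 linear system m @ d = v by Cramer's rule
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det)

def gauss_newton(xs, ys, a, b, iters=50, tol=1e-12):
    # Iteratively solve (J^T J) d = J^T r, with r = y - f, and update (a, b)
    for _ in range(iters):
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            j = jacobian_row(x, a, b)
            r = y - model(x, a, b)
            for i in range(2):
                jtr[i] += j[i] * r
                for k in range(2):
                    jtj[i][k] += j[i] * j[k]
        da, db = solve2(jtj, jtr)
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def linearized_summary(xs, ys, a, b):
    # Error variance and parameter standard errors from the linear approximation
    n, p = len(xs), 2
    resid = [y - model(x, a, b) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - p)  # estimated model error variance
    jtj = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        j = jacobian_row(x, a, b)
        for i in range(2):
            for k in range(2):
                jtj[i][k] += j[i] * j[k]
    det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
    # Diagonal of s2 * (J^T J)^-1 gives the approximate parameter variances
    var_a = s2 * jtj[1][1] / det
    var_b = s2 * jtj[0][0] / det
    return s2, math.sqrt(var_a), math.sqrt(var_b), resid
```

Because the standard errors come from a linearization about the optimum, they can be misleading when the model is strongly curved in the neighborhood of the solution; the returned residuals are what one would then inspect for homoscedasticity.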
Related papers
Optimal Model Structure Identification. 1. Multiple Linear Regression
ForsChem Research Reports, 2023
This is the first part of a series of reports discussing different strategies for optimizing the structure of mathematical models fitted from experimental data. In this report, the concept of randomistic models is introduced along with the general formulation of the multi-objective optimization problem of model structure identification. Different approaches can be used to solve this problem, depending on the set of possible models considered. In the case of mathematical models that are linear in their parameters, a stepwise multiple linear regression procedure can be used. In particular, a bidirectional stepwise strategy (backward elimination and forward selection) is suggested: relevant candidate terms are prioritized by their absolute linear correlation coefficients with respect to the response variable, and statistically significant or explanatory terms are then identified based on optimal significance levels. Two additional constraints can be included: a lower limit on the normality value of the residuals (normality assumption check) and a lower limit on the residual standard error (to avoid model overfitting). This stepwise strategy, which overcomes several limitations of conventional stepwise regression, is implemented as a function (steplm) in the R language, and different examples are presented to illustrate its use.
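The prioritization step described above, ranking candidate terms by the absolute value of their linear correlation with the response, can be sketched as follows. This is a hypothetical Python illustration, not the steplm function: candidate terms are assumed to be supplied as named columns, and the significance-level, normality and residual-error steps are deliberately omitted.

```python
import math

def pearson(u, v):
    # Pearson linear correlation coefficient between two equal-length sequences
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)  # fails for constant columns (zero variance)

def rank_terms(terms, y):
    """Order candidate terms by |r| with the response, strongest first."""
    scored = [(name, abs(pearson(col, y))) for name, col in terms.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

For example, with x = 1, ..., 6 and a response that is exactly linear in x, the term "x" ranks ahead of "x^2" and "1/x". In a full stepwise procedure this ordering would only decide which term to try next.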
In this study, a nonlinear model with variance homogeneity is compared with a nonlinear model with variance heterogeneity (power-of-the-mean variance model), using the residual standard error and the F-statistic to determine which one gives a more parsimonious description of the datasets. The Newton-Raphson algorithm was used to estimate the model parameters. The two models are fitted to carbon monoxide (CO) pollution data measured in parts per million (ppm). Based on the residual standard error and the F-statistic, the power-of-the-mean variance model performed better than the nonlinear model with variance homogeneity.
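The power-of-the-mean variance structure is commonly written as Var(yᵢ) = σ²·μᵢ^(2θ), where μᵢ is the fitted mean. The sketch below is an assumption-laden illustration: it estimates θ from the slope of log|residual| against log(fitted value) and forms the corresponding weights 1/μᵢ^(2θ). This log-regression heuristic is swapped in for simplicity; it is not the Newton-Raphson estimation used in the study summarized above, and the function names are hypothetical.

```python
import math

def estimate_theta(fitted, resid):
    """Slope of log|r| on log(mu): a rough power-of-the-mean exponent."""
    # Skip points where a logarithm is undefined (non-positive mean or zero residual)
    pts = [(math.log(m), math.log(abs(r))) for m, r in zip(fitted, resid)
           if m > 0 and r != 0]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    return sxy / sxx

def pom_weights(fitted, theta):
    """Weights 1 / mu^(2*theta) for Var(y_i) = sigma^2 * mu_i^(2*theta)."""
    return [m ** (-2.0 * theta) for m in fitted]
```

A value of θ near 0 corresponds to the homogeneous-variance model, while θ = 1 corresponds to a constant coefficient of variation; the weights could then feed a weighted least squares refit.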
Heteroscedastic Regression Models
ForsChem Research Reports, 2023
One of the most important tools for data analysis is statistical regression. This technique consists of identifying the best parameters of a given mathematical model describing a particular set of experimental observations. The method implicitly assumes that the model error has a constant variance (homoscedasticity) over the whole range of observations. However, this is not always the case, and neglecting the changing variance (heteroscedasticity) leads to inadequate or incomplete models. In this report, a method is proposed for describing the heteroscedastic behavior of regression model residuals. The method uses weighted least squares minimization to fit the confidence intervals of the regression from a model of the standard error, with weights related to the confidence level considered. In addition, a heteroscedasticity test is proposed, based on the coefficient of variation of the standard error model obtained by optimization. Various practical examples are presented to illustrate the proposed method.
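The idea of judging heteroscedasticity from the coefficient of variation of a standard-error model can be sketched roughly as follows. This is a simplified stand-in, not the report's method: instead of fitting confidence intervals by weighted least squares, it fits an ordinary least-squares line to the absolute residuals, scaled by √(π/2) because E|e| = σ·√(2/π) for normally distributed errors. The function names and the linear form of the standard-error model are assumptions.

```python
import math

def fit_line(x, y):
    # Ordinary least-squares intercept and slope
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return my - b * mx, b

def se_model_cv(x, resid):
    # For normal errors E|e| = sigma * sqrt(2/pi), so scale |e| by sqrt(pi/2)
    scaled = [abs(r) * math.sqrt(math.pi / 2) for r in resid]
    a, b = fit_line(x, scaled)
    # Fitted standard error at each observation, clipped at zero
    se_hat = [max(a + b * xi, 0.0) for xi in x]
    mean = sum(se_hat) / len(se_hat)
    sd = math.sqrt(sum((s - mean) ** 2 for s in se_hat) / (len(se_hat) - 1))
    return sd / mean  # coefficient of variation of the standard-error model
```

A coefficient of variation near zero suggests a roughly constant spread, while residuals whose magnitude grows with x give a clearly positive value. This sketch provides no formal decision threshold; the report defines its own test.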