GeneralizedLinearMixedModel.predict - Predict response of generalized linear mixed-effects model - MATLAB

Class: GeneralizedLinearMixedModel

Predict response of generalized linear mixed-effects model

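Syntax

ypred = predict(glme)
ypred = predict(glme,tblnew)
ypred = predict(___,Name,Value)
[ypred,ypredCI] = predict(___)
[ypred,ypredCI,DF] = predict(___)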

Description

ypred = predict(glme) returns the predicted conditional means of the response, ypred, using the original predictor values used to fit the generalized linear mixed-effects model glme.

ypred = predict(glme,tblnew) returns the predicted conditional means using the new predictor values specified in tblnew.

If a grouping variable in tblnew has levels that are not in the original data, then the random effects for that grouping variable do not contribute to the 'Conditional' prediction at observations where the grouping variable has new levels.
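The rule above can be illustrated with plain arithmetic on the linear predictor. The following is a minimal sketch that does not call predict. It assumes a log link and a single random intercept per group, and every value in it (knownLevels, bHat, etaFixed) is hypothetical.

```matlab
% Hypothetical fitted quantities (not taken from any real model).
knownLevels = [1; 2; 3];            % grouping levels seen during fitting
bHat        = [0.20; -0.10; 0.05];  % estimated random intercept per known level

% New observations: the last one belongs to a level (99) not seen in fitting.
newLevels = [1; 3; 99];
etaFixed  = [1.0; 1.0; 1.0];        % fixed-effects part of the linear predictor

% Look up each observation's random intercept; unseen levels contribute 0.
[isKnown, idx] = ismember(newLevels, knownLevels);
bNew = zeros(size(newLevels));
bNew(isKnown) = bHat(idx(isKnown));

% Conditional mean under a log link.
muCond = exp(etaFixed + bNew);
disp([newLevels, bNew, muCond])
```

For the row with the unseen level, the conditional mean reduces to the fixed-effects part alone, which is what a marginal prediction would give for that row.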

ypred = predict(___,Name,Value) returns the predicted conditional means of the response using additional options specified by one or more Name,Value pair arguments. For example, you can specify the confidence level, simultaneous confidence bounds, or contributions from only fixed effects. You can use any of the input arguments in the previous syntaxes.

[ypred,ypredCI] = predict(___) also returns 95% point-wise confidence intervals, ypredCI, for each predicted value.

[ypred,ypredCI,DF] = predict(___) also returns the degrees of freedom, DF, used to compute the confidence intervals.

Input Arguments

Generalized linear mixed-effects model glme, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.

New input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables. tblnew must have the same variables as the original table or dataset array used in fitglme to fit the generalized linear mixed-effects model glme.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Data Types: single | double

Model offset, specified as the comma-separated pair consisting of 'Offset' and a vector of length m, where m is the number of rows in tblnew. The offset is used as an additional predictor and has a coefficient value fixed at 1.
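One common use of an offset in a log-link count model is to account for differing exposure between rows. The sketch below shows only the arithmetic of an offset entering the linear predictor with a coefficient of 1. It does not call predict, and X, beta, and exposure are hypothetical values chosen for illustration.

```matlab
% Hypothetical design matrix (intercept + one predictor) and coefficients.
X    = [1 0.5; 1 1.0; 1 1.5];
beta = [0.2; 0.4];

% Hypothetical exposure for each row, e.g. units inspected per batch.
exposure = [10; 20; 40];
offset   = log(exposure);   % one offset value per row (length m = 3)

% The offset enters the linear predictor with a coefficient fixed at 1.
eta = X*beta + 1*offset;
mu  = exp(eta);             % log link: mean count

% Equivalent view: the offset rescales the per-unit rate by the exposure.
ratePerUnit = exp(X*beta);
disp([mu, exposure .* ratePerUnit])
```

Both displayed columns agree, which is why a log-exposure offset is often read as turning a model for counts into a model for rates.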

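Type of confidence bounds, specified as the comma-separated pair consisting of 'Simultaneous' and either false or true. If false, predict computes nonsimultaneous (point-wise) confidence bounds. If true, predict computes simultaneous confidence bounds.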

Example: 'Simultaneous',true
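As a sketch of how these options combine, the following assumes that Statistics and Machine Learning Toolbox and its mfr sample data are available, and it reuses the model formula from the Examples section. The output variable names (ypredC, ciC, ypredM, ciM) are chosen here for illustration only.

```matlab
% Requires Statistics and Machine Learning Toolbox (fitglme, sample data mfr).
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

tblnew = mfr(1:10,:);   % new data with the same variables as mfr

% Conditional predictions with simultaneous 99% confidence bounds.
[ypredC,ciC,DF] = predict(glme,tblnew,'Alpha',0.01,'Simultaneous',true);

% Marginal predictions (fixed effects only) with default 95% point-wise bounds.
[ypredM,ciM] = predict(glme,tblnew,'Conditional',false);

disp([ypredC, ciC, ypredM, ciM])
```

Each call returns one prediction per row of tblnew and a two-column matrix of lower and upper bounds.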

Output Arguments

Predicted responses, returned as a vector. If the 'Conditional' name-value pair argument is specified as true, ypred contains predictions for the conditional means of the responses given the random effects. Conditional predictions include contributions from both fixed and random effects. Marginal predictions include only contributions from fixed effects.

To compute marginal predictions, predict computes conditional predictions, but substitutes a vector of zeros in place of the empirical Bayes predictors (EBPs) of the random effects.
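The substitution described above can be sketched with small matrices. This is a minimal illustration that assumes a log link. The design matrices X and Z, the coefficients beta, and the empirical Bayes predictors bEBP are all hypothetical, and the code does not call predict.

```matlab
% Hypothetical design matrices and estimates for 4 observations in 2 groups.
X    = [1 0; 1 1; 1 0; 1 1];   % fixed-effects design (intercept, treatment)
Z    = [1 0; 1 0; 0 1; 0 1];   % random-intercept design (group membership)
beta = [1.5; -0.4];            % fixed-effects estimates
bEBP = [0.3; -0.2];            % empirical Bayes predictors of random effects

% Conditional prediction: fixed plus random contributions, log link.
muCond = exp(X*beta + Z*bEBP);

% Marginal prediction: same formula with zeros in place of the EBPs.
muMarg = exp(X*beta + Z*zeros(size(bEBP)));

disp([muCond, muMarg])
```

With a log link, the two predictions for a row differ by the factor exp of that row's random-effects contribution.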

Point-wise confidence intervals for the predicted values, returned as a two-column matrix. The first column of ypredCI contains the lower bound, and the second column contains the upper bound. By default, ypredCI contains the 95% nonsimultaneous confidence intervals for the predictions. You can change the confidence level using the Alpha name-value pair argument, and make them simultaneous using the Simultaneous name-value pair argument.

If you fit the GLME model using fitglme with one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'), predict computes the confidence intervals using the conditional mean squared error of prediction (CMSEP) approach, conditional on the estimated covariance parameters and the observed response [1]. Alternatively, you can interpret the confidence intervals as approximate Bayesian credible intervals conditional on the estimated covariance parameters and the observed response.

If you fit the GLME model using fitglme with one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), predict bases the computations on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

Degrees of freedom DF used in computing the confidence intervals, returned as a vector or a scalar value.

Examples

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij})$$

This corresponds to the generalized linear mixed-effects model

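$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$

where $i$ indexes the factory, $j$ indexes the batch within factory $i$, $\mu_{ij}$ is the mean number of defects for that batch, and $b_i$ is the random intercept for factory $i$.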

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');
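The sum-to-zero property of 'effects' coding can be sketched with plain arithmetic. The coefficient values below are hypothetical, not estimates from this model, and the layout in which supplier A is the level coded as -1 in both columns is an assumption suggested by the supplier_C and supplier_B terms in the model equation.

```matlab
% Hypothetical supplier coefficients (beta4 for C, beta5 for B).
betaC = 0.10;
betaB = -0.25;

% Effects coding: C -> [1 0], B -> [0 1], A -> [-1 -1] (assumed layout).
codes = [ 1  0;    % supplier C
          0  1;    % supplier B
         -1 -1];   % supplier A
effect = codes * [betaC; betaB];   % contribution of each supplier level

% The implied effect for A is -(betaC + betaB), so the three effects sum to 0.
disp(effect')
disp(sum(effect))
```

Under this coding, each supplier effect is read as a deviation from the average over suppliers rather than from a single reference supplier.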

Predict the response values at the original design values. Display the first ten predictions along with the observed response values.

ypred = predict(glme);
[ypred(1:10),mfr.defects(1:10)]

ans = 10×2

4.9883    6.0000
5.9423    7.0000
5.1318    6.0000
5.6295    5.0000
5.3499    6.0000
5.2134    5.0000
4.6430    4.0000
4.5342    4.0000
5.3903    9.0000
4.6529    4.0000

Column 1 contains the predicted response values at the original design values. Column 2 contains the observed response values.

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij})$$

This corresponds to the generalized linear mixed-effects model

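$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$

where $i$ indexes the factory, $j$ indexes the batch within factory $i$, $\mu_{ij}$ is the mean number of defects for that batch, and $b_i$ is the random intercept for factory $i$.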

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Predict the response values at the original design values.

ypred = predict(glme);

Create a new table by copying the first 10 rows of mfr into tblnew.

tblnew = mfr(1:10,:);

The first 10 rows of mfr include data collected from trials 1 through 5 for factories 1 and 2. Both factories used the old process for all of their trials during the experiment, so newprocess = 0 for all 10 observations.

Change the value of newprocess to 1 for the observations in tblnew.

tblnew.newprocess = ones(height(tblnew),1);

Compute predicted response values and nonsimultaneous 99% confidence intervals using tblnew. Display the first 10 rows of the predicted values based on tblnew, the predicted values based on mfr, and the observed response values.

[ypred_new,ypredCI] = predict(glme,tblnew,'Alpha',0.01);
[ypred_new,ypred(1:10),mfr.defects(1:10)]

ans = 10×3

3.4536    4.9883    6.0000
4.1142    5.9423    7.0000
3.5530    5.1318    6.0000
3.8976    5.6295    5.0000
3.7040    5.3499    6.0000
3.6095    5.2134    5.0000
3.2146    4.6430    4.0000
3.1393    4.5342    4.0000
3.7320    5.3903    9.0000
3.2214    4.6529    4.0000

Column 1 contains predicted response values based on the data in tblnew, where newprocess = 1. Column 2 contains predicted response values based on the original data in mfr, where newprocess = 0. Column 3 contains the observed response values in mfr. Based on these results, if all other predictors retain their original values, the predicted number of defects appears to be smaller when using the new process.
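Because the model uses a log link, changing newprocess from 0 to 1 while holding everything else fixed multiplies the conditional mean by the exponential of the newprocess coefficient. The ratio of column 1 to column 2 should therefore be the same in every row. The following check uses only the rounded values displayed above, so it recovers that factor to display precision only.

```matlab
% First two columns of the displayed results, copied from the output above.
ypredNew = [3.4536 4.1142 3.5530 3.8976 3.7040 3.6095 3.2146 3.1393 3.7320 3.2214]';
ypredOld = [4.9883 5.9423 5.1318 5.6295 5.3499 5.2134 4.6430 4.5342 5.3903 4.6529]';

ratio = ypredNew ./ ypredOld;   % multiplicative change from newprocess 0 -> 1
disp(ratio')
disp(log(mean(ratio)))          % implied newprocess coefficient on the log scale
```

Every ratio is close to 0.69, which corresponds to a reduction of roughly 31% in the predicted number of defects for these ten rows.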

Display the 99% confidence intervals for rows 1 through 10 corresponding to the new predicted response values.

ypredCI(1:10,:)

ans = 10×2

1.6983    7.0235
1.9191    8.8201
1.8735    6.7380
2.0149    7.5395
1.9034    7.2079
1.8918    6.8871
1.6776    6.1597
1.5404    6.3976
1.9574    7.1154
1.6892    6.1436
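The displayed intervals are not symmetric about the predictions on the response scale, but they appear symmetric on the log (link) scale: the geometric mean of each pair of bounds matches the corresponding prediction to display precision. This is an observation about the rounded numbers shown in this example, not a statement of how predict computes the intervals internally.

```matlab
% Predictions and 99% bounds copied from the displayed outputs.
ypredNew = [3.4536 4.1142 3.5530 3.8976 3.7040 3.6095 3.2146 3.1393 3.7320 3.2214]';
ci = [1.6983 7.0235
      1.9191 8.8201
      1.8735 6.7380
      2.0149 7.5395
      1.9034 7.2079
      1.8918 6.8871
      1.6776 6.1597
      1.5404 6.3976
      1.9574 7.1154
      1.6892 6.1436];

% Midpoint on the log scale, mapped back to the response scale.
geoMid = sqrt(ci(:,1) .* ci(:,2));
disp([ypredNew, geoMid])
```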

References

[1] Booth, J.G., and J.P. Hobert. “Standard Errors of Prediction in Generalized Linear Mixed Models.” Journal of the American Statistical Association, Vol. 93, 1998, pp. 262–272.