GeneralizedLinearMixedModel - Generalized linear mixed-effects model class - MATLAB

Generalized linear mixed-effects model class

Description

A GeneralizedLinearMixedModel object represents a regression model of a response variable that contains both fixed and random effects. The object comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a generalized linear mixed-effects (GLME) model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Construction

You can fit a generalized linear mixed-effects (GLME) model to sample data using fitglme(tbl,formula). For more information, see fitglme.

Input Arguments


tbl - Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify the model for the variables using formula.

Data Types: table

formula - Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula.

Example: 'y ~ treatment + (1|block)'

Properties


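Coefficients - Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:

Column Name Description
Name Name of the coefficient
Estimate Estimated coefficient value
SE Standard error of the estimate
tStat t-statistic for a test that the coefficient is equal to 0
DF Degrees of freedom associated with the t-statistic
pValue p-value for the t-statistic
Lower Lower confidence limit
Upper Upper confidence limit

To obtain any of these columns as a vector, index into the property using dot notation.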

Use the coefTest method to perform other tests on the coefficients.
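As a minimal sketch of the dot-notation indexing described above, the following assumes the mfr sample data set and the Poisson model used in the Examples section of this page. The variable names est and se are illustrative only.

```matlab
% Fit the example model (assumes the mfr sample data set is on the path)
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% Index into the Coefficients property with dot notation
est = glme.Coefficients.Estimate;   % fixed-effects estimates as a vector
se  = glme.Coefficients.SE;         % corresponding standard errors
```

Each extracted vector has one element per fixed-effects coefficient, in the same order as CoefficientNames.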

CoefficientCovariance - Covariance of the estimated fixed-effects coefficients, stored as a matrix.

Data Types: single | double

CoefficientNames - Names of fixed-effects coefficients, stored as a cell array of character vectors. The label for the coefficient of the constant term is (Intercept). The labels for other coefficients indicate the terms that they multiply. When the term includes a categorical predictor, the label also indicates the level of that predictor.

Data Types: cell

DFE - Degrees of freedom for error, stored as a positive integer value. DFE is the number of observations minus the number of estimated coefficients.

DFE contains the degrees of freedom corresponding to the 'Residual' method of calculating denominator degrees of freedom for hypothesis tests on fixed-effects coefficients. If n is the number of observations and p is the number of fixed-effects coefficients, then DFE is equal to n - p.

Data Types: double
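The relationship DFE = n - p can be checked directly on a fitted model. This sketch assumes the mfr sample data set and the model from the Examples section; dfe is an illustrative variable name.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n = glme.NumObservations;   % number of observations used in the fit
p = glme.NumCoefficients;   % number of fixed-effects coefficients
dfe = n - p;                % should agree with glme.DFE
```

For this model, n is 100 and p is 6, which matches the DF value of 94 shown in the example display.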

Dispersion - Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.

For observation $i$, the conditional variance of the response $y_i$, given the conditional mean $\mu_i$ and the dispersion parameter $\sigma^2$, in a generalized linear mixed-effects model is

$$\operatorname{var}(y_i \mid \mu_i, \sigma^2) = \frac{\sigma^2}{w_i}\, v(\mu_i),$$

where $w_i$ is the $i$th observation weight and $v$ is the variance function for the specified conditional distribution of the response. The Dispersion property contains an estimate of $\sigma^2$ for the specified GLME model. The value of Dispersion depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of Dispersion is $\sigma^2 = 1.0$.

Data Types: double

DispersionEstimated - Flag indicating whether the dispersion parameter was estimated, stored as a logical value.

Data Types: logical

Distribution - Response distribution name, stored as one of the following:

- 'Normal'
- 'Binomial'
- 'Poisson'
- 'Gamma'
- 'InverseGaussian'

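FitMethod - Method used to fit the model, stored as one of the following:

- 'MPL' (maximum pseudo likelihood)
- 'REMPL' (restricted maximum pseudo likelihood)
- 'ApproximateLaplace' (maximum likelihood using the approximate Laplace method)
- 'Laplace' (maximum likelihood using the Laplace method)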

Formula - Model specification formula, stored as an object. The model specification formula uses Wilkinson's notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.

Link - Link function characteristics, stored as a structure containing the following fields. The link is a function G that links the distribution parameter MU to the linear predictor ETA as follows: G(MU) = ETA.

Field Description
Name Name of the link function
Link Function that defines G
Derivative Derivative of G
SecondDerivative Second derivative of G
Inverse Inverse of G

Data Types: struct
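The following sketch shows how the relationship G(MU) = ETA might be exercised through the Link structure. It assumes that the Link and Inverse fields are function handles, as their descriptions suggest, and it uses the log-link Poisson model from the Examples section. The names eta, mu, and etaBack are illustrative.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

eta = log(4);                   % a value of the linear predictor
mu = glme.Link.Inverse(eta);    % map ETA to MU; for the log link this is exp(eta)
etaBack = glme.Link.Link(mu);   % apply G to recover the linear predictor
```

For the log link, mu is 4 and etaBack equals the original eta up to floating-point error.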

LogLikelihood - Log of the likelihood function evaluated at the estimated coefficient values, stored as a scalar value. LogLikelihood depends on the method used to fit the model.

Data Types: double

ModelCriterion - Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.

Field Description
AIC Akaike information criterion
BIC Bayesian information criterion
LogLikelihood For a model fit using 'Laplace' or 'ApproximateLaplace', LogLikelihood is the maximized log likelihood. For a model fit using 'MPL', LogLikelihood is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. For a model fit using 'REMPL', LogLikelihood is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.
Deviance -2 times LogLikelihood
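As a worked check of how these criteria relate, the sketch below starts from the rounded LogLikelihood value printed in the example display on this page and applies the standard definitions AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The parameter count k = 7 (six fixed-effects coefficients plus one covariance parameter) is an assumption about how the criteria are counted, and it reproduces the displayed values to within rounding.

```matlab
logL = -201.17;   % LogLikelihood from the example display (rounded)
n = 100;          % number of observations
k = 6 + 1;        % assumed parameter count: 6 fixed effects + 1 covariance parameter

deviance = -2*logL;          % about 402.34 (displayed as 402.35)
aic = deviance + 2*k;        % about 416.34 (displayed as 416.35)
bic = deviance + k*log(n);   % about 434.58
```

The small differences from the displayed values come from using the rounded log likelihood as input.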

NumCoefficients - Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

NumEstimatedCoefficients - Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

NumObservations - Number of observations used in the fit, stored as a positive integer value. NumObservations is the number of rows in the table or dataset array tbl, minus rows excluded using the 'Exclude' name-value pair of fitglme or rows containing NaN values.

Data Types: double

NumPredictors - Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

NumVariables - Total number of variables, including the response and predictors, stored as a positive integer value. If the sample data is in a table or dataset array tbl, then NumVariables is the total number of variables in tbl, including the response variable. NumVariables includes variables, if any, that are not used as predictors or as the response.

Data Types: double

ObservationInfo - Information about the observations used in the fit, stored as a table.

ObservationInfo has one row for each observation and the following columns.

Name Description
Weights The weight value for the observation. The default value is 1.
Excluded If the observation was excluded from the fit using the 'Exclude' name-value pair argument in fitglme, then Excluded is true, or 1. Otherwise, Excluded is false, or 0.
Missing If the observation was excluded from the fit because any response or predictor value is missing, then Missing is true. Otherwise, Missing is false. Missing values include NaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the <undefined> value for categorical arrays.
Subset If the observation was used in the fit, then Subset is true. If the observation was not used in the fit because it is missing or excluded, then Subset is false.
BinomSize Binomial size for each observation. This column only applies when fitting a binomial distribution.

Data Types: table

ObservationNames - Names of observations used in the fit, stored as a cell array of character vectors.

Data Types: cell

PredictorNames - Names of the variables used as predictors in the fit, stored as a cell array of character vectors whose length is equal to NumPredictors.

Data Types: cell

ResponseName - Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: char

Rsquared - Proportion of variability in the response explained by the fitted model, stored as a structure. Rsquared contains the R-squared value of the fitted model, also known as the coefficient of determination. Rsquared contains the following fields.

Field Description
Ordinary R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 - SSE./SST
Adjusted R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE), where DFE = n - p, DFT = n - 1, n is the total number of observations, and p is the number of fixed-effects coefficients.

Data Types: struct
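The two R-squared definitions above can be recomputed from the SSE, SST, and DFE properties. This is a sketch under the assumption that the stored values follow the stated formulas exactly; it uses the model from the Examples section, and r2 and r2adj are illustrative names.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n = glme.NumObservations;
dft = n - 1;                                        % total degrees of freedom
r2 = 1 - glme.SSE/glme.SST;                         % ordinary R-squared
r2adj = 1 - (glme.SSE/glme.SST)*(dft/glme.DFE);     % adjusted R-squared
```

Comparing r2 and r2adj with glme.Rsquared.Ordinary and glme.Rsquared.Adjusted is a quick way to confirm how the sums of squares enter each statistic.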

SSE - Sum of squared errors, stored as a positive scalar. SSE is the weighted sum of the squared conditional residuals, and is calculated as

$$SSE = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \,(y_i - f_i)^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the $i$th effective weight, $y_i$ is the $i$th response, and $f_i$ is the $i$th fitted value.

The $i$th effective weight is calculated as

$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\big(\mu_i(\hat{\beta}, \hat{b})\big)},$$

where $w_i$ is the $i$th observation weight, $v_i$ is the variance term for the $i$th observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.

The $i$th fitted value is calculated as

$$f_i = g^{-1}\big(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\big),$$

where $g$ is the link function, $x_i^T$ is the $i$th row of the fixed-effects design matrix $X$, $z_i^T$ is the $i$th row of the random-effects design matrix $Z$, and $\delta_i$ is the $i$th offset value.

Data Types: double

SSR - Regression sum of squares, stored as a positive scalar. SSR is the sum of squares explained by the generalized linear mixed-effects regression, and is equal to the weighted sum of the squared deviations between the fitted values and the mean of the response. SSR is calculated as

$$SSR = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \,(f_i - \bar{y})^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the $i$th effective weight, $f_i$ is the $i$th fitted value, and $\bar{y}$ is the weighted average of the response.

The $i$th effective weight is calculated as

$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\big(\mu_i(\hat{\beta}, \hat{b})\big)},$$

where $w_i$ is the $i$th observation weight, $v_i$ is the variance term for the $i$th observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.

The $i$th fitted value is calculated as

$$f_i = g^{-1}\big(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\big),$$

where $g$ is the link function, $x_i^T$ is the $i$th row of the fixed-effects design matrix $X$, $z_i^T$ is the $i$th row of the random-effects design matrix $Z$, and $\delta_i$ is the $i$th offset value.

Data Types: double

SST - Total sum of squares, stored as a positive scalar.

For a GLME model with an intercept, SST is calculated as

$$SST = SSE + SSR,$$

where SST is the total sum of squares, SSE is the error sum of squares, and SSR is the regression sum of squares.

For a GLME model without an intercept, SST is calculated as

$$SST = \sum_{i=1}^{N} w_i^{\mathrm{eff}} \,(y_i - \bar{y})^2,$$

where $N$ is the number of observations, $w_i^{\mathrm{eff}}$ is the $i$th effective weight, $y_i$ is the $i$th response value, and $\bar{y}$ is the weighted average of the response.

Data Types: double

VariableInfo - Information about the variables used in the fit, stored as a table. VariableInfo has one row for each variable and contains the following columns.

Column Name Description
Class Class of the variable ('double', 'cell', 'nominal', and so on).
Range Value range of the variable. For a numerical variable, Range is a two-element vector of the form [min,max]. For a cell or categorical variable, Range is a cell or categorical array containing all unique values of the variable.
InModel If the variable is a predictor in the fitted model, InModel is true. If the variable is not in the fitted model, InModel is false.
IsCategorical If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then IsCategorical is true. If the variable is a continuous predictor, then IsCategorical is false.

Data Types: table

VariableNames - Names of all the variables contained in the table or dataset array tbl, stored as a cell array of character vectors.

Data Types: cell

Variables - Variables, stored as a table. If the fit is based on a table or dataset array tbl, then Variables is identical to tbl.

Data Types: table

Object Functions

anova Analysis of variance for generalized linear mixed-effects model
coefCI Confidence intervals for coefficients of generalized linear mixed-effects model
coefTest Hypothesis test on fixed and random effects of generalized linear mixed-effects model
compare Compare generalized linear mixed-effects models
covarianceParameters Extract covariance parameters of generalized linear mixed-effects model
designMatrix Fixed- and random-effects design matrices
fitted Fitted responses from generalized linear mixed-effects model
fixedEffects Estimates of fixed effects and related statistics
partialDependence Compute partial dependence
plotPartialDependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
plotResiduals Plot residuals of generalized linear mixed-effects model
predict Predict response of generalized linear mixed-effects model
random Generate random responses from fitted generalized linear mixed-effects model
randomEffects Estimates of random effects and related statistics
refit Refit generalized linear mixed-effects model
residuals Residuals of fitted generalized linear mixed-effects model
response Response vector of generalized linear mixed-effects model
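A brief sketch of how a few of these object functions might be called on a fitted model, assuming the mfr sample data set and the model from the Examples section. The output variable names are illustrative, and rng is called only to make the simulated responses repeatable.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

y     = response(glme);    % response vector used in the fit
muHat = fitted(glme);      % fitted responses at the original design points
res   = residuals(glme);   % residuals of the fitted model
ypred = predict(glme);     % predicted responses at the original design points

rng(0)                     % fix the random seed for repeatability
ysim  = random(glme);      % simulated responses from the fitted model
```

Each call returns one value per observation, so all five outputs have 100 elements for this data set.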

Examples


Load the sample data.

load mfr

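This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

- Flag to indicate whether the batch used the new process (newprocess)
- Processing time for each batch, in hours (time)
- Temperature of the batch, in degrees Celsius (temp)
- Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
- Number of defects in the batch (defects)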

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

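The number of defects can be modeled using a Poisson distribution:

$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij}).$$

This corresponds to the generalized linear mixed-effects model

$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$

where

- $\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.
- $\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
- $\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$.
- $\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether supplier C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.
- $b_i \sim N(0, \sigma_b^2)$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.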

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Display the model.

disp(glme)

Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35

Fixed effects coefficients (95% CIs):
    Name                 Estimate          SE       tStat    DF        pValue        Lower        Upper
    {'(Intercept)'}        1.4689     0.15988      9.1875    94    9.8194e-15       1.1515       1.7864
    {'newprocess' }      -0.36766     0.17755     -2.0708    94      0.041122     -0.72019    -0.015134
    {'time_dev'   }     -0.094521     0.82849    -0.11409    94       0.90941      -1.7395       1.5505
    {'temp_dev'   }      -0.28317      0.9617    -0.29444    94       0.76907      -2.1926       1.6263
    {'supplier_C' }     -0.071868    0.078024     -0.9211    94       0.35936     -0.22679     0.083051
    {'supplier_B' }      0.071072     0.07739     0.91836    94       0.36078    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1              Name2              Type       Estimate
    {'(Intercept)'}    {'(Intercept)'}    {'std'}    0.31381

Group: Error
    Name                    Estimate
    {'sqrt(Dispersion)'}    1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson’s notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC), Bayesian information criterion (BIC) values, log likelihood (LogLikelihood), and deviance (Deviance) values.

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.
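To make the relationship among these columns concrete, the sketch below recomputes tStat, pValue, Lower, and Upper for the newprocess row from its displayed Estimate, SE, and DF. It assumes the usual t-based construction (estimate divided by standard error, and estimate plus or minus a t quantile times the standard error), which reproduces the displayed values to within rounding. tcdf and tinv are Statistics and Machine Learning Toolbox functions.

```matlab
est = -0.36766;   % Estimate for newprocess (from the display)
se  = 0.17755;    % SE for newprocess
df  = 94;         % DF for newprocess

tStat  = est/se;                     % about -2.0708
pValue = 2*tcdf(-abs(tStat),df);     % two-sided p-value, about 0.0411

halfWidth = tinv(0.975,df)*se;       % half-width of the 95% interval
ci = [est - halfWidth, est + halfWidth];   % about [-0.7202, -0.0151]
```

Because the interval excludes 0, it agrees with the p-value falling below 0.05 for this coefficient.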

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory grouping variable, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
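A minimal sketch of that call, assuming the mfr sample data set and the model fitted above. The three-output form of covarianceParameters returns the estimated covariance parameters, the dispersion, and a cell array of statistics tables; the output names here are illustrative.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

[psi,dispersion,stats] = covarianceParameters(glme);

sdFactory = sqrt(psi{1});   % standard deviation of the factory random intercept
disp(stats{1})              % estimate with confidence limits for the factory group
```

Here psi{1} holds the variance of the random intercept, so its square root corresponds to the std estimate of 0.31381 in the model display.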

Load the carbig sample data set.

load carbig

The variables Acceleration, Model_Year, and Cylinders contain data for car acceleration, year of manufacture, and number of engine cylinders, respectively. The data was collected from cars built between 1970 and 1982.

Create a variable named CylinderCats that indicates whether a car has more than four cylinders. Use the table function to create a table from the data in Acceleration, Model_Year, and CylinderCats.

CylinderCats = Cylinders>4;
tbl = table(Acceleration,Model_Year,CylinderCats);

Fit a generalized linear mixed-effects model to the data, using CylinderCats as the response variable, Acceleration as a fixed-effects predictor, and a random intercept and random Acceleration slope grouped by Model_Year. Specify the response data distribution as binomial.

glme = fitglme(tbl,"CylinderCats~Acceleration+(Acceleration|Model_Year)",Distribution="Binomial");

glme is a GeneralizedLinearMixedModel object that contains information about the fitted model.

Inspect the statistics for the fixed effect Acceleration by using the fixedEffects object function with the default 95% confidence level.

[~,~,statsFixed] = fixedEffects(glme)

statsFixed = 
    Fixed effect coefficients: DFMethod = 'residual', Alpha = 0.05

Name                    Estimate    SE          tStat      DF     pValue        Lower       Upper  
{'(Intercept)' }          4.3838      1.2374     3.5428    404    0.00044213      1.9513     6.8163
{'Acceleration'}        -0.29673    0.077896    -3.8093    404    0.00016104    -0.44986    -0.1436

The small p-value for the Acceleration term indicates that car acceleration has a statistically significant effect on whether a car has more than four cylinders.
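One way to read the size of this effect is as an odds ratio. The sketch below assumes the model uses the logit link, which is the default for a binomial response when no link is specified, and it uses the displayed Acceleration estimate. The interpretation is conditional on the random effects, so it describes a model year whose random intercept and slope are both 0.

```matlab
betaAccel = -0.29673;          % Acceleration estimate from the fixed-effects table
oddsRatio = exp(betaAccel);    % multiplicative change in odds per unit of Acceleration
```

Under these assumptions, each one-unit increase in Acceleration multiplies the odds of a car having more than four cylinders by roughly 0.74.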

Inspect the statistics for the random effect Model_Year by using the randomEffects object function with the default 95% confidence level.

[~,~,statsRandom] = randomEffects(glme)

statsRandom = 
    Random effect coefficients: DFMethod = 'residual', Alpha = 0.05

Group                 Level         Name                    Estimate    SEPred     tStat       DF     pValue      Lower        Upper   
{'Model_Year'}        {'70'}        {'(Intercept)' }           3.041     2.1322      1.4262    404     0.15457      -1.1506      7.2326
{'Model_Year'}        {'70'}        {'Acceleration'}        -0.16836    0.13906     -1.2107    404     0.22672     -0.44173     0.10501
{'Model_Year'}        {'71'}        {'(Intercept)' }          3.4715     2.3452      1.4802    404     0.13959      -1.1389      8.0818
{'Model_Year'}        {'71'}        {'Acceleration'}        -0.21721    0.15106     -1.4378    404     0.15125     -0.51418    0.079764
{'Model_Year'}        {'72'}        {'(Intercept)' }          4.2634     2.4382      1.7486    404    0.081124     -0.52977      9.0566
{'Model_Year'}        {'72'}        {'Acceleration'}        -0.28827    0.15892     -1.8139    404    0.070435      -0.6007    0.024149
{'Model_Year'}        {'73'}        {'(Intercept)' }          3.7951     2.1976      1.7269    404    0.084949     -0.52512      8.1153
{'Model_Year'}        {'73'}        {'Acceleration'}        -0.21079    0.14182     -1.4864    404     0.13796     -0.48958    0.067996
{'Model_Year'}        {'74'}        {'(Intercept)' }        -0.77693     2.6678    -0.29123    404     0.77103      -6.0214      4.4675
{'Model_Year'}        {'74'}        {'Acceleration'}        0.056863    0.16571     0.34314    404     0.73167      -0.2689     0.38263
{'Model_Year'}        {'75'}        {'(Intercept)' }         -3.2681     2.1531     -1.5178    404     0.12984      -7.5008     0.96463
{'Model_Year'}        {'75'}        {'Acceleration'}         0.24151    0.13346      1.8096    404    0.071093    -0.020847     0.50387
{'Model_Year'}        {'76'}        {'(Intercept)' }        -0.28228     2.0922    -0.13492    404     0.89274      -4.3952      3.8306
{'Model_Year'}        {'76'}        {'Acceleration'}        0.045966    0.13069     0.35171    404     0.72524     -0.21096     0.30289
{'Model_Year'}        {'77'}        {'(Intercept)' }        -0.78239     2.2806    -0.34305    404     0.73174      -5.2658       3.701
{'Model_Year'}        {'77'}        {'Acceleration'}        0.052519    0.14498     0.36226    404     0.71735     -0.23249     0.33752
{'Model_Year'}        {'78'}        {'(Intercept)' }        -0.46307     2.2693    -0.20406    404     0.83841      -4.9242      3.9981
{'Model_Year'}        {'78'}        {'Acceleration'}        0.050014    0.14243     0.35114    404     0.72567     -0.22999     0.33002
{'Model_Year'}        {'79'}        {'(Intercept)' }         -2.5181     2.0134     -1.2507    404     0.21178      -6.4762        1.44
{'Model_Year'}        {'79'}        {'Acceleration'}         0.19051     0.1257      1.5156    404      0.1304    -0.056591     0.43761
{'Model_Year'}        {'80'}        {'(Intercept)' }         -2.6168     2.4053     -1.0879    404     0.27728      -7.3452      2.1117
{'Model_Year'}        {'80'}        {'Acceleration'}         0.10117    0.14903     0.67883    404     0.49763     -0.19181     0.39414
{'Model_Year'}        {'81'}        {'(Intercept)' }         -1.8396     2.4268    -0.75801    404     0.44888      -6.6103      2.9312
{'Model_Year'}        {'81'}        {'Acceleration'}         0.08723    0.15145     0.57596    404     0.56497      -0.2105     0.38496
{'Model_Year'}        {'82'}        {'(Intercept)' }         -2.0238     2.5531    -0.79267    404     0.42843      -7.0428      2.9953
{'Model_Year'}        {'82'}        {'Acceleration'}        0.058853    0.15948     0.36903    404      0.7123     -0.25467     0.37237

The large p-values in the table output indicate that not enough evidence exists to conclude that any of the random-effects terms have a statistically significant effect on whether a car has more than four cylinders.

More About


In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For generalized linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively, and R is the number of grouping variables in the model.

Suppose a table tbl contains the following:

- A response variable, y
- Predictor variables, Xj, which can be continuous or grouping variables
- Grouping variables, g1, g2, ..., gR,

where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation.

Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.

| Wilkinson Notation | Factors in Standard Notation |
| --- | --- |
| 1 | Constant (intercept) term |
| X^k, where k is a positive integer | X, X^2, ..., X^k |
| X1 + X2 | X1, X2 |
| X1*X2 | X1, X2, X1.*X2 (elementwise multiplication of X1 and X2) |
| X1:X2 | X1.*X2 only |
| - X2 | Do not include X2 |
| X1*X2 + X3 | X1, X2, X3, X1*X2 |
| X1 + X2 + X3 + X1:X2 | X1, X2, X3, X1*X2 |
| X1*X2*X3 - X1:X2:X3 | X1, X2, X3, X1*X2, X1*X3, X2*X3 |
| X1*(X2 + X3) | X1, X2, X3, X1*X2, X1*X3 |

Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for mixed-effects model specification.

Examples:

Formula Description
'y ~ X1 + X2' Fixed effects for the intercept, X1, and X2. This is equivalent to 'y ~ 1 + X1 + X2'.
'y ~ -1 + X1 + X2' No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1.
'y ~ 1 + (1 | g1)' Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1.
'y ~ X1 + (1 | g1)' Random intercept model with a fixed slope.
'y ~ X1 + (X1 | g1)' Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)'.
'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' Independent random effects terms for intercept and slope.
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)' Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect.
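The intercept rules in the table can be checked on real data. This sketch assumes the mfr sample data set from the Examples section, with newprocess as a numeric predictor and factory as the grouping variable. The model names m1, m2, and m3 are illustrative.

```matlab
load mfr

% Implicit intercept: equivalent to writing '1 +' explicitly
m1 = fitglme(mfr,'defects ~ newprocess + (1|factory)','Distribution','Poisson');
m2 = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)','Distribution','Poisson');

% Suppress the fixed-effects intercept with -1
m3 = fitglme(mfr,'defects ~ -1 + newprocess + (1|factory)','Distribution','Poisson');

disp(m1.CoefficientNames)   % includes '(Intercept)' and 'newprocess'
disp(m3.CoefficientNames)   % only 'newprocess'
```

The first two formulas yield the same fixed-effects terms, while the third drops the intercept and leaves a single fixed-effects coefficient.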