LinearMixedModel - Linear mixed-effects model - MATLAB (original) (raw)

Linear mixed-effects model

Description

A LinearMixedModel object represents a model of a response variable with fixed and random effects. It comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a linear mixed-effects model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Creation

Create a LinearMixedModel model using fitlme or fitlmematrix. You can fit a linear mixed-effects model using fitlme(tbl,formula) if your data is in a table or dataset array. Alternatively, if your model is not easily described using a formula, you can create matrices to define the fixed and random effects, and fit the model using fitlmematrix(X,y,Z,G)

Properties

expand all

Coefficient Estimates

Fixed-effects coefficient estimates and related statistics, stored as a dataset array containing the following fields.

Name	Name of the term.
Estimate	Estimated value of the coefficient.
SE	Standard error of the coefficient.
tStat	_t_-statistics for testing the null hypothesis that the coefficient is equal to zero.
DF	Degrees of freedom for the _t_-test. Method to compute DF is specified by the 'DFMethod' name-value pair argument. Coefficients always uses the 'Residual' method for'DFMethod'.
pValue	_p_-value for the_t_-test.
Lower	Lower limit of the confidence interval for coefficient. Coefficients always uses the 95% confidence level, i.e.'alpha' is 0.05.
Upper	Upper limit of confidence interval for coefficient.Coefficients always uses the 95% confidence level, i.e.'alpha' is 0.05.

You can change 'DFMethod' and'alpha' while computing confidence intervals for or testing hypotheses involving fixed- and random-effects, using thecoefCI and coefTest methods.

Covariance of the estimated fixed-effects coefficients of the linear mixed-effects model, stored as a_p_-by-p matrix, where_p_ is the number of fixed-effects coefficients.

You can display the covariance parameters associated with the random effects using the covarianceParameters method.

Data Types: double

Names of the fixed-effects coefficients of a linear mixed-effects model, stored as a 1-by-p cell array of character vectors.

Data Types: cell

Number of fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.

Data Types: double

Number of estimated fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.

Data Types: double

Fitting Method

Method used to fit the linear mixed-effects model, stored as either of the following.

ML, if the fitting method is maximum likelihood
REML, if the fitting method is restricted maximum likelihood

Data Types: char

Input Data

Specification of the fixed-effects terms, random-effects terms, and grouping variables that define the linear mixed-effects model, stored as an object.

For more information on how to specify the model to fit using a formula, see Formula.

Number of observations used in the fit, stored as a positive integer value. This is the number of rows in the table or dataset array, or the design matrices minus the excluded rows or rows withNaN values.

Data Types: double

Number of variables used as predictors in the linear mixed-effects model, stored as a positive integer value.

Data Types: double

Total number of variables including the response and predictors, stored as a positive integer value.

If the sample data is in a table or dataset arraytbl, NumVariables is the total number of variables in tbl including the response variable.
If the fit is based on matrix input,NumVariables is the total number of columns in the predictor matrix or matrices, and response vector.

NumVariables includes variables, if there are any, that are not used as predictors or as the response.

Data Types: double

Information about the observations used in the fit, stored as a table.

ObservationInfo has one row for each observation and the following four columns.

Weights	The value of the weighted variable for that observation. Default value is 1.
Excluded	true, if the observation was excluded from the fit using the'Exclude' name-value pair argument, false, otherwise. 1 stands for true and 0 stands forfalse.
Missing	true, if the observation was excluded from the fit because any response or predictor value is missing,false, otherwise.Missing values includeNaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the value for categorical arrays.
Subset	true, if the observation was used in the fit, false, if it was not used because it is missing or excluded.

Data Types: table

Names of observations used in the fit, stored as a cell array of character vectors.

If the data is in a table or dataset array,tbl, containing observation names,ObservationNames has those names.
If the data is provided in matrices, or a table or dataset array without observation names, thenObservationNames is an empty cell array.

Data Types: cell

Names of the variables that you use as predictors in the fit, stored as a cell array of character vectors that has the same length asNumPredictors.

Data Types: cell

Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: char

Variables, stored as a table.

If the fit is based on a table or dataset arraytbl, then Variables is identical to tbl.
If the fit is based on matrix input, thenVariables is a table containing all the variables in the predictor matrix or matrices, and response variable.

Data Types: table

Information about the variables used in the fit, stored as a table.

VariableInfo has one row for each variable and contains the following four columns.

Class	Class of the variable ('double','cell','nominal', and so on).
Range	Value range of the variable. For a numerical variable, it is a two-element vector of the form[min,max].For a cell or categorical variable, it is a cell or categorical array containing all unique values of the variable.
InModel	true, if the variable is a predictor in the fitted model.false, if the variable is not in the fitted model.
IsCategorical	true, if the variable has a type that is treated as a categorical predictor, such as cell, logical, or categorical, or if it is specified as categorical by the'Categorical' name-value pair argument of the fit method.false, if it is a continuous predictor.

Data Types: table

Names of the variables used in the fit, stored as a cell array of character vectors.

If sample data is in a table or dataset arraytbl, VariableNames contains the names of the variables intbl.
If sample data is in matrix format, thenVariableInfo includes variable names you supply while fitting the model. If you do not supply the variable names, then VariableInfo contains the default names.

Data Types: cell

Summary Statistics

Residual degrees of freedom, stored as a positive integer value.DFE = n –p, where n is the number of observations, and p is the number of fixed-effects coefficients.

This corresponds to the 'Residual' method of calculating degrees of freedom in the fixedEffects and randomEffects methods.

Data Types: double

Maximized log likelihood or maximized restricted log likelihood of the fitted linear mixed-effects model depending on the fitting method you choose, stored as a scalar value.

Data Types: double

Model criterion to compare fitted linear mixed-effects models, stored as a dataset array with the following columns.

AIC	Akaike Information Criterion
BIC	Bayesian Information Criterion
Loglikelihood	Log likelihood value of the model
Deviance	–2 times the log likelihood of the model

If n is the number of observations used in fitting the model, and p is the number of fixed-effects coefficients, then for calculating AIC and BIC,

The total number of parameters is nc +p + 1, where nc is the total number of parameters in the random-effects covariance excluding the residual variance
The effective number of observations is
- n, when the fitting method is maximum likelihood (ML)
- n – p, when the fitting method is restricted maximum likelihood (REML)

ML or REML estimate, based on the fitting method used for estimating σ2, stored as a positive scalar value. σ2 is the residual variance or variance of the observation error term of the linear mixed-effects model.

Data Types: double

Proportion of variability in the response explained by the fitted model, stored as a structure. It is the multiple correlation coefficient or R-squared. Rsquared has two fields.

Ordinary	R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 – SSE./SST
Adjusted	R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure.Rsquared.Adjusted = 1 – (SSE./SST)*(DFT./DFE),where DFE = n – p,DFT = n – 1, andn is the total number of observations, p is the number of fixed-effects coefficients.

Data Types: struct

Sum of squared errors, specified as a positive scalar.SSE is equal to the squared conditional residuals, that is

SSE = sum((y – F).^2),

where y is the response vector andF is the fitted conditional response of the linear mixed-effects model. The conditional model has contributions from both fixed and random effects.

If the model was trained with observation weights, the sum of squares in the SSE calculation is the weighted sum of squares.

Data Types: double

Regression sum of squares, specified as a positive scalar.SSR is the sum of squares explained by the linear mixed-effects regression, and is equal to the sum of the squared deviations between the fitted values and the mean of the response.

SSR = sum((F – mean(y)).^2),

where F is the fitted conditional response of the linear mixed-effects model and y is the response vector. The conditional model has contributions from both fixed and random effects.

If the model was trained with observation weights, the sum of squares in the SSR calculation is the weighted sum of squares.

Data Types: double

Total sum of squares, specified as a positive scalar.

For a linear mixed-effects model with an intercept,SST is calculated as

SST = SSE + SSR,

where SST is the total sum of squares, SSE is the sum of squared errors, andSSR is the regression sum of squares.

For a linear mixed-effects model without an intercept,SST is calculated as the sum of the squared deviations of the observed response values from their mean, that is

SST = sum((y – mean(y)).^2),

where y is the response vector.

If the model was trained with observation weights, the sum of squares in the SST calculation is the weighted sum of squares.

Data Types: double

Object Functions

anova	Analysis of variance for linear mixed-effects model
coefCI	Confidence intervals for coefficients of linear mixed-effects model
coefTest	Hypothesis test on fixed and random effects of linear mixed-effects model
compare	Compare linear mixed-effects models
covarianceParameters	Extract covariance parameters of linear mixed-effects model
designMatrix	Fixed- and random-effects design matrices
fitted	Fitted responses from a linear mixed-effects model
fixedEffects	Estimates of fixed effects and related statistics
partialDependence	Compute partial dependence
plotPartialDependence	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
plotResiduals	Plot residuals of linear mixed-effects model
predict	Predict response of linear mixed-effects model
random	Generate random responses from fitted linear mixed-effects model
randomEffects	Estimates of random effects and related statistics
residuals	Residuals of fitted linear mixed-effects model
response	Response vector of the linear mixed-effects model

Examples

collapse all

Load the sample data and convert it to table format.

load flu flu = dataset2table(flu)

flu=52×11 table Date NE MidAtl ENCentral WNCentral SAtl ESCentral WSCentral Mtn Pac WtdILI ______________ _____ ______ _________ _________ _____ _________ _________ _____ _____ ______

{'10/9/2005' }     0.97    1.025       1.232        1.286      1.082      1.457          1.1      0.981    0.971    1.182 
{'10/16/2005'}    1.136     1.06       1.228        1.286      1.146      1.644        1.123      0.976    0.917     1.22 
{'10/23/2005'}    1.135    1.172       1.278        1.536      1.274      1.556        1.236      1.102    0.895     1.31 
{'10/30/2005'}     1.52    1.489       1.576        1.794       1.59      2.252        1.612      1.321    1.082    1.343 
{'11/6/2005' }    1.365    1.394        1.53        1.825       1.62      2.059        1.471      1.453    1.118    1.586 
{'11/13/2005'}     1.39    1.477       1.506          1.9      1.683      1.813        1.464      1.388    1.204     1.47 
{'11/20/2005'}    1.212    1.231       1.295        1.495      1.347      1.794        1.303      1.371    1.137    1.611 
{'11/27/2005'}    1.477    1.546       1.557        1.855      1.678      2.159        1.739      1.628    1.443    1.827 
{'12/4/2005' }    1.285     1.43       1.482        1.635      1.577      1.903         1.53      1.701    1.516    1.776 
{'12/11/2005'}    1.354     1.45        1.46        1.794      1.583      1.894        1.831      2.364    2.094    1.941 
{'12/18/2005'}    1.502    1.622       1.638        1.988      1.947       2.22        2.577       3.89     2.66     2.34 
{'12/25/2005'}     1.86    1.915       1.955         2.38      2.343      3.027        3.219      4.862    2.595    3.086 
{'1/1/2006'  }    2.114    2.174       2.065        2.557      2.275      2.498        2.644      3.352    2.181     3.26 
{'1/8/2006'  }    1.815    1.932       1.822        2.046      1.969      1.805        2.189      2.132    1.717    2.613 
{'1/15/2006' }    1.541    1.695       1.581        2.008      1.718      1.662        2.156      1.694    1.351    2.247 
{'1/22/2006' }    1.632    1.758       1.711        2.217      1.866      2.194        2.268      1.826    1.384    2.352 
  ⋮

The flu table has a Date variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the Center for Disease Control and Prevention, CDC).

To fit a linear-mixed effects model, your data must be in a properly formatted table. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into an array. The new table, flu2, must have the response variable, FluRate, the nominal variable, Region, that shows which region each estimate is from, and the grouping variable Date.

flu2 = stack(flu,2:10,NewDataVariableName="FluRate",... IndexVariableName="Region"); flu2.Date = nominal(flu2.Date);

Fit a linear mixed-effects model with fixed effects for region and a random intercept that varies by Date.

Because region is a nominal variable, fitlme takes the first region, NE, as the reference and creates eight dummy variables representing the other eight regions. For example, I[MidAtl] is the dummy variable representing the region MidAtl. For details, see Dummy Variables.

The corresponding model is

yim=β0+β1I[MidAtl]i+β2I[ENCentral]i+β3I[WNCentral]i+β4I[SAtl]i+β5I[ESCentral]i+β6I[WSCentral]i+β7I[Mtn]i+β8I[Pac]i+b0m+εim,m=1,2,...,52,

where yim is the observation i for level m of grouping variable Date, βj, j = 0, 1, ..., 8, are the fixed-effects coefficients, b0m is the random effect for level m of the grouping variable Date, and εim is the observation error for observation i. The random effect has the prior distribution, b0m∼N(0,σb2) and the error term has the distribution, εim∼N(0,σ2).

lme = fitlme(flu2,"FluRate ~ 1 + Region + (1|Date)")

lme = Linear mixed-effects model fit by ML

Model information: Number of observations 468 Fixed effects coefficients 9 Random effects coefficients 52 Covariance parameters 2

Formula: FluRate ~ 1 + Region + (1 | Date)

Model fit statistics: AIC BIC LogLikelihood Deviance 318.71 364.35 -148.36 296.71

Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)' } 1.2233 0.096678 12.654 459 1.085e-31 1.0334 1.4133 {'Region_MidAtl' } 0.010192 0.052221 0.19518 459 0.84534 -0.092429 0.11281 {'Region_ENCentral'} 0.051923 0.052221 0.9943 459 0.3206 -0.050698 0.15454 {'Region_WNCentral'} 0.23687 0.052221 4.5359 459 7.3324e-06 0.13424 0.33949 {'Region_SAtl' } 0.075481 0.052221 1.4454 459 0.14902 -0.02714 0.1781 {'Region_ESCentral'} 0.33917 0.052221 6.495 459 2.1623e-10 0.23655 0.44179 {'Region_WSCentral'} 0.069 0.052221 1.3213 459 0.18705 -0.033621 0.17162 {'Region_Mtn' } 0.046673 0.052221 0.89377 459 0.37191 -0.055948 0.14929 {'Region_Pac' } -0.16013 0.052221 -3.0665 459 0.0022936 -0.26276 -0.057514

Random effects covariance parameters (95% CIs): Group: Date (52 Levels) Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std'} 0.6443 0.5297 0.78368

Group: Error Name Estimate Lower Upper {'Res Std'} 0.26627 0.24878 0.285

The p-values 7.3324e-06 and 2.1623e-10 respectively show that the fixed effects of the flu rates in regions WNCentral and ESCentral are significantly different relative to the flu rates in region NE.

The confidence limits for the standard deviation of the random-effects term, σb, do not include 0 (0.5297, 0.78368), which indicates that the random-effects term is significant. You can also test the significance of the random-effects terms using the compare method.

The estimated value of an observation is the sum of the fixed effects and the random-effect value at the grouping variable level corresponding to that observation. For example, the estimated best linear unbiased predictor (BLUP) of the flu rate for region WNCentral in week 10/9/2005 is

yˆWNCentral,10/9/2005=βˆ0+βˆ3I[WNCentral]+bˆ10/9/2005=1.2233+0.23687-0.1718=1.28837.

This is the fitted conditional response, since it includes contribution to the estimate from both the fixed and random effects. You can compute this value as follows.

beta = fixedEffects(lme); [,,STATS] = randomEffects(lme); % Compute the random-effects statistics (STATS) STATS.Level = nominal(STATS.Level); y_hat = beta(1) + beta(4) + STATS.Estimate(STATS.Level=="10/9/2005")

You can simply display the fitted value using the fitted method.

F = fitted(lme); F(flu2.Date == "10/9/2005" & flu2.Region == "WNCentral")

Compute the fitted marginal response for region WNCentral in week 10/9/2005.

F = fitted(lme,Conditional=false); F(flu2.Date == "10/9/2005" & flu2.Region == "WNCentral")

Load the sample data.

Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration, horsepower and the cylinders, and uncorrelated random-effect for intercept and acceleration grouped by the model year. This model corresponds to

MPGim=β0+β1Acci+β2HP+b0m+b1mAccim+εim,m=1,2,3,

with the random-effects terms having the following prior distributions:

bm=(b0mb1m)∼N(0,(σ02σ0,1σ0,1σ12)),

where m represents the model year.

First, prepare the design matrices for fitting the linear mixed-effects model.

X = [ones(406,1) Acceleration Horsepower]; Z = [ones(406,1) Acceleration]; Model_Year = nominal(Model_Year); G = Model_Year;

Now, fit the model using fitlmematrix with the defined design matrices and grouping variables. Use the 'fminunc' optimization algorithm.

lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',.... {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',... {{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'},... 'FitMethod','REML')

lme = Linear mixed-effects model fit by REML

Model information: Number of observations 392 Fixed effects coefficients 3 Random effects coefficients 26 Covariance parameters 4

Formula: y ~ Intercept + Acceleration + Horsepower + (Intercept + Acceleration | Model_Year)

Model fit statistics: AIC BIC LogLikelihood Deviance 2202.9 2230.7 -1094.5 2188.9

Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue Lower Upper
{'Intercept' } 50.064 2.3176 21.602 389 1.4185e-68 45.507 54.62 {'Acceleration'} -0.57897 0.13843 -4.1825 389 3.5654e-05 -0.85112 -0.30681 {'Horsepower' } -0.16958 0.0073242 -23.153 389 3.5289e-75 -0.18398 -0.15518

Random effects covariance parameters (95% CIs): Group: Model_Year (13 Levels) Name1 Name2 Type Estimate Lower Upper
{'Intercept' } {'Intercept' } {'std' } 3.72 1.5215 9.0954 {'Acceleration'} {'Intercept' } {'corr'} -0.8769 -0.98274 -0.33846 {'Acceleration'} {'Acceleration'} {'std' } 0.3593 0.19418 0.66483

Group: Error Name Estimate Lower Upper {'Res Std'} 3.6913 3.4331 3.9688

The fixed effects coefficients display includes the estimate, standard errors (SE), and the 95% confidence interval limits (Lower and Upper). The p-values for (pValue) indicate that all three fixed-effects coefficients are significant.

The confidence intervals for the standard deviations and the correlation between the random effects for intercept and acceleration do not include zeros, hence they seem significant. Use the compare method to test for the random effects.

Display the covariance matrix of the estimated fixed-effects coefficients.

lme.CoefficientCovariance

ans = 3×3

5.3711   -0.2809   -0.0126

-0.2809 0.0192 0.0005 -0.0126 0.0005 0.0001

The diagonal elements show the variances of the fixed-effects coefficient estimates. For example, the variance of the estimate of the intercept is 5.3711. Note that the standard errors of the estimates are the square roots of the variances. For example, the standard error of the intercept is 2.3176, which is sqrt(5.3711).

The off-diagonal elements show the correlation between the fixed-effects coefficient estimates. For example, the correlation between the intercept and acceleration is –0.2809 and the correlation between acceleration and horsepower is 0.0005.

Display the coefficient of determination for the model.

ans = struct with fields: Ordinary: 0.7866 Adjusted: 0.7855

The adjusted value is the R-squared value adjusted for the number of predictors in the model.

More About

expand all

In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For the linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed andrandom contain the fixed-effects and the random-effects terms.

Suppose a table tbl contains the following:

A response variable, y
Predictor variables, X_j_, which can be continuous or grouping variables
Grouping variables, g1, g2, ..., g_R_,

where the grouping variables inX_j_ andg_r_ can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form, 'y ~ fixed + (random1|g1) + ... + (random_R_|g_R_)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation.

Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.

Wilkinson Notation	Factors in Standard Notation
1	Constant (intercept) term
X^k, where k is a positive integer	X, X2, ..., X_k_
X1 + X2	X1, X2
X1*X2	X1, X2, X1.*X2 (elementwise multiplication of X1 and X2)
X1:X2	X1.*X2 only
- X2	Do not include X2
X1*X2 + X3	X1, X2, X3, X1*X2
X1 + X2 + X3 + X1:X2	X1, X2, X3, X1*X2
X1X2X3 - X1:X2:X3	X1, X2, X3, X1X2, X1X3, X2*X3
X1*(X2 + X3)	X1, X2, X3, X1X2, X1X3

Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for linear mixed-effects model specification.

Examples:

Formula	Description
'y ~ X1 + X2'	Fixed effects for the intercept, X1 and X2. This is equivalent to 'y ~ 1 + X1 + X2'.
'y ~ -1 + X1 + X2'	No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1.
'y ~ 1 + (1 \| g1)'	Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1.
'y ~ X1 + (1 \| g1)'	Random intercept model with a fixed slope.
'y ~ X1 + (X1 \| g1)'	Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1\|g1)'.
'y ~ X1 + (1 \| g1) + (-1 + X1	g1)'	Independent random effects terms for intercept and slope.
'y ~ 1 + (1 \| g1) + (1	g2) + (1	g1:g2)'	Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect.

Version History

Introduced in R2013b