Tutorial in biostatistics: spline smoothing with linear mixed models

Marginal Longitudinal Curves Estimated via Bayesian Penalized Splines

The Six Cities air pollution data are used to estimate and investigate the marginal curve of a function describing lung growth for a set of children in a longitudinal study. This article proposes a penalized regression spline technique based on a semiparametric mixed model (MM) framework for an additive model. This smoothing approach fits marginal models for unbalanced longitudinal measurements using Bayesian inference, implemented via Markov chain Monte Carlo (MCMC) with the Gibbs sampler. The unbalanced case, in which measurements are missing or the number of measurements differs across subjects, is the more practical and common one in real-life studies. This methodology makes it possible to establish a straightforward approach to estimating similar models in R when it is not possible to do so using existing code such as lme().

The Analysis of Designed Experiments and Longitudinal Data by Using Smoothing Splines

Journal of the Royal Statistical Society: Series C (Applied Statistics), 1999

Smoothing splines and other non-parametric smoothing methods are well accepted for exploratory data analysis. These methods have been used in regression, in repeated measures or longitudinal data analysis, and in generalized linear models. However, a major drawback is the lack of a formal inferential framework. An exception which has not been fully exploited is the cubic smoothing spline. The cubic smoothing spline admits a mixed model formulation, which places this non-parametric smoother firmly in a parametric setting. The formulation presented in this paper provides the mechanism for including cubic smoothing splines in models for the analysis of designed experiments and longitudinal data. Thus nonlinear curves can be included with random effects and random coefficients, and this leads to very flexible and informative modelling within the linear mixed model framework. Variance heterogeneity can also be accommodated. The advantage of using the cubic smoothing spline in the case of longitudinal data is particularly pronounced, because covariance modelling is achieved implicitly as for random coefficient models. Several examples are considered to illustrate the ideas.

Verbyla, Cullis, Kenward and Welham

... mechanism driving the process being observed. A major difficulty is that both estimation and inference are approximate, because the nonlinearity precludes exact likelihoods from being found. In addition, the assumptions made for the random effects may be questionable. Methods which avoid these problems are therefore very attractive.

gss: A Package for Smoothing Spline ANOVA Models

2012

This document provides a brief introduction to the gss facilities for nonparametric statistical modeling in a variety of problem settings including regression, density estimation, and hazard estimation. Functional ANOVA decompositions are built into models on product domains, and modeling and inferential tools are provided for tasks such as interval estimates, the “testing” of negligible model terms, the handling of correlated data, etc. The methodological background is outlined, and data analysis is illustrated using real-data examples. Nonparametric function estimation using stochastic data, also known as smoothing, has been studied by generations of statisticians. While scores of methods have proved successful for univariate smoothing, far fewer are practical in multivariate settings. Smoothing spline ANOVA models are a versatile family of smoothing methods that are suitable for both univariate and multivariate problems. The first public release of gss dated back to 1999, w...

Penalized Splines For Longitudinal Data With An Application In AIDS Studies

Journal of Modern Applied Statistical Methods, 2006

A penalized spline approximation is proposed for nonparametric regression with longitudinal data. Standard linear mixed-effects modeling can be applied for the estimation. The approach is relatively simple, efficiently computed, and robust to the selection of smoothing parameters, a problem often encountered when local polynomial and smoothing spline techniques are used to analyze longitudinal data sets. The method is extended to time-varying coefficient mixed-effects models. The proposed methods are applied to data from an AIDS clinical study. Biological interpretations and clinical implications are discussed. Simulation studies are done to illustrate the proposed methods.
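As a rough illustration of why standard mixed-model machinery applies here, the sketch below fits a penalized spline as a ridge-type problem: the intercept and slope are treated as unpenalized (fixed) coefficients and the knot coefficients as penalized (random) ones. In the usual mixed-model reading, the smoothing parameter plays the role of the ratio of error variance to random-effect variance. The truncated-line basis, the knot placement, the fixed value of `lam`, and all function names are assumptions made for this example and are not taken from the paper; only the Python standard library is used.

```python
# Minimal sketch: penalized spline fitted as a ridge-type (mixed model) problem.
# Assumptions (not from the source): truncated-line basis, user-supplied knots,
# and a fixed smoothing parameter lam instead of one estimated by REML.

def truncated_line_design(x, knots):
    """Build rows [1, x, (x - k_1)+, ..., (x - k_K)+]."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_penalized_spline(x, y, knots, lam):
    """Minimize ||y - C theta||^2 + lam * sum(u_k^2) over theta = (beta, u)."""
    c = truncated_line_design(x, knots)
    p = len(c[0])
    ctc = [[sum(row[i] * row[j] for row in c) for j in range(p)] for i in range(p)]
    cty = [sum(row[i] * yi for row, yi in zip(c, y)) for i in range(p)]
    for j in range(2, p):  # penalize only the knot coefficients u
        ctc[j][j] += lam
    theta = solve(ctc, cty)
    fitted = [sum(t * v for t, v in zip(theta, row)) for row in c]
    return theta, fitted


# Usage on made-up data: a smooth curve observed on an equally spaced grid.
x = [i / 20 for i in range(21)]
y = [(xi - 0.5) ** 2 for xi in x]
theta, fitted = fit_penalized_spline(x, y, knots=[0.25, 0.5, 0.75], lam=0.01)
```

A larger `lam` shrinks the knot coefficients toward zero and pulls the fit toward a straight line; in a mixed-model fit this amount of shrinkage would be estimated from the data rather than fixed by hand.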

Predicting pregnancy outcomes using longitudinal information: a penalized splines mixed-effects model approach

Statistics in Medicine, 2017

We propose a semiparametric nonlinear mixed-effects model (SNMM) using penalized splines to classify longitudinal data and improve the prediction of a binary outcome. The work is motivated by a study in which different hormone levels were measured during the early stages of pregnancy, and the challenge is using this information to predict normal versus abnormal pregnancy outcomes. The aim of this paper is to compare models and estimation strategies based on alternative formulations of SNMMs depending on the characteristics of the data set under consideration. For our motivating example, we address the classification problem using a particular case of the SNMM in which the parameter space has a finite dimensional component (fixed effects and variance components) and an infinite dimensional component (unknown function) that need to be estimated. The nonparametric component of the model is estimated using penalized splines. For the parametric component, we compare the advantages of using random effects versus direct modeling of the correlation structure of the errors. Numerical studies show that our approach improves over other existing methods for the analysis of this type of data. Furthermore, the results obtained using our method support the idea that explicit modeling of the serial correlation of the error term improves the prediction accuracy with respect to a model with random effects but independent errors.

Effect of Smoothing in Generalized Linear Mixed Models on the Estimation of Covariance Parameters for Longitudinal Data

The International Journal of Biostatistics, 2016

Besides being mainly used for analyzing clustered or longitudinal data, generalized linear mixed models can also be used for smoothing by restricting changes in the fit at the knots in regression splines. The resulting models are usually called semiparametric mixed models (SPMMs). We investigate the effect of smoothing using SPMMs on the correlation and variance parameter estimates for serially correlated longitudinal normal, Poisson and binary data. Through simulations, we compare the performance of SPMMs to other, simpler methods for estimating the nonlinear association, such as fractional polynomials and a parametric nonlinear function. Simulation results suggest that, in general, the SPMMs recover the true curves very well and yield reasonable estimates of the correlation and variance parameters. However, for binary outcomes, SPMMs produce biased estimates of the variance parameters for highly serially correlated data. We apply these methods to a dataset investigating the as...

Analyzing Longitudinal Data using Gee-Smoothing Spline

… . Proceedings. Mathematics and Computers in Science …, 2009

This paper considers nonparametric regression for analyzing longitudinal data. Some developments in nonparametric regression have been achieved for longitudinal or clustered categorical data. For exponential family distributions, Lin & Carroll [6] considered nonparametric regression for longitudinal data using GEE-Local Polynomial Kernel (LPK). They showed that, in order to obtain an efficient estimator, one must ignore within-subject correlation. This means that within-subject observations should be assumed independent, hence the working correlation matrix must be an identity matrix. With the method of Lin & Carroll [6], obtaining efficient estimates therefore requires ignoring the correlation that exists in longitudinal data, even if correlation is the interest of the study. In this paper we propose the GEE-Smoothing Spline to analyze longitudinal data and study properties of the estimator such as bias, consistency and efficiency. We use a natural cubic spline combined with the GEE of Liang & Zeger [5] in estimation. We explore numerically whether the properties of the GEE-Smoothing Spline are better than those of the GEE-Local Polynomial Kernel proposed by Lin & Carroll [6]. Using simulation, we show that the GEE-Smoothing Spline is better than the GEE-Local Polynomial Kernel. The bias of the pointwise estimator decreases with increasing sample size. The pointwise estimator is also consistent even with an incorrect correlation structure, and the most efficient estimate is obtained if the true correlation structure is used.

Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases

Computational Statistics & Data Analysis, 2013

Low-rank smoothing techniques have gained much popularity in non-standard regression modeling. In particular, penalized splines and tensor product smooths are used as flexible tools to study non-parametric relationships among several covariates. The use of standard statistical software facilitates their use for several types of problems and applications. However, when interaction terms are considered in the modeling and multiple smoothing parameters need to be estimated, standard software does not work well if datasets are large or higher-order interactions are included or need to be tested. In this paper, a general approach for constructing and estimating bivariate smooth models for additive and interaction terms using penalized splines is proposed. The formulation is based on the mixed model representation of the smooth-ANOVA model by Lee and Durbán (in press), and several nested models in terms of random effects components are proposed. Each component has a clear interpretation in terms of function shape and model identifiability constraints. The term PS-ANOVA is coined for this type of model. The estimation method is relatively straightforward and is based on the algorithm by Schall (1991) for generalized linear mixed models. Further, a simplification of the smooth interaction term is obtained by constructing a lower-rank basis (nested basis). Finally, some simulation studies and real data examples are presented to evaluate the new model and the estimation method.

Age-period-cohort models using smoothing splines: a generalized additive model approach

Statistics in Medicine, 2013

Age-period-cohort (APC) models are used to analyze temporal trends in disease or mortality rates, dealing with linear dependency among associated effects of age, period, and cohort. However, the nature of sparseness in such data has severely limited the use of APC models. To deal with these practical limitations and issues, we advocate cubic smoothing splines. We show that the methods of estimable functions proposed in the framework of generalized linear models can still be considered to solve the non-identifiability problem when the model fitting is within the framework of generalized additive models with cubic smoothing splines. Through simulation studies, we evaluate the performance of the cubic smoothing splines in terms of the mean squared errors of estimable functions. Our results support the use of cubic smoothing splines for APC modeling with sparse but unaggregated data from a Lexis diagram.
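The linear dependency mentioned in this abstract comes from the identity cohort = period - age. The sketch below, using made-up slopes and a hypothetical helper named `shift`, shows two different sets of linear age, period and cohort trends that give exactly the same linear predictor in every cell, which is why only functions that are unchanged by such a shift can be estimated. It illustrates the non-identifiability only and is not the estimation method of the paper.

```python
# Sketch of the age-period-cohort linear dependency: cohort = period - age.
# The slopes and the shift size t are made-up illustrative numbers.

def linear_predictor(age, period, slopes):
    """Linear predictor with linear age, period and cohort trends."""
    a, p, c = slopes
    cohort = period - age
    return a * age + p * period + c * cohort


def shift(slopes, t):
    """Move a linear trend of size t between the three effects."""
    a, p, c = slopes
    return (a + t, p - t, c + t)


cells = [(age, period) for age in (40, 50, 60) for period in (1990, 2000, 2010)]
original = (0.03, 0.01, -0.02)
shifted = shift(original, 0.5)

# The two parameter sets differ, yet every cell gets the same linear predictor.
differences = [
    abs(linear_predictor(a, p, original) - linear_predictor(a, p, shifted))
    for a, p in cells
]
```

Because the data cannot distinguish `original` from `shifted`, the individual linear slopes are not identifiable without an extra constraint.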

Smoothing Spline ANOVA Models: R Package gss

Journal of Statistical Software, 2014

This document provides a brief introduction to the R package gss for nonparametric statistical modeling in a variety of problem settings including regression, density estimation, and hazard estimation. Functional ANOVA (analysis of variance) decompositions are built into models on product domains, and modeling and inferential tools are provided for tasks such as interval estimates, the "testing" of negligible model terms, the handling of correlated data, etc. The methodological background is outlined, and data analysis is illustrated using real-data examples.