Simplified Smooth Splines for APC Models
Related papers
Simplified Smoothing Splines for APC Models
2021
Smoothing splines are splines fit with a roughness penalty. They can be used across groups of variables in regression models to produce more parsimonious models with improved accuracy. For APC (age-period-cohort) models, the variables in each direction can be numbered sequentially 1:N, which simplifies spline fitting. Further simplification is proposed using a different roughness penalty. Some key calculations then become closed-form, and numeric optimization for the degree of smoothing is simpler. Further, this allows the entire estimation to be done simply in MCMC for Bayesian and random-effects models, improving the estimation of the smoothing parameter and providing distributions of the parameters (or random effects) and the selection of the spline knots.
Age-period-cohort models using smoothing splines: a generalized additive model approach
Statistics in Medicine, 2013
Age-period-cohort (APC) models are used to analyze temporal trends in disease or mortality rates, dealing with linear dependency among associated effects of age, period, and cohort. However, the nature of sparseness in such data has severely limited the use of APC models. To deal with these practical limitations and issues, we advocate cubic smoothing splines. We show that the methods of estimable functions proposed in the framework of generalized linear models can still be considered to solve the non-identifiability problem when the model fitting is within the framework of generalized additive models with cubic smoothing splines. Through simulation studies, we evaluate the performance of the cubic smoothing splines in terms of the mean squared errors of estimable functions. Our results support the use of cubic smoothing splines for APC modeling with sparse but unaggregated data from a Lexis diagram.
2022
Fitting parameters on spline curves produces more parsimonious models while maintaining fit quality. Smoothing the splines reduces predictive variance. Individual splines are often fit by type of variable, e.g., in age-period-cohort models. Linear and cubic splines are most common. Several smoothing criteria have been used for parameter curves; fitting cubic splines by constraining the integral of the splines' squared second derivatives has been popular recently. Constraining the sum of squared second differences for linear splines is analogous. Generally the degree of smoothing is selected using cross-validation. Known spline dummy-variable matrices allow regression estimation of splines, with smoothing done via constrained regression. Smoothing criteria based on sums of squares or absolute values of parameters, as in ridge regression or LASSO, improve predictive accuracy and produce splines similar to those smoothed by second-derivative constraints. Variables with very low t-statistics represent points where curve shapes barely change. Eliminating those variables leaves knots concentrated where spline shapes change. A Bayesian version of this puts shrinkage priors on spline parameters. This yields realistic joint parameter distributions, avoids problems associated with using cross-validation for parameter estimation, and readily expands to non-linear modeling, such as interactions among variable types. Regularized regression and Bayesian spline methods are compared for two example datasets.
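The link between ridge-type shrinkage and second-difference smoothing for linear splines can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `ramp_basis` and `ridge_spline_fit`, the choice of a knot at every interior point 2..n-1, and the unpenalized intercept and slope are all assumptions made for illustration. With this basis, each slope-change coefficient equals the second difference of the fitted curve at its knot, so a ridge penalty on those coefficients is the same as a penalty on the sum of squared second differences.

```python
import numpy as np

def ramp_basis(n):
    """Linear-spline design matrix for points t = 1..n (hypothetical helper).

    Columns: intercept, overall slope (t - 1), and one slope-change
    column max(0, t - k) for each interior knot k = 2..n-1.
    """
    t = np.arange(1, n + 1, dtype=float)
    cols = [np.ones(n), t - 1.0]
    cols += [np.maximum(0.0, t - k) for k in range(2, n)]
    return np.column_stack(cols)

def ridge_spline_fit(y, lam):
    """Ridge regression that shrinks only the slope-change coefficients."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = ramp_basis(n)
    penalty = np.eye(n)
    penalty[0, 0] = 0.0  # intercept is not penalized (assumption)
    penalty[1, 1] = 0.0  # overall slope is not penalized (assumption)
    beta = np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)
    return beta, X @ beta

# Illustrative data only: a noisy curve observed at points 1..20.
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 3, 20)) + 0.1 * rng.standard_normal(20)
beta, fitted = ridge_spline_fit(y, lam=5.0)

# The slope changes are exactly the second differences of the fitted curve.
print(np.allclose(np.diff(fitted, 2), beta[2:]))
```

Setting `lam=0` reproduces the data exactly, since the basis then has one free parameter per point; larger values pull the fitted curve toward a straight line. Replacing the squared penalty with an absolute-value (LASSO) penalty would need an iterative solver and is not shown here.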
Tutorial in biostatistics: spline smoothing with linear mixed models
Statistics in Medicine, 2005
The semi-parametric regression achieved via penalized spline smoothing can be expressed in a linear mixed models framework. This allows such models to be fitted using standard mixed models software routines with which many biostatisticians are familiar. Moreover, the analysis of complex correlated data structures that are a hallmark of biostatistics, and which are typically analysed using mixed models, can now incorporate directly smoothing of the relationship between an outcome and covariates. In this paper we provide an introduction to both linear mixed models and penalized spline smoothing, and describe the connection between the two. This is illustrated with three examples, the first using birth data from the U.K., the second relating mammographic density to age in a study of female twin-pairs and the third modelling the relationship between age and bronchial hyperresponsiveness in families. The models are fitted in R (a clone of S-plus) and using Markov chain Monte Carlo (MCMC) implemented in the package WinBUGS.
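The connection between penalized splines and linear mixed models can be checked numerically. The sketch below is an illustration under stated assumptions, not the tutorial's code: it uses a truncated-line basis with nine equally spaced knots, treats the variance components as known, and introduces the hypothetical function names `penalized_fit` and `mixed_model_fit`. When the knot coefficients are treated as random effects with variance `sigma2_u` and the residual variance is `sigma2_e`, the mixed-model fitted curve (generalized least squares for the fixed effects plus best linear unbiased prediction for the random effects) coincides with the penalized least-squares fit at smoothing parameter `sigma2_e / sigma2_u`.

```python
import numpy as np

def penalized_fit(X, Z, y, lam):
    """Penalized least squares: ridge penalty on the knot coefficients only."""
    C = np.hstack([X, Z])
    p = X.shape[1]
    D = np.zeros((C.shape[1], C.shape[1]))
    D[p:, p:] = np.eye(Z.shape[1])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return C @ coef

def mixed_model_fit(X, Z, y, sigma2_u, sigma2_e):
    """Fitted curve from y = X b + Z u + e with u ~ N(0, sigma2_u I),
    e ~ N(0, sigma2_e I), and known variance components (assumption)."""
    n = len(y)
    V = sigma2_u * (Z @ Z.T) + sigma2_e * np.eye(n)  # marginal covariance of y
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    b = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)          # GLS fixed effects
    u = sigma2_u * (Z.T @ np.linalg.solve(V, y - X @ b))     # BLUP of random effects
    return X @ b + Z @ u

# Illustrative data only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(50)
knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), x])        # unpenalized linear part
Z = np.maximum(0.0, x[:, None] - knots[None, :])  # truncated-line basis

sigma2_u, sigma2_e = 2.0, 0.5
f_mixed = mixed_model_fit(X, Z, y, sigma2_u, sigma2_e)
f_pen = penalized_fit(X, Z, y, lam=sigma2_e / sigma2_u)
print(np.allclose(f_mixed, f_pen, atol=1e-6))
```

In practice the variance components are estimated (for example by REML or MCMC) rather than fixed, which is what lets mixed-model software choose the amount of smoothing.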
Smoothing spline estimation in varying-coefficient models
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2004
Smoothing spline estimators are considered for inference in varying-coefficient models with one effect modifying covariate. Bayesian 'confidence intervals' are developed for the coefficient curves and efficient computational methods are derived for computing the curve estimators, fitted values, posterior variances and data-adaptive methods for selecting the levels of smoothing. The efficacy and utility of the methodology proposed are demonstrated through a small simulation study and the analysis of a real data set.
The Analysis of Designed Experiments and Longitudinal Data by Using Smoothing Splines
Journal of the Royal Statistical Society: Series C (Applied Statistics), 1999
Smoothing splines and other non-parametric smoothing methods are well accepted for exploratory data analysis. These methods have been used in regression, in repeated measures or longitudinal data analysis, and in generalized linear models. However, a major drawback is the lack of a formal inferential framework. An exception which has not been fully exploited is the cubic smoothing spline. The cubic smoothing spline admits a mixed model formulation, which places this non-parametric smoother firmly in a parametric setting. The formulation presented in this paper provides the mechanism for including cubic smoothing splines in models for the analysis of designed experiments and longitudinal data. Thus nonlinear curves can be included with random effects and random coefficients, and this leads to very flexible and informative modelling within the linear mixed model framework. Variance heterogeneity can also be accommodated. The advantage of using the cubic smoothing spline in the case of longitudinal data is particularly pronounced, because covariance modelling is achieved implicitly as for random coefficient models. Several examples are considered to illustrate the ideas. (Authors: Verbyla, Cullis, Kenward and Welham.) ... mechanism driving the process being observed. A major difficulty is that both estimation and inference are approximate, because the nonlinearity precludes exact likelihoods from being found. In addition, the assumptions made for the random effects may be questionable. Methods which avoid these problems are therefore very attractive.
Marginal Longitudinal Curves Estimated via Bayesian Penalized Splines
The Six Cities air pollution data are used to estimate and investigate the marginal curve of a function describing lung growth for a set of children in a longitudinal study. This article proposes a penalized regression spline technique based on a semiparametric mixed model (MM) framework for an additive model. This smoothing approach fits marginal models for unbalanced longitudinal measurements using Bayesian inference, implemented via Markov chain Monte Carlo (MCMC) with the Gibbs sampler. The unbalanced case, in which measurements are missing or their number differs across subjects, is more practical and common in real-life studies. This methodology makes it possible to establish a straightforward approach to estimating similar models using R programming when it is not possible to do so using existing code such as lme().
Bayesian Approach to Spline Smoothing
Regressions using variables that are categorized or listed numerically (1st, 2nd, etc.), such as age, weight group, or year measured, are often modeled with a dummy variable for each age, etc. Cubic splines are used to smooth the fitted values along age curves, year curves, etc. This can give nearly as good a fit as straight regression but with fewer variables. Spline smoothing adds a smoothing constant times a smoothness measure, often the integral of the curve's squared second derivative, to the negative loglikelihood, which is then minimized. The smoothing constant is estimated by cross-validation. Picking the knots (curve-segment connection points) is a separate estimation. Here we look at using simpler measures of curve smoothness, like the sum of squares of the needed parameters. This gives very similar curves across the fitted values, both in goodness of fit and visual smoothness. It also allows the fitting to be done with more standard fitting methods, like Lasso or ridge regression, with the knot selection optimized in the process. This helps modelers incorporate spline smoothing into their own more complex models. It also makes it possible to smooth using Bayesian methods. That is slower, but it gives distributions for each fitted parameter and a direct estimate of the probability distribution of the smoothing constant. Cross-validation is a good method to compare models but has problems if used for estimation, which are discussed. Linear splines can be modeled this way as well; after smoothing they look similar to cubic splines and are often easier and faster to fit. Modern Bayesian methods do not rely on Bayesian interpretations of probability and can be done within frequentist random effects, liberally interpreted.
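The two penalized objectives described in this abstract can be written side by side. The notation is assumed for illustration and does not come from the paper: \ell(\beta) is the loglikelihood, \lambda \ge 0 is the smoothing constant, f_\beta is the spline curve implied by the parameters, and \beta_j are the spline parameters subject to shrinkage.

```latex
% Conventional smoothing: penalize the integrated squared second derivative
\hat{\beta}(\lambda) = \arg\min_{\beta}
  \left\{ -\ell(\beta) + \lambda \int \bigl( f_\beta''(x) \bigr)^2 \, dx \right\}

% Simpler alternative: ridge-type penalty on the spline parameters
\hat{\beta}(\lambda) = \arg\min_{\beta}
  \left\{ -\ell(\beta) + \lambda \sum_{j} \beta_j^2 \right\}
```

Replacing \beta_j^2 with |\beta_j| in the second objective gives the Lasso form, which can shrink some parameters exactly to zero and so removes the corresponding knots.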
Analyzing Longitudinal Data using Gee-Smoothing Spline
… . Proceedings. Mathematics and Computers in Science …, 2009
This paper considers nonparametric regression to analyze longitudinal data. Some developments of nonparametric regression have been achieved for longitudinal or clustered categorical data. For exponential family distributions, Lin & Carroll [6] considered nonparametric regression for longitudinal data using the GEE-Local Polynomial Kernel (LPK). They showed that, in order to obtain an efficient estimator, one must ignore within-subject correlation. This means within-subject observations should be assumed independent, hence the working correlation matrix must be an identity matrix. Thus, with the Lin & Carroll approach, obtaining efficient estimates requires ignoring the correlation that exists in longitudinal data, even if correlation is the interest of the study. In this paper we propose the GEE-Smoothing Spline to analyze longitudinal data and study properties of the estimator such as bias, consistency and efficiency. We use a natural cubic spline combined with the GEE of Liang & Zeger [5] in estimation. We want to explore numerically whether the properties of the GEE-Smoothing Spline are better than those of the GEE-Local Polynomial Kernel proposed by Lin & Carroll [6]. Using simulation we show that the GEE-Smoothing Spline is better than the GEE-Local Polynomial Kernel. The bias of the pointwise estimator decreases with increasing sample size. The pointwise estimator is also consistent even with an incorrect correlation structure, and the most efficient estimate is obtained if the true correlation structure is used.