Focused Information Criteria for the Linear Hazard Regression Model

Focussed information criteria and model averaging for Cox's hazard regression model

2004

This article is concerned with variable selection methods for the proportional hazards regression model. Including too many covariates causes extra variability and inflated confidence intervals for regression parameters, so regimes for discarding the less informative ones are needed. Our framework has p covariates designated as 'protected', while variables from a further set of q covariates are examined for possible inclusion or exclusion. In addition to deriving results for the AIC method, defined via the partial likelihood, we develop a focussed information criterion (FIC) that, for a given interest parameter, finds the best subset of covariates. Thus the FIC might find that the best model for predicting median survival time differs from the best model for estimating survival probabilities, and that the best overall model for analysing survival for men is not the same as the best overall model for analysing survival for women. We also develop methodology for model averaging, where the final estimate of a quantity is a weighted average of estimates computed for a range of submodels. Our methods are illustrated in simulations and for a survival study of Danish skin cancer patients.
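The submodel search and the weighted average described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the article's method: it assumes the usual "smaller is better" AIC convention, uses Akaike-type weights proportional to exp(-AIC/2) as one common weighting choice, and takes the maximized log partial likelihoods and submodel estimates as given inputs. The function names and the covariate labels `z1`, `z2` are hypothetical.

```python
import math
from itertools import combinations


def submodels(optional):
    """Enumerate every subset S of the q optional covariates (2**q subsets).

    The p protected covariates are in every submodel, so they are not listed.
    """
    return [frozenset(c)
            for r in range(len(optional) + 1)
            for c in combinations(optional, r)]


def aic(max_log_partial_lik, n_params):
    """AIC on the partial-likelihood scale (assumed convention: smaller is better)."""
    return -2.0 * max_log_partial_lik + 2.0 * n_params


def akaike_weights(aic_values):
    """Weights proportional to exp(-AIC/2), normalized to sum to one.

    Subtracting the smallest AIC first keeps the exponentials numerically stable.
    """
    best = min(aic_values.values())
    raw = {s: math.exp(-0.5 * (a - best)) for s, a in aic_values.items()}
    total = sum(raw.values())
    return {s: r / total for s, r in raw.items()}


def model_average(estimates, weights):
    """Weighted average of the submodel estimates of one interest parameter."""
    return sum(weights[s] * estimates[s] for s in estimates)
```

Given a dictionary mapping each subset from `submodels(["z1", "z2"])` to its AIC value, `akaike_weights` returns the weights and `model_average` combines the corresponding submodel estimates. Picking the single subset with the smallest AIC corresponds to plain AIC selection.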

Focused Information Criteria and Model Averaging for the Cox Hazard Regression Model

Journal of The American Statistical Association, 2004


Selection between proportional and stratified hazards models based on expected log-likelihood

2009

The problem of selecting between semi-parametric and proportional hazards models is considered. We propose to make this choice based on the expectation of the log-likelihood (ELL), which can be estimated by the likelihood cross-validation (LCV) criterion. The criterion is used to choose an estimator in families of semi-parametric estimators defined by the penalized likelihood. A simulation study shows that the ELL criterion performs nearly as well in this problem as the optimal Kullback-Leibler criterion in terms of Kullback-Leibler distance, and that LCV performs reasonably well. The approach is applied to a model of age-specific risk of dementia as a function of sex and educational level, using data from a large cohort study.
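To make the LCV idea concrete, the sketch below computes a leave-one-out likelihood cross-validation score for two deliberately simple candidate models: one constant hazard rate for everyone versus a separate constant rate per group. This is a toy stand-in under stated assumptions, not the penalized-likelihood semi-parametric estimators of the article. The exponential model, the function names and the data layout are all illustrative choices, and each leave-one-out training set is assumed to contain at least one observed event.

```python
import math


def exp_rate(times, events):
    """MLE of a constant hazard under right censoring: events / total follow-up."""
    return sum(events) / sum(times)


def loglik_point(rate, t, d):
    """Log-likelihood contribution of one subject (d = 1 if the event was observed)."""
    return d * math.log(rate) - rate * t


def lcv_pooled(times, events):
    """Leave-one-out LCV for a single pooled rate (smaller is better)."""
    n = len(times)
    total = 0.0
    for i in range(n):
        rest_t = times[:i] + times[i + 1:]
        rest_d = events[:i] + events[i + 1:]
        total += loglik_point(exp_rate(rest_t, rest_d), times[i], events[i])
    return -total / n


def lcv_by_group(times, events, groups):
    """Leave-one-out LCV when each group gets its own rate (smaller is better)."""
    n = len(times)
    total = 0.0
    for i in range(n):
        # Fit on the other members of subject i's group only.
        idx = [j for j in range(n) if j != i and groups[j] == groups[i]]
        rate = exp_rate([times[j] for j in idx], [events[j] for j in idx])
        total += loglik_point(rate, times[i], events[i])
    return -total / n
```

The candidate with the smaller score is preferred, since each score estimates minus the expected log-likelihood of a new observation under the fitted model.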

Model selection in nonparametric hazard regression

Journal of Nonparametric Statistics, 2006


Choice of Estimators Based on Different Observations: Modified AIC and LCV Criteria

Scandinavian Journal of Statistics, 2010

It is quite common in epidemiology that we wish to assess the quality of estimators on a particular set of information, whereas the estimators may use a larger set of information. Two examples are studied: the first occurs when we construct a model for an event which happens if a continuous variable is above a certain threshold. We can compare estimators based on the observation of only the event or on the whole continuous variable. The other example is that of predicting the survival based only on survival information or using in addition information on a disease. We develop modified Akaike information criterion (AIC) and likelihood cross-validation (LCV) criteria to compare estimators in this non-standard situation. We show that a normalized difference of AIC has a bias equal to o(n^{-1}) if the estimators are based on well-specified models; a normalized difference of LCV always has a bias equal to o(n^{-1}). A simulation study shows that both criteria work well, although the normalized difference of LCV tends to be better and is more robust. Moreover, in the case of well-specified models the difference of risks boils down to the difference of statistical risks, which can be rather precisely estimated. For ‘compatible’ models the difference of risks is often the main term, but there can also be a difference of mis-specification risks.

Model Selection Strategy for Cox Proportional Hazards Model

Dhaka University Journal of Science

Often in survival regression modelling, not all predictors are relevant to the outcome variable. Discarding such irrelevant variables is crucial in model selection. In this research, under the Cox proportional hazards (PH) model, we study different model selection criteria, including stepwise selection, the Least Absolute Shrinkage and Selection Operator (LASSO), the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the extended versions of AIC and BIC for the Cox model. The simulation study shows that varying the censoring proportion and the correlation among the covariates has a great impact on how well the criteria identify the true model. In the presence of high correlation among the covariates, the success rate for identifying the true model is higher for LASSO than for the other criteria. The extended version of BIC always gives better results than the traditional BIC. We have also applied these techniques to real-world data. Dhaka Univ. J. Sci....

Variable Selection by sNML Criterion in Logistic Regression with an Application to a Risk-Adjustment Model for Hip Fracture Mortality

2012

When comparing the performance of health care providers, it is important to rule out the effect of factors that have an unwanted effect on the performance indicator (e.g. mortality). In register-based studies, randomization is out of the question. We develop a risk adjustment model for hip fracture mortality in Finland by using logistic regression. The model is used to study the impact of the length of the register follow-up period on adjusting the performance indicator for a set of comorbidities. The comorbidities are congestive heart failure, cancer and diabetes. We also introduce an implementation of the minimum description length (MDL) principle for model selection in logistic regression. This is done by using the normalized maximum likelihood (NML) technique. The computational burden becomes too heavy to apply the usual NML criterion, and therefore a technique based on the idea of sequentially normalized maximum likelihood (sNML) is introduced. The sNML criterion can be evaluated efficiently even for large models with large amounts of data. The results given by sNML are then compared to the corresponding results given by the traditional AIC and BIC model selection criteria. All three comorbidities clearly have an effect on hip fracture mortality. The results indicate that for congestive heart failure all available medical history should be used, while for cancer it is enough to use only records from half a year before the fracture. For diabetes the choice of time period is not as clear, but using records from three years before the fracture seems to be a reasonable choice.

coxphMIC: An R Package for Sparse Estimation of Cox Proportional Hazards Models via Approximated Information Criteria

The R Journal

In this paper, we describe an R package named coxphMIC, which implements the sparse estimation method for Cox proportional hazards models via an approximated information criterion (Su et al., 2016). The developed methodology is named MIC, which stands for "Minimizing approximated Information Criteria". A reparameterization step is introduced to enforce sparsity while at the same time keeping the objective function smooth. As a result, MIC is computationally fast with superior performance in sparse estimation. Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference (Leeb and Pötscher, 2005). The MIC method and its R implementation are introduced and illustrated with the PBC data.

Sparse estimation of Cox proportional hazards models via approximated information criteria

Biometrics, 2016

We propose a new sparse estimation method for Cox (1972) proportional hazards models by optimizing an approximated information criterion. The main idea involves approximation of the ℓ0 norm with a continuous or smooth unit dent function. The proposed method bridges best subset selection and regularization by borrowing strength from both. It mimics best subset selection using a penalized likelihood approach, yet with no need of a tuning parameter. We further reformulate the problem with a reparameterization step so that it reduces to one unconstrained nonconvex yet smooth programming problem, which can be solved efficiently as in computing the maximum partial likelihood estimator (MPLE). Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference. The oracle property of the proposed method is established. Both simulated experiments and empirical examples are provided for assessment and illustration.
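The idea of replacing the ℓ0 norm by a smooth unit dent function can be illustrated as follows. This is a sketch under assumptions rather than the authors' implementation: the abstract does not specify the dent function, so the hyperbolic-tangent form tanh(a·β²), the sharpness value `a = 50`, the BIC-style multiplier log(n) and all function names are illustrative choices made here.

```python
import math


def unit_dent(beta_j, a=50.0):
    """Smooth surrogate for the indicator 1{beta_j != 0} (assumed tanh form).

    It equals 0 at beta_j = 0 and approaches 1 quickly as |beta_j| grows;
    larger values of a make the dent around zero narrower.
    """
    return math.tanh(a * beta_j ** 2)


def approx_l0(beta, a=50.0):
    """Smooth approximation to the number of nonzero coefficients in beta."""
    return sum(unit_dent(b, a) for b in beta)


def approx_ic(neg2_log_partial_lik, beta, n, a=50.0):
    """Approximated information criterion with an assumed BIC-style weight log(n).

    neg2_log_partial_lik is -2 times the log partial likelihood at beta,
    supplied by the caller.
    """
    return neg2_log_partial_lik + math.log(n) * approx_l0(beta, a)
```

Because the penalty is differentiable in every coefficient, the whole criterion can be handed to a generic smooth optimizer, which is what allows best-subset-like behaviour without a search over subsets.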