Pischon et al. Respond to "Variable Selection versus Shrinkage in Control of Confounders"
Related papers
Assessing the impact of unmeasured confounders for credible and reliable real‐world evidence
Pharmacoepidemiology and Drug Safety, 2020
Purpose: We review statistical methods for assessing the possible impact of bias due to unmeasured confounding in real-world data analysis and provide detailed recommendations for choosing among the methods. Methods: By updating an earlier systematic review, we summarize modern statistical best practices for evaluating and correcting for potential bias due to unmeasured confounding when estimating causal treatment effects from non-interventional studies. Results: We suggest a hierarchical structure for assessing unmeasured confounding. First, for initial sensitivity analyses, we strongly recommend applying a recently developed method, the E-value, which is straightforward to apply and does not require prior knowledge or assumptions about the unmeasured confounder(s). When some such knowledge is available, the E-value can be supplemented by the rule-out or array method at this step. If these initial analyses suggest results may not be robust to unmeasured confounding, subsequent analyses can be conducted using more specialized statistical methods, which we categorize based on whether they require access to external data on the suspected unmeasured confounder(s), internal data, or no data. Other factors for choosing the subsequent sensitivity analysis methods are also introduced and discussed, including the types of unmeasured confounders and whether the subsequent sensitivity analysis is intended to provide a corrected causal treatment effect. Conclusion: Various analytical methods have been proposed to address unmeasured confounding, but little research has discussed a structured approach to selecting appropriate methods in practice. In providing practical suggestions for choosing appropriate initial and, potentially, more specialized subsequent sensitivity analyses, we hope to facilitate the widespread reporting of such sensitivity analyses in non-interventional studies. The suggested approach also has the potential to inform pre-specification of sensitivity analyses before executing the analysis, thereby increasing transparency and limiting selective study reporting.
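For orientation, a minimal sketch of the E-value computation recommended above, assuming the standard formula on the risk-ratio scale (E = RR + sqrt(RR(RR - 1)), taking the reciprocal for protective estimates); the function names and example numbers below are illustrative, not from the paper:

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    For RR < 1 the reciprocal is used, per the usual convention, so the
    returned value is always >= 1.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = rr if rr >= 1 else 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(rr: float, limit: float) -> float:
    """E-value for the confidence-interval limit closest to the null.

    `limit` is the lower bound when rr > 1 and the upper bound when rr < 1.
    If the interval includes RR = 1, the E-value is 1 by definition.
    """
    if rr >= 1:
        return 1.0 if limit <= 1.0 else e_value(limit)
    return 1.0 if limit >= 1.0 else e_value(limit)

# Illustrative numbers: observed RR = 1.8 with 95% CI (1.2, 2.7)
print(round(e_value(1.8), 2))          # 3.0
print(round(e_value_ci(1.8, 1.2), 2))  # 1.69
```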
Sensitivity analysis for an unmeasured confounder: a review of two independent methods
Revista Brasileira De Epidemiologia, 2010
One of the main purposes of epidemiological studies is to estimate causal effects. Causal inference can be drawn from both observational and experimental studies, but a strong constraint on the interpretation of observational studies is the possible presence of unobserved confounders (hidden biases). The possible effects of unobserved confounders can be assessed through a sensitivity analysis that determines how strong the effect of an unmeasured confounder would have to be to explain an apparent association, and what characteristics such a confounder would need to produce that effect. The purpose of this paper is to review and integrate two independent sensitivity analysis methods for assessing the impact of an unmeasured confounder: one developed by Greenland from an epidemiological perspective, and the other developed by Rosenbaum from a statistical standpoint. By merging the epidemiological and statistical perspectives, the integrated procedure becomes a more complete and direct sensitivity analysis, encouraging its wider diffusion and application. As observational studies are more subject to bias and confounding than experimental settings, considering both epidemiological and statistical aspects in sensitivity analysis strengthens causal inference.
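A sketch of the Greenland-style external adjustment that this line of work builds on, assuming a single binary unmeasured confounder and no effect modification (this is the standard bias-factor calculation, not Rosenbaum's bounds; the function names and numbers are illustrative):

```python
def bias_factor(rr_ud: float, p_exposed: float, p_unexposed: float) -> float:
    """Bias factor from a single binary unmeasured confounder U.

    rr_ud       : risk ratio relating U to the outcome
    p_exposed   : prevalence of U among the exposed
    p_unexposed : prevalence of U among the unexposed
    """
    return (rr_ud * p_exposed + (1 - p_exposed)) / (rr_ud * p_unexposed + (1 - p_unexposed))

def externally_adjusted_rr(rr_observed: float, rr_ud: float,
                           p_exposed: float, p_unexposed: float) -> float:
    """Divide the observed risk ratio by the bias factor."""
    return rr_observed / bias_factor(rr_ud, p_exposed, p_unexposed)

# Example: observed RR = 2.0; suppose U doubles the outcome risk (RR_UD = 2.0)
# and is present in 60% of the exposed but only 20% of the unexposed.
print(round(externally_adjusted_rr(2.0, 2.0, 0.6, 0.2), 2))  # 1.5
```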
Methodological issues of confounding in analytical epidemiologic studies
Caspian journal of internal medicine, 2012
Confounding can be thought of as the mixing of the effect of an exposure on disease risk with the effect of a third factor, which distorts measures of association such as the risk ratio or odds ratio. This bias arises because the confounder is related to both the exposure and the disease (outcome). In this article, we provide a conceptual review of confounding in epidemiologic studies, in particular observational studies and nonrandomized experimental studies. Using 2 by 2 tables and analytical examples, we show how the index of association is distorted when confounding is present. The criteria for confounding, its sources, and several related issues are addressed, and the advantages and disadvantages of several strategies for controlling confounding are discussed.
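A small worked example of the kind of distortion the abstract describes, using hypothetical 2 by 2 counts (not the paper's data): within each stratum of the confounder the exposure-outcome odds ratio is 1, yet the crude table suggests a strong association.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a,b = exposed cases/non-cases; c,d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Summary OR across confounder strata; each stratum is (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts: the confounder is tied to both exposure and outcome,
# but within each stratum the exposure has no effect (OR = 1).
stratum_1 = (80, 120, 20, 30)    # confounder present
stratum_2 = (10, 90, 40, 360)    # confounder absent

crude = tuple(x + y for x, y in zip(stratum_1, stratum_2))   # collapse over strata
print(round(odds_ratio(*crude), 2))                          # ~2.79, spuriously elevated
print(round(mantel_haenszel_or([stratum_1, stratum_2]), 2))  # 1.0, the stratum-specific OR
```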
European Journal of Epidemiology, 2019
A review of epidemiological papers conducted in 2009 concluded that several studies employed variable selection methods liable to introduce bias and yield inadequate inferences. Many new confounder selection methods have been developed since then. The goal of this study was to provide an updated descriptive portrait of which variable selection methods epidemiologists use when analyzing observational data. Studies published in four major epidemiological journals in 2015 were reviewed. Only articles with a predictive or explanatory objective that reported the analysis of individual data were included. The method(s) employed for selecting variables were extracted from the retained articles. A total of 975 articles were retrieved and 299 met the eligibility criteria, 292 of which pursued an explanatory objective. Among those, 146 studies (50%) reported using prior knowledge or causal graphs for selecting variables, 34 (12%) used change-in-effect-estimate methods, 26 (9%) used stepwise approaches, 16 (5%) employed univariable analyses, 5 (2%) used various other methods, and 107 (37%) did not provide sufficient detail to allow classification (more than one method could be employed in a single article). Although less frequent than in the previous review, stepwise and univariable analyses, which are prone to introducing bias and producing inadequate inferences, were still prevalent. Moreover, 37% of studies did not provide sufficient detail to assess how variables were selected. We thus believe there is still room for improvement in the variable selection methods used by epidemiologists and in their reporting.
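As an illustration of the change-in-effect-estimate approach counted in this review, a hedged sketch: retain a candidate covariate if adjusting for it moves the exposure coefficient by more than some threshold. The 10% cut-off, the variable names, and the simulated data are assumptions for illustration, not taken from the article.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def change_in_estimate(df, outcome, exposure, candidates, threshold=0.10):
    """Flag candidate covariates whose adjustment moves the exposure
    log-odds-ratio by more than `threshold` (relative change)."""
    base = smf.logit(f"{outcome} ~ {exposure}", data=df).fit(disp=0)
    b0 = base.params[exposure]
    keep = []
    for cov in candidates:
        adj = smf.logit(f"{outcome} ~ {exposure} + {cov}", data=df).fit(disp=0)
        if abs(adj.params[exposure] - b0) / abs(b0) > threshold:
            keep.append(cov)
    return keep

# Hypothetical data: z confounds the x-y association, w is pure noise.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = (rng.uniform(size=n) < 1 / (1 + np.exp(-z))).astype(int)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 * x + z)))).astype(int)
df = pd.DataFrame({"y": y, "x": x, "z": z, "w": rng.normal(size=n)})
print(change_in_estimate(df, "y", "x", ["z", "w"]))  # typically ['z']
```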
Attributable fraction for multiple risk factors: Methods, interpretations, and examples
Statistical Methods in Medical Research, 2019
The attributable fraction is the candidate tool for quantifying the individual share of each risk factor in the disease burden of a population, expressing the proportion of cases ascribable to the risk factor. The original formula ignored the presence of other factors (i.e., multiple risk factors and/or confounders), and several methods for adjusting for potential confounders have been proposed. However, crude and adjusted attributable fractions do not sum to the joint attributable fraction (i.e., the proportion of cases attributable to all risk factors together), and their sum may exceed one. A different approach consists of partitioning the joint attributable fraction into exposure-specific shares, leading to sequential and average attributable fractions. We provide an example using Italian case-control data on oral cavity cancer, comparing crude, adjusted, sequential, and average attributable fractions for smoking and alcohol, and we give an overview of the available software routines for their estimation. For each method, we give the interpretation and discuss shortcomings. Crude and adjusted attributable fractions added up to more than one, whereas the sequential and average methods added up to the joint attributable fraction = 0.8112 (average attributable fractions for smoking and alcohol were 0.4894 and 0.3218, respectively). The attributable fraction is a well-known epidemiological measure that translates risk factor prevalence and disease occurrence into useful figures from a public health perspective. This work endorses the proper use and interpretation of these measures.
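A sketch of how sequential attributable fractions can be averaged over all orderings of the risk factors to obtain average attributable fractions that sum to the joint fraction. The subset AFs below are hypothetical and only mimic the structure of the smoking/alcohol example; this is not the paper's estimation routine.

```python
from itertools import permutations

def average_attributable_fractions(factors, joint_af):
    """Average attributable fractions via ordering-averaged sequential AFs.

    factors  : list of risk-factor names
    joint_af : function mapping a frozenset of factors to the joint AF of
               that subset (joint_af(frozenset()) should be 0).
    Returns a dict of average AFs; they sum to the joint AF of all factors.
    """
    totals = {f: 0.0 for f in factors}
    orderings = list(permutations(factors))
    for order in orderings:
        removed = frozenset()
        for f in order:
            # Sequential AF: extra fraction of cases removed by eliminating f
            # after the factors already removed in this ordering.
            totals[f] += joint_af(removed | {f}) - joint_af(removed)
            removed = removed | {f}
    return {f: totals[f] / len(orderings) for f in factors}

# Hypothetical joint AFs for illustration (not the paper's estimates):
table = {frozenset(): 0.0,
         frozenset({"smoking"}): 0.55,
         frozenset({"alcohol"}): 0.40,
         frozenset({"smoking", "alcohol"}): 0.81}
print(average_attributable_fractions(["smoking", "alcohol"], table.__getitem__))
# e.g. {'smoking': 0.48, 'alcohol': 0.33}; the two values sum to 0.81.
```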
To reduce bias by residual confounding in nonrandomized database studies, the high-dimensional propensity score (hd-PS) algorithm selects and adjusts for previously unmeasured confounders. The authors evaluated whether hd-PS maintains its capabilities in small cohorts that have few exposed patients or few outcome events. In 4 North American pharmacoepidemiologic cohort studies between 1995 and 2005, the authors repeatedly sampled the data to yield increasingly smaller cohorts. They identified potential confounders in each sample and estimated both an hd-PS that included 0–500 covariates and treatment effects adjusted by decile of hd-PS. For sensitivity analyses, they altered the variable selection process to use zero-cell correction and, separately, to use only the variables' exposure association. With >50 exposed patients with an outcome event, hd-PS-adjusted point estimates in the small cohorts were similar to the full-cohort values. With 25–50 exposed events, both sensitivity analyses yielded estimates closer to those obtained in the full data set. Point estimates generally did not change as compared with the full data set when selecting >300 covariates for the hd-PS. In these data, using zero-cell correction or exposure-based covariate selection allowed hd-PS to function robustly with few events. hd-PS is a flexible analytical tool for nonrandomized research across a range of study sizes and event frequencies.
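A heavily simplified sketch of an hd-PS style analysis, covering only the exposure-association ranking variant mentioned above plus adjustment by propensity-score decile; the full algorithm additionally generates empirical covariates from coded data and prioritises them by a bias formula, which is not shown. Column names, the ranking statistic, and the cut-offs are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def hdps_like_adjustment(df, exposure, outcome, covariates, n_keep=200):
    """Simplified sketch: rank binary covariates by their association with
    exposure, fit a propensity score on the top-ranked ones, and adjust the
    outcome model by propensity-score decile."""
    # 1. Rank covariates by strength of covariate-exposure association.
    def assoc(c):
        p1 = df.loc[df[c] == 1, exposure].mean()
        p0 = df.loc[df[c] == 0, exposure].mean()
        # small constant guards against empty/zero cells in sparse data
        return abs(np.log((p1 + 1e-6) / (p0 + 1e-6)))
    ranked = sorted(covariates, key=assoc, reverse=True)[:n_keep]

    # 2. Fit the propensity score on the selected covariates.
    X = sm.add_constant(df[ranked])
    ps = sm.Logit(df[exposure], X).fit(disp=0).predict(X)

    # 3. Adjust the outcome model by decile of the propensity score.
    decile = pd.qcut(ps, 10, labels=False, duplicates="drop")
    design = pd.get_dummies(decile, prefix="ps_decile", drop_first=True).astype(float)
    design[exposure] = df[exposure].values
    fit = sm.Logit(df[outcome], sm.add_constant(design)).fit(disp=0)
    return np.exp(fit.params[exposure])  # decile-adjusted odds ratio
```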
Statistical Methods in Medical Research, 2012
In a previously published article in this journal, Vansteelandt et al. [ Stat Methods Med Res. Epub ahead of print 12 November 2010. DOI: 10.1177/0962280210387717] address confounder selection in the context of causal effect estimation in observational studies. They discuss several selection strategies and propose a procedure whose performance is guided by the quality of the exposure effect estimator. The authors note that when a particular linearity condition is met, consistent estimation of the target parameter can be achieved even under dual misspecification of the models for the association of confounders with exposure and outcome, and they demonstrate the performance of their procedure relative to other estimators when this condition holds. Our earlier published work on collaborative targeted minimum loss-based learning provides a general theoretical framework for effective confounder selection that explains the findings of Vansteelandt et al. and underscores the appropriateness of their ...
Statistics in Medicine
In epidemiology one typically wants to estimate the risk of an outcome associated with an exposure after adjusting for confounders. Sometimes the outcome, the exposure, and perhaps some confounders are available in a large data set, whereas some important confounders are available only in a validation data set that is typically a subset of the main data set. A generally applicable method in this situation is the two-stage calibration (TSC) method. We present a simplified, easy-to-implement version of TSC for the case where the validation data are a subset of the main data. We compared the simplified version to the standard TSC version for incidence rate ratios, odds ratios, relative risks, and hazard ratios using simulated data; the simplified version performed better than our implementation of the standard version. The simplified version was also tested on real data and performed well.
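The general two-stage intuition can be sketched as follows: correct the main-data estimate by the shift observed in the validation subset when the extra confounders are added. This is only the broad idea under strong simplifying assumptions, not the authors' TSC estimator or its simplified version; all names and model choices here are hypothetical.

```python
import statsmodels.formula.api as smf

def calibrated_log_or(main_df, valid_df, outcome, exposure, measured, extra):
    """Rough sketch of a two-stage correction (NOT the authors' estimator):
    shift the main-data estimate, which can only adjust for the `measured`
    confounders, by the correction seen in the validation subset when the
    `extra` confounders are added."""
    rhs_partial = " + ".join([exposure] + measured)
    rhs_full = " + ".join([exposure] + measured + extra)

    def log_or(formula, data):
        return smf.logit(formula, data=data).fit(disp=0).params[exposure]

    # Main data: partially adjusted estimate (extra confounders unavailable).
    b_main = log_or(f"{outcome} ~ {rhs_partial}", main_df)
    # Validation subset: partially and fully adjusted estimates.
    b_val_partial = log_or(f"{outcome} ~ {rhs_partial}", valid_df)
    b_val_full = log_or(f"{outcome} ~ {rhs_full}", valid_df)
    # Apply the validation-data correction to the main estimate (log scale).
    return b_main + (b_val_full - b_val_partial)
```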
Controlling for continuous confounding factors: non- and semiparametric approaches
Revue d'épidémiologie et de santé publique, 2005
Confounding is one of the major types of bias encountered in observational epidemiologic surveys designed to study the relation between an exposure factor and a health event. A common way to remove confounding bias during the statistical analysis phase is to adjust for the confounders in a regression model. If a confounding factor is assessed as a continuous variable, it is necessary to define how the variable is entered into the regression model. In the case of logistic regression, we illustrate through simulation that coding by a binary variable or a categorical variable with broad categories may lead to substantial residual confounding. Specific approaches can be used to define a coding method that limits residual confounding. Among these, we briefly present nonparametric approaches and describe in detail several semiparametric approaches (generalised partial linear models, spline regression and fractional polynomials). These can be used to estimate the relation between a continu...
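A brief sketch contrasting coarse binary coding of a continuous confounder with a spline-based (semiparametric) adjustment on simulated data; the data-generating numbers are assumptions, and fractional polynomials and generalised partial linear models are not shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # patsy's bs() is available inside formulas

# Hypothetical data: age confounds the exposure-outcome association non-linearly
# and the true conditional effect of the exposure x is null.
rng = np.random.default_rng(1)
n = 5000
age = rng.uniform(20, 80, n)
x = (rng.uniform(size=n) < 1 / (1 + np.exp(-(age - 50) / 10))).astype(int)
risk = 1 / (1 + np.exp(-(-5 + 0.002 * (age - 20) ** 2)))
y = (rng.uniform(size=n) < risk).astype(int)
df = pd.DataFrame({"y": y, "x": x, "age": age})

# Coarse binary coding of the confounder (prone to residual confounding):
coarse = smf.logit("y ~ x + C(age > 50)", data=df).fit(disp=0)

# Flexible semiparametric adjustment with a B-spline basis (spline regression):
spline = smf.logit("y ~ x + bs(age, df=4)", data=df).fit(disp=0)

print(round(coarse.params["x"], 2), round(spline.params["x"], 2))
# With a null true effect, the spline-adjusted coefficient should sit closer
# to 0 than the coarsely adjusted one.
```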
Annals of Epidemiology, 2002
In cohort studies of common outcomes, odds ratios (ORs) may seriously overestimate the true effect of an exposure on the outcome of interest (as measured by the risk ratio [RR]). Since few study designs require ORs (most frequently, case-control studies), their popularity is due to the widespread use of logistic regression. Because ORs are used to approximate RRs so frequently, methods have been published in the general medical literature describing how to convert ORs to RRs; however, these methods may produce inaccurate confidence intervals (CIs). The authors explore the use of binomial regression as an alternative technique to directly estimate RRs and associated CIs in cohort studies of common outcomes. METHODS: Using actual study data, the authors describe how to perform binomial regression using the SAS System for Windows, a statistical analysis program widely used by US health researchers. RESULTS: In a sample data set, the OR for the exposure of interest overestimated the RR more than twofold. The 95% CIs for the OR and converted RR were wider than for the directly estimated RR. CONCLUSIONS: The authors argue that for cohort studies, the use of logistic regression should be sharply curtailed, and that instead, binomial regression be used to directly estimate RRs and associated CIs.
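The paper demonstrates the approach in SAS; a rough equivalent sketch in Python/statsmodels contrasts the odds ratio from logistic regression with the risk ratio from a log-binomial model (binomial family, log link) on simulated common-outcome data. The numbers are illustrative, and log-binomial convergence is not guaranteed in general.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical cohort with a common outcome (40% baseline risk, true RR = 1.5).
rng = np.random.default_rng(2)
n = 3000
x = rng.integers(0, 2, n)
risk = 0.4 * np.where(x == 1, 1.5, 1.0)
y = (rng.uniform(size=n) < risk).astype(int)
df = pd.DataFrame({"y": y, "x": x})

# Logistic regression: exp(coef) is an odds ratio, which overstates the RR
# for a common outcome.
logit_fit = smf.logit("y ~ x", data=df).fit(disp=0)

# Log-binomial regression: exp(coef) estimates the risk ratio directly.
glm_fit = smf.glm("y ~ x", data=df,
                  family=sm.families.Binomial(link=sm.families.links.Log())).fit()

print(round(np.exp(logit_fit.params["x"]), 2),  # odds ratio, roughly 2.2
      round(np.exp(glm_fit.params["x"]), 2))    # close to the true RR of 1.5
```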