Bayesian sample size determination for binary regression with a misclassified covariate and no gold standard
Related papers
Statistics in Medicine, 2009
We develop a simulation-based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model including, but not limited to, the popular logistic, probit, and complementary log-log models. We detail a common medical scenario wherein ascertainment of true disease status is impractical or otherwise impeded, and in its place the outcome of a single binary diagnostic test is used as a surrogate. These methods are then extended to the two diagnostic test setting. We illustrate the method with categorical covariates using one example that involves screening for human papillomavirus. This example, coupled with results from simulated data, highlights the utility of our Bayesian sample size procedure with error-prone measurements.
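The average power idea in the abstract above can be sketched in a few lines. This is only an illustration under strong simplifying assumptions that are not taken from the paper: a single binary covariate (two groups of equal size n), sensitivity and specificity treated as known, flat analysis priors evaluated on a grid instead of MCMC, and hypothetical Beta design priors for the two event probabilities. All function names are invented for this sketch.

```python
import math
import random


def grid_posterior(y, n, se, sp, grid):
    """Posterior weights for the true event probability p on a grid (flat prior).

    The observed positive rate is q = p*se + (1 - p)*(1 - sp), so the
    likelihood for y positives out of n is Binomial(n, q).
    """
    logliks = []
    for p in grid:
        q = p * se + (1 - p) * (1 - sp)
        logliks.append(y * math.log(q) + (n - y) * math.log(1 - q))
    top = max(logliks)
    weights = [math.exp(v - top) for v in logliks]
    total = sum(weights)
    return [w / total for w in weights]


def prob_p1_greater(y0, y1, n, se, sp, grid):
    """Grid approximation to P(p1 > p0 | data); ties on the grid count as 'not greater'."""
    post0 = grid_posterior(y0, n, se, sp, grid)
    post1 = grid_posterior(y1, n, se, sp, grid)
    below, total = 0.0, 0.0
    for w0, w1 in zip(post0, post1):
        total += w1 * below  # posterior mass of p0 strictly below this grid point
        below += w0
    return total


def average_power(n, se, sp, n_sims=200, threshold=0.95, seed=0):
    """Fraction of simulated data sets in which the correct direction is established."""
    rng = random.Random(seed)
    grid = [(i + 0.5) / 200 for i in range(200)]
    hits = 0
    for _ in range(n_sims):
        # Hypothetical design priors (assumption): means 0.1 and 0.3.
        p0 = rng.betavariate(2, 18)
        p1 = rng.betavariate(6, 14)
        ys = []
        for p in (p0, p1):
            q = p * se + (1 - p) * (1 - sp)
            ys.append(sum(rng.random() < q for _ in range(n)))
        prob = prob_p1_greater(ys[0], ys[1], n, se, sp, grid)
        # Posterior probability assigned to the direction that is actually true.
        correct = prob if p1 > p0 else 1 - prob
        hits += correct > threshold
    return hits / n_sims
```

To pick a sample size with this sketch, one would evaluate `average_power(n, se, sp)` over increasing values of n and take the smallest n whose average power reaches a chosen target such as 0.8.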
A flexible Bayesian algorithm for sample size determination in the case of misclassification
The problem of obtaining a flexible and easy to implement algorithm to derive the optimal sample size when the data are subject to misclassification is critical to practitioners. The problem is addressed from the Bayesian point of view where a special structure of the a priori parameter information is investigated. The proposed methodology is applied in specific examples.
Binary data in the presence of covariates and misclassifications: A Bayesian approach
Brazilian Journal of Probability and Statistics
In this paper we introduce a Bayesian analysis for binary data in the presence of covariates and misclassifications. As a special situation in diagnostic medical testing, we obtain Bayesian inferences for the sensitivity and the specificity in the presence of covariates. We consider a situation where individuals may be verified or unverified as to their true disease status after a test. When some or even all individuals are not verified, classical inference for the parameters of interest is usually very difficult. For this situation, the introduction of latent variables offers a good way to deal with missing data under the Bayesian approach, especially when Markov chain Monte Carlo (MCMC) methods are used to obtain the posterior summaries of interest. We illustrate the proposed methodology on three real data sets.
Journal of The Royal Statistical Society Series C-applied Statistics, 2000
We investigate the sample size problem when a binomial parameter is to be estimated, but some degree of misclassification is possible. The problem is especially challenging when the degree to which misclassification occurs is not exactly known. Motivated by a Canadian survey of the prevalence of toxoplasmosis infection in pregnant women, we examine the situation where it is desired that a marginal posterior credible interval for the prevalence of width w has coverage 1 − α, using a Bayesian sample size criterion. The degree to which the misclassification probabilities are known a priori can have a very large effect on sample size requirements, and in some cases achieving a coverage of 1 − α is impossible, even with an infinite sample size. Therefore, investigators must carefully evaluate the degree to which misclassification can occur when estimating sample size requirements.
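A rough way to see why an infinite sample may not be enough: with true prevalence π, sensitivity Se and specificity Sp, the proportion of positive test results converges to p = π·Se + (1 − π)(1 − Sp). If Se and Sp are known only to lie in ranges, the same limiting p is compatible with a whole range of π values. The sketch below is an identifiability illustration, not the Bayesian criterion used in the paper. The function names and the example ranges are assumptions, and Se + Sp > 1 is assumed throughout.

```python
def apparent_prevalence(pi, se, sp):
    """Long-run proportion of positive tests for true prevalence pi."""
    return pi * se + (1 - pi) * (1 - sp)


def implied_prevalence(p_obs, se, sp):
    """Invert the relation above for pi, clipped to [0, 1]. Requires se + sp > 1."""
    value = (p_obs + sp - 1) / (se + sp - 1)
    return min(1.0, max(0.0, value))


def limiting_range(p_obs, se_range, sp_range, steps=50):
    """Range of prevalences compatible with p_obs when Se and Sp are only
    known to lie in the given (low, high) ranges; evaluated on a grid."""
    values = []
    for i in range(steps + 1):
        se = se_range[0] + (se_range[1] - se_range[0]) * i / steps
        for j in range(steps + 1):
            sp = sp_range[0] + (sp_range[1] - sp_range[0]) * j / steps
            values.append(implied_prevalence(p_obs, se, sp))
    return min(values), max(values)
```

For example, a limiting positive rate of 0.135 with Se anywhere in (0.8, 1.0) and Sp anywhere in (0.9, 1.0) is compatible with prevalences from about 0.04 to about 0.17. An interval for the prevalence narrower than that range cannot be justified by the data alone, however large the sample.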
A flexible Bayesian algorithm for sample size calculations in misclassified data
Applied Mathematics and Computation, 2007
The problem of obtaining a flexible and easy to implement algorithm in order to derive the optimal sample size when the data are subject to misclassification is critical to practitioners. The topic is addressed from the Bayesian point of view where a special structure of the a priori parameter information is investigated. The proposed methodology is applied in specific examples.
Statistics in Medicine, 2005
Misclassification in a binary exposure variable within an unmatched prospective study may lead to a biased estimate of the disease-exposure relationship. It usually gives falsely small credible intervals because uncertainty in the recorded exposure is not taken into account. When there are several other perfectly measured covariates, interrelationships may introduce further potential for bias. Bayesian methods are proposed for analysing binary outcome studies in which an exposure variable is sometimes misclassified, but its correct values have been validated for a random subsample of the subjects. This Bayesian approach can model relationships between explanatory variables and between explanatory variables and the probabilities of misclassification. Three logistic regressions are used to relate disease to true exposure, misclassified exposure to true exposure and true exposure to other covariates. Credible intervals may be used to make decisions about whether certain parameters are unnecessary and hence whether the model can be reduced in complexity. In the disease-exposure model, for parameters representing coefficients related to perfectly measured covariates, the precision of posterior estimates is only slightly lower than would be found from data with no misclassification. For the risk factor which has misclassification, the estimates of model coefficients obtained are much less biased than those with misclassification ignored. Copyright © 2005 John Wiley & Sons, Ltd.
Statistics in Medicine, 2005
The authors of this paper present an analysis of matched data where the exposure is misclassified. Their approach is fully Bayesian, with vague priors on all parameters; as is common these are intended to be 'non-informative' in an informal way, and the authors pragmatically investigate this property by conducting a sensitivity analysis. While this is a reasonable and cautious approach, I write to suggest that it be contrasted with my own likelihood-based approach (Rice, 2003), a generalization of the standard conditional-likelihood approach for non-misclassified data. As illustrated below, I believe this random-effects formulation is simpler, more flexible, and much more computationally straightforward.
Binomial Regression with Misclassification
Biometrics, 2003
Motivated by a study of human papillomavirus infection in women, we present a Bayesian binomial regression analysis in which the response is subject to an unconstrained misclassification process. Our iterative approach provides inferences for the parameters that describe the relationships of the covariates with the response and for the misclassification probabilities. Furthermore, our approach applies to any meaningful generalized linear model, making model selection possible. Finally, it is straightforward to extend it to multinomial settings.
Cancer Epidemiology, 2013
Background: Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias. Methods: We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a "full model" that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias. Results: We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used.