Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables
Stat Med. 2012 Dec 30;31(30):4231-42.
doi: 10.1002/sim.5498. Epub 2012 Jul 16.
- PMID: 22807060
- DOI: 10.1002/sim.5498
M H P Hof et al. Stat Med. 2012.
Abstract
In record linkage studies, unique identifiers are often not available, and therefore, the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs are created, and these bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also proposed a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that, in both approaches, all wrongly linked covariate and outcome pairs need to be removed from the analysis in order to obtain unbiased regression coefficients. This removal requires strong assumptions on the structure of the data. In addition, the bias increases significantly when these assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. In logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, the analyst needs prior knowledge of the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator.
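To illustrate the kind of correction the abstract refers to, here is a minimal sketch of the basic (unextended) Lahiri and Larsen idea for linear regression. It assumes that linked record i carries the outcome of true record j with a known probability Q[i, j], so that E[y* | X] = Q X β. Under that assumption, regressing the linked outcomes y* on Z = Q X rather than on X removes the attenuation caused by false links. Several parts are illustrative assumptions and do not come from the article: the simulated data, the single correct-link probability `lam`, the uniform mismatch mechanism, and the function names `lahiri_larsen` and `simulate`. This is not the authors' extended estimator or their weighted least squares method.

```python
import numpy as np


def lahiri_larsen(X, y_star, Q):
    """Basic Lahiri-Larsen-style estimator: regress linked outcomes on Z = Q @ X.

    Q[i, j] is the (assumed known) probability that linked record i carries
    the outcome of true record j, so E[y_star | X] = Q @ X @ beta.
    """
    Z = Q @ X
    # Least-squares solve of Z beta = y_star, i.e. (Z'Z)^{-1} Z' y_star.
    beta, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    return beta


def simulate(n=2000, lam=0.7, beta=(1.0, 2.0), seed=1):
    """Hypothetical linkage: each record is linked correctly with probability
    lam; otherwise it receives the outcome of a uniformly chosen other record."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ np.asarray(beta) + rng.normal(size=n)

    idx = np.arange(n)
    # Draw from {0, ..., n-2} and shift by one to skip the record itself.
    other = rng.integers(0, n - 1, size=n)
    other = other + (other >= idx)
    correct = rng.random(n) < lam
    y_star = y[np.where(correct, idx, other)]

    # Matching-probability matrix implied by this linkage mechanism.
    Q = np.full((n, n), (1.0 - lam) / (n - 1))
    np.fill_diagonal(Q, lam)
    return X, y_star, Q


if __name__ == "__main__":
    X, y_star, Q = simulate()
    naive = np.linalg.lstsq(X, y_star, rcond=None)[0]  # ignores linkage error
    corrected = lahiri_larsen(X, y_star, Q)
    print("naive OLS:", naive)
    print("Lahiri-Larsen:", corrected)
```

In this setup the naive slope is attenuated toward roughly `lam * 2.0`, while the corrected slope should land close to the true value of 2.0. The sketch relies on Q being known and correctly specified. In practice Q has to be estimated from the linkage procedure, and that is where the structural assumptions discussed in the abstract come in.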
Copyright © 2012 John Wiley & Sons, Ltd.
Similar articles
- A mixture model for the analysis of data derived from record linkage.
  Hof MH, Zwinderman AH. Stat Med. 2015 Jan 15;34(1):74-92. doi: 10.1002/sim.6315. Epub 2014 Oct 2. PMID: 25274539
- Comparing least-squares and quantile regression approaches to analyzing median hospital charges.
  Olsen CS, Clark AE, Thomas AM, Cook LJ. Acad Emerg Med. 2012 Jul;19(7):866-75. doi: 10.1111/j.1553-2712.2012.01388.x. PMID: 22805633
- Linear mixed models for replication data to efficiently allow for covariate measurement error.
  Bartlett JW, De Stavola BL, Frost C. Stat Med. 2009 Nov 10;28(25):3158-78. doi: 10.1002/sim.3713. PMID: 19777493
- Bayesian perspectives for epidemiological research. II. Regression analysis.
  Greenland S. Int J Epidemiol. 2007 Feb;36(1):195-202. doi: 10.1093/ije/dyl289. Epub 2007 Feb 28. PMID: 17329317. Review.
- Measurement error correction using validation data: a review of methods and their applicability in case-control studies.
  Thürigen D, Spiegelman D, Blettner M, Heuer C, Brenner H. Stat Methods Med Res. 2000 Oct;9(5):447-74. doi: 10.1177/096228020000900504. PMID: 11191260. Review.
Cited by
- A MULTIPLE IMPUTATION PROCEDURE FOR RECORD LINKAGE AND CAUSAL INFERENCE TO ESTIMATE THE EFFECTS OF HOME-DELIVERED MEALS.
  Shan M, Thomas KS, Gutman R. Ann Appl Stat. 2021 Mar;15(1):412-436. doi: 10.1214/20-aoas1397. Epub 2021 Mar 18. PMID: 35755005. Free PMC article.
- Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+.
  Gutman R, Sammartino CJ, Green TC, Montague BT. Stat Med. 2016 Jan 15;35(1):115-29. doi: 10.1002/sim.6586. Epub 2015 Jul 21. PMID: 26202853. Free PMC article.
- Effect of parental and ART treatment characteristics on perinatal outcomes.
  Pontesilli M, Hof MH, Ravelli ACJ, van Altena AJ, Soufan AT, Mol BW, Kostelijk EH, Slappendel E, Consten D, Cantineau AEP, van der Westerlaken LAJ, van Inzen W, Dumoulin JCM, Ramos L, Baart EB, Broekmans FJM, Rijnders PM, Curfs MHJM, Mastenbroek S, Repping S, Roseboom TJ, Painter RC. Hum Reprod. 2021 May 17;36(6):1640-1665. doi: 10.1093/humrep/deab008. PMID: 33860303. Free PMC article.
- Evaluating bias due to data linkage error in electronic healthcare records.
  Harron K, Wade A, Gilbert R, Muller-Pebody B, Goldstein H. BMC Med Res Methodol. 2014 Mar 5;14:36. doi: 10.1186/1471-2288-14-36. PMID: 24597489. Free PMC article.
- ATLAS: an automated association test using probabilistically linked health records with application to genetic studies.
  Zhang HG, Hejblum BP, Weber GM, Palmer NP, Churchill SE, Szolovits P, Murphy SN, Liao KP, Kohane IS, Cai T. J Am Med Inform Assoc. 2021 Nov 25;28(12):2582-2592. doi: 10.1093/jamia/ocab187. PMID: 34608931. Free PMC article.