Data cleaning: detecting, diagnosing, and editing data abnormalities
Jan Van den Broeck et al. PLoS Med. 2005 Oct.
Abstract
In this Policy Forum, the authors argue that data cleaning is an essential part of the research process and should be incorporated into study design.
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures
Figure 1. A Data-Cleaning Framework
(Illustration: Giovanni Maki)
Figure 2. Areas within the Range of a Continuous Variable Defined by Hard and Soft Cutoffs for Error Screening and Diagnosis, with Recommended Diagnostic Steps for Data Points Falling in Each Area
(Illustration: Giovanni Maki)
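The Figure 2 caption describes a continuous variable's range divided into areas by hard and soft cutoffs. A minimal sketch of that kind of screening follows. It assumes that the soft cutoffs lie inside the hard cutoffs and that a value is labelled only by the area it falls in. The `Cutoffs` class, the `screen` function, the label names, and the numeric limits are all hypothetical and are not taken from the article. The diagnostic steps recommended for each area are in the figure itself and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Hypothetical screening limits for one continuous variable."""
    hard_low: float
    soft_low: float
    soft_high: float
    hard_high: float

    def __post_init__(self):
        # Assumption: soft cutoffs are nested inside hard cutoffs.
        if not (self.hard_low <= self.soft_low <= self.soft_high <= self.hard_high):
            raise ValueError("expected hard_low <= soft_low <= soft_high <= hard_high")


def screen(value, cutoffs):
    """Return the name of the area a value falls in (labels are illustrative)."""
    if value < cutoffs.hard_low or value > cutoffs.hard_high:
        return "beyond_hard"
    if value < cutoffs.soft_low or value > cutoffs.soft_high:
        return "between_soft_and_hard"
    return "within_soft"


# Made-up limits and data points, for illustration only.
limits = Cutoffs(hard_low=0.0, soft_low=10.0, soft_high=90.0, hard_high=100.0)
values = [-5.0, 5.0, 50.0, 95.0, 120.0]
labels = [screen(v, limits) for v in values]

for v, label in zip(values, labels):
    print(f"{v:>7.1f}  {label}")
```

Values that sit exactly on a cutoff are assigned to the inner area in this sketch; a real study protocol would need to state that convention explicitly.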