Methods to impute missing genotypes for population data

Hum Genet. 2007 Dec;122(5):495-504.

doi: 10.1007/s00439-007-0427-y. Epub 2007 Sep 13.

Zhaoxia Yu et al. Hum Genet. 2007 Dec.

Abstract

In large-scale genotyping studies, it is common for most subjects to be missing genotypes at some markers, even when the missing rate per marker is low. This compromises association analyses, because varying numbers of subjects contribute to single-marker and multi-marker analyses. In this paper, we consider eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE but performs better than the other methods.
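
To make the k-nearest neighbor idea concrete, here is a minimal Python sketch of unweighted KNN genotype imputation. It is an illustration only, not the authors' implementation: the abstract does not specify the distance measure, the value of k, or the tie-breaking rule, so the 0/1/2 genotype coding, the mean absolute difference distance, the majority vote, and the names `distance` and `knn_impute` are all assumptions made for this example.

```python
from collections import Counter


def distance(a, b, skip):
    """Mean absolute genotype difference over markers observed in both
    subjects, ignoring the marker being imputed (index `skip`)."""
    diffs = [abs(x - y)
             for m, (x, y) in enumerate(zip(a, b))
             if m != skip and x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def knn_impute(genotypes, k=3):
    """Return a copy of `genotypes` in which each missing value (None) is
    replaced by the majority genotype among the k nearest subjects.

    Assumed coding: 0, 1, 2 = copies of the minor allele; None = missing.
    """
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is not None:
                continue
            # Candidate donors: other subjects with marker j observed.
            donors = [(distance(row, other, j), other[j])
                      for n, other in enumerate(genotypes)
                      if n != i and other[j] is not None]
            if not donors:
                continue  # nobody to borrow from; leave the value missing
            donors.sort(key=lambda d: d[0])
            votes = Counter(value for _, value in donors[:k])
            # Majority vote; ties go to the smallest genotype code (assumed rule).
            imputed[i][j] = max(votes.items(),
                                key=lambda kv: (kv[1], -kv[0]))[0]
    return imputed


# Toy panel: 5 subjects x 4 SNPs, one missing genotype (subject 2, SNP 1).
panel = [
    [0, 0, 1, 2],
    [0, 0, 1, 2],
    [0, None, 1, 2],
    [2, 2, 1, 0],
    [2, 2, 0, 0],
]
filled = knn_impute(panel, k=2)
```

In this toy panel, the two subjects closest to the one with the missing value both carry genotype 0 at that SNP, so the vote fills in 0. A weighted variant such as wtKNN would presumably weight each neighbor's vote by its distance rather than counting neighbors equally; the abstract does not give the weighting scheme.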
