Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays
Nils Homer et al. PLoS Genet. 2008.
Abstract
We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. To give insight into the intuition behind our method, we present for a given SNP three different scenarios for the possible allele frequency of the person of interest corresponding to the genotypes AA, AB, and BB.
The allele frequencies of the reference population, the person of interest, and the mixture are denoted Popi, Yi, and Mi, respectively. The distance measure is larger (and positive) when the Yi of the person of interest is closer to the Mi of the mixture than to the Popi of the reference population. Conversely, the distance measure is smaller (and negative) when the Yi of the person of interest is closer to the Popi of the reference population than to the Mi of the mixture. Our test statistic is then the z-score computed from this distance measure.
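The caption above can be illustrated with a minimal Python sketch. It assumes the per-SNP distance takes the form D(Yi) = |Yi - Popi| - |Yi - Mi|, which matches the sign behavior described in the caption, and that the z-score is the mean of D over all SNPs divided by its standard error. The function names `distance` and `z_score` are hypothetical and chosen only for illustration.

```python
import math
import statistics


def distance(y, m, pop):
    """Per-SNP distance: positive when y is closer to the mixture (m)
    than to the reference population (pop), negative otherwise."""
    return abs(y - pop) - abs(y - m)


def z_score(ys, ms, pops):
    """Mean distance across SNPs divided by its standard error.

    Assumes at least two SNPs and a non-zero standard deviation;
    a constant distance vector would cause a division by zero.
    """
    d = [distance(y, m, p) for y, m, p in zip(ys, ms, pops)]
    mean_d = statistics.fmean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation
    return mean_d / (sd_d / math.sqrt(len(d)))


# Toy allele frequencies (made up for illustration, not real data).
ys = [1.0, 0.5, 0.0, 1.0]    # person of interest: AA, AB, BB, AA
ms = [0.8, 0.5, 0.2, 0.6]    # mixture
pops = [0.5, 0.4, 0.5, 0.3]  # reference population
z = z_score(ys, ms, pops)
```

A large positive value suggests the person's genotypes sit closer to the mixture than to the reference population across many SNPs; a value near zero or below suggests the opposite.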
Figure 2. Simulation Results.
Using 1423 Wellcome Trust 58C individuals, we give log scaled p-values for simulations based on three variables: the number of SNPs (s), the fraction of the individual in the mixture (f), and the probe variance (vp). The graphs plot the relationships between the three variables with a different variable fixed in each graph. The log scaled p-values are represented by the color of each point in the graph, as well as the z-axis on the right graphs. These simulations suggest that we should be able to resolve mixtures where a given individual is 0.1% of the mixture (f), probe variance is at most 0.01 (vp) and the number of SNPs probed is 50,000 (s).
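To make the roles of s, f, and vp concrete, here is a rough Python sketch of one simulated test. It is not the authors' simulation, which drew on real 58C genotypes. Every modeling choice here is an assumption: reference allele frequencies are drawn uniformly from 0.05 to 0.95, genotypes follow Hardy-Weinberg proportions, the mixture frequency is the reference frequency shifted by the fraction f toward the individual plus Gaussian noise with variance vp, the distance is |Yi - Popi| - |Yi - Mi|, and the p-value is one-sided under a standard normal. The name `simulate_z` is hypothetical.

```python
import math
import random
import statistics


def simulate_z(s, f, vp, in_mixture, seed=0):
    """Simulate s SNPs and return (z-score, one-sided normal p-value)."""
    rng = random.Random(seed)
    d = []
    for _ in range(s):
        # Assumed reference-population allele frequency for this SNP.
        p = rng.uniform(0.05, 0.95)
        # Genotype of the person of interest: 0, 0.5, or 1 (share of A alleles).
        y = sum(rng.random() < p for _ in range(2)) / 2
        # Noise-free mixture frequency; the person contributes fraction f if present.
        base = (1 - f) * p + f * y if in_mixture else p
        # Add probe noise with variance vp and clip to a valid frequency.
        m = min(1.0, max(0.0, base + rng.gauss(0.0, math.sqrt(vp))))
        d.append(abs(y - p) - abs(y - m))
    z = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(s))
    p_value = 1.0 - statistics.NormalDist().cdf(z)
    return z, p_value


# Same seed for both calls, so only the individual's presence differs.
z_in, p_in = simulate_z(s=5000, f=0.1, vp=1e-4, in_mixture=True, seed=1)
z_out, p_out = simulate_z(s=5000, f=0.1, vp=1e-4, in_mixture=False, seed=1)
```

Under these assumptions, raising s or f, or lowering vp, should push the z-score of a true contributor upward. For very large z-scores the p-value underflows to 0.0 in floating point, which is one reason to work with log-scaled p-values as the figure does.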
Figure 3. Experimental validation using a series of mixtures (see Methods A–F) assayed on the Affymetrix GeneChip 5.0, Illumina BeadArray 550 and the Illumina 450S Duo Human BeadChip.
The x-axis shows each individual in the CEU HapMap population, the left y-axis shows the p-value (log scaled), and the right y-axis shows the value of the test statistic. For mixtures A, B, E, and F, individuals in the mixture are colored green and those not in the mixture are colored red. For mixtures C and D, individuals not in the mixture are colored red, relatives of the individuals contributing 1% or 10% of the mixture are colored orange, relatives of the individuals contributing 90% or 99% are colored yellow, and individuals in the mixture are colored green. In all mixtures, the presence of a person's genomic DNA could be identified.
Comment in
- Public access to genome-wide data: five views on balancing research with privacy and protection.
P3G Consortium; Church G, Heeney C, Hawkins N, de Vries J, Boddington P, Kaye J, Bobrow M, Weir B. PLoS Genet. 2009 Oct;5(10):e1000665. doi: 10.1371/journal.pgen.1000665. Epub 2009 Oct 2. PMID: 19798440 Free PMC article. No abstract available.
Similar articles
- On inferring presence of an individual in a mixture: a Bayesian approach.
Clayton D. Biostatistics. 2010 Oct;11(4):661-73. doi: 10.1093/biostatistics/kxq035. Epub 2010 Jun 3. PMID: 20522729 Free PMC article.
- Single-nucleotide polymorphism genotyping using microarrays.
Patil N, Nouri N, McAllister L, Matsukaki H, Ryder T. Curr Protoc Hum Genet. 2001 May;Chapter 2:Unit 2.9. doi: 10.1002/0471142905.hg0209s27. PMID: 18428273
- Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays.
Di X, Matsuzaki H, Webster TA, Hubbell E, Liu G, Dong S, Bartell D, Huang J, Chiles R, Yang G, Shen MM, Kulp D, Kennedy GC, Mei R, Jones KW, Cawley S. Bioinformatics. 2005 May 1;21(9):1958-63. doi: 10.1093/bioinformatics/bti275. Epub 2005 Jan 18. PMID: 15657097
- [Novel approaches; improved diagnostics and therapeutics with DNA microarrays. II. Applications].
Linn SC, van de Rijn M, Giaccone G. Ned Tijdschr Geneeskd. 2003 Apr 26;147(17):800-4. PMID: 12741168 Review. Dutch.
- [Advances in high-density whole genome-wide single nucleotide polymorphism array in cancer research].
Zeng ZY, Xiong W, Zhou YH, Li XL, Li GY. Ai Zheng. 2006 Nov;25(11):1454-8. PMID: 17094921 Review. Chinese.
Cited by
- Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data.
Guccione C, Patel L, Tomofuji Y, McDonald D, Gonzalez A, Sepich-Poore GD, Sonehara K, Zakeri M, Chen Y, Dilmore AH, Damle N, Baranzini SE, Nakatsuji T, Gallo RL, Langmead B, Okada Y, Curtius K, Knight R. Res Sq [Preprint]. 2024 Oct 23:rs.3.rs-4721159. doi: 10.21203/rs.3.rs-4721159/v1. PMID: 39502785 Free PMC article. Preprint.
- Privacy-Preserving Visualization of Brain Functional Connectivity.
Tao Y, Sarwate AD, Panta S, Plis S, Calhoun VD. bioRxiv [Preprint]. 2024 Oct 15:2024.10.11.617267. doi: 10.1101/2024.10.11.617267. PMID: 39464157 Free PMC article. Preprint.
- Safeguarding Privacy in Genome Research: A Comprehensive Framework for Authors.
Ghasemian M, Gerido LH, Ayday E. bioRxiv [Preprint]. 2024 Sep 24:2024.09.20.614092. doi: 10.1101/2024.09.20.614092. PMID: 39386658 Free PMC article. Preprint.
- The goldmine of GWAS summary statistics: a systematic review of methods and tools.
Kontou PI, Bagos PG. BioData Min. 2024 Sep 5;17(1):31. doi: 10.1186/s13040-024-00385-x. PMID: 39238044 Free PMC article.
- Secure discovery of genetic relatives across large-scale and distributed genomic data sets.
Hong MM, Froelicher D, Magner R, Popic V, Berger B, Cho H. Genome Res. 2024 Oct 11;34(9):1312-1323. doi: 10.1101/gr.279057.124. PMID: 39111815 Free PMC article.