Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays

Nils Homer et al. PLoS Genet. 2008.

Abstract

We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. To give insight into the intuition behind our method, we present for a given SNP three different scenarios for the possible allele frequency of the person of interest corresponding to the genotypes AA, AB, and BB.

The allele frequencies of the mixture, the person of interest, and the reference population are denoted Mi, Yi, and Popi, respectively. We see that the distance measure is greater (and positive) when the Yi of the person of interest is closer to the Mi of the mixture than to the Popi of the reference population. Similarly, the distance measure is smaller (and negative) when the Yi of the person of interest is closer to the Popi of the reference population than to the Mi of the mixture. Our test statistic is then the z-score computed from this distance measure across SNPs.
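The caption describes the distance measure only in words. One form consistent with that description is D_i = |Yi - Popi| - |Yi - Mi|, which is positive when Yi lies closer to Mi than to Popi. The sketch below assumes that form and a one-sample z-score, mean(D) / (sd(D) / sqrt(s)), over s SNPs. The function names `distance` and `z_score` are hypothetical, and neither the formula nor the code is taken from this page.

```python
import math
import statistics


def distance(y, m, pop):
    """Assumed per-SNP distance: positive when y is closer to the
    mixture frequency m than to the reference frequency pop."""
    return abs(y - pop) - abs(y - m)


def z_score(ys, ms, pops):
    """One-sample z-score of the per-SNP distances against a mean of 0.

    ys   -- allele frequencies of the person of interest (0, 0.5, or 1)
    ms   -- allele frequencies measured in the mixture
    pops -- allele frequencies in the reference population
    """
    d = [distance(y, m, p) for y, m, p in zip(ys, ms, pops)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Tiny illustrative input (made-up numbers, four SNPs).
ys = [1.0, 0.5, 0.0, 1.0]
ms = [0.9, 0.5, 0.1, 0.8]
pops = [0.5, 0.5, 0.5, 0.5]
z = z_score(ys, ms, pops)
```

A large positive `z` would suggest the person's genotypes track the mixture more closely than the reference population. A real analysis would use tens of thousands of SNPs, not four.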

Figure 2. Simulation Results.

Using 1423 Wellcome Trust 58C individuals, we give log-scaled p-values for simulations based on three variables: the number of SNPs (s), the fraction of the individual in the mixture (f), and the probe variance (vp). Each graph plots the relationship among the three variables, with a different variable held fixed in each. The log-scaled p-values are represented by the color of each point and, in the right-hand graphs, by the z-axis. These simulations suggest that we should be able to resolve mixtures in which a given individual makes up 0.1% of the mixture (f), the probe variance is at most 0.01 (vp), and 50,000 SNPs are probed (s).
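To illustrate how s, f, and vp interact, here is a toy simulation. It is not the authors' simulation, which drew on real 58C genotypes. Everything in it is an assumption made for illustration: population frequencies are uniform on [0.05, 0.95], genotypes are drawn under Hardy-Weinberg proportions, the mixture is modeled as (1 - f) * Popi + f * Yi plus Gaussian noise with variance vp, and the distance is the |Yi - Popi| - |Yi - Mi| form suggested by the Figure 1 caption. The helper names are hypothetical.

```python
import math
import random
import statistics


def z_score(ys, ms, pops):
    # Assumed distance per SNP: |y - pop| - |y - m|, then a one-sample z-score.
    d = [abs(y - p) - abs(y - m) for y, m, p in zip(ys, ms, pops)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def draw_genotype(rng, p):
    # Two independent allele draws; returns the B-allele frequency 0, 0.5, or 1.
    return (int(rng.random() < p) + int(rng.random() < p)) / 2


def simulate(s, f, vp, seed=0):
    """Return (z for a person in the mixture, z for a person not in it).

    s  -- number of SNPs
    f  -- fraction of the mixture contributed by the person of interest
    vp -- probe variance (noise variance on the measured mixture frequency)
    """
    rng = random.Random(seed)
    pops = [rng.uniform(0.05, 0.95) for _ in range(s)]
    member = [draw_genotype(rng, p) for p in pops]
    outsider = [draw_genotype(rng, p) for p in pops]
    sigma = math.sqrt(vp)
    # Toy mixture model; measured values are not clipped to [0, 1].
    mix = [(1 - f) * p + f * y + rng.gauss(0.0, sigma)
           for p, y in zip(pops, member)]
    return z_score(member, mix, pops), z_score(outsider, mix, pops)


z_in, z_out = simulate(s=5000, f=0.1, vp=0.0001)
```

Under this model, `z_in` should come out large and positive while `z_out` stays near or below zero. Shrinking f or raising vp weakens the separation and raising s restores it, which is the trade-off the figure explores. The thresholds at which detection fails in this toy model need not match those reported in the caption.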

Figure 3. Experimental validation using a series of mixtures (see Methods A–F) assayed on the Affymetrix GeneChip 5.0, Illumina BeadArray 550 and the Illumina 450S Duo Human BeadChip.

The x-axis shows each individual in the CEU HapMap population, the left y-axis shows the p-value (log-scaled), and the right y-axis shows the value of the test statistic. For mixtures A, B, E, and F, individuals in the mixture are colored green and those not in the mixture are colored red. For mixtures C and D, individuals not in the mixture are colored red, individuals related to the 1% or 10% contributors are colored orange, individuals related to the 90% or 99% contributors are colored yellow, and individuals in the mixture are colored green. In all mixtures, the identification of the presence of a person's genomic DNA was possible.

