Inference of relationships in population data using identity-by-descent and identity-by-state

Eric L Stevens et al. PLoS Genet. 2011 Sep.

Abstract

Analyses of large, population-based datasets assume that samples are annotated accurately, whether as known relatives or as unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness using estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, IBD1, and IBD2 from the IBS observed within windows. Combined with genome-wide IBS information, it provides an intuitive and practical graphical approach that can analyze datasets of thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances, non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels as well as multiple regions of homozygosity, implying inbreeding. The combined method allowed us to distinguish related individuals from those with atypical heterozygosity rates and to determine which individuals were outliers with respect to their designated population. Distant relatedness becomes increasingly difficult to identify using genome-wide IBS methods alone; however, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model (HMM) of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
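
The abstract says IBD0, IBD1, and IBD2 are estimated from the IBS observed within windows but does not give the rule. Purely as an illustration, and not the authors' algorithm, the Python sketch below assumes genotypes coded as 0/1/2 copies of one allele, no missing calls, no genotyping error, and fixed windows of consecutive SNPs. It labels each window with the highest IBD state its IBS pattern is consistent with: any IBS0 (opposite homozygotes) rules out sharing an allele IBD, and any IBS1 rules out IBD2. The function names `ibs_state`, `window_labels`, and `estimate_k` are hypothetical.

```python
from typing import Sequence


def ibs_state(g1: int, g2: int) -> int:
    """IBS state (0, 1 or 2) of two genotypes coded as allele counts 0/1/2."""
    return 2 - abs(g1 - g2)


def window_labels(geno1: Sequence[int], geno2: Sequence[int], window: int) -> list[str]:
    """Label each block of `window` consecutive SNPs with the highest IBD state
    its IBS pattern is consistent with (hypothetical heuristic, not the paper's)."""
    if len(geno1) != len(geno2):
        raise ValueError("genotype vectors must have the same length")
    labels = []
    for start in range(0, len(geno1), window):
        states = {ibs_state(a, b)
                  for a, b in zip(geno1[start:start + window], geno2[start:start + window])}
        if 0 in states:
            labels.append("IBD0")  # opposite homozygotes: no allele shared IBD here
        elif 1 in states:
            labels.append("IBD1")  # IBS1 rules out IBD2; absence of IBS0 allows IBD1
        else:
            labels.append("IBD2")  # identical genotypes throughout the window
    return labels


def estimate_k(labels: Sequence[str]) -> tuple[float, float, float]:
    """Fractions of windows labelled IBD0, IBD1 and IBD2 (rough K0, K1, K2)."""
    if not labels:
        raise ValueError("no windows to summarize")
    return tuple(labels.count(s) / len(labels) for s in ("IBD0", "IBD1", "IBD2"))


if __name__ == "__main__":
    g1 = [0, 1, 2, 1, 1, 1, 0, 2, 2, 1, 1, 0]
    g2 = [2, 1, 2, 1, 1, 2, 0, 2, 2, 1, 1, 0]
    labels = window_labels(g1, g2, window=4)
    print(labels)              # ['IBD0', 'IBD1', 'IBD2']
    print(estimate_k(labels))  # each fraction is 1/3
```

A window with few informative SNPs can lack IBS0 by chance even when the pair shares nothing IBD there, so any practical version would need windows sized to SNP density and allele frequencies rather than a fixed SNP count.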

Conflict of interest statement

TJD is an employee and stockholder of Partek. GH is an employee of Partek.

Figures

Figure 1. Genetic relatedness plots of the Human Variation Panel genotype data.

Abbreviations: AA, African American; CAU, Caucasian; CHI, Chinese; MEX, Mexican. (A) IBS2* plot of all within-group comparisons (n = 19,800). For unrelated individuals within a population, IBS2*_ratio values are centered on 2/3. The comparisons between NA17251 and the 99 other AA individuals are indicated (arrow). A group of 9 MEX individuals with atypically low heterozygosity rates forms a cluster separated from the other within-MEX comparisons (arrow 1). (B) IBS2* plot after removal of the pairwise comparisons with IBS2*_ratio values >0.8 (n = 13 removed); data points are colored by the sum of the autosomal heterozygosity of the two individuals in each pair. (C) IBS2* plot of between-group comparisons (n = 60,000), none of which are expected to involve genetically related individuals. For pairs of groups whose individuals differ greatly in heterozygosity rate, such as AA-CHI comparisons, IBS2*_ratio values are significantly lower than 2/3. The MEX individuals with atypical heterozygosity rates tend to form outlier clusters in between-group comparisons such as AA-MEX (arrow 1) and CHI-MEX (arrow 2). A group of five pairwise comparisons with relatively high IBS2*_ratio values (0.685 to 0.692; arrow 3) all compare MEX individual NA17709 with CAU individuals.
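
The caption does not define IBS2* or IBS2*_ratio. One definition consistent with the 2/3 centering, assumed here, counts IBS2* as the SNPs at which both individuals are heterozygous and sets IBS2*_ratio = IBS2* / (IBS0 + IBS2*). For two unrelated individuals from one population in Hardy-Weinberg equilibrium with allele frequency p (q = 1 - p), the expected per-SNP rates are 4p²q² for IBS2* and 2p²q² for IBS0, giving 4/6 = 2/3 for any p. If the two individuals come from populations with frequencies p1 and p2, IBS2* becomes 4·p1·q1·p2·q2 and IBS0 becomes p1²q2² + q1²p2², which is never less than 2·p1·q1·p2·q2, so the ratio can only fall below 2/3; under these assumptions, allele-frequency divergence is one plausible contributor to the low between-group values in panel C. The sketch below (hypothetical function names) computes the observed ratio from genotypes coded 0/1/2, the expected ratio from allele frequencies, and the comparison counts quoted for panels A and C, assuming four groups of 100 individuals.

```python
from math import comb
from typing import Sequence


def ibs2star_ratio(geno1: Sequence[int], geno2: Sequence[int]) -> float:
    """Observed IBS2*/(IBS0 + IBS2*) for genotypes coded 0/1/2 (assumed definition)."""
    ibs2_star = sum(1 for a, b in zip(geno1, geno2) if a == 1 and b == 1)
    ibs0 = sum(1 for a, b in zip(geno1, geno2) if abs(a - b) == 2)
    if ibs2_star + ibs0 == 0:
        raise ValueError("no IBS0 or IBS2* sites")
    return ibs2_star / (ibs2_star + ibs0)


def expected_ratio(freqs1: Sequence[float], freqs2: Sequence[float]) -> float:
    """Ratio of summed expected counts for two unrelated individuals drawn from
    populations with per-SNP allele frequencies freqs1 and freqs2 (HWE assumed)."""
    both_het = opposite_hom = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        q1, q2 = 1 - p1, 1 - p2
        both_het += (2 * p1 * q1) * (2 * p2 * q2)        # IBS2* (both heterozygous)
        opposite_hom += p1**2 * q2**2 + q1**2 * p2**2    # IBS0 (opposite homozygotes)
    if both_het + opposite_hom == 0:
        raise ValueError("no polymorphic SNPs")
    return both_het / (both_het + opposite_hom)


def comparison_counts(groups: int = 4, size: int = 100) -> tuple[int, int]:
    """Numbers of within-group and between-group pairwise comparisons."""
    within = groups * comb(size, 2)
    between = comb(groups, 2) * size * size
    return within, between


if __name__ == "__main__":
    print(comparison_counts())                                 # (19800, 60000)
    print(expected_ratio([0.1, 0.3, 0.5], [0.1, 0.3, 0.5]))    # 2/3 up to rounding
    print(expected_ratio([0.2], [0.8]))                        # about 0.199
```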

Figure 2. Visualization of shared chromosomal regions based on IBS for related individuals.

IBS values are shown along a representative chromosome for the following pairs: (A) replicate samples NA17255/NA17263, (B) parent-child NA17624/NA17626, (C) full siblings NA17671/NA17674, (D) individuals sharing one quarter of their alleles IBD (NA17655/NA17656; e.g., half-siblings), (E) distantly related individuals NA17673/NA17680. Analyses were performed with SNPduo software. No SNP data were available for pericentromeric regions or the short arms of acrocentric chromosomes (as in panel C), so no IBS values are shown there.
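
One pattern that per-SNP IBS plots like these can reveal is where IBS0 (opposite homozygotes) occurs along a chromosome. Barring genotyping error, a pair sharing at least one allele IBD cannot be IBS0 at a SNP, so long stretches without IBS0 mark candidate shared segments, such as the megabase-scale regions mentioned in the abstract. The Python sketch below is a minimal illustration, not SNPduo's implementation: it assumes positions sorted in base pairs, genotypes coded 0/1/2, no missing calls and no tolerance for genotyping errors; the function name `runs_without_ibs0` and the 1 Mb default are hypothetical choices.

```python
from typing import Optional, Sequence


def runs_without_ibs0(positions: Sequence[int], geno1: Sequence[int],
                      geno2: Sequence[int], min_bp: int = 1_000_000) -> list[tuple[int, int]]:
    """Return (first_bp, last_bp) of maximal runs of consecutive SNPs with no IBS0
    that span at least min_bp; positions must be sorted in ascending order."""
    runs: list[tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None
    for pos, a, b in zip(positions, geno1, geno2):
        if abs(a - b) == 2:  # IBS0: opposite homozygotes end the current run
            if start is not None and last - start >= min_bp:
                runs.append((start, last))
            start = None
        else:  # IBS1 or IBS2: open a new run or extend the current one
            if start is None:
                start = pos
            last = pos
    if start is not None and last - start >= min_bp:  # close a run reaching the end
        runs.append((start, last))
    return runs


if __name__ == "__main__":
    pos = [0, 500_000, 1_200_000, 1_500_000, 2_000_000, 4_000_000]
    g1 = [0, 1, 1, 2, 0, 1]
    g2 = [0, 2, 1, 0, 1, 1]
    print(runs_without_ibs0(pos, g1, g2))  # [(0, 1200000), (2000000, 4000000)]
```

Runs are bounded by their first and last SNPs, so they slightly underestimate the true segment; a single genotyping error that produces a spurious IBS0 would split a real segment, which is why practical tools usually tolerate a few discordant calls.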

Figure 3. Relationship of IBS2* values to IBD estimates for recently related within-group comparisons.

IBS2* plots for recently related within-group comparisons with IBS2*_ratio values >0.80. The y-axis shows IBD1 and IBD2 estimates from our approach (K1, panel A; K2, panel B), from PLINK's HMM after removal of individuals with low genotyping rates (Z1, panel C; Z2, panel D), and from PLINK's HMM with the same quality-control metrics except that no individuals were removed (Z1, panel E; Z2, panel F). The x-axis and y-axis scales are the same for panels A–F. Arrows and brackets indicate groups of pairwise comparisons, each representing one relationship type (see text for details). Colors correspond to ethnic group and are matched across all panels.
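
To see why recent relatives separate from the 2/3 baseline, one can compute an expected IBS2*_ratio for a pair with IBD probabilities (k0, k1, k2). This derivation is ours, not the paper's, and assumes: IBS2*_ratio = IBS2*/(IBS0 + IBS2*) with IBS2* counting SNPs where both individuals are heterozygous; biallelic SNPs in Hardy-Weinberg equilibrium; no genotyping error; and a genome-wide value taken as the ratio of summed expected counts. At a SNP with allele frequency p (q = 1 - p), both individuals are heterozygous with probability 4p²q² given IBD0, pq given IBD1, and 2pq given IBD2, while IBS0 is possible only under IBD0, with probability 2p²q². The function name is hypothetical.

```python
from typing import Sequence


def expected_ibs2star_ratio(k0: float, k1: float, k2: float,
                            freqs: Sequence[float]) -> float:
    """Expected IBS2*/(IBS0 + IBS2*) for a pair with IBD probabilities k0, k1, k2,
    using summed expected counts over SNPs with the given allele frequencies."""
    if abs(k0 + k1 + k2 - 1) > 1e-9:
        raise ValueError("k0 + k1 + k2 must equal 1")
    both_het = ibs0 = 0.0
    for p in freqs:
        q = 1 - p
        both_het += k0 * 4 * p**2 * q**2 + k1 * p * q + k2 * 2 * p * q
        ibs0 += k0 * 2 * p**2 * q**2  # sharing any allele IBD rules out IBS0
    return both_het / (both_het + ibs0)


if __name__ == "__main__":
    freqs = [0.5]  # a single SNP with p = 0.5 keeps the arithmetic transparent
    pairs = {"unrelated": (1, 0, 0), "first cousins": (0.75, 0.25, 0),
             "half-siblings": (0.5, 0.5, 0), "full siblings": (0.25, 0.5, 0.25),
             "parent-child": (0, 1, 0)}
    for name, k in pairs.items():
        print(f"{name:>14}: {expected_ibs2star_ratio(*k, freqs):.3f}")
```

With p = 0.5 this prints 0.667 (unrelated), 0.727 (first cousins), 0.800 (half-siblings), 0.909 (full siblings), and 1.000 (parent-child). The exact values depend on the allele-frequency spectrum of the SNP panel, so they illustrate the ordering of relationship types rather than reproduce the thresholds used in the figures.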

Figure 4. Relationship of IBS2* values to IBD estimates for distantly related within-group comparisons.

IBS2* plots of within-group comparisons with IBS2*_ratio values <0.76. The y-axis shows IBD1 and IBD2 estimates from our approach (panels A, B) and from PLINK's HMM after removal of individuals with low genotyping rates (panels C, D). The x-axis scales are the same for all panels, and the y-axis scales are comparable between panels A and C and between panels B and D. Arrows indicate pairwise comparisons: arrow 1, NA17673/NA17680 (MEX; see also arrow 1 in Figure 1A, 1B); arrow 2, NA17454/NA17459 (MEX; see also arrow 1 in Figure 1A, 1B); arrow 3, NA17203/NA17257 (CAU); arrow 4, NA17289/NA17299 (CAU); arrow 5, NA17785/NA17794 (CHI).
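
The comparisons in this figure involve distant relatives, for whom expected genome-wide sharing is small. For full cousins of degree n without inbreeding, the standard expectation is k2 = 0 and k1 = (1/4)^n, so each additional degree cuts expected IBD1 fourfold, and the expected proportion of alleles shared IBD is k1/2. The sketch below tabulates this with exact fractions; the function names are hypothetical, and these are expectations only, since actual sharing between distant relatives varies around them because of recombination.

```python
from fractions import Fraction


def cousin_k1(degree: int) -> Fraction:
    """Expected IBD1 probability for full cousins of the given degree (k2 = 0)."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return Fraction(1, 4 ** degree)


def cousin_relatedness(degree: int) -> Fraction:
    """Expected proportion of alleles shared IBD, k1/2 + k2 with k2 = 0."""
    return cousin_k1(degree) / 2


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"degree {n} cousins: k1 = {cousin_k1(n)}, r = {cousin_relatedness(n)}")
```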

Figure 5. Validation of IBS2* methodology using annotated relationships from a known pedigree.

IBS2* plot analysis of real data from a large pedigree whose pairs are annotated by distance (i.e., proportion of IBD). The y-axis shows IBD1 and IBD2 estimates from our approach (panels A, C) and from PLINK's HMM (panels B, D). These distances correspond to Cotterman coefficients of relatedness. Note the linear relationship of K1 and Z1 with IBS2*_ratio values in panels A and C.
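
For reference, the sketch below lists the standard Cotterman coefficients (k0, k1, k2), the probabilities that a pair shares zero, one, or two alleles IBD, for common relationships without inbreeding, and derives the expected proportion of alleles shared IBD as r = k1/2 + k2 (twice the kinship coefficient for non-inbred pairs). Reading the caption's "proportion of IBD" as r is an assumption. The table also shows why IBD2 estimates matter: parent-child pairs and full siblings have the same r = 1/2 but differ in k2 (0 versus 1/4). The dictionary and function names are illustrative.

```python
# Cotterman coefficients (k0, k1, k2) for common relationships without inbreeding.
COTTERMAN = {
    "duplicate / monozygotic twin": (0.0, 0.0, 1.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half-siblings, grandparent-grandchild, avuncular": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def relatedness(k: tuple[float, float, float]) -> float:
    """Expected proportion of alleles shared IBD, r = k1/2 + k2."""
    k0, k1, k2 = k
    if abs(k0 + k1 + k2 - 1) > 1e-9:
        raise ValueError("Cotterman coefficients must sum to 1")
    return k1 / 2 + k2


if __name__ == "__main__":
    for name, k in COTTERMAN.items():
        print(f"{name:<50} k = {k}  r = {relatedness(k):.4f}")
```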

References

    1. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008;118:1590–1605.
    1. Bishop DT, Williamson JA. The power of identity-by-state methods for linkage analysis. Am J Hum Genet. 1990;46:254–265.
    1. Lee W. Testing the genetic relation between two individuals using a panel of frequency-unknown single nucleotide polymorphisms. Ann Hum Genet. 2003:618–619.
    1. Rosenberg NA. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet. 2006;70:841–847.
    1. Cotterman C. A calculus for statistico-genetics. Ohio State University; 1940.
