Linkage analysis in the presence of errors I: complex-valued recombination fractions and complex phenotypes

H H Göring et al. Am J Hum Genet. 2000 Mar.

Erratum in

Am J Hum Genet 2000 Apr;66(4):1472

Abstract

Linkage is a phenomenon that correlates the genotypes of loci, rather than the phenotypes of one locus to the genotypes of another. It is therefore necessary to convert the observed trait phenotypes into trait-locus genotypes, which can then be analyzed for coinheritance with marker-locus genotypes. However, if the mode of inheritance of the trait is not known accurately, this conversion can often result in errors in the inferred trait-locus genotypes, which, in turn, can lead to the misclassification of the recombination status of meioses. As a result, the recombination fraction can be overestimated in two-point analysis, and false exclusions of the true trait locus can occur in multipoint analysis. We propose a method that increases the robustness of multipoint analysis to errors in the mode of inheritance assumptions of the trait, by explicitly allowing for misclassification of trait-locus genotypes. To this end, the definition of the recombination fraction is extended to the complex plane, as Θ=θ+εi; θ is the recombination fraction between actual ("real") genotypes of marker and trait loci, and ε is the probability of apparent but false ("imaginary") recombinations between the actual and inferred trait-locus genotypes. "Complex" multipoint LOD scores are proven to be stochastically equivalent to conventional two-point LOD scores. The greater robustness to modeling errors normally associated with two-point analysis can thus be extended to multiple two-point analysis and multipoint analysis. The use of complex-valued recombination fractions also allows the stochastic equivalence of "model-based" and "model-free" methods to be extended to multipoint analysis.

Figures

Figure 1

Possible explanations for observed recombination events. Only the relevant alleles are shown in the pedigree drawing. Gametes that correspond to the four offspring possibilities are shown below the pedigree (R and N refer to recombinants and nonrecombinants, respectively). I shows recombination between nonsyntenic loci due to random assortment of chromosomes, independent of their grandparental origin. II depicts crossing over, a mechanism by which homologous chromosomes exchange genetic material, which leads to the existence of recombinants between syntenic loci. III shows that an observed recombination event can also be a mere artifact when errors in assumed genotypes exist.

Figure 2

Probability model for misclassification of recombination status. The observed recombination status of a meiosis may be misclassified because of errors in the assigned genotypes at the disease locus. Notice that true recombinants are mistaken for apparent nonrecombinants and true nonrecombinants for apparent recombinants with equal probability, ε, as explained in the text. The probability of an observed recombination is P(R obs)=θ(1-ε)+(1-θ)ε=θ+ε-2θε, which exceeds θ, so that the estimate of the recombination fraction is biased upward, if θ<0.5 and ε>0.
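The upward bias described in the figure 2 caption can be checked numerically. The following is a minimal sketch, not code from the paper: the function name `observed_recombination_prob` is hypothetical, and the only thing it implements is the caption's formula P(R obs)=θ(1-ε)+(1-θ)ε.

```python
def observed_recombination_prob(theta: float, epsilon: float) -> float:
    """Probability that a meiosis is scored as recombinant.

    theta   -- true recombination fraction between marker and trait locus
    epsilon -- probability that the recombination status is misclassified
    Both arguments are treated as probabilities (an assumption of this sketch).
    """
    if not 0.0 <= theta <= 1.0 or not 0.0 <= epsilon <= 1.0:
        raise ValueError("theta and epsilon must lie in [0, 1]")
    # True recombinant scored correctly, or true nonrecombinant scored wrongly.
    return theta * (1.0 - epsilon) + (1.0 - theta) * epsilon


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.3, 0.5):
        print(theta, observed_recombination_prob(theta, epsilon=0.05))
```

Because θ+ε-2θε-θ=ε(1-2θ), the observed value exceeds θ exactly when ε>0 and θ<0.5, and equals θ at θ=0.5.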

Figure 3

Complex recombination fraction. The recombination fraction, Θ=θ+εi, is modeled in the complex number system. Its real-valued component θ represents the true probability of recombination between the disease locus (D) and the marker locus (M). The common upward bias in the estimated recombination fraction, due to errors in inferred trait-locus genotypes, is modeled as an imaginary component of the recombination fraction, εi. The magnitude of the imaginary component corresponds to the apparent recombination frequency between the actual and the assumed alleles of the trait locus. Note that the frequency of an observed recombination is given by [formula image].

Figure 4

Example pedigree for likelihood computation that uses complex recombination fractions. The trait-locus genotypes are shown as inferred for a fully penetrant dominant disease. See text for the likelihood computation on this pedigree.

Figure 5

Misclassification of recombination status in the presence of errors in assumed meiotic informativeness. This figure shows the relationship of the error component, ε, to misclassification of the recombination status and of meiotic informativeness. Note that the meiotic informativeness refers to the parental trait-locus genotype, whereas the observed recombination status refers to an offspring, in whom trait-locus genotype errors may also have occurred. The presented error model is still valid, with ε=0.5ψ+ω(1-ψ).
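The composite error term in the figure 5 caption can be evaluated directly. The sketch below is illustrative only: the caption does not define ψ and ω, so the code treats them simply as probabilities in [0, 1], and the function names `error_component` and `observed_recombination_prob` are hypothetical.

```python
def error_component(psi: float, omega: float) -> float:
    """Evaluate the caption's formula: epsilon = 0.5*psi + omega*(1 - psi)."""
    if not 0.0 <= psi <= 1.0 or not 0.0 <= omega <= 1.0:
        raise ValueError("psi and omega must lie in [0, 1]")
    return 0.5 * psi + omega * (1.0 - psi)


def observed_recombination_prob(theta: float, epsilon: float) -> float:
    """Figure 2 formula: P(R obs) = theta*(1 - epsilon) + (1 - theta)*epsilon."""
    return theta * (1.0 - epsilon) + (1.0 - theta) * epsilon


if __name__ == "__main__":
    # Sweep psi for one arbitrary, illustrative value of omega.
    for psi in (0.0, 0.5, 1.0):
        eps = error_component(psi, omega=0.1)
        print(psi, eps, observed_recombination_prob(0.1, eps))
```

Two limiting cases follow from the formula alone: ψ=0 gives ε=ω, and ψ=1 gives ε=0.5, at which point the observed recombination probability is 0.5 for any θ.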

Figure 6

Taxicab geometry. In “taxicab geometry,” only horizontal and vertical movement on a grid is allowed, just like the movement of a taxicab on a regular grid of city streets. The taxicab distance between two points (A and B) is therefore the sum of the horizontal and vertical distances between them, in this case 2+1=3. In contrast, in Euclidean geometry, where diagonal movement (“as the crow flies”) is also allowed, the distance between these two points is √(2²+1²)=√5. A taxicab circle (with radius 2), consisting of points equidistant from its center (C), is also shown and looks like a square in Euclidean space.
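The two distance measures in the figure 6 caption can be compared with a short sketch. The coordinates below are assumptions chosen to match the caption (points 2 units apart horizontally and 1 unit apart vertically, and a circle of radius 2); they are not taken from the figure itself.

```python
import math

Point = tuple[float, float]


def taxicab_distance(a: Point, b: Point) -> float:
    """Sum of the horizontal and vertical separations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean_distance(a: Point, b: Point) -> float:
    """Straight-line ("as the crow flies") separation."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    A, B = (0.0, 0.0), (2.0, 1.0)  # assumed coordinates
    print(taxicab_distance(A, B), euclidean_distance(A, B))

    # Points at taxicab distance 2 from a center C lie on a square
    # standing on one corner when drawn in Euclidean space.
    C = (0.0, 0.0)
    for p in [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0), (-1.0, 1.0), (-2.0, 0.0)]:
        print(p, taxicab_distance(p, C))
```

The listed points all have taxicab distance 2 from C but different Euclidean distances, which is why the taxicab circle appears as a square.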

Figure 7

Complex map distance. Since recombination fractions are not additive, it is useful to convert them into additive measures of genetic map distance. Figure 7 is the analog in complex map-distance space of figure 3 in complex recombination fraction space. For a given value of the total recombination probability, [formula image], under the assumption of ε⩾0 and no directional orientation of θ relative to other marker loci, two edges of a right isosceles triangle are defined by the set [formula image].

Figure 8

Two-point and multiple two-point linkage analysis. In two-point linkage analysis (A), errors in the recombination fractions between each marker locus (M1, M2, M3) and the disease (D) are implicitly allowed through inflation of the recombination fraction estimate, but this error component is not estimated separately. Since the recombination fractions are typically overestimated, the triangles (corresponding to a specific map-distance estimate from each marker locus) are shown to extend beyond the true position of the trait locus. Not all triangles intersect at the same position, because the estimate of ε is not constrained to be equal for the different marker loci. In multiple two-point analysis without an error component (B), all triangles are forced to intersect on the chromosome at the assumed position of the disease gene, because of linearity constraints. When the error component is incorporated into this type of analysis (C), the triangles are still forced to intersect at the position of the trait locus, not necessarily on the chromosome but rather at some distance, x(ε), above it in the error dimension.

Figure 9

Multiple two-point and multipoint linkage analysis with complex recombination fractions. In multiple two-point linkage analysis without an error component (A), the recombination fractions between each marker locus and the disease locus are constrained by the map. An error component, ε, can be incorporated into multiple two-point analysis, as shown in (B). The real recombination fractions are again constrained, and only the error component (equal for all marker loci) is estimated. The last two panels show multipoint analysis without (C) and with (D) the error component. (Note that θ23 is used in multipoint analysis instead of θD3 in multiple two-point analysis.) The information provided by all marker loci is collapsed into a fictitious, highly informative marker locus at the assumed position of the trait locus, and linkage analysis is then performed between this hypothetical marker locus and the assigned trait-locus genotypes. In conventional multipoint analysis (C), the recombination fraction between this fictitious marker locus and the trait locus is fixed at 0, which is no longer the case when an error component is incorporated into the model (D), as proposed.

Figure 10

Effect of the misclassification parameter ε on the expected 3-LOD-unit support interval for the disease locus. The effect of allowing for nonzero values of ε in the analysis is shown graphically, plotted as the ratio of the mean width of the 3-LOD-unit support interval (S.I.) for the disease locus for given values of [formula image] to the expected width of the interval for the traditional multipoint LOD score in the absence of trait-locus genotype assignment errors. Allowing for misclassifications greatly increases the width of the interval to which the disease gene can be localized. See Terwilliger for more details.

Cited by

Genome-wide analysis of high-risk primary brain cancer pedigrees identifies PDXDC1 as a candidate brain cancer predisposition gene.
Cannon-Albright LA, Farnham JM, Stevens J, Teerlink CC, Palmer CA, Rowe K, Cessna MH, Blumenthal DT. Neuro Oncol. 2021 Feb 25;23(2):277-283. doi: 10.1093/neuonc/noaa161. PMID: 32644145.
Evidence for pelvic organ prolapse predisposition genes on chromosomes 10 and 17.
Allen-Brady K, Cannon-Albright LA, Farnham JM, Norton PA. Am J Obstet Gynecol. 2015 Jun;212(6):771.e1-7. doi: 10.1016/j.ajog.2014.12.037. Epub 2014 Dec 31. PMID: 25557205.
On the validity of the likelihood ratio test and consistency of resulting parameter estimates in joint linkage and linkage disequilibrium analysis under improperly specified parametric models.
Hiekkalinna T, Göring HH, Terwilliger JD. Ann Hum Genet. 2012 Jan;76(1):63-73. doi: 10.1111/j.1469-1809.2011.00683.x. Epub 2011 Nov 14. PMID: 22082140.
PSEUDOMARKER: a powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals.
Hiekkalinna T, Schäffer AA, Lambert B, Norrgrann P, Göring HH, Terwilliger JD. Hum Hered. 2011;71(4):256-66. doi: 10.1159/000329467. Epub 2011 Jul 28. PMID: 21811076.
Family-based designs for genome-wide association studies.
Ott J, Kamatani Y, Lathrop M. Nat Rev Genet. 2011 Jun 1;12(7):465-74. doi: 10.1038/nrg2989. PMID: 21629274.

References

Almasy L, Blangero J (1998) Multipoint quantitative linkage analysis in general pedigrees. Am J Hum Genet 62:1198–1211
Buskes G, van Rooij A (1997) Topological spaces: from distance to neighborhood. Springer Verlag, New York
Clerget-Darpoux F, Bonaïti-Peilié C, Hochez J (1986) Effects of misspecifying genetic parameters in LOD score analysis. Biometrics 42:393–399
Dupuis J, Brown PO, Siegmund D (1995) Statistical methods for linkage analysis of complex traits from high-resolution maps of identity by descent. Genetics 140:843–856
Göring HHH, Terwilliger JDT (2000a) Linkage analysis in the presence of errors II: marker-locus genotyping errors modeled with hypercomplex recombination fractions. Am J Hum Genet 66:1107–1118 (in this issue)
