Multipoint Mapping of Viability and Segregation Distorting Loci Using Molecular Markers

Journal Article

Department of Biology, University of Oulu, FIN-90401 Oulu, Finland

Department of Botany and Plant Sciences, University of California, Riverside, California 92521

Corresponding author: Claus Vogl, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521. E-mail: claus@genetics.ucr.edu

Department of Botany and Plant Sciences, University of California, Riverside, California 92521


Abstract

In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm, and the Bayesian mapping is performed via Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models, such as multiple SDL and a variable number of SDL. Both methods are applied to a set of simulated data and to real data from a cross of two Scots pine trees.

CHROMOSOMAL regions that cause distorted segregation ratios in early life stages may be referred to as segregation-distorting loci (SDL). These distortions are caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences of SDL genotypes after fertilization but before genotype scoring. In both cases, the observable phenotype is a distortion of marker locus genotypes in chromosomal regions close to the SDL. Hence, regardless of the timing of action of the SDL, mapping of locations and estimation of effects of SDL follow the same statistical treatment.

Let us first discuss mechanisms that cause distorted segregation ratios by altering gametic proportions. With meiotic drive, gametic proportions become distorted during meiosis because one chromosome type preferentially ends up in the egg nucleus. Meiotic drive is known, e.g., for maize chromosome 10, where a variant carrying a heterochromatic knob is preferentially transmitted (reviewed in Grant 1975). Alternatively, gametes carrying a certain allele may act to render gametes carrying the homologous chromosome dysfunctional, e.g., the segregation distorter (SD) and sex ratio (SR) loci of Drosophila and the t-alleles of mice (e.g., Hartl and Clark 1997, p. 244ff). Meiotic drive can be a powerful selective force: the t-alleles are maintained in the population, even though they are homozygous lethal, because heterozygotes pass them to the next generation with a probability of 0.95. In many interspecific hybridizations, outbreeding depression and segregation distortion have been observed in the F2 generation. These are often caused by structural differences between chromosomes (Whitkus 1998), i.e., by events before fertilization.

Haploid life stages can be exposed to selection, especially in plants. In the life cycle of mosses, the haploid life stage (the gametophyte) is dominant over the diploid life stage (the sporophyte). Among vascular plants, gametophytic mutations in maize indicate that pollen tube growth rates are determined in part by the genotypes of the microgametophytes (reviewed in Grant 1975).

Viability selection after fertilization may be more important than gametic selection. Viability selection is common in consanguineous matings, where inbreeding depression reduces the survival of homozygotes compared to heterozygotes (Charlesworth and Charlesworth 1987). Viability selection gives rise to segregation ratios distorted from 1:2:1 at linked loci. Inbreeding depression is often expressed in very early life stages (Husband and Schemske 1996). In Scots pine, only ~15% of self-fertilized embryos develop into mature seeds, whereas ~75% do so in wind-pollinated seeds (Kärkkäinen et al. 1996). Some aspects of the genetic basis of inbreeding depression require further investigation, e.g., the number and effects of loci and the degree of dominance. Yet these factors have major consequences for mating system evolution (Charlesworth and Charlesworth 1998), conservation genetics (Hedrick 1994), and plant breeding (e.g., Williams and Savolainen 1996). A biased segregation ratio due to viability differences among genotypes also occurs in the F2 generation of wide crosses. This is generally thought to be caused by epistatic interactions.

Often events before fertilization cannot be distinguished from events after fertilization. McGoldrick and Hedgecock (1997) reported that crosses of Crassostrea gigas, the Pacific oyster, produced biased segregation ratios when the progeny were tested as adults. Launey and Hedgecock (1999) later showed that, in the same crosses, the ratios at many loci were Mendelian when 6-hr-old larvae were assessed but deviated from Mendelian ratios when the animals were 2 to 3 mo old. Hence, the deviations are due to postfertilization viability selection.

Quantitative trait loci (QTL) are usually mapped in agronomically important plants and animals. To increase differences of parental types, and thus to increase the power of mapping, crosses are often conducted between inbred lines or between distantly related cultivars or even between species. As discussed above, these conditions promote segregation distortion.

For molecular characterization of the genetic causes of distorted segregation ratios, mapping of the location and effects of SDL would be desirable. Because the phenotype in SDL mapping differs from that in QTL mapping (data in SDL mapping usually consist of frequencies of genotypes among survivors), QTL methods cannot be used for SDL mapping. Development of advanced methods for estimating the locations and effects of SDL has lagged behind that for QTL mapping. In the past, a single marker was often considered at a time, and only the linkage between one fully informative marker and a single SDL was tested (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). In a single-marker test, the number of distinguishable genotypic configurations of the marker is at best equal to the number of genotypic configurations of a linked SDL, but the genotypic frequencies of the marker are affected by the recombination fraction in addition to the frequencies of the SDL's genotypic configurations. Hence, in a single-marker test, the estimates of position and effect are confounded.
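The confounding can be made concrete with a small calculation. The following minimal Python sketch is our own illustration, not a method from the original analyses. It assumes a backcross, one fully informative marker at recombination fraction r from a single SDL, and π = probability of the homozygous genotype at the SDL. The expected homozygote frequency at the marker is then π(1 − r) + (1 − π)r, so different (π, r) pairs produce identical marker data. The function names are hypothetical.

```python
def marker_homozygote_freq(pi, r):
    """Expected frequency of the homozygous class at one marker.

    The marker is homozygous if the SDL is homozygous and no recombination
    occurred, or if the SDL is heterozygous and a recombination occurred.
    """
    return pi * (1.0 - r) + (1.0 - pi) * r


def pi_matching_freq(freq, r):
    """SDL effect pi that reproduces a marker frequency at recombination fraction r < 0.5."""
    return (freq - r) / (1.0 - 2.0 * r)


# Two different (effect, distance) combinations with the same marker frequency:
f_tight = marker_homozygote_freq(2.0 / 3.0, 0.0)   # SDL located at the marker
pi_loose = pi_matching_freq(f_tight, 0.1)          # stronger SDL at r = 0.1
f_loose = marker_homozygote_freq(pi_loose, 0.1)
print(round(f_tight, 4), round(pi_loose, 4), round(f_loose, 4))  # 0.6667 0.7083 0.6667
```

A single marker frequency therefore determines only one combination of effect and distance. Several linked markers are needed to separate the two.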

Errors in marker genotyping may also cause systematic deviations from the expected segregation ratio. For example, randomly amplified polymorphic DNA (RAPD) markers are often misscored because a faint band may be interpreted as absent. Such errors usually affect only a single marker. In contrast, if segregation distortion is caused by an SDL, all markers in the vicinity of the SDL will be affected.

Fu and Ritland (1994b), Mitchell-Olds (1995), and Cheng et al. (1996) have developed maximum-likelihood methods for mapping one SDL using flanking markers, i.e., an interval mapping strategy (Lander and Botstein 1989). Given a map of fully informative markers, no missing data, no interference between recombinations, and no more than one SDL per chromosome, this theory can be used to scan the genome for SDL. Under these assumptions, loci outside the interval flanking the SDL contribute no information to the segregation of the SDL. But more than one SDL per chromosome may be present and markers may be only partially informative. Furthermore, due to the effects of SDL, estimation of map distances of markers might become biased (Lorieux et al. 1995a,b; Liu 1998). This might cause the interval mapping method to become inefficient and biased.

The SDL analysis is based on binomial (or multinomial) distributions instead of normal distributions, and hence multiple regression is not readily available and cannot be combined with conventional interval mapping as in the composite interval mapping (CIM; Zeng 1994) or the multiple QTL mapping (MQM) scheme (Jansen and Stam 1994). Therefore, multiple SDL on a single chromosome pose an unsolved theoretical problem. On the other hand, if maps are inferred correctly and if SDL on different chromosomes do not interact epistatically, i.e., SDL effects combine multiplicatively, linkage to an SDL is solely responsible for the phenotype. SDL analysis of one chromosome is therefore usually independent from other chromosomes.

We present a multipoint method for mapping multiple SDL using a backcross design. The multipoint method is developed under both the maximum-likelihood and the Bayesian frameworks.

THEORY

Model: We develop and present the model under a backcross design only, although the method can be applied to other controlled mating designs as well. We assume that the parents that initiate the cross are pure inbred lines. The F1 of the cross is backcrossed to one of the parents and a total of N individuals are generated in the backcross (BC) family for mapping. We are interested in mapping loci responsible for segregation distortion using multiple markers that are already mapped on the genome. The data here are the observed marker genotypes (configurations). The parameters, however, are the number of SDL, the locations, and effects of these loci. We assume that all markers are neutral in the sense that their segregations would be Mendelian if there were no linked SDL on the same chromosome. The observed segregation distortions on these neutral markers, however, are caused by one or more SDL near the markers.

Note that the flow of causality is from the SDL to the genotypic configurations of the SDL, then from the genotypic configurations of the SDL to the genotypic configurations of the marker loci, and finally from the genotypic configurations of the marker loci to the observed marker information. We first consider a single SDL. The genotype of the F1 is heterozygous, and that of a BC individual (generated from the F1 backcrossed to the first inbred parent) is either heterozygous or homozygous for the allele of the first parent, with unequal probabilities. The degree of asymmetry in these probabilities is determined by the effect (size) of the SDL. Define

$$\varphi_i = \begin{cases} 0 & \text{if } i \text{ is homozygous} \\ 1 & \text{if } i \text{ is heterozygous} \end{cases}$$

for $i = 1, \dots, N$. This indicator variable, $\varphi_i$, is also called the “inheritance digit” because it indicates which of the two alleles carried by the F1 has been transmitted to the $i$th progeny. Parameters of interest are the effect, denoted by $\pi$, and the location, denoted by $\lambda$, of the segregation-distorting locus. The distribution of $\varphi_i$ is Bernoulli with

$$\Pr(\varphi_i \mid \pi, \lambda) = \Pr(\varphi_i \mid \pi) = \pi^{1 - \varphi_i} (1 - \pi)^{\varphi_i} \qquad (1)$$

for i = 1, … , N, with

Note that in the SDL case the distribution of the inheritance digit of the SDL given $\pi$ is independent of the location. The other parameter of interest, the location of the SDL on the chromosome, denoted by $\lambda$, will be dealt with later. In the absence of segregation distortion, $\pi = 1/2$; the deviation of $\pi$ from $1/2$ is therefore the effect or size of the SDL. If $\varphi_i$ were observable, we could estimate and test $\pi$ directly. The maximum-likelihood estimate would be

$$\hat{\pi} = \frac{1}{N} \sum_{i=1}^{N} (1 - \varphi_i), \qquad (3)$$

obtained by maximizing the log-likelihood function

$$l(\pi \mid \varphi) = \sum_{i=1}^{N} \ln \Pr(\varphi_i \mid \pi) = \sum_{i=1}^{N} \left[(1 - \varphi_i) \ln \pi + \varphi_i \ln(1 - \pi)\right]. \qquad (4)$$
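If the SDL inheritance digits were observed, the estimate is simply the fraction of homozygotes, π̂ = (1/N) Σ (1 − φ_i), which maximizes the log likelihood above. The following minimal Python sketch uses hypothetical, fully observed digits (0 = homozygous, 1 = heterozygous). The function names are our own.

```python
import math


def complete_data_mle(phi):
    """pi_hat = (1/N) * sum(1 - phi_i): the fraction of homozygotes at the SDL."""
    return sum(1 - x for x in phi) / len(phi)


def complete_data_loglik(pi, phi):
    """Sum over individuals of (1 - phi_i) ln(pi) + phi_i ln(1 - pi)."""
    return sum((1 - x) * math.log(pi) + x * math.log(1.0 - pi) for x in phi)


phi = [0, 0, 1, 0, 1, 0]           # hypothetical observed SDL inheritance digits
pi_hat = complete_data_mle(phi)    # 4 homozygotes out of 6 individuals
```

In practice φ_i is not observed, which is what the marker-based machinery that follows addresses.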

But $\varphi_i$ is not observable; only the inheritance digits of marker alleles can be observed. Therefore, an entirely different approach is required to estimate $\pi$. Consider $M$ markers with known map positions on the chromosome of interest. Define the inheritance digit of the $i$th individual at the $j$th marker locus as

$$\phi_{ij} = \begin{cases} 0 & \text{if } i \text{ is homozygous for marker } j \\ 1 & \text{if } i \text{ is heterozygous for marker } j \end{cases}$$

for $i = 1, \dots, N$. Without genotyping errors, there are just three possibilities for the marker information $I_{ij}$ of the $i$th individual at the $j$th marker locus. The first two cases are mutually exclusive: one or the other marker inheritance digit is observed. In the third case, a missing observation, we define the marker information as the union of the first two cases. Thus, for an observed marker, $\Pr(I_{ij} \mid \phi_{ij}) = 1$ if the marker information is compatible with $\phi_{ij}$ and $\Pr(I_{ij} \mid \phi_{ij}) = 0$ otherwise; for a missing observation, $\Pr(I_{ij} \mid \phi_{ij}) = 1$ regardless of the inheritance digit. If there are genotyping errors, $\Pr(I_{ij} \mid \phi_{ij})$ takes values between 0 and 1. Note that $\Pr(I_{i1}, \dots, I_{iM} \mid \phi_{i1}, \dots, \phi_{iM}) = \prod_{j=1}^{M} \Pr(I_{ij} \mid \phi_{ij})$ because, conditional on the $j$th inheritance digit, the $j$th marker information is independent of all other variables.
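The marker-information probabilities described above can be written as a small function. This Python sketch is illustrative only. Observations are coded 0, 1, or None (missing), and the symmetric genotyping-error rate `eps` is our own assumption, used to show the "values between 0 and 1" case. Both function names are hypothetical.

```python
def marker_info_prob(obs, state, eps=0.0):
    """Pr(I_ij | phi_ij) for one marker.

    obs:   observed digit (0 = homozygous, 1 = heterozygous) or None if missing
    state: hypothesized inheritance digit phi_ij
    eps:   assumed symmetric genotyping-error rate (0 means error-free scoring)
    """
    if obs is None:          # missing: union of both cases, compatible with either digit
        return 1.0
    return 1.0 - eps if obs == state else eps


def marker_info_joint(obs_row, states, eps=0.0):
    """Product over markers; valid because each I_ij depends only on its own phi_ij."""
    p = 1.0
    for obs, state in zip(obs_row, states):
        p *= marker_info_prob(obs, state, eps)
    return p
```

With `eps=0.0` these probabilities reduce to the 0/1 indicators given in the text.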

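Given the position $\lambda$ of the SDL on the chromosome, the joint distribution of $\varphi_i$ and $\phi_{i1}, \dots, \phi_{iM}$ is

$$\Pr(\phi_{i1}, \dots, \phi_{iM}, \varphi_i \mid \pi, \lambda) = \Pr(\phi_{i1}, \dots, \phi_{iM} \mid \varphi_i, \lambda) \Pr(\varphi_i \mid \pi), \qquad (5)$$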

where $\Pr(\phi_{i1}, \dots, \phi_{iM} \mid \varphi_i, \lambda)$ can be found using the properties of a two-state Markov chain (Lander and Green 1987; Jiang and Zeng 1997). We assume that there is no interference between consecutive crossovers, so that Haldane's mapping function applies. Under this assumption, the sequence

$$\{\phi_{i1}, \dots, \phi_{ik}, \varphi_i, \phi_{i(k+1)}, \dots, \phi_{iM}\}$$

forms a Markov chain with two discrete states, where the markers are ordered according to their positions on the chromosome and the SDL is located between markers $k$ and $k + 1$. We thus have

$$\Pr(\phi_{i1}, \dots, \phi_{iM} \mid \varphi_i, \lambda) = \left[\Pr(\phi_{ik} \mid \varphi_i, \lambda) \prod_{j=1}^{k-1} \Pr(\phi_{ij} \mid \phi_{i(j+1)})\right] \times \left[\Pr(\phi_{i(k+1)} \mid \varphi_i, \lambda) \prod_{j=k+1}^{M-1} \Pr(\phi_{i(j+1)} \mid \phi_{ij})\right], \qquad (6)$$

where

$$\Pr(\phi_{ij} \mid \phi_{i(j+1)}) = \begin{cases} 1 - r_{j(j+1)} & \text{if } \phi_{ij} = \phi_{i(j+1)} \\ r_{j(j+1)} & \text{if } \phi_{ij} \neq \phi_{i(j+1)} \end{cases}$$

is the transition probability between two consecutive loci and $r_{j(j+1)}$ is the recombination fraction between loci $j$ and $j + 1$. The transition probability between the SDL and the nearby marker $k$ is

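$$\Pr(\phi_{ik} \mid \varphi_i, \lambda) = \begin{cases} 1 - r_{kl} & \text{if } \phi_{ik} = \varphi_i \\ r_{kl} & \text{if } \phi_{ik} \neq \varphi_i, \end{cases}$$

where $r_{kl}$ is the recombination fraction between the $k$th marker and the SDL, identified as locus $l$. The transition probability between the SDL and the $(k + 1)$th locus is obtained similarly.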
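The text invokes Haldane's mapping function without stating it. Under no interference, the recombination fraction for map distance d (in Morgans) is r = ½(1 − e^(−2d)). The minimal Python sketch below converts distances to the transition probabilities used above. The function names are hypothetical.

```python
import math


def haldane_r(d):
    """Recombination fraction for map distance d (Morgans), Haldane's mapping function."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def transition_prob(state_a, state_b, d):
    """Pr(digit at one locus | digit at an adjacent locus d Morgans away), two-state chain."""
    r = haldane_r(d)
    return 1.0 - r if state_a == state_b else r


# Markers 0.2 M apart, and an SDL midway (0.1 M) between two markers:
print(round(haldane_r(0.2), 4), round(haldane_r(0.1), 4))  # 0.1648 0.0906
```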

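Let $I_i = [I_{i1}, \dots, I_{iM}]$. Combining formula (6) with the marker information and “summing out” the marker inheritance digits, we get

$$\Pr(I_i \mid \varphi_i, \lambda) = \sum_{\phi_{i1}} \cdots \sum_{\phi_{iM}} \left(\Pr(\phi_{i1}, \dots, \phi_{iM} \mid \varphi_i, \lambda) \prod_{j=1}^{M} \Pr(I_{ij} \mid \phi_{ij})\right),$$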

where we have used the fact that, conditional on the $j$th marker inheritance digit, the $j$th marker information is independent of all other markers. Combining the previous formula with formula (5) results in the following equation:

$$\Pr(I_i, \varphi_i \mid \pi, \lambda) = \Pr(I_i \mid \varphi_i, \lambda) \Pr(\varphi_i \mid \pi) = \Pr(I_i \mid \varphi_i, \lambda)\, \pi^{1 - \varphi_i} (1 - \pi)^{\varphi_i}. \qquad (7)$$
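The sum over all marker inheritance digits need not be computed by enumerating 2^M configurations. The following Python sketch is our illustration, not the authors' code. It assumes error-free markers, Haldane's mapping function, positions in Morgans, and observations coded 0, 1, or None. Given the SDL digit, the two sides of the chain in (6) are independent, so each side is summed with a forward recursion that starts at the SDL and walks outward. All function names are hypothetical.

```python
import math


def haldane_r(d):
    """Haldane's mapping function: map distance d (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def marker_info_prob(obs, state):
    """Error-free Pr(I_ij | phi_ij); obs is 0, 1, or None (missing)."""
    return 1.0 if obs is None or obs == state else 0.0


def lik_given_sdl(phi, lam, positions, obs):
    """Pr(I_i | varphi_i = phi, lambda), with the marker digits summed out.

    The chain is split at the SDL as in eq. (6). Each side is summed by a forward
    recursion that starts at the SDL and moves outward one marker at a time.
    """
    left = [(p, o) for p, o in zip(positions, obs) if p <= lam][::-1]  # nearest first
    right = [(p, o) for p, o in zip(positions, obs) if p > lam]
    total = 1.0
    for side in (left, right):
        alpha = [1.0 if s == phi else 0.0 for s in (0, 1)]  # SDL digit is fixed
        prev = lam
        for pos, o in side:
            r = haldane_r(pos - prev)
            alpha = [
                marker_info_prob(o, s)
                * sum(alpha[t] * ((1.0 - r) if t == s else r) for t in (0, 1))
                for s in (0, 1)
            ]
            prev = pos
        total *= sum(alpha)
    return total


def joint_with_sdl(phi, pi, lam, positions, obs):
    """Eq. (7): Pr(I_i, varphi_i | pi, lambda) = Pr(I_i | varphi_i, lambda) pi^(1-phi) (1-pi)^phi."""
    return lik_given_sdl(phi, lam, positions, obs) * pi ** (1 - phi) * (1.0 - pi) ** phi
```

The cost is linear in the number of markers. Missing observations simply contribute a factor of 1 at every state.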

Maximum likelihood: Having formulated the probability model, we now introduce a maximum-likelihood method to estimate and test the SDL. There are several ways to find the maximum-likelihood estimate of $\pi$; we adopt an expectation-maximization (EM) algorithm and treat $\varphi_i$ as missing data. For the moment, we treat $\lambda$ as a known constant. Let $I = [I_1, \dots, I_N]$ and $\varphi = [\varphi_1, \dots, \varphi_N]$. For the EM algorithm we need to determine the logarithm of $\Pr(I, \varphi \mid \pi, \lambda)$, i.e.,

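$$\ln \Pr(I, \varphi \mid \pi, \lambda) = \sum_{i=1}^{N} \ln\left[\Pr(I_i \mid \varphi_i, \lambda)\, \pi^{1 - \varphi_i} (1 - \pi)^{\varphi_i}\right] = \text{const} + \sum_{i=1}^{N} \left[(1 - \varphi_i) \ln \pi + \varphi_i \ln(1 - \pi)\right]. \qquad (8)$$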

The constant does not depend on the parameter of interest, π.

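Conditional on the data, the position, and the initial value of the parameter, $\pi^{(0)}$, the posterior probabilities of $\varphi_i = 1$ and $\varphi_i = 0$ are, respectively,

$$\Pr(\varphi_i = 1 \mid I_i, \pi^{(0)}, \lambda) = \frac{\Pr(I_i \mid \varphi_i = 1, \lambda)\,(1 - \pi^{(0)})}{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)} + \Pr(I_i \mid \varphi_i = 1, \lambda)\,(1 - \pi^{(0)})} \qquad (9a)$$

and

$$\Pr(\varphi_i = 0 \mid I_i, \pi^{(0)}, \lambda) = \frac{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)}}{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)} + \Pr(I_i \mid \varphi_i = 1, \lambda)\,(1 - \pi^{(0)})}. \qquad (9b)$$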

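Because $\varphi_i$, given $I_i$, $\pi^{(0)}$, and $\lambda$, follows a Bernoulli distribution, the probability in (9a) equals the expectation $E[\varphi_i \mid I_i, \pi^{(0)}, \lambda] = \hat{\varphi}_i^{(0)}$. Taking the expectation of (8) with respect to $\varphi$, i.e., substituting $\hat{\varphi}_i^{(0)}$ for each $\varphi_i$, completes the expectation step (E-step) of the EM algorithm. The maximization step (M-step) maximizes the resulting expression to obtain

$$\pi^{(1)} = \frac{1}{N} \sum_{i=1}^{N} \left(1 - \hat{\varphi}_i^{(0)}\right). \qquad (10)$$

Equations 9a, 9b, and 10 are iterated, with $\pi^{(1)}$ taking the place of $\pi^{(0)}$ in the next iteration, until convergence.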
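The EM iteration can be sketched in a few lines once the two conditional likelihoods Pr(I_i | φ_i = 0, λ) and Pr(I_i | φ_i = 1, λ) are available for every individual at the position being tested. The following Python sketch assumes these values are precomputed and passed in as the lists `like0` and `like1`. These names, and the convergence settings, are our own.

```python
def em_sdl_effect(like0, like1, pi=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of pi at a fixed SDL position lambda.

    like0[i], like1[i]: Pr(I_i | varphi_i = 0, lambda) and Pr(I_i | varphi_i = 1, lambda),
    e.g. from a multipoint marker likelihood. pi is the starting value pi^(0).
    """
    n = len(like0)
    for _ in range(max_iter):
        # E-step, eq. (9a): posterior probability that individual i is heterozygous at the SDL
        post1 = [l1 * (1.0 - pi) / (l0 * pi + l1 * (1.0 - pi)) for l0, l1 in zip(like0, like1)]
        # M-step, eq. (10): average of (1 - E[varphi_i]) over individuals
        new_pi = sum(1.0 - w for w in post1) / n
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi
```

If the markers are fully informative at the SDL, the estimate is the observed homozygote fraction after one step. If the markers carry no information, π stays at its starting value.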

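We can now test the null hypothesis that there is no segregation distortion at the particular location $\lambda$. The null hypothesis is $H_0\colon \pi = 1/2$, which can be tested using the likelihood-ratio test statistic $\Lambda = -2\left[l(1/2, \lambda) - l(\hat{\pi}, \lambda)\right]$, where $l(\hat{\pi}, \lambda)$ is the log likelihood

$$l(\pi, \lambda) = \ln \Pr(I \mid \pi, \lambda) = \sum_{i=1}^{N} \ln\left[\sum_{\varphi_i} \Pr(I_i, \varphi_i \mid \pi, \lambda)\right] \qquad (11)$$

evaluated at the maximum-likelihood estimate $\hat{\pi}$, and $l(1/2, \lambda)$ is the log-likelihood value under Mendelian segregation, which does not depend on $\lambda$ (for a single fully informative marker it reduces to $N \ln(1/2)$). Under the null model, $\Lambda$ is approximately distributed as a chi-square variable with 1 d.f.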
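The test statistic can be computed from the same per-individual quantities Pr(I_i | φ_i = 0, λ) and Pr(I_i | φ_i = 1, λ), passed as the hypothetical lists `like0` and `like1`. In this Python sketch, the p-value uses the asymptotic chi-square distribution with 1 d.f. stated above, through the identity P(χ²₁ > x) = erfc(√(x/2)). The example data are invented: 60 homozygotes and 40 heterozygotes at a fully informative marker that coincides with the SDL. In that case l(1/2, λ) = N ln(1/2).

```python
import math


def sdl_loglik(pi, like0, like1):
    """Eq. (11): sum_i ln[Pr(I_i | varphi_i=0) pi + Pr(I_i | varphi_i=1) (1 - pi)]."""
    return sum(math.log(l0 * pi + l1 * (1.0 - pi)) for l0, l1 in zip(like0, like1))


def sdl_lrt(pi_hat, like0, like1):
    """Likelihood-ratio statistic against H0: pi = 1/2 and its asymptotic chi-square(1) p-value."""
    stat = -2.0 * (sdl_loglik(0.5, like0, like1) - sdl_loglik(pi_hat, like0, like1))
    stat = max(stat, 0.0)                        # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))   # P(chi2_1 > stat) = erfc(sqrt(stat / 2))
    return stat, p_value


like0 = [1.0] * 60 + [0.0] * 40   # invented data: 60 homozygotes ...
like1 = [0.0] * 60 + [1.0] * 40   # ... and 40 heterozygotes
stat, p_value = sdl_lrt(0.6, like0, like1)   # pi_hat = 60/100
```

Here Λ ≈ 4.03, which is just above the 5% critical value of 3.84.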

The maximum-likelihood estimate of the position of the SDL, λ, can be obtained by examining the likelihood-ratio profile along the chromosome, as is commonly done in interval mapping of QTL.

Bayesian analysis: We now introduce the Bayesian analysis of SDL implemented via Markov chain Monte Carlo (MCMC). We first classify variables into observables and unobservables. The observables are the data, denoted by $I$. The unobservables include the parameters and the missing information. Here the parameters are $\pi$ and $\lambda$, and the missing information consists of the inheritance digits $\varphi$ (SDL) and $\phi$ (markers). We always sum over all the missing information, so that inheritance digits appear only in intermediate steps. The joint posterior distribution of the parameters is

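$$\Pr(\pi, \lambda \mid I) \propto \Pr(\pi) \Pr(\lambda) \prod_{i=1}^{N} \Pr(I_i \mid \pi, \lambda) = \Pr(\pi) \Pr(\lambda) \prod_{i=1}^{N} \sum_{\varphi_i} \Pr(I_i \mid \varphi_i, \lambda) \Pr(\varphi_i \mid \pi), \qquad (12)$$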

where $\Pr(\pi)$ and $\Pr(\lambda)$ are the prior distributions of the parameters of interest: Beta(1, 1) (i.e., uniform on [0, 1]) for the former and uniform along the chromosome for the latter. Samples are simulated from the joint posterior distribution via MCMC. Instead of sampling all the unobservables simultaneously, we sample one unobservable at a time, with the others held at the values simulated in the previous cycle. When all the unobservables have been updated, one cycle of the Markov chain is complete. Once the chain reaches stationarity, subsequent samples are considered to be drawn from the joint posterior distribution.

Starting with an initial value for each parameter, $\{\pi^{(0)}, \lambda^{(0)}\}$, we sample $\pi$ using the Metropolis-Hastings algorithm (e.g., Gelman et al. 1995). A new proposal, $\pi^*$, is sampled from the beta proposal distribution $J(\pi^* \mid \pi^{(0)}) = \text{Beta}(\pi^{(0)} N + 2, (1 - \pi^{(0)}) N + 2)$. The proposal $\pi^*$ is accepted with probability $\min\{1, a(\pi^*, \pi^{(0)})\}$, where

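$$a(\pi^*, \pi^{(0)}) = \frac{\Pr(\pi^*, \lambda^{(0)} \mid I)}{\Pr(\pi^{(0)}, \lambda^{(0)} \mid I)} \, \frac{J(\pi^{(0)} \mid \pi^*)}{J(\pi^* \mid \pi^{(0)})}. \qquad (13)$$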

Note that the first term is the ratio of posterior probabilities of the parameters and the second term is the ratio of proposal probabilities. If $\pi^*$ is accepted, we take $\pi^{(1)} = \pi^*$; otherwise we do not update the effect of the SDL and simply take $\pi^{(1)} = \pi^{(0)}$. The beta proposal distribution ensures that $0 \le \pi \le 1$. The simulated value $\pi^{(1)}$ is then used to generate $\lambda$, for which we use the Metropolis algorithm (e.g., Gelman et al. 1995). First, a new value of $\lambda$ is proposed by a small perturbation from $\lambda^{(0)}$, i.e.,

where $x$ is a uniform variable sampled from $U(0, d)$ and $d$ is a small positive number, e.g., 0.1 times the length of the linkage group. We accept this proposal with probability $\min\{1, a(\lambda^*, \lambda^{(0)})\}$, where

$$a(\lambda^*, \lambda^{(0)}) = \frac{\Pr(\lambda^*, \pi^{(1)} \mid I)}{\Pr(\lambda^{(0)}, \pi^{(1)} \mid I)}. \qquad (14)$$

If $\lambda^*$ is accepted, we take $\lambda^{(1)} = \lambda^*$; otherwise $\lambda^{(1)} = \lambda^{(0)}$.
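One cycle of the single-SDL sampler can be sketched as follows. This is our own Python illustration under stated assumptions, not the authors' implementation. The caller supplies `log_lik(pi, lam)` = ln Pr(I | π, λ). The π step uses the beta proposal and Hastings correction of (13). The exact perturbation formula for λ is not reproduced in this text, so the sketch assumes λ* = λ + x with x ~ U(−d, d). This gives the symmetric proposal that a plain Metropolis step (14) requires. Proposals off the chromosome are rejected because the uniform prior is zero there. The toy likelihood in the usage example is hypothetical and ignores λ.

```python
import math
import random


def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def mcmc_cycle(pi, lam, log_lik, n, length, d, rng):
    """One cycle: update pi (eq. 13), then lambda (eq. 14); flat priors cancel inside bounds."""
    # Metropolis-Hastings step for pi with proposal J = Beta(pi*N + 2, (1 - pi)*N + 2)
    a_fwd, b_fwd = pi * n + 2.0, (1.0 - pi) * n + 2.0
    pi_star = rng.betavariate(a_fwd, b_fwd)
    if 0.0 < pi_star < 1.0:
        log_a = (log_lik(pi_star, lam) - log_lik(pi, lam)
                 + log_beta_pdf(pi, pi_star * n + 2.0, (1.0 - pi_star) * n + 2.0)
                 - log_beta_pdf(pi_star, a_fwd, b_fwd))
        if rng.random() < math.exp(min(0.0, log_a)):
            pi = pi_star
    # Metropolis step for lambda; assumed symmetric U(-d, d) perturbation
    lam_star = lam + rng.uniform(-d, d)
    if 0.0 <= lam_star <= length:          # prior density is zero off the chromosome
        log_a = log_lik(pi, lam_star) - log_lik(pi, lam)
        if rng.random() < math.exp(min(0.0, log_a)):
            lam = lam_star
    return pi, lam


def toy_log_lik(p, lam):
    """Hypothetical stand-in for ln Pr(I | pi, lambda): 67 homozygotes among 100 individuals."""
    return 67 * math.log(p) + 33 * math.log(1.0 - p)


rng = random.Random(1)
pi, lam, draws = 0.5, 0.5, []
for _ in range(5000):
    pi, lam = mcmc_cycle(pi, lam, toy_log_lik, 100, 1.0, 0.1, rng)
    draws.append(pi)
```

With the flat prior, the toy posterior for π is Beta(68, 34). The mean of the draws should therefore lie close to 68/102.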

Multiple-SDL model: Consider the joint action of $L$ SDL located on the chromosome of interest. Denote the locations of these SDL by $\lambda = \{\lambda_l\}$ for $l = 1, \dots, L$, in contrast to the single-SDL model, where $\lambda$ is a scalar. Similarly, denote the marginal effects of the SDL by $\pi = \{\pi_l\}$ for $l = 1, \dots, L$. Assuming that these SDL act multiplicatively, the joint effect of all the SDL can be formulated as a product of these marginal effects. Define $\varphi_i = [\varphi_{i1}, \dots, \varphi_{iL}]$ and $\phi_i = [\phi_{i1}, \dots, \phi_{iM}]$ as the vectors of inheritance digits at all SDL and marker loci, respectively, for the $i$th individual. Using Bayes' theorem, the joint posterior distribution of $\varphi_i$ can be formulated as

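$$\Pr(\varphi_i \mid \pi, \lambda) = \frac{\left(\prod_{l=1}^{L-1} \Pr(\varphi_{i(l+1)} \mid \varphi_{il}, \lambda)\right) \prod_{l=1}^{L} \Pr(\varphi_{il} \mid \pi_l)}{\sum_{\varphi_i} \left(\prod_{l=1}^{L-1} \Pr(\varphi_{i(l+1)} \mid \varphi_{il}, \lambda)\right) \prod_{l=1}^{L} \Pr(\varphi_{il} \mid \pi_l)}. \qquad (15)$$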
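For a handful of SDL, the distribution in (15) can be evaluated by enumerating all 2^L configurations of SDL digits. This Python sketch assumes that the SDL positions are sorted, that Haldane's mapping function links adjacent SDL, and that Pr(φ_il | π_l) follows (1). As in (15), the Mendelian factor ½ for the first locus is omitted because it cancels in the normalization. The function names are hypothetical.

```python
import itertools
import math


def haldane_r(d):
    """Haldane's mapping function: map distance d (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def sdl_joint(pis, lams):
    """Pr(varphi_i | pi, lambda) for every configuration of L SDL digits (eq. 15).

    pis:  effects pi_l (probability of the homozygous genotype at SDL l)
    lams: positions lambda_l in Morgans, sorted along the chromosome
    """
    weights = {}
    for config in itertools.product((0, 1), repeat=len(pis)):
        w = 1.0
        for l in range(len(pis) - 1):          # Markov chain between adjacent SDL
            r = haldane_r(lams[l + 1] - lams[l])
            w *= (1.0 - r) if config[l + 1] == config[l] else r
        for phi, pi in zip(config, pis):       # multiplicative SDL effects, eq. (1)
            w *= pi if phi == 0 else (1.0 - pi)
        weights[config] = w
    z = sum(weights.values())                  # denominator of eq. (15)
    return {c: w / z for c, w in weights.items()}


two_sdl = sdl_joint([2.0 / 3.0, 2.0 / 3.0], [0.33, 0.67])  # two SDL as in the second simulation
```

Enumeration is exponential in L. That cost is acceptable here because the number of SDL per chromosome is small (the applications use L_max = 3).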

The joint posterior distribution of the parameters is

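$$\Pr(\pi, \lambda \mid I) \propto \Pr(\pi) \Pr(\lambda) \prod_{i=1}^{N} \sum_{\varphi_i} \left(\Pr(I_i \mid \varphi_i, \lambda) \Pr(\varphi_i \mid \pi, \lambda)\right), \qquad (16)$$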

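where $\Pr(\pi) = \prod_{l=1}^{L} \Pr(\pi_l)$, $\Pr(\lambda) = \prod_{l=1}^{L} \Pr(\lambda_l)$, and

$$\Pr(I_i \mid \varphi_i, \lambda) = \frac{\Pr(I_i, \varphi_i \mid \lambda)}{\Pr(\varphi_i \mid \lambda)} = \frac{\sum_{\phi_i} \left(\Pr(\phi_i, \varphi_i \mid \lambda) \prod_{j} \Pr(I_{ij} \mid \phi_{ij})\right)}{\Pr(\varphi_i \mid \lambda)}. \qquad (17)$$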

Under the multiple-SDL model, formulating an EM algorithm seems impossible. The Bayesian method, on the other hand, requires little modification: instead of updating the effect and location of a single locus, $\pi_l$ and $\lambda_l$ are updated in turn for all $L$ loci.

With the Bayesian approach, the number of SDL ($L$) can be treated as an unknown variable. This involves a change in the dimension of the model. Reversible jump MCMC (Green 1995; Satagopan and Yandell 1996; Heath 1997; Richardson and Green 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998) is an extension of the Metropolis-Hastings sampler that permits moves between models of different dimensions. The joint posterior distribution of the parameters is

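$$\Pr(\pi, \lambda, L \mid I) \propto \Pr(\pi \mid L) \Pr(\lambda \mid L) \Pr(L) \times \prod_{i=1}^{N} \sum_{\varphi_i} \left(\Pr(I_i \mid \varphi_i, \lambda, L) \Pr(\varphi_i \mid \pi, \lambda, L)\right), \qquad (18)$$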

where $\Pr(L)$ is the prior probability of the number of SDL. We chose a Poisson prior (with mean $\mu = 1$) for $\Pr(L)$, truncated at $L_{\max}$. After each existing SDL has been updated, we propose two types of move to update $L$: adding a locus if $L < L_{\max}$ (with probability $p_a$) and deleting a locus if $L > 0$ (with probability $p_d$).
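For reference, the truncated Poisson prior can be computed directly. The following Python sketch uses the settings of the applications below (μ = 1, L_max = 3) and assumes that truncation simply renormalizes the Poisson probabilities over 0, …, L_max. Adjacent prior ratios are Pr(L + 1)/Pr(L) = μ/(L + 1), the same as for the untruncated Poisson. The function name is hypothetical.

```python
import math


def truncated_poisson_prior(mu, l_max):
    """Prior Pr(L) for L = 0..l_max: Poisson(mu) renormalized after truncation at l_max."""
    weights = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(l_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


prior = truncated_poisson_prior(1.0, 3)
print([round(p, 4) for p in prior])   # [0.375, 0.375, 0.1875, 0.0625]
```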

For adding an SDL, a new location $\lambda_{L+1}$ and effect $\pi_{L+1}$ are sampled from their uniform priors. The new parameter sets are $\pi^* = (\pi^{(0)}, \pi_{L+1})$ and $\lambda^* = (\lambda^{(0)}, \lambda_{L+1})$. We then accept this new SDL with probability $\min\{1, a(L + 1, L)\}$, where

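$$a(L + 1, L) = \frac{\Pr(I \mid \pi^*, \lambda^*, L + 1)}{\Pr(I \mid \pi^{(0)}, \lambda^{(0)}, L)} \, \frac{1}{L + 1} \, \frac{p_d}{p_a}. \qquad (19)$$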

If the new SDL is accepted, its location and effect are accepted simultaneously; otherwise, the number of SDL remains the same. In the deletion step, a randomly chosen SDL is proposed for deletion. The SDL are then renumbered such that the candidate is the last SDL, i.e., the $L$th SDL. The new parameter sets are $\pi^* = (\pi_1^{(0)}, \dots, \pi_{L-1}^{(0)})$ and $\lambda^* = (\lambda_1^{(0)}, \dots, \lambda_{L-1}^{(0)})$. The proposal is accepted with probability $\min\{1, a(L - 1, L)\}$, where

$$a(L - 1, L) = \frac{\Pr(I \mid \pi^*, \lambda^*, L - 1)}{\Pr(I \mid \pi^{(0)}, \lambda^{(0)}, L)} \, L \, \frac{p_a}{p_d}. \qquad (20)$$

Note that we handle SDL within the same marker interval in exactly the same way as SDL in different intervals and that (20) is just the inverse of (19). Our interpretation of the terms $(L + 1)^{-1}$ and $L$ in (19) and (20), respectively, differs from the usual one. Usually, these terms are included to account for a perceived imbalance between the number of loci that can be selected in a deletion step and in an addition step if the order of loci is not fixed. We believe that the balance is one to one in both the addition and deletion steps, so that no balancing is necessary; we include these terms because of the Poisson prior (with $\mu = 1$, the prior ratio $\Pr(L + 1)/\Pr(L) = \mu/(L + 1)$ equals $(L + 1)^{-1}$). The difference from the usual algorithm, however, amounts to a minor modification of the prior distribution and is thus irrelevant in most biological applications.
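The birth and death acceptance ratios (19) and (20) are easiest to handle on the log scale. This Python sketch assumes that the log likelihoods ln Pr(I | π, λ, L) of the current and proposed models are computed elsewhere. The move probabilities in the example (p_a = p_d = 0.5) and the likelihood values are invented. The example also checks the statement that (20) is the inverse of (19): adding an SDL and then deleting it again gives log ratios that sum to zero.

```python
import math


def log_accept_add(loglik_new, loglik_old, n_sdl, p_a, p_d):
    """ln a(L + 1, L), eq. (19): likelihood ratio x 1/(L + 1) x p_d / p_a."""
    return loglik_new - loglik_old - math.log(n_sdl + 1) + math.log(p_d) - math.log(p_a)


def log_accept_delete(loglik_new, loglik_old, n_sdl, p_a, p_d):
    """ln a(L - 1, L), eq. (20): likelihood ratio x L x p_a / p_d."""
    return loglik_new - loglik_old + math.log(n_sdl) + math.log(p_a) - math.log(p_d)


def accept_probability(log_a):
    """min{1, a} computed on the log scale."""
    return math.exp(min(0.0, log_a))


# Invented example: adding a second SDL raises ln Pr(I | ...) from -10 to -9 ...
up = log_accept_add(-9.0, -10.0, 1, 0.5, 0.5)
# ... and the reverse move deletes it again, going from L = 2 back to L = 1.
down = log_accept_delete(-10.0, -9.0, 2, 0.5, 0.5)
```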

APPLICATIONS

To illustrate the method, a simulation study and an analysis of a data set from one cross of two Scots pine (Pinus sylvestris) trees are presented. The simulation study conforms to an inbred-line BC situation. In the pine data analysis, we concentrate on the maternal part of the progeny of a single tree, i.e., a pseudobackcross design. In a backcross it is not possible to distinguish between gametic selection and viability selection after fertilization.

Simulations: In the first simulation, a single viability locus that eliminates 50% of the progeny of the heterozygous genotype (survivors are in the ratio 1 : 0.5, so π = 1/1.5 = 2/3) was placed in the middle of a chromosome of length 1 M; six markers were spaced at regular intervals of 0.2 M along the chromosome, and no missing data were considered. In the second simulation, two SDL with the same effects as in the single-SDL situation were placed at 0.33 M and 0.67 M, respectively. In both cases, simulations with a sample size of 500 were repeated five times and the results were compared; in addition, simulations with sample sizes of 100 and fewer were performed. Compared with empirical reports of distortions of marker loci from Mendelian ratios, the simulated effect is high but not unrealistic. The marker map is rather dense and fully informative.
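Data of the kind used in the first simulation can be generated directly from the model: draw the SDL digit from (1), then propagate marker digits outward along the two-state Markov chain. The following Python sketch is our illustration, not the authors' simulation code. It assumes error-free, fully informative markers, no missing data, and Haldane's mapping function. The seed and function names are arbitrary.

```python
import math
import random


def haldane_r(d):
    """Haldane's mapping function: map distance d (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def simulate_backcross(n, pi, lam, markers, rng):
    """Simulate marker digits (0 = homozygous, 1 = heterozygous) for n BC survivors.

    The SDL digit is drawn with Pr(0) = pi (eq. 1); marker digits then follow the
    two-state Markov chain running outward from the SDL (eqs. 5 and 6).
    """
    left = [j for j, p in enumerate(markers) if p <= lam][::-1]   # nearest marker first
    right = [j for j, p in enumerate(markers) if p > lam]
    data = []
    for _ in range(n):
        sdl = 0 if rng.random() < pi else 1
        row = [0] * len(markers)
        for side in (left, right):
            state, prev = sdl, lam
            for j in side:
                if rng.random() < haldane_r(markers[j] - prev):
                    state = 1 - state          # recombination: the digit switches
                row[j] = state
                prev = markers[j]
        data.append(row)
    return data


markers = [0.2 * j for j in range(6)]   # six markers at 0.2 M spacing on a 1-M chromosome
data = simulate_backcross(500, 2.0 / 3.0, 0.5, markers, random.Random(2024))
hom_freq = [sum(1 - row[j] for row in data) / len(data) for j in range(len(markers))]
```

The expected homozygote frequency at the two markers flanking the SDL is π(1 − r) + (1 − π)r ≈ 0.64 with r = r(0.1 M) ≈ 0.09. It decays toward 0.5 at more distant markers.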

The outcomes of the analyses of the five simulated data sets were almost identical, so we present only one of them. In the maximum-likelihood (ML) analysis, the number of SDL was fixed at one. The inferred effect and the likelihood-ratio statistic Λ are reported at each location. We also performed an MCMC analysis of the same data. Figure 1A shows that the position and effect of the SDL are estimated quite accurately. For the other four simulations, the inferred positions were also mostly between the two middle markers, and the estimated effects were close to the true value. Reducing the sample size did not appreciably change the estimates of location or effect, but the likelihood-ratio statistic dropped considerably (results not shown). We do not present the ML results for the two-SDL simulation, because the single-SDL model is not appropriate there.

For the Bayesian MCMC analysis, the Poisson prior mean was set to μ = 1 and the maximum number of SDL to three. The chain length was $10^5$ cycles, and the chain was thinned by storing only every 10th cycle. No burn-in period was discarded because the chain reached approximate stationarity very quickly. The posterior probability of the simulated number of SDL (i.e., one or two, respectively) was always between 0.6 and 0.9. In the one-SDL case, the posterior frequencies are highest at the center, i.e., close to the simulated position (Figure 1B). The effects are very similar to those estimated with the ML method. In the two-SDL case, the posterior distributions of both the locations and the effects are approximately correct (Figure 1C), and the posterior distribution of frequencies clearly shows that two SDL are present. When the number of individuals was reduced, the posterior probabilities of the different numbers of SDL rapidly approached the prior probabilities (data not shown). This corresponds to the decrease in the likelihood-ratio statistic with decreasing sample size.
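The posterior quantities reported here can be summarized from the stored samples. The Python sketch below is a hypothetical helper under stated assumptions. Each stored cycle is represented as a list of SDL positions, so its length is the sampled L. Pr(L = k | I) is estimated as the fraction of stored cycles with k SDL. The "frequency" curve is estimated as the posterior mean number of SDL per interval of 0.04 times the linkage-group length. That mean equals the posterior probability of an SDL in the interval when at most one SDL falls into any interval. The example samples are invented.

```python
from collections import Counter


def summarize_chain(samples, length, bin_frac=0.04):
    """Posterior summaries from stored reversible-jump samples.

    samples: one list of SDL positions (Morgans) per stored cycle
    Returns Pr(L = k | I) for each visited k and, for each interval of width
    bin_frac * length, the posterior mean number of SDL falling into it.
    """
    n = len(samples)
    post_L = {k: c / n for k, c in sorted(Counter(len(s) for s in samples).items())}
    n_bins = round(1.0 / bin_frac)
    width = bin_frac * length
    freq = [0.0] * n_bins
    for s in samples:
        for lam in s:
            freq[min(int(lam / width), n_bins - 1)] += 1.0 / n
    return post_L, freq


samples = [[0.5], [0.5], [0.33, 0.67], []]   # invented stored states
post_L, freq = summarize_chain(samples, 1.0)
```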

Figure 1.

Simulated data. (A and B) A simulation with one SDL; (C) a simulation with two SDL. The scale on the x-axis is 1 M; the positions of the markers are indicated with an “×,” and the positions of the SDL with a circle. “Likelihood” refers to the broken line and to twice the log-likelihood ratio; “frequency” to the posterior probability of an SDL in an interval of 0.04 times the length of the linkage group; and “effect” to the solid line and to the probability of finding the homozygous genotype in the BC.

Pine data: In the second application, the data consisted of the megagametophytes of open-pollinated offspring of a single Scots pine (P. sylvestris) tree, P304 (Hurme and Savolainen 1999). Megagametophytes are haploid tissues consisting of the maternal part of the seedling's genome and can be scored at the seedling stage without damaging the seedling. We treated the progeny of this tree as a pseudobackcross family. Map distances and linkage phases were determined with Mapmaker as described in Hurme and Savolainen (1999). Five RAPD markers from linkage group 2 were used in this family: C02-680, G13-750, K09-750, E09-250, and AC15-270, at positions 0.038 M, 0.115 M, 0.287 M, 0.461 M, and 0.478 M, respectively. As determined from other crosses, the map length of the whole linkage group was ~0.85 M. The sample size was 73 individuals, and in many individuals some markers were scored as missing.

With the ML analysis, the log-likelihood ratio statistic was appreciable only close to the marker G13-750 (Figure 2A). At this location the inferred effect was an excess of the heterozygous genotype of ~0.2 over the Mendelian value of 0.5. For the Bayesian MCMC analysis, the prior distribution was the same as in the simulation study. The posterior probabilities of zero, one, two, and three SDL were 0.01, 0.15, 0.61, and 0.23, respectively. This result is, however, quite sensitive to the prior distribution of the number of SDL. We report the posterior distributions for both one and two inferred SDL. If a single SDL was inferred, it was most often placed close to marker C02-680 (the beginning of the marker region), and the inferred effect was a considerable increase in the second genotype, as in the ML analysis (Figure 2B). If two SDL were inferred, the location and effect of one of them were most often similar to those in the single-SDL case, while the other SDL counteracted its effect at the other end of the linkage group (Figure 2C).

Figure 2.

Pine data. The notation is the same as in Figure 1. The ML result is presented in A, the posterior distribution for the single-SDL case in B, and that for the two-SDL case in C. The marker loci are (from left to right) C02-680, G13-750, K09-750, E09-250, and AC15-270.

DISCUSSION

Herein, a method for mapping SDL in a backcross is presented. The method makes efficient use of a map of partially or fully informative marker loci through the multipoint approach (Lander and Green 1987; Jiang and Zeng 1997). A maximum-likelihood analysis via an EM algorithm and a Bayesian analysis via Markov chain Monte Carlo, using a reversible jump algorithm to vary the number of loci, are presented in detail. Given a dense marker map, the method can be used for precise estimation of the positions and effects of SDL. The best previously available methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome.

With our approach, it is possible to efficiently analyze the number, positions, and effects of SDL in organisms for which a high-resolution marker map has been developed and in which inbred-line crosses can be performed easily. The analysis can easily be extended to a general full-sib family or to the selfing of an outcrossing individual: the dimension changes from two to four, binomial distributions change to multinomial distributions, and the transition probabilities between adjacent loci change. Marker information then contributes to the full or partial identification of four combinations of genotypic configurations. As in the BC case, partial marker information can be defined as the union of compatible cases. All these changes are rather trivial consequences of the change in dimension but complicate the presentation substantially. Additionally, the missing phase information needs to be considered. Furthermore, the multipoint algorithm becomes more important for the full-sib design.

Presently, our method for the backcross can be used only to analyze the SDL segregating in the cross between the two lines, not those that were segregating in the ancestral population from which the inbred lines were derived. Segregation distortion might already have affected the inbreeding process used to create the lines. Extrapolation from the current to the ancestral situation is therefore problematic. This problem is even more pressing for recombinant inbred lines, where overrepresentation of chromosomal fragments of one or the other parent is commonly observed (e.g., Lister and Dean 1993) and requires a more elaborate approach.

A distinction needs to be made between segregation distortion before and after fertilization. An SDL acting before fertilization can only alter gametic proportions. Genotypic proportions are thus altered only indirectly, through the combination of gametic proportions, which restricts the achievable combinations of genotypic proportions. On the other hand, SDL acting after fertilization may alter genotypic proportions directly. Thus, many more combinations of genotypic proportions are possible for SDL acting after fertilization. In experimental crosses more complex than the backcross design, the inferred genotypic proportions of an SDL may thus render prefertilization mechanisms of segregation distortion unlikely. Two or more SDL acting before fertilization may, however, mimic the effect of an SDL acting after fertilization because of the increase in combinatorial possibilities.

In hybrids of species or subspecies, segregation distortion commonly occurs (see, e.g., Whitkus 1998 and references therein). It may be caused by structural rearrangements, e.g., inversions, which constitute a prefertilization mechanism. Alternatively, it may be caused by postfertilization differences in viability between genotypic configurations, most probably resulting from epistatic interactions. Our method can be used to detect the chromosomal regions causing these distortions. Because of the presumed epistasis, however, the assumption that different SDL act multiplicatively may need to be relaxed.
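Under the multiplicative assumption, the joint viability of a two-locus genotype is the product of the single-locus viabilities. For two SDL in a backcross, with each locus coded 0 (heterozygous) or 1 (homozygous), a 2 × 2 table of strictly positive joint viabilities factorizes in this way exactly when its cross-product ratio equals 1. The Python sketch below, with hypothetical function names and illustrative viability values, contrasts a multiplicative table with an epistatic one in which only a single two-locus combination has reduced viability.

```python
def multiplicative_table(w1, w2):
    """Joint viabilities w[i][j] = w1[i] * w2[j] for two backcross SDL.
    Index 0 = heterozygote, 1 = homozygote at each locus."""
    return [[w1[i] * w2[j] for j in range(2)] for i in range(2)]


def cross_product_ratio(w):
    """w00 * w11 / (w01 * w10); equals 1 iff a 2 x 2 table of strictly
    positive viabilities is multiplicative (no epistasis on this scale)."""
    if min(min(row) for row in w) <= 0.0:
        raise ValueError("viabilities must be strictly positive")
    return (w[0][0] * w[1][1]) / (w[0][1] * w[1][0])


if __name__ == "__main__":
    mult = multiplicative_table([1.0, 0.8], [1.0, 0.5])
    epis = [[1.0, 1.0],
            [1.0, 0.1]]                # only the double homozygote is impaired
    print(cross_product_ratio(mult))   # 1 up to rounding
    print(cross_product_ratio(epis))   # 0.1: not multiplicative
```

Completely lethal combinations (zero viability) cannot be assessed with this ratio and would need a different parameterization.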

Our method may also be used to map loci influencing early viability, which would enhance our understanding of the nature of early inbreeding depression. It provides another approach for estimating the number and effects of loci causing inbreeding depression. Traditionally, such information has been derived mainly from biometric analysis of crosses (e.g., Dudash and Carr 1998). But because inbreeding depression can be expressed in embryonic life stages that are not amenable to biometric analysis, the applicability of biometric methods is limited. To gain insight into these early life stages, sparse maps and single-marker methods have been used to infer the effect of a viability locus influencing inbreeding depression (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). With single-marker analysis, however, the estimates of position and effect of the SDL are confounded, and multiple SDL on a single linkage group cannot be handled at all. Interval methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome. Dense linkage maps of fully informative markers may be hard to obtain for the closely related individuals that must be considered in the analysis of inbreeding depression. Like the interval methods, our method requires a dense linkage map of polymorphic markers, but it is not restricted to fully informative markers; instead, it can make efficient use of, e.g., dominant markers.
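The confounding of position and effect in single-marker analysis can be seen from the expected marker proportions in a backcross. Assume, for illustration, one SDL at recombination fraction r from the marker, with the SDL homozygote having viability v relative to the heterozygote. Among survivors, the heterozygous marker class is then expected at frequency ((1 - r) + r v) / (1 + v). A single marker frequency pins down only this one combination of r and v, so an SDL at the marker with a weaker effect and a more loosely linked SDL with a stronger effect can give identical expectations. The hypothetical helpers below compute the expected frequency and, for a given r, the viability that reproduces it.

```python
def bc_marker_het_freq(r, v):
    """Expected frequency of the heterozygous marker class among surviving
    backcross progeny, for an SDL at recombination fraction r from the marker
    whose homozygote has viability v relative to the heterozygote."""
    return ((1.0 - r) + r * v) / (1.0 + v)


def matching_viability(r, f):
    """Relative viability v that yields marker frequency f at recombination
    fraction r (solves f * (1 + v) = 1 - r + r * v); gives v >= 0 for r < f <= 1 - r."""
    return (1.0 - r - f) / (f - r)


if __name__ == "__main__":
    f_tight = bc_marker_het_freq(0.0, 0.5)      # SDL at the marker: 2/3
    v_loose = matching_viability(0.1, f_tight)  # 7/17, about 0.41
    f_loose = bc_marker_het_freq(0.1, v_loose)
    print(f_tight, v_loose, f_loose)            # f_loose equals f_tight up to rounding
```

Both parameter pairs predict the same marker ratio; it is the joint information from several linked markers, as used by interval and multipoint methods, that can separate them.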

Only rarely have data sets been gathered specifically for mapping segregation distortion or viability selection (see, however, Harushima et al. 1996 and Kuang et al. 1998). In QTL experiments, however, wide crosses are often used to increase the differences between parents and thus the power of mapping. Probably for this reason, markers with distorted segregation ratios are commonly observed in QTL mapping data sets derived from wide crosses (e.g., van Ooijen et al. 1994). Segregation ratio distortion is also commonly observed in doubled haploid lines (e.g., Fulton et al. 1997).

Usually, construction of a linkage map of marker loci precedes QTL analysis. If a dense map of informative markers is inferred correctly, the bias that segregation distortion introduces into QTL analysis will be negligible. But if recombination fractions or, worse, the order of marker loci are inferred incorrectly, basic assumptions of QTL analysis do not hold and results will be imprecise at best. Hence, besides being of interest in themselves, SDL cause practical problems in QTL projects, as observed, e.g., by Sandbrink et al. (1995). Segregation distortion should therefore be accounted for in mapping projects.

Segregation distortion is known to bias two-point estimates of recombination fractions between markers (Lorieux et al. 1995a,b; Liu 1998). If markers are fully informative, only the estimate of the recombination fraction between the markers flanking the SDL will be affected; only in the unlikely case that the SDL coincides with a marker location will no bias be observed. If less than fully informative markers are used, the effects of the distortion spread out to the smallest interval of fully informative markers flanking the distorted region. As a remedy, markers showing obvious segregation distortion are often excluded from the map. But this reduces genome coverage, and qualitative or quantitative trait loci might be missed.

Our method can be extended to allow detection of SDL concurrently with estimation of a linkage map. Cheng et al. (1996) have already developed an EM algorithm to infer the positions of two fully informative markers in the presence of a single SDL (an interval method) in backcross or doubled haploid lines. This could be extended to multipoint inference of a marker map in the presence of SDL by augmenting the EM or MCMC schemes presented herein to allow the markers to change their positions relative to each other.

The source code for a C++ program and executables for a Sun workstation, with which the above calculations can be performed, are available from Claus Vogl (claus@genetics.ucr.edu).

Acknowledgment

We thank Päivi Hurme and Outi Savolainen for the data set and Elja Arjas, Anita de Haan, Mikko Sillanpää, and Nengjun Yi for discussion of this and related issues. Outi Savolainen, Elja Arjas, and Lori Weingartner commented on earlier versions of this manuscript. We thank Zhao-Bang Zeng and two anonymous reviewers for their patient work, which helped to improve this article considerably. This work was supported by grants from the Environment and Natural Resources Research Council and the Medical Research Council to Outi Savolainen and by the National Institutes of Health Grant GM-55321 and the U.S. Department of Agriculture National Research Initiative Competitive Grants Program 97-35205-5075 to S.X.

Footnotes

Communicating editor: Z-B. Zeng

LITERATURE CITED

Charlesworth (1987) Inbreeding depression and its evolutionary consequences. Annu. Rev. Ecol. Syst. 237–268.

Charlesworth (1998) Some evolutionary consequences of deleterious mutations. Genetica 102/103.

Cheng; Saito; Ukai (1996) Estimation of the position and effect of a lethal factor locus on a molecular marker linkage map. Theor. Appl. Genet. 494–502.

Dudash, M. W.; Carr, D. E. (1998) Genetics underlying inbreeding depression in Mimulus with contrasting mating systems. Nature 393: 682–684.

Fu, Y.-B.; Ritland (1994a) Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics 136: 323–331.

Fu, Y.-B.; Ritland (1994b) On estimating the linkage of marker genes to viability genes controlling inbreeding depression. Theor. Appl. Genet. 925–932.

Fulton, T.-M.; Nelson, J. C.; Tanksley, S. D. (1997) Introgression and DNA marker analysis of Lycopersicum peruvianum, a wild relative of the cultivated tomato, into Lycopersicum esculentum, followed through three successive backcross generations. Theor. Appl. Genet. 895–902.

Gelman; Carlin, J. B.; Stern, H. S.; Rubin, D. B. (1995) Bayesian Data Analysis. Chapman and Hall, London.

Grant (1975) Genetics of Flowering Plants. Columbia University Press, New York.

Green, P. J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 711–732.

Hartl, D. L.; Clark, A. G. (1997) Principles of Population Genetics, Ed. 3. Sinauer, Sunderland, MA.

Harushima; Kurata; Yano; Nagamura; Sasaki; et al. (1996) Detection of segregation distortions in an indica-japonica rice cross using a high-resolution molecular map. Theor. Appl. Genet. 145–150.

Heath, S. C. (1997) Markov-chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 748–760.

Hedrick, P. W. (1994) Purging inbreeding depression and the probability of extinction: full-sib families. Heredity 363–372.

Hedrick, P. W.; Muona (1990) Linkage of viability genes to marker loci in selfing organisms. Heredity.

Hurme; Savolainen (1999) Comparison of homology and linkage of RAPD markers between individual trees of Scots pine (Pinus sylvestris L.). Mol. Ecol.

Husband, B. C.; Schemske, D. W. (1996) Evolution of magnitude and timing of inbreeding depression in plants. Evolution 554–570.

Jansen, R. C.; Stam (1994) High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.

Jiang; Zeng, Z.-B. (1997) Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101.

Kärkkäinen; Koski; Savolainen (1996) Geographical variation in inbreeding depression in Scots pine. Evolution 111–119.

Kärkkäinen; Kuittinen; van Treuren; Vogl; Savolainen (1999) Genetic basis of inbreeding depression in Arabis petraea. Evolution 1354–1365.

Kuang; Richardson, T. E.; Carson, S. D.; Bongarten, B. C. (1998) An allele responsible for seedling death in Pinus radiata D. Don. Theor. Appl. Genet. 640–644.

Lander, E. S.; Botstein (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Lander, E. S.; Green (1987) Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 2363–2367.

Launey; Hedgecock (1999) Genetic load causes segregation ratio distortion in oysters: mapping at 6 hours. Plant and Animal Genome VII, abstracts W14.

Lister; Dean (1993) Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J. 745–750.

Liu, B. H. (1998) Statistical Genomics: Linkage, Mapping, and QTL Analysis. CRC Press, Boca Raton, FL.

Lorieux; Goffinet; Perrier; González de León; Lanaud (1995a) Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross populations. Theor. Appl. Genet.

Lorieux; Perrier; Goffinet; Lanaud; González de León (1995b) Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F2-populations. Theor. Appl. Genet.

McGoldrick, D. J.; Hedgecock (1997) Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146: 321–334.

Mitchell-Olds (1995) Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics 140: 1105–1109.

Richardson; Green, P. J. (1997) On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 731–792.

Sandbrink, J. M.; van Ooijen, J. W.; Purimahua, C. C.; Vrielink; Verkerk; et al. (1995) Localization of genes for bacterial resistance in Lycopersicon peruvianum using RFLPs. Theor. Appl. Genet. 444–450.

Satagopan, R. J.; Yandell, B. S. (1996) Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases, Biometric Section, Statistical Meeting, Chicago, IL.

Servitová; Cetl (1984) The use of recessive lethal chlorophyll mutants for linkage mapping of Arabidopsis thaliana (L.) Heynh. Arabidopsis Inf. Serv.

Sillanpää; Arjas (1998) Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.

Sorensen, F. C. (1967) Linkage between marker genes and embryonic lethal factors may cause disturbed segregation ratios. Silvae Genet. 132–134.

Stephens, D. A.; Fisch, R. D. (1998) Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 1334–1347.

van Ooijen, J. W.; Sandbrink, J. M.; Vrielink; Verkerk; Zabel; et al. (1994) An RFLP linkage map of Lycopersicum peruvianum. Theor. Appl. Genet. 1007–1013.

Whitkus (1998) Genetics of adaptive radiation in Hawaiian and Cook Island species of Tetramolopium (Asteraceae). II. Genetic linkage map and its implications for interspecific breeding barriers. Genetics 150: 1209–1216.

Williams, C. G.; Savolainen (1996) Inbreeding depression in conifers: implications for breeding strategy. For. Sci. 102–117.

Zeng, Z.-B. (1994) Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
