Indel-based evolutionary distance and mouse-human divergence

Comparative Study

Aleksey Y Ogurtsov et al. Genome Res. 2004 Aug.

Abstract

We propose a method for estimating the evolutionary distance between DNA sequences in terms of insertions and deletions (indels), defined as the per site number of indels accumulated in the course of divergence of the two sequences. We derive a maximal likelihood estimate of this distance from differences between lengths of orthologous introns or other segments of sequences delimited by conservative markers. When indels accumulate, lengths of orthologous introns diverge only slightly slower than linearly, because long indels occur with substantial frequencies. Thus, saturation is not a major obstacle for estimating indel-based evolutionary distance. For introns of medium lengths, our method recovers the known evolutionary distance between rat and mouse, 0.014 indels per site, with good precision. We estimate that mouse-human divergence exceeds rat-mouse divergence by a factor of 4, so that mouse-human evolutionary distance in terms of selectively neutral indels is 0.056. Because in mammals, indels are approximately 14 times less frequent than nucleotide substitutions, mouse-human evolutionary distance in terms of selectively neutral substitutions is approximately 0.8.
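The arithmetic behind the last two estimates in the abstract can be retraced in a few lines. This is a minimal sketch that only multiplies the three figures quoted above; the constant and function names are illustrative assumptions, and nothing here reproduces the likelihood method itself.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
RAT_MOUSE_INDEL_DISTANCE = 0.014  # indels per site, rat-mouse
MOUSE_HUMAN_FACTOR = 4            # mouse-human divergence relative to rat-mouse
SUBSTITUTIONS_PER_INDEL = 14      # approximate substitution-to-indel ratio in mammals


def scale_distance(distance: float, factor: float) -> float:
    """Rescale an evolutionary distance by a multiplicative factor."""
    return distance * factor


# Mouse-human distance in selectively neutral indels per site.
mouse_human_indel = scale_distance(RAT_MOUSE_INDEL_DISTANCE, MOUSE_HUMAN_FACTOR)

# Mouse-human distance in selectively neutral substitutions per site.
mouse_human_subst = scale_distance(mouse_human_indel, SUBSTITUTIONS_PER_INDEL)

print(f"mouse-human indel distance:        {mouse_human_indel:.3f}")
print(f"mouse-human substitution distance: {mouse_human_subst:.2f}")
```

The product 0.014 × 4 × 14 is 0.784, which the abstract rounds to approximately 0.8.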

Copyright 2004 Cold Spring Harbor Laboratory Press ISSN

Figures

Figure 1

Lengths of individual indels. p_m(δ) and p_h(δ), distributions of lengths of all indels in all alignments (A) of rat–mouse (blue line) and human–OWM (red line) intron pairs. p_+(δ) and p_−(−δ), distributions of the absolute value of length of indels of only positive lengths (red line) and only negative lengths (blue line) in all rat–mouse alignments (B). P(δ) = (p_+(δ) + p_−(−δ))/2, the averaged distribution of the absolute value of length of indels with positive and negative lengths in all rat–mouse (blue line) and human–OWM (red line) alignments (C). The same as the previous figure, but indels were recorded only in those parts of alignments where neither of the two sequences was masked by RepeatMasker (D). P(δ) in all rat–mouse alignments, multiplied by δ^2 (E). Properties of distributions P(δ) obtained for rat–mouse pairs of introns with the following average lengths: 0–100, 100–200, 200–400, ..., 6400–12800. For each distribution, fractions of indels of length 1 and of indels longer than 100, 300, and 1000 nucleotides are shown (F).

Figure 2

Data on rat–mouse pairs of orthologous introns with different numbers of accumulated indels, k. Numbers and average length L of intron pairs (A). Data on M(Δ) (decreasing lines) and Med(|Δ|) (increasing lines) in all intron alignments (rugged lines) compared with theoretical predictions (equation 1; smooth lines) obtained with a = 0.5 (blue lines), 0.46 (green lines), and 0.38 (red lines) under P(δ) (equation 7) for intron pairs with the average lengths between 150 and 2500 (B), or with P(δ) for intron pairs of average lengths >150 (blue lines), between 150 and 2500 (green lines), and <2500 (red lines) under a = 0.46 (C).

Figure 3

Properties of intron pairs as functions of their average length, L. Numbers of introns with different values of L (in bins of size 50), and the corresponding M(Δ) (decreasing lines) and Med(|Δ|) (increasing lines) are shown for rat–mouse (A) and mouse–human (B) intron pairs.

Figure 4

The relationship between M(Δ) and Med(|Δ|) in intron pairs with different L (as in Fig. 3) in rat–mouse (A) and mouse–human (B) intron pairs, compared with theoretical predictions (equation 1) obtained under P(δ) calculated for intron pairs with 150 < L < 2500 and several values of a.

Figure 5

Indel-based evolutionary distance q for intron pairs of different average lengths L (in bins of size 100, data points are shown at the top boundaries of bins; for each bin, its own P(δ) was used). For rat and mouse, actual data (red line) and the maximal likelihood estimate of q (black line, a = 0.46) are shown. For mouse and human, estimates of q under a = 0.46, 0.42, and 0.38 are shown. The blue line shows the ratio of mouse–human over rat–mouse estimates of q. The green line shows the same ratio, computed for only those parts of mouse and human intron sequences that are not masked by RepeatMasker, on the basis of P(δ), calculated from repeat-free parts of rat–mouse alignments.

Figure 6

Length differences between rat and mouse introns, and between mouse and human introns that belong to the same rat–mouse–human triplet of orthologous introns.
