Majority of divergence between closely related DNA samples is due to indels

Comparative Study

Proc Natl Acad Sci U S A. 2003 Apr 15;100(8):4661-5.

doi: 10.1073/pnas.0330964100. Epub 2003 Apr 2.

Roy J Britten et al. Proc Natl Acad Sci U S A. 2003.

Abstract

It was recently shown that indels are responsible for more than twice as many unmatched nucleotides as are base substitutions between samples of chimpanzee and human DNA. A larger sample has now been examined and the result is similar. The number of indels is approximately 1/12th of the number of base substitutions and the average length of the indels is 36 nt, including indels up to 10 kb. The ratio (R(u)) of unpaired nucleotides attributable to indels to those attributable to substitutions is 3.0 for this 2 million-nt chimp DNA sample compared with human. There is similar evidence of a large value of R(u) for sea urchins from the polymorphism of a sample of Strongylocentrotus purpuratus DNA (R(u) = 3-4). Other work indicates that similarly, per nucleotide affected, large differences are seen for indels in the DNA polymorphism of the plant Arabidopsis thaliana (R(u) = 51). For the insect Drosophila melanogaster a high value of R(u) (4.5) has been determined. For the nematode Caenorhabditis elegans the polymorphism data are incomplete but high values of R(u) are likely. Comparison of two strains of Escherichia coli O157:H7 shows a preponderance of indels. Because these six examples are from very distant systematic groups the implication is that in general, for alignments of closely related DNA, indels are responsible for many more unmatched nucleotides than are base substitutions. Human genetic evidence suggests that indels are a major source of gene defects, indicating that indels are a significant source of evolutionary change.
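The chimp-human figures in the abstract are mutually consistent, which a short calculation makes visible. The sketch below assumes that each base substitution contributes exactly one unmatched nucleotide and that R(u) can be approximated as the indel-to-substitution count ratio multiplied by the mean indel length; the function name `unpaired_ratio` is hypothetical and is not taken from the paper.

```python
def unpaired_ratio(indels_per_substitution, mean_indel_length_nt):
    """Approximate R(u): unmatched nucleotides due to indels divided by
    unmatched nucleotides due to substitutions.

    Assumption: every substitution accounts for one unmatched nucleotide,
    so the ratio reduces to (indel count / substitution count) * mean length.
    """
    return indels_per_substitution * mean_indel_length_nt


# Values quoted in the abstract for the chimp-human sample:
# about 1 indel per 12 substitutions, mean indel length 36 nt.
r_u = unpaired_ratio(1 / 12, 36)
print(round(r_u, 2))
```

With these inputs the product is 36/12, in line with the R(u) of 3.0 reported for the 2 million-nt sample.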

Figures

Figure 1

The raw data on gaps between chimp and human alignments. Shown is a log–log plot of the number of gaps of a given size as a function of size. The vertical axis is the number of gaps and the horizontal axis is the gap length in nucleotides. The line near the bottom consists of the larger gaps, each of which occurs only once at its length. Gaps >5 kb are uncertain.

Figure 2

The density of gaps vs. gap size. Shown is a log–log plot of the density function D_k against gap size. The horizontal axis is gap length in nucleotides. The vertical axis is the density function: the number of gaps of a given size divided by the spacing in length between gaps, where the spacing is the average of the difference in length to the next smaller gap and the difference in length to the next larger gap. Shown are gaps <5 kb.
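The density function described in the Figure 2 caption can be sketched in a few lines. The gap counts below are hypothetical illustration values, not data from the paper, and the function name `gap_density` is invented here. The caption does not say how the smallest and largest observed gap lengths are handled, so this sketch simply skips them (an assumption).

```python
def gap_density(counts_by_length):
    """Return (gap_length, density) pairs for interior observed gap lengths.

    counts_by_length maps a gap length in nt to the number of gaps of that
    length. Density is the count divided by the local spacing, where spacing
    is the mean of the distance to the next smaller and the next larger
    observed gap length. Endpoints are skipped (assumption: the caption does
    not define a spacing for them).
    """
    lengths = sorted(counts_by_length)
    densities = []
    for i in range(1, len(lengths) - 1):
        below = lengths[i] - lengths[i - 1]
        above = lengths[i + 1] - lengths[i]
        spacing = (below + above) / 2
        densities.append((lengths[i], counts_by_length[lengths[i]] / spacing))
    return densities


# Hypothetical counts: many short gaps, a few long ones seen only once.
toy_counts = {1: 100, 2: 60, 3: 30, 10: 4, 50: 1, 200: 1}
density = gap_density(toy_counts)
for length, d in density:
    print(length, d)
```

Dividing by the spacing compensates for the sparse sampling of long gaps: a single gap of a rare length stands in for a wide interval of unobserved lengths, so its density is correspondingly low.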

Figure 3

The cumulative total of the length of gaps vs. gap size. The number of gaps of a given size is multiplied by the length of the gap and added to the previous total to obtain the cumulative total. The horizontal logarithmic axis is the gap size and the vertical logarithmic axis is the cumulative total. It is clear that the larger gaps contribute heavily. The last four points represent sparse data because long gaps are difficult to measure. New data could easily raise this part of the curve.
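The running total described in the Figure 3 caption can be sketched as follows. The gap counts are hypothetical illustration values rather than the paper's data, and the function name `cumulative_gap_length` is invented for this example.

```python
def cumulative_gap_length(counts_by_length):
    """Return (gap_length, cumulative_nt) pairs in order of increasing length.

    counts_by_length maps a gap length in nt to the number of gaps of that
    length. At each length, count * length is added to the running total.
    """
    total = 0
    running = []
    for length in sorted(counts_by_length):
        total += length * counts_by_length[length]
        running.append((length, total))
    return running


# Hypothetical counts: many short gaps, a few long ones seen only once.
toy_counts = {1: 100, 2: 60, 3: 30, 10: 4, 50: 1, 200: 1}
cumulative = cumulative_gap_length(toy_counts)
for length, total_nt in cumulative:
    print(length, total_nt)
```

In this toy input the single 200-nt gap adds twice as many nucleotides as the hundred 1-nt gaps combined, which illustrates how a few long gaps can dominate the total.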

