Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes

W James Kent et al. Proc Natl Acad Sci U S A. 2003.

Abstract

This study examines genomic duplications, deletions, and rearrangements that have occurred at scales ranging from a single base to complete chromosomes by comparing the mouse and human genomes. From whole-genome sequence alignments, 344 large (>100-kb) blocks of conserved synteny are evident, but these are further fragmented by smaller-scale evolutionary events. Excluding transposon insertions, on average in each megabase of genomic alignment we observe two inversions, 17 duplications (five tandem or nearly tandem), seven transpositions, and 200 deletions of 100 bases or more. This includes 160 inversions and 75 duplications or transpositions of length >100 kb. The frequencies of these smaller events are not substantially higher in finished portions of the assembly. Many of the smaller transpositions are processed pseudogenes; we define a "syntenic" subset of the alignments that excludes these and other small-scale transpositions. These alignments provide evidence that approximately 2% of the genes in the human/mouse common ancestor have been deleted or partially deleted in the mouse. There also appears to be slightly less nontransposon-induced genome duplication in the mouse than in the human lineage. Although some of the events we detect may be due to misassemblies or missing data in the current genome sequences, or to the limitations of our methods, most are likely to represent genuine evolutionary events. To make these observations, we developed new alignment techniques that can handle large gaps robustly and discriminate between orthologous and paralogous alignments.

Figures

Fig. 1.

Mouse/human alignments at Actinin α-3 before and after chaining and netting, as displayed in the genome browser at http://genome.ucsc.edu. The RefSeq genes track shows the exon/intron structure of this human gene, which has an ortholog as well as several paralogs and pseudogenes in the mouse. The all blastz Mouse track shows blastz alignments colored by mouse chromosome; the orthologous gene is on mouse chromosome 19, which is colored purple. Although blastz detects the homology very sensitively, its alignments are fragmented. The chained blastz track shows the alignments after chaining, which links related fragments so that the orthologous gene and each paralog appear as a single piece. Chaining also merges some redundant alignments and eliminates a few very low-scoring isolated alignments. The Mouse/Human Alignment Net track is designed to show only the orthologous alignments. In this case, there has been no rearrangement other than moderate-sized insertions and deletions, so the net track is quite simple. Clicking on a chain or net track opens a new browser on the corresponding region in the other species.
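The chaining idea (linking fragmented local alignments that are colinear in both genomes into one high-scoring chain) can be illustrated with a minimal sketch. It assumes gapless blocks on a single strand, a hypothetical linear gap penalty (`per_base`), and a simple O(n²) dynamic program; it is not the scoring scheme or the faster algorithm actually used to build the chained blastz track, and `Block`, `gap_cost`, and `best_chain` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A gapless local alignment block (hypothetical stand-in for a blastz hit).

    Coordinates are 0-based, half-open; both sequences are assumed to be
    on the same strand.
    """
    t_start: int  # start in the target sequence (e.g. human)
    t_end: int
    q_start: int  # start in the query sequence (e.g. mouse)
    q_end: int
    score: float


def gap_cost(prev: Block, nxt: Block, per_base: float = 0.1) -> float:
    """Hypothetical linear penalty on the larger of the two gaps between blocks."""
    dt = nxt.t_start - prev.t_end
    dq = nxt.q_start - prev.q_end
    return per_base * max(dt, dq)


def best_chain(blocks: list[Block], per_base: float = 0.1) -> tuple[float, list[Block]]:
    """Return the highest-scoring colinear chain via O(n^2) dynamic programming."""
    if not blocks:
        return 0.0, []
    bs = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in bs]  # best chain score ending at each block
    prev_idx = [-1] * len(bs)     # back-pointer to the previous block in that chain
    for j, b in enumerate(bs):
        for i in range(j):
            a = bs[i]
            # Colinear: block a must end before block b starts in BOTH sequences.
            if a.t_end <= b.t_start and a.q_end <= b.q_start:
                cand = best[i] + b.score - gap_cost(a, b, per_base)
                if cand > best[j]:
                    best[j] = cand
                    prev_idx[j] = i
    # Trace back from the block that ends the best chain.
    end = max(range(len(bs)), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(bs[k])
        k = prev_idx[k]
    chain.reverse()
    return best[end], chain


a = Block(0, 100, 0, 100, 100.0)
b = Block(150, 250, 160, 260, 100.0)
c = Block(300, 400, 50, 150, 80.0)
score, chain = best_chain([c, a, b])
```

Here `a` and `b` are linked into one chain (100 + 100 − 0.1 × 60 = 194), while `c` is left out because its mouse coordinates run backward relative to the other two. In a real pipeline such a block would instead seed a separate chain, which is how paralogs and rearrangements end up as distinct pieces.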

Fig. 2.

(a) Small insertions and deletions. (b) Large insertions and deletions and transposons. Counts of genome-wide insertions and deletions plotted against their size. The blue line combines human insertions and mouse deletions, whereas the red line combines mouse insertions and human deletions. The vertical scale is the natural logarithm of the number of insertions/deletions of each size. In b, the counts are grouped into size bins 25 bases wide. The green line shows the percentage of bases in mouse gaps of each size that are covered by human-specific transposons. The peaks in the insertion/deletion curves appear to be due to transposons.
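The plotted quantity (natural log of the number of indels at each size, optionally pooled into 25-base bins) can be sketched as below. This is a minimal illustration assuming indel sizes are already available as a list of base counts for one direction (e.g. the blue-line set); the function name and the choice of bin lower bounds at multiples of the bin width are hypothetical.

```python
import math
from collections import Counter


def log_size_histogram(gap_sizes, bin_width=1):
    """Map each size bin's lower bound to the natural log of its count.

    bin_width=1 gives one bin per size (as in panel a); bin_width=25 pools
    sizes into 25-base bins (as in panel b). Empty bins are omitted because
    log(0) is undefined.
    """
    counts = Counter((size // bin_width) * bin_width for size in gap_sizes)
    return {start: math.log(n) for start, n in sorted(counts.items())}


# Hypothetical indel sizes (bases) for one direction of the comparison.
hist = log_size_histogram([1, 1, 2, 30, 40, 60], bin_width=25)
```

With these toy sizes the bins start at 0, 25, and 50 and hold 3, 2, and 1 indels, so their values are ln 3, ln 2, and 0. The log scale is what makes transposon-driven peaks stand out against an otherwise smoothly decaying size distribution.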

Fig. 3.

(a) Log histogram of gap frequencies for gaps <50 bases long. (b) Log histogram of gap frequencies for gaps up to 1,500 bases long. (c) Log histogram of gap frequencies for gaps up to 50,000 bases long. Each panel shows the relative frequencies of simultaneous gaps (in both sequences) and of single gaps (in only one sequence). The horizontal axis is used for gaps in the mouse sequence, which represent either insertions in human or deletions in mouse. The vertical axis is used for gaps in human. The log of the number of simultaneous gaps in a particular size range is converted into a level of gray to create a 2D histogram. The axes themselves are not drawn; the dark lines where they would be reflect the log frequencies of gaps that are in only one sequence. (a) Gaps of <50 bases. Gaps that are purely in mouse or purely in human are especially prominent here. (b) Gaps of 10–1,500 bases. The transposon-induced effects in Fig. 2 can also be seen here. Note also the relative concentration near the diagonal for inserts of <200 bases; this concentration is mostly due to small inversions and locally divergent sequence. (c) Gaps of 1,000–50,000 bases. In this range, the log frequency of simultaneous gaps of a given combined (human and mouse) gap size differs roughly by a constant from the sum of the log frequencies of the individual one-sided gap sizes in each species. In this sense, these longer simultaneous gaps behave as if they arose from independent gaps in each species.
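The independence observation for panel c can be written as log N_both(m, h) ≈ c + log N_mouse(m) + log N_human(h), where m and h are the gap sizes in the mouse and human sequences. A minimal sketch of checking this follows. It assumes each alignment gap is given as a hypothetical `(mouse_gap, human_gap)` pair of sizes in bases (zero meaning no gap in that sequence) and that the same binning is applied to one-sided and simultaneous gaps; the function names are illustrative only.

```python
import math
from collections import Counter


def gap_log_tables(gap_pairs, bin_width=1):
    """Split (mouse_gap, human_gap) pairs into one-sided and simultaneous gaps.

    Returns log-count dicts for gaps only in mouse, gaps only in human, and
    simultaneous gaps (a 2D histogram keyed by (mouse_bin, human_bin)).
    """
    def to_bin(x):
        return (x // bin_width) * bin_width

    mouse_only, human_only, both = Counter(), Counter(), Counter()
    for m, h in gap_pairs:
        if m > 0 and h > 0:
            both[(to_bin(m), to_bin(h))] += 1
        elif m > 0:
            mouse_only[to_bin(m)] += 1
        elif h > 0:
            human_only[to_bin(h)] += 1

    def log_counts(counter):
        return {k: math.log(v) for k, v in counter.items()}

    return log_counts(mouse_only), log_counts(human_only), log_counts(both)


def independence_residuals(gap_pairs, bin_width=1):
    """Compute log N_both(m, h) - log N_mouse(m) - log N_human(h) per 2D bin.

    Only bins observed in all three tables are returned. If simultaneous
    gaps behave like independent one-sided gaps, the residuals are roughly
    constant across bins.
    """
    lm, lh, lb = gap_log_tables(gap_pairs, bin_width)
    return {
        (m, h): v - lm[m] - lh[h]
        for (m, h), v in lb.items()
        if m in lm and h in lh
    }


# Toy data built so that N_both(m, h) = N_mouse(m) * N_human(h) exactly.
pairs = ([(10, 0)] * 2 + [(20, 0)] * 4 + [(0, 10)] * 3 + [(0, 20)] * 6
         + [(10, 10)] * 6 + [(10, 20)] * 12 + [(20, 10)] * 12 + [(20, 20)] * 24)
res = independence_residuals(pairs)
```

In the toy data every residual is 0, the idealized form of "differs roughly by a constant." On real gaps one would expect scatter around a constant in the 1,000–50,000-base range and systematic departures for short gaps, such as the near-diagonal concentration described for panel b.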

Fig. 4.

A 15,000-base inversion containing two transcripts, shown at chr7:2077222–2497100 in the November 2002 assembly of the human genome. Numerous smaller rearrangements are also visible in the net track. Some of the smaller ones simply represent paralogous mouse regions filling in where the orthologous mouse region has not yet been sequenced.

Fig. 5.

Distribution of the spans of all 147,445 chains in the human net. The distribution consists of a bimodal portion for short chains with spans <10⁵ bases and a flat tail of 579 long chains with spans between 10⁵ and ≈10⁸ bases.
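Because chain spans range over several orders of magnitude, a distribution like this is naturally summarized with logarithmically spaced bins. The sketch below assumes each chain's span is already available as a length in bases along the human sequence. The number of bins per decade is a hypothetical choice; only the 10⁵-base boundary between short and long chains comes from the caption.

```python
import math
from collections import Counter


def log10_span_histogram(spans, bins_per_decade=4):
    """Count chains in log-spaced span bins.

    Keys are each bin's lower edge in log10 units (e.g. 5.25 means spans
    from 10**5.25 up to, but not including, 10**5.5 bases).
    Non-positive spans are skipped.
    """
    counts = Counter(
        math.floor(math.log10(s) * bins_per_decade) for s in spans if s > 0
    )
    return {k / bins_per_decade: n for k, n in sorted(counts.items())}


def split_short_long(spans, threshold=10**5):
    """Return (number of short chains, number of long chains) around threshold."""
    short = sum(1 for s in spans if s < threshold)
    return short, len(spans) - short


# Hypothetical chain spans in bases.
spans = [150, 200, 5_000, 250_000, 3_000_000]
hist = log10_span_histogram(spans)
counts = split_short_long(spans)
```

On these toy spans, three chains fall below 10⁵ bases and two above. Applied to the 147,445 human-net spans, the caption implies the long count would be 579.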
