Large-scale trends in the evolution of gene structures within 11 animal genomes

Mark Yandell et al. PLoS Comput Biol. 2006 Mar.

Abstract

We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans) together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales, ranging from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for "Comparative Genomics Library"). Our results demonstrate that change in intron–exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Global Overview of Gene Structure in Six Annotated Animal Genomes

(A) Intron length. Annotated intron length (log10) is plotted on the _x_-axis; the frequency at which introns of that length occur in an organism's genome is plotted on the _y_-axis. (B) Exon length. _x_-axis, coding-exon length in nucleotides; _y_-axis, frequency. (C) Intron density. A transcript's intron density is its number of coding introns divided by the length of the protein it encodes, expressed as introns per 100 amino acids. _x_-axis, intron density binned in increments of 0.5 introns/100 amino acids (see Materials and Methods); _y_-axis, frequency of annotated transcripts with a particular intron density. Deuterostomes are shown in shades of blue; protostomes in shades of red.
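The intron-density measure of panel C can be sketched in a few lines of Python. This is a minimal illustration, not CGL code: the `(n_coding_introns, protein_length_aa)` input format and all function names are hypothetical, and lower-edge-inclusive binning is an assumption, since the exact binning is described in the paper's Materials and Methods.

```python
import math
from collections import Counter

BIN_WIDTH = 0.5  # introns per 100 amino acids, the bin width stated for Figure 1C


def intron_density(n_coding_introns, protein_length_aa):
    """Coding introns per 100 amino acids of the encoded protein."""
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    return 100.0 * n_coding_introns / protein_length_aa


def density_bin(density, width=BIN_WIDTH):
    """Lower edge of the bin containing `density` (assumption: lower edge inclusive)."""
    return math.floor(density / width) * width


def density_histogram(transcripts):
    """transcripts: iterable of (n_coding_introns, protein_length_aa) tuples (hypothetical format).

    Returns {bin_lower_edge: fraction_of_transcripts}, ordered by bin.
    """
    counts = Counter(density_bin(intron_density(n, length)) for n, length in transcripts)
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())}
```

For example, a transcript with 3 coding introns encoding a 400-residue protein has a density of 0.75 introns/100 amino acids and is counted in the bin starting at 0.5.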

Figure 2. Cumulative Distribution Functions Illustrating Proteome-Wide Trends in Protein Similarity

_x_-axis, bits/aligned position; _y_-axis, cumulative fraction of HSPs having that number of bits/aligned amino acid pair or less. To facilitate display, only a subset of the 21 possible pairwise combinations is shown. Data are based upon all reciprocal best BLASTP hits identified in all-versus-all BLASTP searches of the proteomes. Similarity calculations were restricted to the highest-scoring HSP for each BLAST hit, to avoid counting overlapping alignments more than once. There were 13,339 M. musculus–H. sapiens reciprocal best hits; 6,435 between D. melanogaster and A. gambiae; 5,828 between C. intestinalis and H. sapiens; 5,542 between D. melanogaster and H. sapiens; 4,669 between C. elegans and H. sapiens; 4,588 between C. elegans and D. melanogaster; 3,361 between H. sapiens and A. thaliana; and 2,835 between C. elegans and A. thaliana. atha, A. thaliana; cele, C. elegans; cint, C. intestinalis; dmel, D. melanogaster; hsap, H. sapiens; mmus, M. musculus.
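A sketch of the reciprocal-best-hit filter and the cumulative distribution plotted in Figure 2, assuming each BLASTP HSP has already been parsed into a hypothetical `HSP` record. Two further assumptions: "best" means highest bit score, and bits/aligned position is taken as the HSP bit score divided by the number of alignment columns; the paper's exact definitions may differ.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """Hypothetical parsed BLASTP HSP."""
    query: str
    subject: str
    bit_score: float
    aligned_length: int  # assumption: number of alignment columns


def best_hsp_per_hit(hsps):
    """Keep only the highest-scoring HSP for each (query, subject) hit."""
    best = {}
    for h in hsps:
        key = (h.query, h.subject)
        if key not in best or h.bit_score > best[key].bit_score:
            best[key] = h
    return best


def best_hit_per_query(hsps):
    """Map each query to the best HSP of its best-scoring subject."""
    top = {}
    for h in best_hsp_per_hit(hsps).values():
        if h.query not in top or h.bit_score > top[h.query].bit_score:
            top[h.query] = h
    return top


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Best HSPs (from the A-vs-B search) of pairs that are each other's best hit."""
    top_ab = best_hit_per_query(a_vs_b)
    top_ba = best_hit_per_query(b_vs_a)
    return [h for h in top_ab.values()
            if h.subject in top_ba and top_ba[h.subject].subject == h.query]


def bits_per_position(h):
    # Assumption: bit score divided by alignment length
    return h.bit_score / h.aligned_length


def ecdf(values):
    """Return F(x) = fraction of values <= x, the quantity on the Figure 2 y-axis."""
    xs = sorted(values)
    return lambda x: bisect_right(xs, x) / len(xs)
```

Applying `ecdf` to `[bits_per_position(h) for h in reciprocal_best_hits(...)]` for one species pair yields one curve of the figure.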

Figure 3. Neighbor-Joining Trees Summarizing Proteome-Wide Trends in Protein Similarity and Genome-Wide Trends in Intron–Exon Structural Similarity

Proteome-wide trends in protein similarity (A), and genome-wide trends in intron–exon structural similarity (B). Numbers beneath tree nodes are bootstrap values.
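The trees in Figure 3 are built by neighbor joining. The standard algorithm of Saitou and Nei can be sketched as follows. It accepts any symmetric distance matrix, so how protein or intron–exon similarities are converted into distances is left open here (an assumption outside this sketch), and bootstrap resampling is not shown.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree.

    names: taxon labels (at least 3); dist: symmetric distance matrix (list of lists).
    Returns the tree as a Newick string; branch lengths are not clamped at zero.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    D = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Q-criterion: join the pair minimizing (n - 2) * d(i, j) - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        parent = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new parent to every remaining node
        du = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[a][b] for b in keep] + [du[idx]] for idx, a in enumerate(keep)]
        D.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Three nodes left: resolve them as a single unrooted trifurcation
    a, b, c = nodes
    la = (D[0][1] + D[0][2] - D[1][2]) / 2
    lb = (D[0][1] + D[1][2] - D[0][2]) / 2
    lc = (D[0][2] + D[1][2] - D[0][1]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"
```

For instance, with `names = list("abcde")` and the distance matrix `[[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]`, the function returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`.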

Figure 4. Intron–Exon Structures Evolve Largely Independently of Protein Sequences

_x_-axis, human reciprocal best-hit best HSPs for four representative proteomes binned by percent identity in 5% increments. _y_-axis, percent of aligned introns among the HSPs in each bin. cele, C. elegans; cint, C. intestinalis; dmel, D. melanogaster; hsap, H. sapiens; mmus, M. musculus.
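The binning behind Figure 4 might look like the following sketch. The per-HSP summary `(pct_identity, n_aligned_introns, n_introns_considered)` is a hypothetical input format, and pooling intron counts within each 5% bin (rather than averaging per-HSP percentages) is an assumption.

```python
from collections import defaultdict


def identity_bin(pct_identity, width=5):
    """Lower edge of the 5%-wide identity bin; 100% is folded into the top bin."""
    return min(int(pct_identity // width) * width, 100 - width)


def aligned_intron_percent_by_bin(hsps, width=5):
    """hsps: iterable of (pct_identity, n_aligned_introns, n_introns_considered).

    Returns {bin_lower_edge: percent of introns that are aligned}, pooling
    counts within each bin (an assumption about the aggregation).
    """
    aligned = defaultdict(int)
    total = defaultdict(int)
    for pid, n_aligned, n_total in hsps:
        b = identity_bin(pid, width)
        aligned[b] += n_aligned
        total[b] += n_total
    return {b: 100.0 * aligned[b] / total[b] for b in sorted(total) if total[b] > 0}
```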

Figure 5. Controlling for the Impact of Unequal Rates of Protein Evolution on the Evolution of Intron–Exon Structures

(A) Unrooted neighbor-joining tree based upon amino acid similarities for reciprocal best-hit best HSPs having 1.25 bits/aligned amino acid pair. (B) Unrooted neighbor-joining tree based upon similarities in the intron–exon structures of those same HSPs.

Figure 6. Lengths of Orthologous Introns within a Quartet Are More Correlated than Those of Paralogous Introns

(A) Quartet orthologous intron pairs. _x_-axis, length (log10) of introns in human members of each quartet; _y_-axis, length (log10) of corresponding orthologous introns in the mouse member of the same quartet. Spearman correlation coefficient: 0.903; p < 0.001. (B) Paralogous introns. _x_-axis, length (log10) of introns in human members of each quartet; _y_-axis, length (log10) of corresponding paralogous introns in the other human member of the same quartet. Spearman correlation coefficient: 0.140; p < 0.001. The mouse distributions are essentially identical to their human counterparts.
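Spearman's coefficient is the Pearson correlation of ranks, so taking log10 of intron lengths, a strictly increasing transform, leaves it unchanged; the log scale only affects the plots. A dependency-free sketch with tie-averaged ranks follows (function names are illustrative). In practice `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))
```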

Figure 7. D. melanogaster Intron Lengths Are Highly Correlated with Their Inferred D. pseudoobscura Orthologs; D. melanogaster Paralogous Introns Show No Such Correlation

(A) _x_-axis, length (log10) of annotated D. melanogaster introns; _y_-axis, length (log10) of their inferred orthologs in the D. pseudoobscura genome. Red circles indicate those introns containing a transposon in D. melanogaster; blue circles indicate those introns containing a transposon in D. pseudoobscura; gold circles indicate introns without identifiable transposons in either species. Spearman correlation coefficient: 0.637; p < 0.001. (B) Intron lengths of paralogs having the same intron–exon structure as judged by the positions of their splice junctions relative to the protein alignments of their reciprocal best-hit best HSPs. _x_-axis, length (log10) of introns in an annotated D. melanogaster gene; _y_-axis, length (log10) of corresponding paralogous introns. Spearman correlation coefficient: 0.448; p < 0.001.
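The test for shared intron–exon structure in panel B compares splice-junction positions after projecting them onto the protein alignment of the best HSP. A minimal sketch, assuming (hypothetically) that each intron is recorded as a 0-based residue index in its protein plus a phase (0, 1, or 2), and that two introns match only if they fall in the same alignment column with the same phase:

```python
def residue_to_column(aligned_seq):
    """Map 0-based ungapped residue index -> alignment column in a gapped sequence."""
    mapping = {}
    residue = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def intron_columns(aligned_seq, introns, start=0):
    """Project introns onto alignment columns.

    introns: iterable of (residue_index, phase) in full-protein coordinates
    (hypothetical representation). `start` is the protein coordinate of the
    first aligned residue. Introns outside the aligned region are ignored.
    """
    mapping = residue_to_column(aligned_seq)
    cols = set()
    for residue_index, phase in introns:
        local = residue_index - start
        if local in mapping:
            cols.add((mapping[local], phase))
    return cols


def same_intron_exon_structure(aln_a, introns_a, start_a, aln_b, introns_b, start_b):
    """True if both sequences have identical (column, phase) intron sets within the HSP."""
    return (intron_columns(aln_a, introns_a, start_a)
            == intron_columns(aln_b, introns_b, start_b))
```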

Figure 8. Correlation in Orthologous Intron Lengths Is Proportional to Time since Last Common Ancestor

From left to right, and top to bottom: Annotated D. melanogaster lengths (_x_-axis) versus inferred orthologous intron lengths (_y_-axis) for D. simulans (strain 6), D. yakuba, D. ananassae, D. pseudoobscura, and D. virilis. Bottom right-hand panel: Annotated D. melanogaster lengths (_x_-axis) versus inferred A. gambiae intron lengths (_y_-axis). The approximate time since the last common ancestor, estimated from protein data [30], is shown in red in the lower left-hand corner of each panel. Spearman correlation coefficients: 0.886, 0.863, 0.670, 0.637, 0.550, and 0.410 for D. simulans, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and A. gambiae distributions, respectively. p < 0.001 for each correlation coefficient. See Materials and Methods for analysis details.

Figure 9. Intron Lengths Can Be Used as a Molecular Clock

_y_-axis, magnitude of the Spearman correlation coefficient for the five Drosophila distributions shown in Figure 8. _x_-axis, time (millions of years) since last common ancestor based on protein similarities as calculated in [30]. Black bars above and below each data point denote observed variance in the data and were obtained by randomly resampling 1,000 orthologous intron pairs 100 times. Best-fitting line (shown in black): y = −0.0057x + 0.9266; R² = 0.9875.
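Reading Figure 9 as a molecular clock amounts to fitting a line to (time, correlation) points and inverting it. The sketch below shows an ordinary least-squares fit, inversion of the published fit y = −0.0057x + 0.9266, and the resampling used for the error bars. Drawing 1,000 pairs without replacement in each of the 100 replicates is an assumption, as are all function names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def time_from_rho(rho, slope=-0.0057, intercept=0.9266):
    """Invert the Figure 9 fit to estimate time (Myr) since the last common ancestor."""
    return (rho - intercept) / slope


def _ranks(v):
    """1-based ranks with ties averaged."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    rx, ry = _ranks(x), _ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def resampled_rho(pairs, n_pairs=1000, n_reps=100, seed=0):
    """Spearman rho over `n_reps` random subsets of `n_pairs` (length_a, length_b) pairs.

    Assumption: sampling without replacement. Returns (mean, variance) of the rhos.
    """
    rng = random.Random(seed)
    rhos = []
    for _ in range(n_reps):
        sample = rng.sample(pairs, min(n_pairs, len(pairs)))
        xs, ys = zip(*sample)
        rhos.append(spearman(xs, ys))
    return statistics.fmean(rhos), statistics.variance(rhos)
```

Inverting the line is only as reliable as the fit itself; applying it outside the range of divergence times used to build it would be a further assumption.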

References

    1. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, et al. The Sequence Ontology: A tool for the unification of genome annotations. Genome Biol. 2005;6:R44.
    2. de Souza SJ, Long M, Klein RJ, Roy S, Lin S, et al. Toward a resolution of the introns early/late debate: Only phase zero introns are correlated with the structure of ancient proteins. Proc Natl Acad Sci U S A. 1998;95:5094–5099.
    3. Stoltzfus A. Molecular evolution: Introns fall into place. Curr Biol. 2004;14:R351–R352.
    4. Parsch J. Selective constraints on intron evolution in Drosophila. Genetics. 2003;165:1843–1851.
    5. Qiu WG, Schisler N, Stoltzfus A. The evolutionary gain of spliceosomal introns: Sequence and phase preferences. Mol Biol Evol. 2004;21:1252–1263.
