Dynamic evolution of base composition: causes and consequences in avian phylogenomics

Benoit Nabholz et al. Mol Biol Evol. 2011 Aug.

Abstract

Resolving the phylogenetic relationships among birds is a classical problem in systematics, particularly for the relationships among Neoaves. Previous phylogenetic inference in birds has been limited to mitochondrial genomes or a few nuclear genes. Here, we use next-generation technology for deep sequencing of brain transcriptomes from nine bird species (several passerines, hummingbirds, dove, parrot, and emu) to understand features of transcriptome evolution in birds and how they affect phylogenetic inference, and we combine these data with data from two bird species sequenced using first-generation technology. The phylogenomic data matrix comprises 1,995 genes and a total of 0.77 Mb of exonic sequence. First, we find unexpected heterogeneity in the evolution of base composition among avian lineages: guanine + cytosine (GC) content at the third codon position increases markedly in several independent lineages, with the strongest effect in passerines. Second, we evaluate the effect of GC content variation on phylogenetic reconstruction. We find important inconsistencies between the topologies obtained with and without accounting for GC variation; each topology supports different conclusions of past studies and also influences hypotheses on the evolution of vocal learning. Third, we demonstrate a link between GC content evolution and recombination rate and, focusing on the zebra finch lineage, find that recombination seems to drive GC content. Although we cannot establish the causal relationship, this observation is consistent with the model of GC-biased gene conversion. Finally, we use this unparalleled amount of avian sequence data, calibrated with fossil evidence and augmented with alligator transcriptome sequences, to study the rate of molecular evolution. Substitution rates vary 2- to 3-fold among lineages, with passerines evolving fastest and ratites slowest. This study illustrates the potential of next-generation sequencing for phylogenomic studies, but also the pitfalls of using genome-wide data with heterogeneous base composition.

Figures

FIG. 1.

GC content at the first, second, and third codon positions (GC1, GC2, and GC3), shown as light-gray, dark-gray, and black boxes, respectively. Dashed red lines show the observed values (number of Gs + Cs divided by the sequence length). Boxes give the quartiles of the GC content distribution obtained by bootstrapping the original matrix (100 replicates); whiskers extend to the most extreme bootstrap values. Gray dots are values corrected for missing data (see text).
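The GC1/GC2/GC3 values and bootstrap boxes described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a single in-frame coding sequence, resamples codons with replacement (the study bootstraps its full alignment matrix), and uses a hypothetical `exclude_missing` flag to contrast the raw "Gs + Cs divided by sequence length" value with a value that ignores gaps and ambiguous bases. The function names and the toy sequence are hypothetical.

```python
import math
import random
import statistics

def gc_by_codon_position(seq, exclude_missing=True):
    """Return (GC1, GC2, GC3) for an in-frame coding sequence.

    With exclude_missing=True the denominator counts only A/C/G/T bases, so
    gaps ('-') and ambiguous bases ('N', ...) do not dilute GC content.
    With exclude_missing=False the denominator is every site at that codon
    position (the uncorrected "Gs + Cs / sequence length" value).
    """
    seq = seq.upper()
    values = []
    for pos in range(3):
        column = seq[pos::3]  # every third base, starting at codon position pos
        gc = sum(1 for b in column if b in "GC")
        total = sum(1 for b in column if b in "ACGT") if exclude_missing else len(column)
        values.append(gc / total if total else math.nan)
    return tuple(values)

def bootstrap_gc3(seq, replicates=100, seed=0):
    """Resample codons with replacement; return (quartiles, minimum, maximum) of GC3,
    i.e. the box and whisker values of a bootstrap box plot."""
    rng = random.Random(seed)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    values = []
    for _ in range(replicates):
        sample = "".join(rng.choice(codons) for _ in codons)
        gc3 = gc_by_codon_position(sample)[2]
        if not math.isnan(gc3):  # skip replicates with no usable third positions
            values.append(gc3)
    return statistics.quantiles(values, n=4, method="inclusive"), min(values), max(values)

# Hypothetical toy sequence (codons ATG GCC GAT TAA), not data from the study.
print(gc_by_codon_position("ATGGCCGATTAA"))  # (0.5, 0.25, 0.5)
```

Ignoring missing sites in the denominator is one plausible reading of "corrected for missing data"; the exact correction used in the study is described in its main text.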

FIG. 2.

(A) Distribution of GC3 in 5 Mb windows according to their position in the zebra finch genome. Boxes give the quartiles of the distribution; whiskers extend to the most extreme values obtained by bootstrapping. (B) Relationship between mean GC3 in 5 Mb windows and the corresponding standard deviation.
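The window summary behind panels (A) and (B) can be sketched as below: genes are binned into 5 Mb windows by genomic coordinate, and the mean and standard deviation of GC3 are computed per window. This assumes each gene is represented by a hypothetical `(chromosome, start_bp, gc3)` tuple and is assigned to the window containing its start coordinate; the gene list is invented for illustration.

```python
import math
import statistics
from collections import defaultdict

WINDOW_BP = 5_000_000  # 5 Mb windows, as in the figure

def window_gc3_stats(genes, window_bp=WINDOW_BP):
    """Group genes into fixed windows and summarize GC3 per window.

    genes: iterable of (chromosome, start_bp, gc3) tuples. A gene is assigned
    to the window containing its start coordinate (an assumption; midpoints
    would work equally well).
    Returns {(chromosome, window_index): (mean_gc3, sd_gc3, n_genes)}.
    """
    bins = defaultdict(list)
    for chrom, start, gc3 in genes:
        bins[(chrom, start // window_bp)].append(gc3)
    summary = {}
    for key, values in bins.items():
        # The sample SD needs at least two genes; report NaN otherwise.
        sd = statistics.stdev(values) if len(values) > 1 else math.nan
        summary[key] = (statistics.fmean(values), sd, len(values))
    return summary

# Hypothetical genes (chromosome, start, GC3), for illustration only.
genes = [("chr1", 100_000, 0.60), ("chr1", 4_900_000, 0.70), ("chr1", 5_100_000, 0.50)]
for (chrom, idx), (mean, sd, n) in sorted(window_gc3_stats(genes).items()):
    print(chrom, idx * WINDOW_BP, round(mean, 3), sd, n)
```

Plotting each window's mean against its SD gives the kind of relationship shown in panel (B).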

FIG. 3.

Maximum-likelihood (ML) tree obtained with a GTR + Gamma4 model and the complete nucleotide data set. Scale indicates substitutions per site.

FIG. 4.

Phylogenetic trees based on transcriptome data. (A) ML tree obtained with a GTR + Gamma4 model and the third-codon-position data set. (B) ML tree obtained with a GTR + Gamma4 model and the first- and second-codon-position data set. (C) ML tree obtained with a GTR + Gamma4 model and the nucleotide data set recoded as RY (purines vs. pyrimidines). Values are site bootstrap support (1,000 replicates). Scale indicates substitutions per site; note the difference in scale between trees (A) and (B).
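RY coding, mentioned for panel (C), maps each nucleotide to its chemical class: purines (A, G) become R and pyrimidines (C, T) become Y. Because G and C fall into different classes (as do A and T), a GC-rich and an AT-rich sequence can look identical after recoding, which is why this coding damps the GC-content signal; transitions (A↔G, C↔T) also become invisible. The sketch below is a hypothetical helper; its handling of gaps and ambiguity codes is an assumption.

```python
# Purines (A, G) -> R; pyrimidines (C, T) -> Y. IUPAC R/Y codes and gaps pass
# through unchanged; any other character (N, other ambiguity codes) becomes 'N'.
_RY_CODE = {"A": "R", "G": "R", "R": "R", "C": "Y", "T": "Y", "Y": "Y", "-": "-"}

def ry_recode(seq):
    """Return the RY-recoded version of a nucleotide sequence."""
    return "".join(_RY_CODE.get(base, "N") for base in seq.upper())

# A GC-rich and an AT-rich sequence become indistinguishable after recoding.
print(ry_recode("GGCC"), ry_recode("AATT"))  # RRYY RRYY
```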

FIG. 5.

Unrooted proteome tree. Majority-rule consensus tree from Bayesian phylogenetic inference under the CAT + Gamma4 mixture model, run with the software PHYLOBAYES. Values behind the nodes are site bootstrap support (100 replicates); values in italics in front of the nodes are gene bootstrap support (100 replicates). Branch lengths are ML estimates under the WAG + Gamma8 model. Scale indicates substitutions per site.
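The caption reports two kinds of bootstrap support. A site bootstrap resamples individual alignment columns from the concatenated matrix, whereas a gene bootstrap resamples whole genes with replacement and concatenates them, so it reflects conflict among genes rather than among sites. The sketch below only builds one pseudo-replicate of each kind; in practice a tree is inferred from every replicate and support is the percentage of replicates recovering a node. The data layout (a list of `{taxon: aligned_sequence}` dicts) and all names are hypothetical.

```python
import random

def concatenate(genes):
    """genes: list of dicts {taxon: aligned sequence}; all dicts share the same taxa."""
    taxa = list(genes[0])
    return {t: "".join(g[t] for g in genes) for t in taxa}

def site_bootstrap(genes, rng):
    """One pseudo-replicate: resample columns of the concatenated matrix."""
    matrix = concatenate(genes)
    length = len(next(iter(matrix.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {t: "".join(seq[i] for i in columns) for t, seq in matrix.items()}

def gene_bootstrap(genes, rng):
    """One pseudo-replicate: resample whole genes, then concatenate them."""
    picked = [rng.choice(genes) for _ in genes]
    return concatenate(picked)

# Hypothetical two-gene, two-taxon alignment, for illustration only.
genes = [{"taxonA": "AAAA", "taxonB": "AAAT"}, {"taxonA": "CC", "taxonB": "CG"}]
rng = random.Random(42)
print(site_bootstrap(genes, rng))
print(gene_bootstrap(genes, rng))
```

Note that gene-bootstrap replicates can differ in total length when genes differ in length, while site-bootstrap replicates always keep the original length.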

FIG. 6.

(A) Estimated ancestral GC3 at each node and current GC3 for each species. (B) Equilibrium GC3 (GC3*) estimated for each branch of the avian phylogeny. Values in brackets are 95% bootstrap CIs (100 replicates).
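Equilibrium GC content has a simple meaning under a two-state model in which weak sites (A/T) become strong (G/C) at rate λ and strong sites become weak at rate μ: GC content evolves as dGC/dt = λ(1 − GC) − μ·GC, so it relaxes exponentially toward GC* = λ / (λ + μ). A branch whose GC3* exceeds its current GC3 is therefore GC-enriching. The sketch below uses a crude counting estimate from hypothetical substitution counts; it ignores multiple hits and uncertainty in ancestral reconstruction and is not the estimation procedure used in the study. Function names and counts are hypothetical.

```python
import math

def equilibrium_gc(ws_subs, sw_subs, w_sites, s_sites):
    """Counting estimate of equilibrium GC content (GC*) for one branch.

    ws_subs: number of A/T -> G/C (weak-to-strong) substitutions on the branch
    sw_subs: number of G/C -> A/T (strong-to-weak) substitutions on the branch
    w_sites, s_sites: numbers of A/T and G/C sites in the ancestral sequence
    """
    lam = ws_subs / w_sites  # W->S substitutions per weak site
    mu = sw_subs / s_sites   # S->W substitutions per strong site
    if lam + mu == 0:
        raise ValueError("no W<->S substitutions: GC* is undefined")
    return lam / (lam + mu)

def expected_gc(gc0, gc_star, rate_sum, t):
    """Two-state model: GC content relaxes exponentially from gc0 toward gc_star,
    with rate_sum = lambda + mu expressed in the same time units as t."""
    return gc_star + (gc0 - gc_star) * math.exp(-rate_sum * t)

# Hypothetical counts: 600 A/T and 400 G/C ancestral third positions (GC3 = 0.40),
# 30 W->S and 10 S->W substitutions on the branch.
gc_star = equilibrium_gc(30, 10, 600, 400)
print(round(gc_star, 3))  # 0.667: GC3* > GC3, so this branch is GC-enriching
```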

FIG. 7.

Relationship between recombination rate (in cM/Mb, sex-averaged; from Backström et al. 2010) and (A) current GC3 and (B) equilibrium GC3 (GC3*). Circle sizes are proportional to the alignment length in each 5 Mb window. Red lines are lowess fits in which every window has the same weight.
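A lowess (locally weighted scatterplot smoothing) curve like the red lines can be sketched as a local linear regression at each point, with tricube weights over the nearest fraction of the data. This is a simplified, single-pass version without the robustness iterations that R's `lowess()` and statsmodels' `lowess` perform by default, and the smoothing fraction used in the figure is not stated, so `frac=2/3` (R's default) is an assumption. Every window enters with the same prior weight, as in the caption; the recombination and GC3 values in the usage lines are hypothetical.

```python
import math

def lowess(x, y, frac=2 / 3):
    """Single-pass LOWESS: a local linear fit at each x with tricube weights.

    Simplified sketch with no robustness iterations. Every point enters with
    the same prior weight. Returns fitted values in the input order.
    """
    n = len(x)
    r = min(n, max(2, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for xi in x:
        # Bandwidth: distance to the r-th nearest point (guard against zero).
        h = sorted(abs(xj - xi) for xj in x)[r - 1] or 1e-12
        w = [max(0.0, 1 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)  # >= 1, because xi itself gets weight 1
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if the window has no x spread
        fitted.append(ybar + slope * (xi - xbar))
    return fitted

# Hypothetical (recombination rate in cM/Mb, GC3) values for six windows.
rec = [0.5, 1.0, 2.0, 3.0, 4.5, 6.0]
gc3 = [0.52, 0.55, 0.58, 0.60, 0.66, 0.70]
print([round(v, 3) for v in lowess(rec, gc3)])
```

Weighting windows by alignment length instead (as the circle sizes might suggest) would mean multiplying each tricube weight by that length; the caption states that equal weights were used.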

References

    1. Backström N, Forstmeier W, Schielzeth H, et al. The recombination landscape of the zebra finch Taeniopygia guttata genome. Genome Res. 2010;20:485–495.
    1. Baker AJ, Pereira SL, Paton TA. Phylogenetic relationships and divergence times of Charadriiformes genera: multigene evidence for the Cretaceous origin of at least 14 clades of shorebirds. Biol Lett. 2007;3:205–209.
    1. Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS. SNP discovery via 454 transcriptome sequencing. Plant J. 2007;51:910–918.
    1. Barker FK, Cibois A, Schikler P, Feinstein J, Cracraft J. Phylogeny and diversification of the largest avian radiation. Proc Natl Acad Sci U S A. 2004;101:11040–11045.
    1. Belle E, Galtier N, Duret L, Eyre-Walker A. The decline of isochores in mammals: an assessment of the GC-content variation along the mammalian phylogeny. J Mol Evol. 2004;58:653–660.