The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three asterids

Dong-Keun Yi et al. Mol Cells. 2012 May.

Abstract

This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN 637765), an endangered endemic species. The genome is 156,768 bp in length and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp, and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to those of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns, and 113 intergenic spacers (IGS) at three different taxonomic levels: Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites, and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5 to 11 times higher than those in the IR regions, while the intron sequences in the SSC region evolved 7 to 14 times faster than those in the IR regions. Furthermore, the Ka/Ks ratios of the gene coding sequences support a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Our data therefore suggest that selective sweeps by base collection mechanisms eliminate polymorphisms more frequently in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, six are di-polymers, and two are tri-polymers. In addition to the SSR loci, we identified 18 medium-size repeat units ranging from 22 to 79 bp, 11 of which are located in IGS or intron regions. These medium-size repeats may aid the development of a cp genome-specific gene introduction vector, because such regions may be used as specific recombination sites.
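The SSR census described in the abstract (homopolymers, di-polymers, tri-polymers) can be illustrated with a small regular-expression scan. This is only a sketch, not the authors' method: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical, since the abstract does not state the thresholds or the software used, and the input is a toy sequence rather than the E. senticosus genome.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1 = homopolymer,
# 2 = di-polymer, 3 = tri-polymer). The abstract does not give the
# thresholds actually used, so these are assumptions for illustration.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip "AA" or "AAA" style motifs: those runs are homopolymers
            # and are already handled by the motif_len == 1 scan.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GCGG" + "AT" * 6 + "CCG" + "GAA" * 5 + "TC"
    ssrs = find_ssrs(toy)
    for start, motif, count in ssrs:
        print(start, motif, count)
    # Tally loci by motif length, mirroring the abstract's three classes.
    print(Counter(len(motif) for _, motif, _ in ssrs))
```

Because the regular expression reports the leftmost match, the motif of a di- or tri-polymer can come out in a rotated phase (for example "TA" instead of "AT") when the flanking base happens to extend the repeat.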

Figures

Fig. 1.

The gene map of the Eleutherococcus senticosus cp genome. A pair of thick lines at the outermost circle represents the inverted repeats (IRa and IRb; 25,930 bp each), which separate the large single copy region (LSC; 86,755 bp) from the small single copy region (SSC; 18,153 bp). Genes drawn inside the circle are transcribed clockwise, while those drawn outside the circle are transcribed counterclockwise. Intron-containing genes are marked by asterisks. The numbers at the outermost circle indicate the locations of the 18 repeats: direct (black numbers), palindromic (blue numbers), and dispersed (red numbers) repeats (cf. Table 5).
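The region sizes in this caption can be checked against the reported genome length, since a quadripartite chloroplast genome consists of one LSC, one SSC, and two IR copies. The short sketch below does that arithmetic with the published values; the function names are illustrative only.

```python
# Region lengths (bp) as reported in the abstract and in the Fig. 1 caption.
LSC, SSC, IR = 86_755, 18_153, 25_930
REPORTED_TOTAL = 156_768


def quadripartite_length(lsc, ssc, ir):
    """Total genome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def region_fractions(lsc, ssc, ir):
    """Share of the genome occupied by each region (both IR copies combined)."""
    total = quadripartite_length(lsc, ssc, ir)
    return {"LSC": lsc / total, "SSC": ssc / total, "IR": 2 * ir / total}


if __name__ == "__main__":
    total = quadripartite_length(LSC, SSC, IR)
    print(total, total == REPORTED_TOTAL)
    for name, frac in region_fractions(LSC, SSC, IR).items():
        print(f"{name}: {frac:.1%}")
```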

Fig. 2.

Comparison of the LSC, IR and SSC border regions among four cp genomes.

Fig. 3.

The events and the lengths of indel mutations in the CDS regions of the cp genomes of Eleutherococcus and Panax.

Fig. 4.

Small inversion mutations and associated secondary structures in the cp genomes of Eleutherococcus (E) and Panax (P).

Fig. 5.

Comparisons of protein coding genes (CDS), introns, and intergenic spacers (IGS) of the chloroplast genomes in three pairwise comparisons: Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The Y-axis indicates sequence divergence. For the CDS comparisons (top), the 86 gene coding regions, excluding trn genes, are classified into 16 functional groups (Table 1) and their average sequence diversity is shown. In the intron comparisons (middle), the introns located in the IR regions show distinctly low levels of sequence divergence. For the IGS comparisons (bottom), only IGS regions between 300 and 800 bp in length are summarized (Supplementary data 3–5).

Fig. 6.

The levels of evolutionary divergence among the SSC, LSC, and IR regions of the cp genomes. The Y-axis represents sequence divergence. The IR region evolves more slowly than the SSC or LSC regions, regardless of whether CDS, intron, or IGS sequences are compared.
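One simple way to express the sequence divergence of a pairwise alignment is the uncorrected p-distance, the proportion of differing sites among the columns where both sequences have a base. The sketch below assumes that measure purely for illustration; the source text shown here does not say which divergence statistic or software the authors used, and the aligned strings are made-up toy data.

```python
def p_distance(aln_a, aln_b):
    """Uncorrected p-distance between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored, so indels do
    not contribute to the substitution-based divergence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    diffs = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignments standing in for one IR intron and one LSC intron.
    toy_regions = {
        "IR": ("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"),
        "LSC": ("ACGTACGTAC-TACGTACGT", "ACCTACGAACGTACTTAGGT"),
    }
    for name, (a, b) in toy_regions.items():
        print(name, round(p_distance(a, b), 3))
```

Dividing the value for a single-copy region by the value for an IR region gives a fold difference of the kind quoted in the abstract, provided the IR value is not zero.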

Fig. 7.

Distribution of indel sizes and indel numbers among the three cp genomes. The X-axis represents the indel size in base pairs and the Y-axis the number of indels.
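An indel size distribution of this kind can be tallied from a pairwise alignment by counting each uninterrupted run of gap characters as one indel event. The following is a minimal sketch under that assumption; the function name and the toy alignment are hypothetical, and it assumes a true pairwise alignment with no columns that are gapped in both sequences.

```python
import re
from collections import Counter


def indel_size_distribution(aln_a, aln_b):
    """Map indel size (bp) to the number of indel events of that size.

    Each maximal run of '-' in either aligned sequence counts as one event.
    Assumes a pairwise alignment without columns gapped in both sequences.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    sizes = Counter()
    for seq in (aln_a, aln_b):
        for run in re.finditer(r"-+", seq):
            sizes[len(run.group(0))] += 1
    return sizes


if __name__ == "__main__":
    a = "ACG--TACGT-A"
    b = "ACGTTTA---CA"
    for size, count in sorted(indel_size_distribution(a, b).items()):
        print(f"{size} bp: {count}")
```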
