Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote

doi: 10.1371/journal.pbio.0040286.

Jonathan A Eisen, Robert S Coyne, Martin Wu, Dongying Wu, Mathangi Thiagarajan, Jennifer R Wortman, Jonathan H Badger, Qinghu Ren, Paolo Amedeo, Kristie M Jones, Luke J Tallon, Arthur L Delcher, Steven L Salzberg, Joana C Silva, Brian J Haas, William H Majoros, Maryam Farzad, Jane M Carlton, Roger K Smith Jr, Jyoti Garg, Ronald E Pearlman, Kathleen M Karrer, Lei Sun, Gerard Manning, Nels C Elde, Aaron P Turkewitz, David J Asai, David E Wilkes, Yufeng Wang, Hong Cai, Kathleen Collins, B Andrew Stewart, Suzanne R Lee, Katarzyna Wilamowska, Zasha Weinberg, Walter L Ruzzo, Dorota Wloga, Jacek Gaertig, Joseph Frankel, Che-Chia Tsao, Martin A Gorovsky, Patrick J Keeling, Ross F Waller, Nicola J Patron, J Michael Cherry, Nicholas A Stover, Cynthia J Krieger, Christina del Toro, Hilary F Ryder, Sondra C Williamson, Rebecca A Barbeau, Eileen P Hamilton, Eduardo Orias

Jonathan A Eisen et al. PLoS Biol. 2006 Sep.

Abstract

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.
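
The abstract's point about the genetic code can be illustrated with a small translation sketch. It assumes the ciliate nuclear code (NCBI translation table 6), in which UAA and UAG encode glutamine and UGA is the sole stop codon. The `selenoprotein` flag is a hypothetical annotation used only for illustration: when set, internal UGA codons are read as selenocysteine (U). Real selenocysteine insertion depends on additional mRNA signals that this toy function ignores.

```python
# Amino acids in TCAG order (first, second, third codon position), following
# NCBI translation table 6 (ciliate nuclear code): TAA/TAG -> Q, TGA -> stop.
BASES = "TCAG"
CILIATE_AA = (
    "FFLLSSSSYYQQCC*W"  # T-- codons
    "LLLLPPPPHHQQRRRR"  # C-- codons
    "IIIMTTTTNNKKSSRR"  # A-- codons
    "VVVVAAAADDEEGGGG"  # G-- codons
)
CODON_TABLE = {
    a + b + c: CILIATE_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str, selenoprotein: bool = False) -> str:
    """Translate a DNA ORF (5'->3') under the ciliate nuclear code.

    If `selenoprotein` is True (a hypothetical flag), in-frame TGA codons
    before the final codon are read as selenocysteine ('U').
    """
    orf = orf.upper()
    codons = [orf[p:p + 3] for p in range(0, len(orf) - len(orf) % 3, 3)]
    protein = []
    for idx, codon in enumerate(codons):
        if codon == "TGA" and selenoprotein and idx < len(codons) - 1:
            protein.append("U")  # recoded UGA -> selenocysteine
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X: ambiguous base
    return "".join(protein)


print(translate("ATGTAATAGTGA"))                       # prints MQQ*
print(translate("ATGTGAGCCTGA", selenoprotein=True))   # prints MUA*
```

Under this table only TGA maps to a stop; once TGA can also be decoded as selenocysteine, every one of the 64 codons has a possible amino-acid reading, which is the sense of the abstract's "all 64 codons" remark.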

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Unrooted Consensus Phylogeny of Major Eukaryotic Lineages

Representative genera are shown for which whole genome sequence data are either in progress (marked with an asterisk) or available. The ciliates, dinoflagellates, and apicomplexans constitute the alveolates (lighter yellow box). Branch lengths do not correspond to phylogenetic distances. Adapted from the more detailed consensus in [197].

Figure 2. Relationship between MIC and MAC Chromosomes

The top horizontal bar shows a small portion of one of the five pairs of MIC chromosomes. MAC-destined sequences are shown in alternating shades of gray. MIC-specific IESs (internally eliminated sequences) are shown as blue rectangles, and sites of the 15-bp chromosome breakage sequence (Cbs) are shown as red bars (not to scale). Below the top bar are the MAC chromosomes derived from this region of the MIC by deletion of IESs, site-specific cleavage at Cbs sites, and amplification. Telomeres (green bars) are added to the newly generated ends. Most MAC chromosomes are amplified to approximately 45 copies (only three are shown). Through phenotypic assortment, initially heterozygous loci generally become homozygous in each lineage within approximately 100 vegetative fissions. Polymorphisms located on the same MAC chromosome tend to co-assort.
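
Phenotypic assortment can be explored with a toy simulation of one heterozygous MAC locus. The model is an assumption, not the paper's: each of the 45 copies replicates exactly once per cell cycle, and amitotic division gives each daughter a uniformly random set of exactly 45 of the 90 replicated copies. The function follows a single lineage and counts fissions until one allele is lost.

```python
import random
import statistics


def fissions_to_homozygosity(copies=45, rng=None, max_fissions=100_000):
    """Toy model of phenotypic assortment at one heterozygous MAC locus.

    Assumptions (illustrative only): exact replication of every copy, and an
    equal, uniformly random partition of the 2 * copies chromosomes at division.
    Returns the number of fissions until the lineage carries only one allele.
    """
    rng = rng or random.Random()
    a = copies // 2  # copies carrying allele A; start near 50:50
    if a in (0, copies):
        return 0
    for fission in range(1, max_fissions + 1):
        # After replication there are 2*a copies of A among 2*copies in total.
        # Label the A copies 0..2a-1 and draw one daughter's share of `copies`.
        threshold = 2 * a
        share = rng.sample(range(2 * copies), copies)
        a = sum(1 for idx in share if idx < threshold)
        if a in (0, copies):
            return fission
    return max_fissions


rng = random.Random(2006)
times = [fissions_to_homozygosity(rng=rng) for _ in range(500)]
print("mean fissions to homozygosity:", statistics.mean(times))
```

Because each division is sampling without replacement from twice the copy number, allele frequencies drift slowly; under these assumptions the mean should come out on the same order of magnitude as the approximately 100 fissions quoted in the caption, though the model ignores unequal division and other biological details.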

Figure 3. Scaffold Sizes

(A) Scaffold sizes versus MAC chromosome size. Blue diamonds represent scaffolds capped by telomeres on both ends. Red squares and green triangles represent incomplete scaffolds capped by telomeres at one or neither end, respectively. (B) Size distribution of scaffolds capped by telomeres on both ends.
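
The three scaffold classes in panel (A) can be sketched as a repeat test at each scaffold end, assuming the Tetrahymena telomeric repeat TTGGGG (complement CCCCAA). The thresholds `min_repeats` and `window` are illustrative assumptions, not the criteria used in the paper, and the scaffold sequences are made up.

```python
# Tetrahymena telomeric repeat: (TTGGGG)n on the G-rich strand, (CCCCAA)n on
# its complement. Reading a scaffold 5'->3', a capped left end shows CCCCAA
# repeats and a capped right end shows TTGGGG repeats; reverse-complementing
# the scaffold swaps the two, so the test does not depend on orientation.
C_REPEAT = "CCCCAA"
G_REPEAT = "TTGGGG"


def telomere_caps(seq: str, min_repeats: int = 3, window: int = 60) -> int:
    """Return 0, 1 or 2: how many scaffold ends carry telomeric repeats.

    An end counts as capped if `min_repeats` tandem copies of the repeat occur
    within `window` bp of that end (both thresholds are assumptions).
    """
    seq = seq.upper()
    left = (C_REPEAT * min_repeats) in seq[:window]
    right = (G_REPEAT * min_repeats) in seq[-window:]
    return int(left) + int(right)


LABELS = {2: "both ends", 1: "one end", 0: "neither end"}

# Hypothetical scaffolds for illustration.
scaffolds = {
    "s1": "CCCCAA" * 5 + "ACGTTGCA" * 40 + "TTGGGG" * 5,
    "s2": "CCCCAA" * 5 + "ACGTTGCA" * 40,
    "s3": "ACGTTGCA" * 40,
}
for name, seq in scaffolds.items():
    print(name, LABELS[telomere_caps(seq)])
```

Scaffolds scoring 2 would correspond to the blue diamonds (complete MAC chromosomes), and scores of 1 and 0 to the red squares and green triangles.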

Figure 4. Depth of Coverage versus Scaffold Size

Black diamonds indicate all scaffolds; red diamonds, scaffolds capped with telomeres on both ends.
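
Depth of coverage for a scaffold is conventionally the number of read bases aligned to it divided by its length. The sketch below applies that definition to invented per-scaffold records (hypothetical numbers, not data from the paper) and compares all scaffolds with those capped by telomeres on both ends.

```python
import statistics


def depth_of_coverage(aligned_bases: int, scaffold_length: int) -> float:
    """Mean depth = total read bases aligned to the scaffold / scaffold length."""
    return aligned_bases / scaffold_length


# Hypothetical records: (length in bp, aligned read bases, capped on both ends)
records = [
    (250_000, 2_250_000, True),
    (180_000, 1_800_000, True),
    (12_000, 30_000, False),
    (40_000, 380_000, False),
]
depths = [depth_of_coverage(bases, length) for length, bases, _ in records]
capped = [d for d, (_, _, both) in zip(depths, records) if both]
print("median depth, all scaffolds:", statistics.median(depths))
print("median depth, both ends capped:", statistics.median(capped))
```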

Figure 5. Codon Usage

(A) Effective number of codons (ENc; a measure of overall codon bias) for each predicted ORF, plotted against GC3 (the fraction of synonymous codons with a guanine or cytosine at the third position). The black curve represents the upper limit of expected bias based on GC3 alone; most T. thermophila ORFs cluster below it [red dots as in (B)]. (B) Principal component analysis of relative synonymous codon usage in T. thermophila. The 232 genes in the tail of the comma-shaped distribution (those with the most biased codon usage) are colored red. (C) Principal component analysis of relative synonymous codon usage in P. falciparum.
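
Panel (A) can be reproduced in outline with two functions: GC3s for an ORF, and the expected ENc if codon bias came from GC3s alone. This sketch assumes the black curve is the standard expected-ENc function of Wright (1990), ENc = 2 + s + 29 / (s² + (1 − s)²), and that GC3s excludes Met, Trp, and the single ciliate stop codon TGA; both are assumptions about the paper's exact method.

```python
# Codons excluded from GC3s: Met (ATG), Trp (TGG) and the only stop codon
# (TGA) under the ciliate nuclear code assumed here. TAA/TAG encode Gln in
# T. thermophila, so their third positions are synonymous and are counted.
EXCLUDED = {"ATG", "TGG", "TGA"}


def gc3s(orf: str) -> float:
    """Fraction of synonymous-site codons whose third base is G or C."""
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    third = [c[2] for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not third:
        raise ValueError("no codons with synonymous third positions")
    return sum(b in "GC" for b in third) / len(third)


def expected_enc(s: float) -> float:
    """Expected ENc when codon bias is driven only by GC3s = s (Wright's curve)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


orf = "ATGGCCAAGTTAGAATGA"  # hypothetical ORF: GCC, AAG, TTA, GAA are counted
s = gc3s(orf)
print(s, expected_enc(s))   # prints 0.5 60.5
```

An ORF whose observed ENc lies well below `expected_enc(gc3s(orf))` is more biased than its GC3s alone would predict, which is the pattern the caption reports for most T. thermophila ORFs.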

Figure 6. Orthologs Shared among T. thermophila and Selected Eukaryotic Genomes

Venn diagram showing orthologs shared among human, the yeast S. cerevisiae, the apicomplexan P. falciparum, and T. thermophila. Lineage-specific gene duplications in each organism were identified and treated as a single gene (a super-ortholog). Pairwise mutual best hits by BLASTP were then identified as putative orthologs.
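
The ortholog-calling step can be sketched as reciprocal best hits over BLASTP hit tables, with lineage-specific paralogs first collapsed into one super-ortholog. The hit tuples and the paralog-cluster mapping are assumed inputs (for example, parsed from BLASTP tabular output); how the paper defined those clusters is not described here, and all gene IDs below are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query, subject, BLASTP bit score)


def collapse(hits: Iterable[Hit], clusters: Dict[str, str]) -> List[Hit]:
    """Rename genes to their paralog-cluster ID (super-ortholog), if any."""
    return [(clusters.get(q, q), clusters.get(s, s), score) for q, s, score in hits]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query (ties keep the first seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for q, s, score in hits:
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def mutual_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Hypothetical hits: tt1 and tt2 are lineage-specific duplicates in T. thermophila.
tt_vs_hs = [("tt1", "hs1", 310.0), ("tt2", "hs1", 290.0), ("tt3", "hs2", 120.0)]
hs_vs_tt = [("hs1", "tt2", 300.0), ("hs2", "tt9", 150.0)]
paralogs = {"tt1": "TT_FAM1", "tt2": "TT_FAM1"}
print(mutual_best_hits(collapse(tt_vs_hs, paralogs), collapse(hs_vs_tt, paralogs)))
# prints {('TT_FAM1', 'hs1')}
```

Without collapsing, the same tables yield the pair `('tt2', 'hs1')` and leave `tt1` unpaired; treating the duplicated family as one unit is what keeps lineage-specific expansions from fragmenting the ortholog counts.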

Figure 7. Tubulin Gene Diversity in T. thermophila

The figure shows a neighbor-joining tree built from a ClustalX alignment. Species abbreviations: Hs, H. sapiens; Dm, D. melanogaster; Sc, S. cerevisiae; Tt, T. thermophila; Pt, P. tetraurelia; Cr, C. reinhardtii; Tb, T. brucei; Ec, E. coli; Xl, X. laevis. A prokaryotic tubulin homolog, E. coli FtsZ, was used as the outgroup.
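
Neighbor joining itself can be sketched in a few dozen lines. This follows the textbook Saitou and Nei procedure on a precomputed distance matrix; it is not the ClustalX implementation, and the five-taxon matrix is a standard teaching example rather than tubulin data.

```python
from typing import Dict, List, Sequence


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Build an unrooted tree by neighbor joining; return it in Newick format.

    Each internal node is labelled by its own Newick subtree string, so the
    output is assembled as joins happen. Requires at least 3 taxa; ties in Q
    are broken by list order.
    """
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes: List[str] = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch to i
        lj = d[i][j] - li                                      # branch to j
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new node to every remaining node.
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: connect them to a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = d[a][b] - la
    lc = d[a][c] - la
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, D))  # prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The result is unrooted; rooting it, as the figure does with FtsZ, is a separate step that places the root on the branch leading to the outgroup.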

References

    1. Collins K, Gorovsky MA. Tetrahymena thermophila. Curr Biol. 2005;15:R317–R318.
    2. Nanney DL, Simon EM. Laboratory and evolutionary history of Tetrahymena thermophila. Methods Cell Biol. 2000;62:3–25.
    3. Zaug AJ, Cech TR. The intervening sequence RNA of Tetrahymena is an enzyme. Science. 1986;231:470–475.
    4. Blackburn EH, Gall JG. A tandemly repeated sequence at the termini of the extrachromosomal ribosomal RNA genes in Tetrahymena. J Mol Biol. 1978;120:33–53.
    5. Yao MC, Yao CH. Repeated hexanucleotide C-C-C-C-A-A is present near free ends of macronuclear DNA of Tetrahymena. Proc Natl Acad Sci U S A. 1981;78:7436–7439.