An automated phylogenetic tree-based small subunit rRNA taxonomy and alignment pipeline (STAP)

Dongying Wu et al. PLoS One. 2008.

Abstract

Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, ss-rRNA gene sequences are being obtained at an ever-increasing rate. This growing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Processes that commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods omit one or more steps that, though computationally costly or difficult, we consider important. In particular, we regard both the building of multiple sequence alignments and high-quality phylogenetic analysis as necessary. We describe here our fully automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and can therefore be used for multiple purposes, including phylogenetically based taxonomic assignment and analysis of species diversity in environmental samples. The pipeline combines publicly available packages (PHYML, BLASTN, and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity unattainable by manual efforts.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A flow chart of the STAP pipeline.

Figure 2. Domain assignment.

In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, a domain assignment in this case would not be accurate or reliable (sequence-similarity-based methods cannot make an accurate assignment in this case either). However, the figure illustrates an important role of the tree-based domain assignment step, namely the automatic identification of deep-branching environmental ss-rRNAs.
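The idea of assigning a domain from a query's position in a tree can be illustrated with a minimal Python sketch. It is not STAP's tree-parsing logic, which this record does not describe. It swaps in a simpler technique: the query takes the domain of the labeled reference leaf closest to it by patristic distance (the sum of branch lengths along the connecting path). The toy tree, its branch lengths, the leaf names, and the functions `add_edge`, `patristic_distances`, and `assign_domain` are all hypothetical.

```python
def add_edge(tree, a, b, length):
    """Record an undirected branch of the given length between nodes a and b."""
    tree.setdefault(a, {})[b] = length
    tree.setdefault(b, {})[a] = length


def patristic_distances(tree, start):
    """Sum branch lengths along the unique path from start to every other node."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in tree[node].items():
            if neighbor not in dist:  # a tree has no cycles, so one visit suffices
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def assign_domain(tree, labels, query):
    """Return (domain, distance) of the labeled leaf nearest to the query."""
    dist = patristic_distances(tree, query)
    nearest = min(labels, key=lambda leaf: dist[leaf])
    return labels[nearest], dist[nearest]


# Hypothetical unrooted toy tree: n0, n1, n2 are internal nodes.
TOY_TREE = {}
for a, b, length in [
    ("bac1", "n0", 0.10), ("query", "n0", 0.15), ("n0", "n1", 0.05),
    ("bac2", "n1", 0.20), ("n1", "n2", 0.50),
    ("arc1", "n2", 0.30), ("euk1", "n2", 0.40),
]:
    add_edge(TOY_TREE, a, b, length)

# Hypothetical domain labels for the reference leaves.
LABELS = {"bac1": "Bacteria", "bac2": "Bacteria",
          "arc1": "Archaea", "euk1": "Eukaryota"}

if __name__ == "__main__":
    print(assign_domain(TOY_TREE, LABELS, "query"))
```

A nearest-leaf rule like this ignores tree topology beyond path lengths, so a deep-branching query that sits far from every reference leaf would still receive a label. A real implementation would likely need a distance or clade-membership check before accepting the assignment.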

Figure 3. Determination of the quality score cutoff for automated alignment trimming.

The average quality score over all columns in alignments of randomly generated sequences is plotted against the number of sequences in the alignment (see Methods). Standard deviations are indicated by gray shading.
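The general approach of calibrating a trimming cutoff against random sequences can be sketched in Python as follows. This record does not define STAP's quality score or its cutoff rule, so everything specific here is an assumption made for illustration: the score (the fraction of rows sharing the most common base in a column, with gaps counting against it), the use of equal-length ungapped random strings as a stand-in for alignments of random sequences, and the function names `column_score`, `random_baseline`, and `trim`.

```python
import random
from collections import Counter
from statistics import mean, stdev


def column_score(column):
    """Hypothetical score: share of rows carrying the column's most common base."""
    bases = [c for c in column if c != "-"]
    if not bases:
        return 0.0  # an all-gap column carries no signal
    return Counter(bases).most_common(1)[0][1] / len(column)


def column_scores(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_score(col) for col in zip(*alignment)]


def random_baseline(n_seqs, length=200, trials=20, seed=0):
    """Mean and standard deviation of the average column score for random data.

    Assumption: random equal-length strings stand in for alignments of
    randomly generated sequences.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        aln = ["".join(rng.choice("ACGT") for _ in range(length))
               for _ in range(n_seqs)]
        averages.append(mean(column_scores(aln)))
    return mean(averages), stdev(averages)


def trim(alignment, cutoff):
    """Keep only the columns whose score is strictly above the cutoff."""
    keep = [i for i, s in enumerate(column_scores(alignment)) if s > cutoff]
    return ["".join(row[i] for i in keep) for row in alignment]


if __name__ == "__main__":
    toy = ["ACGT-", "ACGA-", "ACTC-", "ACGG-"]
    base_mean, base_sd = random_baseline(n_seqs=len(toy))
    # Assumed rule: a column must beat the random baseline by two deviations.
    print(trim(toy, base_mean + 2 * base_sd))
```

Because the baseline depends on the number of sequences, the cutoff is recomputed per alignment size, which mirrors the dependence on sequence count that the figure plots.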

Figure 4. Comparison of reliability of BLASTN and STAP taxonomic assignments.

The number below each taxonomic level indicates the number of bacterial sequences in the analysis that were annotated at that level (see Results and Discussion).
