PhyloGenie: automated phylome generation and analysis

Comparative Study

Nucleic Acids Res. 2004 Sep 30;32(17):5231-8.

doi: 10.1093/nar/gkh867. Print 2004.

Tancred Frickey et al. Nucleic Acids Res. 2004.

Abstract

Phylogenetic reconstruction is the method of choice to determine the homologous relationships between sequences. Difficulties in producing high-quality alignments, which are the basis of good trees, and in automating the analysis of trees have unfortunately limited the use of phylogenetic reconstruction methods to individual genes or gene families. Due to the large number of sequences involved, phylogenetic analyses of proteomes preclude manual steps and therefore require a high degree of automation in sequence selection, alignment, phylogenetic inference and analysis of the resulting set of trees. We present a set of programs that automates the steps from seed sequence to phylogeny and a utility to extract all phylogenies that match specific topological constraints from a database of trees. Two example applications that show the type of questions that can be answered by phylome analysis are provided. The generation and analysis of the Thermoplasma acidophilum phylome with regard to lateral gene transfer between Thermoplasmata and Sulfolobus showed best BLAST hits to be far less reliable indicators of lateral transfer than the corresponding protein phylogenies. The generation and analysis of the Danio rerio phylome provided more than twice as many proteins as described previously, supporting the hypothesis of an additional round of genome duplication in the actinopterygian lineage.
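To make the idea of extracting phylogenies by topological constraint concrete, the following is a minimal sketch, not PhyloGenie's actual query utility or syntax. It assumes rooted trees stored as nested tuples, a hypothetical leaf-label convention of `Genus|sequence_id`, and a single illustrative constraint: the tree must contain a clade whose members come from exactly two given genera. The tree database and all names are invented for illustration.

```python
# Illustrative only: trees are nested tuples, leaves are "Genus|id" strings.
# This label convention and the constraint below are assumptions, not PhyloGenie's format.

def collect_clades(tree, out):
    """Return the leaf set under `tree`; append the leaf set of every internal node to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def has_exclusive_clade(tree, taxon_a, taxon_b):
    """True if some clade contains sequences from taxon_a and taxon_b and from nothing else."""
    clades = []
    collect_clades(tree, clades)
    for clade in clades:
        taxa = {leaf.split("|")[0] for leaf in clade}
        if taxa == {taxon_a, taxon_b}:
            return True
    return False


def select_trees(database, taxon_a, taxon_b):
    """Return the names of all trees in `database` that satisfy the constraint."""
    return [name for name, tree in database.items()
            if has_exclusive_clade(tree, taxon_a, taxon_b)]


# Hypothetical two-tree database.
database = {
    "orf_0001": ((("Thermoplasma|Ta1", "Sulfolobus|Ss1"), "Pyrococcus|Pf1"), "Escherichia|Ec1"),
    "orf_0002": ((("Thermoplasma|Ta2", "Pyrococcus|Pf2"), "Sulfolobus|Ss2"), "Escherichia|Ec2"),
}
candidates = select_trees(database, "Thermoplasma", "Sulfolobus")
```

In this toy database only `orf_0001` places the Thermoplasma and Sulfolobus sequences in a clade of their own, so it is the only tree selected. A real constraint language would also need to handle unrooted trees and branch support, which this sketch ignores.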

Figures

Figure 1

Alignment excerpts showing the most commonly encountered problems when converting BLAST or PSIBLAST HSPs to multiple alignments. (A) Three BLAST HSPs combined into a multiple sequence alignment and the resulting gapping problems. (B) Extreme examples of excessive and inconsistent gapping.

Figure 2

Layout showing the BLAST/PSIBLAST post-processing steps used to reduce excessive and inconsistent gapping. (1) All full-length sequences are gathered for HSPs and form the database used for the HMM search in step 5. (2) All HSPs matching E-value, score and coverage cutoff criteria are converted to a multiple sequence alignment. (3) The alignment sequences are filtered by maximum sequence identity to remove duplicate entries, and gapped regions are realigned to resolve gapping problems. (4) A profile-HMM is derived from the multiple sequence alignment. (5) Sequences from step 1 are searched with the HMM generated in step 4 so as to better define the start and end of alignable regions and thereby improve alignment. (6) HMM-HSPs are converted to a multiple sequence alignment.
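Steps 2 and 3 of this layout can be sketched as two small filters. This is an illustration under stated assumptions rather than the PhyloGenie implementation: the `Hsp` record, the cutoff values, the definition of coverage as the aligned fraction of the query, and the greedy identity filter are all hypothetical choices. The realignment of gapped regions and the HMM steps are not shown.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """Hypothetical minimal HSP record; real BLAST output carries more fields."""
    subject_id: str
    evalue: float
    score: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def passes_cutoffs(hsp, query_len, max_evalue=1e-5, min_score=50.0, min_coverage=0.5):
    """Step 2: keep an HSP only if it meets the E-value, score and coverage cutoffs.
    The default thresholds are placeholders, not values from the paper."""
    coverage = (hsp.query_end - hsp.query_start + 1) / query_len
    return hsp.evalue <= max_evalue and hsp.score >= min_score and coverage >= min_coverage


def identity(a, b):
    """Fraction of identical residues over columns where both aligned rows have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_by_identity(rows, max_identity=0.9):
    """Step 3 (first half): greedily drop rows too similar to a row already kept."""
    kept = {}
    for name, seq in rows.items():
        if all(identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


# Toy data: a 100-residue query and three HSPs.
hsps = [
    Hsp("a", 1e-30, 200.0, 1, 100),   # passes all cutoffs
    Hsp("b", 1e-2, 30.0, 1, 100),     # fails E-value and score
    Hsp("c", 1e-20, 150.0, 10, 30),   # fails coverage (21 of 100 residues)
]
accepted = [h.subject_id for h in hsps if passes_cutoffs(h, query_len=100)]

rows = {"s1": "MKV-LA", "s2": "MKV-LA", "s3": "MRT-GA"}
nonredundant = filter_by_identity(rows)
```

The greedy filter depends on input order, so the first of two near-identical sequences is the one retained; a production pipeline might instead prefer the longer or better-scoring member of each redundant pair.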

Figure 3

Tree rooting scheme. (a) Unrooted tree. (b) Tree rooted at the seed sequence (Man) with taxonomic “level” assignments for each node. (c) Tree rooted at the tip node that is least related to and most distant from the seed sequence (counting nodes) after the second round of taxonomic assignment. (d) Final tree, rooted at the most basal node that is most distant from the seed sequence.
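The choice of the tip used for rooting in panel (c) can be approximated as follows. This is a simplified sketch and does not reproduce the two rounds of taxonomic level assignment described in the caption. It assumes an unrooted tree given as an adjacency dictionary, a hypothetical lineage tuple per tip, relatedness measured as the number of leading lineage ranks shared with the seed, and distance measured by counting nodes with a breadth-first search. The example tree and lineages are invented.

```python
from collections import deque


def node_distances(adj, start):
    """Breadth-first search: number of edges from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def shared_ranks(a, b):
    """Number of leading taxonomic ranks two lineages have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_outgroup_tip(adj, lineages, seed):
    """Pick the tip least related to the seed; break ties by greatest node distance."""
    dist = node_distances(adj, seed)
    tips = [t for t in lineages if t != seed]
    return min(tips, key=lambda t: (shared_ranks(lineages[seed], lineages[t]), -dist[t]))


# Hypothetical unrooted tree: tips are species names, n1..n3 are internal nodes.
adj = {
    "Man": ["n1"], "Mouse": ["n1"], "Fish": ["n2"], "Yeast": ["n3"], "Ecoli": ["n3"],
    "n1": ["Man", "Mouse", "n2"],
    "n2": ["n1", "Fish", "n3"],
    "n3": ["n2", "Yeast", "Ecoli"],
}
lineages = {
    "Man": ("Eukaryota", "Metazoa", "Mammalia"),
    "Mouse": ("Eukaryota", "Metazoa", "Mammalia"),
    "Fish": ("Eukaryota", "Metazoa", "Actinopterygii"),
    "Yeast": ("Eukaryota", "Fungi"),
    "Ecoli": ("Bacteria",),
}
outgroup = pick_outgroup_tip(adj, lineages, "Man")
```

Here the bacterial tip shares no ranks with the seed and is therefore chosen for rooting. Moving from this tip-based rooting to the most basal node, as in panel (d), would require the taxonomic assignment of internal nodes, which this sketch leaves out.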

Figure 4

Chromosomal distribution of presumed laterally transferred ORFs between Thermoplasmata and Sulfolobus, according to PhyloGenie, Pyphy and best BLAST hits. The light gray, dark gray and black circles encompass the LGTs predicted by BLAST, Pyphy and PhyloGenie, respectively.
