Weighted genome trees: refinements and applications

Uri Gophna et al. J Bacteriol. 2005 Feb.

Abstract

There are many ways to group completed genome sequences in hierarchical patterns (trees) reflecting relationships between their genes. Such groupings help us organize biological information and bear crucially on underlying processes of genome and organismal evolution. Genome trees make use of all comparable genes but can variously weight the contributions of these genes according to similarity, congruent patterns of similarity, or prevalence among genomes. Here we explore such possible weighting strategies, in an analysis of 142 prokaryotic and 5 eukaryotic genomes. We demonstrate that alternate weighting strategies have different advantages, and we propose that each may have its specific uses in systematic or evolutionary biology. Comparisons of results obtained with different methods can provide further clues to major events and processes in genome evolution.
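As a rough illustration of how per-gene weights could enter a genome-to-genome distance, the sketch below computes a weighted mean of per-gene distances for one pair of genomes. This is not taken from the paper: the function name, the input dictionaries, and the use of a weighted mean are all assumptions made for illustration, and the abstract does not specify the actual distance formula.

```python
from typing import Dict, Optional


def weighted_genome_distance(
    gene_distances: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of per-gene distances for one pair of genomes.

    gene_distances: hypothetical input mapping each shared gene to a
        distance in [0, 1] (for example, 1 minus a normalized similarity).
    weights: hypothetical non-negative weight per gene; if None, every
        shared gene counts equally.
    """
    if not gene_distances:
        raise ValueError("the two genomes share no comparable genes")
    if weights is None:
        weights = {gene: 1.0 for gene in gene_distances}

    # Genes missing from the weight table contribute nothing.
    total_weight = sum(weights.get(gene, 0.0) for gene in gene_distances)
    if total_weight <= 0:
        raise ValueError("weights sum to zero over the shared genes")

    weighted_sum = sum(
        weights.get(gene, 0.0) * dist for gene, dist in gene_distances.items()
    )
    return weighted_sum / total_weight
```

Calling the function with no weights corresponds to treating all shared genes equally; passing a weight table lets some genes dominate the pairwise distance, which is the general idea behind the alternative weighting strategies compared here. A matrix of such pairwise distances could then be handed to a distance-based tree-building method.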

Figures

FIG. 1.

Fitch-Margoliash tree based on conceptually translated complete genomic ORF sets, with equal weighting of all genes shared between a pair of genomes (see text for details). Bootstrap support (percentage of 20 replicates) was determined in a separate analysis by strict consensus. Where support is not shown, it is 100%. Where a node is marked with an empty circle, support is less than 50%. Taxa of particular interest are marked with an arrow. Whenever different strains of a single species form a single clade, they have been united to a single branch and the number of strains is given in parentheses as follows: Tropheryma whipplei includes T. whipplei Twist and TW0827; Mycobacterium tuberculosis/bovis includes M. bovis, M. tuberculosis H37Rv, and M. tuberculosis CDC1551; Agrobacterium tumefaciens includes A. tumefaciens C58 from both the Cereon and Dupont genomes; Vibrio vulnificus includes V. vulnificus strains CMCP6 and YJ016; Salmonella enterica Typhi includes S. enterica serovar Typhi strains CT18 and Ty2; Escherichia coli/Shigella flexneri includes E. coli strains O157:H7 EDL933, O157:H7, CFT073, and K-12 and S. flexneri strains 2a 2457T and 2a 301; Xylella fastidiosa includes X. fastidiosa strains 9a5c and Temecula1; Neisseria meningitidis includes strains MC58 and Z2491; Helicobacter pylori includes strains 26695 and J99; Streptococcus pneumoniae includes strains R6 and TIGR4; Streptococcus pyogenes includes strains MGAS315, SSI1, SF370, and MGAS8232; Streptococcus agalactiae includes strains 2603VR and NEM316; Staphylococcus aureus includes strains N315, Mu50, and MW2; and Chlamydophila pneumoniae includes strains AR39, TW183, J138, and CWL029. Branch end points for these species correspond to the former root of the clade of strains. For the full trees, which include all strains, see the supplemental material.

FIG. 2.

Fitch-Margoliash tree, as in Fig. 1, but excluding genes deemed phylogenetically discordant (8), at an alpha threshold of 5%.

FIG. 3.

Fitch-Margoliash tree, as in Fig. 1, but with preferential weighting of genes in the distance matrix according to their phylogenetic concordance (8) relative to the bulk of genes in the genome.

FIG. 4.

Fitch-Margoliash tree, as in Fig. 1, but with preferential weighting of genes in the distance matrix according to their phylogenetic discordance (8) relative to the bulk of genes in the genome.

FIG. 5.

Fitch-Margoliash tree, as in Fig. 1, but with preferential weighting of genes in the distance matrix according to their prevalence within the set of 147 genomes.

FIG. 6.

Fitch-Margoliash tree, as in Fig. 1, but with preferential weighting of genes in the distance matrix according to their rarity within the set of 147 genomes.
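The prevalence and rarity weightings of Figs. 5 and 6 can be pictured with a small sketch. The captions do not give the weighting formulas, so the two functions below are assumptions chosen for simplicity: prevalence is taken as the fraction of genomes that carry a gene family, and rarity as the reciprocal of the number of genomes that carry it. The names `prevalence_weights`, `rarity_weights`, and the `presence` table are hypothetical.

```python
from typing import Dict, Set


def prevalence_weights(
    presence: Dict[str, Set[str]], n_genomes: int
) -> Dict[str, float]:
    """Weight each gene family by the fraction of genomes that contain it."""
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    return {gene: len(genomes) / n_genomes for gene, genomes in presence.items()}


def rarity_weights(presence: Dict[str, Set[str]]) -> Dict[str, float]:
    """Weight each gene family by 1 / (number of genomes that contain it).

    Families present in no genome are skipped, since they cannot be
    shared by any pair of genomes.
    """
    return {
        gene: 1.0 / len(genomes) for gene, genomes in presence.items() if genomes
    }


# Toy presence table: gene family -> genomes carrying it (made-up labels).
presence = {
    "family_1": {"g1", "g2", "g3", "g4"},  # present everywhere
    "family_2": {"g1", "g2"},              # present in half the genomes
}
common_first = prevalence_weights(presence, n_genomes=4)
rare_first = rarity_weights(presence)
```

Under these assumed definitions, widely distributed families dominate the prevalence-weighted distances, while narrowly distributed families dominate the rarity-weighted ones.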

References

    1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
    2. Andersson, S. G., O. Karlberg, B. Canback, and C. G. Kurland. 2003. On the origin of mitochondria: a genomics perspective. Philos. Trans. R. Soc. London B 358:165-179.
    3. Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18:1-5.
    4. Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17:189-197.
    5. Cavalier-Smith, T. 2002. The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int. J. Syst. Evol. Microbiol. 52:7-76.
