EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

Comparative Study

Albert J Vilella et al. Genome Res. 2009 Feb.

Abstract

We have developed a comprehensive, gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including for large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.

Figures

Figure 1.

Computational pipeline for the EnsemblCompara process.

Figure 2.

A diagram of the duplication consistency score on an example tree showing unlikely coordinated deletions on the subsequent lineages. The histogram shows the distribution of consistency scores for both PhyML/RAP and TreeBeST methods. PhyML/RAP has both a higher absolute number of duplications and far more at low consistency values.

Figure 3.

A scatter plot of the duplication consistency score (_x_-axis) compared to the bootstrap value of duplication nodes (_y_-axis). Because of the large number of values, the density of points is shown using the smoothScatter kernel-based density function in R.

Figure 4.

A plot showing different methods in terms of coverage in human genes (_x_-axis) vs. number of genes in syntenic relationships (_y_-axis).

Figure 5.

A screenshot of the gene tree page at Ensembl for the INS (insulin peptide) gene. It shows two independent duplications, one in rodents (giving rise to the Ins1 and Ins2 genes) and one in teleost fish. Duplication nodes are shown as red squares, whereas speciation nodes are in blue. The green bars to the right provide a graphical view of the multiple alignment, showing partial gene structures in guinea pig (Cavia p.), cat (Felis c.), and rabbit (Oryctolagus c.), due to their low-coverage status.
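
The duplication consistency score named in the Figure 2 caption is not defined on this page. As a rough sketch only, assume it is the number of species found in both child subtrees of a duplication node divided by the number of species found in either (a Jaccard index). Under that assumption, a low score corresponds to the "unlikely coordinated deletions" the caption mentions. The function name and the example species sets below are hypothetical and are not taken from the EnsemblCompara code.

```python
def duplication_consistency(left_species, right_species):
    """Score a duplication node from the species sets of its two child subtrees.

    ASSUMPTION: score = |shared species| / |all species under the node|.
    This definition is not stated in the text above; it is used here
    only for illustration.
    """
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("a duplication node needs at least one species below it")
    return len(left & right) / len(union)


# Hypothetical duplication where every species kept both copies.
well_supported = duplication_consistency(
    {"human", "mouse", "rat"}, {"human", "mouse", "rat"}
)

# Hypothetical duplication where no species kept both copies, which would
# require coordinated losses on every descendant lineage.
dubious = duplication_consistency({"human", "mouse"}, {"rat"})

# Hypothetical intermediate case: two of four species kept both copies.
partial = duplication_consistency(
    {"human", "mouse", "rat"}, {"mouse", "rat", "dog"}
)
```

With these made-up inputs the three cases score 1.0, 0.0, and 0.5. On this reading, the excess of low scores that the caption reports for PhyML/RAP would indicate many inferred duplications that few descendant species support.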
