EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates
Comparative Study
Albert J Vilella et al. Genome Res. 2009 Feb.
Abstract
We have developed a comprehensive gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
Figures
Figure 1.
Computational pipeline for the EnsemblCompara process.
Figure 2.
A diagram of the duplication consistency score on an example tree showing unlikely coordinated deletions on the subsequent lineages. The histogram shows the distribution of consistency scores for both PhyML/RAP and TreeBeST methods. PhyML/RAP has both a higher absolute number of duplications and far more at low consistency values.
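The caption does not spell out how the score is computed. As a minimal sketch, assume the duplication consistency score of a duplication node is the number of species found under both child subtrees divided by the number of species found under either one (a Jaccard ratio). That definition is an assumption here, and the tree encoding and the names `species_below` and `duplication_consistency` are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Tuple, Union

# Hypothetical tree encoding: a leaf is a species name (str),
# an internal node is a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_below(node: Tree) -> FrozenSet[str]:
    """Collect the set of species found at the leaves under a node."""
    if isinstance(node, str):
        return frozenset([node])
    left, right = node
    return species_below(left) | species_below(right)


def duplication_consistency(left: Tree, right: Tree) -> float:
    """Assumed score: |A intersect B| / |A union B| for the species sets
    A and B under the two children of a duplication node."""
    a = species_below(left)
    b = species_below(right)
    return len(a & b) / len(a | b)


# Both copies retained in every species: fully consistent duplication.
balanced = duplication_consistency(("human", "mouse"), ("human", "mouse"))

# No species shared between the two children: the duplication would
# require coordinated losses on every subsequent lineage.
disjoint = duplication_consistency(("human", "mouse"), "dog")

# One of three species carries both copies.
partial = duplication_consistency((("human", "mouse"), "dog"), "human")
```

Under this assumed definition, a low score means that few species retain both copies, so the inferred duplication implies many independent losses, which is the situation the example tree illustrates.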
Figure 3.
A scatter plot of the duplication consistency score (_x_-axis) compared to the bootstrap value of duplication nodes (_y_-axis). Because of the large number of values, the density of points is shown using the smoothScatter kernel-based density function in R.
Figure 4.
A plot showing different methods in terms of coverage in human genes (_x_-axis) vs. number of genes in syntenic relationships (_y_-axis).
Figure 5.
A screenshot of the gene tree page at Ensembl for the INS (insulin peptide) gene. It shows two independent duplications, one in rodents (giving rise to the Ins1 and Ins2 genes) and one in teleost fish. Duplication nodes are shown as red squares, whereas speciation nodes are in blue. The green bars to the right provide a graphical view of the multiple alignment, showing partial gene structures in guinea pig (Cavia p.), cat (Felis c.), and rabbit (Oryctolagus c.) due to their low-coverage status.