PLAZA: a comparative genomics resource to study gene and genome evolution in plants

Sebastian Proost et al. Plant Cell. 2009 Dec.

Abstract

The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.

Figures

Figure 1.

Structure of the PLAZA Platform. Outline of the different data types (white boxes) and tools (gray rounded boxes) integrated in the PLAZA platform. White rounded boxes indicate the different tools implemented to explore the different types of data available through the website.

Figure 2.

Gene Family Delineation Using Protein Clustering, Phylogenetic Tree Construction, and Similarity Heat Maps. (A) Phylogenetic tree of clathrin adaptors (HOM000575) with the AP1-4 subfamilies delineated using OrthoMCL. Black and gray squares on the tree nodes indicate duplication and speciation events identified using tree reconciliation, respectively. Only bootstrap values ≥70% are shown. (B) Similarity heat map displaying all pairwise similarity scores for all gene family members. BLAST bit scores were converted to a color gradient with white/bright green and dark green indicating high and low scores, respectively. Clustering of the sequence similarities supports the existence of the four AP subfamilies that were identified using protein clustering and confirmed using phylogenetic inference. Note that subfamilies AP3 and AP4 are inverted in the heat map compared with the tree. Species abbreviations as in Table 2.
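The score-to-color conversion described for panel (B) can be illustrated with a minimal sketch. It assumes a simple linear interpolation between a dark green and white; the exact colors, scaling, and the function names `score_to_rgb` and `heat_map_colors` are hypothetical and are not taken from PLAZA.

```python
def score_to_rgb(score, lo, hi):
    """Map a bit score in [lo, hi] onto a dark-green-to-white gradient.

    Low scores give dark green, high scores give white, mirroring the
    direction described in the caption. The RGB endpoints are assumptions.
    """
    # Avoid division by zero when all scores are identical.
    t = 1.0 if hi == lo else (score - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    dark = (0, 70, 0)          # assumed color for the lowest score
    bright = (255, 255, 255)   # assumed color for the highest score
    return tuple(round(d + t * (b - d)) for d, b in zip(dark, bright))


def heat_map_colors(scores):
    """Convert a square matrix of pairwise bit scores into RGB triples."""
    values = [s for row in scores for s in row]
    lo, hi = min(values), max(values)
    return [[score_to_rgb(s, lo, hi) for s in row] for row in scores]


if __name__ == "__main__":
    # Toy 3x3 matrix of pairwise bit scores (made-up numbers).
    toy = [[250, 120, 40], [120, 260, 35], [40, 35, 240]]
    for row in heat_map_colors(toy):
        print(row)
```

Clustering the rows and columns of such a matrix, as the caption describes, would then group members with similar score profiles next to each other.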

Figure 3.

Overview of Different Colinearity-Based Visualizations of the Genomic Region around Poplar Gene PT10G16600. (A) The WGDotplot shows that the gene of interest, indicated by the light-green line, is located in a duplicated block between chromosomes PT08 and PT10. The orange color refers to a Ks value of 0.2 to 0.3, indicating the most recent WGD in poplar. (B) The Skyline plot shows the number of colinear segments in different organisms detected using i-ADHoRe. (C) The Multiplicon view depicts the gene order alignment of the homologous segments indicated in (B). Whereas the rounded boxes represent the different genes color-coded according to the gene family they belong to, the square boxes at the right indicate the species the genomic segment was sampled from. The reference gene is indicated by the light-green arrow in (B) and (C).

Figure 4.

GO Enrichment Analysis of Species-Specific Gene Duplicates. (A) The GO enrichment for species-specific block and tandem duplicates in different species is visualized using heat maps. Colors indicate the significance of the functional enrichment, while nonenriched cells are left blank. The number of genes per set is indicated in parentheses. (B) Family enrichments indicate expanded gene families for different species. The gene sets are the same as in (A). The gray bands link the enriched GO terms with the corresponding gene family expansions. (C) The genomic organization of the core histone genes in Chlamydomonas reveals a pattern of dense clustering (indicated by gray boxes). Genes are shown as arrows; the direction indicates the transcriptional orientation and colors refer to the gene family a gene belongs to (families occurring only once are not colored for simplicity).
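The caption does not state which statistical test underlies the enrichment significance. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for GO enrichment; the function name `hypergeom_pvalue` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Return P(X >= k) under a hypergeometric model.

    N: genes in the genome (background)
    K: background genes annotated with the GO term
    n: genes in the set of interest (e.g. tandem duplicates)
    k: genes in the set annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 40 of 200 duplicates carry a term found in
    # 500 of 20000 background genes.
    p = hypergeom_pvalue(k=40, n=200, K=500, N=20000)
    print(f"enrichment p-value: {p:.3e}")
```

Because many GO terms are tested per gene set, a multiple-testing correction would normally be applied to these p-values before coloring heat map cells as enriched.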
