WebGestalt: an integrated system for exploring gene sets in various biological contexts

Bing Zhang et al. Nucleic Acids Res. 2005.

Abstract

High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted the research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists explore large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, and performs Boolean operations to generate the unions, intersections or differences between gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. To demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all 48 gene sets using WebGestalt is available to the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php.
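The Boolean operations performed by the management module map directly onto set algebra. The following is a minimal sketch, not WebGestalt's actual implementation; the gene symbols and the helper name `boolean_ops` are hypothetical and chosen only for illustration.

```python
# Hypothetical gene sets keyed by gene symbol (illustrative values only).
set_a = {"TP53", "BRCA1", "EGFR", "MYC"}
set_b = {"EGFR", "MYC", "KRAS"}


def boolean_ops(a, b):
    """Return the union, intersection and both differences of two gene sets."""
    return {
        "union": a | b,          # genes in either set
        "intersection": a & b,   # genes shared by both sets
        "a_minus_b": a - b,      # genes only in the first set
        "b_minus_a": b - a,      # genes only in the second set
    }


result = boolean_ops(set_a, set_b)
for name, genes in result.items():
    print(name, sorted(genes))
```

Using Python's built-in `set` type keeps each operation a single expression and removes duplicate identifiers automatically.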

Figures

Figure 1

Schematic overview of WebGestalt. WebGestalt is composed of four main modules: gene set management, information retrieval, organization/visualization and statistics. The gene set management module uploads, saves, retrieves and deletes gene sets, and performs Boolean operations to generate the unions, intersections and differences between gene sets. The uploading tool accepts datasets defined by experiment data, GO categories or chromosome location ranges. WebGestalt accepts several types of input identifier (Entrez Gene ID, Swiss-Prot ID, Ensembl ID, Unigene ID, gene symbol and Affymetrix Probe Set ID). The saving tool saves subsets of genes generated by the organization/visualization module. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set, including nomenclatures, various gene identifiers, map and functional information. Retrieved information can be exported to Microsoft Excel files. The organization/visualization module organizes and visualizes a gene set in figures or tables using eight sub-modules: GO Tree, Tissue Expression Bar Chart, Chromosome Distribution Chart, KEGG Table and Maps, BioCarta Table and Maps, Protein Domain Table, PubMed Table and GRIF Table. The statistics module provides two statistical tests, the hypergeometric test and Fisher's exact test, and suggests important biological areas in a gene set.
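The hypergeometric test named in the statistics module can be sketched as an upper-tail probability: the chance of seeing at least k category genes in a gene set of size n drawn from N reference genes, K of which belong to the category. This is a minimal illustration under those assumptions, not WebGestalt's code; the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the reference set
    K: reference genes annotated to the category
    n: genes in the gene set of interest
    k: gene-set genes annotated to the category
    """
    upper = min(K, n)
    total = comb(N, n)  # all ways to draw n genes from the reference
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy numbers: 10 reference genes, 4 in the category,
# a gene set of 3 genes, 2 of which fall in the category.
p_value = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(p_value)
```

For an over-representation question on a 2x2 table, the one-sided Fisher's exact test yields the same tail probability, so the two tests differ mainly in how the counts are framed.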

Figure 2

Enriched directed acyclic graph (DAG) under ‘biological process’ for a set of 23 genes that are significantly over-represented in adrenal cortex, using all genes in the human genome as a reference. The enriched GO categories are brought together and visualized as a DAG. Categories in red are enriched, while those in black are non-enriched parents. Each box lists the name of the GO category, the number of genes in the category and the _P_-value indicating the significance of enrichment.
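One way to assemble such a graph is to start from the enriched categories, walk up to every ancestor, and color each node by whether it is itself enriched. The sketch below assumes a hypothetical child-to-parents map with made-up category names; it illustrates the idea only and does not reproduce how WebGestalt builds its figure.

```python
# Hypothetical child -> parents map for a tiny GO-like hierarchy.
parents = {
    "steroid biosynthesis": ["lipid biosynthesis", "steroid metabolism"],
    "lipid biosynthesis": ["lipid metabolism"],
    "steroid metabolism": ["lipid metabolism"],
    "lipid metabolism": ["biological process"],
    "biological process": [],
}


def enriched_dag(enriched, parents):
    """Color enriched categories red and their non-enriched ancestors black."""
    colors = {}
    stack = list(enriched)
    while stack:
        term = stack.pop()
        if term in colors:
            continue  # already visited via another path in the DAG
        colors[term] = "red" if term in enriched else "black"
        stack.extend(parents.get(term, []))
    return colors


# Hypothetical enriched categories.
enriched = {"steroid biosynthesis", "lipid metabolism"}
colors = enriched_dag(enriched, parents)
for term in sorted(colors):
    print(term, colors[term])
```

The visited check matters because a category in a DAG can have several parents, so the same ancestor may be reached along more than one path.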
