Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
Aravind Subramanian et al. Proc Natl Acad Sci U S A. 2005.
Abstract
Although genome-wide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
Figures
Fig. 1.
A GSEA overview illustrating the method. (A) An expression data set sorted by correlation with phenotype, the corresponding heat map, and the “gene tags,” i.e., location of genes from a set S within the sorted list. (B) Plot of the running sum for S in the data set, including the location of the maximum enrichment score (ES) and the leading-edge subset.
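The running sum and enrichment score described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the released GSEA software: it assumes the weighted Kolmogorov-Smirnov-style statistic in which each gene in S raises the running sum in proportion to |correlation|^p and each gene outside S lowers it by a constant step. The function names, the argument names, and the toy data in the usage note are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked_genes: Sequence[str],
    correlations: Sequence[float],
    gene_set: Set[str],
    p: float = 1.0,
) -> Tuple[float, int, List[float]]:
    """Return (ES, index of the peak, running sum) for one gene set.

    ranked_genes must already be sorted by correlation with the phenotype,
    highest first; correlations holds the matching values in the same order.
    p = 0 gives an unweighted running sum; p = 1 weights hits by |correlation|.
    """
    n = len(ranked_genes)
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Hits step up by their normalised weight; misses step down evenly.
    hit_weights = [abs(r) ** p if hit else 0.0
                   for r, hit in zip(correlations, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all hit weights are zero; cannot normalise")
    miss_step = 1.0 / (n - n_hits)

    running: List[float] = []
    total = 0.0
    for w, hit in zip(hit_weights, in_set):
        total += w / total_hit_weight if hit else -miss_step
        running.append(total)

    # ES is the maximum deviation of the running sum from zero (sign kept).
    peak = max(range(n), key=lambda i: abs(running[i]))
    return running[peak], peak, running


def leading_edge(
    ranked_genes: Sequence[str], gene_set: Set[str], es: float, peak: int
) -> List[str]:
    """Set members at or before the peak (positive ES) or from it onward (negative ES)."""
    window = ranked_genes[: peak + 1] if es >= 0 else ranked_genes[peak:]
    return [g for g in window if g in gene_set]
```

For example, with a toy ranked list `["a", "b", "c", "d", "e", "f"]`, correlations `[3, 2, 1, -1, -2, -3]`, and the set `{"a", "b"}`, the running sum peaks at the second gene, and both set members fall in the leading-edge subset.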
Fig. 2.
Original (4) enrichment score behavior. The distribution of three gene sets, from the C2 functional collection, in the list of genes in the male/female lymphoblastoid cell line example ranked by their correlation with gender: S1, a set of chromosome X inactivation genes; S2, a pathway describing vitamin C import into neurons; S3, related to chemokine receptors expressed by T helper cells. Shown are plots of the running sum for the three gene sets: S1 is significantly enriched in females as expected, S2 is randomly distributed and scores poorly, and S3 is not enriched at the top of the list but is nonrandom, so it scores well. Arrows show the location of the maximum enrichment score and the point where the correlation (signal-to-noise ratio) crosses zero. Table 1 compares the nominal P values for S1, S2, and S3 by using the original and new method. The new method reduces the significance of sets like S3.
Fig. 3.
Leading edge overlap for p53 study. This plot shows the ras, ngf, and igf1 gene sets correlated with P53– clustered by their leading-edge subsets indicated in dark blue. A common subgroup of genes, apparent as a dark vertical stripe, consists of MAP2K1, PIK3CA, ELK1, and RAF1 and represents a subsection of the MAPK pathway.
Comment in
- Application of a priori established gene sets to discover biologically important differential expression in microarray data.
Bild A, Febbo PG. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15278-9. doi: 10.1073/pnas.0507477102. Epub 2005 Oct 17. PMID: 16230612 Free PMC article. No abstract available.
Similar articles
- A SATS algorithm for jointly identifying multiple differentially expressed gene sets.
Yang TY. Stat Med. 2011 Jul 20;30(16):2028-39. doi: 10.1002/sim.4235. Epub 2011 Apr 7. PMID: 21472762
- Comparative evaluation of gene-set analysis methods.
Liu Q, Dinu I, Adewale AJ, Potter JD, Yasui Y. BMC Bioinformatics. 2007 Nov 7;8:431. doi: 10.1186/1471-2105-8-431. PMID: 17988400 Free PMC article.
- Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies.
Martini P, Risso D, Sales G, Romualdi C, Lanfranchi G, Cagnin S. BMC Bioinformatics. 2011 Apr 11;12:92. doi: 10.1186/1471-2105-12-92. PMID: 21481242 Free PMC article.
- Current status of gene expression profiling in the diagnosis and management of acute leukaemia.
Bacher U, Kohlmann A, Haferlach T. Br J Haematol. 2009 Jun;145(5):555-68. doi: 10.1111/j.1365-2141.2009.07656.x. Epub 2009 Mar 16. PMID: 19344393 Review.
- Genomic approaches to the pathogenesis and treatment of acute lymphoblastic leukemias.
Armstrong SA, Hsieh JJ, Korsmeyer SJ. Curr Opin Hematol. 2002 Jul;9(4):339-44. doi: 10.1097/00062752-200207000-00012. PMID: 12042709 Review.
References
- Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. (1995) Science 270, 467–470.
- Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., et al. (1996) Nat. Biotechnol. 14, 1675–1680.
- Fortunel, N. O., Otu, H. H., Ng, H. H., Chen, J., Mu, X., Chevassut, T., Li, X., Joseph, M., Bailey, C., Hatzfeld, J. A., et al. (2003) Science 302, 393, author reply 393.
- Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. (2003) Nat. Genet. 34, 267–273.