Large-scale gene function analysis with the PANTHER classification system

Huaiyu Mi et al. Nat Protoc. 2013 Aug.

Abstract

The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

Conflict of interest statement

Competing financial interests

The authors declare that they have no competing financial interests.

Figures

Figure 1. Overview of PANTHER infrastructure.

PANTHER consists of three modules. The core module is the PANTHER protein library (yellow shade), which contains a collection of PANTHER families and subfamilies, each represented by a phylogenetic tree, a multiple sequence alignment (MSA) and an HMM. The second module is the pathway module, which contains 176 expert-curated pathways (green shade). The pathway components are associated with the protein sequences that are used to build the protein library (light green shade); therefore, the pathways are also linked to the subfamilies and HMMs. The third module is the tool suite. In this diagram, the gene list analysis tool is used as an example (blue shade). When the user uploads a gene list to the tool and the IDs in the list are from one of the 82 organisms in PANTHER, the tool maps them to the IDs in the PANTHER protein library (green arrows). If the uploaded IDs are not from one of the 82 organisms, the user can score the sequences against the PANTHER HMM library and generate the PANTHER Generic Mapping file (see Box 2) (orange arrows). The tool offers three tests: functional classification, the statistical overrepresentation test and the statistical enrichment test. Numeric values must be provided in order to use the statistical enrichment test.
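The routing described in this caption can be summarized in a short sketch. The function names and string labels below are hypothetical and are not part of any PANTHER API; the code only encodes the rules stated above. It also assumes that the first two tests remain available when numeric values are supplied, which the caption does not state explicitly.

```python
from typing import List

# Labels are illustrative only; they are not PANTHER API identifiers.
TESTS_WITHOUT_VALUES = [
    "functional classification",
    "statistical overrepresentation test",
]
ENRICHMENT = "statistical enrichment test"


def mapping_route(organism_in_panther: bool) -> str:
    """Return how uploaded IDs reach the PANTHER protein library."""
    if organism_in_panther:
        # IDs from a supported organism are mapped directly.
        return "direct ID mapping"
    # Otherwise the user scores sequences against the HMM library
    # to produce a PANTHER Generic Mapping file.
    return "HMM scoring -> PANTHER Generic Mapping file"


def available_tests(has_numeric_values: bool) -> List[str]:
    """List the tests that can run on the uploaded gene list."""
    tests = list(TESTS_WITHOUT_VALUES)
    if has_numeric_values:
        # The enrichment test requires a numeric value per gene.
        tests.append(ENRICHMENT)
    return tests
```

For a plain list of gene IDs without values, `available_tests(False)` returns only the first two tests.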

Figure 2. Examples of a PANTHER phylogenetic tree and a pathway diagram.

(A) A sample phylogenetic tree from PANTHER (PTHR11633, PLATELET-DERIVED GROWTH FACTOR). The family contains three subfamilies (blue arrows). SF1 is annotated as “PDGF/VEGF growth factor related protein 1” based on the annotation of the Drosophila and C. elegans sequences (Q9VWP6 and Q9N143, respectively). A recent duplication generated the PDGF A chain (SF3) and the PDGF B chain (SF2). Ontology terms are annotated to the node that represents the common ancestor of the extant family, in this case AN0 (SF1). The classifications are propagated to all descendant nodes, including AN4 (SF3) and AN33 (SF2). (B) An example of a PANTHER pathway diagram (P00047, PDGF signaling pathway). The diagram is drawn in CellDesigner process diagram notation, which is similar to the SBGN-PD format. For example (blue circle), it shows a transition of an input (e.g., ERK) to an output (e.g., phosphorylated ERK) catalyzed by a modifier (e.g., phosphorylated MEK). A pathway component (e.g., PDGF in red circle) is associated with the protein sequences in the protein library (red arrows in 2A) through expert curation. This association is supported by literature evidence. As a result, the PDGF pathway component can be inferred for other orthologous protein sequences in the subfamilies in the library (SF2 and SF3 in 2A).

Figure 3

The PANTHER home page with the Gene List Analysis Tools.

Figure 4

User interface of the statistical overrepresentation test, which allows the user to select additional test gene lists.

Figure 5

Results of functional classification displayed as a gene list page. The results are based on the sampleTestList_NP_500 file in the Supplemental Materials.

Figure 6

PANTHER pie chart results from the sampleTestList_NP_500 file in the Supplemental Materials. You can use the Select ontology drop-down menu to switch to the pie chart of a different ontology. Click on a pie chart section to display its child categories. Click on a legend on the right side to retrieve the list of genes for that category.

Figure 7. Result from the statistical overrepresentation test. The results are based on the sampleTestList_NP_500 file in the Supplemental Materials.

(A) The summary of the results is displayed in a table. You can export the table as a tab-delimited file by clicking the Export results button. You can also switch to other views of the results by using the View drop-down menu. If your analysis uses pathways, as shown here, you can click a pathway name to display the pathway diagram. The pathway components that have genes in your test list will be highlighted. The color of the highlighted component can be defined at the top of the page (red circle). A total of 4 test lists can be analyzed and viewed at the same time. (B) The results viewed in the PANTHER pathway Heterotrimeric G-protein signaling pathway – Gi alpha and Gs alpha mediated pathway (P00026). The components that contain the genes in the test gene list are highlighted in red.
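As a rough illustration of what an overrepresentation P-value measures, the sketch below computes an upper-tail binomial probability for a single category. The choice of a binomial model, the function name and the argument names are assumptions made for illustration; this caption does not specify the statistic behind the table.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail binomial P-value for one functional category.

    k: genes from the test list that fall in the category
    n: size of the test list
    K: genes from the reference list that fall in the category
    N: size of the reference list
    """
    p = K / N  # expected fraction of category genes under the reference
    # Probability of observing k or more category genes among n draws.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

A category that holds 1% of the reference genes is expected to contribute about 5 genes to a 500-gene test list; the further the observed count rises above that expectation, the smaller the returned value. In practice the P-values across many categories would also need a multiple-testing correction, which this sketch omits.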

Figure 8.

The results from the statistical enrichment test. The results are based on the sampleTestList_NP file in the Supplemental Materials. (A) The output of the tool with a list of P-values for each comparison between a functional category distribution and the reference distribution. (B) Comparison of the distributions from PDGF signaling pathway (red) and reference (blue) in graph view. (C) A pathway diagram of PDGF signaling pathway that is visualized using an interactive pathway Java applet that colors the pathway using a “heat map” derived from the input values.
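Panel A lists one P-value per functional category, each comparing the values of the genes in that category with the reference distribution. A minimal sketch of such a comparison follows, assuming a rank-based Mann-Whitney U test with a normal approximation and no tie correction in the variance. The caption does not name the statistic, and the function name is hypothetical.

```python
from math import erfc, sqrt
from typing import Sequence


def mann_whitney_p(category: Sequence[float], reference: Sequence[float]) -> float:
    """Two-sided P-value from the normal approximation to Mann-Whitney U."""
    n1, n2 = len(category), len(reference)
    # Pool both samples, tagging each value with its group (0 = category).
    pooled = sorted([(v, 0) for v in category] + [(v, 1) for v in reference])
    # Assign average ranks to tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    rank_sum = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u - mean) / sd
    return erfc(abs(z) / sqrt(2))  # two-sided tail probability
```

A category whose values are shifted relative to the reference, as in the red and blue curves of panel B, yields a small P-value; identical distributions yield a value of 1. The normal approximation is only reasonable for samples that are not very small.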
