PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium

Journal Article

Huaiyu Mi, Qing Dong, Anushya Muruganujan, Pascale Gaudet, Suzanna Lewis, Paul D. Thomas∗

1Evolutionary Systems Biology Group, SRI International, 2dictyBase, Northwestern University and 3Berkeley Bioinformatics and Open-source Projects (BBOP), Lawrence Berkeley National Laboratory, USA

∗To whom correspondence should be addressed. Tel: +1 650 859 2324; Fax: +1 650 859 3735; Email: paul.thomas@sri.com


Received: 15 September 2009. Accepted: 19 October 2009. Published: 16 December 2009.


Huaiyu Mi, Qing Dong, Anushya Muruganujan, Pascale Gaudet, Suzanna Lewis, Paul D. Thomas, PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium, Nucleic Acids Research, Volume 38, Issue suppl_1, 1 January 2010, Pages D204–D210, https://doi.org/10.1093/nar/gkp1019

ABSTRACT

Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software system for inferring the functions of genes based on their evolutionary relationships. Phylogenetic trees of gene families form the basis for PANTHER and these trees are annotated with ontology terms describing the evolution of gene function from ancestral to modern day genes. One of the main applications of PANTHER is in accurate prediction of the functions of uncharacterized genes, based on their evolutionary relationships to genes with functions known from experiment. The PANTHER website, freely available at http://www.pantherdb.org, also includes software tools for analyzing genomic data relative to known and inferred gene functions. Since 2007, there have been several new developments to PANTHER: (i) improved phylogenetic trees, explicitly representing speciation and gene duplication events, (ii) identification of gene orthologs, including least diverged orthologs (best one-to-one pairs), (iii) coverage of more genomes (48 genomes, up to 87% of genes in each genome; see http://www.pantherdb.org/panther/summaryStats.jsp), (iv) improved support for alternative database identifiers for genes, proteins and microarray probes and (v) adoption of the SBGN standard for display of biological pathways. In addition, PANTHER trees are being annotated with gene function as part of the Gene Ontology Reference Genome project, resulting in an increasing number of curated functional annotations.

INTRODUCTION

PANTHER (Protein ANalysis THrough Evolutionary Relationships) is a database of phylogenetic trees of protein-coding gene families from all kingdoms of life (1). Ancestral genes (representing most recent common ancestors of extant genes) are annotated with ontology terms describing gene function, and likely functional divergence events are identified and used to divide protein families into subfamilies of genes with similar function. Hidden Markov models (HMMs) are constructed for all families and subfamilies, which can be used for genome annotation projects, alone or as part of the InterPro database (2) that includes PANTHER as well as several other well-known protein annotation resources.

The main goal of PANTHER is to infer the evolution of gene function across as many genes in as many genomes as possible, and apply these inferences to predict the functions of genes that have not been directly characterized by experiment. In particular, there are large communities of researchers elucidating gene function for so-called ‘model organisms’ (e.g. those listed in Table 1) and these results provide a basis for inferring the functions of related genes in humans and other organisms. PANTHER applies both software tools and manual curation to perform these inferences as accurately as possible, and to keep them up-to-date as new experimental results accumulate. Gene function (or, more commonly, the function of gene products such as proteins) is described using terms from the Gene Ontology (GO) (3,4), or from representations of molecular pathways.

Table 1. Sources for complete sets of protein-coding genes in PANTHER version 7

| Organism or clade(s) | Five-letter code | Data source | Reference |
| --- | --- | --- | --- |
| Arabidopsis thaliana (dicot plant) | ARATH | TAIR | (11) |
| Caenorhabditis elegans (nematode worm) | CAEEL | WormBase | (12) |
| Danio rerio (zebrafish) | DANRE | Ensembl, ZFIN | (13) |
| Dictyostelium discoideum (cellular slime mold) | DICDI | DictyBase | (14) |
| Drosophila melanogaster (fruit fly) | DROME | FlyBase | (15) |
| Escherichia coli (bacterium) | ECOLI | EcoCyc | (16) |
| Gallus gallus (chicken) | CHICK | Entrez Gene | (17) |
| Homo sapiens (human) | HUMAN | SwissProt | (18) |
| Mus musculus (mouse) | MOUSE | MGI | (19) |
| Rattus norvegicus (rat) | RAT | RGD | (20) |
| Saccharomyces cerevisiae (budding yeast) | YEAST | SGD | (21) |
| Schizosaccharomyces pombe (fission yeast) | SCHPO | GeneDB | (22) |
| Other chordate genomes | | Ensembl | (23) |
| Other non-chordate genomes | | Entrez Gene | (17) |

We have made several major modifications to the most recent version of PANTHER. One of the main developments is collaboration with the GO Consortium, in which PANTHER trees are being annotated with GO terms as part of the GO Reference Genome project (5). For PANTHER version 7, all previous associations of PANTHER subfamilies with function terms have been updated to GO terms. Ongoing annotation within the Reference Genome Project includes a complete evidence trail for inferred annotations all the way to the experimental results (literature articles) and evolutionary events upon which the inferences are based. Other important developments include improvements to the phylogenetic trees, inference of inter-species orthologs, inclusion of more genomes and support for several alternate database identifier types.

Improved hidden Markov models and phylogenetic trees, and ortholog identification

Gene families covering fully sequenced genomes

Previous versions of PANTHER focused on identifying subfamilies and the underlying functional divergence events. PANTHER 7 expands upon this focus by supporting accurate ortholog identification, and annotation of gene families ‘at any point in gene family evolution’, not just the major divergences. In order to meet these requirements, we made several important improvements to PANTHER. First, PANTHER trees aim to represent ‘all’ protein-coding genes from a phylogenetically diverse set of organisms. For PANTHER 7 trees, complete protein-coding gene sets for 48 different organisms were carefully constructed from a number of different sources, in collaboration with the GO Consortium, with an effort to use curated sources for model organism genomes (Table 1). These sets can be downloaded at ftp://ftp.pantherdb.org/genome/pthr7.0. We were careful to maintain stable PANTHER family and subfamily accession numbers from the previous version 6.1 to 7.0. To define protein family membership, each PANTHER 7 protein sequence was scored against the HMMs from version 6.1 and assigned to the family with the highest HMM score. If the resulting protein family contained over 1000 sequences, we attempted to manually divide it into smaller families to facilitate web browsing. We divided a total of 20 families from PANTHER 6.1, which have dramatically expanded due to numerous gene (or domain) duplication events, such as G protein-coupled receptors (GPCRs), ATP binding cassette (ABC) transporters, protein kinases, cytochrome P450s (CYP), and proteins containing ankyrin repeats, leucine-rich repeats (LRR), zinc finger and homeobox domains. Figure 1 shows the distribution of family sizes in terms of the number of distinct genes (Figure 1A) and the number of distinct genomes (Figure 1B) they contain.
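The family-assignment rule described above (highest HMM score wins, with families over 1000 sequences flagged for manual division) can be illustrated with a minimal Python sketch. It is not the PANTHER pipeline: the input layout, the function names, the scores and the family identifier PTHR00001 are all hypothetical, and higher scores are assumed to be better.

```python
from collections import defaultdict

MAX_FAMILY_SIZE = 1000  # threshold quoted in the text for manual division


def assign_families(hmm_scores):
    """Assign each sequence to the family whose HMM scores it highest.

    hmm_scores maps sequence id -> {family id: score}; a higher score is
    assumed to mean a better match (an assumption of this sketch).
    """
    families = defaultdict(list)
    for seq_id, scores in hmm_scores.items():
        if not scores:
            continue  # no HMM hit: the sequence stays unassigned
        best_family = max(scores, key=scores.get)
        families[best_family].append(seq_id)
    return dict(families)


def families_needing_split(families, max_size=MAX_FAMILY_SIZE):
    """Return the ids of families with more than max_size members, sorted."""
    return sorted(f for f, members in families.items() if len(members) > max_size)


# Made-up scores; PTHR00001 is a placeholder family id.
scores = {
    "geneA": {"PTHR11254": 310.5, "PTHR00001": 12.0},
    "geneB": {"PTHR11254": 95.2},
    "geneC": {"PTHR00001": 40.1, "PTHR11254": 39.9},
    "geneD": {},
}
families = assign_families(scores)
print(families)
print(families_needing_split(families, max_size=1))
```

The manual division step itself is a curation decision and is not modelled here; the sketch only identifies which families would be candidates.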

Figure 1. Distribution of protein family sizes in PANTHER version 7. (A) The distribution of the total number of genes (in all 48 genomes) per family. The N50 is about 150, i.e. about half the genes are in families larger than 150 members, and half are in smaller families. (B) The distribution of the total number of genomes per family. Most families contain genes from over 15 different species.
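The N50 quoted in the Figure 1 legend can be computed from a list of family sizes as sketched below. The function name and the example sizes are illustrative assumptions, not PANTHER data.

```python
def family_size_n50(family_sizes):
    """Return the family size S such that families of size >= S
    together contain at least half of all genes."""
    sizes = sorted(family_sizes, reverse=True)
    half = sum(sizes) / 2
    running = 0
    for size in sizes:
        running += size
        if running >= half:
            return size
    raise ValueError("family_sizes must not be empty")


# Made-up family sizes: 700 genes in total, so the halfway point is 350 genes.
example_sizes = [300, 150, 100, 50, 50, 30, 20]
print(family_size_n50(example_sizes))
```

Walking from the largest family downwards, the cumulative gene count first reaches half of the total inside the family of size 150, so that size is reported.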

Improved multiple sequence alignments and HMMs

A multiple sequence alignment was constructed for each family using the MAFFT program (6) and a phylogenetic tree was estimated from the protein multiple alignment. Subfamily identifiers from version 6.1 were then ‘forward tracked’ to ancestral nodes in the version 7.0 trees whenever possible. In addition, in many cases, due to improvements in the phylogenetic trees in PANTHER 7 (see below), subfamily boundaries were refined during manual curation. After manual review and correction, if necessary, of the locations of both forward tracked and new subfamilies, a new HMM was constructed for each family and subfamily. We modified our existing HMM construction process (7) to make use of the multiple alignment from MAFFT. For PANTHER 7, we took the relevant sequences in the MAFFT alignment, trimmed it to include as match states only those columns aligned by ≥30% of the sequences in the subalignment [sequences were weighted using the same technique as in (1)], and used it to construct an initial model using the modelfromalign program in SAM3.1. We then used this initial model as input, in addition to the sequences themselves, to the buildmodel program using the same parameters as in (7). As a result, unlike in previous versions of PANTHER, the HMMs can have different lengths for different subfamilies, and now model any domains that are conserved across a single subfamily but not found in other subfamilies.
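The column-trimming step can be sketched as follows. This simplified version counts every sequence equally, whereas the procedure described above applies sequence weights; the function names, the gap characters and the toy alignment are assumptions made for illustration only.

```python
def match_columns(alignment, min_fraction=0.30, gap_chars="-."):
    """Return indices of columns in which at least min_fraction of the
    sequences carry a residue (unweighted; the text uses sequence weights)."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        aligned = sum(1 for row in alignment if row[col] not in gap_chars)
        if aligned / n_seqs >= min_fraction:
            keep.append(col)
    return keep


def trim(alignment, columns):
    """Keep only the selected columns of every aligned sequence."""
    return ["".join(row[c] for c in columns) for row in alignment]


# Toy alignment; columns 2 and 5 are occupied by only 1 of 4 sequences (25%).
toy_alignment = ["MK-LV-", "MKALV-", "M--LVQ", "MK-L--"]
kept = match_columns(toy_alignment)
print(kept)
print(trim(toy_alignment, kept))
```

The retained columns would then serve as match states for the initial model; the model-building programs themselves (modelfromalign and buildmodel in SAM) are not reproduced here.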

New algorithm for phylogenetic trees

PANTHER trees aim to accurately represent ‘all’ of the evolutionary events in the gene family; for PANTHER 7, this means accurately inferring speciation and gene duplication events. For the gene trees, we use a novel algorithm, GIGA (Gene tree Inference in the Genomic Age). GIGA makes use of the known species tree and the presumably complete gene sets to infer accurate gene trees and locate gene duplication events relative to speciation events. If more than one gene duplication event took place between given consecutive speciation events, this appears as a single, multifurcating duplication node (e.g. node ‘2’ in Figure 2). The algorithm also performs a fast, approximate reconstruction of ancestral protein sequences at each node in the tree, using an iterative procedure starting at the leaves of the tree (modern day sequences) that considers the descendant sequences and the nearest outgroup.
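The representation of several successive duplications as one multifurcating duplication node can be illustrated with a short sketch. This is not the GIGA algorithm, whose details are not given here; it only shows, on a hypothetical `Node` class, how duplication nodes with no intervening speciation event could be merged into a single node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    kind: str = "leaf"  # "leaf", "speciation" or "duplication"
    children: List["Node"] = field(default_factory=list)


def collapse_duplications(node):
    """Return a copy of the tree in which a duplication node that is the
    direct child of another duplication node is merged into its parent."""
    merged = []
    for child in node.children:
        child = collapse_duplications(child)
        if node.kind == "duplication" and child.kind == "duplication":
            merged.extend(child.children)  # splice grandchildren into the parent
        else:
            merged.append(child)
    return Node(node.name, node.kind, merged)


# Three nested binary duplications (hypothetical genes a to e).
nested = Node("dup1", "duplication", [
    Node("dup2", "duplication", [Node("a"), Node("b")]),
    Node("dup3", "duplication", [
        Node("c"),
        Node("dup4", "duplication", [Node("d"), Node("e")]),
    ]),
])
flat = collapse_duplications(nested)
print([child.name for child in flat.children])
```

Because children are collapsed before their parent, a single splice per level is enough to flatten an arbitrarily deep chain of duplications.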

Figure 2. Example of human orthologs and LDO of the yeast RSP5 gene, identified using a phylogenetic tree. The figure shows part of the tree for PTHR11254 (HECT domain ubiquitin–protein ligase family), tracing the evolutionary relationship between RSP5 and its orthologs in humans, particularly its LDO, NEDD4. Orange nodes represent gene duplication events, green nodes represent speciation events, blue nodes represent subfamily nodes; in this figure blue nodes represent genes present in the bilaterian common ancestor that went on to found subfamilies. The solid outline ovals indicate the LDO pair in yeast and human, RSP5 and NEDD4 respectively. RSP5 has an additional nine orthologs in humans (dashed-outline ovals), but these have diverged to a greater degree than NEDD4. Conversely, 10 human genes have RSP5 as the ortholog, but only NEDD4 has RSP5 as the LDO. The LDO is identified by starting with the MRCA, and following the branch with the shortest length (least sequence divergence) after each gene duplication event. In this example, the MRCA is the speciation event that separated NEDD4 from RSP5 (labeled ‘1’), and there are at least two gene duplication events in the NEDD4 lineage: one at the base of the bilaterians representing multiple events that occurred in relatively rapid succession (labeled ‘2’) to create six genes in total and one at the base of the vertebrates (labeled ‘3’) to create the ancestors of NEDD4 and NEDD4L.
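The LDO rule in the Figure 2 legend (start at the MRCA and follow the shortest branch after each duplication) can be sketched as below. The `Node` class, the branch lengths and the gene OTHER_PARALOG are invented for illustration, and the tree is far simpler than the one in the figure. Falling back to the next-shortest branch when the shortest one has no gene from the target species is an assumption of this sketch, not a documented PANTHER rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    kind: str                      # "leaf", "speciation" or "duplication"
    branch_length: float = 0.0     # length of the branch leading to this node
    species: Optional[str] = None  # set for leaves only
    children: List["Node"] = field(default_factory=list)


def least_diverged(node, species):
    """Return the least diverged leaf of `species` below `node`, or None."""
    if node.kind == "leaf":
        return node if node.species == species else None
    children = node.children
    if node.kind == "duplication":
        # After a duplication, prefer the copy on the shortest branch.
        children = sorted(children, key=lambda c: c.branch_length)
    for child in children:
        hit = least_diverged(child, species)
        if hit is not None:
            return hit
    return None


def ldo_pair(mrca, species_a, species_b):
    """Return the LDO pair for two species below a speciation node."""
    if mrca.kind != "speciation":
        raise ValueError("orthologs must descend from a speciation node")
    return least_diverged(mrca, species_a), least_diverged(mrca, species_b)


# Toy tree with made-up branch lengths.
dup3 = Node("3", "duplication", 0.2, children=[
    Node("NEDD4L", "leaf", 0.3, "HUMAN"),
    Node("NEDD4", "leaf", 0.1, "HUMAN"),
])
dup2 = Node("2", "duplication", 0.4, children=[
    Node("OTHER_PARALOG", "leaf", 0.9, "HUMAN"),
    dup3,
])
mrca = Node("1", "speciation", children=[Node("RSP5", "leaf", 0.5, "YEAST"), dup2])

yeast_gene, human_gene = ldo_pair(mrca, "YEAST", "HUMAN")
print(yeast_gene.name, human_gene.name)
```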

Orthologs: identification of complete set of orthologs and best one-to-one (least diverged) ortholog

These improved gene trees provide the basis for accurate inference of orthologs, pairs of genes whose most recent common ancestor (MRCA) diverged due to a speciation event (8). Orthologs of each gene can be viewed on PANTHER gene pages, and the entire set of pairwise ortholog inferences can be downloaded from the PANTHER website (http://www.pantherdb.org/downloads). For orthologs, PANTHER reports not only one-to-one but also one-to-many (i.e. when gene duplication has occurred in one lineage following speciation) and many-to-many orthologs (i.e. when gene duplication has occurred in both lineages following speciation). In the case of multiple orthologs, PANTHER identifies the one-to-one relationship that has ‘diverged the least’ following any gene duplication events. The ‘least diverged ortholog’ (LDO) pairs therefore represent the most nearly ‘equivalent’ gene pairs between different organisms based on the phylogenetic tree. Following gene duplication, the most common fates of the copies are thought to be neofunctionalization (in which one copy retains the ancestral function, while the other adapts to a new function) and subfunctionalization (in which each copy specializes in a subset of the ancestral functions) (9). If neofunctionalization has occurred, the LDO is the copy predicted to retain the ancestral function, i.e. the ‘same gene’ as the ancestor. An example of ortholog and LDO identification is shown in Figure 2.
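The definition of orthologs used above (gene pairs whose MRCA is a speciation node) and the one-to-one, one-to-many and many-to-many categories can be illustrated with a short sketch. The `Node` class, the helper names and the gene FLY_GENE are hypothetical, and the toy tree is much smaller than a real PANTHER family.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional


@dataclass
class Node:
    kind: str                      # "leaf", "speciation" or "duplication"
    gene: Optional[str] = None     # leaves only
    species: Optional[str] = None  # leaves only
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    if node.kind == "leaf":
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def ortholog_pairs(node):
    """Yield every pair of leaves whose MRCA is a speciation node."""
    if node.kind == "speciation":
        for left, right in combinations(node.children, 2):
            for a in leaves(left):
                for b in leaves(right):
                    yield a, b
    for child in node.children:
        yield from ortholog_pairs(child)


def classify(pairs):
    """Label each ortholog pair by how many partners each gene has
    in the other gene's species."""
    partners = defaultdict(set)  # (gene, partner species) -> partner genes
    for a, b in pairs:
        partners[(a.gene, b.species)].add(b.gene)
        partners[(b.gene, a.species)].add(a.gene)
    labels = {}
    for a, b in pairs:
        n_b = len(partners[(a.gene, b.species)])
        n_a = len(partners[(b.gene, a.species)])
        if n_a == 1 and n_b == 1:
            labels[(a.gene, b.gene)] = "one-to-one"
        elif n_a > 1 and n_b > 1:
            labels[(a.gene, b.gene)] = "many-to-many"
        else:
            labels[(a.gene, b.gene)] = "one-to-many"
    return labels


human_dup = Node("duplication", children=[
    Node("leaf", "NEDD4", "HUMAN"),
    Node("leaf", "NEDD4L", "HUMAN"),
])
animals = Node("speciation", children=[Node("leaf", "FLY_GENE", "DROME"), human_dup])
root = Node("speciation", children=[Node("leaf", "RSP5", "YEAST"), animals])

pairs = list(ortholog_pairs(root))
labels = classify(pairs)
for pair, label in labels.items():
    print(pair, label)
```

NEDD4 and NEDD4L never appear as a pair because their MRCA is a duplication node, which makes them paralogs rather than orthologs.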

Expanded sets of genomes and sequence identifiers for PANTHER tools

Since its inception, the PANTHER website has provided, for a limited set of ‘fully supported’ genomes (human, mouse, rat and fruit fly), the following functionality: (i) stored classifications for all protein-coding genes, including family, subfamily, molecular function, biological process and pathway, (ii) visualization tools such as the whole genome pie chart view (Figure 3) of gene functions and (iii) analysis tools such as the Gene Expression Analysis Tool (10) for analyzing user-generated data relative to PANTHER classifications. For version 7, we have increased the number of fully supported genomes from 4 to 12 organisms, those participating in the GO Reference Genome Project (5), listed at the beginning of Table 1.

In addition, we have increased the number of different database identifiers supported by PANTHER tools and in searches of the PANTHER database. Previously, for genes only identifiers from NCBI Entrez Gene (17) or FlyBase (15) were supported; for proteins only RefSeq (24) or FlyBase identifiers. In PANTHER 7, we now also support identifiers from Ensembl (23), model organism databases, the International Protein Index (IPI) (25) and UniProt (18). All of these identifiers are obtained through the mapping files provided by UniProt (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/).
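As an illustration of how such mapping files might be used, the sketch below indexes identifier records by external identifier. It assumes a three-column, tab-separated layout (UniProt accession, identifier type, external identifier); the function name and the sample records are made up, and this is not how PANTHER itself loads the files.

```python
import io
from collections import defaultdict


def load_id_mapping(lines, wanted_types=("GeneID", "Ensembl", "RefSeq")):
    """Build {(id_type, external_id): set of accessions} from tab-separated
    records of the form: accession, id_type, external_id."""
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, id_type, external_id = line.split("\t")
        if id_type in wanted_types:
            index[(id_type, external_id)].add(accession)
    return index


# Made-up records for illustration; real files are far larger.
sample = io.StringIO(
    "ACC0001\tGeneID\t1111\n"
    "ACC0001\tEnsembl\tENSG_EXAMPLE\n"
    "ACC0002\tGeneID\t1111\n"
    "ACC0001\tGene_Name\tEXAMPLE\n"
)
index = load_id_mapping(sample)
print(sorted(index[("GeneID", "1111")]))
```

Storing a set per key allows one external identifier to map to several protein accessions, which can occur when a gene has multiple protein entries.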

Figure 3. Annotating a PANTHER tree with GO terms, and inferring GO terms for other genes by homology. The tree is the same as in Figure 2. The ‘x’ marks in the adjoining table (right panel) show the experimental GO annotations for each gene in the tree. For instance, yeast RSP5 has been determined experimentally to have the function ‘ubiquitin–protein ligase activity’, and be involved in the process of ‘cellular response to UV’. Based on the distribution of experimental annotations among genes, and, in some cases, the target of protein activity, one can infer annotations of ancestral genes. For instance, yeast RSP5 and human NEDD4 have been experimentally determined to operate in ‘cellular response to UV’, through targeting of the RNAPII protein for degradation, so this function was likely present in their common ancestor and inherited by descent from this ancestor. PANTHER captures this ancestral gene annotation, as well as rules for inferring functions for experimentally unannotated genes (shown with blue bars). In this example, the ancestral gene annotation allows us to infer ‘cellular response to UV’ for all least-diverged orthologs of NEDD4/RSP5 in animals and fungi. Note that different function annotations are inferred to have arisen in different ancestral genes (annotated nodes at left); this results in different inferred annotations across the genes in the family (blue bars indicating gene annotations at right). For instance, all genes in the tree can be inferred to have ‘ubiquitin–protein ligase activity’, while only a few genes (tetrapod orthologs of human NEDD4 and NEDD4L) can be inferred to have ‘sodium channel regulatory activity’ (as their targets, specific epithelial sodium channel subunits, apparently evolved first in tetrapods, not shown).

Pathway diagrams using SBGN

PANTHER 7 has adopted the Systems Biology Graphical Notation (SBGN) standard (26) for the 165 pathway diagrams currently available on the PANTHER website. This standard was recently released at http://sbgn.org and provides a consistent semantics for symbols used in pathway diagrams.

Collaboration with GO Consortium

For almost 2 years now, there has been a formal collaboration between the Gene Ontology Consortium and the PANTHER database (5). As a result, in PANTHER 7, all molecular function, biological process and cellular component terms are exclusively GO terms [previous versions of PANTHER used the PANTHER/X ontology (1), though a mapping file to GO was provided]. The PANTHER/X biological process ontology has been retired, but we have retained the PANTHER/X molecular function ontology and renamed it ‘Protein Class’ since many terms are quite different from those in GO, and we have received considerable feedback from users about its utility.

As part of the GO Reference Genome Project, GO curators are annotating trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. As described in (5), the goal of this project is to provide accurate, complete and consistent GO annotations for all genes in 12 model organism genomes. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree; thus, unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. An example of this annotation process is shown in Figure 3.
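Inheritance of annotations by descent can be sketched as a simple tree traversal: each term attached to an ancestral node is passed to every leaf below it, together with the name of the node where it was annotated. The `Node` class, the node names and FLY_GENE are hypothetical, the tree is a reduced version of the Figure 3 example, and the sketch ignores any curated exceptions to inheritance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)  # GO terms curated at this node


def inherited_annotations(node, inherited=None, out=None):
    """Map each leaf name to {GO term: name of the node it was inherited from}."""
    inherited = dict(inherited or {})
    out = {} if out is None else out
    for term in node.annotations:
        inherited.setdefault(term, node.name)  # keep the most ancestral source
    if not node.children:
        out[node.name] = inherited
    for child in node.children:
        inherited_annotations(child, inherited, out)
    return out


tetrapod = Node(
    "tetrapod_ancestor",
    children=[Node("NEDD4"), Node("NEDD4L")],
    annotations=["sodium channel regulatory activity"],
)
animal = Node("animal_ancestor", children=[Node("FLY_GENE"), tetrapod])
family_root = Node(
    "family_root",
    children=[Node("RSP5"), animal],
    annotations=["ubiquitin-protein ligase activity"],
)

result = inherited_annotations(family_root)
for gene, terms in result.items():
    print(gene, terms)
```

Recording the source node alongside each term gives a minimal form of evidence trail: every inferred annotation can be traced back to the ancestral gene in which the function was annotated.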

This rigorous process for evolutionary inference provides a means for accurate inference of GO annotations by homology, as well as a means for comparing and consistency-checking annotations for related genes. While earlier versions of PANTHER have allowed annotation of ‘subfamily nodes’ (i.e. ancestral genes that founded a particular subfamily), this more generalized GO annotation process requires all ancestral genes to be annotatable in principle, which has only become supported with the release of PANTHER 7. For most end users, perhaps the most relevant outcomes of this collaboration will be: (i) an increased number of GO annotations, especially those inferred by homology and (ii) the ability to trace all of the evidence behind each homology-based annotation. This evidence includes not only the gene that was experimentally demonstrated to perform a particular function (and the scientific publication reporting the experiment), but also the ancestral gene in which the function was inferred to have evolved. In the long term, all PANTHER ontology annotations will be migrated to this new standard.

FUNDING

National Institute of General Medical Sciences (GM081084). Funding for open access: SRI International.

Conflict of interest statement. None declared.

REFERENCES

1. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. (2003) 13:2129–2141.

2. InterPro: the integrative protein signature database. Nucleic Acids Res. (2009) 37:D211–D215.

3. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. (2000) 25:25–29.

4. The Gene Ontology Consortium. The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. (2010) 38:D331–D335.

5. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput. Biol. (2009) 5:e1000431.

6. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinformatics (2008) 9:286–298.

7. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res. (2007) 35:D247–D252.

8. Distinguishing homologous from analogous proteins. Syst. Zool. (1970) 19:99–113.

9. The altered evolutionary trajectories of gene duplicates. Trends Genet. (2004) 20:544–549.

10. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools. Nucleic Acids Res. (2006) 34:W645–W650.

11. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.

12. WormBase 2007. Nucleic Acids Res. (2008) 36:D612–D617.

13. The Zebrafish Information Network: the zebrafish model organism database provides expanded support for genotypes and phenotypes. Nucleic Acids Res. (2008) 36:D768–D772.

14. dictyBase–a Dictyostelium bioinformatics resource update. Nucleic Acids Res. (2009) 37:D515–D519.

15. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. (2009) 37:D555–D559.

16. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res. (2009) 37:D464–D470.

17. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. (2007) 35:D26–D31.

18. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. (2009) 37:D169–D174.

19. The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res. (2009) 37:D712–D719.

20. The Rat Genome Database 2009: variation, ontologies and pathways. Nucleic Acids Res. (2009) 37:D744–D749.

21. Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. (2008) 36:D577–D581.

22. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res. (2004) 32:D339–D343.

23. Ensembl 2009. Nucleic Acids Res. (2009) 37:D690–D697.

24. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. (2009) 37:D32–D36.

25. The International Protein Index: an integrated database for proteomics experiments. Proteomics (2004) 4:1985–1988.

26. The Systems Biology Graphical Notation. Nat. Biotechnol. (2009) 27:735–741.

© The Author(s) 2009. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
