Inparanoid: a comprehensive database of eukaryotic orthologs

Kevin P O'Brien et al. Nucleic Acids Res. 2005.

Abstract

The Inparanoid eukaryotic ortholog database (http://inparanoid.cgb.ki.se/) is a collection of pairwise ortholog groups between 17 whole genomes: Anopheles gambiae, Caenorhabditis briggsae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes, Rattus norvegicus, Oryza sativa, Plasmodium falciparum, Arabidopsis thaliana, Escherichia coli, Saccharomyces cerevisiae and Schizosaccharomyces pombe. Complete proteomes for these genomes were derived from Ensembl and UniProt and compared pairwise using Blast, followed by a clustering step using the Inparanoid program. An Inparanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs (should they exist) are gathered independently, while outparalogs are excluded. The ortholog clusters can be searched on the website using Ensembl gene/protein or UniProt identifiers, by annotation text, or by Blast alignment against our protein datasets. The entire dataset can be downloaded, as can the Inparanoid program itself.
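The seeding step described above (a reciprocally best-matching pair) can be illustrated with a minimal sketch. This is not the Inparanoid program itself: the function names, the dictionary layout and the bit scores are hypothetical, and the real program applies further steps (score averaging, cutoffs, inparalog gathering) that are omitted here.

```python
# Illustrative sketch only: find reciprocal best-hit pairs between two proteomes.
# All names and scores below are hypothetical and not taken from Inparanoid.

def best_hits(scores):
    """Map each query protein to its single highest-scoring target."""
    best = {}
    for (query, target), bits in scores.items():
        if query not in best or bits > best[query][1]:
            best[query] = (target, bits)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_pairs(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical pairwise bit scores: species B proteins vs species C, and back.
b_vs_c = {
    ("B1", "C1"): 300, ("B1", "C2"): 120,
    ("B2", "C2"): 280, ("B2", "C3"): 250, ("B2", "C1"): 110,
}
c_vs_b = {
    ("C1", "B1"): 300, ("C1", "B2"): 110,
    ("C2", "B2"): 280, ("C2", "B1"): 120,
    ("C3", "B2"): 250,
}

seed_pairs = reciprocal_best_pairs(b_vs_c, c_vs_b)
print(seed_pairs)
```

With these made-up scores, C3 has B2 as its best hit but B2 prefers C2, so C3 does not seed a cluster of its own; it is the kind of protein that would later be considered as an inparalog of the seed pair.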

Figures

Figure 1

A hypothetical gene tree and the resulting Inparanoid clusters are shown to illustrate inparalog (and thus co-ortholog) and outparalog assignments. (a) Protein A in an ancestral species ‘A’ undergoes a gene duplication. A speciation event then gives rise to the two lineages leading to species ‘B’ and ‘C’. In the C genome the genes C2 and C3 are inparalogs, since their gene duplication occurred after speciation; they are co-orthologous to the B2 gene (one common ancestral protein upon speciation). B1 is an outparalog of the C2 and C3 genes, as it is of B2 (duplication and divergence prior to speciation). (b) B2 and C2 are the original seed-ortholog pair around which all inparalogs are clustered, so both receive an inparalog score of 1.0. Other inparalogs (in this case C3) are scored according to their relative similarity to the seed-inparalog (here C2): Inparalog score of C3 = (Blast[C2:C3] − Blast[C2:B2]) / (Blast[C2:C2] − Blast[C2:B2]), where Blast[X:Y] is the averaged Blast score between X and Y in bits. In this case C2 is relatively more similar to B2 than C3 is, and thus C3 receives a lower inparalog score (0.7). C1 and B1 are orthologous to each other but are outparalogs of the other cluster and thus form a cluster of their own.
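As a worked illustration of the inparalog score formula in this caption, the sketch below evaluates it for one set of bit scores. The function name and the three bit scores are assumptions chosen so that the result matches the 0.7 quoted for C3; they are not values from the paper.

```python
# Illustrative sketch of the inparalog score from the Figure 1 caption.
# The bit scores are hypothetical; only the formula comes from the caption.

def inparalog_score(seed_vs_candidate, seed_vs_ortholog, seed_vs_self):
    """(Blast[seed:candidate] - Blast[seed:ortholog]) /
    (Blast[seed:seed] - Blast[seed:ortholog]), all in bits."""
    denominator = seed_vs_self - seed_vs_ortholog
    if denominator <= 0:
        raise ValueError("seed self-score must exceed the seed-ortholog score")
    return (seed_vs_candidate - seed_vs_ortholog) / denominator


# Assumed averaged bit scores for the seed inparalog C2.
blast_c2_c2 = 500.0  # C2 against itself
blast_c2_b2 = 300.0  # C2 against its seed ortholog B2
blast_c2_c3 = 440.0  # C2 against the candidate inparalog C3

score_c3 = inparalog_score(blast_c2_c3, blast_c2_b2, blast_c2_c2)
score_c2 = inparalog_score(blast_c2_c2, blast_c2_b2, blast_c2_c2)
print(f"C3: {score_c3:.2f}, C2: {score_c2:.2f}")
```

Under these assumed scores the seed inparalog scores 1.0 by construction, and a candidate whose similarity to the seed fell to the seed-ortholog level would score 0.0.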

Figure 2

An Inparanoid cluster is a representation of genes thought to share a single ancestral gene upon speciation. In this example output only human–mosquito and human–worm clusters are shown. In the human–worm cluster, the two human genes are inparalogs, i.e. they resulted from a gene duplication after the speciation from worm occurred. They are thus both co-orthologous to the worm gene. The Inparanoid score is a measure of how similar an inparalog is to the inparalog that is the main ortholog. If they are identical the score is 1.0, but as the similarity drops towards the similarity of the main orthologs, the score goes to 0.0 (see Figure 1). For example, ENSP00000343386 is less similar to the C. briggsae gene than is ENSP00000322439, but both these human genes are still co-orthologous to it. The bootstrapping score is a measure of how reliably a gene is assigned as the main ortholog. Gene and protein identifiers are hyperlinks to the relevant databases for each species.

Figure 3

Searching with a gene name generates a list of possible gene hits. Clicking on the ‘Search for Clusters’ icon queries the database as to whether a gene occurs in an Inparanoid cluster in all organisms. Clicking on the cluster name generates a multiple FASTA file for this cluster and performs a multiple alignment using the Kalign program. The FASTA file and the alignment can be used to check the validity of the cluster in question and can be saved to disk.
