Pseudofam: the pseudogene families database

Nucleic Acids Res. 2009 Jan;37(Database issue):D738-43.

doi: 10.1093/nar/gkn758. Epub 2008 Oct 28.

Hugo Y K Lam et al. Nucleic Acids Res. 2009 Jan.

Abstract

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes, including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family, and the pseudogenes in a family are then aligned to one another by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.
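The abstract does not say which statistic is used to flag outlier families. As a rough illustration only, the sketch below assumes a per-family pseudogene-to-gene ratio and a median/MAD-based robust z-score with a cutoff of 3.5. The function name, the cutoff and the family labels are all hypothetical and are not taken from Pseudofam.

```python
from statistics import median


def pseudogenization_outliers(counts, cutoff=3.5):
    """Return families whose pseudogene/gene ratio is a robust outlier.

    counts -- {family: (n_genes, n_pseudogenes)}; families with no genes
              are skipped because the ratio is undefined.
    cutoff -- assumed threshold on the robust z-score (not from the paper).
    """
    ratios = {fam: p / g for fam, (g, p) in counts.items() if g > 0}
    if not ratios:
        return []
    med = median(ratios.values())
    # Median absolute deviation: a spread estimate that the outliers
    # themselves barely influence.
    mad = median(abs(r - med) for r in ratios.values())
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        fam for fam, r in ratios.items()
        if 0.6745 * abs(r - med) / mad > cutoff
    )


# Hypothetical counts: (genes, pseudogenes) per family.
example = {
    "FamA": (10, 5),
    "FamB": (20, 8),
    "FamC": (10, 6),
    "FamD": (10, 3),
    "FamE": (5, 60),
}
print(pseudogenization_outliers(example))
```

A median-based score is used here instead of a mean and standard deviation because a few heavily pseudogenized families would otherwise inflate the spread estimate and hide themselves.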

Figures

Figure 1.

The generation of Pseudofam. (1) Identify pseudogenes by homology to the existing proteins of the genome. (2) Map all the parent proteins to their protein families. (3) Assign the identified pseudogenes to their parent protein families. (4) Align the pseudogenes in each family to build the pseudogene families. (5) Calculate the key statistics for the families and organize the data into the Pseudofam database.
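Steps (2) and (3) of this caption amount to a join between two mappings. The sketch below is a minimal illustration, assuming each pseudogene has exactly one parent protein and each protein may belong to several families. The function name and all identifiers are hypothetical, and this is not the Pseudofam implementation.

```python
from collections import defaultdict


def assign_pseudogenes_to_families(parent_of, families_of):
    """Group pseudogenes by the protein families of their parent proteins.

    parent_of   -- {pseudogene_id: parent_protein_id}   (output of step 1)
    families_of -- {protein_id: [family_id, ...]}       (output of step 2)
    Pseudogenes whose parent has no family assignment are left out.
    """
    members = defaultdict(list)
    for pseudogene, parent in parent_of.items():
        for family in families_of.get(parent, ()):
            members[family].append(pseudogene)
    return {family: sorted(ids) for family, ids in members.items()}


# Hypothetical identifiers for illustration.
parent_of = {"pg1": "P1", "pg2": "P1", "pg3": "P2", "pg4": "P3"}
families_of = {"P1": ["PF_A"], "P2": ["PF_A", "PF_B"]}
print(assign_pseudogenes_to_families(parent_of, families_of))
```

Here `pg3` lands in two families because its parent carries two domains, and `pg4` is dropped because its parent has no family.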

Figure 2.

The alignment of a pseudogene family. Each pseudogene in a family is first aligned to its parent protein. Then, the pseudogene alignment is aligned with the parent protein domain by transferring the corresponding alignment from the Pfam multiple alignments. Finally, all the aligned pseudogene domains, including their aligned parent protein domains, are adjusted together to generate the final alignment.
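The transfer step in this caption can be pictured as projecting the pseudogene onto the family's alignment columns through its parent. The sketch below is a simplified illustration under stated assumptions: gaps are written as `-` only, the pairwise alignment covers exactly the parent domain, and pseudogene insertions relative to the parent are dropped instead of being given new columns (the final joint adjustment described in the caption is not modeled). The function and parameter names are hypothetical.

```python
def transfer_alignment(family_row, parent_pairwise, pseudo_pairwise, gap="-"):
    """Place a pseudogene on the family alignment columns via its parent.

    family_row      -- parent domain as a row of the family alignment
    parent_pairwise -- parent domain in its pairwise alignment with the pseudogene
    pseudo_pairwise -- pseudogene in that same pairwise alignment
    """
    if len(parent_pairwise) != len(pseudo_pairwise):
        raise ValueError("pairwise alignment rows differ in length")

    # One pseudogene character per parent residue. Columns where the
    # parent has a gap are pseudogene insertions and are dropped here.
    per_residue = [
        q for p, q in zip(parent_pairwise, pseudo_pairwise) if p != gap
    ]

    n_residues = sum(1 for c in family_row if c != gap)
    if n_residues != len(per_residue):
        raise ValueError("pairwise alignment does not cover the parent domain")

    # Copy the parent's gap pattern and fill residue columns in order.
    residues = iter(per_residue)
    return "".join(gap if c == gap else next(residues) for c in family_row)


# Toy example: the pseudogene lacks the parent's K and has an extra X.
print(transfer_alignment("MK-TL--A", "MKT-LA", "M-TXLA"))
```

Because every pseudogene inherits its parent's gap pattern, all rows in a family end up with the same length and can be stacked directly.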

Figure 3.

The pseudogene family ontology. An upper ontology that describes the various relationships between a pseudogene family and other genomic elements. The solid lines represent direct relationships and the dashed lines represent inferred or indirect relationships. The core part is represented in blue, while the well-established relationships are in dark gray and the secondary aspects of a pseudogene family are in light gray. For detailed concepts and relationships concerning pseudogenes, see Supplementary Figure S1.
