Comparative analysis of pseudogenes across three phyla

Comparative Study

Proc Natl Acad Sci U S A. 2014 Sep 16;111(37):13361-6.

doi: 10.1073/pnas.1407293111. Epub 2014 Aug 25.

Cristina Sisu, Baikang Pei, Jing Leng, Adam Frankish, Yan Zhang, Suganthi Balasubramanian, Rachel Harte, Daifeng Wang, Michael Rutenberg-Schoenberg, Wyatt Clark, Mark Diekhans, Joel Rozowsky, Tim Hubbard, Jennifer Harrow, Mark B Gerstein

Cristina Sisu et al. Proc Natl Acad Sci U S A. 2014.

Abstract

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Keywords: functional genomics; genome annotation; transcriptomics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Annotation, classification, and evolution. (A) Pseudogene annotation and ENCODE functional data availability. (B) Distribution of processed pseudogenes as a function of pseudogene age (sequence similarity to parent genes) for human (Left) and worm and fly (Right). (C) Pseudogene disablement variation and density.

Fig. 2.

Localization and mobility. (A, Left) The relative chromosomal localization preference for pseudogenes in human, worm, and fly. (Right) Average recombination rates for pseudogenes, protein-coding genes, and genomic background. (B) Distributions of processed and duplicated pseudogenes across chromosomes, sorted by length. (C) Pseudogene exchange between sex chromosomes and autosomes in humans.

Fig. 3.

Orthologs, paralogs, and families. (A) Venn diagrams showing the total number of orthologous genes and pseudogenes, in human, worm, and fly. (Right) Pseudogene orthologs between human and mouse. (B) Per chromosome distribution of RpS6 pseudogenes in human, worm, and fly. (C) Comparative distribution of pseudogene and paralogs per gene. (D) Top pseudogene families that give rise to 25% of the total number of pseudogenes in each organism (Left, family type; Right, number of pseudogenes). Oval rows indicate the collapse of two or more consecutive families of the same type. 7tm, G protein-coupled receptors; His, histone; IG, Ig; Kin, kinase; Ploop, P-loop NTPase proteins; Ribo, ribosomal proteins; RRM, RNA recognition motifs; Struct, structural protein; ZnF, Zinc finger proteins (TF); Ubq, ubiquitination proteins; Motor, kinesin motor domain proteins; SAP, SAP domain proteins.

Fig. 4.

Pseudogene activity. (A) Distribution of pseudogenes as a function of various activity features: transcription (Tnx), active chromatin (AC), and presence of active Pol II and TF binding sites in the upstream region. (B) Conservation of the upstream sequences in processed and duplicated pseudogenes compared with paralogs. (C) Conservation of an upstream sequence activity mark (H3K27Ac) in pseudogene-parent pairs vs. parent-paralogs. +, active H3K27Ac; −, inactivity. We find that the majority of parent–paralog pairs have coordinated H3K27Ac activity (larger diagonal values) as opposed to parent–pseudogene pairs (larger off-diagonal values). (D) Functional pseudogene candidates with translation evidence.
