Transcription-mediated gene fusion in the human genome

Pinchas Akiva et al. Genome Res. 2006 Jan.

Abstract

Transcription of a gene usually ends at a regulated termination point, preventing the RNA-polymerase from reading through the next gene. However, sporadic reports suggest that chimeric transcripts, formed by transcription of two consecutive genes into one RNA, can occur in human. The splicing and translation of such RNAs can lead to a new, fused protein, having domains from both original proteins. Here, we systematically identified over 200 cases of intergenic splicing in the human genome (involving 421 genes), and experimentally demonstrated that at least half of these fusions exist in human tissues. We showed that unique splicing patterns dominate the functional and regulatory nature of the resulting transcripts, and found intergenic distance bias in fused compared with nonfused genes. We demonstrate that the hundreds of fused genes we identified are only a subset of the actual number of fused genes in human. We describe a novel evolutionary mechanism where transcription-induced chimerism followed by retroposition results in a new, active fused gene. Finally, we provide evidence that transcription-induced chimerism can be a mechanism contributing to the evolution of protein complexes.

Figures

Figure 1.

A model for transcription-induced chimerism. The transcribed region spans both consecutive genes. When the pre-mRNA is spliced, it involves a 5′ splice site at the upstream gene and a 3′ splice site at the downstream gene, thus removing the intergenic region from the mature fused mRNA. The product is a hybrid mRNA containing exons from both genes.
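The model in this caption can be sketched in a few lines of code. This is only an illustration, not the authors' pipeline: the function name `fuse_transcripts`, the exon indices, and all coordinates are hypothetical. It assumes each gene is given as an ordered list of exon intervals and that the splice uses the 5′ splice site at the end of one upstream exon and the 3′ splice site at the start of one downstream exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates; hypothetical values


def fuse_transcripts(
    upstream: List[Exon],
    downstream: List[Exon],
    donor_exon: int,
    acceptor_exon: int,
) -> List[Exon]:
    """Return the exons of a chimeric mRNA under the read-through model.

    donor_exon: index of the upstream exon whose 5' splice site is used.
    acceptor_exon: index of the downstream exon whose 3' splice site is used.
    Everything between the two sites (remaining upstream exons, the
    intergenic region, skipped downstream exons) is spliced out.
    """
    if not 0 <= donor_exon < len(upstream):
        raise IndexError("donor_exon is outside the upstream gene")
    if not 0 <= acceptor_exon < len(downstream):
        raise IndexError("acceptor_exon is outside the downstream gene")
    return upstream[: donor_exon + 1] + downstream[acceptor_exon:]


# Illustrative coordinates only, not taken from the paper.
gene_a = [(100, 200), (300, 400), (500, 650)]
gene_b = [(2000, 2100), (2300, 2400), (2600, 2800)]
chimera = fuse_transcripts(gene_a, gene_b, donor_exon=1, acceptor_exon=1)
print(chimera)
```

Here the last exon of the upstream gene and the first exon of the downstream gene are both skipped, so the mature transcript carries exons from both genes and none of the intergenic sequence.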

Figure 2.

Intergenic splicing patterns. Exons (dark boxes) are numbered from 1 to n. Percentage of events is calculated out of the 212 events detected in our computational search. Thin triangles mark the intergenic splicing pattern. “GT” and “AG” stand for the 5′ and 3′ splice sites, respectively. Splice sites used for the intergenic splicing are in bold. Patterns are not mutually exclusive, and hence, the percentages sum up to more than 100%.

Figure 3.

Distribution of intergenic distances. Two data sets are compared: “all genes,” representing distances between 12,395 consecutive RefSeq pairs residing on the same strand in the human genome, and “fused genes,” representing distances between the 212 genes in the current analysis. _x_-axis, intergenic distance in kilobases, divided into bins of 10 kb (i.e., the “20” bin corresponds to distances between 10,001 and 20,000 bp). _y_-axis, percentage of gene pairs in each data set. Notably, fused genes tend to lie much closer together on the genome than the entire population of gene pairs.
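The binning rule in this caption can be written out as a short sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample distances are invented, and placing a distance of 0 bp in the first bin is an assumption the caption does not address.

```python
from collections import Counter
from typing import Dict, Iterable

BIN_WIDTH_BP = 10_000  # 10-kb bins, as stated in the caption


def bin_label_kb(distance_bp: int) -> int:
    """Return the upper edge (in kb) of the bin that holds distance_bp.

    The "20" bin covers 10,001 to 20,000 bp, so bins are closed on the right.
    A distance of 0 bp is put in the first bin (assumption).
    """
    if distance_bp < 0:
        raise ValueError("intergenic distance cannot be negative")
    bin_index = max(1, (distance_bp + BIN_WIDTH_BP - 1) // BIN_WIDTH_BP)
    return bin_index * 10


def percent_per_bin(distances_bp: Iterable[int]) -> Dict[int, float]:
    """Return the percentage of gene pairs falling in each 10-kb bin."""
    labels = [bin_label_kb(d) for d in distances_bp]
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in sorted(counts.items())}


# Invented distances for illustration only.
print(percent_per_bin([500, 9_000, 15_000, 42_000]))
```

Computing the percentages separately for each data set, as the _y_-axis does, lets a set of a few hundred pairs be compared directly with one of 12,395 pairs.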

Figure 4.

Selected examples of experimentally verified transcription-induced chimeras (TICs). Shown are snapshots taken from the UCSC genome browser (http://genome.ucsc.edu/), presenting the alignment of expressed sequences to the genome, as well as the location of CpG islands (Gardiner-Garden and Frommer 1987). A total of 70% of the downstream genes involved in TICs possess CpG islands in their 5′ regions, indicating that they are also regulated as single genes. Boxes represent exons, with thinner boxes representing the untranslated regions (UTRs). Arrowed thin lines represent introns. “PCR_verified” represents the chimeric sequence validated by RT–PCR. Beneath are RT–PCR results showing the fusion events (see Methods). (A) Fusion transcript between NME1 and NME2 creating a predicted fused protein. This transcript is supported by 30 ESTs (one is shown in the figure), and was found to be ubiquitously expressed in human tissues. The same fusion event was also detected in mouse ESTs. In the gel image, lanes a indicate the NME1 wild-type (WT) transcript and lanes b indicate the fused (TIC) NME1-NME2 transcript. (B) Fusion transcript between sialophorin (SPN) and quinolinate phosphoribosyltransferase (QPRT), demonstrating the donation of an SPN regulatory sequence (5′UTR) to the QPRT transcript. The fusion was experimentally detected in RNAs from the Farage cell line (B-lymphoma). In the gel image, lanes a indicate the wild-type SPN and lanes b indicate the fused (TIC) SPN-QPRT transcript (validated product marked by black arrow). (C) Fusion transcript between phosphatidylinositol-4-phosphate 5-kinase (PIP5K1A) and proteasome 26S subunit non-ATPase 4 (PSMD4) on chromosome 1q21. The fusion event was discovered by identifying a retroposed, chimeric, processed, expressed pseudogene residing on chromosome 10q23. No EST supports this fusion, but it was verified by RT–PCR. The RNA BC068549, expressed from the processed pseudogene on chromosome 10, is shown aligning to both genes. In the gel image, lanes a indicate the fused (TIC) PIP5K1A-PSMD4 transcript on chromosome 1 (validated product marked by black arrow); lanes b indicate the PIP5K1A (WT) transcript; and lanes c indicate the active transcription of the retroposed gene from chromosome 10, which was found to be ubiquitously expressed. Primers were designed from regions that have diverged between the fusion transcript and the retrogene, so that each product is uniquely amplified (Supplemental Table S2). Products were verified by direct sequencing.

References

    1. Communi, D., Suarez-Huerta, N., Dussossoy, D., Savi, P., and Boeynaems, J.M. 2001. Cotranscription and intergenic splicing of human P2Y11 and SSF1 genes. J. Biol. Chem. 276: 16561-16566.
    2. Cox, P.R., Siddique, T., and Zoghbi, H.Y. 2001. Genomic organization of Tropomodulins 2 and 4 and unusual intergenic and intraexonic splicing of YL-1 and Tropomodulin 4. BMC Genomics 2: 7.
    3. Enright, A.J. and Ouzounis, C.A. 2001. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2: research0034.
    4. Fears, S., Mathieu, C., Zeleznik-Le, N., Huang, S., Rowley, J.D., and Nucifora, G. 1996. Intergenic splicing of MDS1 and EVI1 occurs in normal tissues as well as in myeloid leukemia and produces a new member of the PR domain family. Proc. Natl. Acad. Sci. 93: 1642-1647.
    5. Gardiner-Garden, M. and Frommer, M. 1987. CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261-282.

Web site references

    1. http://www.ncbi.nlm.nih.gov/Genbank/; NCBI GenBank version 136 (June 2003).
    2. http://www.ncbi.nlm.nih.gov/genome/guide/human/; human genome build 33 (April 2003).
    3. http://genome.ucsc.edu/; This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides a portal to the ENCODE project.
