Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons

J Jurka. Proc Natl Acad Sci U S A. 1997.

Abstract

It is commonly accepted that reverse-transcribed cellular RNA molecules, called retroposons, integrate at staggered breaks in mammalian chromosomes. However, contrary to what was previously thought, most of the staggered breaks are not generated by random nicking. One of the two nicks involved is primarily associated with the 5'-TTAAAA hexanucleotide and its variants derived by a single base substitution, particularly A→G and T→C. This nick is probably generated in the antisense strand, between the consensus bases 3'-AA and TTTT that are complementary to 5'-TTAAAA. The sense strand is nicked at variable distances from the TTAAAA consensus site toward the 3' end, preferably within 15–16 base pairs. The base composition near the second nicking site is also nonrandom at positions preceding the nick. On the basis of the observed sequence patterns, it is proposed that integration of mammalian retroposons is mediated by an enzyme with endonucleolytic activity. The best candidate for such an enzyme may be the reverse transcriptase encoded by the L1 non-long-terminal-repeat retrotransposon, which contains a recently reported domain homologous to the apurinic/apyrimidinic (AP) endonuclease family [Martin, F., Olivares, M., Lopez, M. C. & Alonso, C. (1996) Trends Biochem. Sci. 21, 283-285; Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. (1996) Cell 87, 905-916] and nicks DNA in vitro with a preference for targets similar to the 5'-TTAAAA/3'-AATTTT consensus sequence [Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. (1996) Cell 87, 905-916]. A model for integration of mammalian retroposons based on the presented data is discussed.
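The motif rule stated in the abstract can be illustrated with a short scan. The sketch below is a hypothetical illustration, not the author's procedure: it enumerates 5'-TTAAAA and its 18 single-base-substitution variants, slides a 6-bp window along the sense strand only, and reports where the antisense nick would fall if it lies opposite the TT|AAAA junction (the consensus bases 3'-AA and TTTT). The function names, the 0-based coordinates, and the substitution labels are assumptions. The scan does not weight the A→G and T→C variants that the abstract reports as most frequent, but the labels allow them to be tallied.

```python
BASES = "ACGT"
CONSENSUS = "TTAAAA"
# Assumed convention (hypothetical): the antisense nick lies opposite the
# TT|AAAA junction, i.e. 2 bases into the motif in 0-based sense coordinates.
NICK_OFFSET = 2


def single_substitution_variants(motif: str) -> dict[str, str]:
    """Map the consensus and every single-base-substitution variant to a label."""
    variants = {motif: "consensus"}
    for i, base in enumerate(motif):
        for alt in BASES:
            if alt != base:
                variant = motif[:i] + alt + motif[i + 1:]
                # Label such as "A->G at 3" (1-based motif position).
                variants[variant] = f"{base}->{alt} at {i + 1}"
    return variants


def find_candidate_nick_sites(seq: str, motif: str = CONSENSUS) -> list[tuple[int, str, str, int]]:
    """Return (start, word, label, nick) for each window matching a variant.

    Only the sense strand is scanned. `nick` is the 0-based sense-strand
    boundary opposite which the antisense strand is presumed to be nicked.
    """
    seq = seq.upper()
    variants = single_substitution_variants(motif)
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        if word in variants:
            hits.append((start, word, variants[word], start + NICK_OFFSET))
    return hits


if __name__ == "__main__":
    for hit in find_candidate_nick_sites("ggcTTGAAAcgtTTAAAAgg"):
        print(hit)
```

Overlapping windows can both match (a TTTAAAA stretch yields both TTTAAA and TTAAAA), so neighbouring hits may describe the same candidate site.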

Figures

Figure 1

χ² values for individual positions of Alu and ID flanking repeats and of adjacent regions. $\chi^2 = \sum_{i=1}^{4} (O_i - E_i)^2 / E_i$, with $E_i = \text{Total} \times \text{Composition}_i$, where the individual base occurrences ($O_i$), the total numbers of bases at different positions (Total), and the base compositions (Comp.) for the top and bottom parts of the figure are from Tables 1 and 2, respectively. χ² values above the broken horizontal line correspond to significance levels of P < 0.001 for 3 degrees of freedom. The discrete χ² values were connected by lines to improve the presentation. (a) A cluster of significant χ² values occurs at positions −2 through +4 around the 5′ ends of flanking repeats. (b) Significant χ² values are near the 3′ ends of flanking repeats, at positions −4, −3, and −2 immediately preceding the presumed antisense nicking site, which is between positions −1 and +1. The significant nonrandomness around positions −11 through −16 corresponds to the 5′ ends of the flanking repeats presented above. Significant χ² values not shared by Alu and ID flanking repeats are not considered here. (c) The overall scheme indicating the mutual orientation of retroposed elements, flanking repeats, and adjacent regions.
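The per-position statistic in the legend can be computed as follows. This is a minimal sketch assuming equal-length, gap-free alignments anchored at a common reference point, non-empty columns, and a background composition in which all four bases are present; the function names are hypothetical, and in the paper the counts and compositions come from Tables 1 and 2. The threshold 16.266 is the standard χ² critical value for P = 0.001 with 3 degrees of freedom.

```python
from collections import Counter

BASES = "ACGT"
# Critical chi-square value for P = 0.001 with 3 degrees of freedom
# (four bases minus one); values above it correspond to the broken line.
CHI2_CRIT_P001_DF3 = 16.266


def positional_chi2(column: str, composition: dict[str, float]) -> float:
    """Chi-square of the observed base counts at one aligned position.

    `column` holds the bases seen at that position across all sequences;
    `composition` gives the expected fraction of each base (Composition_i).
    """
    counts = Counter(b for b in column.upper() if b in BASES)
    total = sum(counts.values())
    chi2 = 0.0
    for base in BASES:
        expected = total * composition[base]  # E_i = Total x Composition_i
        chi2 += (counts[base] - expected) ** 2 / expected
    return chi2


def chi2_profile(aligned: list[str], composition: dict[str, float]) -> list[float]:
    """Per-position chi-square values for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [positional_chi2("".join(s[p] for s in aligned), composition)
            for p in range(length)]


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    value = positional_chi2("TTTTTTTTTT", uniform)
    print(value, value > CHI2_CRIT_P001_DF3)
```

For a position where ten sequences all carry T and the background is uniform, $E_i = 2.5$ for every base, so χ² = 22.5 + 3 × 2.5 = 30.0 and the script prints `30.0 True`.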

Figure 2

χ² values for individual positions of 30-bp regions preceding 5′ ends of Alu retroposons. The χ² values were calculated and presented as explained in the legend of Fig. 1, except that flanking repeats and the adjacent regions have not been adjusted and the expected values were calculated using total base composition of the 30-bp sequence segments which include flanking repeats and adjacent regions. All 3′ ends of the 30-bp regions correspond to position −1, immediately preceding the 5′ ends of Alu sequences. (a) χ² values for the 30-bp regions which include long (>9 bp) and short (4–9 bp) flanking repeats. (b) Analogous χ² values of two randomly selected sets of 30-bp human sequences as described in Materials and Methods. The χ² values for these random sets are well below the broken horizontal line, which corresponds to P = 0.001.
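The Fig. 2 variant, in which the expected values come from the pooled composition of the 30-bp segments themselves, can be sketched as below. The helper names are hypothetical and the segments are assumed to be equal in length. `random_windows` is only a stand-in for the random control sets: it samples windows from one supplied sequence, whereas the actual selection procedure is the one described in Materials and Methods and is not reproduced here.

```python
import random
from collections import Counter

BASES = "ACGT"


def pooled_composition(segments: list[str]) -> dict[str, float]:
    """Base fractions over all segments pooled together."""
    counts = Counter(b for s in segments for b in s.upper() if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def chi2_profile_pooled(segments: list[str]) -> list[float]:
    """Per-position chi-square using the segments' own pooled composition."""
    comp = pooled_composition(segments)
    profile = []
    for pos in range(len(segments[0])):
        counts = Counter(s[pos].upper() for s in segments)
        total = sum(counts[b] for b in BASES)
        chi2 = 0.0
        for b in BASES:
            expected = total * comp[b]
            # A base absent from the pooled segments is absent at every
            # position too, so skipping zero expectations loses nothing.
            if expected > 0:
                chi2 += (counts[b] - expected) ** 2 / expected
        profile.append(chi2)
    return profile


def random_windows(sequence: str, n: int, length: int = 30, seed: int = 0) -> list[str]:
    """Hypothetical control: n windows taken at random offsets of one sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(sequence) - length + 1) for _ in range(n)]
    return [sequence[s:s + length] for s in starts]


if __name__ == "__main__":
    control = random_windows("ACGTTGCAAT" * 20, n=50)
    print([round(x, 2) for x in chi2_profile_pooled(control)])
```

Using the segments' own composition means a region that is merely AT-rich overall does not register as positionally nonrandom; only deviations tied to particular positions raise χ².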

Figure 3

Examples of direct repeats flanking diverse processed pseudogenes from GenBank 97.0. GenBank accession numbers are listed before each sequence. The flanking repeats are indicated in uppercase letters and the adjacent sequences in lowercase. The omitted portion of each pseudogene is marked by ∼. The pseudogenes listed are as follows: (I) human ferritin H, α-enolase, cytoplasmic 7SL RNA, thiopurine methyltransferase, β-tubulin, γ-actin-like, and chromosomal protein HMG-17; (II) rodent α-tubulin, δ-aminolevulinate dehydratase, cytoplasmic γ-actin, metallothionein 1, cellular tumor antigen p53, small nuclear RNA U3 (rat), small nuclear 4.5S RNA(I), small nuclear RNA U3 (mouse), ribosomal protein L35a-related, thymidine kinase, VH 7183 family for immunoglobulin heavy chain, 7SK RNA, cytochrome oxidase subunit VIa, and acyl-CoA-binding/diazepam binding inhibitor; (III) other mammals, rabbit short interspersed C repeat (SINE) and B2-like repeat from mink; and (IV) frog (Rana catesbeiana) apoferritin pseudogene.
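Flanking repeats like those in Fig. 3 can be located with a simple suffix/prefix comparison. This sketch assumes the element boundaries are known exactly, that the 3' flank is taken to start after the poly(A) tail, and that the repeats are exact; real flanking repeats may carry mismatches, and the poly(A)/flank boundary is often ambiguous. Letter case is ignored, so the figure's uppercase/lowercase convention is not used. The function names and the 4-bp minimum are assumptions; the short (4–9 bp) and long (>9 bp) classes follow the Fig. 2 legend.

```python
from typing import Optional


def find_direct_repeat(upstream: str, downstream: str, min_len: int = 4) -> Optional[str]:
    """Longest exact repeat that ends the 5' flank and begins the 3' flank.

    `upstream` ends immediately before the element's 5' end; `downstream`
    starts immediately after its 3' end (assumed here to be the end of the
    poly(A) tail). Returns None if no repeat of at least `min_len` bp exists.
    """
    up, down = upstream.upper(), downstream.upper()
    # Try the longest possible repeat first and shrink toward min_len.
    for k in range(min(len(up), len(down)), max(min_len, 1) - 1, -1):
        if up[-k:] == down[:k]:
            return up[-k:]
    return None


def classify_repeat(length: int) -> Optional[str]:
    """Length classes from the Fig. 2 legend: short (4-9 bp) and long (>9 bp)."""
    if length > 9:
        return "long"
    if length >= 4:
        return "short"
    return None


if __name__ == "__main__":
    repeat = find_direct_repeat("gcgcAGAAATGC", "AGAAATGCtttt")
    print(repeat, classify_repeat(len(repeat)))
```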

Figure 4

Model for retroposon integration in mammals. (a) Enzymatic nicking in the presence of RNA, indicated by a vertical black arrow. (b) Synthesis of cDNA, indicated by a dotted line, and formation of the second nick, indicated by a black arrow pointed down. (c) Completion of the reverse transcription and DNA-dependent DNA synthesis, indicated by a dashed line and the lowercase letters, followed by ligation. (d) Elimination of RNA and synthesis of the second DNA strand. Modified after Luan et al. (19).
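The net sequence outcome of the model, a direct repeat created by filling in a staggered break, can be reproduced with a toy function. It is a hypothetical sketch, not the author's model in full: it ignores RNA, cDNA synthesis, strand polarity, and the ordering of steps (a)–(d), and only places the element between two nicks given as 0-based boundaries on the sense strand, with the antisense nick upstream of the sense nick as the abstract describes.

```python
def integrate_at_staggered_break(target: str, antisense_nick: int, sense_nick: int,
                                 element: str) -> tuple[str, str]:
    """Toy model: insert `element` at a staggered break and fill in the gaps.

    Both nicks are 0-based boundaries in sense-strand coordinates. The
    single-stranded stretch between them is copied during gap filling, so it
    appears on both sides of the element as a direct (flanking) repeat.
    Returns the sense-strand product and the duplicated repeat.
    """
    if not 0 <= antisense_nick <= sense_nick <= len(target):
        raise ValueError("expected 0 <= antisense_nick <= sense_nick <= len(target)")
    repeat = target[antisense_nick:sense_nick]
    product = target[:sense_nick] + element + target[antisense_nick:]
    return product, repeat


if __name__ == "__main__":
    target = "GGGGTTAAAACCCCCCGGG"
    # Hypothetical placement: antisense nick opposite the TT|AAAA junction.
    a = target.find("TTAAAA") + 2
    product, repeat = integrate_at_staggered_break(target, a, a + 5, "alu")
    print(repeat, product)
```

The repeat length equals the distance between the two nicks, so a sense nick 15 bp downstream of the antisense nick yields a 15-bp flanking repeat, and a zero stagger yields none.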

References

1. Weiner A M, Deininger P L, Efstratiadis A. Annu Rev Biochem. 1986;55:631–661.
2. Deininger P L. In: Mobile DNA. Berg D E, Howe M M, editors. Washington, DC: Am. Soc. Microbiol.; 1989. pp. 619–636.
3. Deininger P L, Batzer M A. In: Evolutionary Biology. Hecht M K, MacIntyre R J, Clegg M T, editors. Vol. 27. New York: Plenum; 1993. pp. 157–196.
4. Van Arsdell S W, Denison R A, Bernstein L B, Weiner A M. Cell. 1981;26:11–17.
5. Moos M, Gallwitz D. EMBO J. 1983;2:757–761.
