Footprint of APOBEC3 on the genome of human retroelements

Firoz Anwar et al. J Virol. 2013 Jul.

Abstract

Almost half of the human genome is composed of transposable elements. The genomic structures and life cycles of some of these elements suggest they are a result of waves of retroviral infection and transposition over millions of years. The reduction of retrotransposition activity in primates compared to that in nonprimates, such as mice, has been attributed to the positive selection of several antiretroviral factors, such as apolipoprotein B mRNA editing enzymes. Among these, APOBEC3G is known to mutate G to A within the GG dinucleotide context (GG→AG; the first G is the one mutated) in the genomes of endogenous as well as several exogenous retroelements. On the other hand, APOBEC3F and, to a lesser extent, other APOBEC3 members induce G-to-A changes within the GA dinucleotide (GA→AA). It is known that these enzymes can induce deleterious mutations in the genome of retroviral sequences, but the evolution and/or inactivation of retroelements as a result of mutation by these proteins is not clear. Here, we analyze the mutation signatures of these proteins on large populations of long interspersed nuclear element (LINE), short interspersed nuclear element (SINE), and endogenous retrovirus (ERV) families in the human genome to infer possible evolutionary pressure and/or hypermutation events. Because mutation by APOBEC3 depends on sequence context, changes in the genome of retroelements can be investigated by inspecting the depletion of G within the APOBEC3 target motifs and the enrichment of A within the corresponding product motifs. Analysis of approximately 22,000 LINE-1 (L1), 24,000 SINE Alu, and 3,000 ERV sequences showed a footprint of GG→AG mutation by APOBEC3G and GA→AA mutation by other members of the APOBEC3 family (e.g., APOBEC3F) on the genome of ERV-K and ERV-1 elements but not on those of ERV-L, LINE, or SINE.
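The motif-based reasoning in the abstract can be illustrated with a small sketch. It counts overlapping dinucleotides in a sequence and compares each product motif (AG, AA) with its target motif (GG, GA). The function names, the pseudocount, and the simple product/target ratio are assumptions made for illustration only; the abstract does not define the diagnostic ratio used in the paper, and this is not that definition.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides in a DNA sequence (case-insensitive)."""
    s = seq.upper()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def product_to_target_ratio(seq: str, target: str, product: str,
                            pseudocount: float = 1.0) -> float:
    """Ratio of product-motif count to target-motif count.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when the target motif is absent.
    """
    counts = dinucleotide_counts(seq)
    return (counts[product] + pseudocount) / (counts[target] + pseudocount)


def footprint_scores(seq: str) -> dict:
    """Illustrative scores: higher values mean fewer targets and more products."""
    return {
        # GG -> AG is the context attributed to APOBEC3G in the abstract.
        "A3G_like": product_to_target_ratio(seq, "GG", "AG"),
        # GA -> AA is the context attributed to APOBEC3F and other members.
        "A3F_like": product_to_target_ratio(seq, "GA", "AA"),
    }
```

For the toy sequence "AGAG", which contains AG twice, GA once, and no GG or AA, `footprint_scores` returns 3.0 for the GG→AG context and 0.5 for the GA→AA context. Plotting the two scores against each other for many sequences is one plausible way to look for outliers, but any threshold would have to be calibrated against control sequences.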

Figures

Fig 1

Diagnostic ratio (DR) of hA3G versus that of hA3F for the positive strand of the ERV-K family. Each point in this plot is a sequence belonging to ERV-K (red circles), nonhypermutated HIV-1 (green plus signs), HIV-1 hypermutated by hA3G (purple plus signs), or HIV-1 hypermutated by hA3F (blue plus signs). The 99% confidence interval (CI) of human genes is shown by a magenta dotted oval. The calculated DR of hA3G and that of hA3F for the previously reported hypermutated HERV-K sequences are indicated by circled numbers 1 (HERV-K-158c3) and 2 (HERV-K-11c21). Genomic locations of three sequences with signatures of hA3F (indicated by the blue arrow) are as follows: chr15.+0.84829019.84832364, chrY.-0.26397836.26401035, and chrY.+0.27561401.27564601. The chromosomal locations were extracted using a RepeatMasker tag. Each sequence description has a chromosome number, strand, starting position, and end position, each separated by a period.
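The period-separated sequence descriptions quoted in this caption can be split into their four fields with a few lines of code. The sketch below is a hypothetical helper, not part of RepeatMasker or of the authors' pipeline. It assumes the strand is the first character of the second field; the meaning of the trailing "0" in "+0" and "-0" is not stated in the caption, so it is ignored here.

```python
from typing import NamedTuple


class RepeatLocation(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int


def parse_location(tag: str) -> RepeatLocation:
    """Parse a description such as 'chr15.+0.84829019.84832364'.

    Fields are chromosome, strand, start, end, separated by periods.
    Only the leading '+' or '-' of the strand field is kept (assumption).
    """
    parts = tag.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 period-separated fields, got {len(parts)}: {tag!r}")
    chrom, strand_field, start, end = parts
    if not strand_field or strand_field[0] not in ("+", "-"):
        raise ValueError(f"unrecognized strand field: {strand_field!r}")
    return RepeatLocation(chrom, strand_field[0], int(start), int(end))
```

Calling `parse_location("chrY.-0.26397836.26401035")` gives a record with chromosome "chrY", strand "-", and integer start and end coordinates, which is convenient when pulling flanking sequence for elements such as the three listed above.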

Fig 2

Alignment of portions of the flanking sequences of three hypermutated HERV-K elements at 15q25.2, Yq11.23a, and Yq11.23b. There is a 129-bp deletion in the sequence flanking the 3′ end of the element 15q25.2. The element Yq11.23a is on the negative DNA strand. The darker the color is at each position, the higher the similarity is between sequences. Black color indicates 100% identity. This image was generated using the Geneious software.

Fig 3

Plot of the DR of hA3G versus the DR of hA3F for the sequences flanking the HERV-K elements. The flanking sequences of the three hypermutated HERV-K elements (two of which overlap) are shown by squares. As displayed, the flanking sequences form a single cluster without an outlier in the hA3F direction.

Fig 4

Diagnostic ratio of hA3G versus that of hA3F for the negative strand of the ERV-K family. Each point in this plot is a sequence belonging to ERV-K (red circles), nonhypermutated HIV-1 (green plus signs), HIV-1 hypermutated by hA3G (purple plus signs), and HIV-1 hypermutated by hA3F (blue plus signs). The calculated DR of hA3G and that of hA3F for the negative strand of the previously reported hypermutated HERV-K sequences are indicated by circled numbers 1 (HERV-K-158c3) and 2 (HERV-K-11c21).

Fig 5

Diagnostic ratio of hA3G versus that of hA3F in the ERV-1 (A), ERV-L (B), L1 (C), and SINE Alu (D) families. Red circles are sequences belonging to ERV-1 in panel A, ERV-L in panel B, L1 in panel C, and SINE Alu in panel D. The remaining points are nonhypermutated HIV-1 (green plus signs), HIV-1 hypermutated by hA3G (purple plus signs), and HIV-1 hypermutated by hA3F (blue plus signs). The 99% confidence interval (CI) of human genes is shown by a magenta dotted oval.

Fig 6

Analysis of the hypermutation footprint of hA3G and hA3F on 13 HERV-S71 sequences from the ERV-1 family. Sequences were aligned using MAFFT, and a consensus sequence was generated from 25 HERV-S71 sequences that had normal DR values. Thirteen suspected hypermutated HERV-S71 sequences were aligned with the consensus sequence, and their hypermutation status was investigated using the Hypermut2 program from the Los Alamos National Laboratory. The vertical axis shows the number of mutations in each sequence against the consensus sequence. The horizontal axis shows a brief description of each of the HERV-S71 sequences (chromosomal location described by the RepeatMasker tag). Each sequence description has a chromosome number, strand, starting position, and end position, each separated by a period. The bars represent total numbers of G-to-A mutations as well as those within different sequence contexts.
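The per-context bars described in this caption can be approximated by a simple tally of G-to-A differences between a consensus and an aligned query. The sketch below is not Hypermut2 and does not reproduce its statistics; it only counts mutations. It assumes two pre-aligned sequences of equal length and, as a simplifying assumption, takes the downstream nucleotide from the consensus rather than from the query.

```python
from collections import Counter


def g_to_a_by_context(consensus: str, query: str) -> Counter:
    """Tally G->A differences by the nucleotide that follows the G.

    Keys are 'GG', 'GA', 'GC', 'GT' (consensus G plus its downstream base)
    and 'total'. Positions followed by a gap or ambiguous base, and a G at
    the very last position, are skipped because they have no usable context.
    """
    if len(consensus) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref, qry = consensus.upper(), query.upper()
    tally = Counter()
    for i in range(len(ref) - 1):
        if ref[i] == "G" and qry[i] == "A":
            downstream = ref[i + 1]  # context taken from the consensus (assumption)
            if downstream in ("A", "C", "G", "T"):
                tally["G" + downstream] += 1
                tally["total"] += 1
    return tally
```

A query with many GG and GA hits but few GC or GT hits would be consistent with the APOBEC3 contexts named in the abstract, although deciding whether a sequence is hypermutated requires a statistical test that this tally does not provide.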

Fig 7

Alignment of a portion of the flanking sequences of 13 hypermutated HERV-S71 elements. The darker the color is at each position, the higher the similarity is between sequences. Black color indicates 100% identity. This image was generated using the Geneious software.

Fig 8

Analysis of the hypermutation footprint of hA3G and hA3F on a suspected LTR25 sequence from the ERV-1 family. Twenty-two LTR25 sequences were aligned using MAFFT, and a consensus sequence was generated to compare against the potentially hypermutated LTR25 sequence, using the Hypermut2 program from the Los Alamos National Laboratory. The vertical axis shows the number of mutations against the consensus sequence. The bars represent the total number of G-to-A changes as well as changes within different sequence contexts.

Fig 9

Diagnostic ratio plots of the negative strand for the ERV-1 (A) and ERV-L (B) families. Each point in this plot is a sequence belonging to ERV-1 (red circles in panel A), ERV-L (red circles in panel B), nonhypermutated HIV-1 (green plus signs), HIV-1 hypermutated by hA3G (purple plus signs), HIV-1 hypermutated by hA3F (blue plus signs), or HERV-S71 of the ERV-1 family (yellow circles in panel A). The black arrow in panel A points to the LTR12B and LTR12D sequence cluster.
