Characterization of pre-insertion loci of de novo L1 insertions

Stephen L Gasior et al. Gene. 2007.

Abstract

The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would lead to the prediction that the populations of both elements would be biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias relative to older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci shows a different set of biases than that observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the initial vicinity of the break site, can be influenced by the broader content of the flanking genomic region and have implications for understanding the dynamics of L1 and Alu distributions in the human genome.

Figures

Fig 1

Alu, L1, and G+C content of de novo HeLa L1 pre-insertion loci and random sites. The flanks of de novo L1 pre-insertion sites (closed squares) and simulated random sites (open circles) were characterized by the frequency of sites at each level of A) %Alu content in the 10 kb on each side, B) %L1 content in the 10 kb on each side, C) %G+C content in the 10 kb on either side of the endonuclease site/random site, and D) %G+C content in the 25 bp on either side of the endonuclease site/random site. X-axis values are binned by 5.
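The flank measurement described in panels C and D can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names (`gc_percent`, `flank_gc`, `bin_by_5`), the 0-based site coordinate, the clipping of windows at sequence ends, and the exclusion of ambiguous bases such as N are all assumptions made for illustration.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / total


def flank_gc(genome_seq: str, site: int, flank: int) -> float:
    """%G+C of the window spanning `flank` bp on each side of `site`.

    `site` is a 0-based position (an assumption); the window is clipped
    at the ends of the sequence.
    """
    start = max(0, site - flank)
    end = min(len(genome_seq), site + flank)
    return gc_percent(genome_seq[start:end])


def bin_by_5(values):
    """Fraction of sites per 5-point bin, keyed by the bin's lower edge.

    A value of exactly 100 is placed in the top bin (95).
    """
    counts = Counter(min(int(v // 5) * 5, 95) for v in values)
    n = len(values)
    return {edge: counts[edge] / n for edge in sorted(counts)}
```

Calling `flank_gc(seq, site, 10_000)` would correspond to the 10 kb flanks of panel C and `flank_gc(seq, site, 25)` to the 25 bp flanks of panel D; passing the per-site values for the de novo and simulated sets to `bin_by_5` gives two frequency distributions that can be plotted against each other.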

Fig 2

Chromosome distribution of de novo L1 and SINE pre-insertion loci. A) De novo L1 and SINE insertion sites were mapped to individual chromosomes according to the chromosome assignment of the highest contig BLAST hit, and each chromosome is presented as a % of the total (white). This was compared to the percent chromosome content with corrections for the HeLa karyotype (black). The chromosome distributions of >1000 random insertion sites from a computer simulation with corrections for the HeLa karyotype are also shown (grey). B) Plot and curve fits of chromosome distributions of de novo L1 and SINE insertions (x-axis, as % of total) versus % chromosome content as calculated previously (squares, black line) and versus % ASV insertions (grey diamonds, grey line, right y-axis). C) Plot and curve fit of chromosome distributions of de novo L1 and Alu insertions (x-axis, as % of total) versus a computer simulation of random sites in HeLa total (y-axis).
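One way to picture a random-site simulation with a karyotype correction is to weight each chromosome by its length multiplied by its copy number in the cell line. The caption does not say how the correction was implemented, so the weighting scheme, the function names, and the placeholder inputs in this Python sketch are assumptions rather than the method used in the study.

```python
import random


def chromosome_weights(lengths, copies):
    """Expected share of insertions per chromosome: length x copy number, normalized.

    `lengths` maps chromosome name to length in bp and `copies` maps
    chromosome name to copy number; both are hypothetical inputs.
    """
    total = sum(lengths[c] * copies[c] for c in lengths)
    return {c: lengths[c] * copies[c] / total for c in lengths}


def simulate_sites(lengths, copies, n, seed=0):
    """Draw `n` random (chromosome, 0-based position) pairs.

    The chromosome is chosen in proportion to its karyotype-corrected
    weight, then a position is chosen uniformly along it.
    """
    rng = random.Random(seed)
    weights = chromosome_weights(lengths, copies)
    chroms = list(lengths)
    picks = rng.choices(chroms, weights=[weights[c] for c in chroms], k=n)
    return [(c, rng.randrange(lengths[c])) for c in picks]
```

With real inputs, the per-chromosome share of the simulated sites could be compared with the observed share of de novo insertions, as in panels A and C.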

Fig 3

Insertions near the c-myc locus. A schematic of the c-myc locus with the 5’ flanking pseudogene POU5F1P1 and the 3’ flanking PVRT gene is presented. The locations of 4 de novo L1 insertions are marked with arrows above the genes pointing down. The location in c-myc of a somatic L1 insertion/rearrangement from a breast cancer and the site of a canine L1 insertion are shown with arrows pointing up.
