Landscape of somatic retrotransposition in human cancers

Science. 2012 Aug 24;337(6097):967-71.

doi: 10.1126/science.1222077. Epub 2012 Jun 28.

Eunjung Lee, Rebecca Iskow, Lixing Yang, Omer Gokcumen, Psalm Haseley, Lovelace J Luquette 3rd, Jens G Lohr, Christopher C Harris, Li Ding, Richard K Wilson, David A Wheeler, Richard A Gibbs, Raju Kucherlapati, Charles Lee, Peter V Kharchenko, Peter J Park; Cancer Genome Atlas Research Network

Eunjung Lee et al. Science. 2012.

Abstract

Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing data sets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.

Figures

Fig. 1

(A) To detect somatic insertions of TEs, paired-end sequencing data from tumor and matched normal samples are aligned to both the reference genome and a custom repeat assembly of canonical and divergent TE sequences. Two types of supporting reads are identified: (i) repeat-anchored mate (RAM) reads, in which one of the paired-end reads is mapped to a unique location in the genome, whereas the other is associated with a TE (reads 1 to 4), and (ii) clipped reads, which span the TE insertion breakpoints and show partial alignment to the reference or the repeat assembly (reads 5 to 8). The distances between the clipping positions and the clipped sequences are used to infer the insertion mechanism. For instance, duplicated sequences at the insertion site (TSD) and the poly-A tail of the inserted TE are characteristics of an endonuclease-mediated target-primed retrotransposition. (B) Example: a validated somatic L1 insertion in the 3′ UTR of GPATCH2 in colorectal cancer (CR3518). The top chart displays two clusters of RAM reads (green) whose mate pairs (not shown) are associated with L1 repeat sequences. Clipped (partially aligned) reads spanning the insertion breakpoint are shown underneath, with each nucleotide in a different color (nucleotides matching the reference are not shown). The consecutive red bases to the right of the insertion come from the poly-T tail of the inserted L1 in the negative orientation. The separation of clipped read positions between the strands reveals a 19-bp TSD (bottom). No RAMs or clipped reads are observed in the matched normal (blood) sample.
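The two evidence types described in panel A can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Read` fields, the `MIN_CLIP` threshold, and the function name are hypothetical, and the alignment results against the reference genome and the repeat assembly are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Read:
    """Simplified alignment summary for one read of a pair (hypothetical fields)."""
    unique_ref_hit: bool              # maps to a single location in the reference genome
    te_family: Optional[str] = None   # TE hit in the repeat assembly, e.g. "L1"
    clipped_bases: int = 0            # bases left unaligned at a read end


MIN_CLIP = 10  # assumed threshold; the source does not state one


def classify_pair(first: Read, second: Read) -> Set[str]:
    """Return which kinds of insertion evidence a read pair provides."""
    evidence: Set[str] = set()
    # Check both orderings so either read may act as the genomic anchor.
    for anchor, mate in ((first, second), (second, first)):
        # RAM: one read anchored uniquely in the genome, its mate associated with a TE.
        if anchor.unique_ref_hit and mate.te_family is not None:
            evidence.add("RAM")
        # Clipped: a partially aligned read that may span an insertion breakpoint.
        if anchor.clipped_bases >= MIN_CLIP:
            evidence.add("clipped")
    return evidence
```

Under this sketch, a candidate somatic insertion would be a site supported by clusters of such pairs in the tumor and by none in the matched normal sample, as in the GPATCH2 example of panel B.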

Fig. 2

(A) Frequency of high-confidence somatic L1 insertions varies across 5 colorectal, 7 prostate, 8 ovarian, 7 multiple myeloma, and 16 glioblastoma tumors. Three epithelial cancers (colorectal, prostate, and ovarian) show frequent somatic L1 insertions, whereas no insertions are observed in the blood and brain cancers. One colorectal tumor (CR3518) contains 102 L1 insertions, increasing the average somatic event frequency for colorectal tumors from 9 to 28 when this sample is included. (B) The genes affected by somatic TE insertions are significantly enriched for genes with high mutation rates as estimated from the exome sequencing data of 228 additional colorectal tumors (P < 1 × 10⁻¹⁵). The mutation frequency of each gene was adjusted for its total exon size. (Inset) The top 15 genes with nonsilent mutations. (C) The transcript levels of 45 genes with somatic TE insertions in colorectal tumors were compared with those from 28 normal colorectal tissues, and the expression fold changes are shown. Overall, the genes with a TE insertion were significantly down-regulated in tumors (P = 6.3 × 10⁻⁴, background distribution based on randomly sampled gene sets). KCNIP1 appears twice because of two somatic insertions in two different samples. The dashed line marks 50% reduction in expression.
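The shift in the colorectal average reported in panel A can be checked with a few lines of arithmetic. The sketch below assumes the rounded average of 9 is exact for the four tumors other than CR3518; the per-tumor counts themselves are not given in this caption, and the variable names are illustrative.

```python
N_COLORECTAL = 5          # colorectal tumors in the study (from the caption)
CR3518_INSERTIONS = 102   # L1 insertions in the outlier tumor (from the caption)
MEAN_WITHOUT = 9          # reported average when CR3518 is excluded (rounded)

# Assumption: treat the rounded mean of 9 as exact for the other four tumors.
others_total = MEAN_WITHOUT * (N_COLORECTAL - 1)

# Average across all five tumors once CR3518 is included.
mean_with = (others_total + CR3518_INSERTIONS) / N_COLORECTAL
print(mean_with, round(mean_with))  # 27.6 28
```

A single sample therefore accounts for roughly three-quarters of the colorectal insertions under this assumption, which is why the caption reports the average both ways.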

Fig. 3

(A) Most of the identified insertions do not contain a full L1 sequence (6 kbp) but are truncated at the 5′ end. The parts of the L1 sequence found within the identified somatic and germline insertions are illustrated as a coverage plot. (B) A positive distance between the clipping positions of clipped reads with negative- and positive-strand mapping (Fig. 1B) corresponds to the length of the duplicated sequence at a TSD, whereas a negative or zero distance corresponds to a microdeletion or lack of duplication at the insertion site. The major TSD peak at ~15 bp is characteristic of an endonuclease-dependent L1 retrotransposition. Sequence analysis around the insertion breakpoints revealed the 5′-TTTT/A-3′ (where the slash indicates the insertion breakpoint) motif, consistent with a canonical sequence for L1-endonuclease target sites (31). The insertions belonging to the minor TSD peak (0 to 2 bp) did not show a significant sequence motif.
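The interpretation of the clipping-position distance in panel B can be written as a small classifier. This is a sketch, assuming the signed distance has already been computed with the sign convention of the figure; mapping negative values to a microdeletion and zero to no duplication is one reading of the caption, and the function names and example values are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def classify_site(distance_bp: int) -> str:
    """Interpret the signed distance between clipping positions."""
    if distance_bp > 0:
        return "TSD"              # duplication of length distance_bp
    if distance_bp < 0:
        return "microdeletion"    # bases lost at the insertion site
    return "no duplication"       # breakpoints coincide


def summarize(distances: Iterable[int]) -> Counter:
    """Count insertion sites by category."""
    return Counter(classify_site(d) for d in distances)


# Made-up distances for illustration; 19 matches the TSD length in Fig. 1B.
example = [15, 14, 19, 0, 2, -3]
print(summarize(example))
```

A histogram of the positive distances from real data would be expected to show the peaks described in the caption, a major one near 15 bp and a minor one at 0 to 2 bp.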

Fig. 4

(A) Somatic L1 insertions are biased toward hypomethylated regions in cancer cells. The colon cancer hypomethylated regions were assessed in independent samples. (B) A model of the TE insertion preferences and subsequent selection processes that bias the genomic distribution of TE insertions. Somatic insertions are strongly biased toward regions of cancer-specific DNA hypomethylation (red box) and encounter selection that depletes them from transcriptionally active genes unless such insertions promote tumorigenesis. By contrast, germline insertions are biased toward germline-specific DNA hypomethylation domains (blue box) and are depleted from all genes.

References

    1. Kidd JM, et al. Cell. 2010;143:837.
    2. Maksakova IA, Mager DL, Reiss D. Cell Mol Life Sci. 2008;65:3329.
    3. Slotkin RK, Martienssen R. Nat Rev Genet. 2007;8:272.
    4. Yang N, Kazazian HH Jr. Nat Struct Mol Biol. 2006;13:763.
    5. Prak ETL, Kazazian HH Jr. Nat Rev Genet. 2000;1:134.
