Widespread and tissue-specific expression of endogenous retroelements in human somatic tissues

Physiological and Pathological Transcriptional Activation of Endogenous Retroelements Assessed by RNA-Sequencing of B Lymphocytes

Frontiers in Microbiology, 2017

In addition to evolutionarily accrued sequence mutation or deletion, endogenous retroelements (EREs) in eukaryotic genomes are subject to epigenetic silencing, preventing or reducing their transcription, particularly in the germplasm. Nevertheless, transcriptional activation of EREs, including endogenous retroviruses (ERVs) and long interspersed nuclear elements (LINEs), is observed in somatic cells, variably upon cellular differentiation and frequently upon cellular transformation. ERE transcription is modulated during physiological and pathological immune cell activation, as well as in immune cell cancers. However, our understanding of the potential consequences of such modulation remains incomplete, partly due to the relative scarcity of information regarding genome-wide ERE transcriptional patterns in immune cells. Here, we describe a methodology that allows RNA-sequencing (RNA-seq) data to be probed for genome-wide ERE expression in murine and human cells. Our analysis of B cell...

LTR retroelement expansion of the human cancer transcriptome and immunopeptidome revealed by de novo transcript assembly

Genome Research, 2019

Dysregulated endogenous retroelements (EREs) are increasingly implicated in the initiation, progression, and immune surveillance of human cancer. However, incomplete knowledge of ERE activity limits mechanistic studies. By using pan-cancer de novo transcript assembly, we uncover the extent and complexity of ERE transcription. The current assembly doubled the number of previously annotated transcripts overlapping with long-terminal repeat (LTR) elements, several thousand of which were expressed specifically in one or a few related cancer types. Exemplified in melanoma, LTR-overlapping transcripts were highly predictable, disease prognostic, and closely linked with molecularly defined subtypes. They further showed the potential to affect disease-relevant genes, as well as produce novel cancer-specific antigenic peptides. This extended view of LTR elements provides the framework for functional validation of affected genes and targets for cancer immunotherapy.

Genome-wide experimental identification and functional analysis of human specific retroelements

Cytogenetic and Genome Research, 2005

Retroelements (REs) actively reshape genomes through genomic rearrangements, the creation of new genes and the modulation of the regulatory machinery of existing genes, thus introducing genomic novelties that may be subject to natural selection. Thousands of RE integrations, presumably distinguishing the human and chimpanzee genomes, may well be involved in the speciation of modern humans. In this self-review we describe our recent results on the genome-wide identification of human-specific RE integrations and their transcriptional activity, obtained with three new experimental techniques (TGDA, DiffIR and SDDIR) that we developed for such studies. A new mechanism of retroelement formation, involving template switches during L1-mediated reverse transcription of mRNA, which was discovered in this research, is also described in the review.

Detecting endogenous retrovirus-driven tissue-specific gene transcription

Genome Biology and Evolution, 2015

Transposable elements (TEs) comprise approximately half of the human genome, and several independent lines of investigation have demonstrated their role in rewiring gene expression during development, evolution, and oncogenesis. The identification of their regulatory effects has largely been idiosyncratic, linking activity with isolated genes. Their distribution throughout the genome raises critical questions: do these elements contribute to broad tissue- and lineage-specific regulation? If so, in what manner: as enhancers, promoters, or RNAs? Here, we devise a novel approach to systematically dissect the genome-wide consequences of TE insertion on gene expression, and test the hypothesis that classes of endogenous retrovirus long terminal repeats (LTRs) exert tissue-specific regulation of adjacent genes. Using correlation of expression patterns across 18 tissue types, we reveal the tissue-specific uncoupling of gene expression due to 62 different LTR classes. These patterns are spe...
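The abstract rests on correlating expression patterns across tissues. A minimal sketch of that basic quantity follows, assuming each gene's expression is a list of values over the same ordered set of tissues. The pairing of neighboring genes and the `coupling_shift` score are hypothetical illustrations of how "uncoupling" could be quantified, not the authors' pipeline.

```python
import math
from typing import List, Sequence, Tuple

# Expression values for one gene, one entry per tissue, same tissue order for all genes.
Profile = Sequence[float]


def pearson(x: Profile, y: Profile) -> float:
    """Pearson correlation between two expression profiles across tissues."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2 tissues)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant profile has undefined correlation
    return cov / (sx * sy)


def mean_correlation(pairs: List[Tuple[Profile, Profile]]) -> float:
    """Average correlation over a set of (gene, neighbor) profile pairs."""
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def coupling_shift(
    ltr_pairs: List[Tuple[Profile, Profile]],
    background_pairs: List[Tuple[Profile, Profile]],
) -> float:
    """Hypothetical score: mean correlation of neighbor pairs near an LTR class
    minus the background mean. Negative values suggest uncoupling."""
    return mean_correlation(ltr_pairs) - mean_correlation(background_pairs)
```

Comparing against a background set of neighbor pairs, rather than reading raw correlations, keeps the score interpretable when neighboring genes are co-expressed genome-wide.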

Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression

2018

Characterization of Human Endogenous Retrovirus (HERV) expression within the transcriptomic landscape using RNA-seq is complicated by uncertainty in fragment assignment because of sequence similarity. We present Telescope, a computational software tool that provides accurate estimation of transposable element expression (retrotranscriptome) resolved to specific genomic locations. Telescope directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model. We demonstrate the utility of our approach through single locus analysis of HERV expression in 13 ENCODE cell types. When examined at this resolution, we find that the magnitude and breadth of the retrotranscriptome can be vastly different among cell types. Furthermore, our approach is robust to differences in sequencing technology, and demonstrates that the retrotranscriptome has potential to be used for cell ty...
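Reassigning ambiguously mapped fragments is commonly done with expectation-maximization (EM). The sketch below is a simplified EM, assuming each fragment is given as the list of loci it aligns to with equal alignment likelihood. It is not Telescope's implementation and omits the priors and extra parameters of the published model; `em_reassign` and `assign_best` are hypothetical names.

```python
from typing import Dict, List


def em_reassign(fragments: List[List[str]], n_iter: int = 100,
                tol: float = 1e-8) -> Dict[str, float]:
    """Estimate the fraction of fragments originating from each locus.

    Each fragment is the list of candidate loci it aligns to; all candidate
    alignments are assumed equally good (a simplification).
    """
    loci = sorted({locus for frag in fragments for locus in frag})
    pi = {locus: 1.0 / len(loci) for locus in loci}  # uniform start
    n = len(fragments)
    for _ in range(n_iter):
        # E-step: split each fragment among its candidates in proportion to pi.
        counts = {locus: 0.0 for locus in loci}
        for frag in fragments:
            total = sum(pi[locus] for locus in frag)
            for locus in frag:
                counts[locus] += pi[locus] / total
        # M-step: new proportions are the expected counts divided by n.
        new_pi = {locus: counts[locus] / n for locus in loci}
        delta = max(abs(new_pi[locus] - pi[locus]) for locus in loci)
        pi = new_pi
        if delta < tol:
            break
    return pi


def assign_best(fragments: List[List[str]], pi: Dict[str, float]) -> List[str]:
    """Assign each fragment to its most probable source locus."""
    return [max(frag, key=lambda locus: pi[locus]) for frag in fragments]
```

For fragments `[["A"], ["A"], ["A"], ["A", "B"], ["B"]]`, the estimate converges to about 0.75 for A and 0.25 for B, so the shared fragment is assigned to A. The uniquely mapped fragments drive the estimate, which then resolves the ambiguous one.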

In-Depth Transcriptome Analysis Reveals Novel TARs and Prevalent Antisense Transcription in Human Cell Lines

PLoS ONE, 2010

Several recent studies have indicated that transcription is pervasive in regions outside of protein-coding genes and that short antisense transcripts can originate from the promoter and terminator regions of genes. Here we investigate transcription of fragments longer than 200 nucleotides, focusing on antisense transcription for known protein-coding genes and intergenic transcription. We find that roughly 12% and 16% of all reads that originate from promoter and terminator regions, respectively, map antisense to the gene in question. Furthermore, we detect a high number of novel transcriptionally active regions (TARs) that are generally expressed at a lower level than protein-coding genes. We find that the correlation between RNA-seq data and microarray data depends on gene length, with longer genes showing a better correlation. We detect high antisense transcriptional activity from promoter, terminator and intron regions of protein-coding genes and identify a vast number of previously unidentified TARs, including putative novel EGFR transcripts. This shows that in-depth analysis of the transcriptome using RNA-seq is a valuable tool for understanding complex transcriptional events. Furthermore, new algorithms for estimating gene expression from RNA-seq data are needed to minimize length bias.
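The antisense percentages above are fractions of reads in a region whose strand is opposite to the gene's. A minimal sketch follows, assuming a strand-specific library, reads reduced to a single position and strand, and a promoter window of 1 kb upstream of the transcription start site. The window size and all names are hypothetical; the paper's region definitions may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Gene:
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    pos: int     # representative position of the read (e.g. its 5' end)
    strand: str  # strand of the originating transcript (stranded library assumed)


def promoter_window(gene: Gene, flank: int = 1000) -> Tuple[int, int]:
    """Half-open interval upstream of the TSS; 'upstream' depends on strand."""
    if gene.strand == "+":
        return max(0, gene.start - flank), gene.start
    return gene.end, gene.end + flank


def antisense_fraction(gene: Gene, reads: Iterable[Read], flank: int = 1000) -> float:
    """Fraction of reads in the promoter window that are antisense to the gene."""
    lo, hi = promoter_window(gene, flank)
    in_window = [r for r in reads if lo <= r.pos < hi]
    if not in_window:
        return float("nan")  # no reads: fraction undefined
    antisense = sum(1 for r in in_window if r.strand != gene.strand)
    return antisense / len(in_window)
```

The same function applies to terminator regions by swapping in a downstream window; only the interval definition changes.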

Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression

Proceedings of the National Academy of Sciences of the United States of America, 2018

Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these elements is important in determining TE activity in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms focus on subtle differences between TE copies and thus bias the analysis toward relatively old and inactive TEs. Here we describe an approach termed "MapRRCon" (mapping repeat reads to a consensus), which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs' interaction with L1 in human c...
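Mapping reads to a single consensus pools signal from every copy of a TE family onto one coordinate system, which is why the result is a family-level binding profile rather than per-insertion sites. A naive sketch of that idea follows, assuming reads are plain strings already selected as TE-derived and that an ungapped Hamming-distance scan on both strands stands in for a real aligner. This is not the MapRRCon implementation; all names are hypothetical.

```python
from typing import List, Optional, Tuple

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def best_hit(read: str, consensus: str, max_mm: int) -> Optional[Tuple[int, int]]:
    """Leftmost ungapped offset with the fewest mismatches (<= max_mm), or None."""
    best = None
    n = len(read)
    for i in range(len(consensus) - n + 1):
        mm = sum(1 for a, b in zip(read, consensus[i:i + n]) if a != b)
        if mm <= max_mm and (best is None or mm < best[1]):
            best = (i, mm)
    return best


def consensus_coverage(reads: List[str], consensus: str, max_mm: int = 2) -> List[int]:
    """Per-base read coverage on the consensus, pooling reads from all copies."""
    cov = [0] * len(consensus)
    for read in reads:
        # Try the read as given and its reverse complement; keep the better hit.
        hits = [h for h in (best_hit(s, consensus, max_mm) for s in (read, revcomp(read)))
                if h is not None]
        if not hits:
            continue  # read does not match this consensus within max_mm
        start, _ = min(hits, key=lambda h: h[1])
        for j in range(start, start + len(read)):
            cov[j] += 1
    return cov
```

The scan is quadratic and only suitable for illustration; peaks in the returned coverage would mark consensus positions enriched for the ChIP target.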

Comprehensive Analysis of Human Endogenous Retrovirus Transcriptional Activity in Human Tissues with a Retrovirus-Specific Microarray

Journal of Virology, 2005

Retrovirus-like sequences account for 8 to 9% of the human genome. Among these sequences, about 8,000 pol-containing proviral elements have been identified to date. As part of our ongoing search for active and possibly disease-relevant human endogenous retroviruses (HERVs), we have recently developed an oligonucleotide-based microarray. The assay allows both the detection and the identification of most known retroviral reverse transcriptase (RT)-related nucleic acids in biological samples. In the present study, we have investigated the transcriptional activity of representative members of 20 HERV families in 19 different normal human tissues. Qualitative evaluation of chip hybridization signals and quantitative analysis by real-time RT-PCR revealed distinct HERV activity in the human tissues under investigation, suggesting that HERV elements are active in human cells in a tissue-specific manner. The largest numbers of active HERV family members were found in mRNA prepared from skin, thyroid gland, placenta, and tissues of reproductive organs. In contrast, only a few active HERVs were detectable in muscle cells. No human tissue lacking HERV transcription was found, confirming that human endogenous retroviruses are permanent components of the human transcriptome. Distinct activity patterns may reflect the characteristics of the regulatory machinery in these cells, e.g., the cell type-dependent occurrence of transcriptional regulatory factors.

Diversity through duplication: Whole‐genome sequencing reveals novel gene retrocopies in the human population

BioEssays, 2014

Gene retrocopies are generated by reverse transcription and genomic integration of mRNA. As such, retrocopies present an important exception to the central dogma of molecular biology, and have substantially impacted the functional landscape of the metazoan genome. While an estimated 8,000-17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms of retrocopy content has remained largely unexplored. Three recent studies by Abyzov et al., Ewing et al. and Schrider et al. have exploited 1,000 Genomes Project Consortium data, as well as other sources of whole-genome sequencing data, to uncover novel gene retrocopies. Here, we compare the methods and results of these three studies, highlight the impact of retrocopies in human diversity and genome evolution, and speculate on the potential for somatic gene retrocopies to impact cancer etiology and genetic diversity among individual neurons in the mammalian brain.