Retroviral Integration Process in the Human Genome: Is It Really Non-Random? A New Statistical Approach
Related papers
PLoS Computational Biology, 2011
Integration of retroviral vectors in the human genome follows non-random patterns that favor insertional deregulation of gene expression and may pose a risk of insertional mutagenesis when the vectors are used in clinical gene therapy. Understanding how viral vectors integrate into the human genome is a key issue in predicting these risks. We provide a new statistical method to compare retroviral integration patterns. We identified the positions where vectors derived from the Human Immunodeficiency Virus (HIV) and the Moloney Murine Leukemia Virus (MLV) show different integration behaviors in human hematopoietic progenitor cells. Non-parametric density estimation was used to identify candidate comparative hotspots, which were then tested and ranked. We found 100 significant comparative hotspots, distributed throughout the chromosomes. HIV hotspots were wider and contained more genes than MLV hotspots. A Gene Ontology analysis of HIV targets showed enrichment of genes involved in antigen processing and presentation, reflecting the high HIV integration frequency observed at the MHC locus on chromosome 6. Four histone modifications or variants (H2AZ, H3K4me1, H3K4me3, H3K9me1) had a different mean density in comparative hotspots, while gene expression within the comparative hotspots did not differ from background. These findings suggest the existence of epigenetic or nuclear three-dimensional topology contexts guiding retroviral integration to specific chromosome areas.
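As an illustration of the density-comparison step, the following minimal sketch assumes integration sites are base-pair positions on a single chromosome, evaluates a Gaussian kernel density estimate for each vector on a fixed grid, and flags runs of grid points where one density exceeds the other by more than a threshold. The bandwidth, threshold, function names and toy positions are hypothetical; the flagged runs are only candidate hotspots, which the study then tested and ranked by a procedure not reproduced here.

```python
import math

def kde_on_grid(sites, grid, bandwidth):
    """Gaussian kernel density of integration sites (integrates to 1), evaluated on grid."""
    norm = 1.0 / (len(sites) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sites)
        for x in grid
    ]

def candidate_hotspots(sites_a, sites_b, grid, bandwidth, min_diff):
    """Return (label, start, end) runs of grid points where one density exceeds the other by > min_diff."""
    dens_a = kde_on_grid(sites_a, grid, bandwidth)
    dens_b = kde_on_grid(sites_b, grid, bandwidth)
    labels = []
    for a, b in zip(dens_a, dens_b):
        if a - b > min_diff:
            labels.append("A")
        elif b - a > min_diff:
            labels.append("B")
        else:
            labels.append(None)
    regions = []
    start = 0
    for i in range(1, len(grid) + 1):
        # close the current run when the label changes or the grid ends
        if i == len(grid) or labels[i] != labels[start]:
            if labels[start] is not None:
                regions.append((labels[start], grid[start], grid[i - 1]))
            start = i
    return regions

# Hypothetical toy data: positions in bp on one chromosome
hiv_sites = [1000, 1100, 1200, 1300, 5000]
mlv_sites = [5000, 5100, 5200, 9000]
grid = list(range(0, 10001, 100))
regions = candidate_hotspots(hiv_sites, mlv_sites, grid, bandwidth=300, min_diff=5e-4)
print(regions)
```

Normalizing each density by its own sample size lets samples with different numbers of integration sites be compared directly, at the cost of hiding how many sites support each region.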
Vectors and Integration in Gene Therapy: Statistical Considerations
J Comput Sci Syst Biol, 2009
Background: Gene therapy is a form of molecular medicine that treats genetic diseases by replacing the defective gene responsible for the pathology with a functional one. The basic principle is to introduce a piece of genetic material into cells via a virus that serves as the gene therapy vector. The virus integrates into the cell's DNA and thus delivers the genetic material into the cell nucleus. This process is called integration and may alter the host cell's DNA. Recent studies based on cellular and animal models (Bushman, 2005) reported empirical evidence that certain retroviral vectors, namely those derived from Moloney Murine Leukemia Virus (MLV), preferentially integrate near the start of transcriptional units, whereas others (like Simian Immunodefi
PLoS ONE, 2009
Gamma-retroviruses and lentiviruses integrate non-randomly in mammalian genomes, with specific preferences for active chromatin, promoters and regulatory regions. Gene transfer vectors derived from gamma-retroviruses frequently target genes involved in the control of growth, development and differentiation of the target cell, and may induce insertional tumors or pre-neoplastic clonal expansions in patients treated by gene therapy. The gene expression program of the target cell is apparently instrumental in directing gamma-retroviral integration, although the molecular basis of this phenomenon is poorly understood. We report a bioinformatic analysis of the distribution of transcription factor binding sites (TFBSs) flanking >4,000 integrated proviruses in human hematopoietic and non-hematopoietic cells. We show that gamma-retroviral, but not lentiviral, vectors integrate in genomic regions enriched in cell-type-specific subsets of TFBSs, independently of their position relative to genes and transcription start sites. Analysis of sequences flanking the integration sites of Moloney leukemia virus (MLV)- and human immunodeficiency virus (HIV)-derived vectors carrying mutations in their long terminal repeats (LTRs), and of HIV vectors packaged with an MLV integrase, indicates that the MLV integrase and LTR enhancer are the viral determinants of the selection of TFBS-rich regions in the genome. This study identifies TFBSs as differential genomic determinants of retroviral target site selection in the human genome, and suggests that transcription factors binding the LTR enhancer may synergize with the integrase in tethering retroviral pre-integration complexes to transcriptionally active regulatory regions. Our data indicate that gamma-retroviruses and lentiviruses have evolved dramatically different strategies to interact with the host cell chromatin, and predict a higher risk in using gamma-retroviral vs. lentiviral vectors for human gene therapy applications.
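One way to ask whether TFBSs are over-represented around integration sites is a permutation test against random control positions. The sketch below is a minimal version under stated assumptions: TFBS predictions and integration sites are positions on one sequence, controls are drawn uniformly, and the ±500 bp flank is hypothetical. It is not the analysis pipeline used in the study.

```python
import bisect
import random

def tfbs_in_window(tfbs_sorted, site, flank):
    """Count TFBS positions within +/- flank bp of one integration site."""
    lo = bisect.bisect_left(tfbs_sorted, site - flank)
    hi = bisect.bisect_right(tfbs_sorted, site + flank)
    return hi - lo

def mean_tfbs(tfbs_sorted, sites, flank):
    """Mean number of TFBSs per flanking window."""
    return sum(tfbs_in_window(tfbs_sorted, s, flank) for s in sites) / len(sites)

def tfbs_enrichment(tfbs, sites, genome_length, flank=500, n_perm=1000, seed=0):
    """Observed mean TFBS count and one-sided empirical p-value versus uniform random controls."""
    rng = random.Random(seed)
    tfbs_sorted = sorted(tfbs)
    observed = mean_tfbs(tfbs_sorted, sites, flank)
    at_least = 0
    for _ in range(n_perm):
        # control set of the same size, drawn uniformly along the sequence (assumption)
        controls = [rng.randrange(genome_length) for _ in sites]
        if mean_tfbs(tfbs_sorted, controls, flank) >= observed:
            at_least += 1
    # add-one correction keeps the p-value above zero
    return observed, (at_least + 1) / (n_perm + 1)

# Hypothetical toy data: a TFBS-dense block around 20-21 kb
tfbs = list(range(20_000, 21_000, 20)) + [50_000, 80_000]
sites = [20_200, 20_500, 20_800, 70_000]
observed, p_value = tfbs_enrichment(tfbs, sites, genome_length=100_000)
print(observed, p_value)
```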
PLoS ONE, 2011
Vectors based on γ-retroviruses or lentiviruses have been shown to stably express therapeutic transgenes and effectively cure different hematological diseases. Molecular follow-up of the insertional repertoire of gene-corrected cells in patients and preclinical animal models revealed different integration preferences in the host genome, including clusters of integrations in small genomic areas (CIS; common integration sites). Most of these CIS were found in or near genes, with the potential to influence the clonal fate of the affected cell. To determine whether the observed degree of clustering is statistically compatible with an assumed standard model of spatial distribution of integrants, we have developed various methods and computer programs for γ-retroviral and lentiviral integration site distribution. In particular, we have devised and implemented mathematical and statistical approaches for comparing two experimental samples with different numbers of integration sites with respect to their propensity to form CIS, as well as for analyzing coincidences of integration sites obtained from different blood compartments. The programs and statistical tools described here are available as R workspaces and allow fast detection of excessive clustering of integration sites in any retrovirally transduced sample, thus contributing to the assessment of potential treatment-related risks in preclinical and clinical retroviral gene therapy studies.
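Testing whether clustering exceeds what a standard spatial model predicts can be done by simulation. The sketch below assumes a uniform random-integration null on a single segment and uses a hypothetical clustering statistic: the number of runs of k consecutive sites spanning at most w bp. The cited tools are R workspaces; this Python version only illustrates the idea and is not their code.

```python
import random

def clustering_statistic(sites, window, k):
    """Number of runs of k consecutive sorted sites spanning at most `window` bp."""
    s = sorted(sites)
    return sum(1 for i in range(len(s) - k + 1) if s[i + k - 1] - s[i] <= window)

def clustering_p_value(sites, genome_length, window, k, n_sim=1000, seed=0):
    """Compare the observed statistic with uniform random integration of the same number of sites."""
    rng = random.Random(seed)
    observed = clustering_statistic(sites, window, k)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(genome_length) for _ in sites]
        if clustering_statistic(simulated, window, k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

# Hypothetical toy data: ten evenly spaced sites plus a tight group near 500 kb
sites = [i * 100_000 for i in range(10)] + [500_100, 501_000, 502_000, 503_000, 504_000]
observed, p_value = clustering_p_value(sites, genome_length=1_000_000, window=10_000, k=3)
print(observed, p_value)
```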
Genome-wide analysis of retroviral DNA integration
Nature Reviews Microbiology, 2005
Retroviral vectors are often used to introduce therapeutic sequences into patients' cells. In recent years, gene therapy with retroviral vectors has had impressive therapeutic successes, but has also resulted in three cases of leukaemia caused by insertional mutagenesis, which has focused attention on the molecular determinants of retroviral-integration target-site selection. Here, we review retroviral DNA integration, with emphasis on recent genome-wide studies of targeting and on the status of efforts to modulate target-site selection.
Chromatin Landscapes of Retroviral and Transposon Integration Profiles
PLoS Genetics, 2014
The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of ~120,000 to ~180,000 unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed ~80 (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing a striking resemblance to the bias profiles of the Murine Leukemia Virus. Furthermore, we present a model in which target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems and defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7–33% putative false-positive CISs, which are likely not the result of the oncogenic selection process. Moreover, the results indicate that PB is better suited than SB to tagging oncogenes.
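A bias map entry can be thought of as an enrichment ratio: how often integrations fall near a feature compared with random control sites, evaluated at several distance scales. The following minimal sketch assumes features such as CpG islands are represented by single positions, and uses a log2 observed/control ratio as a generic enrichment measure; the distance thresholds and toy data are hypothetical and need not match the study's statistics.

```python
import bisect
import math

def fraction_near(features_sorted, sites, distance):
    """Fraction of sites with at least one feature position within `distance` bp."""
    hits = 0
    for s in sites:
        i = bisect.bisect_left(features_sorted, s - distance)  # first feature >= s - distance
        if i < len(features_sorted) and features_sorted[i] <= s + distance:
            hits += 1
    return hits / len(sites)

def enrichment_by_scale(features, integrations, controls, distances):
    """log2(observed fraction / control fraction) for each distance threshold."""
    feats = sorted(features)
    result = {}
    for d in distances:
        obs = fraction_near(feats, integrations, d)
        ctrl = fraction_near(feats, controls, d)
        # the ratio is undefined when either fraction is zero
        result[d] = math.log2(obs / ctrl) if obs > 0 and ctrl > 0 else float("nan")
    return result

# Hypothetical toy data (bp positions)
features = [10_000, 50_000, 90_000]
integrations = [10_100, 10_500, 49_000, 70_000]
controls = [5_000, 20_000, 30_000, 40_000, 60_000, 80_000, 95_000, 50_900]
bias = enrichment_by_scale(features, integrations, controls, distances=[1_000, 10_000])
print(bias)
```

In this toy example the integrations are enriched near features at the 1 kb scale but not at the 10 kb scale, which is the kind of scale dependence a multi-scale bias map is meant to expose.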
Deciphering the Code for Retroviral Integration Target Site Selection
2010
Upon cell invasion, retroviruses generate a DNA copy of their RNA genome and integrate the retroviral cDNA into host chromosomal DNA. Integration occurs throughout the host cell genome, but target site selection is not random. Each subgroup of retrovirus is distinguished from the others by its attraction to particular chromosomal features. Despite extensive efforts to identify host factors that interact with retrovirion components, or chromosome features predictive of integration, little is known about how integration sites are selected. We attempted to identify markers predictive of retroviral integration by exploiting Precision-Recall methods for extracting information from highly skewed datasets to derive robust and discriminating measures of association. ChIP-seq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. By combining peaks from ChIP-seq datasets, a supermarker was identified that localized within 2 kb of 75% of MLV proviruses and detected differences in integration preferences among different cell types. The supermarker predicted the likelihood of integration within specific chromosomal regions in a cell-type-specific manner, yielding probabilities for integration into the proto-oncogene LMO2 identical to experimentally determined values. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retroviral integration sites are selected, and offers a tool for predicting cell-type-specific proto-oncogene activation by retroviruses.
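The statement that the supermarker lies within 2 kb of 75% of MLV proviruses is a recall-type figure. A minimal sketch of recall and precision for a peak-based marker follows, assuming peaks are single positions, integration sites are positives and random sites are negatives; the function names and toy positions are hypothetical, and the study's Precision-Recall procedure may differ.

```python
import bisect

def near_peak(peaks_sorted, site, distance):
    """True if at least one peak lies within `distance` bp of the site."""
    i = bisect.bisect_left(peaks_sorted, site - distance)
    return i < len(peaks_sorted) and peaks_sorted[i] <= site + distance

def marker_precision_recall(peaks, integrations, random_sites, distance=2000):
    """Treat 'near a peak' as a prediction that a site is an integration site."""
    peaks_sorted = sorted(peaks)
    tp = sum(near_peak(peaks_sorted, s, distance) for s in integrations)  # integrations near a peak
    fp = sum(near_peak(peaks_sorted, s, distance) for s in random_sites)  # random sites near a peak
    recall = tp / len(integrations)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, recall

# Hypothetical toy data (bp positions)
peaks = [10_000, 30_000]
integrations = [9_000, 10_500, 31_000, 50_000]
random_sites = [11_500, 20_000, 40_000, 60_000]
precision, recall = marker_precision_recall(peaks, integrations, random_sites)
print(precision, recall)
```

Precision here depends on the ratio of random to integration sites, so it is only comparable across markers evaluated against the same control set.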
Journal of Virology, 2007
Retroviral integration into the host genome is not entirely random, and integration site preferences vary among different retroviruses. Human immunodeficiency virus (HIV) prefers to integrate within active genes, whereas murine leukemia virus (MLV) prefers to integrate near transcription start sites and CpG islands. In contrast, integration of avian sarcoma-leukosis virus (ASLV) shows little preference for genes, transcription start sites, or CpG islands. While host cellular factors play important roles in target site selection, the viral integrase is probably the major viral determinant. It is reasonable to hypothesize that retroviruses with similar integrases have similar target site preferences. Although integration profiles are well defined for members of the lentivirus, spumaretrovirus, alpharetrovirus, and gammaretrovirus genera, no member of the deltaretroviruses, such as human T-cell leukemia virus type 1 (HTLV-1), has been evaluated. We hav...
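Preferences such as integration within genes (HIV) or near transcription start sites (MLV) are commonly summarized as the fraction of sites in each annotation category. A minimal sketch, assuming genes are (start, end, strand) intervals on one chromosome and a hypothetical ±5 kb TSS window; a real analysis would use genome annotations and compare each fraction with matched random controls.

```python
def tss(gene):
    """Transcription start site: start coordinate on '+' strand, end coordinate on '-' strand."""
    start, end, strand = gene
    return start if strand == "+" else end

def annotate_sites(sites, genes, tss_window=5000):
    """Fractions of sites inside any gene and within +/- tss_window bp of any TSS."""
    in_gene = sum(any(start <= s <= end for start, end, _ in genes) for s in sites)
    near_tss = sum(any(abs(s - tss(g)) <= tss_window for g in genes) for s in sites)
    return {"in_gene": in_gene / len(sites), "near_tss": near_tss / len(sites)}

# Hypothetical toy annotation: (start, end, strand) on one chromosome
genes = [(10_000, 60_000, "+"), (100_000, 150_000, "-")]
hiv_like = [20_000, 30_000, 40_000, 120_000]
mlv_like = [9_000, 11_000, 151_000, 80_000]
print(annotate_sites(hiv_like, genes))
print(annotate_sites(mlv_like, genes))
```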
PLoS Computational Biology, 2005
Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow rapidly with the introduction of high-throughput screening techniques. Data from various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Jointly analyzing these screens for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more often than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates of CISs should take into account both the noise arising from the random nature of the insertion process and the bias stemming from preferential insertion sites present in the genome and from the data retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment using a predefined significance level while controlling the family-wise error rate (FWE; the probability of detecting one or more false CISs). Whereas previous methods use one, two, or three predetermined fixed scales, our method can operate at any biologically relevant scale. This makes it possible to analyze CISs in a scale space by varying their width, providing new insights into the behavior of CISs across multiple scales. Our method can also incorporate models of background bias. Using simulated data, we evaluate the KC framework with three kernel functions: Gaussian, triangular, and rectangular.
We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance threshold in this combined setting. Still, with the FWE under control, application of our method resulted in the discovery of eight novel CISs, each of which has less than a 5% probability of being a false detection. Citation: de Ridder J, Uren A, Kool J, Reinders M, Wessels L (2006) Detecting statistically significant common insertion sites in retroviral insertional mutagenesis screens. PLoS Comput Biol 2(12): e166.
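A simplified version of the kernel convolution idea can be sketched as follows: insertion sites are convolved with a Gaussian kernel of width sigma, and the detection threshold is the (1 - alpha) quantile of the genome-wide maximum obtained from random uniform placements of the same number of sites, which approximately controls the family-wise error at alpha. This permutation threshold, the uniform background and all parameter values are assumptions for illustration; the sketch omits the background-bias models and scale-space analysis, and is not the authors' implementation.

```python
import math
import random

def convolve(sites, grid, sigma):
    """Sum of unnormalized Gaussian kernels centred on each insertion, evaluated on grid."""
    return [sum(math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in sites) for x in grid]

def fwe_threshold(n_sites, genome_length, grid, sigma, alpha=0.05, n_perm=200, seed=0):
    """(1 - alpha) quantile of the genome-wide maximum under uniform random insertion."""
    rng = random.Random(seed)
    maxima = sorted(
        max(convolve([rng.uniform(0, genome_length) for _ in range(n_sites)], grid, sigma))
        for _ in range(n_perm)
    )
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]

def detect_cis(sites, genome_length, grid_step, sigma, alpha=0.05):
    """Grid positions whose convolved value exceeds the FWE-controlling threshold."""
    grid = list(range(0, genome_length + 1, grid_step))
    density = convolve(sites, grid, sigma)
    threshold = fwe_threshold(len(sites), genome_length, grid, sigma, alpha)
    return [x for x, v in zip(grid, density) if v > threshold], threshold

# Hypothetical toy data: 14 scattered insertions plus a tight group near 50 kb
spread = [i * 7000 + 500 for i in range(14)]
cluster = [50_000, 50_300, 50_600, 50_900, 51_200, 51_500]
hits, threshold = detect_cis(spread + cluster, genome_length=100_000, grid_step=1000, sigma=1000)
print(hits, threshold)
```

Taking the threshold from the distribution of the maximum, rather than from each grid point separately, is what makes a single threshold valid for the whole genome at once.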
Retroviral Integration Sites Correlate with Expressed Genes in Hematopoietic Stem Cells
Stem Cells, 2005
In this study, we analyzed whether retroviral integration sites in repopulating hematopoietic cells correlate with genes expressed in fractions enriched in hematopoietic stem cells (HSCs). We have previously described microarray studies of two populations enriched in HSCs: CD34+/CD38− cells and the slow-dividing fraction of CD34+/CD38− cells (SDF). We also demonstrated that oncoretroviral integrations in severe combined immunodeficient repopulating cells are preferentially located near the transcription start site. Here, we identified 117 corresponding cDNA clones on our microarray representing genes with retroviral integration sites. These genes showed a higher mean signal intensity than either all genes on the array or a subset of control genes with retroviral integrations in HeLa cells. Furthermore, these genes showed higher expression in CD34+/CD38− cells and SDF. The association of gene expression and retrovirally targeted genes observed here will help to elucidate the molecular characteristics of primitive repopulating hematopoietic cells. Stem Cells 2005;23:1050-1058
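The comparison of mean signal intensity between targeted genes and all genes on the array can be framed as a resampling test. The sketch below assumes a dictionary mapping gene identifiers to signal intensities and a list of targeted genes; the identifiers and values are invented toy data, and the abstract does not specify which statistical test the study used.

```python
import random
from statistics import mean

def targeted_mean_test(signal, targeted, n_resample=2000, seed=0):
    """Compare mean signal of targeted genes with random gene sets of equal size.

    Returns (observed mean, mean over all genes, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    all_genes = list(signal)
    observed = mean(signal[g] for g in targeted)
    at_least = 0
    for _ in range(n_resample):
        # random gene set of the same size, drawn without replacement
        sample = rng.sample(all_genes, len(targeted))
        if mean(signal[g] for g in sample) >= observed:
            at_least += 1
    return observed, mean(signal.values()), (at_least + 1) / (n_resample + 1)

# Hypothetical toy data: 100 genes with increasing signal, top 5 targeted
signal = {f"gene{i}": float(i) for i in range(100)}
targeted = ["gene95", "gene96", "gene97", "gene98", "gene99"]
observed, overall, p_value = targeted_mean_test(signal, targeted)
print(observed, overall, p_value)
```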