Highly prevalent putative quadruplex sequence motifs in human DNA

Bioinformatics analyses and in vitro evidence for five and six stacked G-quadruplex forming sequences

Biochimie, 2018

Quadruplexes are noncanonical DNA structures that arise in guanine rich loci and have important biological functions. Classically, quadruplexes contain four stacked intramolecular G-tetrads. Surprisingly, although some algorithms allow searching for longer than 4G tracts for quadruplex formation, these have not yet been systematically studied. Therefore, we analyzed the human genome for sequences that are predicted to adopt stacked intramolecular G-tetrads with greater than four stacks. The data provide evidence for numerous G-quadruplexes that contain five or six stacked intramolecular G-tetrads. These sequences are predominantly found in known gene regulatory regions. Electrophoretic mobility assays and circular dichroism spectroscopy indicate that these sequences form quadruplex structures in vitro under physiological conditions. The localization and in vitro stability of these G-quadruplexes indicate their potentially important roles in gene regulation and their potential for th...
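The idea of scanning for motifs that could support more than four stacked tetrads can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name `max_tetrad_stacks`, the loop-length limit of 1 to 7 nt, and the cap of 8 stacks are assumptions chosen for illustration. It reports the largest k for which a sequence contains four runs of at least k guanines separated by short loops.

```python
import re

def max_tetrad_stacks(seq, max_loop=7, upper=8):
    """Return the largest k (2..upper) such that seq contains four runs of
    at least k guanines separated by loops of 1..max_loop nucleotides.
    Returns 0 if no such arrangement is found.
    The loop limit and the upper bound are illustrative assumptions."""
    seq = seq.upper()
    for k in range(upper, 1, -1):
        # Four G-runs of length >= k, joined by three short loops.
        pattern = rf"G{{{k},}}(?:[ACGT]{{1,{max_loop}}}?G{{{k},}}){{3}}"
        if re.search(pattern, seq):
            return k
    return 0

if __name__ == "__main__":
    five_stack = "GGGGGTGGGGGTGGGGGTGGGGG"
    print(max_tetrad_stacks(five_stack))
```

A purely sequence-based count like this only flags candidates; whether a candidate actually folds has to be checked experimentally, as the abstract describes.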

The disruptive positions in human G-quadruplex motifs are less polymorphic and more conserved than their neutral counterparts

Nucleic Acids Research, 2009

Specific guanine-rich sequence motifs in the human genome have considerable potential to form four-stranded structures known as G-quadruplexes or G4 DNA. The enrichment of these motifs in key chromosomal regions has suggested a functional role for the G-quadruplex structure in genomic regulation. In this work, we have examined the spectrum of nucleotide substitutions in G4 motifs, and related this spectrum to G4 prevalence. Data collected from the large repository of human SNPs indicate that the core feature of G-quadruplex motifs, 5'-GGG-3', exhibits specific mutational patterns that preserve the potential for G4 formation. In particular, we find a genome-wide pattern in which sites that disrupt the guanine triplets are more conserved and less polymorphic than their neutral counterparts. This also holds when considering non-CpG sites only. However, the low level of polymorphisms in guanine tracts is not confined to G4 motifs. A complete mapping of DNA three-mers at guanine polymorphisms indicated that short guanine tracts are the most under-represented sequence context at polymorphic sites. Furthermore, we provide evidence for a strand bias upstream of human genes. Here, a significantly lower rate of G4-disruptive SNPs on the non-template strand supports a higher relative influence of G4 formation on this strand during transcription.
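The three-mer mapping mentioned in this abstract can be pictured with a small Python sketch. It is only an illustration of the counting step, not the authors' method: the function name `threemer_contexts`, the 0-based position convention, and the toy sequence are all hypothetical.

```python
from collections import Counter

def threemer_contexts(seq, snp_positions):
    """Count the three-mer centred on each polymorphic guanine.
    seq is a DNA string; snp_positions are 0-based indices (an assumed
    convention). Positions that are not G, or that lack a neighbour on
    either side, are skipped."""
    seq = seq.upper()
    counts = Counter()
    for pos in snp_positions:
        if 1 <= pos < len(seq) - 1 and seq[pos] == "G":
            counts[seq[pos - 1:pos + 2]] += 1
    return counts

if __name__ == "__main__":
    # Toy input, invented for illustration only.
    print(threemer_contexts("AGGGTCGA", [1, 2, 3, 6]))
```

Comparing such counts against the genome-wide frequency of each three-mer would be the next step in judging whether a context such as GGG is under-represented at polymorphic sites.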

G-Quadruplexes Involving Both Strands of Genomic DNA Are Highly Abundant and Colocalize with Functional Sites in the Human Genome

PLOS ONE, 2016

The G-quadruplex is a non-canonical DNA structure biologically significant in DNA replication, transcription and telomere stability. To date, only G4s with all guanines originating from the same strand of DNA have been considered in the context of the human nuclear genome. Here, I discuss interstrand topological configurations of G-quadruplex DNA, consisting of guanines from both strands of genomic DNA; an algorithm is presented for predicting such structures. I have identified over 550,000 non-overlapping interstrand G-quadruplex-forming sequences in the human genome, significantly more than intrastrand configurations. Functional analysis of interstrand G-quadruplex sites shows a strong association with transcription initiation; the results are consistent with the XPB and XPD transcriptional helicases binding only to G-quadruplex DNA with interstrand topology. Interstrand quadruplexes are also enriched in origin of replication sites. Several topology classes of interstrand quadruplex-forming sequences are possible, and different topologies are enriched in different types of structural elements. The list of interstrand quadruplex-forming sequences and the computer program used for their prediction are available at the web address http://moment.utmb.edu/allquads.

Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study

Nucleic acids research, 2015

Duplex stem-loops and four-stranded G-quadruplexes have been implicated in (patho)biological processes. Overlap of stem-loop- and quadruplex-forming sequences could give rise to quadruplex-duplex hybrids (QDH), which combine features of both structural forms and could exhibit unique properties. Here, we present a combined genomic and structural study of stem-loop-containing quadruplex sequences (SLQS) in the human genome. Based on a maximum loop length of 20 nt, our survey identified 80 307 SLQS, embedded within 60 172 unique clusters. Our analysis suggested that these should cover close to half of total SLQS in the entire genome. Among these, 48 508 SLQS were strand-specifically located in genic/promoter regions, with the majority of genes displaying a low number of SLQS. Notably, genes containing abundant SLQS clusters were strongly associated with brain tissues. Enrichment analysis of SLQS-positive genes and mapping of SLQS onto transcriptional/mutagenesis hotspots and cancer-ass...

Identification of G-quadruplex clusters by high-throughput sequencing of whole-genome amplified products with a G-quadruplex ligand

Scientific reports, 2018

G-quadruplex (G4) is a DNA secondary structure that has been found to play regulatory roles in the genome. The identification of G4-forming sequences is important to study the specific structure-function relationships of such regions. In the present study, we developed a method for identification of G4 clusters on genomic DNA by high-throughput sequencing of genomic DNA amplified via whole-genome amplification (WGA) in the presence of a G4 ligand. The G4 ligand specifically bound to G4 structures on genomic DNA; thus, DNA polymerase was arrested on the G4 structures stabilised by G4 ligand. We utilised the telomestatin derivative L1H1-7OTD as a G4 ligand and demonstrated that the efficiency of amplification of the G4 cluster regions was lower than that of the non-G4-forming regions. By high-throughput sequencing of the WGA products, 9,651 G4 clusters were identified on human genomic DNA. Among these clusters, 3,766 G4 clusters contained at least one transcriptional start site, sugge...

Bioinformatics approaches to quadruplex sequence location

Methods, 2007

Guanine quadruplex structures are potentially useful therapeutic targets. There have been several studies attempting to locate genomic sequences which are capable of forming these structures. Since the number of potential quadruplex-forming sequences which have been identified is so high, several different strategies have been applied to try to determine which of these sequences may be physiologically relevant and which sequences are most likely to form quadruplex structures. These are based on the limited structural information that is currently available and comparative analyses of the location of these sequences with respect to different genomic regions. Sequence information alone is not enough to identify regions of nucleic acid which participate in quadruplex structures; however, it is the starting point for quadruplex structure discovery when complemented with further experimentation.

Existence and consequences of G-quadruplex structures in DNA

While the discovery of B-form DNA 60 years ago has defined our molecular view of the genetic code, other postulated DNA secondary structures, such as A-DNA, Z-DNA, H-DNA, cruciform and slipped structures, have provoked consideration of DNA as a more dynamic structure. Four-stranded G-quadruplex DNA does not use Watson-Crick base pairing and has been the subject of considerable speculation and investigation during the past decade, particularly with regard to its potential relevance to genome integrity and gene expression. Here, we discuss recent data that collectively support the formation of G-quadruplexes in genomic DNA and the consequences of formation of this structural motif in biological processes.

Quadruplex DNA: sequence, topology and structure

Nucleic Acids Research, 2006

G-quadruplexes are higher-order DNA and RNA structures formed from G-rich sequences that are built around tetrads of hydrogen-bonded guanine bases. Potential quadruplex sequences have been identified in G-rich eukaryotic telomeres, and more recently in non-telomeric genomic DNA, e.g. in nuclease-hypersensitive promoter regions. The natural role and biological validation of these structures are starting to be explored, and there is particular interest in them as targets for therapeutic intervention. This survey focuses on the folding and structural features of quadruplexes formed from telomeric and non-telomeric DNA sequences, and examines fundamental aspects of topology and the emerging relationships with sequence. Emphasis is placed on information from the high-resolution methods of X-ray crystallography and NMR, and their scope and current limitations are discussed. Such information, together with biological insights, will be important for the discovery of drugs targeting quadruplexes from particular genes.

Molecular models for intrastrand DNA G-quadruplexes

BMC Structural Biology, 2009

Background: Independent surveys of human gene promoter regions have demonstrated an overrepresentation of $G_3X_{n1}G_3X_{n2}G_3X_{n3}G_3$ motifs, which are known to be capable of forming intrastrand quadruple helix structures. In spite of the widely recognized importance of G-quadruplex structures in gene regulation and growing interest around this unusual DNA structure, there are at present only a few such structures available in the Nucleic Acid Database. In the present work we generate by molecular modeling feasible G-quadruplex structures which may be useful for interpretation of experimental data.
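A motif of this form, four G3 tracts separated by three loops, translates naturally into a regular expression. The Python sketch below is one possible rendering under stated assumptions: the function name `find_g3_motifs` is hypothetical, and the loop lengths of 1 to 7 nt are a commonly used convention that the abstract itself does not specify.

```python
import re

def find_g3_motifs(seq, min_loop=1, max_loop=7):
    """Return (start, end, text) for non-overlapping matches of
    G3-loop-G3-loop-G3-loop-G3 on the given strand.
    Loop bounds are illustrative assumptions, not values from the source."""
    loop = rf"[ACGT]{{{min_loop},{max_loop}}}?"
    pattern = re.compile(rf"G{{3}}(?:{loop}G{{3}}){{3}}")
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    for hit in find_g3_motifs("TTGGGAGGGTGGGCGGGTT"):
        print(hit)
```

To cover the complementary strand as well, the same search can be run on the reverse complement of the input sequence.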