Distinct frequency-distributions of homopolymeric DNA tracts in different genomes

Journal Article


CAOS/CAMM Center, University of Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, The Netherlands


*To whom correspondence should be addressed. Tel: +31 24 3652248; Fax: +31 24 3652977; Email: jackl@caos.kun.nl


+Present address: N. V. Organon, Target Discovery Unit, Room RH1204, PO Box 20, 5340 BH Oss, The Netherlands


Published: 01 September 1998


Koen J. Dechering, Ruud N. H. Konings, Koen Cuelenaere, Jack A. M. Leunissen, Distinct frequency-distributions of homopolymeric DNA tracts in different genomes, Nucleic Acids Research, Volume 26, Issue 17, 1 September 1998, Pages 4056–4062, https://doi.org/10.1093/nar/26.17.4056

Abstract

The unusual base composition of the genome of the human malaria parasite Plasmodium falciparum prompted us to systematically investigate the occurrence of homopolymeric DNA tracts in the P.falciparum genome and, for comparison, in the genomes of Homo sapiens, Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana, Escherichia coli and Mycobacterium tuberculosis. Comparison of the observed frequencies with the frequencies as expected for random DNA revealed that homopolymeric (dA:dT) tracts occur well above chance in the eukaryotic genome. In the majority of these genomes, (dA:dT) tract overrepresentation proved to be an exponential function of the tract length. (dG:dC) tract overrepresentation was absent or less pronounced in both prokaryotic and eukaryotic genomes. On the basis of our results, we propose that homopolymeric (dA:dT) tracts are expanded via replication slippage. This slippage-mediated expansion does not operate on tracts with lengths below a critical threshold of 7–10 bp.

Introduction

The past decade has seen the initiation of a number of genome sequencing projects for organisms that are of interest as a model system or as a pathogen. An example of the latter category is the protozoan parasite Plasmodium falciparum, which is the main cause of malaria in humans and is responsible for two million deaths annually. With the aim of developing new drugs or a vaccine, the biology of the parasite has been the subject of intensive study. One of the unique features of the parasite is the extraordinary base composition of its genome, first revealed by the sequencing of genes and intergenic regions, and confirmed later by data generated by the P.falciparum genome project. The overall A/T content of the parasite's genome is 81%, and can reach levels as high as 90% in non-coding regions (1). Plasmodium falciparum possesses the most G/C-poor genome known so far (2).

Visual inspection of P.falciparum intergenic sequences reveals the extensive occurrence of long homopolymeric (dA:dT) stretches. These stretches are of interest as they have unique structural and functional properties. Crystallographic data have shown that homopolymeric (dA:dT) tracts adopt a rigid structure which is characterized by a high level of propeller twist and an increased base stacking. This allows the formation of additional non-Watson-Crick, bifurcated hydrogen bonds (3). Phasing of short (dA:dT) tracts within the helical repeat of normal _B_-DNA results in a macroscopic curvature of the DNA (4,5). It has been proposed that all general-sequence _B_-DNA gently writhes, with the net effect of all local bends being a straight helix. Introduction of a straight (dA:dT) tract distorts the array of compensating writhes and results in a curvature of the DNA (6). This curvature can play a role in the modulation of the transcriptional activity of genes (7), and enhance the affinity of the DNA for transcription factors such as the TATA-binding protein (8). Alternatively, (dA:dT) tracts can modulate the access of transcription factors to the DNA via a local distortion of a nucleosome (9). In yeast, (dA:dT) tracts are functional promoter elements (10). Their effects are mediated by a modulation of the nucleosomal occupancy of the DNA rather than by the direct recruitment of _trans_-acting factors (11). Finally, homopolymeric (dA:dT) stretches are part of scaffold associated regions (SARs) that are supposed to anchor the chromatin loops in the nucleus (12,13). The SARs are also the place of residence of topoisomerase II, which controls the topology of DNA during replication, recombination and transcription. It has been proposed that the curvature induced by the homopolymeric (dA:dT) tracts in the SAR defines the sequence characteristics preferred by topoisomerase II (14,15).

Homopolymeric DNA tracts, or more generally, simple repetitive sequences, can give rise to slippage of the polymerase during replication. The internally repetitive DNA sequences allow the nascent strand to slip back or forward on the parental strand with one or more repeat units, resulting in an expansion or contraction of the new DNA strand (16). It has been proposed that slipped strand replication is a major force in the evolution of genes (17,18) and genomes (19) and it is supposed to be implicated in a large number of human genetic diseases (20,21). In addition to replication slippage, processes like unequal crossing over, mutation and selection affect the persistence of simple sequences. The distribution of simple sequences in the genome thus reflects an equilibrium between various mutational and selective forces (22). It is generally believed, however, that the initial variations in sequence lengths are provoked by replication slippage, and that this process provides the raw material upon which the other mechanisms act (16,23,24).

The functional and structural significance of homopolymeric (dA:dT) stretches, together with initial observations that they are enriched in the genome of P.falciparum, prompted us to investigate their occurrence in a systematic way. For comparison, we analyzed the occurrence of homopolymeric (dG:dC) stretches in the P.falciparum genome, and the occurrence of (dA:dT) and (dG:dC) tracts in the genomes of six other species (Homo sapiens, Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana, Escherichia coli and Mycobacterium tuberculosis), which range widely across the evolutionary spectrum.

Materials and Methods

Analysis of P.falciparum genomic sequences was performed on 17 contigs (#7289, #7290, #7292, #7294, #7296, #7297, #7299, #7300, #7302, #7316, #7327, #7355, #7404, #7455, #7535, #7623, #7651) obtained from the Institute for Genomic Research (ftp://ftp.tigr.org), and compiled from the sequence data of chromosome 2 of strain 3D7 of P.falciparum. For the analysis of the human genome, 10 contigs (BK992D9, DJ121G13, DJ211D12, DJ30P20, DJ431A14, DJ106H8, DJ170A21, DJ272J12, DJ389A20, DJ79C4) from sequence data of chromosomes 1, 6, 20, 22 and X were obtained from the Sanger Center (ftp://ftp.sanger.ac.uk). Sequences of C.elegans and A.thaliana were obtained from the EMBL nucleotide library (C.elegans: CEY105C5, CEY106G6; A.thaliana: ATFCA0, ATFCA1, ATFCA2, ATFCA3, ATFCA4, ATFCA5, ATFCA6, ATFCA7, ATFCA8). Analysis of the S.cerevisiae, E.coli K12 and M.tuberculosis H37Rv genomes was performed on the completed genomic sequences obtained from ftp://genome-ftp.stanford.edu, ftp://ftp.genetics.wisc.edu and ftp://ftp.sanger.ac.uk, respectively.

To investigate the occurrence of homopolymeric (dA:dT) stretches in different P.falciparum genome regions, sequences with well-defined features with respect to the organization into intron, exon and flanking sequences were selected from the EMBL database, release 52. Sequences encoding structural RNA or originating from plastids or mitochondria were omitted. A total of 241 unique sequences were selected with a total length of 608 kb. Using a small Tcl script, the sequences were dissected into gene-flanking, coding and intron regions according to the tables of features accompanying the sequences.

The basic characteristics of all sequences analyzed are presented in Table 1. Sequence analysis was performed on a Silicon Graphics Challenge running Irix 5.3 using the GCG package (25) version 8.1. The expected frequency of finding a homopolymeric non-overlapping (dA:dT) tract of length N in either orientation was calculated assuming a zero-order Markov chain as described previously (26):

Note that this equation calculates the frequency of a non-overlapping (dA:dT) tract and takes into account the frequencies of the two adjacent nucleotides.
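The equation itself is not reproduced in this copy of the text. A form that would be consistent with the description (a zero-order Markov chain, both orientations, and two flanking nucleotides that must differ from the tract base) is sketched below. The symbols $F_{\mathrm{exp}}$, $f_A$ and $f_T$ are notational assumptions introduced here, and the exact expression should be checked against reference 26.

```latex
F_{\mathrm{exp}}(N) = (1 - f_A)^{2}\, f_A^{N} + (1 - f_T)^{2}\, f_T^{N}
```

Here $f_A$ and $f_T$ would be the fractions of A and T on the analyzed strand. The first term covers a run of exactly N adenines flanked by two non-A nucleotides, and the second term covers the same tract read in the opposite orientation. The analogous expression for (dG:dC) tracts would use $f_G$ and $f_C$.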

To assess whether the overrepresentation of homopolymeric (dA:dT) tracts is due to compositional inhomogeneities in the genome, regression analysis was carried out between the local A/T content of the contigs and the frequency of occurrence of overrepresented homopolymeric (dA:dT) tracts. To this end, the abundance of homopolymeric (dA:dT) tracts ≥10 bp and the A/T content were determined in a window of 1000 bp that was shifted along the sequence with a 950 bp interval. Regression analysis was performed using the regression module of Microsoft Excel 97.
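The windowed analysis described above can be sketched as follows. This is a minimal illustration and not the procedure actually used (the regression was done in Excel). The function names are hypothetical, the 950 bp interval is assumed to be the step between window starts, and tracts cut by a window edge are counted by their in-window length only.

```python
import re

def window_stats(seq, window=1000, step=950, min_tract=10):
    """Return (A/T fraction, tract count) for each full window along seq."""
    seq = seq.upper()
    # A run of A's on this strand is a (dA:dT) tract; so is a run of T's.
    tract = re.compile(r"A{%d,}|T{%d,}" % (min_tract, min_tract))
    stats = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        stats.append((at_fraction, len(tract.findall(chunk))))
    return stats

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined; report no correlation
    return sxy / (sxx * syy) ** 0.5
```

Calling `pearson_r(window_stats(contig))` on a contig string would give the correlation between local A/T content and tract abundance; a value near zero corresponds to the absence of correlation.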

Results

The P.falciparum genome is enriched for short (dG:dC) tracts and long (dA:dT) tracts

The malaria genome project, which was established in 1996 and ultimately aims at sequencing all the 2.5 × 10⁷ nt of the genome, is in full progress and has already provided a wealth of sequence information (27). As chromosome 2 was the first chromosome for which a complete contig map was established in yeast artificial chromosomes (28), most progress has been made in sequencing this chromosome. To date, this has resulted in 21 807 individual sequence reads that can be assembled into 17 contigs that cover 967 kb of chromosome 2. As the estimated size of chromosome 2 is 1.03 Mb (28), the contigs encompass 94% of the chromosome and can be considered representative for its DNA sequence. The overall A/T content of the chromosome 2 contigs is 80% (Table 1), which corresponds well to the numbers that have been reported previously for the P.falciparum genome (29,30). We determined the numbers of non-overlapping homopolymeric (dA:dT) and (dG:dC) tracts present in either orientation in the chromosome 2 contigs. Table 2 shows the numbers of tracts for 2 ≤ N ≤ 10. The results show that all (dG:dC) tracts appear at higher frequencies than is expected on the basis of a random distribution of nucleotides. However, (dG:dC) tracts longer than 9 nt are not observed. (dA:dT) tracts of 2 or 3 nt are slightly overrepresented whilst tracts of 4 nt show a minor underrepresentation. (dA:dT) tracts with lengths of 5–9 nt are overrepresented, but to a lesser extent than (dG:dC) tracts of similar lengths. Standard χ² analysis revealed that in all cases the deviations of the observed frequencies from the expected frequencies are statistically significant at P < 0.001 (not shown).
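Counting non-overlapping tracts in either orientation can be illustrated with a short sketch. It is not the GCG-based procedure used in the study. The function name `count_tracts` is hypothetical, and the sketch assumes that a tract is a maximal run of one base, so that A and T runs on a single strand together represent the (dA:dT) tracts of both orientations.

```python
import re
from collections import Counter

def count_tracts(seq):
    """Count maximal homopolymeric runs by length on one strand.

    An A run on the given strand is a T run on the complement, so A and T
    runs together give the (dA:dT) tracts in either orientation; likewise
    G and C runs give the (dG:dC) tracts.
    """
    at_counts, gc_counts = Counter(), Counter()
    # Each match is a maximal run, so runs never overlap.
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = match.group()
        target = at_counts if run[0] in "AT" else gc_counts
        target[len(run)] += 1
    return at_counts, gc_counts
```

The counters are keyed by tract length N, so `at_counts[8]` would be the number of (dA:dT) tracts of exactly 8 nt. Runs of length 1 are also recorded and can simply be ignored when tabulating 2 ≤ N ≤ 10.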

Whereas (dA:dT) tracts <10 nt appear at frequencies that are close to expectation in the _P.falciparum_ genome, the occurrence of tracts >10 bp deviates strongly from expectation. Figure 1A shows the relative frequency of occurrence as a function of the tract length for the non-overlapping homopolymeric DNA tracts found in the chromosome 2 contigs. Interestingly, the frequency distribution of (dA:dT) tracts >12 bp can be fit by a single semi-logarithmic function that exhibits a far greater dependence on N than the function that describes the expected frequencies. (dA:dT) tracts with lengths of up to 47 bp are observed, which would be very unlikely to occur at a random distribution of nucleotides. The contribution of homopolymeric (dA:dT) tracts to the genome is considerable, as the sum of the lengths of all stretches >7 bp is 44 506 bp, which accounts for nearly 5% of the sequences encompassed by the chromosome 2 contigs. In conclusion, the P.falciparum genome as represented by the chromosome 2 contigs is highly enriched for short (dG:dC) tracts and for long (dA:dT) tracts.

We assessed whether the observed overrepresentation of homopolymeric (dA:dT) tracts is caused by compositional inhomogeneities in the P.falciparum genome. To this end, we determined the base composition and frequency of occurrence of homopolymeric (dA:dT) tracts of ≥10 bp in windows of 1000 bp. Regression analysis revealed a very weak correlation between the A/T content in the window and the abundance of homopolymeric tracts (Table 1). Thus, the abundance of homopolymeric tracts is independent of local inhomogeneities in the genome.

Table 1

Summary and basic characteristics of the sequence data analyzed in this study

Table 2

Occurrence of homopolymeric tracts in P.falciparum chromosome 2

Enrichment for (dA:dT) tracts is restricted to non-coding DNA

As coding and non-coding regions are subject to different functional constraints, it is likely that the occurrence of homopolymeric DNA tracts will vary between these regions. Therefore, we analyzed the occurrence of homopolymeric stretches in the different regions of the P.falciparum genome. 241 sequences were selected from the EMBL database and the number of tracts in the coding, intron and gene-flanking regions of the P.falciparum genome was scored. Table 1 summarizes the basic features of the data we have analyzed. The coding regions have an A/T content of 71% whereas the A/T content reaches 81% in the gene-flanking regions and 87% in the introns.

Figure 1B–D shows the length dependent occurrence of homopolymeric stretches in the different genome regions. Short (dG:dC) tracts show a minor overrepresentation in the introns while higher levels of overrepresentation are seen in the gene-flanking and coding sequences. (dG:dC) runs >9 bp are absent. In accordance with the analysis of the chromosome 2 contigs, short (dA:dT) tracts of N <10 bp appear at frequencies close to expectation in all regions. However, dramatic differences between the genomic regions become apparent for (dA:dT) tracts >10 bp. Whereas the coding regions are only slightly enriched for (dA:dT) tracts >10 nt, these tracts appear at frequencies well above chance in the non-coding regions. In the latter regions, the frequency distributions of (dA:dT) tracts show a characteristic bipartite pattern very similar to that observed for the chromosome 2 contigs. These results indicate that the enrichment for long (dA:dT) tracts as seen in the analysis of the chromosome 2 contigs can be largely attributed to the gene-flanking and intron sequences. To support this notion, we plotted the regions occupied by homopolymeric tracts together with a prediction of open reading frames along a 20 000 bp region of chromosome 2 (Fig. 2). In this representation it can be seen that long homopolymeric (dA:dT) tracts and open reading frames indeed appear in a mutually exclusive pattern.

Figure 1

Frequency distributions of homopolymeric tracts in the P.falciparum genome. The figure shows the frequency distributions of homopolymeric runs as observed in chromosomal (A), gene-flanking (B), intron (C) and protein encoding DNA (D) of P.falciparum.

Figure 2

Homopolymeric (dA:dT) tracts cluster in the non-coding regions of the P.falciparum genome. The figure shows the distribution of homopolymeric (dA:dT) tracts (vertical bars) along a representative 20 kb region of chromosome 2, together with a prediction of open reading frames (open boxes). Arrows indicate the directions of the open reading frames.

(dA:dT) tract enrichment is a general eukaryotic phenomenon

At a random distribution of nucleotides, homopolymeric (dA:dT) stretches would occur relatively frequently in an A/T rich genome whereas (dG:dC) tracts may be virtually absent. In the P.falciparum genome, for instance, (dA:dT) tracts of 8 bp are expected to occur once every 2000 nt whereas (dG:dC) tracts of similar length are expected to occur only once every 5.6 × 10⁷ nt (Table 2). It is conceivable that the overrepresentation of (dA:dT) tracts in the P.falciparum genome is provoked by the intrinsic high frequency of randomly occurring tracts, which may serve as the substrate for slippage-mediated expansion. If such a process operated with equal efficiency on A/T rich and G/C rich DNA, it would lead to (dA:dT) tract enrichment in an A/T rich genome, and (dG:dC) tract enrichment in a G/C rich genome. In this view, in a genome with an A/T to G/C ratio of 1, homopolymeric (dG:dC) tract overrepresentation should equal (dA:dT) tract overrepresentation. To address this hypothesis, we analyzed the occurrence of homopolymeric tracts in several genomes with varying A/T contents.
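The orders of magnitude quoted above can be checked with a small worked example. The sketch assumes a zero-order Markov expectation in which a tract of exactly N identical bases is flanked on both sides by a different base and is counted in both orientations. It also assumes round base frequencies for an 80% A/T genome (A = T = 0.40, G = C = 0.10) rather than the measured values behind Table 2. The function name is hypothetical.

```python
def expected_tract_frequency(f_base, f_complement, n):
    """Expected per-nucleotide frequency of a run of exactly n identical
    bases, in either orientation, under a zero-order Markov model."""
    return ((1 - f_base) ** 2 * f_base ** n
            + (1 - f_complement) ** 2 * f_complement ** n)

# Assumed round composition: A = T = 0.40, G = C = 0.10 (80% A/T).
at_spacing = 1 / expected_tract_frequency(0.40, 0.40, 8)  # nt per 8 bp (dA:dT) tract
gc_spacing = 1 / expected_tract_frequency(0.10, 0.10, 8)  # nt per 8 bp (dG:dC) tract

print(f"(dA:dT)8: about one per {at_spacing:.0f} nt")
print(f"(dG:dC)8: about one per {gc_spacing:.2e} nt")
```

With these rounded frequencies the spacings come out at roughly 2.1 × 10³ nt and 6.2 × 10⁷ nt, the same orders of magnitude as the figures in the text. The remaining difference presumably reflects the actual base frequencies of the contigs.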

In contrast to the prediction, in none of the genomes analyzed is a high overrepresentation of (dG:dC) tracts observed (Fig. 3). Instead, the occurrence of (dG:dC) tracts is close to expectation (S.cerevisiae, A.thaliana), shows an underrepresentation (E.coli, M.tuberculosis) or a relatively minor overrepresentation (P.falciparum, H.sapiens, C.elegans). These results show that a higher G/C content does not lead to a dramatic increase in the overrepresentation of (dG:dC) tracts, and might indicate that replication slippage operates less efficiently on G/C rich DNA.

Figure 3

Length dependent occurrence of homopolymeric tracts in different eukaryotic and prokaryotic genomes.

The analysis of the various genomes furthermore shows that the patterns of occurrence of (dA:dT) tracts are clearly distinct between prokaryotes and eukaryotes (Fig. 3). In the two prokaryotes we have analyzed, (dA:dT) tracts appear at frequencies close to expectation. In eukaryotes, poly(dA:dT) tracts are generally overrepresented following a characteristic bipartite pattern. The frequency distribution of (dA:dT) tracts in the genomes of P.falciparum, H.sapiens, S.cerevisiae and A.thaliana can be fitted by two exponential functions that break in the 8–12 bp region. A strikingly divergent pattern of (dA:dT) tract overrepresentation is provided by C.elegans. In this organism, the curve that fits the observed frequencies shows a slight bulge in the 8–10 bp region, but then continues parallel to the curve that represents the distribution of the expected frequencies. Furthermore, (dA:dT) tracts >14 bp were not observed in the C.elegans genome whereas in all other eukaryotes tracts reach lengths of over 25 bp.

For all eukaryotic genomes, regression analysis between the local A/T content and the density of overrepresented homopolymeric (dA:dT) tracts revealed that these are not correlated (Table 1). This indicates that the overrepresentation of homopolymeric (dA:dT) tracts in the genomes of higher eukaryotes is not due to the presence of A/T rich compartments, or isochores.

Overrepresentation of (dA:dT) tracts is an exponential function of the tract length

In contrast to the situation in C.elegans, where tracts >10 bp appear at frequencies that are at a steady 15-fold above expectation, the overrepresentation of (dA:dT) tracts in the genomes of the other eukaryotes is an exponential function of the tract length. This can best be seen in Figure 4, where the ratio of the observed to the expected frequency is plotted against the tract length. In the genomes of P.falciparum, H.sapiens, S.cerevisiae and A.thaliana, longer tracts are more strongly overrepresented than shorter tracts. The overrepresentation of tracts >10 bp can be fitted by a single exponential function that depends on the tract length. Interestingly, these functions are very similar for the different genomes, suggesting a shared mechanism for the accumulation and maintenance of (dA:dT) tracts.
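Fitting a single exponential to the overrepresentation ratio amounts to fitting a straight line to its logarithm. The sketch below shows one way this could be done. It is not the fitting procedure used in the study, the function name and the model form ratio = a·exp(b·N) are assumptions, and the numbers are synthetic values generated from a known curve, not data from Figure 4.

```python
import math

def fit_exponential(lengths, ratios):
    """Least-squares fit of ratio = a * exp(b * N), done as a straight
    line through the points (N, ln ratio). Returns (a, b)."""
    log_ratios = [math.log(r) for r in ratios]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(log_ratios) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, log_ratios))
         / sum((x - mean_x) ** 2 for x in lengths))
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic check only (not data from the paper): points generated from
# ratio = 2 * exp(0.5 * N) should give back a close to 2 and b close to 0.5.
lengths = list(range(10, 21))
ratios = [2 * math.exp(0.5 * n) for n in lengths]
a, b = fit_exponential(lengths, ratios)
```

In this formulation, similar values of `b` across genomes would correspond to the similar exponential functions noted above, while a value of `b` near zero would correspond to the constant fold-enrichment seen for C.elegans.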

Figure 4

Overrepresentation of (dA:dT) tracts is an exponential function of the tract length in the genomes of P.falciparum, H.sapiens, S.cerevisiae and A.thaliana, but not in the C.elegans genome. Overrepresentation of (dA:dT) tracts, expressed as the ratio between the observed and the expected frequency of occurrence, is plotted against the tract length.

Discussion

The data presented here show that the eukaryotic genome is enriched for homopolymeric (dA:dT) tracts. With the exception of C.elegans, the occurrence of (dA:dT) tracts shows a bipartite pattern that can be described by two exponential functions. First, short tracts of 2 ≤ N ≤ 7 occur at frequencies that are close to the predicted values. Second, tracts of N > 10 show a length dependent overrepresentation and can reach lengths of >30 nt that are up to 10¹²-fold overrepresented. By contrast, (dG:dC) tracts are not or only weakly overrepresented. In the few instances that enrichment for (dG:dC) tracts is seen, the overrepresentation cannot be described by a simple exponential function (not shown) and never exceeds 10⁴-fold over chance. The detailed analysis of the P.falciparum genome shows that overrepresentation of (dA:dT) tracts is largely restricted to non-coding DNA.

Slipped strand mispairing during replication rather than unequal crossing-over is seen as the major force in the generation of length variation of simple sequence repeats (16,23), and this process can also account for the variation in length of homopolymeric runs, which are the simplest forms of simple sequence repeats (19,31). The driving force in the expansion of homopolymeric tracts might originate from a biased action of the slippage process or from specific retention of expanded tracts (16). In either case, the process has a self-accelerating component, as expanded tracts increase the likelihood for additional slippage events, which in turn lead to additional expansion.

Two observations from the data presented here stand out. First, the overrepresentation of (dA:dT) tracts >10 bp in the genomes of P.falciparum, H.sapiens, S.cerevisiae and A.thaliana is an exponential function of the tract length. Such a distribution is consistent with models in which replication slippage is responsible for the expansion of homopolymeric DNA tracts (22,32). Interestingly, tracts <7 bp appear at frequencies close to expectation, indicating that they are immune to slippage mediated expansion. This suggests that there is a critical threshold that determines whether a homopolymeric tract can be subjected to slippage-mediated expansion. A length <7 bp is below the threshold for expansion. By contrast, the lengths of (dA:dT) tracts >10 bp are above the threshold and these tracts accumulate in the genome as a result of slippage-mediated expansion. A threshold value identical to that observed here can be determined from the data presented in a study of the length-dependent occurrence of homopolymeric (dA:dT) tracts in the Dictyostelium discoideum genome (26), and from data on the length dependent occurrence of DNA repeats in a variety of genomes (33). We conclude, therefore, that irrespective of the organism and of the nature of the repeat element, the minimum length requirement for a simple sequence repeat to undergo expansion by replication slippage is 7–10 bp.

Our data indicate that for the majority of the eukaryotic genomes, the expansion of (dA:dT) tracts >10 bp can be described by a single exponential function, which is very similar for the different genomes. A striking exception is provided by C.elegans where the frequency distribution of (dA:dT) tracts is clearly distinct from that of the other eukaryotes. Although the curve that fits the length dependent occurrence of (dA:dT) tracts in the C.elegans genome does change its slope slightly in the 8–10 bp region, it differs drastically from the curves seen for the other eukaryotes. The reason for this is unclear. The overall level of sequence simplicity in C.elegans is similar to that seen in other eukaryotes (19), indicating that the mechanisms responsible for the generation and maintenance of simple sequence repeats operate with comparable efficiency in C.elegans. Therefore, the distinct pattern of (dA:dT) tract overrepresentation most probably results from distinct selective forces. The nature of these forces remains unresolved.

A second important observation made here is that the frequency distribution of (dG:dC) tracts is very different between the various eukaryotic genomes. Some genomes are enriched for (dG:dC) tracts, whereas other genomes exhibit an underrepresentation. The A/T rich P.falciparum genome is enriched for (dG:dC) stretches ≤9 bp. As the lengths of the vast majority of these stretches are below the threshold for slippage, expansion of (dG:dC) tracts by replication slippage is precluded, and the overrepresentation of short (dG:dC) tracts most probably has evolved by other mechanisms. The genomes of the other eukaryotes studied here are more G/C rich, and will, by chance, have higher densities of (dG:dC) tracts. These stretches provide the substrate for slippage-induced expansion. Yet, overrepresentation of (dG:dC) tracts is absent or, in cases where it is observed, does not reach the high level seen for (dA:dT) tracts. This indicates that (dG:dC) tracts are less efficiently expanded by slipped strand replication. This might not be surprising: slippage during replication requires the local melting of a DNA duplex, and the greater stability of (dG:dC) duplexes in comparison to (dA:dT) duplexes might prevent slippage of polymeric (dG:dC) tracts. Accordingly, it has been shown that the efficiency of in vitro slippage synthesis of simple sequence DNA using short primers is dependent on the A/T content of the primers. Whereas A/T rich primers mediate slippage synthesis at a high rate, primers consisting purely of G/C nucleotides poorly generate simple sequence repeats (34). Thus, the low enrichment for (dG:dC) tracts in the eukaryotic genome most likely indicates that they are less efficiently expanded by slippage-like events.

Superimposed on the results of slipped strand replication are the actions of unequal crossing-over, mutation, gene conversion and selection that all act on the persistence of simple sequence DNA (22). As coding regions are subject to strong selection, the ways in which slippage-derived sequences accumulate in them are more restricted than they are in non-coding regions (19). Accordingly, we observed that homopolymeric tracts are less strongly overrepresented in the P.falciparum coding than in the non-coding regions. Furthermore, overrepresentation of homopolymeric tracts is absent in prokaryotes. This is consistent with the view that prokaryotes possess a streamlined genome, which allows for rapid replication and cell division (16,35). It is not clear whether the homopolymeric (dA:dT) tracts in the non-coding regions of the eukaryotic genome are also under selective pressure or represent junk DNA. Such junk, or ‘selfish’ DNA, is evolutionarily neutral and does not affect the phenotype of its host (36). As selfish DNA is not under control of selective forces, it can only accumulate by virtue of a biased action of the replication machinery or by an ability to self-replicate as, for instance, is seen for transposons. Replication slippage itself might not be biased towards the generation of duplications. In experimental contexts, simple sequence repeats subjected to slippage events are unstable and acquire deletions rather than insertions (37–39). Thus, selective rather than stochastic principles might underlie the overrepresentation of homopolymeric runs. Selective advantages of homopolymeric (dA:dT) runs might originate from their structure-forming abilities. Since one of the canonical features of the structure-forming ability of a DNA sequence is its length-dependence (40), any selective advantage given by the structure-forming ability of a (dA:dT) tract should be reflected in a length-dependent enrichment (33).
Our results show that in the P.falciparum, H.sapiens, S.cerevisiae and A.thaliana genomes, tracts >10 nt show an overrepresentation that is an exponential function of the tract length. Longer tracts are up to a billion-fold overrepresented whereas short tracts only show a minor overrepresentation. This strongly suggests that the structure-forming abilities of (dA:dT) tracts offer selective advantages that lead to their overrepresentation. The precise nature of this selective advantage remains unclear but might originate from a functional role of (dA:dT) tracts in the modulation of transcription and/or in the organization of the chromatin structure (9,11,14,41).

All functional roles of homopolymeric (dA:dT) tracts reported in the literature relate to the structural organization of chromatin in the nucleus. Would that explain their overrepresentation in eukaryotes, or are they just byproducts of DNA metabolism in eukaryotic cells and represent selfish DNA? Regardless of a possible functional role, their presence will have an impact on the structure and organization of the DNA. Our analysis shows that (dA:dT) tracts make up a considerable 5% of the P.falciparum genome. In this organism, (dA:dT) tracts have been implicated in intrachromosomal recombination events that contribute to antigenic variation (42). Although the exact nature of the principles that lead to an overrepresentation of (dA:dT) tracts may be hard to resolve, having long homopolymeric stretches in the genome obviously has important biological consequences.

Acknowledgements

The authors wish to thank Harm Nijveen for discussion and computer programming. Nicolette Lubsen and Henk Stunnenberg are gratefully acknowledged for critical review of the manuscript.

References

1. Gene, 1987, vol. 61 (pg. 177-187)
2. Gene, 1995, vol. 152 (pg. 127-132)
3. Nature, 1987, vol. 330 (pg. 221-226)
4. Proc. Natl Acad. Sci. USA, 1982, vol. 79 (pg. 7664-7668)
5. Nature, 1986, vol. 320 (pg. 501-506)
6. J. Mol. Biol., 1994, vol. 239 (pg. 79-96)
7. Mol. Cell. Biol., 1996, vol. 16 (pg. 2119-2127)
8. Nature, 1995, vol. 373 (pg. 724-727)
9. Cell, 1996, vol. 87 (pg. 459-470)
10. Proc. Natl Acad. Sci. USA, 1985, vol. 82 (pg. 8419-8423)
11. EMBO J., 1995, vol. 14 (pg. 2570-2579)
12. J. Mol. Biol., 1989, vol. 210 (pg. 587-599)
13. Cell, 1986, vol. 46 (pg. 521-530)
14. EMBO J., 1992, vol. 11 (pg. 705-716)
15. Nucleic Acids Res., 1997, vol. 25 (pg. 2041-2046)
16. Mol. Biol. Evol., 1987, vol. 4 (pg. 203-221)
17. EMBO J., 1989, vol. 8 (pg. 1517-1525)
18. Nucleic Acids Res., 1993, vol. 21 (pg. 2823-2830)
19. J. Mol. Evol., 1995, vol. 41 (pg. 1038-1047)
20. Nature Genet., 1994, vol. 6 (pg. 114-116)
21. Bioessays, 1994, vol. 16 (pg. 277-284)
22. Genetics, 1987, vol. 115 (pg. 553-567)
23. Nature, 1986, vol. 322 (pg. 652-656)
24. Bioessays, 1996, vol. 18 (pg. 421-425)
25. Nucleic Acids Res., 1984, vol. 12 (pg. 387-395)
26. J. Biomol. Struct. Dynamics, 1993, vol. 11 (pg. 57-66)
27. et al., Mol. Biochem. Parasitol., 1996, vol. 79 (pg. 1-12)
28. Nature, 1993, vol. 361 (pg. 654-657)
29. Science, 1984, vol. 225 (pg. 808-811)
30. Nucleic Acids Res., 1982, vol. 10 (pg. 539-546)
31. J. Mol. Evol., 1994, vol. 38 (pg. 637-641)
32. Genetics, 1992, vol. 131 (pg. 471-478)
33. Proc. Natl Acad. Sci. USA, 1997, vol. 94 (pg. 5237-5242)
34. Nucleic Acids Res., 1992, vol. 20 (pg. 211-215)
35. Science, 1997, vol. 277 (pg. 1453-1462)
36. Nature, 1980, vol. 284 (pg. 601-603)
37. Mol. Cell. Biol., 1992, vol. 12 (pg. 2749-2757)
38. Mol. Cell. Biol., 1995, vol. 15 (pg. 5607-5617)
39. Hum. Mol. Genet., 1994, vol. 3 (pg. 253-256)
40. Topology and Physics of Circular DNA, 1992, Boca Raton, FL, CRC Press
41. Bioessays, 1995, vol. 17 (pg. 759-766)
42. Mol. Cell. Biol., 1990, vol. 10 (pg. 3243-3246)
43. Exp. Parasitol., 1988, vol. 66 (pg. 143-170)
44. The Evolution of Genome Size, 1985, New York, John Wiley (pg. 69-103)
45. Science, 1995, vol. 270 (pg. 410-414)
46. Science, 1995, vol. 270 (pg. 480-483)

This paper is dedicated to the late Ruud Konings


© 1998 Oxford University Press
