Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins

Comparative Study

Proc Natl Acad Sci U S A. 2005 Jul 12;102(28):9836-41.

doi: 10.1073/pnas.0500577102. Epub 2005 Jun 28.

Marie Touchon et al. Proc Natl Acad Sci U S A. 2005.

Abstract

In the course of evolution, mutations do not affect both strands of genomic DNA equally. This imbalance mainly results from asymmetric DNA mutation and repair processes associated with replication and transcription. In prokaryotes, prevalence of G over C and T over A is frequently observed in the leading strand. The sign of the resulting TA and GC skews changes abruptly when crossing replication-origin and termination sites, producing characteristic step-like transitions. In mammals, transcription-coupled skews have been detected, but so far, no bias has been associated with replication. Here, analysis of intergenic and transcribed regions flanking experimentally identified human replication origins and the corresponding mouse and dog homologous regions demonstrates the existence of compositional strand asymmetries associated with replication. Multiscale analysis of human genome skew profiles reveals numerous transitions that allow us to identify a set of 1,000 putative replication initiation zones. Around these putative origins, the skew profile displays a characteristic jagged pattern also observed in mouse and dog genomes. We therefore propose that in mammalian cells, replication termination sites are randomly distributed between adjacent origins. Taken together, these analyses constitute a step toward genome-wide studies of replication mechanisms.

Figures

Fig. 1.

TA and GC skew profiles around experimentally determined human replication origins. (a) The skew profiles were determined in 1-kbp windows in regions surrounding (±100 kbp without repeats) experimentally determined human replication origins (see Data and Methods). (Upper) TA and GC cumulated skew profiles Σ_TA_ (thick line) and Σ_GC_ (thin line). (Lower) Skew S calculated in the same regions. The Δ_S_ amplitudes associated with these origins, calculated as the difference of the skews measured in 20-kbp windows on both sides of the origins, are: MCM4 (31%), HSPA4 (29%), TOP1 (18%), MYC (14%), SCA7 (38%), and AR (14%). (b) Cumulated skew profiles calculated in the six regions of the mouse genome homologous to the human regions analyzed in a. (c) Cumulated skew profiles in the six regions of the dog genome homologous to the human regions analyzed in a. The abscissa (x) represents the distance (in kilobase pairs) of a sequence window to the corresponding origin; the ordinate represents the values of S given in percent. Red, (+) genes (coding strand identical to the Watson strand); blue, (–) genes (coding strand opposite to the Watson strand); black, intergenic regions. In c, genes are not represented.
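The quantities named in this caption can be illustrated with a minimal sketch. It assumes the conventional skew definitions S_TA = (T − A)/(T + A) and S_GC = (G − C)/(G + C), with S = S_TA + S_GC; the page itself defers the exact definitions to Data and Methods. The function names, the right-minus-left sign convention for Δ_S_, and the toy sequence are hypothetical, and repeat masking is not modeled.

```python
from collections import Counter

def window_skews(seq, window=1000):
    """Return the total skew S (as a fraction) for each non-overlapping window.

    Assumed definitions: S_TA = (T-A)/(T+A), S_GC = (G-C)/(G+C), S = S_TA + S_GC.
    """
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        counts = Counter(seq[start:start + window].upper())
        a, c, g, t = (counts[base] for base in "ACGT")
        s_ta = (t - a) / (t + a) if (t + a) else 0.0
        s_gc = (g - c) / (g + c) if (g + c) else 0.0
        skews.append(s_ta + s_gc)
    return skews

def cumulated(values):
    """Running sum of the window skews (the cumulated skew profile)."""
    total, profile = 0.0, []
    for value in values:
        total += value
        profile.append(total)
    return profile

def delta_s(skews, origin_index, flank=20):
    """Mean skew over `flank` windows right of the origin minus the mean over
    `flank` windows left of it (sign convention is an assumption)."""
    left = skews[max(0, origin_index - flank):origin_index]
    right = skews[origin_index:origin_index + flank]
    if not left or not right:
        raise ValueError("origin too close to the end of the profile")
    return sum(right) / len(right) - sum(left) / len(left)

# Toy sequence: three A/C-rich windows followed by three T/G-rich windows.
toy = "AAATCCCG" * 3 + "TTTAGGGC" * 3
s = window_skews(toy, window=8)
print(s)                      # [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
print(cumulated(s))           # [-1.0, -2.0, -3.0, -2.0, -1.0, 0.0]
print(delta_s(s, 3, flank=3)) # 2.0
```

With 1-kbp windows, `flank=20` corresponds to the 20-kbp windows mentioned in the caption; multiply by 100 to express the result in percent.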

Fig. 2.

Skew S in regions situated on both sides of human replication origins. The mean values of S were calculated in intergenic regions and in intronic regions situated 5′ (Left) and 3′ (Right) of the six origins analyzed in Fig. 1_a_. Colors are as in Fig. 1; mean values are in percent ± SEM.

Fig. 3.

Histograms of the |Δ_S_| amplitudes of the jumps in the S profile. Using the wavelet transform, a set of 5,101 discontinuities was detected (2,415 upward jumps and 2,686 downward jumps; see Data and Methods). The |Δ_S_| amplitude was calculated as in Fig. 1_a_. (a) |Δ_S_| distributions of the jumps presenting G + C < 42%, corresponding to 1,647 upward jumps and 1,755 downward jumps; the threshold |Δ_S_| ≥ 12.5% (vertical line) corresponded to 1,012 upward jumps that were retained as putative replication origins and to 211 downward jumps (_r_ = 0.21). (b) |Δ_S_| distributions of the jumps presenting G + C > 42%, with |Δ_S_| ≥ 12.5% corresponding to 528 upward jumps and 280 downward jumps (_r_ = 0.53). The G + C content was measured in the 100-kbp window surrounding the jump position. Upward jumps are shown in black, and downward jumps are shown with dots. The abscissa represents the values of the |Δ_S_| amplitudes calculated in percent.
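The caption does not define r, but the reported values are consistent with the ratio of downward to upward jumps above the threshold. The short check below makes that arithmetic explicit; the reading of r and all names in the code are assumptions, and the counts are copied from the caption.

```python
# Jump counts at the threshold |ΔS| >= 12.5%, copied from the caption.
JUMPS = {
    "G+C < 42%": {"up": 1012, "down": 211},
    "G+C > 42%": {"up": 528, "down": 280},
}

def down_up_ratio(up, down):
    """Assumed reading of r: downward jumps divided by upward jumps."""
    return down / up

ratios = {label: round(down_up_ratio(**c), 2) for label, c in JUMPS.items()}
print(ratios)  # {'G+C < 42%': 0.21, 'G+C > 42%': 0.53}

# Consistency check on the totals quoted in the caption.
total_jumps = 2415 + 2686
print(total_jumps)  # 5101
```

Under this reading, a lower r means that large jumps in the low G + C class are predominantly upward, which matches the choice of the 1,012 upward jumps as putative origins.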

Fig. 4.

Mean skew profile of intergenic regions around putative replication origins. The skew S was calculated in 1-kbp windows (Watson strand) around the position (±300 kbp without repeats) of the 1,012 upward jumps (Fig. 3); 5′ and 3′ transcript extremities were extended by 0.5 and 2 kbp, respectively (filled circles), or by 10 kbp at both ends (stars) (see Data and Methods). The abscissa represents the distance (in kilobase pairs) to the corresponding origin; the ordinate represents the skews calculated for the windows situated in intergenic regions (mean values for all discontinuities and for 10 consecutive 1-kbp window positions). The skews are given in percent (vertical bars, SEM). The lines correspond to linear fits of the values of the skew (stars) for _x_ < –100 kbp and _x_ > 100 kbp.

Fig. 5.

S profiles along mammalian genome fragments. (a) Fragment of chromosome 20 including the TOP1 origin (red vertical line). (b and c) Chromosome 4 and chromosome 9 fragments, respectively, with low G + C content (36%). (d) Chromosome 22 fragment with larger G + C content (48%). In a and b, vertical lines correspond to selected putative origins; yellow lines are linear fits of the S values between successive putative origins. Black, intergenic regions; red, (+) genes; blue, (–) genes. Note the fully intergenic regions upstream of TOP1 in a and from positions 5,290–6,850 kbp in c. (e) Fragment of mouse chromosome 4 homologous to the human fragment shown in c. (f) Fragment of dog chromosome 5 syntenic to the human fragment shown in c. In e and f, genes are not represented.

Fig. 6.

Model of replication termination. Schematic representation of the skew profiles associated with three replication origins _O_1, _O_2, and _O_3; we suppose that these replication origins are adjacent, bidirectional origins with similar replication efficiency. The abscissae represent the sequence positions; the ordinates represent the S values (arbitrary units). Upward (or downward) steps correspond to origin (or termination) positions. For convenience, the termination sites are symmetric relative to _O_2. (Left) Three different termination positions _T_i, _T_j, and _T_k, leading to elementary skew profiles _S_i, _S_j, and _S_k. (Center) Superposition of these three profiles. (Right) Superposition of a large number of elementary profiles leading to the final factory-roof pattern. In the simple model, termination occurs with equal probability on both sides of the origins, leading to the linear profile (thick line). In the alternative model, replication termination is more likely to occur at lower rates close to the origins, leading to a flattening of the profile (gray line).
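The simple model in this caption can be sketched numerically: if each elementary profile is a step from +1 to −1 at its termination site, then averaging over equally likely termination sites between two adjacent origins gives a linearly decreasing skew, with an upward jump at each origin. The sketch below is an illustration only; the ±1 skew values, the window discretization, and the function names are assumptions, not the authors' implementation.

```python
def elementary_profile(n_windows, termination):
    """Skew between two adjacent origins for one termination site:
    +1 left of the termination site, -1 right of it (arbitrary units)."""
    return [1 if x < termination else -1 for x in range(n_windows)]

def mean_profile(n_windows):
    """Average of the elementary profiles over all equally likely
    termination sites 0..n_windows (the simple model)."""
    profiles = [elementary_profile(n_windows, t) for t in range(n_windows + 1)]
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def factory_roof(n_windows, n_origins):
    """Concatenate the mean inter-origin profile for adjacent origins
    assumed to have equal spacing and equal efficiency."""
    return mean_profile(n_windows) * (n_origins - 1)

roof = factory_roof(n_windows=10, n_origins=3)
print([round(value, 2) for value in roof])
```

Between origins the averaged skew falls by a constant amount per window, and at the middle origin it jumps back up, which is the factory-roof shape. The alternative model would need a non-uniform distribution of termination sites, which this sketch does not attempt.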
