Improved alignment of nucleosome DNA sequences using a mixture model

Ji-Ping Z Wang et al. Nucleic Acids Res. 2005.

Abstract

DNA sequences that are present in nucleosomes show a preferential periodicity of approximately 10 bp for certain dinucleotide signals, but the overall sequence similarity of nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type is emitted according to a probability distribution around a series of 'hot spots' that are equally spaced along nucleosomal DNA with a 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, but with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with the alignment obtained by applying the same model to non-nucleosomal sequences.
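The abstract does not say how the Fourier analysis was carried out, so the following is only a generic sketch of how the strength of a 10 bp periodicity in a per-position frequency profile could be measured. The function name `power_at_period` and the synthetic profile are hypothetical and are not taken from the paper.

```python
import cmath
import math

def power_at_period(profile, period):
    """Squared magnitude of the Fourier component of `profile` at `period` (in bp),
    divided by the profile length."""
    n = len(profile)
    mean = sum(profile) / n
    # Subtract the mean so the constant offset does not contribute.
    coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k / period)
                for k, v in enumerate(profile))
    return abs(coeff) ** 2 / n

# ASSUMPTION: synthetic data, a weak 10 bp cosine on a flat background.
profile = [0.1 + 0.02 * math.cos(2 * math.pi * k / 10) for k in range(140)]

strong = power_at_period(profile, 10)  # period present in the profile
weak = power_at_period(profile, 7)     # period absent from the profile
```

Comparing `strong` with `weak` mirrors, in miniature, the idea of comparing periodicity between two alignments: the better-aligned set should concentrate more power at the 10 bp period.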

Figures

Figure 1

A diagram of nucleosomal DNA sequence alignment. The positions along a nucleosome are indexed as 1, 2, …, 147 from the 5′ end to the 3′ end. The alignment shift δi is defined as the signed distance from the first nucleotide of sequence Si to the first position of the nucleosome core. Aligning the nucleosomal DNA sequences in a set S = {Si : i = 1, …, n} is equivalent to determining the shift parameter δi for each i. A position x in an unaligned sequence Si corresponds to the position x − δi with reference to the nucleosome position.
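The coordinate mapping in this caption can be illustrated with a short sketch. It assumes 1-based positions, as in the caption; the function names `to_nucleosome_position` and `core_slice` are hypothetical helpers, not part of the paper.

```python
CORE_LENGTH = 147  # nucleosome positions are indexed 1..147 in the caption

def to_nucleosome_position(x, delta):
    """Map 1-based position x in an unaligned sequence to the nucleosome frame."""
    return x - delta

def core_slice(sequence, delta, core=CORE_LENGTH):
    """Return the part of `sequence` that falls on nucleosome positions 1..core.

    A negative delta means the sequence starts inside the core, so the
    returned piece may be shorter than `core`.
    """
    start = max(1, 1 + delta)               # first sequence position on the core
    end = min(len(sequence), core + delta)  # last sequence position on the core
    if end < start:
        return ""                           # no overlap with the core
    return sequence[start - 1:end]          # convert to 0-based slicing

# Example: two flanking bases, a 147 bp core, one trailing base.
example = "AA" + "C" * 147 + "G"
core = core_slice(example, delta=2)
```

With `delta=2`, sequence position 3 maps to nucleosome position 1, and `core` holds exactly the 147 bases in the middle of `example`.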

Figure 2

The mixture model captures ‘hot spots’ while allowing variability. This model hypothesizes that there are a series of hot spots in the nucleosome core region for a particular dinucleotide signal of interest. The probability of observing a dinucleotide signal of this type decays with distance from the hot spot.
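As a rough illustration of the hot-spot idea, the sketch below builds a probability profile over positions 1 to 147 in which the signal is most likely at each hot spot and less likely farther away. The geometric decay, the uniform background component and all parameter values are assumptions chosen for illustration; the caption does not specify the distribution used in the paper.

```python
def hot_spot_profile(hot_spots, length=147, decay=0.5, background=0.2):
    """Probability of the signal at each position 1..length.

    ASSUMPTION: a mixture of a uniform background (weight `background`)
    and bumps that decay geometrically with distance from each hot spot.
    """
    hot_spots = list(hot_spots)
    bumps = [sum(decay ** abs(pos - u) for u in hot_spots)
             for pos in range(1, length + 1)]
    total = sum(bumps)
    uniform = 1.0 / length
    return [background * uniform + (1.0 - background) * b / total
            for b in bumps]

# Hot spots every 10 bp, starting at position 1 (illustrative choice).
profile = hot_spot_profile(range(1, 142, 10))
```

Here `profile[10]` (position 11, a hot spot) is larger than `profile[15]` (position 16, midway between hot spots), and the values sum to 1.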

Figure 3

Palindromic symmetry and alignment constraint. For a pair of Watson and Crick strands S1 and S2 of length L, we require that the alignment shift parameters δ1, δ2 satisfy the constraint L − δ1 − δ2 = 147. Palindromic symmetry is imposed by demanding that the shifts for each strand of a given sequence be optimized simultaneously, subject to this constraint.
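The constraint L − δ1 − δ2 = 147 means that fixing the shift on one strand fixes it on the other. The sketch below checks this on a made-up sequence: the core selected on the Crick strand is the reverse complement of the core selected on the Watson strand. The helper names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def crick_shift(length, delta_watson, core=147):
    """Shift on the Crick strand implied by length - delta1 - delta2 = core."""
    return length - core - delta_watson

# ASSUMPTION: an arbitrary deterministic 160 bp sequence, for illustration only.
watson = "".join("ACGT"[(i * i + i // 3) % 4] for i in range(160))
crick = reverse_complement(watson)

delta1 = 5
delta2 = crick_shift(len(watson), delta1)  # 160 - 147 - 5 = 8

# With 1-based positions the core covers positions 1 + delta .. 147 + delta.
watson_core = watson[delta1:delta1 + 147]
crick_core = crick[delta2:delta2 + 147]
```

Both slices cover the same 147 bp of the duplex, read from opposite strands, which is why the two shifts cannot be chosen independently.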

Figure 4

Plots of dinucleotide frequency averaged over a 3 bp window for alignments under strict 10 bp periodicity with initial setting u1 = u2 = u3 = (1, 11, …, 141) (A) and the adjusted setting with u1 = u3 = (8, …, 68, 79, 89, …, 139) and u2 = (3, 13, …, 73, 74, 84, …, 144) (B).
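The hot-spot vectors in this caption can be enumerated directly, which makes the 1 bp phase shift across the middle explicit. The sketch is an illustration only: the variable names are hypothetical, and the mirror check assumes that a dinucleotide is indexed by its first base, so that a start position x on one strand corresponds to 147 − x on the other.

```python
CORE_LENGTH = 147

# Initial setting from the caption: strict 10 bp spacing.
u_initial = list(range(1, 142, 10))                      # 1, 11, ..., 141

# Adjusted settings from the caption.
u1 = list(range(8, 69, 10)) + list(range(79, 140, 10))   # 8..68, 79..139
u2 = list(range(3, 74, 10)) + list(range(74, 145, 10))   # 3..73, 74..144

def gaps(hot_spots):
    """Distances between consecutive hot spots."""
    return [b - a for a, b in zip(hot_spots, hot_spots[1:])]

def is_mirror_symmetric(hot_spots, core=CORE_LENGTH):
    # ASSUMPTION: start position x on one strand maps to core - x on the other.
    return sorted(core - x for x in hot_spots) == sorted(hot_spots)
```

Under these assumptions, `gaps(u1)` is 10 everywhere except for a single 11 in the middle, and `gaps(u2)` is 10 everywhere except for a single 1 in the middle. Both adjusted vectors map onto themselves under the mirror check, whereas the strict 10 bp setting does not.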

Figure 5

Frequency plot of TT and AA signals in the alignment presented in Figure 4B.
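Frequency profiles of this kind, including the 3 bp window averaging mentioned for Figure 4, could be computed along the following lines. This is a minimal sketch under stated assumptions: sequences are already aligned and trimmed to a common frame, the function names are hypothetical, and the window is simply shortened at the two ends, which may differ from what the authors did.

```python
def dinucleotide_frequency(aligned, targets):
    """Fraction of sequences with a dinucleotide in `targets` starting at each position."""
    length = min(len(s) for s in aligned)
    freqs = []
    for i in range(length - 1):
        hits = sum(1 for s in aligned if s[i:i + 2] in targets)
        freqs.append(hits / len(aligned))
    return freqs

def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Toy aligned set, for illustration only.
aligned = ["AATT", "AAGC", "CATT"]
raw = dinucleotide_frequency(aligned, {"AA", "TT"})
smooth = moving_average(raw, window=3)
```

Passing `{"AA"}` and `{"TT"}` separately instead of the combined set gives one profile per signal.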

Figure 6

Comparison of dinucleotide frequency plots from the mixture alignment using AA/TT, GC and TA signals.

Figure 7

Base composition plot in the core region of the mixture alignment. (A–D) Frequencies of T, A, C and G, respectively, in the aligned sequences, plotted as a function of position along the length of the nucleosome. (E and F) Frequencies of pyrimidines (T+C) or purines (A+G), respectively.
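A per-position base composition table of the kind plotted here could be tabulated as in the sketch below. It assumes the sequences are already aligned to a common frame and counts only A, C, G and T; the function name and the toy input are hypothetical.

```python
from collections import Counter

def base_composition(aligned):
    """Per-position frequencies of A, C, G, T plus pyrimidine and purine totals."""
    length = min(len(s) for s in aligned)
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts[b] for b in "ACGT")
        freq = {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        freq["pyrimidine"] = freq["T"] + freq["C"]  # T+C, as in panel E
        freq["purine"] = freq["A"] + freq["G"]      # A+G, as in panel F
        table.append(freq)
    return table

# Toy aligned set, for illustration only.
table = base_composition(["ACGT", "ATGA", "CCGT", "TCAT"])
```

Each entry of `table` corresponds to one position, so plotting a single key across entries gives one panel of the figure.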

Figure 8

Mixture model alignment compared with center alignment, for chicken nucleosome sequences and randomly chosen chicken genomic sequences. Results are shown for AA/TT signals. (A and C) Randomly chosen genomic sequences under center and mixture model alignments, respectively. (B and D) Real nucleosome sequences under center and mixture model alignments, respectively. Other signals yield results comparable to those shown here for AA/TT.
