Improved alignment of nucleosome DNA sequences using a mixture model
Ji-Ping Z Wang et al. Nucleic Acids Res. 2005.
Abstract
DNA sequences that are present in nucleosomes show a preferred periodicity of approximately 10 bp in certain dinucleotide signals, but the overall sequence similarity of nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type is emitted according to a probability distribution around a series of 'hot spots' that are equally spaced along nucleosomal DNA with a 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, although with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with the alignment obtained using the same model on non-nucleosomal sequences.
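The abstract reports that Fourier analysis shows stronger periodicity for the mixture alignment than for center alignment. One simple way to quantify this, sketched here as an assumption rather than the paper's exact statistic, is the periodogram of a per-position signal-frequency profile evaluated at a 10 bp period: a better-phased alignment should give a larger value. The function name `periodogram_at` is hypothetical.

```python
import cmath
import math


def periodogram_at(profile, period=10.0):
    """Periodogram value |X(f)|^2 / n of a mean-centred profile at f = 1/period.

    profile: per-position frequencies of one dinucleotide signal along the
    aligned core (for example the AA/TT rate at positions 1..147).
    Subtracting the mean removes the zero-frequency component so that only
    oscillation around the average rate contributes.
    """
    n = len(profile)
    mean = sum(profile) / n
    coef = sum((p - mean) * cmath.exp(-2j * math.pi * k / period)
               for k, p in enumerate(profile))
    return abs(coef) ** 2 / n
```

Comparing `periodogram_at` for profiles built from two alignments of the same sequences (for example, mixture versus center alignment) gives a relative measure of periodicity, not an absolute one.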
Figures
Figure 1
A diagram of nucleosomal DNA sequence alignment. The positions along a nucleosome are indexed as 1, 2, …, 147 from the 5′ end to the 3′ end. The alignment shift δi is defined as the signed distance from the first nucleotide of sequence Si to the first position of the nucleosome core. Aligning the nucleosomal DNA sequences in a set S = {Si : i = 1, …, n} is equivalent to determining the shift parameter δi for each i. A position x in an unaligned sequence Si corresponds to the position x − δi with reference to the nucleosome position.
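The shift convention in Figure 1 can be made concrete with a small helper. This is a sketch, not the authors' code: the names `to_nucleosome_position` and `core_region` are hypothetical, positions are 1-based as in the caption, and padding core positions not covered by the sequence with '-' is an illustrative choice.

```python
NUCLEOSOME_LENGTH = 147  # core positions are indexed 1..147 (Figure 1)


def to_nucleosome_position(x, delta):
    """Map 1-based position x in an unaligned sequence to the nucleosome index."""
    return x - delta


def core_region(seq, delta, length=NUCLEOSOME_LENGTH):
    """Return the bases of seq that fall on nucleosome positions 1..length.

    Nucleosome position p corresponds to sequence position x = p + delta
    (the inverse of x - delta). Positions outside the sequence become '-'
    so that every aligned row has the same length.
    """
    row = []
    for p in range(1, length + 1):
        x = p + delta
        row.append(seq[x - 1] if 1 <= x <= len(seq) else "-")
    return "".join(row)
```

With this convention, aligning a set S amounts to choosing one `delta` per sequence and stacking the resulting `core_region` rows.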
Figure 2
The mixture model captures ‘hot spots’ while allowing variability. This model hypothesizes that there is a series of hot spots in the nucleosome core region for a particular dinucleotide signal of interest. The probability of observing a dinucleotide signal of this type decays with distance from the hot spot.
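A minimal sketch of the idea in Figure 2, assuming equal mixture weights for every hot spot and a Gaussian-shaped kernel truncated at ±2 bp. The caption only states that the probability decays with distance from each hot spot, so the kernel shape, its width, and the names `kernel` and `emission_profile` are all hypothetical.

```python
import math


def kernel(width=2, scale=1.0):
    """Discrete kernel on offsets -width..width whose weight decays with |offset|.

    The Gaussian shape is an assumption made for illustration.
    """
    w = {d: math.exp(-d * d / (2 * scale * scale)) for d in range(-width, width + 1)}
    z = sum(w.values())
    return {d: v / z for d, v in w.items()}


def emission_profile(hot_spots, length=147, width=2, scale=1.0):
    """Mixture probability over nucleosome positions 1..length.

    Each hot spot gets equal weight. Kernel mass that would fall outside
    1..length is dropped and the profile is renormalised. Index 0 is unused
    so that prob[x] is the probability at position x.
    """
    k = kernel(width, scale)
    prob = [0.0] * (length + 1)
    for u in hot_spots:
        for d, p in k.items():
            x = u + d
            if 1 <= x <= length:
                prob[x] += p / len(hot_spots)
    total = sum(prob)
    return [p / total for p in prob]
```

For example, `emission_profile(range(1, 142, 10))` places peaks at 1, 11, …, 141 with mass falling off on either side of each peak.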
Figure 3
Palindromic symmetry and alignment constraint. For a pair of Watson and Crick strands S1 and S2 of length L, we require that the alignment shift parameters δ1, δ2 satisfy the constraint L − δ1 − δ2 = 147. Palindromic symmetry is imposed by demanding that the shifts for each strand of a given sequence be optimized simultaneously, subject to this constraint.
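The joint-strand constraint can be illustrated by a brute-force search: for each Watson shift δ1, the Crick shift follows as δ2 = L − δ1 − 147, and the two strand scores are summed. This is only a sketch of the constraint, not the paper's likelihood. The scoring function (a log-likelihood ratio of AA/TT positions against a uniform background, mixed with weight `eps`), the restriction to shifts that keep the whole core inside the sequence, and all function names are assumptions.

```python
import math

NUC_LEN = 147  # nucleosome core length used in the constraint


def reverse_complement(seq):
    """Crick strand read 5'->3' (assumes an ACGT-only sequence)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def crick_shift(L, delta1, nuc_len=NUC_LEN):
    """Crick-strand shift implied by L - delta1 - delta2 = nuc_len."""
    return L - delta1 - nuc_len


def strand_score(seq, delta, profile, signals=("AA", "TT"), eps=0.1):
    """Hypothetical log-likelihood ratio of signal positions in one strand.

    profile[p] (p = 1..len(profile)-1) is an assumed probability that a signal
    dinucleotide starts at nucleosome position p; it is mixed with a uniform
    background using weight eps so that no term is log(0).
    """
    n_pos = len(profile) - 1
    bg = 1.0 / n_pos
    score = 0.0
    for x in range(1, len(seq)):        # x: 1-based start of seq[x-1:x+1]
        if seq[x - 1:x + 1] in signals:
            p = x - delta               # Figure 1: x maps to x - delta
            if 1 <= p <= n_pos:
                q = (1 - eps) * profile[p] + eps * bg
                score += math.log(q / bg)
    return score


def best_shift(seq, profile, nuc_len=NUC_LEN, **kwargs):
    """Choose delta1 maximising the summed Watson + Crick score.

    Returns (score, delta1, delta2) with delta2 fixed by the constraint.
    """
    crick = reverse_complement(seq)
    best = None
    for d1 in range(0, len(seq) - nuc_len + 1):
        d2 = crick_shift(len(seq), d1, nuc_len)
        total = (strand_score(seq, d1, profile, **kwargs)
                 + strand_score(crick, d2, profile, **kwargs))
        if best is None or total > best[0]:
            best = (total, d1, d2)
    return best
```

Because both strands are scored under the same shift pair, a signal that fits the profile on the Watson strand must also fit it, after reverse complementing, on the Crick strand.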
Figure 4
Plots of dinucleotide frequency averaged over a 3 bp window for alignments under strict 10 bp periodicity with initial setting u1 = u2 = u3 = (1, 11, …, 141) (A) and the adjusted setting with u1 = u3 = (8, …, 68, 79, 89, …, 139) and u2 = (3, 13, …, 73, 74, 84, …, 144) (B).
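The hot-spot vectors in Figure 4 follow a simple rule: equally spaced positions 10 bp apart, with every hot spot past the nucleosome middle (position 74 of 147) moved by 1 bp. The hypothetical helper `hot_spots` below reproduces the initial setting and the adjusted u1 listed in the caption; it is an illustration of the spacing rule, not code from the paper.

```python
def hot_spots(first, count, period=10, shift_after=None, shift=1):
    """Equally spaced hot spots starting at `first`.

    If shift_after is given, every hot spot beyond that position is moved
    by `shift` bp, mimicking the phase shift across the nucleosome middle.
    """
    spots = []
    for j in range(count):
        u = first + j * period
        if shift_after is not None and u > shift_after:
            u += shift
        spots.append(u)
    return tuple(spots)


u_initial = hot_spots(1, 15)                      # (1, 11, ..., 141)
u1_adjusted = hot_spots(8, 14, shift_after=74)    # (8, ..., 68, 79, 89, ..., 139)
```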
Figure 5
Frequency plot of TT and AA signals in the alignment presented in Figure 4B.
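Plots like Figures 4 and 5 need two steps: the fraction of aligned sequences carrying a given dinucleotide at each position, then a 3 bp moving average. The sketch below assumes equal-length aligned rows with '-' for gaps and lets the window shrink at the ends; both choices and the function names are hypothetical.

```python
def dinucleotide_frequency(aligned, dinucs=("AA",)):
    """Fraction of aligned rows with one of `dinucs` starting at each column.

    aligned: equal-length strings (e.g. 147 bp core regions); '-' marks gaps.
    """
    length = len(aligned[0])
    freq = []
    for col in range(length - 1):
        hits = sum(1 for row in aligned if row[col:col + 2] in dinucs)
        freq.append(hits / len(aligned))
    return freq


def window_average(values, window=3):
    """Centred moving average; the window is truncated at either end."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Calling `dinucleotide_frequency` separately with `("AA",)` and `("TT",)` gives the two curves of a Figure 5 style plot.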
Figure 6
Comparison of dinucleotide frequency plots from the mixture alignment using AA/TT, GC and TA signals.
Figure 7
Base composition plot in core region of mixture alignment. (A–D) Frequencies of T, A, C and G, respectively, in the aligned sequences, plotted as a function of position along the length of the nucleosome. (E and F) Frequencies of pyrimidines (T+C) or purines (A+G), respectively.
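The base-composition panels in Figure 7 can be reproduced from aligned core regions by counting bases column by column and summing T+C and A+G. In this sketch, excluding gap characters from the denominator is an assumption, and `base_composition` is a hypothetical name.

```python
from collections import Counter


def base_composition(aligned):
    """Per-column frequencies of T, A, C, G plus pyrimidines (T+C) and purines (A+G)."""
    rows = []
    for col in range(len(aligned[0])):
        # Only A, C, G, T are counted; '-' gaps are skipped.
        counts = Counter(row[col] for row in aligned if row[col] in "ACGT")
        n = sum(counts.values())
        f = {b: (counts[b] / n if n else 0.0) for b in "TACG"}
        f["pyrimidine"] = f["T"] + f["C"]
        f["purine"] = f["A"] + f["G"]
        rows.append(f)
    return rows
```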
Figure 8
Mixture model alignment compared with center alignment for chicken nucleosome sequences and randomly chosen chicken genomic sequences. Results for AA/TT signals are shown (A and C, randomly chosen genomic sequences under center and mixture model alignment, respectively; B and D, real nucleosome sequences under center and mixture model alignment, respectively); other signals yield results comparable to those shown here for AA/TT.
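Center alignment, the baseline in Figure 8, can be read as choosing each shift so that the sequence midpoint falls on the middle of the 147 bp core. The sketch below assumes that reading; sending the extra base to the 3′ side when L − 147 is odd is an arbitrary tie-break, and the function names are hypothetical.

```python
NUC_LEN = 147


def center_shift(L, nuc_len=NUC_LEN):
    """Shift delta that centres a length-L sequence on the nucleosome core.

    Floor division puts the extra base on the 3' side when L - nuc_len is odd.
    """
    return (L - nuc_len) // 2


def center_align(seqs, nuc_len=NUC_LEN):
    """Return each sequence's core positions 1..nuc_len under centre alignment.

    Nucleosome position p maps to sequence position p + delta; positions
    outside the sequence are shown as '-'.
    """
    rows = []
    for s in seqs:
        d = center_shift(len(s), nuc_len)
        rows.append("".join(s[p + d - 1] if 1 <= p + d <= len(s) else "-"
                            for p in range(1, nuc_len + 1)))
    return rows
```

The same rows can then be fed into per-position frequency and periodicity measures for both nucleosomal and randomly chosen genomic sequences.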
Similar articles
- Adaptive machine learning technique for periodicity detection in biological sequences.
  Rasheed F, Alshalalfa M, Alhajj R. Int J Neural Syst. 2009 Feb;19(1):11-24. doi: 10.1142/S012906570900180X. PMID: 19263500.
- Preferred positions of AA and TT dinucleotides in aligned nucleosomal DNA sequences.
  Ioshikhes I, Bolshoy A, Trifonov EN. J Biomol Struct Dyn. 1992 Jun;9(6):1111-7. doi: 10.1080/07391102.1992.10507982. PMID: 1637505.
- Nucleosome DNA sequence pattern revealed by multiple alignment of experimentally mapped sequences.
  Ioshikhes I, Bolshoy A, Derenshteyn K, Borodovsky M, Trifonov EN. J Mol Biol. 1996 Sep 20;262(2):129-39. doi: 10.1006/jmbi.1996.0503. PMID: 8831784. Review.
- Categorical spectral analysis of periodicity in nucleosomal DNA.
  Jin H, Rube HT, Song JS. Nucleic Acids Res. 2016 Mar 18;44(5):2047-57. doi: 10.1093/nar/gkw101. Epub 2016 Feb 17. PMID: 26893354. Free PMC article.
- Curved DNA.
  Trifonov EN. CRC Crit Rev Biochem. 1985;19(2):89-106. doi: 10.3109/10409238509082540. PMID: 3905255. Review.
Cited by
- Deciphering the mechanical code of the genome and epigenome.
  Basu A, Bobrovnikov DG, Cieza B, Arcon JP, Qureshi Z, Orozco M, Ha T. Nat Struct Mol Biol. 2022 Dec;29(12):1178-1187. doi: 10.1038/s41594-022-00877-6. Epub 2022 Dec 5. PMID: 36471057. Free PMC article.
- Distinctive regulatory architectures of germline-active and somatic genes in C. elegans.
  Serizay J, Dong Y, Jänes J, Chesney M, Cerrato C, Ahringer J. Genome Res. 2020 Dec;30(12):1752-1765. doi: 10.1101/gr.265934.120. Epub 2020 Oct 22. PMID: 33093068. Free PMC article.
- Methylation Status of MTHFR Promoter and Oligozoospermia Risk: An Epigenetic Study and in Silico Analysis.
  Rezaeian A, Karimian M, Hossienzadeh Colagar A. Cell J. 2021 Jan;22(4):482-490. doi: 10.22074/cellj.2021.6498. Epub 2020 Apr 22. PMID: 32347042. Free PMC article.
- VOLPES: an interactive web-based tool for visualizing and comparing physicochemical properties of biological sequences.
  Bartonek L, Zagrovic B. Nucleic Acids Res. 2019 Jul 2;47(W1):W632-W635. doi: 10.1093/nar/gkz407. PMID: 31114895. Free PMC article.
- Genome-wide Mapping of the Nucleosome Landscape by Micrococcal Nuclease and Chemical Mapping.
  Voong LN, Xi L, Wang JP, Wang X. Trends Genet. 2017 Aug;33(8):495-507. doi: 10.1016/j.tig.2017.05.007. Epub 2017 Jul 7. PMID: 28693826. Free PMC article. Review.