Distinct Modes of Regulation by Chromatin Encoded through Nucleosome Positioning Signals

Figure 2

Nucleosome positioning signals in genomic sequence.

(A) Fraction (normalized; see Methods) of AA/AT/TA/TT dinucleotides and, separately, of CC/CG/GC/GG dinucleotides at each position of our center-aligned nucleosome-bound sequences of length 146–148 bp, showing the ∼10-bp periodicity of both dinucleotide sets. (B) Many 5-mers are enriched in either linker or nucleosomal regions. Shown is the distribution of log2 ratios between the frequency of each 5-mer in linker regions and in nucleosomal DNA regions, for all 5-mers (green line) and for the 32 5-mers composed exclusively of either G/C (red bars) or A/T (blue bars) nucleotides. Linkers are taken as contiguous non-repetitive regions of length 50–500 bp that are not covered by any nucleosome read in our data. (C) Illustration of the key features of our probabilistic nucleosome–DNA interaction model, including the periodic dinucleotide patterns preferred within the nucleosome and the 5-mers preferred in linkers. (D) Our model distinguishes linkers from nucleosomal DNA with high accuracy. Shown is the fraction of all measured nucleosomes that the model correctly classifies as nucleosomal (_y_-axis; true positive rate) against the fraction of all measured linkers that it incorrectly classifies as nucleosomal (_x_-axis; false positive rate), for each possible threshold on the minimum score above which the model classifies a region as nucleosomal. The score of each measured nucleosome or linker is the mean score that the model assigns within 20 bp of the center of that nucleosome or linker. Scores are assigned by cross-validation: every measured nucleosome or linker on a given chromosome is scored with a model trained on the data from all other chromosomes. Linkers are defined as contiguous non-repetitive regions of length 50–500 bp that are not covered by any nucleosome in our data.
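The scoring and threshold sweep described for panel D can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: per-basepair model scores are taken as given (training the nucleosome–DNA model is not shown), the helper names `region_score`, `leave_one_chromosome_out`, `roc_points`, and `auc` are hypothetical, and a region is counted as nucleosomal when its score is at least the threshold, with thresholds taken at every observed score.

```python
from typing import Iterator, List, Sequence, Tuple

def region_score(per_bp: Sequence[float], center: int, half_width: int = 20) -> float:
    """Mean per-basepair model score within `half_width` bp of `center`, clipped to the sequence."""
    lo = max(0, center - half_width)
    hi = min(len(per_bp), center + half_width + 1)
    window = per_bp[lo:hi]
    return sum(window) / len(window)

def leave_one_chromosome_out(chromosomes: Sequence[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (held-out chromosome, training chromosomes): score one chromosome with a model
    trained on all the others (hypothetical helper; model training itself is not shown)."""
    for held_out in chromosomes:
        yield held_out, [c for c in chromosomes if c != held_out]

def roc_points(nuc_scores: Sequence[float],
               linker_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) at every observed threshold; a score >= threshold counts as nucleosomal.
    Nucleosomes are the positives, linkers the negatives."""
    thresholds = sorted(set(nuc_scores) | set(linker_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in nuc_scores) / len(nuc_scores)
        fpr = sum(s >= t for s in linker_scores) / len(linker_scores)
        points.append((fpr, tpr))
    return points

def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.4], [0.5, 0.1])
    print(round(auc(pts), 4))  # 0.8333
```

The quadratic threshold loop keeps the sketch readable; for genome-scale sets (tens of thousands of nucleosomes) a sort-based sweep or a library routine would be the practical choice.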
Results are shown for separating the 8,017 linkers from nucleosomes with occupancy levels 1, 2, 4, 8, and 16, where the occupancy of a nucleosome is the number of nucleosome reads whose center lies within 20 bp of its own center. The numbers of nucleosomes in the classification groups are 84,410 (occupancy 1), 69,703 (occupancy 2), 38,787 (occupancy 4), 12,076 (occupancy 8), and 1,601 (occupancy 16). (E) Shown is the combined nucleosome fold depletion over all homopolymeric tracts of A or T (Poly(dA:dT) elements) of length k, for k = 5, 6, 7, …, and for Poly(dA:dT) elements with exactly 0, 2, 4, or 6 base substitutions (mismatches). Each graph is truncated at the length K at which there are fewer than 10 elements, and the fold depletion at this final point is computed over all elements of length at least K. The combined fold depletion of a set of genomic elements (_y_-axis) is the ratio between their expected and observed nucleosome coverage, where the expected coverage is the average coverage of any basepair according to our data and the observed coverage is the average coverage of a basepair within the set (see Methods). The number of underlying elements at selected points in the graph is indicated (N). See Figure S4 for graphs covering all possible numbers of mismatches, with the number of elements shown at every point.
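The fold-depletion computation in panel E can be sketched as follows. This is a minimal illustration, not the authors' implementation: it handles only exact Poly(dA:dT) tracts (the 2-, 4-, and 6-mismatch elements are not modeled), assumes a per-basepair nucleosome coverage array aligned to the sequence, reflects one reading of the trimming rule (stop at the first length K with fewer than `min_elements` tracts of exactly that length, then pool all tracts of length at least K), and uses hypothetical helper names throughout.

```python
import re
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) interval on the sequence

def poly_da_dt_tracts(seq: str, min_len: int = 5) -> List[Interval]:
    """Maximal runs of only A or only T with length >= min_len (exact tracts, no mismatches)."""
    return [(m.start(), m.end()) for m in re.finditer(r"A+|T+", seq.upper())
            if m.end() - m.start() >= min_len]

def combined_fold_depletion(coverage: Sequence[float], elements: Sequence[Interval]) -> float:
    """Expected coverage (mean over all bp) divided by observed coverage (mean over the elements' bp)."""
    expected = sum(coverage) / len(coverage)
    covered = [coverage[i] for start, end in elements for i in range(start, end)]
    observed = sum(covered) / len(covered)
    return float("inf") if observed == 0 else expected / observed

def depletion_curve(seq: str, coverage: Sequence[float], start_len: int = 5,
                    min_elements: int = 10) -> List[Tuple[int, int, float]]:
    """(k, N, fold depletion) for tracts of length exactly k; the last point pools lengths >= K."""
    tracts = poly_da_dt_tracts(seq, start_len)
    curve = []
    k = start_len
    while True:
        exact = [t for t in tracts if t[1] - t[0] == k]
        if len(exact) < min_elements:
            # Trim here: the final point is computed over every tract of length >= k.
            tail = [t for t in tracts if t[1] - t[0] >= k]
            if tail:
                curve.append((k, len(tail), combined_fold_depletion(coverage, tail)))
            return curve
        curve.append((k, len(exact), combined_fold_depletion(coverage, exact)))
        k += 1
```

Because the ratio is expected over observed, a value above 1 means the tracts carry less nucleosome coverage than an average basepair, i.e. they are depleted.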

doi: https://doi.org/10.1371/journal.pcbi.1000216.g002