Identification of higher-order functional domains in the human ENCODE regions

Comparative Study

Robert E Thurman et al. Genome Res. 2007 Jun.

Abstract

It has long been posited that human and other large genomes are organized into higher-order (i.e., greater than gene-sized) functional domains. We hypothesized that diverse experimental data types generated by The ENCODE Project Consortium could be combined to delineate active and quiescent or repressed functional domains and thereby illuminate the higher-order functional architecture of the genome. To address this, we coupled wavelet analysis with hidden Markov models for unbiased discovery of "domain-level" behavior in high-resolution functional genomic data, including activating and repressive histone modifications, RNA output, and DNA replication timing. We find that higher-order patterns in these data types are largely concordant and may be analyzed collectively in the context of HeLa cells to delineate 53 active and 62 repressed functional domains within the ENCODE regions. Active domains comprise approximately 44% of the ENCODE regions but contain approximately 75%-80% of annotated genes, transcripts, and CpG islands. Repressed domains are enriched in certain classes of repetitive elements and, surprisingly, in evolutionarily conserved nonexonic sequences. The functional domain structure of the ENCODE regions appears to be largely stable across different cell types. Taken together, our results suggest that higher-order functional domains represent a fundamental organizing principle of human genome architecture.
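As a rough consistency check on the abstract's figures, one can assume (as a simplification, not the paper's statistical model) that annotated features would be spread in proportion to sequence length if domain state had no effect. Under that assumption, the fraction of features in active domains divided by the fraction of sequence those domains occupy gives the implied enrichment:

```latex
\[
E_{\text{active}} = \frac{f_{\text{features}}}{f_{\text{sequence}}} - 1,
\qquad
\frac{0.75}{0.44} - 1 \approx 0.70,
\qquad
\frac{0.80}{0.44} - 1 \approx 0.82
\]
```

That is, roughly 70%-82% more annotated features than a length-proportional expectation, in line with the ∼71% enrichment of GENCODE TxStarts in active regions reported in the Figure 3 caption.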

Figures

Figure 1.

Figure 1.

Wavelet segmentation approach for functional domain mapping. (A) Exemplary continuous functional data type (H3 acetylation) for ENCODE region ENm005. (B) Continuous wavelet transform heatmap ("scalogram") of H3 acetylation data. In the heatmap, the horizontal axis represents genomic position and the vertical axis represents wavelet scale. Each color in the scalogram represents the magnitude of the wavelet coefficient at that genomic position and scale, ranging from blue (small magnitude) to white (large magnitude). Larger-magnitude wavelet coefficients imply a strong trend in the original data at that position and scale. The 64-kb scale is marked with a dashed red line. (C) Wavelet-smoothed data at the 64-kb scale obtained using the MODWT. Horizontal axis: genomic position. Vertical axis: wavelet coefficient at that position at the 64-kb scale. (D) Results of two-state HMM segmentation of the data in C, based on fitting an HMM to H3ac data over all ENCODE regions. The top row indicates state 1 regions in black, while the bottom row indicates state 0 regions in black. The high state (state 1) is taken to represent active domains, based on the assumption that H3ac is an activating mark. (E) GENCODE gene annotations for ENm005. Note the correspondence between state 1 (active) regions and GENCODE gene density.
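The two steps in Fig. 1 (wavelet smoothing at a fixed scale, then two-state HMM segmentation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes evenly spaced measurements, uses Haar MODWT scaling coefficients (for the Haar filter, a circular moving average over 2^J samples) as the smoother, and decodes a two-state Gaussian HMM with the Viterbi algorithm using fixed, hypothetical parameters instead of parameters fitted over all ENCODE regions. The function names are illustrative only.

```python
import math


def haar_modwt_smooth(x, level):
    """Level-`level` Haar MODWT scaling coefficients.

    For the Haar filter these are a circular, backward-looking average
    over 2**level consecutive samples (assumes evenly spaced data).
    """
    n = len(x)
    width = 2 ** level
    return [sum(x[(t - k) % n] for k in range(width)) / width for t in range(n)]


def viterbi_two_state(obs, means, sd, p_stay):
    """Most likely 0/1 state path for a two-state Gaussian HMM.

    Parameters are fixed for illustration (hypothetical), not fitted:
    `means` holds the emission mean of states 0 and 1, `sd` is a shared
    emission SD, and `p_stay` is the probability of remaining in a state.
    """
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)

    def log_emit(y, s):
        z = (y - means[s]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

    # score[s]: best log-probability of any path ending in state s
    score = [math.log(0.5) + log_emit(obs[0], s) for s in (0, 1)]
    back = []  # back[i][s]: best previous state when in state s at step i + 1
    for y in obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            ptr.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_emit(y, s))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, smoothing a step-like toy track (40 low, 40 high, 40 low values) with `haar_modwt_smooth(x, 3)` and decoding with means (0.0, 1.0), `sd=0.25`, and `p_stay=0.99` gives state 0 on the low plateaus and state 1 on the high plateau, with exactly two state changes. A larger smoothing level suppresses short excursions, which is the motivation for segmenting at a domain-sized scale.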

Figure 2.

Figure 2.

Simultaneous segmentation of four ENCODE functional data types. (A) Exemplary results from eight ENCODE regions: ENm001 (1.8 Mb), ENm002 (1 Mb), ENm003 (600 kb), ENm004 (1.7 Mb), ENm005 (1.6 Mb), ENm006 (1 Mb), ENm008 (1 Mb), and ENm012 (1.2 Mb). For each ENCODE region subpanel, wavelet-smoothed data are displayed as tracks ordered top-to-bottom as follows: TR50 (black), RNA (blue), H3K27me3 (purple), and H3ac (orange). State assignments (domains) resulting from simultaneous HMM segmentation are shown at the bottom as black rectangles; see Fig. 1 for additional description. (B) Close-up of ENCODE region ENm005, with bracketed intervals indicating exemplary domains in states 0 and 1. State 1 generally corresponds to higher levels of RNA and H3ac and lower levels of TR50 and H3K27me3, and is therefore assigned the active label, while state 0 is correspondingly assigned the repressed label. The bracketed repressed (state 0) domain contains the oligodendrocyte-specific OLIG1 and OLIG2 genes, which are repressed in the tissues studied under ENCODE.
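The caption assigns the active label to the state with higher RNA and H3ac and lower TR50 and H3K27me3. One simple, hypothetical way to make that assignment automatic after a joint segmentation is to z-score each track, average it within each state, and sum the averages with a +1 sign for RNA and H3ac and -1 for TR50 and H3K27me3; the state with the larger score is called active. The scoring rule, the `SIGNS` table, and the name `label_states` are assumptions for illustration, not the procedure used in the paper.

```python
from statistics import mean, pstdev

# Hypothetical sign convention read off the Figure 2 caption: the active
# state shows higher RNA and H3ac and lower TR50 and H3K27me3.
SIGNS = {"H3ac": 1, "RNA": 1, "H3K27me3": -1, "TR50": -1}


def zscore(track):
    """Standardize one track to mean 0 and population SD 1 (assumes SD > 0)."""
    mu, sd = mean(track), pstdev(track)
    return [(v - mu) / sd for v in track]


def label_states(tracks, path):
    """Map each state (0, 1) of a segmentation path to 'active' or 'repressed'.

    tracks: dict of track name -> list of smoothed values aligned with path
            (must contain every key in SIGNS).
    path:   list of 0/1 state assignments, one per position.
    """
    z = {name: zscore(tracks[name]) for name in SIGNS}
    score = {}
    for s in (0, 1):
        idx = [i for i, st in enumerate(path) if st == s]
        # Signed sum of the per-state mean z-scores across the four tracks.
        score[s] = sum(SIGNS[name] * mean([z[name][i] for i in idx]) for name in SIGNS)
    active = max(score, key=score.get)
    return {s: "active" if s == active else "repressed" for s in (0, 1)}
```

Z-scoring puts the four data types on a common scale before they are combined, so no single track dominates the score simply because of its units.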

Figure 3.

Figure 3.

Enrichment and depletion of annotated genomic features in active and repressed domains. Data are based on simultaneous segmentation of four data types (H3ac, H3K27me3, RNA, TR50). Green bars correspond to active-state regions and red bars to repressed regions. Values (Y axis) indicate percentage enrichment or depletion relative to random expectation. For example, GENCODE TxStarts are ∼71% enriched over expectation in active regions and ∼61% depleted relative to expectation in repressed regions (see Table 4 for corresponding data). Shaded bars reflect enrichment or depletion that is not significant at the 0.01 level based on the label permutation test (see Methods).
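Percentage enrichment as plotted in Figure 3 can be read as 100 × (observed / expected - 1), where the expected count assumes features fall in a state in proportion to that state's share of sequence. The sketch below computes that quantity and a label permutation p-value by shuffling state labels among domains. The paper's exact permutation scheme is described in its Methods, which are not reproduced here, so the shuffling unit (whole domains), the two-sided comparison, the add-one correction, and the dictionary layout of `domains` are all assumptions.

```python
import random


def pct_enrichment(domains, label):
    """Percent enrichment of features in domains carrying `label`.

    Expected count = total features x (sequence in `label` / total sequence).
    Each domain is a dict with 'length', 'features', and 'label' keys.
    """
    total_len = sum(d["length"] for d in domains)
    total_feat = sum(d["features"] for d in domains)
    lab_len = sum(d["length"] for d in domains if d["label"] == label)
    observed = sum(d["features"] for d in domains if d["label"] == label)
    expected = total_feat * lab_len / total_len
    return 100.0 * (observed / expected - 1.0)


def label_permutation_pvalue(domains, label, n_perm=1000, seed=0):
    """Observed enrichment and a two-sided permutation p-value.

    Labels are shuffled among whole domains, keeping each domain's length
    and feature count fixed (an assumed scheme, for illustration only).
    """
    rng = random.Random(seed)
    obs = pct_enrichment(domains, label)
    labels = [d["label"] for d in domains]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [dict(d, label=lab) for d, lab in zip(domains, labels)]
        if abs(pct_enrichment(shuffled, label)) >= abs(obs):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return obs, (hits + 1) / (n_perm + 1)
```

For instance, with five active and five repressed domains of equal length carrying 15 and 5 features each, `pct_enrichment` returns 50.0 for the active label and -50.0 for the repressed label, and the permutation p-value is small because few label shuffles reproduce so extreme a split.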
