What fraction of the human genome is functional?

Review

Chris P Ponting et al. Genome Res. 2011 Nov.

Abstract

Many evolutionary studies over the past decade have estimated α(sel), the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function. Most of these studies have estimated the nucleotide substitution rates from genome sequence alignments across many diverse mammals. Some α(sel) estimates will be affected by the heterogeneity of substitution rates in neutral sequence across the genome. Most will also be inaccurate if change in the functional sequence repertoire occurs rapidly relative to the separation of lineages that are being compared. Evidence gathered from both evolutionary and experimental analyses now indicates that rates of "turnover" of functional, predominantly noncoding, sequence are, indeed, high. They are sufficiently high that an estimated 50% of mouse constrained noncoding sequence is predicted not to be shared with rat, a closely related rodent. The rapidity of turnover results in at least a twofold underestimate of α(sel) by analyses that measure constraint across the eutherian phylogeny. Approaches that take account of turnover estimate that the steady-state value of α(sel) lies between 10% and 15%. Experimental studies corroborate the predicted rates of loss and gain of noncoding functional sites. These studies show the limitations inherent in the use of deep sequence conservation for identifying functional sequence. Experimental investigations focusing on lineage-specific, noncoding, and functional sequence are now essential if we are to appreciate the complete functional repertoire of the human genome.

Figures

Figure 1.

Conservation and turnover of functional sequence. Functional DNA, such as a spliced coding gene (blue) or regulatory elements (red triangles) present in the last common ancestor of two species, may become nonfunctional (minus) or be augmented by newly arisen regulatory sequence (plus) in a lineage-specific manner (green or gray triangles represent such derived functional sequence). Once the orthologous sequence from these two extant species is compared (below), then conservation is strong for retained ancestral functional sequence (here, coding exons, underlined) but is much weaker, and possibly undetectable, for lost ancestral (red) or lineage-specific (green, gray) sequence.

Figure 2.

Estimates of α(sel) from 16 studies ranked by increasing values. Lower and upper bound values are indicated in blue and red, respectively.

Figure 3.

The constrained noncoding fraction of the human genome (α(sel) − π) declines exponentially with species divergence, d. The regression line for the natural logarithm of (α(sel) − π) against d for the Smith et al. (2004) study is shown by the broken line. Data points for the Meader et al. (2010) study are shown (blue diamonds), together with their regression line (solid line). The equations for these lines are presented, together with the inferred values of α⁰(sel) and d(1/2). Meader et al. (2010) data were taken to be the midpoint between lower and upper bound estimates. Divergence values in the Smith et al. (2004) and Meader et al. (2010) studies were estimated from full alignments and from synonymous sites, respectively. As elsewhere in this review, α(sel) is defined relative to the size of the human genome, rather than to the sizes of different animal genomes.
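
The log-linear regression described in this caption can be illustrated with a minimal Python sketch. It assumes a simple model in which the constrained fraction halves for every fixed increment of divergence, and it fits a straight line to the natural logarithm of that fraction. The function name `fit_exponential_decay`, the variable names, and the input values are hypothetical: the points are synthetic, generated from arbitrary parameters, and are not taken from Smith et al. (2004), Meader et al. (2010), or the figure.

```python
import math


def fit_exponential_decay(divergences, fractions):
    """Ordinary least-squares fit of ln(fraction) = ln(a0) + slope * d.

    Returns (a0, d_half): the fraction extrapolated to zero divergence
    and the divergence over which the fraction falls by half.
    """
    logs = [math.log(f) for f in fractions]
    n = len(divergences)
    mean_d = sum(divergences) / n
    mean_y = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in divergences)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(divergences, logs))
    slope = sxy / sxx                  # negative for a declining fraction
    intercept = mean_y - slope * mean_d
    a0 = math.exp(intercept)           # value at d = 0
    d_half = -math.log(2) / slope      # half-life in units of divergence
    return a0, d_half


# Synthetic, noise-free points from ARBITRARY illustrative parameters
# (assumptions for this sketch only, not estimates from the review).
TRUE_A0 = 0.2
TRUE_D_HALF = 0.5
ds = [0.05, 0.1, 0.2, 0.4, 0.6]
fracs = [TRUE_A0 * 0.5 ** (d / TRUE_D_HALF) for d in ds]

a0, d_half = fit_exponential_decay(ds, fracs)
print(f"zero-divergence intercept = {a0:.3f}, half-life divergence = {d_half:.3f}")
```

Because the synthetic points lie exactly on an exponential curve, the fit recovers the parameters used to generate them. With real estimates the points would scatter around the line, and the intercept and half-life would carry uncertainty that this sketch does not quantify.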

Figure 4.

Conservation and apparent turnover of regulatory regions in the IGK gene encoding immunoglobulin κ. The intronic enhancer discovered by interspecies conservation of noncoding DNA (Emorine et al. 1983) is toward the right end; it lies in a region likely to be under evolutionary constraint, as shown by the mammalian phastCons and conserved element tracks (Siepel et al. 2005). This site and a second, nonconserved site are both bound by NFKB (RELA subunit) in the lymphoblastoid cell line GM12891, shown as the density of mapped ChIP-seq reads on the last track. The ChIP-seq data are from Kasowski et al. (2010) and the ENCODE project (The ENCODE Project Consortium 2011).
