The Function Wars: Part I

This is Part I of the "Function Wars" posts.1 The second one is on The ENCODE legacy.

Quibbling about the meaning of the word "function"

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)

The ENCODE Consortium tried to redefine the word “function” to include any biological activity that they could detect using their genome-wide assays. This was not helpful since it included a huge number of sites and sequences that result from spurious (nonfunctional) binding of transcription factors or accidental transcription of random DNA sequences to make junk RNA [see What did the ENCODE Consortium say in 2012?].

I believe that this strange way of redefining biological function was a deliberate attempt to discredit junk DNA. It was quite successful since much of the popular press interpreted the ENCODE results as refuting or disproving junk DNA. I believe that the leaders of the ENCODE Consortium knew what they were doing when they decided to hype their results by announcing that 80% of the human genome is functional [see The Story of You: Encode and the human genome – video, Science Writes Eulogy for Junk DNA].

The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.

[Google Earth of Biomedical Research]


It’s unfortunate that one of the consequences of the ENCODE Consortium publicity campaign is an ongoing debate about the exact meaning of the word “function.” This debate has drawn in several philosophers as well as biologists. In some cases this has led to pointless quibbles that do nothing to settle the controversy over junk DNA. These debates also have the unfortunate consequence of seeming to justify the decision of the ENCODE Consortium leaders. I agree with Sean Eddy when he says (Eddy, 2013) ...

Attention focused on the squabbling more than the substance, and probably led some to wonder whether the arguments were just quibbling over the semantics of the word ‘function’.

Trying to conceptualize the forces that act on genome evolution is not just a matter of semantics.

(This is from the commentary in Current Biology where Eddy criticized Dan Graur’s paper (Graur et al., 2013) as “angry, dogmatic, scattershot, sometimes inaccurate.” I strongly disagree with Sean Eddy on that point even though I am sympathetic to the point he makes about quibbling over the meaning of “function” being a distraction.)

Although I am going to quibble about the word “function” in this lengthy post, my main point is that the function wars are, for the most part, distracting and unproductive. We’re interested in the big picture—whether most of our genome is junk—and that’s not going to be resolved by settling on a definition of “function.” We have enough experience in biology to know that very few terms can be defined unambiguously (e.g. “gene,” “species”).

Biomedically Relevant Function

Let’s look at an example of quibbling over the meaning of “function.” A recent paper by Germain et al. (2014) points out that the purpose of the ENCODE project was to discover functional sequences in the human genome. They are correct to say this in spite of the fact that the ENCODE leaders are now pretending that looking for function was not a very important part of the ENCODE project. According to the latest revisionist account, the most important contribution was just collecting massive amounts of data (Kellis et al. 2014).2

Germain et al. then go on to say that ....

ENCODE’s controversial claim of functionality should be interpreted as saying that 80% of the genome is engaging in relevant biochemical activities and that are likely to have causal roles in phenomena deemed relevant to biomedical research.

This seems to echo the view of the ENCODE Consortium since in their latest attempt at backtracking (Kellis et al. 2014) they emphasize this same point about medical relevance. After pointing out that only 1% of the genome encodes protein, the ENCODE leaders say ...

More recently, genome-wide association studies have indicated that a majority of trait-associated loci, including ones that contribute to human diseases and susceptibility, also lie outside protein-coding regions. These findings suggest that the noncoding regions of the human genome harbor a rich array of functionally significant elements with diverse gene regulatory and other functions.

They are suggesting that there’s a “rich array” of mysterious sequences that affect genetic diseases. I doubt very much that this is true. The mutations that produce genetic defects in humans will almost certainly turn out to be in well-understood parts of the gene or its closely associated regulatory sequences. There’s no reason to assume that mapping of genetic disease mutations in humans is likely to uncover a huge number of new regulatory elements that escaped detection by geneticists, biochemists, and molecular biologists.

The focus on putative functions that are biomedically relevant is just another way of describing the original claim of the ENCODE Consortium and it does nothing to advance our understanding of “function.” The correct way of expressing this idea is to say that 80% of the human genome might possibly have something to do with biomedical research. Using that kind of logic, one is forced to conclude that the most important result of ENCODE is to narrow the target by showing that 20% of the genome has nothing to do with biomedical research. But even that’s not true because most of the ENCODE leaders won’t rule out undiscovered functions in the remaining 20% of the genome.

It’s easy, and correct, to talk about “biochemical activities” as “putative function” or “potential function,” and if that’s all that the ENCODE Consortium did, then there would have been no headlines about the death of junk DNA. But even saying that 80% of the genome has a “putative function” is misleading since we know for a fact that one of the fundamental properties of DNA binding proteins is nonspecific (nonfunctional) binding and that these nonspecific sites must outnumber specific (functional) sites in a large genome (Yamamoto and Alberts, 1976) [see Slip Slidin' Along - How DNA Binding Proteins Find Their Target, DNA Binding Proteins].
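A back-of-the-envelope calculation makes the scale argument concrete. It is related to, but not the same as, the nonspecific-binding argument of Yamamoto and Alberts: it only asks how many exact copies of a short recognition sequence a large genome should contain purely by chance, E = 2(L − k + 1)/4^k for a genome of length L and a motif of length k. This is a sketch under loudly simplified assumptions, not a model of any real transcription factor: equal base frequencies, independent positions, exact matches only, and a haploid genome of roughly 3.1 × 10⁹ bp. The function name and the motif lengths are hypothetical.

```python
def expected_random_sites(genome_length: int, motif_length: int,
                          both_strands: bool = True) -> float:
    """Expected exact matches to one fixed motif in a random DNA sequence.

    Assumes A, C, G and T each occur with probability 1/4, independently at
    every position. For a palindromic motif the two strands describe the same
    sites, so both_strands=True double-counts them; this is a rough estimate.
    """
    if not 1 <= motif_length <= genome_length:
        raise ValueError("motif_length must be between 1 and genome_length")
    start_positions = genome_length - motif_length + 1  # windows on one strand
    strands = 2 if both_strands else 1
    return strands * start_positions / 4 ** motif_length


if __name__ == "__main__":
    GENOME_BP = 3_100_000_000  # assumed approximate haploid human genome size
    for k in (6, 8, 10, 12):   # hypothetical recognition-site lengths
        n = expected_random_sites(GENOME_BP, k)
        print(f"{k:>2}-bp motif: ~{n:,.0f} chance matches")
```

Under these assumptions an 8-bp site is expected roughly 95,000 times by chance in a genome of this size. Real DNA binding proteins also bind imperfect matches with reduced affinity, so relaxing the exact-match assumption would only raise the number of potential nonspecific sites.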

Similarly, a great deal of the pervasive transcription detected by ENCODE is confined to a small number of cell types and present at very low abundance, a fact only reluctantly admitted eighteen months after the original papers were published (Kellis et al., 2014). What this means is that much of that pervasive transcription cannot be functional. So, we know for a fact that most of this “putative function” has to be nonfunctional (Struhl, 2007; van Bakel et al., 2010). Incidentally, one of the best ways to prove that accidental binding and spurious transcription are significant is to employ a negative control like the Random Genome Project (Eddy, 2013).
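The Random Genome Project is a proposed wet-lab control: subject random synthetic DNA to the same assays and see how much apparent "activity" turns up. The code below is only a computational analogue of that logic, not anything ENCODE or Eddy actually ran. It assumes a null model of independent positions with a chosen GC fraction, exact motif matches on either strand, and a hypothetical 6-bp motif; every function name, the motif, the sequence length and the GC value are illustrative assumptions.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping exact matches of motif on either strand of seq.

    Each position is counted once, so a palindromic site is not double-counted.
    """
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def random_genome(length: int, gc: float, rng: random.Random) -> str:
    """Random sequence with the given expected GC fraction, positions independent."""
    at, cg = (1 - gc) / 2, gc / 2
    return "".join(rng.choices("ACGT", weights=[at, cg, cg, at], k=length))


def null_counts(length: int, gc: float, motif: str,
                replicates: int, seed: int = 0) -> list[int]:
    """Motif counts in `replicates` random genomes: the negative control."""
    rng = random.Random(seed)
    return [count_motif(random_genome(length, gc, rng), motif) for _ in range(replicates)]


def empirical_p(observed: int, null: list[int]) -> float:
    """Fraction of random genomes with at least as many hits (add-one correction)."""
    return (1 + sum(c >= observed for c in null)) / (1 + len(null))


def plant(seq: str, motif: str, copies: int) -> str:
    """Demo helper (hypothetical): overwrite evenly spaced windows with the motif."""
    step = len(seq) // copies
    chars = list(seq)
    for j in range(copies):
        chars[j * step:j * step + len(motif)] = motif
    return "".join(chars)


if __name__ == "__main__":
    MOTIF, LENGTH, GC = "GATTAC", 20_000, 0.40  # all hypothetical values
    null = null_counts(LENGTH, GC, MOTIF, replicates=100)
    background = random_genome(LENGTH, GC, random.Random(1))
    enriched = plant(background, MOTIF, copies=30)
    for label, seq in (("random", background), ("enriched", enriched)):
        hits = count_motif(seq, MOTIF)
        print(label, hits, "p =", round(empirical_p(hits, null), 3))
```

The purely random sequence should not stand out from the null distribution, while the sequence with planted sites should fall well outside it. With real data the choice of null model (composition, dinucleotide bias, tolerance of mismatches) matters a great deal, so this version is only a starting point.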

The best way to express this scientifically is not the statement that Germain et al. propose but something like: “The various sites identified by the ENCODE assays cover as much as 80% of the genome. Most of these sites will not have a biological function by any reasonable definition of ‘function’ but a small percentage of them have important, and well-understood, biological functions. It’s quite possible that an even smaller fraction of these sites have functions that we do not yet know about.” Somehow, that doesn’t seem quite as catchy as saying that 80% of the genome is functional.

In fairness, Germain et al. seem to recognize the limitations of their argument when they admit that “this 80% cannot strictly speaking be called ... functional as ENCODE claimed.” However, they reveal their bias when they say that it is very likely to be functional. But this is the heart of the dispute. I, and many others, claim that most of this 80% is almost certainly nonfunctional and we have evidence and arguments to back up that claim, evidence that Germain et al. seem to ignore.

Other philosophers believe that “function” can have different meanings depending on one’s interests. In Elliott et al. (2014), for example, the authors3 point out that medicine uses different definitions, just as Germain et al. (2014) suggest. The example used by Elliott et al. is a mutation that causes cancer, such as an oncogenic genome rearrangement. Physicians could legitimately say that the mutation functions in causing cancer. This is not helpful.

The Many Meanings of “Function”

The issue of whether a large part of our genome is junk is not just a philosophical debate about the meaning of “function” but a large part of the Germain et al. paper is devoted to just that. The authors discuss two philosophical definitions called the causal role account of function and the selected-effect account. I find their discussion tedious and almost incomprehensible. The distinction between the two definitions is explained much better in Doolittle (2013) and Doolittle et al. (2014) but both discussions suffer from the over-emphasis of a false premise; namely, that it’s possible to define “function” in an unambiguous way that sheds light on the junk DNA debate.

The paper by Graur et al. (2013) suffers from the same problem. Those authors come down firmly on the side of selected-effect functions although they recognize that, “Estimates of functionality based on conservation are likely to be, well, conservative.” The best way to define function, according to Graur et al., is in terms of whether losing it has consequences. This is the best working definition, in my opinion: a sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny. This is the definition I’ve been using for almost two decades [see Junk DNA Poll].

Strictly speaking, this definition does not correspond to either the causal-role definition or the selected-effect definition because it can include functional DNA whose sequence is not conserved. This is the same definition used by Niu and Jiang (2013) for the same reason.

Dan Graur has expanded on this point in "What is function?" A Section from a Future Textbook Chapter (would greatly appreciate comments), but now he seems to focus exclusively on functions that can be destroyed by mutation. This implies that a functional part of the genome has to have a specific sequence that is required for the function. This rules out spacer DNA and any of the bulk DNA hypotheses that are used by opponents of junk DNA. Even worse, this test of function fits the causal-role (CR) definition (not the selected-effect (SE) definition) according to some philosophers (Elliott et al., 2014).

As I mentioned above, Elliott et al. argue that different branches of biology use the word “function” in different ways. They also argue that biologists who criticize ENCODE often appeal to the distinction between causal-role (CR) function and selected-effect (SE) function but “they do so in a way that many philosophers would find problematic.” It’s worth pointing out that philosophers are sometimes guilty of writing about biology in ways that many biologists would find problematic. The real question here is whether the debate about the amount of junk DNA in our genome is a biological problem, or a philosophical problem.

For the record, here’s the position adopted by Elliott et al. (2014) ...

Today, perhaps the closest thing to a consensus among philosophers of biology is that each function concept is associated with a distinct type of explanatory goal. On this view, the SE-function concept is appropriate for developing evolutionary or ultimate explanations, while the CR concept is appropriate for explaining proximate mechanisms.

I don’t know about you, but what this tells me is that philosophers aren’t going to make much of a contribution to the debate over junk DNA but they are going to be active participants in the function wars.

It is absolutely safe to say that if you meet somebody who claims not to believe in evolution, that person is ignorant, stupid or insane (or wicked, but I’d rather not consider that).

Richard Dawkins

I don’t think it’s possible to define biological function in a way that can satisfy everyone. This isn’t unusual in biology since there are many important words that resist airtight definitions. I’m thinking of “gene” and “species” but there are many more. I agree with Doolittle et al. (2014) and Graur et al. (2013) in one sense; namely, that defining “function” in terms of evolution and conservation (selected-effect) is vastly superior to defining biological function in terms of something that just does something else (causal-role). I also agree with all critics of the ENCODE Consortium that their attempt to use a causal-role definition of function was just plain silly. (Or, possibly wicked, but I’d rather not consider that.)

The ENCODE leaders now (2014) take a slightly different approach to defining function. They refer to three approaches to the problem: genetic, biochemical, and evolutionary (Kellis et al., 2014).

The genetic approach relies on identifying function by recognizing stretches of DNA where mutations have an observable effect. This is a pretty good way of recognizing function. I prefer to think of the genetic approach in terms of whether or not a given sequence can be deleted without causing any significant effect but the basic idea is the same. Kellis et al. point out the technical limitations of the genetic approach but that’s not very relevant when we’re talking about ways of defining function.

The evolutionary approach looks at sequence conservation as the hallmark of functional regions of the genome. This is a tried-and-true method of recognizing functional regions of the genome but there are some limitations (see discussion below). There can, in theory, be large regions of the genome that are functional but not conserved in terms of sequence. There is no evidence that this possibility is correct although we know for a fact that there are small regions of the genome that fall into this category.

The ENCODE leaders want you to know that it’s not always easy to recognize short conserved (functional) regions of the genome because multiple sequence alignments are a “substantial challenge.” They remind us that secondary structures in RNA might be conserved even though the sequence can change and that you can have substitutions in binding sites that still allow significant binding. (Nevertheless, scientists have been successful at identifying consensus sequences for over three decades.) The ENCODE leaders also want you to know that new functional sequences that have arisen specifically in the human lineage cannot be detected by the evolutionary approach. While true, this is likely to be trivial, as far as I’m concerned, but there are a surprising number of scientists who actually believe that a large fraction of the genome could have evolved new essential functions since humans diverged from chimpanzees. That’s why they keep mentioning this possibility.

The biochemical approach looks at molecules and sequences to determine what they do. It’s an excellent experimental method of determining whether a given DNA sequence has a function. The only limitation is that you have to understand biochemistry, and that means understanding that detecting a biochemical effect of some sort does not mean that you have identified a function. For example, human transcription factors will bind to millions of sites in plant genomes but this activity doesn’t mean that they have a function in plants. Similarly, human transcription factors MUST bind to junk DNA, if it exists, because that’s the nature of DNA binding proteins. That’s a biochemical fact that’s described in all the textbooks.

The problem, as I see it, is that while biological function can most often be associated with conservation and selection, it isn’t a sufficient definition and it sometimes misidentifies sequences that don’t really have a significant biological function. In other words, there are both false positives and false negatives.

A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants. Conversely, if the DNA can be removed without consequences then it is probably junk. These are not rigorous definitions because there are all kinds of cases where a gene with a known function can be deleted without harm to the organism.

For example, think of our primitive ancestor who just acquired a mutation in the gene for making vitamin C. That sequence is now junk because it can no longer encode an enzyme but was it junk or was it functional just before it acquired an inactivating mutation? I think we would want to say that the DNA sequence encoding the enzyme (L-gulono-γ-lactone oxidase) has a biological function even if we know that deleting it will have no effect.

An even better example is the gene for the enzyme N-acetylgalactosaminyltransferase. This is the gene that controls ABO blood types. People with O-type blood are homozygous for alleles that make the gene nonfunctional and no enzyme is produced [Online Mendelian Inheritance in Man (OMIM) 110300]. As a consequence, the protein on the surface of red blood cells is not glycosylated as it is in people with A-type, B-type, and AB-type blood.

There is no evidence that people with the defective gene and O-type blood are any worse off than people that have the glycosylated protein. Does that mean that the ABO gene is junk even though it has a well-defined function? I don’t think that makes a lot of sense. This is a functional gene even though it meets our working definition of junk DNA.

Given examples like these, the working definition of junk DNA is not an airtight, unambiguous, way to identify junk DNA because it includes some DNA that has a clear biological function. Conversely, it may be possible to delete fairly large regions of the genome without immediate consequences as was done in the now-famous mouse genome deletion experiment (Nobrega et al., 2004) but opponents of junk DNA will not accept this as proof that the DNA was junk because they can imagine functions that might go undetected under laboratory conditions. Furthermore, there are those who argue that if we were to delete all the putative junk DNA from our genome there would probably be consequences. Cells might be smaller and cell divisions might be more frequent so that humans with very little junk DNA might look very different. This could be true but it doesn’t mean that the extra DNA in our genome is actually functional. It’s still junk.

What this means is that defining junk DNA as DNA that can be deleted without consequences will always be contested by quibbling. Nevertheless, it’s the best definition we have and it works quite well as long as you ignore the nitpicking and think about the big picture. About 90% of our genome is junk according to the best available biological evidence. Quibbling about the meaning of “function” (or “junk”) isn’t going to change that very much. The gray area, where a given sequence could be “junk” or “functional,” represents only a few percent of the genome. (Although it probably takes up 90% of the published literature.)

What about identifying function by relying on sequence conservation? This is an evolutionary definition. It seems to be a pretty good way of identifying functional regions of the genome (Doolittle et al., 2014; Graur et al., 2013) and it’s slightly different from a definition that identifies function by saying that the DNA can’t be deleted without consequences. Looking for sequence conservation is a positive way of recognizing functional regions of the genome, at least in theory. It has worked pretty well in the past 50 years or so.

I agree with most biologists that conserved DNA is a pretty good proxy for functional DNA and that nonconserved DNA is most likely junk. However, even this definition is neither inclusive nor exclusive. There are examples of conserved DNA that look like junk and examples of nonconserved DNA that has a function.

As mentioned above, two large regions of the mouse genome were deleted without effect (Nobrega et al., 2004). Together, those regions covered 1,243 segments of DNA that were 70% identical in mice and humans (100-bp windows). This tells us that sequence conservation is not a reliable indication of function.
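A sequence-identity criterion like this one can be applied to an existing pairwise alignment with a simple windowed scan. The sketch below is not the pipeline Nobrega et al. used, and their exact procedure may differ. It assumes the two sequences are already aligned to equal length, uses non-overlapping 100-bp windows, counts gap columns ("-") as mismatches, and treats 70% identity as the cutoff; the function names and the toy alignment are hypothetical.

```python
def window_identity(a: str, b: str, window: int = 100,
                    step: int = 100) -> list[tuple[int, float]]:
    """Percent identity of two aligned sequences in fixed-size windows.

    Returns (start, percent_identity) for each full window; a trailing partial
    window is ignored. Gap columns ('-') and mismatches both count against identity.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(a) - window + 1, step):
        pairs = zip(a[start:start + window].upper(), b[start:start + window].upper())
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(a: str, b: str, threshold: float = 70.0,
                      window: int = 100, step: int = 100) -> list[int]:
    """Start positions of windows at or above the identity threshold."""
    return [start for start, pid in window_identity(a, b, window, step) if pid >= threshold]


if __name__ == "__main__":
    mouse = "ACGT" * 50  # hypothetical 200-column alignment
    human = mouse[:100] + mouse[100:].translate(str.maketrans("ACGT", "TGCA"))
    print(window_identity(mouse, human))    # first window identical, second fully mismatched
    print(conserved_windows(mouse, human))
```

Overlapping windows (a step smaller than the window) would report more, partly redundant, segments, so a count like 1,243 depends on exactly how the windows and segments were defined.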

Similarly, Ahituv et al. (2007) detected four “ultraconserved” regions of the mouse genome that were shown to function as enhancers in vitro. Deleting these regions from the mouse genome yielded viable, fertile mice that were indistinguishable from mice whose genomes contained the ultraconserved regions. The regions were conserved and potentially functional but they appear to be junk DNA.

We also have examples of pseudogenes whose sequences are relatively conserved in closely related species but they are, nevertheless, junk. Bits and pieces of defective transposons are important examples in this discussion since they represent a significant portion of the genome that is conserved between, say, humans and chimpanzees. They are conserved because they descend from an active transposon that inserted into that locus in the common ancestor of chimpanzees and humans. But, today, those sequences are junk.

Speaking of transposons, active transposons have enhancers, a promoter, and at least one open reading frame encoding reverse transcriptase or transposase, depending on the type of transposon. The gene is functional and so are the regulatory regions. Under the right circumstances the gene will be transcribed and the transposon can move to a new location in the genome. Are active transposons junk DNA or are they part of the functional portion of the genome?

The question is analogous to asking whether an integrated copy of bacteriophage lambda in the E. coli genome (prophage) is functional or not. I think we would want to say that it IS functional and so are active transposons. These are not true examples of junk DNA. (Active transposons make up only a tiny proportion of the mammalian genome so the resolution of this semantic problem has no effect on the big picture debate.)

Questions like this can be of immense interest to philosophers and to those interested in the philosophy of biology. The previously mentioned paper by Elliott et al. (2014) addresses just this point: Conceptual and Empirical Challenges of Ascribing Functions to Transposable Elements. They talk about distinguishing between different levels of function such as the organismal level and the transposon level. It’s not clear whether they consider transposons functional at the transposon level and junk at the organismal level because much of the discussion is about whether transposons can affect the survival of the organism. That paper (Elliott et al., 2014) is a good example of the difficulties one can get into when the emphasis is on semantics (or philosophy) rather than the real question of how much of our genome is junk.

So conservation doesn’t necessarily mean that the DNA is functional. But are there examples of nonconserved sequences that are functional? Yes, there are. The best examples are spacer DNAs that separate DNA binding sites that have to form a loop when bound by their respective factors. The classic example is binding of lac repressor to two operator sites upstream of the promoter for the lac operon (Krämer et al., 1987; Krämer et al., 1988). You need the spacer but its sequence is unimportant. It has a function. Similarly, there’s a minimum intron size because the assembly of the spliceosome requires an RNA loop [Junk in Your Genome: Protein-Encoding Genes] [Junk in Your Genome: Intron Size and Distribution].

These particular exceptions aren’t going to make much of a difference because they don’t involve a large percentage of the genome. That’s why sequence conservation is a good approximation of function and lack of conservation is still a fairly reliable indicator of junk DNA.

However, there are some possible “exceptions” to the rule that may be more important. One of them concerns a different kind of “spacer” DNA based on our understanding of chromosome bands and puffs in Drosophila polytene chromosomes and lampbrush chromosomes in vertebrate oocytes (especially amphibians). The idea is that genes are arranged on long loops of DNA that form compact higher order chromatin structures when the genes are silent but large extended loops when they are active. Emile Zuckerkandl suggested back in 1976 that a certain amount of spacer DNA was necessary to keep genes apart on these loops and to form the complex heterochromatic state required for gene silencing. If more complex species needed more spacer DNA (larger loops), this would explain the C-value paradox (Zuckerkandl, 1976). A similar idea was suggested by Gall (1981).

There’s no evidence to support this hypothesis so it has been ignored in recent years. I mention it only to show that there are “spacer DNA” explanations that can account for a large percentage of the genome. This is DNA that cannot be identified by sequence conservation.

In addition, some people think that bulk DNA serves an important function in protecting against mutation, or in regulating the size of the nucleus. (There are other possibilities.) The point is that these bulk DNA hypotheses, like the one mentioned above, do not require sequence conservation but they do postulate that a lot of DNA has a function—it is not junk. If any of these hypotheses are correct then sequence conservation is not a reliable proxy for function. Fortunately, none of the bulk DNA hypotheses make any sense, so the point is moot.

So, we can adopt a working definition of function and junk based on whether or not deleting the DNA in question affects the survivability of the organism or its descendants (keeping in mind that there are minor exceptions).

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. Alex Palazzo suggested that we call these the “function wars.” Thanks, Alex.

2. At a cost of $200,000,000.

3. Only one of them, Linquist, is a card-carrying philosopher.

Ahituv, N., Zhu, Y., Visel, A., Holt, A., Afzal, V., Pennacchio, L. A. and Rubin, E. M. (2007) Deletion of ultraconserved elements yields viable mice. PLoS biology 5, e234.

Doolittle, W. F. (2013) Is junk DNA bunk? A critique of ENCODE. Proceedings of the National Academy of Sciences 110, 5294-5300. [doi: 10.1073/pnas.1221376110 ]

Eddy, S. R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology 23, R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Elliott, T. A., Linquist, S. and Gregory, T. R. (2014) Conceptual and empirical challenges of ascribing functions to transposable elements. The American naturalist 184, 14-24. [doi: 10.1086/676588]

Gall, J. G. (1981) Chromosome structure and the C-value paradox. The Journal of cell biology 91, 3s-14s. [PDF]

Germain, P.-L., Ratti, E. and Boem, F. (2014) Junk or functional DNA? ENCODE and the function controversy. Biology & Philosophy, 1-25. (published online March 21, 2014) [doi: 10.1007/s10539-014-9441-3]

Graur, D., Zheng, Y., Price, N., Azevedo, R. B., Zufall, R. A. and Elhaik, E. (2013) On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome biology and evolution 5, 578-590. [doi: 10.1093/gbe/evt028]

Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E. and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences 111, 6131-6138. [doi: 10.1073/pnas.1318948111]

Krämer, H., Niemöller, M., Amouyal, M., Revet, B., von Wilcken-Bergmann, B. and Müller-Hill, B. (1987) lac repressor forms loops with linear DNA carrying two suitably spaced lac operators. The EMBO journal 6, 1481-1491. [PDF]

Krämer, H., Amouyal, M., Nordheim, A. and Müller-Hill, B. (1988) DNA supercoiling changes the spacing requirement of two lac operators for DNA loop formation with lac repressor. The EMBO journal 7, 547-556. [PDF]

Niu, D.-K. and Jiang, L. (2013) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and biophysical research communications 430, 1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Nobrega, M. A., Zhu, Y., Plajzer-Frick, I., Afzal, V. and Rubin, E. M. (2004) Megabase deletions of gene deserts result in viable mice. Nature 431, 988-993.

Palazzo, A. F. and Gregory, T. R. (2014) The case for junk DNA. PLoS genetics 10, e1004351. [doi: 10.1371/journal.pgen.1004351]

Struhl, K. (2007) Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature structural & molecular biology 14, 103-105. [doi: 10.1038/nsmb0207-103]

van Bakel, H., Nislow, C., Blencowe, B. J. and Hughes, T. R. (2010) Most “dark matter” transcripts are associated with known genes. PLoS biology 8, e1000371. [doi: 10.1371/journal.pbio.1000371]

Yamamoto, K. and Alberts, B. (1976) Steroid Receptors: Elements for Modulation of Eukaryotic Transcription. Annual review of biochemistry 45, 721-746.

Zuckerkandl, E. (1976) Gene control in eukaryotes and the c-value paradox: “excess” DNA as an impediment to transcription of coding sequences. Journal of molecular evolution 9, 73-104. [PDF]