Just a few years ago, the prevailing wisdom said that the genome comprises 3 percent or so genes and 97 percent “junk” (with 2 or 3 percent of that junk consisting of the fossilized remains of retroviruses that infected our ancestors somewhere along the line). After a decade of painstaking analysis by more than 200 scientists, the new ENCODE data show that indeed 2.94 percent of the genome is protein-coding genes, while 80.4 percent of sequences regulate how those genes get turned on, turned off, expressed, processed, and modified.

This fundamentally changes how most biologists understand the master instruction set of life: we are, in short, 3 percent input/output and 80 percent logic. (Though perhaps a surprise to biologists, the finding will hardly astound anyone who has designed a complex interactive system.)

Correct me if I'm wrong, but this is a really big deal, right?


Another view.

The main ENCODE paper says:

Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).

Nothing in that implies that a piece of DNA has any function useful to the organism.

More controversy.

Correct me if I'm wrong, but this is a really big deal, right?

Mmmm. It's new data, which is important, but it's not new data that particularly upsets any accepted theoretical models.

Back in the 1970s it was easy to compare the number of human proteins with the number of base pairs in human DNA and see that only a small fraction was actually protein-coding. So people started theorizing that all the rest was junk. We knew some of the 97% had to be regulatory, and everybody had their own preferred guess. Those guesses have been moving steadily higher over the last 20 years (at least). Now we (apparently) know that something on the order of 80% is regulatory, instead of 3% or 10% or 30% or whatever else people were theorizing.

It does suggest true parasite DNA is a lot less common than some thought (especially back in the 1970s). But there's still lots of room in the remaining 17% for parasite DNA, so while the magnitude has been reduced, the underlying theories still have a good chunk of the genome to play in.

Catching yourself explaining something that doesn't need explaining is an excellent opportunity to recalibrate :D

Oh, certainly: if you reduce the explanatory power of an explanation, you lower the probability of the explanation being true.

But, well, "parasite DNA" at a fundamental level just assumes that Darwinian mutation and selection happen among transposons. That seems quite plausible on its own, even after this, especially since retroviruses can be treated as a special class of retrotransposons.

And now that I'm actually looking at the paper instead of the news, it's not clear how much of this stuff is "functional" because it actually does something like regulation of expression, and how much is "functional" because it's biologically active "parasite DNA".

Mostly, I'd classify this as another case of "no, really, skip the science news, and read settled science instead."

... on the order of 80% that's regulatory ...


1. Less than 2% is coding.
2. About 8% is known to be regulatory. Most of the interesting stuff from the ENCODE project is about which parts do what in which cells within this category.
3. Given how they looked, another 10% or so is probably regulatory in cell types they have not examined.
4. 20% or so has absolutely no effect.
5. When they talk about 80% being "functional", that is in the weakest sense of the word "functional" that anyone has ever played with. Typical examples of the functions of the remaining 60% of the genome are that something binds to it for no reason, or something binds to it to prevent it from being transcribed, or it gets transcribed but the resulting RNA does nothing.