Followup to:  An Alien God, The Wonder of Evolution, Evolutions Are Stupid

Yesterday, I wrote:

Humans can do things that evolutions probably can't do period over the expected lifetime of the universe.  As the eminent biologist Cynthia Kenyon once put it at a dinner I had the honor of attending, "One grad student can do things in an hour that evolution could not do in a billion years."  According to biologists' best current knowledge, evolutions have invented a fully rotating wheel on a grand total of three occasions.

But then, natural selection has not been running for a mere million years.  It's been running for 3.85 billion years.   That's enough to do something natural selection "could not do in a billion years" three times.  Surely the cumulative power of natural selection is beyond human intelligence?

Not necessarily.  There's a limit on how much complexity an evolution can support against the degenerative pressure of copying errors.

(Warning:  A simulation I wrote to verify the following arguments did not return the expected results.  See addendum and comments.)

(Addendum 2:  This discussion has now been summarized in the Less Wrong Wiki.  I recommend reading that instead.)

The vast majority of mutations are either neutral or detrimental; here we are focusing on detrimental mutations.  At equilibrium, the rate at which a detrimental mutation is introduced by copying errors, will equal the rate at which it is eliminated by selection.

A copying error introduces a single instantiation of the mutated gene.  A death eliminates a single instantiation of the mutated gene.  (We'll ignore the possibility that it's a homozygote, etc.; a failure to mate works just as well as a death.)  If the mutation is severely detrimental, it will be eliminated very quickly - the embryo might just fail to develop.  But if the mutation only leads to a 0.01% probability of dying, it might spread to 10,000 people before one of them died.  On average, one detrimental mutation leads to one death; the weaker the selection pressure against it, the further it spreads before that death occurs.  Again, at equilibrium, copying errors will introduce mutations at the same rate that selection eliminates them.  One mutation, one death.
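In Python, for concreteness (a trivial check of the numbers above, nothing more):

    # "One mutation, one death": a mildly detrimental mutation spreads until
    # the expected number of deaths it causes per generation reaches one.
    p_death = 0.0001           # 0.01% chance the mutation kills its carrier
    carriers = 10_000          # copies the mutation has spread to
    print(carriers * p_death)  # 1.0 expected death per generation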

This means that you need the same amount of selection pressure to keep a gene intact, whether it's a relatively important gene or a relatively unimportant one.  The more genes are around, the more selection pressure required.  Under too much selection pressure - too many children eliminated in each generation - a species will die out.

We can quantify selection pressure as follows:  Suppose that 2 parents give birth to an average of 16 children.  On average all but 2 children must either die or fail to reproduce.  Otherwise the species population very quickly goes to zero or infinity.  From 16 possibilities, all but 2 are eliminated - we can call this 3 bits of selection pressure.  Not bits like bytes on a hard drive, but mathematician's bits, information-theoretical bits; one bit is the ability to eliminate half the possibilities.  This is the speed limit on evolution.
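The same calculation in code (a trivial sketch, using the numbers above):

    import math

    # 2 parents, 16 children, 2 survivors: a factor-of-8 cull per generation.
    children, survivors = 16, 2
    print(math.log2(children / survivors))  # 3.0 bits of selection pressure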

Among mammals, it's safe to say that the selection pressure per generation is on the rough order of 1 bit.  Yes, many mammals give birth to more than 4 children, but neither does selection perfectly eliminate all but the most fit organisms.  The speed limit on evolution is an upper bound, not an average.

This 1 bit per generation has to be divided up among all the genetic variants being selected on, for the whole population.  It's not 1 bit per organism per generation, it's 1 bit per gene pool per generation.  Suppose there's some amazingly beneficial mutation making the rounds, so that organisms with the mutation have 50% more offspring.  And suppose there's another less beneficial mutation, that only contributes 1% to fitness.  Very often, an organism that lacks the 1% mutation, but has the 50% mutation, will outreproduce another who has the 1% mutation but not the 50% mutation.

There are limiting forces on variance; going from 10 to 20 children is harder than going from 1 to 2 children.  There's only so much selection to go around, and beneficial mutations compete to be promoted by it (metaphorically speaking).  There's an upper bound, a speed limit to evolution:  If Nature kills off a grand total of half the children, then the gene pool of the next generation can acquire a grand total of 1 bit of information.

I am informed that this speed limit holds even with semi-isolated breeding subpopulations, sexual reproduction, chromosomal linkages, and other complications.

Let's repeat that.  It's worth repeating.  A mammalian gene pool can acquire at most 1 bit of information per generation.

Among mammals, the rate of DNA copying errors is roughly 10^-8 per base per generation.  Copy a hundred million DNA bases, and on average, one will copy incorrectly.  One mutation, one death; each non-junk base of DNA soaks up the same amount of selection pressure to counter the degenerative pressure of copying errors.  It's a truism among biologists that most selection pressure goes toward maintaining existing genetic information, rather than promoting new mutations.
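Putting those two numbers together - this step is implicit in the argument - the mutation load on the full genome is easy to compute:

    genome_bases = 3e9   # total bases in the human genome (see below)
    error_rate = 1e-8    # copying errors per base per generation
    # ~30 new mutations per genome per generation - far more than ~1 bit of
    # selection could counter if all 3 billion bases were meaningful.
    print(genome_bases * error_rate)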

Natural selection probably hit its complexity bound no more than a hundred million generations after multicellular organisms got started.  Since then, over the last 600 million years, evolutions have substituted new complexity for lost complexity, rather than accumulating adaptations.  Anyone who doubts this should read George Williams's classic "Adaptation and Natural Selection", which treats the point at much greater length.

In material terms, a Homo sapiens genome contains roughly 3 billion bases.  We can see, however, that mammalian selection pressures aren't going to support 3 billion bases of useful information.  This was realized on purely mathematical grounds before "junk DNA" was discovered, before the Genome Project announced that humans probably had only 20-25,000 protein-coding genes.  Yes, there's genetic information that doesn't code for proteins - all sorts of regulatory regions and such.  But it is an excellent bet that nearly all the DNA which appears to be junk, really is junk.  Because, roughly speaking, an evolution isn't going to support more than 10^8 meaningful bases with 1 bit of selection pressure and a 10^-8 error rate.

Each base is 2 bits.  A byte is 8 bits.  So the meaningful DNA specifying a human must fit into at most 25 megabytes.

(Pause.)

Yes.  Really.

And the Human Genome Project gave the final confirmation.  25,000 genes plus regulatory regions will fit in 100,000,000 bases with lots of room to spare.

Amazing, isn't it?
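For readers who want to check the arithmetic, here it is as a script, under the post's stated assumptions (1 bit of selection pressure per generation, 10^-8 error rate):

    selection_bits = 1.0   # per gene pool per generation (mammals, rough order)
    error_rate = 1e-8      # copying errors per base per generation

    # One mutation, one death: selection can support roughly one meaningful
    # base per unit of selection per expected copying error.
    max_meaningful_bases = selection_bits / error_rate   # 1e8 bases

    bits_per_base = 2      # four possible bases = 2 bits each
    megabytes = max_meaningful_bases * bits_per_base / 8 / 1e6
    print(max_meaningful_bases, megabytes)  # 1e8 bases -> 25.0 MB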

Addendum:  genetics.py, a simple Python program that simulates mutation and selection in a sexually reproducing population, is failing to match the result described above.  Sexual recombination is random, each pair of parents has 4 children, and the top half of the population is selected each time (a comparable sketch appears at the end of this addendum).  Wei Dai rewrote the program in C++ and reports that the supportable amount of genetic information increases as the inverse square of the mutation rate(?!), which, if generally true, would make it possible for the entire human genome to be meaningful.

In the above post,  George Williams's arguments date back to 1966, and the result that the human genome contains <25,000 protein-coding regions comes from the Genome Project.  The argument that 2 parents having 16 children with 2 surviving implies a speed limit of 3 bits per generation was found here, and I understand that it dates back to Kimura's work in the 1950s.  However, the attempt to calculate a specific bound of 25 megabytes was my own.

It's possible that the simulation contains a bug, or that I used unrealistic assumptions.  If the entire human genome of 3 billion DNA bases could be meaningful, it's not clear why it would contain <25,000 genes.  Empirically, an average of O(1) bits of genetic information per generation seems to square well with observed evolutionary times; we don't actually see species gaining thousands of bits per generation.  There is also no reason to believe that a dog has greater morphological or biochemical complexity than a dinosaur.  In short, only the math I tried to calculate myself should be regarded as having failed, not the beliefs that have wider currency in evolutionary biology.  But until I understand what's going on, I would suggest citing only George Williams's arguments and the Genome Project result, not the specific mathematical calculation shown above.
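genetics.py itself is not reproduced here, but a minimal sketch of the simulation as described - random recombination, 4 children per pair, top half of the population selected each generation; genome length and mutation rate are placeholder parameters, not the original's - might look like this:

    import random

    GENOME_LEN = 1000      # placeholder parameter
    MUTATION_RATE = 1e-3   # placeholder per-base copying error rate
    POP_SIZE = 100         # even and constant: 4 children per pair, half kept

    def fitness(genome):
        # Fitness = number of intact bases; 0 marks a deleterious mutation.
        return sum(genome)

    def make_child(mom, dad):
        # Free recombination: each base from a random parent, then possibly
        # hit by a copying error (bit flip).
        genome = [m if random.random() < 0.5 else d for m, d in zip(mom, dad)]
        return [b ^ 1 if random.random() < MUTATION_RATE else b for b in genome]

    def next_generation(pop):
        random.shuffle(pop)
        children = []
        for i in range(0, len(pop), 2):
            mom, dad = pop[i], pop[i + 1]
            children.extend(make_child(mom, dad) for _ in range(4))
        # Truncation selection: the top half of the children survive.
        children.sort(key=fitness, reverse=True)
        return children[:len(pop)]

    population = [[1] * GENOME_LEN for _ in range(POP_SIZE)]
    for _ in range(200):
        population = next_generation(population)
    print(sum(map(fitness, population)) / POP_SIZE)  # mean bases still intact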

Comments (105; some are truncated due to high volume):

A lot of our DNA was acquired in the days when our ancestors were not yet mammals.

"Surely the cumulative power of natural selection is beyond human intelligence?"

Even if it was, why would you want to use it? Evolution has thoroughly screwed over more human beings than every brutal dictator who ever lived, and that's just humans, never mind the several billion extinct species which litter our planet's history.

Houshalter:
Nuclear technology has been used to kill hundreds of thousands of people, it's still a useful form of energy.

So the meaningful DNA specifying a human must fit into at most 25 megabytes.

And that's before compression :-)

"So the meaningful DNA specifying a human must fit into at most 25 megabytes."

These are bits of entropy, not bits on a hard drive. It's mathematically impossible to compress bits of entropy.

Eliezer, your argument seems to confuse two different senses of information. You first define "bit" as "the ability to eliminate half the possibilities" -- in which case, yes, if every organism has O(1) children then the logical "speed limit on evolution" is O(1) bits per generation.

But you then conclude that "the meaningful DNA specifying a human must fit into at most 25 megabytes" -- and more concretely, that "it is an excellent bet that nearly all the DNA which appears to be junk, really is junk." I don't think that follows at all.

The underlying question here seems to be this: suppose you're writing a software application, and as you proceed, many bits of code are generated at random, many bits are logically determined by previous bits (albeit in a more-or-less "mindless" way), and at most K times you have the chance to fix a bit as you wish. (Bits can also be deleted as you go.) Should we then say that whatever application you end up with can have at most K bits of "meaningful information"?

Arguably from some God's-eye view. But any mortal examining the code could see far more than K of the bits fulfillin... (read more)

Excluding the complex and subtle regulatory functions that non-coding DNA can possess strikes me as being extremely unwise.

There is no DNA in the maize genome that codes for striped kernels, because that color pattern is the result of transposons modulating gene expression. The behavior of one transposon is intricately linked to the total behavior of all transposons, and the genetic shifts they result in defy the simple mathematical rules of Mendelian inheritance. But more importantly, the behavior of transposons is deeply linked to the physical structure of the encoding regions they're associated with.

Roughly half the genome of corn is made up of transposons. Is this 'junk' or not?

Aaronson, McCabe:

Actually, these mathematician's bits are very close to bits on a hard drive. Genomes, so far as I know, have no ability to determine what the next base ought logically to be; there is no logical processing in a ribosome. Selection pressure has to support each physical DNA base against the degenerative pressure of copying errors. Unless changing the DNA base has no effect on the organism's fitness (a neutral mutation), the "one mutation, one death" rule comes into play.

Now certainly, once the brain is constructed and patterned,... (read more)

However, mutation rates vary and can be selected. They aren't simply a constraint.

Also, it's been a long time since I've thought about this and I may be wrong, but aren't you talking about 1 bit per linkage group and not one bit per genome? (And the size of linkage groups also varies and can be selected.)

Some virus genomes face severe constraints on size -- they have a container they must fit into -- say an icosahedral shape -- and it would be a big step to increase that size. And some of those make proteins off both strands of DNA and sometimes in more t... (read more)

Eliezer, so long as an organism's fitness depends on interactions between many different base pairs, the effect can be as if some of the base pairs are logically determined by others.

Also, unless I'm mistaken there are some logical operations that the genome can perform: copying, transpositions, reversals...

To illustrate, suppose (as apparently happens) a particular DNA stretch occurs over and over with variations: sometimes forwards and sometimes backwards, sometimes with 10% of the base pairs changed, sometimes chopped in half and sometimes appended to anot... (read more)

Tom: "These are bits of entropy... mathematically impossible to compress"

My bad, was thinking of the meaningful base pairs. Thanks for correcting me.

I interpret Eliezer to be saying that the Kolmogorov complexity of the human genome is roughly 25MB -- the absolute smallest computer program that could output a viable human genome would be about that size. But this minimal program would use a ridiculous number of esoteric tricks to make itself that small. You'd have to multiply that number by a large factor (representing how compressible, in principle, modern applications are) to make a comparison to hard drive bits as they are actually used.

Eek I just noticed an unfortunate way that last comment could be read. I meant I was thinking of material bits of information when I should have thought of information-theoretical bits. I in no way interpret your "bits of entropy" to mean physical, non-meaningful base pairs!

OK, I came up with a concrete problem whose solution would (I think) tell us something about whether Eliezer's argument can actually work, assuming a certain stylized set of primitive operations available to DNA: insertion, deletion, copying, and reversal. See here if you're interested.

Eliezer, I see two potential flaws in your argument, let me try and explain:

1.) The copy error rate can't directly translate mathematically into how often individuals in a species die out due to the copy error rate. We simply can't know how often a mutation is neutral, good, or detrimental, in part because that depends on the specific genome involved. I imagine some genomes are simply more robust than others. But I believe the prevailing wisdom is that most mutations are neutral, simply because proteins are too physically big to be affected by small change... (read more)

Scott, the mechanisms you've described indeed allow us to end up with more meaningful physical DNA than the amount of information in it. To give a concrete example, a protein-coding gene is copied, then mutates, and then there's two highly structured proteins performing different chemical functions, which because of their similarity, evolved faster than counting the bases separately would seem to allow.

So the 1 bit/generation speed limit on evolution is not a speed limit on altered DNA bases - definitely not!

The problem is that these meaningful bases also... (read more)

Quite a lot of mutations are so lethal that they abort embryonic development, yes. This is a severe problem with organisms drawn from a narrow gene pool, like humans and corn, and less so with others. It's worth noting that, if we consider these mutations in the argument, we have to consider not only the children who are born and are weeded out, but all of the embryos conceived and lost as well.

Given how few conceptions actually make it to birth, and how many infants died young before the advent of modern medicine, humans didn't lose two out of four; they lost all but two out of eight-to-twelve.

Eliezer, I'm a little skeptical of your statement that sexual reproduction/recombination won't add information...

  1. Single base pairs don't even code for amino acids, much less proteins.
  2. If we're looking at how a mutation affects an organism's ability to reproduce, we want to consider at least an entire protein, not just an amino acid.
  3. There can be multiple genes that are neutral on their own, yet in combination are either very harmful or very beneficial.

Can you provide an argument as to why none of this affects the "speed limit" (not even by a constant factor)?

"To sum up: The mathematician's bits here are very close to bits on a hard drive, because every DNA base that matters has to be supported by "one mutation, one death" to overcome per-base copying errors."

There are only twenty amino acids plus a stop code that each codon can specify, so the theoretical information bound is 4.4 bits/codon, not 6 bits, even for coding DNA. A common amino acid, such as leucine, only requires two base pairs to specify; the third base pair can freely mutate without any phenotypic effects at all.

"Can you provide an ar... (read more)

Even in the argument, it applies to organisms that lose half of their offspring to selection. It's different for those that lose more, or less.

Among mammals, it's safe to say that the selection pressure per generation is on the rough order of 1 bit. Yes, many mammals give birth to more than 4 children, but neither does selection perfectly eliminate all but the most fit organisms. The speed limit on evolution is an upper bound, not an average.

One bit per generation equates to a selection pressure which kills half of each generation before they reproduce according to the first part of your post. Then you say 1 bit per generation is the most mammalian reproduction can sustain. But, more than hal... (read more)

"But, more than half of mammals (in many, perhaps most, species) die without reproducing. Wouldn't this result in a higher rate of selection and, therefore, more functional DNA?"

"Yes, many mammals give birth to more than 4 children, but neither does selection perfectly eliminate all but the most fit organisms. The speed limit on evolution is an upper bound, not an average."

But mammals have many ways of weeding out harmful variations, from antler fights to spermatozoa competition. And that's just if they have the four children. The provided 1 bit/generation figure isn't an upper bound, either.

Life spends a lot of time in non-equilibrium states as well, and those are the states in which evolution can operate most quickly.

"But basically, the 1 bit/generation bound is information-theoretic; it applies, not just to any species, but to any self-reproducing organism, even one based on RNA or silicon. The specifics of how information is utilized, in our case DNA -> mRNA -> protein, don't matter."

OK, and I'm familiar with information theory (less so with evolutionary biology, but I understand the basics) but I'm thinking that the 1 bit/generation bound is -- pardon the pun -- a bit misleading, since:

  1. A lot -- I mean a lot -- of crazy assumptions are made without

... (read more)

"But basically, the 1 bit/generation bound is information-theoretic; it applies, not just to any species, but to any self-reproducing organism, even one based on RNA or silicon. The specifics of how information is utilized, in our case DNA -> mRNA -> protein, don't matter."

OK, and I'm familiar with information theory (less so with evolutionary biology, but I understand the basics) but I'm thinking that the 1 bit/generation bound is -- pardon the pun -- a bit misleading, since:

  1. A lot -- I mean a lot -- of crazy assumptions are made without

... (read more)

David MacKay did a paper on this. Here's a quote from the abstract:

If variation is produced by mutation alone, then the entire population gains up to roughly 1 bit per generation. If variation is created by recombination, the population can gain O(G^0.5) bits per generation.

G is the size of the genome in bits.
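To put a rough number on that, plugging in the post's own 2 x 10^8-bit estimate for the meaningful genome:

    import math
    G = 2 * 10**8        # genome size in bits (the post's estimate)
    print(math.sqrt(G))  # ~14,000 bits/generation under MacKay's O(G^0.5) result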

Fly:

I've been enjoying your evolution posts and wanted to toss in my own thoughts and see what I can learn.

"Our first lemma is a rule sometimes paraphrased as "one mutation, one death"."

Imagine that having a working copy of gene "E" is essential. Now suppose a mutation creates a broken gene "Ex". Animals that are heterozygous with "E" and "Ex" are fine and pass on their genes. Only homozygous "Ex" "Ex" result in a "death" that removes 2 mutations.

Now imagine that a duplic... (read more)

"But mammals have many ways of weeding out harmful variations, from antler fights to spermatozoa competition. And that's just if they have the four children. The provided 1 bit/generation figure isn't an upper bound, either."

Read a biology textbook, darn it. The DNA contents of a sperm have negligible impact on the sperm's ability to penetrate the egg. As for antler fights, it doesn't matter how individuals are removed from the gene pool. They can only be removed at a certain rate or else the species population goes to zero. Note that nonreproduc... (read more)

OK, I posted the following update to my blog entry:

Rereading the last few paragraphs of Eliezer's post, I see that he actually argues for his central claim -- that the human genome can’t contain more than 25MB of "meaningful DNA" -- on different (and much stronger) grounds than I thought! My apologies for not reading more carefully.

In particular, the argument has nothing to do with the number of generations since the dawn of time, and instead deals with the maximum number of DNA bases that can be simultaneously protected, in steady state, against... (read more)

Read a biology textbook, darn it. The DNA contents of a sperm have negligible impact on the sperm's ability to penetrate the egg.

Defective sperm - which is more-than-normally likely to carry screwed-up DNA - is far less likely to reach the egg, and far less likely to penetrate it before a fully functional spermatozoan does. It's a weeding-out process.

"As for antler fights, it doesn't matter how individuals are removed from the gene pool."

Of course it does! Just not to the maximum-bit-rate argument.

"Yes, but they must be balanced by states where i
... (read more)

Eliezer, sorry for spamming, but I think I finally understand what you were getting at.

Von Neumann showed in the 50's that there's no in-principle limit to how big a computer one can build: even if some fraction p of the bits get corrupted at every time step, as long as p is smaller than some threshold one can use a hierarchical error-correcting code to correct the errors faster than they happen. Today we know that the same is true even for quantum computers.

What you're saying -- correct me if I'm wrong -- is that biological evolution never discovered thi... (read more)
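The flavor of the idea shows up in even the simplest error-correcting code - a triple-repetition code with majority vote, sketched below. (Von Neumann's actual construction was hierarchical and far more involved; this is only an illustration.)

    import random

    def corrupt(bits, p):
        # Flip each bit independently with probability p.
        return [b ^ (random.random() < p) for b in bits]

    def encode(bits):
        # Store each bit three times.
        return [b for bit in bits for b in (bit, bit, bit)]

    def decode(bits):
        # Majority vote over each triple.
        return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

    p = 0.01
    data = [random.randint(0, 1) for _ in range(100_000)]
    out = decode(corrupt(encode(data), p))
    print(sum(a != b for a, b in zip(data, out)) / len(data))
    # ~3p^2 = 0.0003, well below the raw error rate p = 0.01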

Scott said: "25MB is enough for pretty much anything!"

Have people tried to measure the complexity of the 'interpreter' for the 25MB of 'tape' of DNA? Replication machinery is pretty complicated, possibly much more so than any genome.

Actually, Scott Aaronson, something you said in your second to last post made me think of another reason why the axiom "one mutation, one death" may not be true. Actually, it's just an elaboration of the point I made earlier but I thought I'd flesh it out a bit more.

The idea is that the more physically and mentally complex, and physically larger, a species gets, the more capable it is of coping with detrimental genes and still surviving to reproduce. When you're physically bigger, and smarter, there are more 'surplus' resources to draw upon to h... (read more)

"Defective sperm - which are more-than-normally likely to be carry screwed-up DNA - is far less likely to reach the egg,"

Then the DNA never gets collected by researchers and included in the 10^-8 mutations/generation/base pair figure. If the actual rate of mutations is higher, but the non-detected mutations are weeded out, you still get the exact same result as if the rate of mutations is lower with no weeding-out.

"Of course it does! Just not to the maximum-bit-rate argument."

True.

"No, they mustn't. They can theoretically be kept ... (read more)

A mammalian gene pool can acquire at most 1 bit of information per generation.

Eliezer,

That's a very provocative, interestingly empirical, yet troublingly ambiguous statement. :)

I think it's important to note that evolution is very effective (within certain constraints) in figuring out ways to optimize not only genes but also genomes -- it seems probable that a large fraction of said "bits" have gone into structural or mechanical optimizations.

These structural/mechanical optimizations might in turn involve mechanisms by which to use existi... (read more)

Wiseman, if it's true that (1) copying DNA inherently incurs a 10^-8 probability of error per base pair, and that (2) evolution hasn't invented any von-Neumann-type error-correction mechanism, then all the objections raised by you and others (and by me, earlier!) are irrelevant.

In particular, it doesn't matter how capable a species is of coping with a few detrimental mutations. For if the mutation rate is higher than what natural selection can correct, the species will just keep on mutating, from one generation to the next, until the mutations finally do ... (read more)

Aaronson: What you're saying -- correct me if I'm wrong -- is that biological evolution never discovered this fact [error-correcting codes].

You're not wrong. As you point out, it would halt all beneficial mutations as well. Plus there'd be some difficulty in crossover. Can evolution ever invent something like this? Maybe, or maybe it could just invent more efficient copying methods with 10^-10 error rate. And then a billion years later, much more complex organisms would abound. All of this is irrelevant, given that DNA is on the way out in much less... (read more)

I think that a subset of Fly's objections may be valid, especially the ones about sexual selection concentrating harmful mutations in a small subset of the population. This could plausibly increase the number of bits by a significant factor. OTOH, 25M is an upper bound, so the actual number of bits could easily still be less.

Great point about evolution not discovering hierarchical error-correcting codes, Scott A. Chris Phoenix frequently makes similar points about molecular nanotechnology in response to its critics.

Regarding the earlier post's point about ... (read more)

Scott A., I wasn't suggesting DNA would magically not mutate after it had evolved towards sophistication, only that the system of genes/DNA that governs a system would become robust enough that it would be immune to the effects of the mutations.

Anyway, evolution does not have to "correct" these mutations; as long as the organism can survive with them, they have as much a chance of mutating to a neutral, positive, or other equally detrimental state as they have of becoming worse. As a genome becomes larger and larger, it can cope with the same ratio of mu... (read more)

Wiseman, let M be the number of "functional" base pairs that get mutated per generation, and let C be the number of those base pairs that natural selection can affect per generation. Then if M>>C, the problem is that the base pairs will become mostly (though not entirely) random junk, regardless of what natural selection does. This is a point about random walks that has nothing to do with biology.

To illustrate, suppose we have an n-bit string. At every time step, we can change one of the bits to anything we want, but then two bits get ch... (read more)
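That random-walk claim is easy to simulate. A toy sketch (assuming the truncated sentence means two random bits get flipped per step):

    import random

    n = 10_000
    wrong = set()  # indices where the string currently differs from the target

    for step in range(200_000):
        # Selection's budget: correct one wrong bit per step, if any.
        if wrong:
            wrong.pop()
        # Mutation: two random bits get flipped.
        for _ in range(2):
            wrong.symmetric_difference_update({random.randrange(n)})

    # Settles near 0.75 - better than chance (0.5), far from intact (1.0),
    # i.e. "mostly (though not entirely) random junk".
    print(1 - len(wrong) / n)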

"'Life spends a lot of time in non-equilibrium states as well, and those are the states in which evolution can operate most quickly.'

Yes, but they must be balanced by states where it operates more slowly. You can certainly have a situation where 1.5 bits are added in odd years and .5 bits in even years, but it's a wash: you still get 1 bit/year long term."

This seems to contradict your earlier assertion that the 1 bit/generation rate is "an upper bound, not an average." It seems to me to be more analogous to a roulette wheel or the Secon... (read more)

With sufficiently large selection populations, it's not clear to me how anything could be better than natural selection, since natural selection is what the system is trying to beat. Any model of natural selection will necessarily contain inaccuracies.

So here's my question: Can you actually do asymptotically better than natural selection by applying an error-correcting code that doesn't hamper beneficial mutations?

In principle, yes. In a given generation, all we want is a mutation rate that's nonzero, but below the rate that natural selection can correct. That way we can maintain a steady state indefinitely (if we're indeed at a local optimum), but still give beneficial mutations a chance to take over.

Now with DNA, the mutation rate is fixed at ~10^-8. Since we need to be able to weed out bad... (read more)

Fly: Imagine that having a working copy of gene "E" is essential. Now suppose a mutation creates a broken gene "Ex". Animals that are heterozygous with "E" and "Ex" are fine and pass on their genes. Only homozygous "Ex" "Ex" result in a "death" that removes 2 mutations.

Now imagine that a duplication event gives four copies of "E". In this example an animal would only need one working gene out of the four possible copies. When the rare "Ex" "Ex" "Ex"... (read more)

Eliezer, could you provide a link to this result? Something looks wrong about it.

Fisher's fundamental theorem of natural selection says the rate of natural selection is directly proportional to the variance in additive fitness in the population. At first sight that looks incompatible with your result.

You mention a site with selection at 0.01%. This would take a very long time for selection to act, and it would require that there not be stronger selection on any nearby linked site. It seems implausible that this site would have been selected before, with th... (read more)

"Now with DNA, the mutation rate is fixed at ~10^-8."

Well no, it isn't. Not to get too complicated, usually the mutation rate is lower than that, but occasionally things happen that bring the mutation rate rather higher. We have things like DNA repair mechanisms that are mutagenic and others that are less so, and when the former get turned on we get a burst of mutations.

"Since we need to be able to weed out bad mutations, this imposes an upper bound of ~10^8 on the number of functional base pairs."

Definitely no more than 10^8 sites that... (read more)

If a species can deal with detrimental mutations for several generations, then that simply means that the species has more time to weed out those really bad mutations, making the "one mutation, one death" equation inadequate to describe the die-off rate based purely on the mutation rate. Yes, new mutations pop up all the time, but unless those mutations directly add on to the detrimental effects of previous mutations, the species will still survive another generation.

To add on to my other argument that we "know too little" to make hard ... (read more)

A comment from the Shtetl-Optimized discussion:

It’s actually a common misconception that biological systems should have mechanisms that allow a certain number of mutations for the purpose of accruing beneficial adaptations. From an evolutionary perspective, all genes should favor the highest possible fidelity copies. Any genes that have any lower copying fidelity will necessarily have fewer copies in future generations and thus lose the evolutionary game to a gene with higher copying fidelity.

Remember, folks, evolution doesn't work for the good of the species,... (read more)

From an evolutionary perspective, all genes should favor the highest possible fidelity copies.

Hmm... Suppose there are two separated populations, identical except that one has a gene that makes the mutation rate negligibly low. Naturally the mutating population will acquire greater variation over time. If the environment shifts, the homogeneous population may be wiped out but part of the diverse population may survive. So in this case, lower-fidelity copying is more fit in the long run. This is highly contrived, of course.

Disagree. Any genome that has lower copy fidelity will only be removed from the gene pool if the errors in copy actually make the resultant organism unable to survive and reproduce; otherwise it's irrelevant how similar the copied genes are to the original. If the copy error rate produces detrimental genes at a rate that will not cause the species to go extinct, it will allow for any beneficial mutations to arise and spread themselves throughout the gene pool at 'leisure'. As long as those positive genes are attached to a genome structure which produces ... (read more)

Remember, folks, evolution doesn't work for the good of the species, and there's no Evolution Fairy who tries to ensure its own continued freedom of action. It's just a statistical property of some genes winning out over others.

Right, but if the mutation rate for a given biochemistry is itself relatively immutable, then this might be a case where group selection actually works. In other words, one can imagine RNA, DNA, and other replicators fighting it out in the primordial soup, with the winning replicator being the one with the best mutation properties.

Fly:

Eliezer: "Fly, you've just postulated four copies of the same gene, so that one death will remove four mutations. But these four copies will suffer mutations four times as often. Unless I'm missing something, this doesn't increase the bound on how much non-redundant information can be supported by one death. :)"

Yeah, you are right. You only gain if the redundancy means that the fitness hit is sufficiently minor that more than four errors could be removed with a single death.

The "one death, one mutation" rule applies if the mutation imme... (read more)

Taka:

"On average all but 2 children must either die or fail to reproduce. Otherwise the species population very quickly goes to zero or infinity."

A population of infinity is of course non-existent. An "infinite" population is not just a mathematical impossibility. What you forget to take into account is that a growing population changes the conditions of the population, and changes selection pressure.

Furthermore you consider evolution of just a single species. But all species are considered to be descendants of the same LUCA (Last Universal... (read more)

Taka, if you don't draw conclusions from simplified models, then you can't make any decisions ever.

Taka:

So let me be more concrete, because every model is a simplification. What I mean to say is that the model used here is far too simple to draw conclusions from.

The central statement of this entry is "There's a limit on how much complexity an evolution can support against the degenerative pressure of copying errors".

In order to check the model, the statement should be quantified, so it can be matched with measurements. Maybe something like "the genome of a species can have at most 50k genes". That requires that the model should be enhanced.... (read more)

"This increases the potential number of semi-meaningful bases (bases such that some mutations have no effect but other mutations have detrimental effect) but cancels out the ability to store any increased information in such bases."

If 27% of all mutations have absolutely no effect, the "one mutation = one death" rule is broken, and so more information can be stored because the effective mutation rate is lower (this also means, of course, that the rate of beneficial mutations is lower). So it may be a 40 MB bound instead of a 25 MB bound... (read more)

Most of our DNA is shared with all eukaryotes, so it evolved before mammals existed.

MacKay's paper talks about gaining bits as in bits on a hard drive

I don't think MacKay's paper even has a coherent concept of information at all. As far as I can tell, in MacKay's model, if I give you a completely randomized 100 Mb hard drive, then I've just given you 50 Mb of useful information, because half of the bits are correct (we just don't know which ones.) This is not a useful model.

Rolf,

If you look at equation 3 of MacKay's paper, you'll see that he defines information in terms of frequency of an allele in a population, so you'd have to provide a whole population of randomized hard drives, and if you did so, the population would have zero information.

First, there is the correct point that our mutation rate has been in steady decline - the first couple of billion years had a much higher mutation rate, and therefore a much higher rate of data encoding, than the last couple of billion years.

Second, there is the point that a significant portion of pregnancies are failures - we could possibly double the rate of data encoding from that alone, presuming that extra bit of selection all goes toward improving genetic repair and similar functionality. (Reducing mutation rates of critical genes.)

Third, multiple populations co... (read more)

If you look at equation 3 of MacKay's paper, you'll see that he defines information in terms of frequency of an allele in a population

I apologize, my statement was ambiguous. The topic of Eliezer's post is how much information is in an individual organism's genome, since that's what limits the complexity of a single organism, which is what I'm talking about.

Equation 3 addresses the holistic information of the species, which I find irrelevant to the topic at hand. Maybe Alice, Bob, and Charlie's DNA could together have up to 75 MB of data in some holographi... (read more)

Humans can do things that evolutions probably can't do period over the expected lifetime of the universe. As the eminent biologist Cynthia Kenyon once put it at a dinner I had the honor of attending, "One grad student can do things in an hour that evolution could not do in a billion years." According to biologists' best current knowledge, evolutions have invented a fully rotating wheel on a grand total of three occasions.

FYI, God did not design humans; we are all naturally evolved. Evolution can and has indeed designed lots of fully rotating ... (read more)

OK, let me make my point clearer: why we can't calculate the actual complexity limit of working DNA:

1.) Not all mutations are bad. Accepted knowledge: most are simply neutral, a few are bad, and even fewer are good.
2.) If the mutations are good or neutral, they should effectively be subtracted from the mutation rate, as they do not contribute to the "one mutation, one death" axiom because good/neutral mutations do not increase death probability.
3.) The mutations will not accumulate either, over many generations, if they are good/neutral. If a ... (read more)

Anything that a human can do, natural selection can do, by definition.

Ah, yes, the old "Einstein's mother must have been one heck of a physicist" argument, or "Shakespeare only wrote what his parents and teachers taught him to write: Words."

Even in the sense of Kolmogorov complexity / algorithmic information, humans can have complexity exceeding the complexity of natural selection because we are only a single one out of millions of species to have ever evolved.

And the things humans "do" are completely out of character for the... (read more)

Rolf,

Would you agree that the information-theoretic increase in the amount of adaptive data in a single organism is still limited by O(1) bits in MacKay's model?

I can't really process this query until you relate the words you've used to the math MacKay uses, i.e., give me some equations. Also, Eliezer is pretty clearly talking about information in populations, not just single genomes. For example, he wrote, "This 1 bit per generation has to be divided up among all the genetic variants being selected on, for the whole population. It's not 1 bit pe... (read more)

Even if most mutations are neutral, that just says that most of the genome doesn't contain any information. If you flip a base and it doesn't make any difference, then you've just proved that it was junk DNA, right?

Hi Erik,

It's not junk DNA; it merely has usefulness in many different configurations. Perhaps if the mutation were to skip a base pair entirely, rather than just mis-copy it, it would be more likely to be detrimental.