Speed Limit and Complexity Bound For Evolution (post summary)

(This page summarizes a discussion on the blog.)

A Yudkowsky post of November 4th, 2007, "Natural selection's speed limit and complexity bound", tried to argue mathematically that there could be at most 25MB of meaningful information (or thereabouts) in the human genome, but computer simulations failed to bear out the mathematical argument. It does seem probable that evolution has some kind of speed limit and complexity bound - eminent evolutionary biologists seem to believe it, and in fact the Human Genome Project found only about 25,000 genes in the human genome - but this particular math may not be the correct argument.

The first lemma of Yudkowsky's argument was the idea that if, say, 2 parents have 16 children, and on average only 2 of those children survive, then 1 out of 8 children survives, which corresponds to log2(8) = 3 bits of information in the information-theoretic sense. This part of the argument is taken from R. P. Worden's paper A Speed Limit For Evolution¹, but it does not seem to agree with the computer simulation in question, so it is possible that Worden imposed additional conditions (or perhaps the paper itself is wrong). According to Worden's paper, this is an evolutionary limit on the whole species - if the average surviving child is part of a litter of 16, then the 3-bit bound on information accumulated is not per couple but for the species as a whole. In general, Worden speaks of a species accumulating at most O(1) bits of information per generation.
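
The 3-bit figure is just the base-2 logarithm of the ratio of children born to children surviving. A minimal sketch of that arithmetic follows; the function name `selection_bits` is hypothetical and does not come from Worden's paper or the original post.

```python
import math


def selection_bits(children_born: float, children_surviving: float) -> float:
    """Bits of selection implied by a survival fraction: log2(born / surviving)."""
    if children_born <= 0 or children_surviving <= 0:
        raise ValueError("counts must be positive")
    return math.log2(children_born / children_surviving)


# 16 children born, 2 survive on average: 1 in 8 survives.
print(selection_bits(16, 2))  # 3.0
```

Halving the litter size to 8 with the same 2 survivors would give 2 bits under the same accounting.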

The second lemma of Yudkowsky's argument is a well-known principle called "one mutation, one death", which states that deleterious mutations (and the vast majority of mutations are deleterious) cause an equal number of deaths in the gene pool, whether the mutation is very harmful to an individual or only slightly harmful. At equilibrium, deleterious mutations must be eliminated from the gene pool at the same rate they are introduced: each event in which a copying error creates a mutation must be balanced by an extra death of an individual bearing that mutation. If a mutation is only very slightly deleterious - if it only kills one out of ten thousand bearers, say (or prevents one out of ten thousand children from being born) - then the mutation will spread farther before causing the deaths that prevent it from spreading further. (This is a very disheartening Malthusian principle in general; if you invent glasses to make nearsightedness less dangerous, then more people will become nearsighted and the total danger will go back up.)
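
The principle can be illustrated with a toy model. The sketch below assumes a single deleterious variant in a haploid population, with selection acting before mutation in each generation; the model, the function name `equilibrium_death_rate`, and the parameter values are illustrative assumptions, not taken from the original discussion. In this model the fraction of the population removed by selection per generation settles at the mutation rate, whether the variant is strongly or weakly harmful.

```python
def equilibrium_death_rate(mutation_rate: float, selection_coeff: float,
                           generations: int = 20_000) -> float:
    """Iterate a haploid mutation-selection model (an assumed toy model) and
    return the fraction of the population removed by selection per generation
    once the variant frequency has settled."""
    freq = 0.0  # frequency of the deleterious variant before selection
    for _ in range(generations):
        # Selection: carriers survive with relative fitness (1 - s).
        freq = freq * (1 - selection_coeff) / (1 - freq * selection_coeff)
        # Mutation: a fraction u of non-carriers newly acquire the variant.
        freq = freq + mutation_rate * (1 - freq)
    # Selective deaths = carrier frequency * probability that a carrier dies.
    return freq * selection_coeff


u = 1e-5  # hypothetical mutation rate per generation
for s in (0.5, 0.01, 0.001):
    print(f"s = {s}: selective deaths per generation = {equilibrium_death_rate(u, s):.3e}")
```

The weaker the selection, the higher the variant's equilibrium frequency (about u/s here), which is exactly what keeps the death rate pinned at u.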

From the "one mutation, one death" lemma, Yudkowsky argued that each meaningful DNA base would require around the same amount of selection pressure to support its continued existence. Worden calculates a selection pressure of O(1) bits per generation in mammals, and the mutation rate in mammals is 10^-8 errors per base per generation. From this, Yudkowsky argued that at most 10^8 DNA bases, or 25 megabytes of meaningful information at 2 bits per base, could be sustained by mammalian evolution against the degenerative pressure of mutation.
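
The 25-megabyte figure follows from the two numbers above by simple arithmetic. A minimal sketch, under the following assumptions: O(1) bits is taken as exactly 1 bit per generation, each base carries 2 bits (four possible nucleotides), and a megabyte is 10^6 bytes. The function name `sustainable_information` is hypothetical.

```python
def sustainable_information(bits_per_generation: float = 1.0,
                            errors_per_base: float = 1e-8,
                            bits_per_base: int = 2):
    """Return (bases, megabytes) sustainable under the attempted bound."""
    # Bases whose mutational decay one generation's selection can offset.
    bases = bits_per_generation / errors_per_base
    # Convert bases to bits, bits to bytes, bytes to megabytes (10^6 bytes).
    megabytes = bases * bits_per_base / 8 / 1e6
    return bases, megabytes


bases, megabytes = sustainable_information()
print(f"{bases:.3g} bases, about {megabytes:.0f} MB")
```

This only reproduces the arithmetic of the attempted bound; it says nothing about whether the bound itself holds.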

The idea of an upper bound on the sustainable information in a genome, and that mammals are already at this upper bound and have probably been there for tens of millions of years if not longer, is not original to Yudkowsky; it is found for example in George Williams's Adaptation and Natural Selection² (and duly credited). Indeed the essential idea goes back to Kimura, or even Fisher. Yudkowsky's novelty was to attempt to calculate the bound.

Although the actual Genome Project's finding of 25,000 genes fits well under Yudkowsky's attempted bound, the mathematical argument failed. A computer simulation failed to bear out the bound, and the flaw appears to have been as follows: Even if one mutation creates one death, this does not mean that one death eliminates only a single mutation. Organisms bearing more deleterious mutations are more likely to lose the evolutionary competition, and so each death can eliminate more mutations than average. If mating is random and the least fit organisms are perfectly eliminated in every generation, the information supportable in the genome goes as the inverse square of the mutation rate.

(Why? This may be a bit difficult to visualize. Roughly, if the average number of deleterious mutations per individual in the gene pool is N, and mating is random, then the difference between the average number of mutations in the 'upper half' of the population and the average number in the 'lower half' is around sqrt(N). So eliminating the lower half of the population decreases the average number of mutations by around sqrt(N). So at equilibrium, mutation can introduce on the order of sqrt(N) new mutations per generation and have them eliminated from the gene pool at the same rate, with N mutations in total. Thus the supportable information goes as the inverse square of the mutation rate, under these assumptions and as borne out by computer simulation. A similar argument shows that with many beneficial mutations being introduced, and with random mating and perfect selection, information can be gained in the genome at a rate which goes as the square root of the number of incoming beneficial mutations (which would go as the square root of the genome size). It is possible that Worden made other assumptions which rule out this scenario.)
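
The sqrt(N) step can be checked numerically. The sketch below is not the simulation from the original discussion; it is a one-generation illustration under the assumption that, with random mating, the number of deleterious mutations per individual is roughly Poisson-distributed with mean N (so its standard deviation is sqrt(N)). The names `poisson` and `mean_drop_from_truncation` are hypothetical, and the sampler is Knuth's multiplication method, used here only to stay within the standard library.

```python
import math
import random
import statistics


def poisson(rng: random.Random, mean: float) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mean_drop_from_truncation(mean_load: float, population: int = 2000,
                              seed: int = 0) -> float:
    """How far the average mutation count falls when the more heavily
    mutated half of the population is removed (perfect truncation selection)."""
    rng = random.Random(seed)
    loads = sorted(poisson(rng, mean_load) for _ in range(population))
    survivors = loads[: population // 2]  # individuals with the fewest mutations
    return statistics.fmean(loads) - statistics.fmean(survivors)


for n in (25, 100, 400):
    drop = mean_drop_from_truncation(n)
    print(f"N = {n}: drop = {drop:.2f}, drop / sqrt(N) = {drop / math.sqrt(n):.2f}")
```

If the Poisson assumption is reasonable, the last column should stay roughly constant as N grows (for a normal approximation, the expected value is sqrt(2/pi), about 0.8). One possible reading of the inverse-square claim, which the original discussion does not spell out: if U = mu * G new mutations arrive per generation in a genome of G meaningful bases, equilibrium requires U to be on the order of sqrt(N), so N grows as (mu * G)^2, and capping the damaged fraction N / G then limits G to something proportional to 1 / mu^2.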

On the whole, the state of the argument can be described as follows: Fisher proposed that there would be a limit to how much selection a population could sustain without going extinct. Kimura suggested that mutation rates would determine the sustainable genetic information. Williams argued that most organisms would long since have hit this upper bound for their species, and that there was no reason to believe that mammals were more complex than e.g. dinosaurs. The actual Human Genome Project found only 25,000 protein-coding genes in the human genome, and humans are around 95% genetically similar to chimpanzees after five million years of divergence. Worden calculated O(1) bits of information absorbable per generation (which must either have been wrong, or required some additional assumptions). Yudkowsky's attempt to calculate an upper bound of 25MB of information in the human genome was contradicted by computer simulation, but this does not mean the actual information is greater than this. If we assume that only Yudkowsky's attempt at a novel contribution actually failed and that we should go on trusting the other authors, then on the whole it is still safe to say that evolutionary biologists think there are speed limits and complexity bounds - just not that the complexity bound is 25MB calculated the way Yudkowsky tried to calculate it.

References

¹ Worden, R. P. 1995. A Speed Limit For Evolution. Journal of Theoretical Biology, 176, pp. 137-152. Online at http://dspace.dial.pipex.com/jcollie/sle/.
² Williams, G. C. 1966. Adaptation and Natural Selection. Princeton, NJ: Princeton University Press.