Here's an equation for the MMLU vs. loss plot:
An MMLU of 100% corresponds to a loss of 1.8304. Using the scaling laws, listed here, this can be reached using:
To steelman a counter to the fox-rabbit-toaster argument: when you make a toaster, you do make a part to prevent electricity from getting to the coils. In fact, you make several of them. First, you make a timer to shut the toaster off when it's done. You also make a fuse and some kind of overheating shut-off switch. These prevent a runaway situation that would cause the toaster to burn down your house. So an advocate of a god who makes both rabbits and foxes could point out that foxes and rabbits form a self-regulating population control system, and that this follows the same sort of engineering logic as putting a fuse on a toaster.
Calling evolution unintelligent is foolish egocentrism. This is a process that has produced a world of nanotech-based creatures of astronomical complexity, using the entire surface of the Earth as a giant quantum computer. The simplest element of biological creation, the behavior of a single protein, requires computation at or beyond the limit of humanity's computing capability. It's staggering to me that all the structured computation and creative outpouring of evolution gets called "unintelligent" despite producing creatures of complexity vastly beyond the capability of the human mind to comprehend. It is a mind, and a (super)intelligent one by virtue of this ability to create complex structures through astronomical volumes of structured computation, but it is not an intelligence like ours.
It is our concept of intelligence that is deficient in not accommodating other minds so different from our own. Indeed, evolution isn't going to score high on any IQ test (except by way of the human beings it has created), because we designed measures of intelligence as ways of measuring ourselves. Refusing to call evolution a mind requires the egocentric stance that a mind must work like ours or it is not a mind.
Indeed, a human being would have designed a better retina, since we have different faculties than evolution. But if the tables were turned, we could be labeled unintelligent non-minds for not being able to intuitively imagine protein folding or the various other computational faculties that evolution has and we do not.
Data seems to be a bottleneck, so we should expect the number of model parameters to run high to compensate.
Note that an MMLU of 100% should be achievable using a model the same size as Megatron-Turing NLG and only 2.1x more data than GPT-4, which should be achievable in the near term.
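For anyone who wants to redo this kind of estimate, here is a rough sketch of the arithmetic. It assumes a Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, with the fitted constants published by Hoffmann et al. (2022). That choice is an assumption of the sketch and may not be the same scaling laws used for the figures above, so its output need not match them. The function names are made up for illustration; the 530B parameter count for Megatron-Turing NLG is the published model size.

```python
# Assumed Chinchilla-style parametric loss (not necessarily the scaling laws
# referenced in the comment above):
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# Constants are the fits reported by Hoffmann et al. (2022).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_loss(target_loss: float, n_params: float) -> float:
    """Invert the loss formula: tokens needed to hit target_loss at a fixed model size."""
    # Whatever loss budget is left after the irreducible term and the
    # parameter-limited term must be covered by the data-limited term.
    residual = target_loss - E - A / n_params**ALPHA
    if residual <= 0:
        raise ValueError("target loss is unreachable at this model size, even with unlimited data")
    return (B / residual) ** (1 / BETA)


TARGET_LOSS = 1.8304   # loss quoted above as corresponding to MMLU = 100%
MT_NLG_PARAMS = 530e9  # Megatron-Turing NLG parameter count

tokens = tokens_for_loss(TARGET_LOSS, MT_NLG_PARAMS)
print(f"Tokens needed under the assumed fit: {tokens:.3e}")
```

The ValueError branch is the "data is a bottleneck" point in code form: below some model size, no amount of data reaches the target loss, so the parameter count has to go up to compensate.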