An underlying generator of many recent disagreements appears to be differing locations of the disagreeing individuals on the Yudkowsky-Hanson spectrum. That is, differences in individuals' distributions over the background variable "how does an optimizer's ability to optimize scale as you apply (meta) optimization to it". I'm going to simplify my discussion by boiling this variable down to an "expected steepness" (as opposed to a distribution over optimization-in vs. optimization-out curves).
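To pin down what "steepness" refers to, here is one minimal way to write it down (my own notation, purely illustrative, not a claim about the right formalization):

```latex
% x    = (meta) optimization applied to a system
% f(x) = optimization power the resulting system can then exert
% The full object of interest is the curve f; the simplification above
% collapses one's distribution over such curves into an expected local
% slope near the ~human level x_h:
s \;=\; \mathbb{E}\!\left[\left.\frac{\mathrm{d}f}{\mathrm{d}x}\right|_{x = x_h}\right]
```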

Eliezer believes this relationship to be steeper than do many others in the alignment sphere. For instance, many (~half by my count) of the disagreements listed in Paul's recent post appear to pretty directly imply that Paul believes this relationship is much less steep than does Eliezer. It seems likely to me that this difference is the primary generator of those disagreements.

Eliezer has previously suggested formalizing this "optimization-in vs. optimization-out" relationship in Intelligence Explosion Microeconomics. Clearly this is not such an easy thing, or he probably would have just done it himself. Nonetheless this may be a pathway towards resolving some of these disagreements.

So, how do optimizers scale with applied optimization?

I'll quickly give my two cents on the matter by noting that we live in a world where the best mathematicians are, quite literally, something like 1000 times as productive as the average mathematician (keeping in mind that this gap spans only about a quarter of the range of variation in mathematical ability in the modern human population: roughly +3 to +6 stdevs).
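As a back-of-the-envelope check on what those rough numbers imply, here is the arithmetic (the log-linear assumption is mine, purely for illustration):

```python
import math

# Rough numbers from the paragraph above: ~1000x productivity gap spanning
# roughly +3 to +6 stdevs of mathematical ability (~3 stdevs wide).
productivity_ratio = 1000
stdev_span = 6 - 3

# Implied multiplier per stdev, if the relationship were roughly log-linear
# (an illustrative assumption, not a claim about the true functional form).
per_stdev = productivity_ratio ** (1 / stdev_span)
print(per_stdev)  # ~10x output per stdev of ability

# Equivalently: ~10 bits of output gain across ~3 stdevs of ability.
print(math.log2(productivity_ratio))  # ~9.97
```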

This huge difference in observed performance occurs despite the minds in question having about the same amount of hardware (brain size) built out of components running at about the same speed (neurons) executing the same basic algorithm (genetic differences are tiny compared to size of genome). The difference appears to be largely a result of small algorithmic tweaks.

This leads me to the conclusion that, if you take a ~human level AGI and optimize it a little bit further, you probably end up with something pretty strongly superhuman. It also leads me to conclude that it's hard to build a ~human level AGI in the first place because it's a small target on the capability axis: you'll likely just accidentally blow past it, and even if you can hit it, someone else will just blow past it soon thereafter.

Comments

executing the same basic algorithm (genetic differences are tiny compared to size of genome).

This seems moderately misleading. People start out nearly the same, but apply their algorithm to somewhat different domains. Running a year of human-level compute on different data should be expected to produce much more divergent results than is captured by the genetic differences.

Specialization on different topics likely explains much more than algorithmic tweaks explain.

Specialization on different topics likely explains much more than algorithmic tweaks explain.

That the very best mathematicians are generally less specialized than their more average peers suggests otherwise.

I had in mind an earlier and somewhat more subtle type of specialization, along the lines of what Henrich discusses in WEIRDest People.

An example is that people who learn to read at an early age tend to have poorer facial recognition and more of the abstract cognitive skills that are measured by IQ tests. This kind of difference likely alters a nontrivial amount of learning over a period of 15 or so years before people start thinking about specializations within higher math.

It's certainly plausible that something like this pumps in quite a bit of variation on top of the genetics, but I don't think it detracts much from the core argument: if you push just a little harder on a general optimizer, you get a lot more capabilities out.

There are other reasons why top mathematicians could have better output compared to average mathematicians. They could be working on more salient problems, there's selection bias in who we call a "top mathematician", they could be situated in an intellectual microcosm more suitable for mathematical progress, etc.

Also, there's a question of how directly you can apply optimization pressure towards a given goal. When you select the best of n mathematicians by highest long run net output, it's true that you're only applying log(n) bits of optimization pressure towards high long run net output, and so the fact that this results in a lot of improvement does show that long run net output is, in some sense, "easy" to optimize for. However, those log(n) bits of optimization pressure are being directly applied towards that goal, and it's not easy to have a learning process that applies optimization pressure in a similarly direct manner (as opposed to optimizing for something like "ability to do well on this math problem dataset"). 

E.g., optimizing a GPT model for predictive loss does not directly optimize it for coherent text, as demonstrated by the fact that taking the best of only a handful of samples from a GPT model can get you much more coherent text. It still cost millions of dollars to turn GPT-2 into GPT-3.
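A minimal sketch of the distinction I have in mind; `generate` and `coherence_score` are hypothetical placeholders for a model's sampler and some judge of coherence, not real APIs:

```python
import math

def best_of_n(generate, coherence_score, prompt, n=5):
    """Return the most coherent of n samples.

    `generate` and `coherence_score` are hypothetical stand-ins. Selecting
    the best of n samples applies ~log2(n) bits of pressure aimed directly
    at coherence, on top of whatever pretraining did indirectly.
    """
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=coherence_score)

# A "handful" of samples is only a few bits of directly aimed selection:
print(math.log2(5))          # ~2.3 bits
# Versus picking the single best of, say, a million mathematicians:
print(math.log2(1_000_000))  # ~20 bits
```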

An analogy: it's clear that, if you take the "most aligned"[1] out of n humans, you get a human that's much more "aligned" than the average person. Does this mean it's also "easy" to optimize for alignment?[2]

Finally, AI Impacts has done a number of investigations into how long it took for AI systems to go from ~human level to better than human level in different domains. E.g., it took 10 years for diagnosis of diabetic retinopathy. I think this line of research is more directly informative on this question.

[1] I'm leaving the operationalization of "most aligned" deliberately vague. Most likely, humans will vary significantly along whatever axis you choose.

[2] I actually think it does mean optimizing for alignment is "easy", in the sense that a few bits of optimization pressure directed purely towards alignment will go a long way in improving alignment. Again, that doesn't mean it's easy to actually direct optimization pressure in such a manner. (Though directing optimization pressure towards capabilities is probably easier than directing it towards alignment.)

There are other reasons why top mathematicians could have better output compared to average mathematicians. They could be working on more salient problems, there's selection bias in who we call a "top mathematician", they could be situated in an intellectual microcosm more suitable for mathematical progress, etc.

Do you really think these things contribute much to a factor of a thousand? Roughly speaking, what I'm talking about here is how much longer it would take for an average mathematician to reproduce the works of Terry Tao (assuming the same prior information as Terry had before figuring out the things he figured out, of course).

However, those log(n) bits of optimization pressure are being directly applied towards that goal, and it's not easy to have a learning process that applies optimization pressure in a similarly direct manner (as opposed to optimizing for something like "ability to do well on this math problem dataset"). 

I think Terry Tao would do noticeably much better on a math problem dataset compared to most other mathematicians! This is where it's important to note that "optimization in vs. optimization out" is not actually a single "steepness" parameter, but the shape of a curve. If the thing you're optimizing doesn't already have the rough shape of an optimizer, then maybe you aren't really managing to do much meta-optimization. In other words, the scaling might not be very steep because, as you said, it's hard to figure out exactly how to direct "dumb" (i.e. SGD) optimization pressure.

But suppose you've trained an absolutely massive model that's managed to stumble onto the "rough shape of an optimizer" and is now roughly human-level. It seems obvious to me that you don't need to push on this thing very hard to get what we would recognize as massive performance increases for the reason above: it's not very hard to pick out a Terry Tao from the Earth's supply of mathematicians, even by dumb optimization on a pretty simple metric (such as performance on some math dataset).
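A toy calculation of how cheap that selection is, assuming (purely for illustration) that the metric is a noisy readout of a normally distributed underlying ability:

```python
import random

random.seed(0)

# Toy model: each mathematician's score on the metric is a standard normal
# draw; how far above average does the best of n candidates sit?
def mean_max_of_n(n, trials=100):
    return sum(max(random.gauss(0.0, 1.0) for _ in range(n))
               for _ in range(trials)) / trials

for n in (1_000, 100_000):
    print(f"best of {n}: ~{mean_max_of_n(n):.1f} stdevs above the mean")

# On a typical run: ~+3.2 stdevs at n = 1e3 and ~+4.4 at n = 1e5. That is,
# ~10-17 bits of "dumb" selection on a simple metric already lands you in
# the +3 to +6 stdev band discussed above.
```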

Finally, AI Impacts has done a number of investigations into how long it took for AI systems to go from ~human level to better than human level in different domains. E.g., it took 10 years for diagnosis of diabetic retinopathy. I think this line of research is more directly informative on this question.

I don't see this as very informative about how optimizers scale as you apply meta-optimization. If the thing you're optimizing is not really itself an optimizer (e.g. a narrow domain tool), then what you're measuring is more akin to the total amount of optimization you've put into it, rather than the strength of the optimizer you've produced by applying meta-optimization.

I think the "factor of a thousand" here is mostly mathematics itself being a very unusual field. If you can reach even a tiny bit further in some "concept-space" than others, for whatever reasons internal or external, then you can publish everything you find in that margin and it will all be new. If you can't, then you pretty much have to work on the diminishing unpublished corners of already reached concept-space and most will look derivative.

I would certainly expect AI to blow rapidly past human mathematicians at some point due to surpassing human "reach". Whether that would also enable breakthroughs in other various sciences that rely on mathematics remains to be seen. Advances in theoretical physics may well need new abstract mathematical insights. Technology and engineering probably does not.

Sure, having just a little bit more general optimization power lets you search slightly deeper into abstract structures, opening up tons of options. Among human professions, this may be especially apparent in mathematics. But that doesn't make it any less scary?

Like, I could have said something similar about the best vs. average programmers/"hackers" instead; there's a similarly huge range of variation there too. Perhaps that would have been a better analogy, since the very best hackers have some more obviously scary capabilities (e.g. ability to find security vulnerabilities).

It's definitely scary. I think it is somewhat less scary in general capabilities than for mathematics (and a few closely related fields) in particular. Most of the scary things that UFAI can do will - unlike mathematics - involve feedback cycles with the real world. This includes programming (and hacking!), science research and development, stock market prediction or manipulation, and targeted persuasion.

I don't think the first average-human level AIs for these tasks will be immediately followed by superhuman AIs. In the absence of a rapid self-improvement takeoff, I would expect a fairly steady progression from average human capabilities (though with weird strengths and weaknesses), through increasingly rare human capability, and eventually into superhuman. While ability to play chess is a terrible analogy for AGI, it did follow this sort of capability pattern: computer chess programs were beating increasingly skilled enthusiasts for decades before finally exceeding the capabilities of top grandmasters.

In the absence of rapid AGI self-improvement, or a sudden crystallization of hardware overhang into superhuman AGI capability through a software breakthrough, I don't much fear improvement curves in AI capability blowing through the human range in an eyeblink. It's certainly a risk, but not a large chunk of my total credence for extinction. Most of my weight is on weakly superhuman AGI being able to improve itself or its successors into strongly superhuman AGI.