Against easy superintelligence: the unforeseen friction argument

by Stuart_Armstrong, 10th Jul 2013

Personal Blog

In 1932, Stanley Baldwin, former and future prime minister of the largest empire the world had ever seen, proclaimed that "The bomber will always get through". Backed up by most of the professional military opinion of the time, by the experience of the First World War, and by reasonable extrapolations and arguments, he laid out a vision of the future where the unstoppable heavy bomber would utterly devastate countries if a war started. Deterrence - building more bombers yourself to threaten complete retaliation - seemed the only counter.

And yet, things didn't turn out that way. Against all past trends, the light fighter plane surpassed the heavily armed bomber in aerial combat, the development of radar changed the strategic balance, and cities and industry proved much more resilient to bombing than anyone had a right to suspect.

Could anyone have predicted these changes ahead of time? Most probably, no. All of these ran counter to what was known and understood (and radar was a completely new and unexpected development). What could and should have been predicted, though, was that something would happen to weaken the impact of the all-conquering bomber. The extreme predictions would be unrealistic; frictions, technological changes, changes in military doctrine and hidden, unknown factors would undermine them.

This is what I call the "generalised friction" argument. Simple predictive models, based on strong models or current understanding, will likely not succeed as well as expected: there will likely be delays, obstacles, and unexpected difficulties along the way.

I am, of course, thinking of AI predictions here, specifically of the Omohundro-Yudkowsky model of AI recursive self-improvements that rapidly reach great power, with convergent instrumental goals that make the AI into a power-hungry expected utility maximiser. This model I see as the "supply and demand curve" of AI prediction: too simple to be true in the form described.

But supply and demand curves are generally approximately true, especially over the long term. So this isn't an argument that the Omohundro-Yudkowsky model is wrong, but that it will likely not happen as flawlessly as described. Ultimately, the "bomber will always get through" turned out to be true: but only in the form of the ICBM. If you take the old arguments and replace "bomber" with "ICBM", you end up with strong and accurate predictions. So "the AI may not foom in the manner and on the timescales described" is not saying "the AI won't foom".

Also, it should be emphasised that this argument is strictly about our predictive ability, and does not say anything about the capacity or difficulty of AI per se.

Why frictions?

An analogy often used for AI is that of the nuclear chain reaction: here is a perfect example of a recursive improvement, as the chain reaction grows and grows indefinitely. Scepticism about the chain reaction was unjustified, and experts were far too willing to rule it out ahead of time, based on unsound heuristics.

In contrast, many examples of simple models were slowed or derailed by events. The examples that came immediately to mind, for me, were the bomber example, the failure of expansion into space after the first moon landing, and the failure of early AI predictions. To be fair, there are also examples of unanticipated success, often in economic policy, but even in government interventions. But generally, dramatic predictions fail, either by being wrong or by being too optimistic on the timeline. Why is this?

Beware the opposition

One reason that predictions fail is that they underestimate human opposition. The bomber fleets may have seemed invincible, but that didn't take into account that large numbers of smart people were working away to try and counter them. The solution turned out to be improved fighters and radar; but even without knowing that, it should have been obvious some new methods or technologies were going to be invented or developed. Since the strength of the bomber depended on a certain strategic landscape, it should have been seen that deliberate attempts to modify that landscape would likely result in a reduction of the bomber's efficacy.

Opposition is much harder to model, especially in such a wide area as modern warfare and technology. Still, theorisers should have realised that there would be some opposition, and that, historically, ways have been found to counter most weapons, in ways that were not obvious at the time of the weapon's creation. It is easier to change the strategic landscape than to preserve it, so anything that depends on the current strategic landscape will most likely be blunted by human effort.

This kind of friction is less relevant to AI (though see the last section), and not relevant at all to the chain reaction example: there are no fiendish atoms plotting how to fight against human efforts to disintegrate them.

If noise is expected, expect reduced impact

The second, more general, friction argument, is just a rephrasing of the truism that things are rarely as easy as they seem. This is related to "the first step fallacy", the argument that just because we can start climbing a hill, doesn't mean we can reach the sky.

Another way of phrasing it is in terms of entropy or noise: adding noise to a process rarely improves it, and almost always makes it worse. Here the "noise" is all the unforeseen and unpredictable details that we didn't model, didn't (couldn't) account for, but that would have their bearing on our prediction. These details may make our prediction more certain or faster, but they are unlikely to do so.

The sci-fi authors of 1960 didn't expect that we would give up on space: they saw the first steps into space, and extrapolated to space stations and martian colonies. But this was a fragile model, dependent on continued investment in space exploration, and assuming there would be no setbacks. But changes in government investment and unexpected setbacks were not unheard of: indeed, they were practically certain, and would have messed up any simplistic model.

Let us return to chain reactions. Imagine that an alien had appeared and told us that our theory of fission was very wrong, that there were completely new and unexpected phenomena that happened at these energies that we hadn't yet modelled. Would this have increased or decreased the likelihood of a chain reaction? This feels like it can only decrease it: the chain reaction depended on a feedback loop, and random changes are more likely to break the loop than reinforce it. Now imagine that the first chain reaction suffered not from an incomplete theory, but from very sloppy experimental procedure: now we're nearly certain we won't see the chain reaction, as this kind of noise degrades very strongly towards the status quo.

So why then, were the doubters wrong to claim that the chain reaction wouldn't work? Because we were pretty certain at the time that these noises wouldn't materialise. We didn't only have a theory that said we should expect to see a chain reaction, barring unexpected phenomena; we had a well-tested theory that said we should not expect to see unexpected phenomena. We had an anti-noise theory: any behaviour that potentially broke the chain reaction would have been a great surprise. Assuming a minimum of competence of the experimenters (a supposition backed up by history), success was the most likely outcome.

Contrast that with AI predictions: here, we expect noise. We expect AI to be different from our current models, we expect developments to go in unpredictable directions, we expect to see problems that are not evident from our current vantage point. All this noise is likely to press against our current model, increasing its uncertainty, extending its timeline. Even if our theory were much more developed than it is now, even if we thought about it for a thousand years and had accounted for every eventuality we could think of, if we expect that there is still noise, we should caveat our prediction.
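The feedback-loop intuition in this section can be made concrete with a toy calculation. The sketch below is an illustration only, not a model of fission or of AI: it assumes a prediction rests on a number of necessary links, that unforeseen noise breaks each link independently with the same probability, and that noise never helps. The function name `loop_survival_probability` and all the numbers are hypothetical.

```python
def loop_survival_probability(n_links: int, q_break: float) -> float:
    """Chance that a feedback loop survives unforeseen noise.

    Toy assumptions (not from any real physical or AI model):
      - the loop needs all n_links links to hold;
      - noise breaks each link independently with probability q_break;
      - noise never repairs or strengthens a link.
    """
    if n_links < 0:
        raise ValueError("n_links must be non-negative")
    if not 0.0 <= q_break <= 1.0:
        raise ValueError("q_break must be a probability")
    return (1.0 - q_break) ** n_links


if __name__ == "__main__":
    # Illustrative values only: a well-tested "anti-noise" theory versus
    # a theory where we expect unmodelled details to matter.
    scenarios = [
        ("little noise expected", 0.01),
        ("noise expected", 0.15),
    ]
    for label, q in scenarios:
        p = loop_survival_probability(n_links=10, q_break=q)
        print(f"{label}: P(loop intact) = {p:.3f}")
```

The point of the comparison is qualitative: with the same number of links, a modest per-link chance of disruption sharply lowers the chance that the whole loop works as predicted, which is why a theory that rules out surprises matters so much.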

Who cares?

Right, so we may be justified in increasing our uncertainty about the impact of AI foom, and in questioning the timeline. But what difference does it make in practice? Even with all the caveats, there is still a worryingly high probability of a fast, deadly foom, well worth putting all our efforts into preventing. And slow, deadly fooms aren't much better, either! So how is the argument relevant?

It becomes relevant in assessing the relative worth of different interventions. For instance, one way of containing an AI would be to build a community of fast uploads around it: with the uploads matching the AI in reasoning speed, they have a higher chance of controlling it. Or we could try and build capacity for adaptation at a later date: if the AIs have a slow takeoff, it might be better to equip the people of the time with the tools to contain it (since they will have a much better understanding of the situation), rather than do it all ahead of time. Or we could try and build Oracles or reduced impact AIs, hoping that we haven't left out anything important.

All these interventions share a common feature: they are stupid to attempt in the case of a strong, fast foom. They have practically no chance of working, and are just a waste of time and effort. If, however, we increase the chances of weaker, slower fooms, then they start to seem more attractive - possibly worth putting some effort into, in case the friendly AI approach doesn't bear fruit in time.
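The arithmetic behind this can be sketched as a simple expected-value calculation. Everything here is an assumption made for illustration: the payoffs are on an arbitrary 0-to-1 scale, the intervention is taken to be worthless under a fast foom, and the probabilities are made up rather than estimated.

```python
def expected_value(p_fast: float, value_if_fast: float, value_if_slow: float) -> float:
    """Expected payoff of an intervention, weighting a fast foom against a slow one.

    All inputs are hypothetical; p_fast is the probability assigned to a
    strong, fast foom, and (1 - p_fast) goes to the weaker, slower case.
    """
    if not 0.0 <= p_fast <= 1.0:
        raise ValueError("p_fast must be a probability")
    return p_fast * value_if_fast + (1.0 - p_fast) * value_if_slow


if __name__ == "__main__":
    # A slow-takeoff intervention (uploads, later adaptation, Oracles):
    # assumed useless under a fast foom, useful under a slow one.
    for p_fast in (0.9, 0.7, 0.5):
        ev = expected_value(p_fast, value_if_fast=0.0, value_if_slow=1.0)
        print(f"P(fast foom) = {p_fast:.1f} -> expected value = {ev:.2f}")
```

Under these assumptions the intervention's expected value scales directly with the probability of a slow foom, so shifting even a modest amount of probability away from the fast case changes how such interventions compare with the alternatives.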
