I agree with you that the "stereotyped image of AI catastrophe" is not what failure will most likely look like, and it's great to see more discussion of alternative scenarios. But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies? Humans also often optimise for what's easy to measure, especially in organisations. Is the concern that current ML systems are unable to optimise hard-to-measure goals, or goals that are hard to represent in a computerised form? That is true, but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI. With general intelligence, it should also be possible to optimise goals that are hard to measure.

Similarly, humans / companies / organisations regularly exhibit influence-seeking behaviour, and this can cause harm but it's also usually possible to keep it in check to at least a certain degree.

So, while you point at things that can plausibly go wrong, I'd say that these are perennial issues that may become better or worse during and after the transition to advanced AI, and it's hard to predict what will happen. Of course, this does not make a very appealing tale of doom – but maybe it would be best to dispense with tales of doom altogether.

I'm also not yet convinced that "these capture the most important dynamics of catastrophe." Specifically, I think the following are also potentially serious issues:
- Unfortunate circumstances in future cooperation problems between AI systems (and / or humans) result in widespread defection, leading to poor outcomes for everyone.
- Conflicts between key future actors (AI or human) result in large quantities of disvalue (agential s-risks).
- New technology leads to radical value drift of a form that we wouldn't endorse.

Thanks for elaborating. There seem to be two different ideas:

1) that it is a promising strategy to try to constrain early AGI capabilities and knowledge

2) that even without such constraints, a paperclipper entails a smaller risk of worst-case outcomes with large amounts of disvalue, compared to a near miss. (Brian Tomasik has also written about this.)

1) is very plausible, perhaps even obvious, though as you say it's not clear how feasible this will be. I'm not convinced of 2), even though I've heard / read many people expressing this idea. I think it's unclear which would result in more disvalue in expectation. For instance, a paperclipper would have no qualms about threatening other actors (with something that we would consider disvalue), while a near miss might still have such qualms, depending on what exactly the failure mode is. In terms of incidental suffering, it's true that a near miss is more likely to do something involving human minds, but again it's also possible that the system is, despite the failure, still compassionate enough to refrain from this, or to use digital anesthesia. (It all depends on what plausible failure modes look like, and that's very hard to say.)

Another risk from bugs comes not from the AGI system caring incorrectly about our values, but from having inadequate security. If our values are accurately encoded in an AGI system that cares about satisfying them, they become a target for threats from other actors who can gain from manipulating the first system.

I agree that this is a serious risk, but I wouldn't categorise it as a "risk from bugs". Every actor with goals faces the possibility that other actors may attempt to gain bargaining leverage by threatening to deliberately thwart these goals. So this does not require bugs; rather, the problem arises by default for any actor (human or AI), and I think there's no obvious solution. (I've written about surrogate goals as a possible solution for at least some parts of the problem.)

the very worst outcomes seem more likely if the system was trained using human modelling because these worst outcomes depend on the information in human models.

What about the possibility that the AGI system threatens others, rather than being threatened itself? Prima facie, that might also lead to worst-case outcomes. Do you envision a system that's not trained using human modelling and therefore just wouldn't know enough about human minds to make any effective threats? I'm not sure how an AI system can meaningfully be said to have "human-level general intelligence" and yet be completely inept in this regard. (Also, if you have such fine-grained control over what your system does or does not know about, or if you can have it do very powerful things without possessing dangerous kinds of knowledge and abilities, then I think many commonly discussed AI safety problems become non-issues anyway, as you can just constrain the system accordingly.)

Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.

I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, merging of biological and artificial intelligence, brain-computer interfaces, etc.), I find it hard to justify a very high degree of confidence in Drexler's model in particular.

I agree that establishing a cooperative mindset in the AI / ML community is very important. I'm less sure if economic incentives or government policy are a realistic way to get there. Can you think of a precedent or example for such external incentives in other areas?

Also, collaboration between the researchers that develop AI may be just one piece of the puzzle. You could still get military arms races between nations even if most researchers are collaborative. If there are several AI systems, then we also need to ensure cooperation between these AIs, which isn't necessarily the same as cooperation between the researchers that build them.

What exactly do you think we need to specify in the Smoking Lesion?