All of Disposable Identity's Comments + Replies

That aside, I'm not sure what argument you're making here.

I do not often comment on Less Wrong. (Although I am starting to; this is one of my first comments!)
Hopefully, my thoughts will become clearer as I write more and become better acquainted with the local assumptions and cultural codes.

In the meantime, let me expand:

Two possible interpretations that come to mind (probably both of these are wrong):

  1. You're arguing that all humans in the world will refuse to build dangerous AI, therefore AI won't be dangerous.
  2. You're arguing that natural selection doesn
...

But 'alignment is tractable when you actually work on it' doesn't imply 'the only reason capabilities outgeneralized alignment in our evolutionary history was that evolution was myopic and therefore not able to do long-term planning aimed at alignment desiderata'.

I am not claiming evolution is 'not able to do long-term planning aimed at alignment desiderata'.
I am claiming it did not even try.

If you're myopically optimizing for two things ('make the agent want to pursue the intended goal' and 'make the agent capable at pursuing the intended goal') and one g

...

Many comparisons are made with Natural Selection (NS) optimizing for inclusive genetic fitness (IGF), on the grounds that this is our only example of an optimization process yielding intelligence.
 

I would suggest considering one very relevant fact: NS has not optimized for alignment, but only for a myopic version of IGF. I would also suggest considering that humans have not optimized for alignment either.
 

Let's look at some quotes, with those considerations in mind:

And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow

...
Rob Bensinger, 1y (score: 2)
We already know how to produce 'one intelligence not conquering the rest'. E.g., a human being is an intelligence that doesn't conquer the world. GPT-3 is an intelligence that doesn't conquer the world either. The problem is to build aligned AI that can do a pivotal act that ends the acute existential risk period, not just to build an AI that doesn't destroy the world itself.

That aside, I'm not sure what argument you're making here. Two possible interpretations that come to mind (probably both of these are wrong):

  1. You're arguing that all humans in the world will refuse to build dangerous AI, therefore AI won't be dangerous.
  2. You're arguing that natural selection doesn't tell us how hard it is to pull off a pivotal act, since natural selection wasn't trying to do a pivotal act.

1 seems obviously wrong to me; if everyone in the world had the ability to deploy AGI, then someone would destroy the world with AGI.

2 seems broadly correct to me, but I don't see the relevance. Nate and I indeed think that pivotal acts are possible. Nate is using natural selection here to argue against 'AI progress will be continuous', not to argue against 'it's possible to use sufficiently advanced AI systems to end the acute existential risk period'.
Rob Bensinger, 1y (score: 5)
I don't think the "which is why" claim here is true, if you mean 'this is the only reason'. 'Alignment is exactly as easy as capabilities if you're not myopic' seems like a claim that needs to be argued for positively.

NS didn't optimize for humans to be good at biochemistry, nuclear physics, or chess, either. NS produces many things that it wasn't specifically optimizing for. One of the main things that Nate is pointing out in the OP is that alignment isn't on that list, even though a huge number of other things are. "NS doesn't produce things it didn't optimize for" is an overly general response, because it would rule out things like 'humans landing on the Moon'.

This would obviously be an incredibly positive development, and would increase our success odds a ton! Nate isn't arguing 'when you actually try to do alignment, you can never make any headway'.

But 'alignment is tractable when you actually work on it' doesn't imply 'the only reason capabilities outgeneralized alignment in our evolutionary history was that evolution was myopic and therefore not able to do long-term planning aimed at alignment desiderata'. Evolution was also myopic with respect to capabilities, and not able to do long-term planning aimed at capabilities desiderata; and yet capabilities generalized amazingly well, far beyond evolution's wildest dreams. If you're myopically optimizing for two things ('make the agent want to pursue the intended goal' and 'make the agent capable at pursuing the intended goal') and one generalizes vastly better than the other, this points toward a difference between the two myopically-optimized targets.