Did people say why they deferred to these people?
No, the survey only asked respondents to give names
I think another interesting question to correlate with this would be "If you believe AI x-risk is a severely important issue, what year did you come to believe that?".
Agree, that would have been interesting to ask
Things that surprised me about the results
Sorry for the delay, it will be out this month!
Just wanted to say this is the single most useful thing I've read for improving my understanding of alignment difficulty. Thanks for taking the time to write it!
Part of me thinks: I was trying to push on whether it has a world model or rather has just memorised loads of stuff on the internet and learned a bunch of heuristics for how to produce compelling internet-like text. For me, "world model" evokes some object that has a map-territory relationship with the world. It's not clear to me that GPT-3 has that.
Another part of me thinks: I'm confused. It seems just as reasonable to claim that it obviously has a world model that's just not very smart. I'm probably using bad concepts and should think about this more.
It looks good to me!
This is already true for GPT-3
Idk, maybe...?
Re the argument for "Why internalization might be difficult", I asked Evan Hubinger for his take on your rendition of the argument, and he thinks it's not right.
Rather, the argument that Risks from Learned Optimization makes for why internalization would be difficult is that:
Especially since this post is now (rightly!) cited in several introductory AI risk syllabi, it might be worth correcting this, if you agree it's an error.
Edit: or do you just mean that even though you take the same steps, the two feel different because retreating is not the same as going further along the wall?
Yeah, this — I now see what you were getting at!
One argument for alignment difficulty is that corrigibility is "anti-natural" in a certain sense. I've tried to write out my understanding of this argument, and would be curious if anyone could add or improve anything about it.
I'd be equally interested in any attempts at succinctly stating other arguments for/against alignment difficulty.
Finally posted: https://www.lesswrong.com/posts/qccxb3uzwFDsRuJuP/deference-on-ai-timelines-survey-results