CEO at Redwood Research.
AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.
I don't quite understand your argument here. Suppose you give people the choice between two chatbots, and they know one will cause them to deconvert, and the other one won't. I think it's pretty likely they'll prefer the latter. Are you imagining that no one will offer to sell them the latter?
I am also sympathetic to the higher-level point. If we have AIs sufficiently aligned that we can make them follow the OpenAI model spec, I'd be at least somewhat surprised if governments and AI companies didn't find a way to craft model specs that avoided the problems described in this post.
I agree that it is unclear whether this would go the way I hypothesized here. But I think it's arguably the straightforward extrapolation of how America handles this kind of problem: are you surprised that America allows parents to homeschool their children?
Note also that AI companies can compete on how user-validating their models are, and in the absence of some kind of regulation, people will be able to pay for models that have the properties they want.
I don't think it's obviously appropriate for AI companies to decide exactly what reflection process and exposure to society the people who use their AIs need to have. And people might prefer to use AIs that conform to their beliefs if they are allowed to buy access to them. So I think you might have to ban such AIs in order to prevent people from using them.
I try to address this in my "analysis through swerving around obstacles" section.
The argument you're giving against the scenario I spell out here is "surely the AIs will instead be designed to promote a particular set of mandated policies on how you should reflect and update over time." I agree that that's conceivable. But it also seems kind of fucked up and illiberal.
I want to explore a scenario where there isn't a centrally enforced policy on this stuff, where people are able to use the AIs as they wish. I think that this is a plausible path society could take (e.g. it's kind of analogous to how you're allowed to isolate yourself and your family now).
I can't believe I screwed that up, lol.
I think both are big problems. Maybe I should have been clearer about the symmetry: the thesis I care about here is pretty symmetrical between those problems.
My concern here is that maybe they would change their minds if they really reflected on it, but it's not clear that they will in fact do that reflection if they have to decide quickly whether to do so!
oh thanks, fixed. I just internally substitute "AI takeover" anytime anyone says "AI kills everyone" because this comes up constantly, and I'd forgotten that I'd done so here :P
I've seen physical copies around, but I actually haven't read it. It's possible that you're totally right, in which case I apologize and should have finished my review with "I'm an idiot for not realizing it was worth my time to read Uncontrollable so that I could recommend it to people".
I would appreciate a digital copy and audio copy if you wanted to email them to me! I'm not sure I'll consume it because I don't know if it's that decision-relevant to me.
It seems like you think there is some asymmetry between people talking to honest chatbots and people talking to dishonest chatbots that want to take all their stuff. I feel like there isn't a structural difference between those. It's going to be totally reasonable to want to have AIs watch over the conversations that you and/or your children are having with any other chatbot.