Richard Ngo writes
Since Stuart Russell's proposed alignment solution in Human Compatible is the most publicly-prominent alignment agenda, I should be more explicit about my belief that it almost entirely fails to address the core problems I expect on realistic pathways to AGI.
Specifying an update rule which converges to a desirable goal is just a reframing of the problem of specifying a desirable goal, with the "uncertainty" part as a red herring. https://arbital.com/p/updated_deference/… In other words, Russell gives a wrong-way reduction.
I originally included CIRL in my curriculum (https://docs.google.com/document/d/1mTm_sT2YQx3mRXQD6J2xD2QJG1c3kHyvX8kQc_IQ0ns/edit?usp=drivesdk…) out of some kind of deferent/catering to academic mainstream instinct. Probably a mistake; my current annoyance about deferential thinking has reminded me to take it out.
Howie writes:
My impression is that ~everyone I know in the alignment community is very pessimistic about SR's agenda. Does it sound right that your view is basically a consensus? (There's prob some selection bias in who I know).
Richard responds:
I think it's fair to say that this is a pretty widespread opinion. Partly it's because Stuart is much more skeptical of deep learning (and even machine learning more generally!) than almost any other alignment researcher, and so he's working in a different paradigm.
Is Richard correct and if so why? (I would also like a clearer explanation why Richard is skeptical of Stuart's agenda. I agree that the reframing doesn't completely solve the problem, but I don't understand why it can't be a useful piece).
One of the things that almost all AI researchers agree on is that rationality is convergent: as something thinks better, it will be more successful, and to be successful, it will have to think better. In order to think well, it need to have a model of itself and what it knows and don't know, and also a model of its own uncertainty -- to do Bayesian updates, you need probability priors. All Russell has done is say "thus you shouldn't have a utility function that maps a state to its utility, you should have a utility functional that maps a state to a probabi... (read more)