All of VesaVanDelzig's Comments + Replies

If you had a defense of the idea, or a link to one I could read, I would be very interested to hear it. I wasn't trying to be dogmatically skeptical. 

2Rohin Shah7mo
Responded above

His hostility to the program, as I understand it, is that CIRL doesn't really answer the question of how to specify a learning procedure that would go from observations of a human being to a correct model of that human being's utility function. This is the hard part of the problem. This is why he says "specifying an update rule which converges to a desirable goal is just a reframing of the problem of specifying a desirable goal, with the 'uncertainty' part a red herring".
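To make the criticism concrete, here is a minimal sketch of the kind of update rule in question: a Bayesian update over candidate reward functions from observed human actions. All of the names, the two-hypothesis setup, and the Boltzmann-rationality likelihood are my own illustrative assumptions, not anything from the CIRL paper. The point it illustrates is that what the posterior converges to is fixed entirely by the prior and the likelihood model you hand-specify, which is where the original specification problem reappears.

```python
import math

# Toy hypothesis space: two candidate reward functions over three actions.
# (Entirely made up for illustration.)
CANDIDATE_REWARDS = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0, "water": 0.0},
    "likes_tea":    {"coffee": 0.0, "tea": 1.0, "water": 0.0},
}

def boltzmann_likelihood(action, reward, beta=2.0):
    """P(action | reward), assuming the human is noisily rational.

    This likelihood model is itself an assumption the designer must
    specify -- a different model yields a different posterior.
    """
    z = sum(math.exp(beta * r) for r in reward.values())
    return math.exp(beta * reward[action]) / z

def reward_posterior(observations, prior, beta=2.0):
    """Bayesian update over candidate reward functions from observed actions."""
    post = dict(prior)
    for action in observations:
        for name, reward in CANDIDATE_REWARDS.items():
            post[name] *= boltzmann_likelihood(action, reward, beta)
        total = sum(post.values())
        post = {name: p / total for name, p in post.items()}
    return post

prior = {"likes_coffee": 0.5, "likes_tea": 0.5}
post = reward_posterior(["tea", "tea", "coffee", "tea"], prior)
```

Here the "uncertainty" does resolve with more observations, but only toward whatever the designer's choice of hypothesis space, prior, and rationality model permits, which is the sense in which the update-rule framing just relocates the problem of specifying a desirable goal.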

One of the big things that CIRL was claimed to have going for it is that ...

1Tor Økland Barstad7mo
But maybe continuing to be deferential (in many/most situations) would be part of the utility function it converged towards? Not saying this consideration refutes your point, but it is a consideration. (I don't have much of an opinion on the study-worthiness of CIRL, btw, and I know very little about it. Though I do have the perspective that one alignment methodology need not be the "enemy" of another, partly because we might want AGI systems whose sub-systems are also AGIs (based on different alignment methodologies), and where we check whether the outputs of the different sub-systems converge.)