VesaVanDelzig
VesaVanDelzig has not written any posts yet.

His hostility to the program, as I understand it, is that CIRL doesn't really answer the question of how to specify a learning procedure that would go from observations of a human being to a correct model of that human being's utility function. That is the hard part of the problem. This is why he says "specifying an update rule which converges to a desirable goal is just a reframing of the problem of specifying a desirable goal, with the "uncertainty" part a red herring".
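To make the objection concrete, here is a minimal sketch (all names and numbers hypothetical, not from any CIRL implementation) of the kind of Bayesian update CIRL relies on. The robot keeps a posterior over candidate reward functions and updates it from observed human actions. Notice that the update only works given a likelihood model of the human (here, an assumed Boltzmann-rational model with a made-up rationality parameter), and specifying that model correctly is exactly the hard part the critique points at.

```python
import math

# Candidate reward functions: each maps an action to a reward value.
candidate_rewards = {
    "likes_coffee": {"make_coffee": 1.0, "make_tea": 0.0},
    "likes_tea":    {"make_coffee": 0.0, "make_tea": 1.0},
}

posterior = {name: 0.5 for name in candidate_rewards}  # uniform prior
BETA = 2.0  # assumed human "rationality" parameter (hypothetical)

def likelihood(action, reward_fn):
    """P(human takes `action` | reward_fn), under a Boltzmann-rational model.

    This human model is the load-bearing assumption: change it and the
    posterior converges somewhere else entirely.
    """
    z = sum(math.exp(BETA * r) for r in reward_fn.values())
    return math.exp(BETA * reward_fn[action]) / z

def update(posterior, action):
    """Bayes' rule over the candidate reward functions."""
    unnorm = {name: p * likelihood(action, candidate_rewards[name])
              for name, p in posterior.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}

# Observing the human make tea shifts belief toward "likes_tea".
posterior = update(posterior, "make_tea")
```

The machinery of "being uncertain and updating" is trivial; everything substantive is smuggled into `likelihood` and the choice of candidate reward functions, which is one way of reading the "red herring" complaint.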
One of the big things CIRL was claimed to have going for it is that this uncertainty about what the true reward function was...
If you had a defense of the idea, or a link to one I could read, I would be very interested to hear it. I wasn't trying to be dogmatically skeptical.