Here are some more general results derived from "Occam's razor is insufficient to infer the preferences of irrational agents":
These are all true; the key question is to what extent they are true. Do we need only minimal supervision, or minimal added assumptions, to get the AI to deduce our values correctly? After all, we do produce a lot of data that it could use to learn our values, if it gets the basic assumptions right.
Or do we need to put a lot more work into the assumptions? After all, most of the data we produce is made by humans, for humans, who already grasp the implicit parts of that data; hence there are few explicit labels for the AI to use.
My feeling is that we probably only need a few assumptions, though perhaps more than some optimists in this area believe.