Even if transfer learning can work, in any given domain that doesn't have terrible feedback loops, wouldn't it be more efficient to apply the deliberate practice and metacognition to the domain itself? For example, if I'm trying to learn how to solve puzzle games, wouldn't it be more efficient to just practice solving puzzle games than to do physics problems and try to generalise? Or if you think that this sort of general rationality training is only important for 'specialising in problems we don't understand' type stuff with bad feedback...
This seems like it's equivocating between planning in the sense of "the agent, who assigns some (possibly negative) value to following any given arrow, plans out which sequence of arrows it should follow to accumulate the most value*" and planning in the sense of "the agent's accumulated-value is a state function". The former lets you take the detour in the first planning example (in some cases) while spiralling endlessly down the money-pump helix in the cyclical preferences example; the point of money pump arguments is to get the latter sort of planning f...
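
To make the "state function" sense concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the arrow values, the node names, and the helper `potential_if_state_function` are made up, and the toy graphs are not the examples from the post. It checks whether the values an agent assigns to arrows are path-independent, i.e. whether each arrow's value can be written as the difference of a per-state number.

```python
from collections import deque

def potential_if_state_function(arrows):
    """arrows maps (src, dst) -> value the agent assigns to following that arrow.

    Returns a dict of per-state potentials if accumulated value is
    path-independent (a state function), otherwise None.
    Assumption: following an arrow backwards is worth the negated value.
    """
    neighbours = {}
    for (src, dst), value in arrows.items():
        neighbours.setdefault(src, []).append((dst, value))
        neighbours.setdefault(dst, []).append((src, -value))

    potential = {}
    for start in neighbours:
        if start in potential:
            continue
        potential[start] = 0.0  # arbitrary zero point per connected component
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt, value in neighbours[node]:
                expected = potential[node] + value
                if nxt not in potential:
                    potential[nxt] = expected
                    queue.append(nxt)
                elif abs(potential[nxt] - expected) > 1e-9:
                    return None  # two routes disagree, so value depends on the path
    return potential

# Hypothetical cyclical preferences: every arrow in the loop looks like a gain.
cyclic = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1}
lap_value = sum(cyclic.values())  # value the agent credits itself per lap

# Hypothetical detour: one arrow is negative, but both routes total the same.
detour = {("start", "detour"): -1, ("detour", "goal"): 3, ("start", "goal"): 2}

cyclic_potential = potential_if_state_function(cyclic)
detour_potential = potential_if_state_function(detour)
```

Under these made-up numbers, the cyclic graph has no consistent potential and each lap adds positive value by the agent's own accounting, which is the endless spiral in the first sense of planning. The detour graph does have a potential, so accepting a negative arrow is compatible with accumulated value being a state function.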