The approach relies on identifying all the reward sub-spaces with this inversion property? That seems very difficult.

I don't think it's good enough to identify these spaces and place barriers in the reward function. (Analogy: perhaps SGD works precisely because it's good at jumping over such barriers.) Presumably you're actually talking about something more analogous to a penalty that increases as the action in question gets closer to step 4 in all the examples, so that there is nothing to jump over.
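To make that distinction concrete, here's a toy sketch of the two options; the numbers and function names are mine, purely for illustration:

```python
import math

def base_reward(action):
    # Stand-in task reward; just something for the agent to optimize.
    return -abs(action - 5.0)

def barrier_penalty(action, bad_low=3.0, bad_high=4.0):
    # Hard "wall": a large penalty only inside the flagged sub-space.
    # An optimizer taking large steps can jump straight over it.
    return -100.0 if bad_low <= action <= bad_high else 0.0

def graded_penalty(action, bad_point=4.0, scale=10.0):
    # Penalty that grows smoothly as the action approaches the problematic
    # "step 4" behavior, so there is no flat stretch left to jump over.
    return -scale * math.exp(-abs(action - bad_point))

for a in [0.0, 2.0, 3.5, 4.0, 6.0]:
    print(a, base_reward(a) + barrier_penalty(a), base_reward(a) + graded_penalty(a))
```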

Even that seems insufficient, because it seems like a reasoning system smart enough to have this problem in the first place can always add a meta term and defeat the visibility constraint. E.g. "if I do X that you wouldn't like and you don't notice it, that's bad; but if you don't notice that you don't notice it, then maybe it's OK."
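As a toy version of that meta move (my own hypothetical formulation, not anything from the post):

```python
def visibility_penalty(p_bad_action, p_noticed):
    # Intended constraint: penalize doing something the overseer wouldn't
    # like, in proportion to how likely it is to go unnoticed.
    return -10.0 * p_bad_action * (1.0 - p_noticed)

def meta_adjusted_penalty(p_bad_action, p_noticed, p_meta_noticed):
    # The agent's meta term: only count the non-noticing as bad if the
    # overseer would notice *that* it failed to notice. If that second-order
    # check also fails, the penalty quietly evaporates.
    return -10.0 * p_bad_action * (1.0 - p_noticed) * p_meta_noticed

print(visibility_penalty(1.0, 0.1))            # -9.0: the constraint bites
print(meta_adjusted_penalty(1.0, 0.1, 0.05))   # -0.45: the meta term defuses it
```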

Maybe a single rule about meta terms can defeat every meta term that involves not noticing something, but that's not obvious to me at all, especially if we're talking about a reward function rather than the policy that the agent actually learns.
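The kind of blanket rule I can imagine would penalize the unnoticed action no matter how many meta levels fail, something like the sketch below (again hypothetical); but even if the reward function encodes this, nothing guarantees the learned policy inherits it.

```python
def any_level_penalty(p_bad_action, p_noticed_per_level):
    # Hypothetical blanket rule: the penalty applies whenever the bad action
    # slips past oversight at any level (levels assumed independent), so
    # "not noticing that you don't notice" can't cancel it.
    p_all_levels_notice = 1.0
    for p in p_noticed_per_level:
        p_all_levels_notice *= p
    return -10.0 * p_bad_action * (1.0 - p_all_levels_notice)

print(any_level_penalty(1.0, [0.1, 0.05]))  # -9.95: nearly the full penalty survives
```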