This felt weird to me, so I tried to construct a non-math example. Suppose we have a reward learning agent, and we have designed the reward space so that "ask the human whether to do X" always has higher reward than "do X". The agent is now deciding whether to ask the human about trying heroin, or to just give them heroin. If the agent gives them heroin, it will see their look of ecstasy and update to the reward function "5 for giving the human heroin, 7 for asking the human". If the agent asks, the human will say "no", and the agent will update to the reward function "-1 for giving the human heroin, 1 for asking the human". In both cases asking the human is the optimal action, yet the agent will end up giving the human heroin without asking: each action gets evaluated under the reward function that the action itself induces, so giving heroin is worth 5 while asking is worth only 1.
This seems isomorphic to the example you gave, and it's a little clearer what I find weird: the agent ends up taking an action that is suboptimal according to every reward function it could update to, because in choosing an action it is also effectively choosing the reward function that will score that action.
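To make the arithmetic explicit, here's a minimal Python sketch of that dynamic, using the numbers from the example above (the `posterior_after` table and `realized_reward` helper are illustrative names, not anything from the post):

```python
# Each action is scored under the reward function the agent would hold
# *after* taking that action. Numbers are from the example above.

# Posterior reward function induced by each action:
# outer key = action taken, inner dict = resulting reward function.
posterior_after = {
    "give": {"give": 5, "ask": 7},   # agent sees ecstasy, updates upward
    "ask":  {"give": -1, "ask": 1},  # human says "no", updates downward
}

def realized_reward(action):
    """Reward the agent actually receives: its chosen action, scored by
    the reward function that very action causes it to adopt."""
    return posterior_after[action][action]

print({a: realized_reward(a) for a in posterior_after})  # {'give': 5, 'ask': 1}
print(max(posterior_after, key=realized_reward))         # 'give'

# Yet asking dominates giving within *each* possible posterior:
for post in posterior_after.values():
    assert post["ask"] > post["give"]
```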
Pretty sure that he meant to say "an irrational agent" instead of "a rational agent"; see "Occam's razor is insufficient to infer the preferences of irrational agents", https://arxiv.org/abs/1712.05812