Oliver Clive-Griffin

Comments

The target network has access to more information than the Q-network does, and thus is a better predictor

Some additional context for anyone confused by this (as I was): "more information" is not a statement about training data, but instead about which prediction task the target network has to do.

In other words, the target network has an "easier" prediction task: predicting the return from a state further in the future. I was confused because I thought the suggestion was that the target network had been trained for longer, but it hasn't; it's just an old checkpoint of the Q-network.
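To make the "old checkpoint" point concrete, here is a minimal sketch. It assumes a toy tabular stand-in for the Q-network and a discount factor of 0.9; the names `q_net`, `target_net`, and `td_target` are hypothetical and chosen only for illustration, not taken from the post.

```python
import copy

GAMMA = 0.9  # assumed discount factor

# Hypothetical tabular "Q-network": q_net[state][action] -> estimated return.
q_net = {0: [0.0, 0.0], 1: [1.0, 2.0]}

# The target network is just a frozen copy (an old checkpoint) of the Q-network.
# It has seen no extra training data.
target_net = copy.deepcopy(q_net)


def td_target(reward, next_state, done, target):
    """One-step TD target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + GAMMA * max(target[next_state])


# The Q-network keeps training, but the checkpoint does not move.
q_net[1][1] = 5.0

# The target is built from the *next* state's value under the old checkpoint.
y = td_target(reward=1.0, next_state=1, done=False, target=target_net)
```

Note that `target_net` is only ever asked about `next_state`, which is the sense in which it gets the "easier" task.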

The distinction between the Q-network and the target network is actually not very important here; the key point would hold even if you used the Q-network itself to compute the target. That is: predicting the return from a later state is "easier" than predicting the return from an earlier state for any network, as the first task is in some sense a subset of the second.
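One way to see the "subset" claim is through the usual recursive definition of the discounted return: the return from step t is the immediate reward plus the discounted return from step t+1. The sketch below assumes a made-up reward sequence and a discount factor of 0.9; `discounted_returns` is a hypothetical helper, not something from the post.

```python
GAMMA = 0.9  # assumed discount factor


def discounted_returns(rewards, gamma=GAMMA):
    """Return G where G[t] = rewards[t] + gamma * G[t + 1], and G past the end is 0."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return is built from the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 0.0, 2.0, 3.0]  # made-up episode rewards
G = discounted_returns(rewards)

# Predicting G[t] means predicting everything in G[t + 1], plus one more reward.
for t in range(len(rewards) - 1):
    assert G[t] == rewards[t] + GAMMA * G[t + 1]
```

So whatever uncertainty there is about the return from the later state is also present in the return from the earlier state, along with the uncertainty about the extra reward and transition.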