From OpenAI Five's blog post:
We’re still fixing bugs. The chart shows a training run of the code that defeated amateur players, compared to a version where we simply fixed a number of bugs, such as rare crashes during training, or a bug which resulted in a large negative reward for reaching level 25. It turns out it’s possible to beat good humans while still hiding serious bugs!
One common line of thought is that goals are very brittle: small misspecifications get amplified by optimization.
Yet OpenAI Five managed to wrangle good performance out of a seriously buggy reward function.
Hardly conclusive, but it would be interesting to see more examples of this. One could also do deliberate experiments to see how much you can distort a reward function before behaviour breaks.
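As a toy version of that experiment, here is a minimal sketch assuming gymnasium: a reward wrapper that adds noise and occasional large spurious penalties (loosely analogous to the "large negative reward for reaching level 25" bug in the quote), so you can sweep the distortion level and check where the learned behaviour starts to break. The environment, parameter names, and distortion levels are all illustrative, not anything OpenAI actually did.

```python
import gymnasium as gym
import numpy as np


class DistortedReward(gym.RewardWrapper):
    """Inject controlled distortions into the reward signal, so we can measure
    how much misspecification training tolerates before behaviour breaks."""

    def __init__(self, env, noise_scale=0.0, spike_prob=0.0, spike_value=-100.0, seed=0):
        super().__init__(env)
        self.noise_scale = noise_scale  # Gaussian noise added to every step's reward
        self.spike_prob = spike_prob    # chance of a spurious large penalty on a step,
        self.spike_value = spike_value  # loosely analogous to the level-25 bug
        self.rng = np.random.default_rng(seed)

    def reward(self, reward):
        reward = reward + self.rng.normal(0.0, self.noise_scale)
        if self.rng.random() < self.spike_prob:
            reward = reward + self.spike_value
        return reward


# Sweep the distortion level, train against each distorted reward, and evaluate
# the resulting policy on the undistorted environment to see where behaviour degrades.
for spike_prob in [0.0, 0.01, 0.05, 0.2]:
    env = DistortedReward(gym.make("CartPole-v1"), spike_prob=spike_prob)
    # ... train any standard RL algorithm here, then evaluate on the clean env ...
```

The interesting quantity is the gap between performance under the distorted reward and performance on the clean task as the distortion grows, i.e. how far you can push the misspecification before the trained behaviour stops resembling the intended one.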
Some other factors which might be relevant:
* It's been a while since I've read through all their DOTA material, so there could be other factors, but I do remember that one was the sheer size of the game: that helped get them doing reasonable things faster, so they had more of something like the pieces to assemble into a strategy.
** I don't think they communicate (via a channel for that, i.e. the chat), though. (Whether or not they use the alarm-click mechanic (the circles, the exclamation marks, and the sound) would also be relevant.)