Reward hacking is becoming more sophisticated and deliberate in frontier LLMs — LessWrong