Reinforcement learning scaling might incentivise hidden reasoning architectures for AI — LessWrong