Ah, great point. Regarding the comment you link to:
Why specifically would you expect that RL on coding wouldn’t sufficiently advance the coding abilities of LLMs to significantly accelerate the search for a better learning algorithm or architecture?
That seems to imply that:
Has anyone done a well-executed LessWrong finetune?
I couldn't find anything decent, and I think this might be a good experiment.