The power of LLMs comes almost entirely from imitation learning on human text. This leads to powerful capabilities quickly, but with a natural ceiling (i.e., existing human knowledge), beyond which it’s unclear how to make AI much better.
What do we make of RLVR on top of strong base models? Doesn’t this seem likely to learn genuinely new classes of problem currently unsolvable by humans? (I suppose it requires us to be able to write reward functions, but we have Lean, the economy, and nature, all glad to provide rewards even when we don’t know the solution ahead of time.)
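To make the asymmetry concrete: a verifiable reward only needs a *checker*, not a *solver*. Here’s a minimal sketch (the factorization task and function name are just my hypothetical illustration, not anything from a real RLVR pipeline): verifying a proposed factorization of a number is trivial even when producing one is hard.

```python
def verifiable_reward(candidate: str, target_product: int) -> float:
    """Reward 1.0 iff candidate is a nontrivial factorization of target_product.

    The grader needs no idea how to factor the number itself; it only
    checks the proposed answer. That checker/solver asymmetry is what
    lets a reward signal exist for problems we can't solve in advance.
    """
    try:
        factors = [int(tok) for tok in candidate.split("*")]
    except ValueError:
        return 0.0  # not parseable as a product of integers
    if len(factors) < 2 or any(f <= 1 for f in factors):
        return 0.0  # reject trivial factorizations like "n * 1"
    product = 1
    for f in factors:
        product *= f
    return 1.0 if product == target_product else 0.0

print(verifiable_reward("91 * 89", 8099))   # valid: 91 * 89 = 8099
print(verifiable_reward("8099 * 1", 8099))  # trivial factor, rejected
```

Lean proof checking, market profit, and physical experiments all have this same shape: cheap to verify an outcome, no need to know the answer beforehand.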
Ok. For what it’s worth, it was clear to me from the site UX where I could see the others’ names. But I did find it a bit surprising. Looking forward to meeting y’all.