davelaing's Shortform

davelaing

This is a special post for quick takes by davelaing. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

I've seen a bunch of people talking about how recent reasoning models are only useful for tasks which we are able to automatically verify.

I'm not sure this is necessarily true.

Reading the rStar paper has me thinking that if someone manages to turn the RL handle on mostly-general reasoning, using automatically verifiable tasks to power the training, they might plausibly lock onto something that generalises well enough to be superhuman on other tasks too.

It's a shame that small issues like counting and tokenization seem to be muddying the waters for LLM poetry (although my understanding of this may be out of date). If that weren't the case, poetry would be a nice way to check out-of-distribution reasoning power.
