LESSWRONG
fujisawa_sora
Comments

A Philosopher Walks Into A Coffee Shop
fujisawa_sora 14d 1 0

This is the first time in a while that I’ve chuckled the whole way through, thanks! And the half that I didn’t get was good learning.

Recent AI model progress feels mostly like bullshit
fujisawa_sora 7mo 6 2

I primarily use LLMs when working on mathematics, which is one of the areas where the recent RL paradigm was a clear improvement: reasoning models are finally useful. However, I agree with you that benchmark-chasing isn’t optimal, in that the models still can’t admit when they’re wrong. A model doesn’t have to give up, but when it can’t do something, I’d rather it list the ideas it tried than pretend it can solve everything, because then I actually have to read through everything.

Of course, this could be solved by having some amateur mathematicians read through the outputs and using RL to penalize BS. So I think this is a case where benchmark performance was prioritized over actual usefulness.
