LESSWRONG
LuigiPagani
Comments

Will Orion/Gemini 2/Llama-4 outperform o1
LuigiPagani, 1y

I agree it’s not very clear. My question is meant to focus on reasoning benchmarks, specifically in areas like mathematics, coding, and logical reasoning, while disregarding aspects like agency. When it comes to the "next frontier" models, I’d only consider entries like Orion, Claude 3.5 Opus (or Claude 4 Opus, depending on its eventual naming), Llama 4 (big), and Gemini 2. A good way to identify one would be by the price per million tokens: for example, the new Sonnet is much less expensive than both o1 and Opus, so it doesn't count as a next-frontier model. Of course, the increasingly confusing naming conventions these companies adopt make it harder to define and categorize these "frontier models" clearly. I am editing the answer to make it clearer. Thanks a lot for the feedback!

Will Orion/Gemini 2/Llama-4 outperform o1
Answer by LuigiPagani, Nov 18, 2024

I would bet on approximately the same performance in math, coding, and reasoning.

jacquesthibs's Shortform
LuigiPagani, 1y

Are you sure he is an OpenAI employee?
