karolinakorgul — LessWrong

7

1y

Are recent LLMs better at reasoning or better at memorizing?

by Jude Khouja, harrymayne, ryanothnielkearns, and karolinakorgul

TL;DR: By carefully designing a reasoning benchmark that counteracts memorization in LLMs, the LingOly-TOO (L2) benchmark challenges frontier models with unseen questions and answers, and makes the case that LLMs are not yet consistent reasoning machines.

Links: Paper - Leaderboard - Dataset

Figure 1: LingOly-TOO Benchmark results from the paper....

Mar 7, 2025 • 11