Defender7762 — LessWrong

Defender7762's Shortform

Apr 17, 2025•1

Debunk the myth - Testing the generalized reasoning ability of LLMs

Conclusion. Current LLM reasoning ability: As of March 2025, the actual reasoning capabilities of publicly available LLMs are approximately 50 times lower than what benchmarks like AIME suggest. Today, false marketing claims about LLMs' reasoning ability are rampant on the Internet. They usually make strong claims: they get...

Apr 11, 2025•1