cakubilo — LessWrong

People aren't properly calibrated on FrontierMath

As most of you know, OpenAI recently showcased o3's SOTA results on various benchmarks. In my opinion, FrontierMath was the hardest of the bunch, and this was reflected in model performance: the SOTA was 2% before Friday. It also seems to be the benchmark with the least visibility....

Dec 23, 2024 • 31