Miguel Angel — LessWrong

How Do We Evaluate the Quality of LLMs' Mathematical Responses?

by Bruno Lopez Orozco, Jesus Tadeo Cruz Soto, Miguel Angel Peñaloza Pérez

Language models have made tremendous progress in their ability to solve mathematical problems, but how do we really know how well they're doing? It's not enough to just check if the final answer is correct; we need to...

Oct 29, 2025•5