How Do We Evaluate the Quality of LLMs' Mathematical Responses? — LessWrong