LESSWRONG
nikhilchandak
Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims
nikhilchandak, 5mo ago

Just to add: several other papers that report MATH500 results, such as Absolute Zero and SimpleRL-Zoo, also find that Qwen-2.5-MATH 7B scores roughly 64% accuracy:

From Absolute Zero (M500 column): 64.8

From SimpleRL-Zoo: 63.6

We reported the numbers from Hochlehnert et al. because their paper focused explicitly on reproducing model performance across various datasets.
