To add: several other papers that report MATH500 results, such as Absolute Zero and SimpleRL-Zoo, also show Qwen2.5-Math-7B at roughly 64% accuracy:
From Absolute Zero (M500 column): 64.8
From SimpleRL-Zoo: 63.6
We reported the numbers from Hochlehnert et al. because their paper focuses explicitly on reproducing model performance across various datasets.