Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math? — LessWrong