Help ARC evaluate capabilities of current language models (still need people) — LessWrong