ARC-AGI-2 human baseline surpassed (updated)
...contrary to the misleading leaderboard (which their technical paper implies should actually list humans at ~53%, as explained below): The 98% listed as the "Human Panel" score for ARC-AGI-1 is relatively easy to interpret. It was the score of an actual human who attempted all 100 private evaluation tasks.[1] The...