Studies of Human Error Rate

by tin482
13th Feb 2025
1 min read

3 comments, sorted by top scoring
nim (7mo):

We put decades of work into getting software to behave less like databases, and then act surprised when it doesn't behave like a database. C'est la vie.

Viliam (7mo):

We wanted computers to be more like humans; didn't realize it would make them suck at math.

jimmy (7mo):

> Instead, skeptics often gesture to hallucinations, errors. [...] However, such arguments reliably rule out human "understanding" as well!

"Can do some impressive things, but struggles with basic arithmetic and likes to make stuff up" is such a fitting description of humans that I was quite surprised when it turned out to be true of LLMs too.

Whenever I see someone claim that it means LLMs can't "understand" something, I find it quite amusing that they're almost demonstrating their own point; just not in the way they think they are.


This is a link post for https://panko.com/HumanErr/SimpleNontrivial.html, a site which compiles dozens of studies estimating human error rates for simple but nontrivial cognitive actions. A great resource! Note that the error rate for 5-digit multiplication is estimated at ~1.5%.

The table of estimates
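To get a feel for what a per-problem error rate in that range implies, here is a quick back-of-the-envelope sketch. It is my own illustration, not from the linked page, and it assumes errors are independent across problems, which real human errors generally are not:

```python
# Rough illustration of what a ~1.5% per-problem error rate implies at scale.
# Assumption (mine, not the linked page's): errors are independent across problems.
p_err = 0.015  # approximate error rate for one 5-digit multiplication

for n in (1, 10, 50, 100):
    p_all_correct = (1 - p_err) ** n
    print(f"{n:>3} multiplications, all correct: {p_all_correct:6.1%}")
```

At that rate, a chain of 100 unaided multiplications comes out fully correct only about a fifth of the time.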

 

When LLMs were incapable of even basic arithmetic, that was a clear deficit relative to humans. This formed the basis of several arguments about a difference in kind, often cruxes for whether or not they could be scaled to AGI or constituted "real intelligence". Now that o3-mini can exactly multiply 9-digit numbers, the debate has shifted.

[Image. Source: Yuntian Deng, https://x.com/yuntiandeng/status/1889704768135905332]
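For anyone who wants to run this kind of check themselves, a minimal harness along these lines is enough. This is my own sketch, not Deng's actual setup; `ask` is a placeholder for whatever model-query function you use:

```python
import random

def multiplication_accuracy(ask, n_digits=9, trials=20, seed=0):
    """Fraction of exact n-digit x n-digit products the model gets right.

    `ask` is a placeholder: any function that takes a prompt string and
    returns the model's reply as a string.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a = rng.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = rng.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        reply = ask(f"Compute {a} * {b}. Reply with only the final number.")
        # Strip everything except digits so commas or spacing don't cause misses.
        answer = "".join(ch for ch in reply if ch.isdigit())
        correct += (answer == str(a * b))
    return correct / trials
```

The test is exact string match against Python's arbitrary-precision product; a reply that restates the operands would confuse the digit-stripping, which is why the prompt asks for only the final number.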

Instead, skeptics often gesture to hallucinations, errors. An ideal symbolic system never makes such errors; therefore, the argument goes, LLMs cannot truly "understand" even simple concepts like addition. See, e.g., "Evaluating the World Model Implicit in a Generative Model" for this argument in the literature. However, such arguments reliably rule out human "understanding" as well! Studies within Human Reliability Analysis find startlingly high error rates even for basic tasks, and even with double-checking. Generally, the human reference class is too often absent (or assumed ideal) in AI discussions, and many LLM oddities have close parallels in psychology. If you're willing to look!
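On the "even with double-checking" point, a toy model can show why a second check helps far less when it shares the first attempt's blind spots. This is my own illustration, not from the Human Reliability Analysis literature; the numbers and the overlap parameter are made-up assumptions:

```python
# Toy model of double-checking. All numbers here are illustrative assumptions.
p_err = 0.015    # chance the first attempt is wrong
p_catch = 0.90   # chance a fully independent checker catches a given error

for overlap in (0.0, 0.5, 0.9):
    # With probability `overlap` the checker shares the original blind spot
    # and misses the error outright; otherwise it catches it at p_catch.
    residual = p_err * (overlap + (1 - overlap) * (1 - p_catch))
    print(f"checker overlap {overlap:.0%}: residual error rate {residual:.2%}")
```

Under these assumptions, a fully independent check cuts the residual error rate by an order of magnitude, while a highly correlated check barely moves it.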