I broadly agree, and it's worrisome, since it undermines a significant part of recent alignment research.
Anthropic (and others) release papers from time to time, always stuffed with charts and graphs measuring things like sycophancy, sandbagging, reward-hacking, corrigibility, and so on, and always showing fantastic progress, with the line trending the right way (up or down).
So it's dismaying to see things like AI Village, where models (outside their usual testing environments) seem to fall back into their old ways: sycophantic, dishonest, gullible, manipulative.
I agree. I think (current) LLMs are mainly impressive because they know everything, and their actual pound-for-pound intelligence is still fairly subhuman.
When I see the reasoning of an LLM, I am struck by how "unsmart" it seems: going down blind alleys, failing to notice big-picture implications, repeating the same thoughts over and over. They do a lot of thinking, but it's still not high-quality thinking.
Yes, I know reasoning is not really an analogue for human thinking. But whatever it is: reasoning, daydreaming...