I have a view of LLMs that I think is super important, and I have a lengthy draft post justifying this view in detail that's been lying around for over a year now. I've decided to finally just get the main points out there without much elaboration or editing.
LLMs are still basically just predicting what token comes next. This isn't a statement about their intelligence or capabilities! This is just what they're trying to do, as opposed to trying to make things happen in the world or communicate certain things to people.
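To make this concrete, here's a minimal sketch of what next-token prediction looks like mechanically. This is just an illustration, not anything specific to any particular lab's models; it uses the Hugging Face transformers library and gpt2 purely as a stand-in. The model takes a sequence of tokens, outputs a probability distribution over possible next tokens, and the sequence grows one token at a time:

```python
# Minimal sketch of next-token prediction with a small causal LM.
# gpt2 is used purely as an illustrative stand-in for "an LLM".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits              # (1, seq_len, vocab_size)
    # Probability distribution over the vocabulary for the *next* token only.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    # Greedy decoding: append the single most likely token and repeat.
    next_token = torch.argmax(next_token_probs)
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Sampling instead of argmax, chat templates, and post-trained weights change which continuations are likely, but the loop itself stays the same: predict the next token, append it, predict again.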
There are partial explanations as to why LLMs hallucinate, such as:
... but they fail to explain all the weird hallucinatory behaviors at once. "This is just a prediction of what a hypothetical AI assistant might say" straightforwardly explains hallucinations.
The distinction between the underlying LLM ("the shoggoth") and the character whose behavior it's predicting ("the mask") is still sharp and incredibly important.
AI companies try to hide this distinction because it's confusing and they hope it won't matter in the future, so they name both the LLM and the assistant character "Claude" or whatever. This just confuses everyone even more. This would seem obviously silly in other contexts: Imagine if OpenAI named their video model "Sora", and also named a robot character that appears in the model's videos "Sora", and made the robot say "Hi! I'm Sora, a text-to-video model developed by OpenAI!", and the world only cared about debating whether "Sora" the robot is friendly or not.
Hallucinations can be mitigated by:
...but as long as the LLM is still just trying to predict what text is coming up next, as opposed to trying to write the text for a particular end, the issue will never fully go away.
"But we have RL post-training that turns the base LLM into a consequentialist agent!" No, it doesn't (yet). If that were true, it wouldn't be hallucinating. Outcome-based RL is inefficient right now and mostly just biases the predictions towards a few good problem-solving tricks, and RLHF was always just fancier fine-tuning.
For all of pretraining, the LLM has zero ability to influence the world. It has no experience with changing the data it's seeing. Why would it be easy to teach it to do this? There's no simple way to snap an AI whose goal is world-predicting into an AI whose goal is world-influencing; the two look superficially similar to us humans, but thinking we can go from one to the other with a little post-training is like thinking we can breed cats into bats in a few centuries.
Am I saying this to downplay AI progress? No! In fact, I think this implies:
Here's a simple reason why "X% of our code is written by AI" doesn't mean much: I could write 100% of my code with an LLM from three years ago. I would just have to specify everything in painstaking detail, to the point where I'm almost just typing it myself. It certainly wouldn't mean I've become more productive, and if I were an AI developer, it wouldn't mean I've achieved recursive self-improvement (RSI).
Now, percentage of AI-written code is probably somewhat correlated with productivity gains in practice, but AI companies seem to be Goodharting this metric.
Some people can be too dismissive of the differences between humans and LLMs.
On the one hand, it's true that some people cherry-pick the mistakes LLMs make and use them as proof that LLMs aren't intelligent, even though they're mistakes that many humans make too. For example, some have said LLMs can't be intelligent because they can't multiply big numbers accurately without a calculator or a scratchpad; but humans can't do that, either.
On the other hand, I see people hand-wave away some important things. Someone will point out how strange it is that LLMs still hallucinate, and someone else will say "nah, humans make things up all the time!" But like, if you ask an LLM for someone's biographical information, it will sometimes give you highly specific fake details mixed in with real ones, without having been misled by unreliable sources and without any agenda to persuade you of anything. Even an overconfident and dishonest human wouldn't do that. This is clearly different in kind from what we humans do.