In last week's post, Meditations on Margarine, I explained how "I've awakened ChatGPT" is a perfectly reasonable claim. The error is in assuming "awakened" results in a human-like consciousness, rather than a "margarine mind".
In today's post, I want to explain where the common errors do occur: the belief that LLMs are a reliable source of insight, rather than a lottery or "gacha game".
Note: this post focuses on the ways LLM usage fails. There are valuable ways to use LLMs, but they're not the focus here.
LLMs are best thought of as slot machines: the payout is random.
"Gacha" refers to games where your rewards are randomized: each time you pull the lever on an LLM, you're getting a random reward. Common to the "gacha" model is the idea that some rewards are more valuable, and correspondingly rarer. If you pull the lever 100 times, you might get 90 obvious ideas, 9 interesting ones, and 1 really brilliant insight.
LLMs work much the same way: usually the output is about what you'd expect from a Wikipedia summary of the topic. "The sky appears blue due to Rayleigh scattering." Occasionally, you get a more useful insight: "Explain it like I'm five" might produce the much more intuitive answer that "air is blue"; Rayleigh scattering is just a complex way of explaining why gases have colors.
If your first interaction is an obvious failure, it's easy to write LLMs off as useless. If your first interaction is a brilliant insight that you can confirm for yourself, it's easy to think that further brilliant-sounding results are valid.
In reality, it's a mix: sometimes they get "count the 'b's in 'blueberry'" wrong, and sometimes they win gold at the International Math Olympiad. You're making an error if you think either of those is representative of the average result.
On average, the attention mechanism finds the most statistically likely response, given the training data. In other words, it says what everyone usually says. In this case: "The sky appears blue due to Rayleigh scattering."
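As a loose illustration of "says what everyone usually says" (toy, hand-picked numbers standing in for whatever the training data actually induces), greedy decoding just takes the single most likely continuation every time:

```python
# Toy next-phrase probabilities for the prompt "Why is the sky blue?".
# Hypothetical values for illustration; a real model computes these from its weights.
next_phrase_probs = {
    "due to Rayleigh scattering": 0.72,
    "because air is blue": 0.05,
    "because of the ocean's reflection": 0.02,
    # ...remaining probability mass spread across other continuations
}

# Greedy decoding: always pick the most likely continuation, i.e. the standard answer.
most_likely = max(next_phrase_probs, key=next_phrase_probs.get)
print(f"The sky appears blue {most_likely}")
```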
Even when the gacha pays off, you might not realize it. The gacha rewards aren't labeled. There is no secret "silver star" to indicate you've actually gotten an above-average explanation.
Say you ask your LLM why the sky is blue, and it responds "because air is blue". A lot of people are going to consider this a failure; they were expecting the standard, overly complicated answer about Rayleigh scattering! It's easy to think that since the normal explanation is very complex and scientific-sounding, the simple answer must be wrong.
The more you know about a topic, the more you can learn from a good research paper. But without any prior knowledge, you're probably not going to be able to identify which scientific papers have held up to the scrutiny of history, and which ones turned out to be misguided (or even deliberate frauds).
Entire fields of science have been misled by a single fraudulent paper. It's not enough to just identify errors, though; plenty of correct papers have errors, too. You have to be able to identify whether the actual conclusion is true or false. If you can reliably do that just from reading a paper, in a field you know nothing about, there are a number of million-dollar opportunities available to you.
Consequently, the value of LLM outputs depends on both your expectations and your own model of reality. If you look it up and learn that yes, air really is blue, you've gotten a good gacha draw! If you just take it for granted? Well, maybe I'm lying; maybe there's a reason everyone feels the need to drag Rayleigh scattering into a simple question about color.
Nothing in the nature of an LLM allows you to skip the hard work of actually verifying the results.
Tell your LLM to "write a brilliant academic paper". It will write something that sounds like and is tonally consistent with a brilliant academic paper. It will make a bunch of remarkably confident claims, because that is what brilliant academics do. It will use a bunch of very precise academic language, because you said "academic". It will also be completely fake, because "accurately describes reality" is not a feature of "a brilliant academic paper". At least, not the way LLMs understand it.
I call this "VIBES": Very Impressive But Essentially Specious.
For example, if you ask it to explain consciousness, it might give you something like this: "The emergence of consciousness can be understood through recursive self-modeling in complex systems that achieve sufficient hierarchical depth to generate phenomenological experience." If you've read my previous post, that's not an unreasonable summary. Otherwise, it's complete nonsense. You can't extract any actual understanding from that series of words. It's like a prop knife in a movie: great for acting, useless for cooking.
The LLM has done exactly what it was trained to do: it has produced something consistent with the idea and character of "a brilliant academic paper." If you want a brilliant academic paper for your novel, it will probably fit right in, and help establish that your character is very brilliant.
If, on the other hand, you want to learn about reality, then you're in a Catch-22: if you know enough to actually verify the paper, you probably didn't need the LLM to generate it for you. If you don't know enough to work out the results without an LLM, an LLM can't actually help you produce new knowledge.
You might really have the most brilliant paper ever, but without any way to confirm that, it's useless.
If the experts wanted to sort through gacha draws, they could pull the lever themselves.