(Related text posted to Twitter; this version is edited and has a more advanced final section.)
Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.
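As a rough illustration of what "assign as much probability mass to the next token as possible" means as a training objective, here is a minimal Python sketch of next-token log loss. At each position the model is charged the negative log of the probability it gave to the token that actually came next, so more mass on the true token means lower loss. The toy vocabulary, the bigram table, and the function names are hypothetical; a real GPT computes this over tens of thousands of subword tokens, with a neural network in place of the lookup table.

```python
import math
from typing import Callable, Dict, List

# A model maps a context (the tokens seen so far) to a probability for each possible next token.
Model = Callable[[List[str]], Dict[str, float]]


def next_token_loss(model: Model, tokens: List[str]) -> float:
    """Average negative log-probability (in nats) the model gave to each true next token."""
    total = 0.0
    for i in range(1, len(tokens)):
        p = model(tokens[:i]).get(tokens[i], 0.0)
        if p <= 0.0:
            return math.inf  # zero mass on what actually came next is infinitely penalized
        total -= math.log(p)
    return total / (len(tokens) - 1)


VOCAB = ["the", "cat", "sat", "on", "mat"]

# Hypothetical next-token probabilities, keyed on the previous token only.
BIGRAMS = {
    "the": {"cat": 0.5, "mat": 0.5},
    "cat": {"sat": 1.0},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
}


def uniform_model(context: List[str]) -> Dict[str, float]:
    return {t: 1 / len(VOCAB) for t in VOCAB}


def bigram_model(context: List[str]) -> Dict[str, float]:
    return BIGRAMS.get(context[-1], {})


text = ["the", "cat", "sat", "on", "the", "mat"]
print(next_token_loss(uniform_model, text))  # ln 5, about 1.609
print(next_token_loss(bigram_model, text))   # 2 ln 2 / 5, about 0.277
```

The better-informed model scores better only because it concentrated probability on what was actually written, not because it can produce plausible-looking text of its own.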
Koan: Is this a task whose difficulty caps out at human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)
Consider that somewhere on the internet is probably a list of thruples: <product of 2 prime numbers, first prime, second prime>.
GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:
There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.
Indeed, in general, you've got to be more intelligent to predict particular X than to generate realistic X. GPTs are being trained on a much harder task than GANs.
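The asymmetry in the prime-number example can be made concrete with a small Python sketch. Writing a realistic line of that list costs one multiplication; predicting the last two entries from the first means factoring the product. The function names and the toy prime range are illustrative assumptions, and trial division stands in for whatever factoring method a predictor might use.

```python
import random
from typing import Tuple


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def generate_triple(lo: int, hi: int, rng: random.Random) -> Tuple[int, int, int]:
    """Generate a realistic <product, first prime, second prime> line: pick two primes, multiply."""
    primes = [n for n in range(lo, hi) if is_prime(n)]
    p, q = sorted(rng.sample(primes, 2))
    return (p * q, p, q)


def predict_factors(product: int) -> Tuple[int, int]:
    """Predict the two primes that follow a product. Without a shortcut this means factoring;
    trial division needs up to about sqrt(product) steps, exponential in the number of digits."""
    d = 2
    while d * d <= product:
        if product % d == 0:
            return (d, product // d)
        d += 1
    return (product, 1)  # no divisor found: the input was itself prime


rng = random.Random(0)
n, p, q = generate_triple(1000, 2000, rng)
print(n, predict_factors(n) == (p, q))
```

With primes of a few hundred digits, no known classical algorithm makes predict_factors tractable, while generating such lines stays easy.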
Same spirit: <Hash, plaintext> pairs, which you can't predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN's discriminator about it (assuming a discriminator that had learned to compute hash functions).
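A minimal Python sketch of the hash version of the same asymmetry, assuming SHA-256 as the hash (the post does not name one) and a deliberately tiny space of candidate plaintexts so the search finishes. Producing a pair takes one hash call; predicting the plaintext from the hash, absent a break in the hash function, falls back to searching candidates.

```python
import hashlib
import itertools
import string
from typing import Optional, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def generate_pair(plaintext: str) -> Tuple[str, str]:
    """Generate a valid <hash, plaintext> pair: pick any plaintext and hash it once."""
    return (sha256_hex(plaintext), plaintext)


def predict_plaintext(digest: str, max_len: int = 3) -> Optional[str]:
    """Predict the plaintext that follows a hash. With no shortcut through the hash function,
    the general fallback is brute-force search, here over lowercase strings up to max_len."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            if sha256_hex(candidate) == digest:
                return candidate
    return None  # the true plaintext lies outside the searched space


digest, word = generate_pair("cat")
print(predict_plaintext(digest))  # cat
```

Plaintexts of realistic length make the candidate space astronomically large, which is why this search is hopeless in general.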
Consider that some of the text on the Internet isn't humans casually chatting. It's the results section of a science paper. It's news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.
As Ilya Sutskever compactly put it, to learn to predict text, is to learn to predict the causal processes of which the text is a shadow.
Lots of what's shadowed on the Internet has a *complicated* causal process generating it.
Consider that sometimes human beings, in the course of talking, make errors.
GPTs are not being trained to imitate human error. They're being trained to *predict* human error.
Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.
If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you'll make, and play those errors. If the actress guesses correctly, it doesn't mean the actress is just as error-prone as you.
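A toy Python sketch of the predictor/actress distinction, with a hypothetical person who adds numbers digit by digit but never carries. The predictor fits a model of the person from their past answers (here by choosing between two hand-written candidate models, an illustrative simplification of real inference) and then predicts the wrong answer the person will give, while its own exact arithmetic is unaffected.

```python
from typing import Callable, List, Tuple

AddFn = Callable[[int, int], int]


def exact_add(a: int, b: int) -> int:
    return a + b


def no_carry_add(a: int, b: int) -> int:
    """The hypothetical person's systematic error on non-negative ints: add each digit column, drop the carry."""
    result, place = 0, 1
    while a > 0 or b > 0:
        result += ((a % 10 + b % 10) % 10) * place
        a, b, place = a // 10, b // 10, place * 10
    return result


def fit_person_model(observations: List[Tuple[int, int, int]]) -> AddFn:
    """Pick the candidate model that reproduces the most of the person's past answers."""
    candidates = [exact_add, no_carry_add]
    return max(candidates, key=lambda f: sum(f(a, b) == ans for a, b, ans in observations))


# Hypothetical past answers: 17+25 -> 32, 38+14 -> 42, 21+13 -> 34
person = fit_person_model([(17, 25, 32), (38, 14, 42), (21, 13, 34)])
print(person(46, 57), exact_add(46, 57))  # predicted error 93, true sum 103
```

Correctly predicting the 93 is evidence about the predictor's model of the person, not evidence that the predictor itself can't add.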
Consider that a lot of the text on the Internet isn't extemporaneous speech. It's text that people crafted over hours or days.
GPT-4 is being asked to predict it in 200 serial steps or however many layers it's got, just as if a human were extemporizing their immediate thoughts.
A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.
Or maybe simplest:
Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."
Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -
Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.
The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.
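For a sense of scale in the units a language-model loss uses, a minimal Python sketch. Treating 'mongo' as a single token for simplicity, a 20% prediction costs about 2.3 bits of surprisal, versus about 15.6 bits for a predictor that spreads probability evenly over its vocabulary. The 50,000-token vocabulary is an illustrative assumption, not a figure from the post.

```python
import math


def surprisal_bits(p: float) -> float:
    """Loss, in bits, charged when the true next token was assigned probability p."""
    return -math.log2(p)


VOCAB_SIZE = 50_000  # illustrative assumption about vocabulary size

mind = surprisal_bits(0.20)                # the Mind that gives 'mongo' 20%
clueless = surprisal_bits(1 / VOCAB_SIZE)  # a predictor with no idea what you'll say
print(f"{mind:.2f} bits vs {clueless:.2f} bits")  # 2.32 bits vs 15.61 bits
```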
When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.
GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.
Figuring out that your next utterance is 'mongo' is not mostly a question, I'd guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying 'mongo'. Figuring out exactly who's talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say 'mongo'.
When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be. It's not a problem that is best solved by "have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be".
GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.
GPTs are not Imitators, nor Simulators, but Predictors.
I think an issue is that GPT is used to mean two things:
[See the Appendix]
The latter kind of GPT is what I think is rightly called a "Simulator".
From @janus' Simulators (italicised by me):...
Predictors are (with a sampling loop) simulators! That's the secret of mind
While the claim - the task ‘predict next token on the internet’ absolutely does not imply learning it caps at human-level intelligence - is true, some parts of the post and reasoning leading to the claims at the end of the post are confused or wrong.
Let’s start from the end and try to figure out what goes wrong.
From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is minimising prediction error with regard to sensory inputs. The unbounded version of the task is basically of the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked at ...
I don't see how the comparison of hardness of 'GPT task' and 'being an actual human' should technically work - to me it mostly seems like a type error.
- The task 'predict the activation of photoreceptors in human retina' clearly has the same difficulty as 'predict next word on the internet' in the limit. (cf Why Simulator AIs want to be Active Inference AIs)
- Maybe you mean something like task + performance threshold. Here 'predict the activation of photoreceptors in human retina well enough to be able to function as a typical human' is clearly less difficult than task + performance threshold 'predict next word on the internet, almost perfectly'. But this comparison does not seem to be particularly informative.
- Going in this direction we can make comparisons between thresholds closer to reality, e.g. 'predict the activation of photoreceptors in human retina, and do other similar computation well enough to be able to function as a typical human' vs. 'predict next word on the internet, at the level of GPT4'. This seems hard to order - humans are usually able to do the human task and would fail at the GPT4 task at GPT4 level; GPT4 is able to do the GPT4 task and would fail at...
I will try to explain Yann LeCun's argument against auto-regressive LLMs, which I agree with. The main crux of it is that being extremely superhuman at predicting the next token from the distribution of internet text does not imply the ability to generate sequences of arbitrary length from that distribution.
GPT4's ability to impressively predict the next token depends very crucially on the tokens in its context window actually belonging to the distribution of internet text written by humans. When you run GPT in sampling mode, every token you sample from it takes it ever so slightly outside the distribution it was trained on. At each new generated token it still assumes that the past 999 tokens were written by humans, but since its actual input was generated partly by itself, as the length of the sequence you wish to predict increases, you take GPT further and further outside of the distribution it knows.
The most salient example of this is when you try to make ChatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess boo...
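The argument is often summarized with a simple model, sketched here in Python: if every sampled token independently has some probability e of being an error that pushes the text off-distribution, the chance that an n-token continuation contains no such error is (1 - e)^n, which decays exponentially in n. The independence assumption and the value e = 0.01 are illustrative, and the model ignores the possibility of noticing and repairing an error, which is the point the replies raise.

```python
def p_no_error(e: float, n: int) -> float:
    """Chance that n sampled tokens contain no error, if each token independently errs with prob e."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a probability in [0, 1]")
    return (1.0 - e) ** n


for n in (10, 100, 1000):
    print(n, p_no_error(0.01, n))
```

With e = 0.01 this gives roughly 0.90, 0.37 and 0.00004 for n = 10, 100 and 1000.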
Is this a limitation in practice? Rap battles are a bad example because they happen to be an exception, a task premised on being "one shot" and real time, but the overall point stands. We ask GPT to do tasks in one try, one step, that humans do with many steps, iteratively and recursively.
Take this "the queen was captured" problem. As a human I might be analyzing a game, glance at the wrong move, think a thought about the analysis premised on that move (or even start writing words down!) and then notice the error and just fix it. I am doing this right now, in my thoughts and on the keyboard, writing this comment.
Same thing works with ChatGPT, t...
When I prompt GPT-5 it's already out of distribution because the training data mostly isn't GPT prompts, and none of it is GPT-5 prompts. If I prompt with "this is a rap battle between Dath Ilan and Earthsea", that's not a high-likelihood sentence in the training data. And then the response is also out of distribution, because the training data mostly isn't GPT responses, and none of it is GPT-5 responses.
So why do we think that the responses are further out of distribution than the prompts?
Possible answer: because we try to select prompts that work well, with human ingenuity and trial and error, so they will tend to work better and be effectively closer to the distribution. Whereas the responses are not filtered in the same way.
But the responses are optimized only to be in distribution, whereas the prompts are also optimized for achieving some human objective like generating a funny rap battle. So once the optimizer achieves some threshold of reliability the error rate should go down as text is generated, not up.
This argument seems to depend on:
I'm not an expert in this topic, but it seems to me that "doomed" is the wrong word. LLMs aren't the fastest or most reliable way to compute 2+2, but it is going to become trivial for them to access the tool that is the best way to perform this computation. They will be able to gather data from the outside world using these plugins. They will be able to launch fine-tuning and training processes and interact with other pre-trained models. They will be able to interact with robotics and access cloud computing resources.
LLMs strike me as analogous to the cell. Is a cell capable of vision on its own? Only in the most rudimentary sense of having photoresponsive molecules that trigger cell signals. But cells that are configured correctly can form an eye. And we know that cells have somehow been able to evolve themselves into a functioning eye. I don't see a reason why LLMs, perhaps in combination with other software structures, can't form an AGI with some combination of human and AI-assisted engineering.
(There is a Paul Christiano response over at the EA forum.)
Very minor point, but humans can rap battle on the fly: https://youtu.be/0pJRmtWNP1g?t=158
From @janus' Simulators:
I tried (poorly) to draw attention to this thesis in my "The Limit of Language Models".
I honestly don't see the relevance of this.
OK, yes, to be a perfect text predictor, or even an approximately perfect text predictor, you'd have to be very smart and smart in a very weird way. But there's literally no reason to think that the architectures being used can ever get that good at prediction, especially not if they have to meet any realistic size constraint and/or are restricted to any realistically available amount of training input.
What we've seen them do so far is to generate vaguely plausible text, while making many mistakes of a kind that the sources of their training input would never actually make. It doesn't follow that they can or will actually become unboundedly good predictors of humans or any other source of training data. In fact I don't think that's plausible at all.
It definitely fails in some cases. For example, there's surely text on the Internet that breaks down RSA key generation, with examples. Therefore, to be a truly perfect predictor even of the sort of thing that's already in the training data, you'd have to be able to complete the sentence "the prime factors of the hexadecimal integer 0xda52ab1517291d1032f91532c54a221a...
You're making a claim about both:
It sounds like you agree that the relevant cognitive capabilities are likely to exist, though maybe not for prime number factorization, and that it's unclear whether they'd fit inside current architectures.
I do not read Eliezer as making a claim that future GPT-n generations will become perfect (or approximately perfect) text predictors. He is specifically rebutting claims others have made, that GPTs/etc cannot become ASI, because e.g. they are "merely imitating" human text. This is not obviously true; to the extent that there exist some cognitive capabilities which are physically possible to instantiate in GPT-n model weights which can solve these prediction problems, and are within the region of possible outcomes of our training regimes (+ the data used for them), then it is possible that we will find them.
This seems like it's assuming the conclusion (that reaching dangerous capabilities using these architectures is implausible). Eliezer did consider it unlikely, though GPT-4 was a negative update in that regard.
This seems like it's assuming that the system ends up outer-aligned.
I think that bringing up the extreme difficulty of approximately perfect prediction, with a series of very difficult examples, and treating that as interesting enough to post about, amounts to taking it for granted that it is plausible that these architectures can get very, very good at prediction.
I don't find that plausible, and I'm sure that there are many, many other people who won't find it plausible either, once you call their attention to the assumption. The burden of proof falls on the proponent; if Eliezer wants us to worry about it, it's his job to make it plausible to us.
It might be. I have avoided remembering "alignment" jargon, because every time I've looked at it I've gotten the strong feeling that the whole ontology is completely wrong, and I don't want to break my mind by internalizing it.
It assumes that it ends up doing what you were trying to train it to do. That's not guaranteed, for sure... but on the other hand, it's not guaranteed that it won't. I mean, the whole line of argument assumes that it gets incredibly good at what you were trying to train it to do. And all I said was "it's not obvious that you have a problem". I was very careful not to say that "you don't have a problem".
I agree that the post makes somewhat less sense without the surrounding context (in that it was originally generated as a series of tweets, which I think were mostly responding to people making a variety of mistaken claims about the fundamental limitations of GPT/etc).
Referring back to your top-level comment:
The relevance should be clear: in the limit of capabilities, such systems could be dangerous. Whether the relevant threshold is reachable via current methods is unknown - I don't think Eliezer thinks it's overwhelmingly likely; I myself am uncertain. You do not need a system capable of reversing hashes for that system to be dangerous in the relevant sense. (If you disagree with the entire thesis of AI x-risk then perhaps you disagree with that, but if so, then perhaps mention that up-front, so as to save time arguing about things that aren't actually cruxy for you?)...
When people call GPT an imitator, it's intended to guide us to accurate intuitions about how to predict its outputs. If I consider GPT as a token predictor, that does not help me very much to predict whether it will output computer code successfully implementing some obscure modified algorithm I've specified in natural language. If I consider it as an imitator, then I can guess that the departures of my modified algorithm from what it's likely to have encountered on the internet means that it's unlikely to produce a correct output.
I work in biology, and it's common to anthropomorphize biological systems to generate hypotheses and synthesize information about how such systems will behave. We also want to be able to translate this back into concrete scientific terms to check if it still makes sense. The Selfish Gene does a great job at managing this back and forth, and that's part of why it's such an illuminating classic.
I think it's useful to promote a greater understanding of GPT-as-predictor, which is the literal and concrete truth, but it is better to integrate that with a more intuitive understanding of GPT-as-imitator, which is my default for figuring out how to work with the technology in practice.
That's an empirical question that interpretability and neuroscience should strive to settle (if only they had the time). Transformers are acyclic; the learned algorithm just processes a single relatively small vector one relatively simple operation at a time, several dozen times. Could be that what it learns to represent are mostly the same obvious things that the brain learns (or is developmentally programmed) to represent, until you really run wild with the scaling, beyond mere ability to imitate internal representations of thoughts and emotions of every human in the world. (There are some papers that correlate transformer embeddings with electrode array readings from human brains, but this obviously needs more decades of study and better electrode arrays to get anywhere.)
Naive question: can you predict something without simulating it?
See "good regulator theorem," and various LW discussion (esp. John Wentworth trying to fix it). For practical purposes, yes, you can predict things without simulating them. The more revealing of the subject your prediction has to get, though, the more of an isomorphism to a simulation you have to contain.
But when you say Simulator, with caps, people will generally take you to be talking about janus' Simulators post, which is not about the AI predicting people by simulating them in detail, but is instead about the AI learning dynamics of text (analogous to how the laws of physics are dynamics of the state of the world), and predicting text by stepping forward these dynamics.
A bit late in commenting and I understand the "mongo" example was pointing at a more general concept, but I decided to check in on the current state of prediction. Not perfect, n=1, could certainly be set out better, but thought I'd give this a whirl:
Hello, I'd like to test your predictive ability on something interesting and novel. May we?
Hello! Of course, I'd be happy to help you with your prediction. What would you like me to predict?
First, some context — I'm an American tech CEO. I like and have read a lot of classical philoso...
I can imagine a world where LLMs tend to fall into local maxima where they get really good at imitation or simulation, and then they plateau (perhaps only until their developers figure out what adjustments need to be made). But I don't have a good enough model of LLMs to be very sure whether that will happen or not.
Yes, predicting some sequences can be arbitrarily hard. But I have doubts that LLM training will try to predict very hard sequences.
Suppose that some sequences are not only difficult but impossible to predict, because they're random? I would expect that with enough training, it would overfit and memorize them, because they get visited more than once in the training data. Memorization rather than generalization seems likely to happen for anything particularly difficult?
Meanwhile, there is a sea of easier sequences. Wouldn't it be more "evolutionarily profit...
I think the devil may be in the details here. Ask GPT to hash you a word (let alone guess which word was the origin of a hash), it'll just put together some hash-like string of tokens. It's got the right length and the right character set (namely, it's a hex number), but otherwise, it's nonsense.
This ties into the whole "does it understand" question because it's a very simple example of a simple prediction question with a very deep underlying complexity in which a GPT doesn't perform much better than an N-gram Markov chain. Because there is a lot of comple...
It is much, much easier for me to predict a text if I have seen a lot of similar texts beforehand, compared to if I have never seen such a text, and need to model the mind that is writing it with their knowledge and intentions, and the causal relations, to generate the result myself. I think the prime numbers are a good illustration here. I can easily imagine a machine learning algorithm that has seen people list prime numbers in order, and you can give it the first couple of primes and it will spit out the next couple, while having no idea what prime numb...
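The contrast this comment draws can be sketched in Python, with a hypothetical memorized list standing in for training text. A predictor that only matches the prefix against what it has seen continues the list correctly inside that range and has nothing to say past its end, whereas one that computes primality keeps going. Both functions are illustrative and make no claim about how an LLM works internally.

```python
from typing import List, Optional

# Hypothetical "training data": the only prime listing this toy predictor has seen.
MEMORIZED = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def continue_by_lookup(prefix: List[int]) -> Optional[int]:
    """Continue a sequence by matching it against memorized text; no notion of primality."""
    k = len(prefix)
    for start in range(len(MEMORIZED) - k):
        if MEMORIZED[start:start + k] == prefix:
            return MEMORIZED[start + k]
    return None  # prefix never seen with a continuation


def continue_by_rule(prefix: List[int]) -> int:
    """Continue a non-empty list of primes by actually computing the next prime."""
    n = prefix[-1] + 1
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n


print(continue_by_lookup([23, 29]), continue_by_rule([23, 29]))  # None 31
```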
You are missing a whole stage of ChatGPT training. They are first trained to predict words, but then they are reinforced by RLHF. This means they are trained to be rewarded for answering in a format that human evaluators are expected to rate as a "good response". Unlike the text being predicted, which might come from some random minds, here the focus is clear, and the reward function reflects the generalized preferences of OpenAI content moderators and content policy makers. This is the stage where a text predictor acquires its value system and preferences, thi...
It seems to me that imitation requires some form of prediction in order to work. First make some prediction of the behavioral trajectory of another agent; then try to minimize the deviation of your own behavior from an equivalent trajectory. In this scheme, prediction constitutes a strict subset of the computational complexity necessary to enable imitation. How would GPT's task flip this around?
And if prediction is what's going on, in the much-more-powerful-than-imitation sense, what sort of training scheme would be necessary to produce pure imitation without also training the more powerful predictor as a prerequisite?
This seems like a testable hypothesis. What would it take to train a GPTx on Eliezer's writings and compare its output with the original? And then check if the EliezerGPT is immeasurably smarter than the original?
Alternatively, since predicting Eliezer is in a way like inverting a one-way function, GPTx might top out way below the reasonably accurate predictability level, unless P=NP.