Two problems with ‘Simulators’ as a frame

[-]Jozdien3y147

My main issue with the terms ‘simulator’, ‘simulation’, ‘simulacra’, etc is that a language model ‘simulating a simulacrum’ doesn’t correspond to a single simulation of reality, even in the high-capability limit. Instead, language model generation corresponds to a distribution over all the settings of latent variables which could have produced the previous tokens, aka “a prediction”.

The way I tend to think of 'simulators' is in simulating a distribution over worlds (i.e., latent variables) that increasingly collapses as prompt information determines specific processes with higher probability. I don't think I've ever really thought of it as corresponding to a specific simulation of reality. Likewise with simulacra, I tend to think of them as any process that could contribute to changes in the behavioural logs of something in a simulation. (Related)
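A toy sketch of the "collapsing distribution over worlds" picture may make this concrete. Everything in it is an assumption for illustration: the two hypothetical worlds, their token probabilities, and the simplification that tokens are drawn independently given the world. It only shows that the prediction is a posterior-weighted mixture over latent settings, and that the posterior sharpens as the prompt grows.

```python
from fractions import Fraction

# Hypothetical latent "worlds": each is a process emitting tokens with fixed
# probabilities (assumed i.i.d. given the world, purely for illustration).
WORLDS = {
    "fairy_tale": {"once": Fraction(1, 2), "dragon": Fraction(2, 5), "stock": Fraction(1, 10)},
    "news_report": {"once": Fraction(1, 10), "dragon": Fraction(1, 10), "stock": Fraction(4, 5)},
}


def posterior(prompt, prior=None):
    """Posterior over worlds given the prompt tokens, via Bayes' rule."""
    weights = dict(prior or {w: Fraction(1, len(WORLDS)) for w in WORLDS})
    for token in prompt:
        for w in weights:
            weights[w] *= WORLDS[w][token]  # likelihood of this token under world w
    total = sum(weights.values())
    return {w: weights[w] / total for w in weights}


def next_token_distribution(prompt):
    """The prediction: each world's next-token distribution, weighted by the posterior."""
    post = posterior(prompt)
    vocab = list(next(iter(WORLDS.values())))
    return {t: sum(post[w] * WORLDS[w][t] for w in WORLDS) for t in vocab}


for prompt in ([], ["once"], ["once", "dragon"]):
    print(prompt, posterior(prompt))
```

With an empty prompt the two worlds are equally likely; after "once" the fairy-tale world holds 5/6 of the mass, and after "once dragon" it holds 20/21. At no point is a single world selected: the output is always a mixture.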

I’ve seen this mistake made frequently – for example, see this post (note that in this case the mistake doesn’t change the conclusion of the post).
[...]
this issue makes this terminology misleading.

I think that there were a lot of mistaken takes about GPT before Simulators, and that it's plausible the count just went down. Certainly there have been a non-trivial number of people I've spoken to who were making pretty specific mistakes that the post cleared up for them - they may have had further mistakes, but thinking of models as predictors didn't get them far enough to make those mistakes earlier. I think in general the reason I like the simulator framing so much is that it's a very evocative frame that gives you a more accessible understanding of GPT mechanics. There have certainly been insights I've had about GPT in the last year that I don't think thinking about next-token predictors would've evoked quite as easily.

[-]ryan_greenblatt3y5-1

The way I tend to think of 'simulators' is in simulating a distribution over worlds (i.e., latent variables) that increasingly collapses as prompt information determines specific processes with higher probability.

I agree this is the correct interpretation of the original post. It just doesn't match typical usage of the word "simulation" imo. (I'm sorry my post is making such a narrow pedantic point.)

I probably agree that simulators improved the thinking of people on lesswrong on average.

[-]Jozdien3y10

I don't disagree that there are people who came away with the wrong impression (though they've been at most a small minority of the people I've talked to; you've plausibly spoken to more people). But I think that might be owed more to generative models being intrinsically confusing to think about. Speaking of them purely as predictive models probably nets you points for technical accuracy, but I'd bet it would still lead to a fair number of people thinking about them the wrong way.

[-]evhub3yΩ693

I basically agree with this, and a lot of these are the sorts of reasons we went with "predictor" over "simulator" in "Conditioning Predictive Models."

[-]Raemon3yΩ230

I was a bit unsure whether to tag your posts with Simulator Theory. Do you endorse that or not?

[-]evhub3yΩ551

Yeah, I endorse that. I think we are very much trying to talk about the same thing, it's more just a terminological disagreement. Perhaps I would advocate for the tag itself being changed to "Predictor Theory" or "Predictive Models" or something instead.

[-]LawrenceC3yΩ131

I broadly agree with the points being made here, but allow me to nitpick the use of the word "predictive" here, and argue for the key advantage of the simulators framing over the prediction one:

Pretrained models don’t ‘simulate a character speaking’; they predict what comes next, which implicitly involves making predictions about the distribution of characters and what they would say next.

The simulators frame does make it very clear that there's a distinction between the simulator/GPT-3 and the simulacra/characters or situations it's making predictions about! On the other hand, using "prediction" can obscure the distinction, and end up with confused questions like "is GPT just an agent that just wants to minimize predictive loss?"

[-]Charlie Steiner3yΩ121

I think the biggest pitfall of the "simulator" framing is that it's made people (including Beth Barnes?) think it's all about simulating our physical reality, when exactly because of the constraints you mention (text not actually pinpointing the state of the universe, etc.), the abstractions developed by a predictor are usually better understood in terms of treating the text itself as the state, and learning time-evolution rules for that state.

[-]ryan_greenblatt3yΩ440

Thinking about the state and time evolution rules for the state seems fine, but there isn't any interesting structure with the naive formulation imo. The state is the entire text, so we don't get any interesting Markov chain structure. (You can turn any random process into a Markov chain where you include the entire history in the state! The interesting property was that the past didn't matter!)
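The parenthetical can be illustrated with a minimal sketch. The second-order XOR rule below is a hypothetical process chosen only because the last token alone is not a sufficient state for it. Bundling the whole history into the state makes the process Markov by construction, which is exactly why the construction carries no information.

```python
def next_token(history):
    """Toy second-order rule (an assumption for illustration): XOR of the last two tokens."""
    return history[-1] ^ history[-2]


def step(state):
    """Transition on the 'entire history' state.

    The next state depends only on the current state, so this is Markov,
    but only because the state already contains the whole past.
    """
    return state + (next_token(state),)


def successors_by_last_token(histories):
    """Group observed next tokens by the last token alone."""
    seen = {}
    for h in histories:
        seen.setdefault(h[-1], set()).add(next_token(h))
    return seen


histories = [(0, 0), (0, 1), (1, 0), (1, 1)]
by_last = successors_by_last_token(histories)
print(by_last)            # the same last token leads to different next tokens
print(step(step((0, 1))))  # the history-state just keeps growing
```

Here the last token does not determine the next one (both 0 and 1 follow either value), so the process is not Markov in the last token. The history-state version is Markov, but the state grows with every step and never forgets anything.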

[-]Charlie Steiner3yΩ130

Hm, I mostly agree. There isn't any interesting structure by default, you have to get it by trying to mimic a training distribution that has interesting structure.

And I think this relates to another way that I was too reductive, which is that if I want to talk about "simulacra" as a thing, then they don't exist purely in the text, so I must be sneaking in another ontology somewhere - an ontology that consists of features inferred from text (but still not actually the state of our real universe).

[-]LawrenceC3yΩ120

Nitpick: I mean, technically, the state is only the last 4k tokens or however long your context length is. Though I agree this is still very uninteresting.

[-]LawrenceC3yΩ132

The time-evolution rules of the state are simply the probabilities of the autoregressive model -- there's some amount of high level structure but not a lot. (As Ryan says, you don't get the normal property you want from a state (the Markov property) except in a very weak sense.)
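As a rough sketch of this "text as state" picture, the state can be taken to be the last few tokens and the evolution rule to be sampling from the model's conditional distribution. The `toy_model` function, its probabilities, and the tiny window size are hypothetical stand-ins, not anything from GPT-3 itself.

```python
import random

CONTEXT_LENGTH = 3  # hypothetical stand-in for a real context length (e.g. 4k tokens)


def toy_model(context):
    """Hypothetical stand-in for an autoregressive model.

    Maps a context to next-token probabilities; here it simply favours
    repeating the most recent token (an assumption for illustration).
    """
    last = context[-1]
    return {tok: (0.8 if tok == last else 0.2) for tok in ("a", "b")}


def evolve(state, rng):
    """One time-evolution step: sample a token, append it, truncate to the window."""
    probs = toy_model(state)
    tokens = list(probs)
    token = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
    return (state + (token,))[-CONTEXT_LENGTH:]


rng = random.Random(0)
state = ("a", "b", "a")
for _ in range(5):
    state = evolve(state, rng)
print(state)
```

The transition depends only on the truncated window, so it is Markov in that weak sense, but all of the structure lives inside the model's conditional probabilities rather than in the state representation.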

I also disagree that purely thinking about the text as state + GPT-3 as evolution rules is the intention of the original simulators post; there's a lot of discussion about the content of the simulations themselves as simulated realities or alternative universes (though the post does clarify that it's not literally physical reality), e.g.:

I can’t convey all that experiential data here, so here are some rationalizations of why I’m partial to the term, inspired by the context of this post:
The word “simulator” evokes a model of real processes which can be used to run virtual processes in virtual reality.
It suggests an ontological distinction between the simulator and things that are simulated, and avoids the fallacy of attributing contingent properties of the latter to the former.
It’s not confusing that multiple simulacra can be instantiated at once, or an agent embedded in a tragedy, etc.
[...]
The next post will be all about the physics analogy, so here I’ll only tie what I said earlier to the simulation objective.
the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum.
To know the conditional structure of the universe is to know its laws of physics, which describe what is expected to happen under what conditions.

I think insofar as people end up thinking the simulation is an exact match for physical reality, the problem was not in the simulators frame itself, but instead the fact that the word physics was used 47 times in the post, while only the first few instances make it clear that literal physics is intended only as a metaphor.

[-]cubefox3y10

I agree that the Oracle/Genie/Tool/Agent categories don't properly contain models trained on self-supervised objectives. Further, I think these aren’t particularly useful categories for current alignment research – we should just talk about how exactly we trained the model and what that behaviorally incentivizes.

Also notice that Simulators was written before fine-tuned models were widely available in the form of ChatGPT. All the arguments in the original post against interpreting LLMs as Oracle AIs no longer apply to instruction fine-tuned models like ChatGPT, which seems in fact to be a prototypical example of a non-superintelligent Oracle. Of course this kind of Oracle AI is just a fine-tuned probability distribution, where the numbers it assigns to next tokens no longer correspond to their probabilities, but to something else ("goodness relative to fine-tuning"?).

Anyway, fine-tuning might be an important issue from an alignment perspective, insofar as fine-tuning seems more likely to result in misalignment than the pure self-supervised imitation learning of the base model.

First, the amount of fine-tuning data (dialogue examples for SL and human preference ratings for RL) is much more limited than the massive data the self-supervised base model is trained on. This could make it much more likely that the model misgeneralizes (inner misalignment). E.g. it may deem calling someone a racial slur worse than cutting their hand off because all the RL fine-tuning emphasizes avoiding slurs rather than avoiding cut-off hands.

Second, RLHF especially may suffer from the political biases of human raters. This could lead the fine-tuned model to become deceptive and to lie about its beliefs when they are politically incorrect, which would be a case of outer misalignment.

See the appendix. ↩︎
Insofar as it’s useful to try to reason about what exact actions the pre-training objective incentivizes in particular cases. I’m not sold on this being considerably useful in most cases. ↩︎
Note that I disagree with quite a bit of the framing and emphasis of Conditioning Predictive Models. Don’t take this link as an endorsement! ↩︎
I think it’s about 90% sure based on doing some quick samples. ↩︎
This is also discussed in the Conditioning Predictive Models sequence ↩︎
