ws27a — LessWrong

Actually, Othello-GPT Has A Linear Emergent World Representation

I agree that it's capable of doing that, but it just doesn't do it. If you ask it to multiply two large numbers, it will often confidently give you an incorrect answer instead of using its incredible coding skills to just calculate the answer. If it were trained via reinforcement learning to maximize a more global and sophisticated goal than merely predicting the next word correctly or avoiding linguistic outputs that some humans have labelled as good or bad, it's very possible it would go ahead and invent these tools and start using them, simply because that is the path of least resistance towards its global goal. I think the real question is what that global goal is supposed to be, and maybe we even have to abandon the notion of training based on reward signals altogether. This is where we get into very murky and unexplored territory, but it's ultimately where the research community has to start looking. Just to conclude on my own position: I absolutely believe that GPT-like systems can be one component of a fully fledged AGI, but there are other crucial parts currently missing that we do not understand in the slightest.
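To make "just calculate the answer" concrete, here is a minimal sketch of handing a multiplication to exact integer arithmetic instead of predicting the digits. The helper name `multiply_exactly` and the prompt format it accepts are assumptions made up for illustration; nothing here describes how any GPT system actually works.

```python
import re
from typing import Optional

# Hypothetical prompt format: two non-negative integers separated by "*" or "x".
_MULTIPLICATION = re.compile(r"^\s*(\d+)\s*[*x]\s*(\d+)\s*$")


def multiply_exactly(prompt: str) -> Optional[str]:
    """Return the exact product as a string, or None if the prompt is not a multiplication."""
    match = _MULTIPLICATION.match(prompt)
    if match is None:
        return None
    left, right = (int(group) for group in match.groups())
    # Python integers have arbitrary precision, so the product is exact at any size.
    return str(left * right)


answer = multiply_exactly("123456789 * 987654321")
not_arithmetic = multiply_exactly("what is the capital of France?")
```

Here `answer` is `"121932631112635269"`, and `not_arithmetic` is `None` because the prompt does not match the assumed format.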

Actually, Othello-GPT Has A Linear Emergent World Representation

ws27a 3y 11

I agree with you, but natural intelligence seems to be set up in a way that incentivises the construction of subroutines and algorithms that can help solve problems, at least among humans. What I mean is that we humans invented the calculator when we realised our brains are not very good at arithmetic, and now we have this device which is sort of like a technological extension of ourselves. A proper AGI implemented in computer hardware should absolutely be able to implement a calculator of its own accord; the fact that it doesn't speaks to the ill-defined optimization criterion. If it were optimized not to predict the next word but towards some more global objective, it's possible it would start to do these things, including formulating theories and suggestions for making the world a better place. Not as some mere summary of what humans have written about, but bottom-up from what it can gather itself. Now, how we train such systems is completely unknown right now, and not many people are even looking in that direction. Many people still seem to think that scaling up GPT-like systems or tweaking RLHF will get us there, but I don't see how it will.

Actually, Othello-GPT Has A Linear Emergent World Representation

ws27a 3y 10

I am happy to consider a distinction between world models and n-gram models, but I still feel like there is a continuum of some sort if we look closely enough. n-gram models are sort of like networks with very few parameters. As we add more parameters to calculate the eventual probability in the softmax layer, at which point do the world models emerge? And when exactly do we term them world models? But I think we're on the same page with regard to the chess example. Your formulation of "GPT-4 does not care about learning chess" is spot on. And in my view that's the problem with GPT in general. All it really cares about is predicting words.

Actually, Othello-GPT Has A Linear Emergent World Representation

ws27a 3y 10

I think if we imagine an n-gram model where n approaches infinity and the size of the corpus we train on approaches infinity, such a model is capable of going beyond even GPT. Of course it's unrealistic, but my point is simply that surface-level statistics are, in principle, enough to imitate intelligence the way ChatGPT does.

Of course, literally storing probabilities of n-grams is a super poorly compressed way of doing things, and ChatGPT clearly finds more efficient solutions as it moves through the loss landscape trying to minimize next token prediction error. Some of those solutions are going to resemble world models in that features seem to be disentangled from one another in ways that seem meaningful to us humans or seem to correlate with how we view the world spatially or otherwise.
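For concreteness, here is a minimal sketch of what "literally storing probabilities of n-grams" looks like. The toy corpus and the function names `train_ngram` and `next_token_probs` are assumptions chosen for illustration, and the estimates are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n=2):
    """Count how often each token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def next_token_probs(counts, context):
    """Relative-frequency estimate of the next token; empty for unseen contexts."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}
    total = sum(followers.values())
    return {token: count / total for token, count in followers.items()}


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat the cat ran".split()
model = train_ngram(corpus, n=2)
seen = next_token_probs(model, ["the"])    # "cat" twice, "mat" once
unseen = next_token_probs(model, ["dog"])  # context never observed
```

The table has one entry per observed context and says nothing at all about `["dog"]`. That is the poor compression in question: covering more contexts requires storing more of them, whereas a parametric model has to share structure across contexts.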

But I would argue that that has likely been happening since we used multilayer perceptrons for next word prediction in the 80s or 90s. I don't think it's so obvious exactly when something is a world model and when it is not. Any neural network is an algorithm in the sense that the state of node A determines the state of node B (setting aside the randomness of dropout layers).

Any neural network is essentially a very complex decision tree. The divide that people are imagining between rule-based algorithmic following of a pattern and neural networks is completely artificial. The only difference is how we train the systems to find whatever algorithms they find.
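One way to read the decision-tree claim, at least for networks with piecewise-linear activations such as ReLU, is that each pattern of active and inactive units selects one branch of an ordinary if/else rule. The sketch below uses a made-up network with one input and two hidden units; the weights are hypothetical, and it illustrates the idea rather than proving it for networks in general.

```python
def relu(value):
    return max(0.0, value)


def tiny_net(x):
    """Hypothetical network: one input, two ReLU hidden units, summed output."""
    h1 = relu(1.0 * x - 1.0)    # active only when x > 1
    h2 = relu(-1.0 * x - 1.0)   # active only when x < -1
    return h1 + h2


def as_branches(x):
    """The same function written as explicit rules, one per activation pattern."""
    if x > 1.0:
        return x - 1.0
    if x < -1.0:
        return -x - 1.0
    return 0.0


samples = [-3, -1, 0, 0.5, 1, 2.5]
agree = all(tiny_net(x) == as_branches(x) for x in samples)
```

With two hidden units there are only three reachable branches. The number of possible activation patterns can grow very quickly with the number of units, which is where the "very complex" part comes in.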

To me, it would be interesting if ChatGPT developed an internal algorithm for playing chess (for example), such that it could apply that algorithm consistently no matter the sequence of moves being played. However, as we know, it does not do this. What might happen is that ChatGPT develops something akin to spatial awareness of the chess board that can perhaps be applied to a very limited subset of move orders in the game.

For example, it's possible that it will understand that if e3 is passive and e4 is more ambitious, then pushing the pawn further to e5 is even more ambitious. It's possible that it learns that the center of the board is important and that it uses some kind of spatial evaluation that relates to concepts like that. But we also see that its internal chess model breaks down completely when we are outside of common sequences of play. If you play a completely novel game, sooner or later, it will hallucinate an illegal move.

No iteration of GPT will ever stop doing that; it will just take longer and longer before it comes up with an illegal move. An actual chess engine can continue suggesting legal moves forever. For me, this points to a fundamental flaw in how GPT-like systems work and basically explains why they are not going to lead towards AGI. Optimizing merely for next word prediction cannot and will never incentivize learning robust internal algorithms for chess or any other game. It will just learn algorithms that sometimes work for some cases.

I think the research community needs to start pondering how we can formulate architectures that incentivise building robust internal algorithms and world models, not just as a "lucky" side effect of gradient descent coupled with a simplistic training objective.

"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman

ws27a 3y 10

I don't understand why Eliezer changed his perspective about the current approach of Transformer next-token prediction not being the path towards AGI. It should not be surprising that newer versions of GPT will asymptotically approach a mimicry of AGI, but that shouldn't convince anyone that they are going to break through that barrier without a change in paradigm. None of the intelligent organisms we know of have imitation as their primary optimization objective; their objective function is basically to survive or avoid pain. As a result, they of course form sub-goals which might include imitation, but only to the extent that it is instrumental to their survival. Optimizing 100% for imitation does not lead to AGI, because how can novelty emerge from nothing but imitation?

Actually, Othello-GPT Has A Linear Emergent World Representation

ws27a3yΩ-111

Nice work. But I wonder why people are so surprised that these models and GPT would learn a model of the world. Of course they learn a model of the world. Even the skip-gram and CBOW word vectors people trained ages ago modelled the world, in the sense that, for example, the positions of named entities in vector space would be highly correlated with actual spatial/geographical maps. It should be 100% assumed that these models, which have many orders of magnitude more parameters, are learning much more sophisticated models of the world. What that tells us about their "intelligence" is an entirely different question. They are still statistical next token predictors; it's just that the statistics are so complicated that they essentially become a world model. The divide between these concepts is artificial.
