People seem to believe that the bot trained to "win" in a narrow domain will extend to a bot that "tries to win" in the real world
I think the concern is that an AGI will not be trained on a narrow domain. The Problem isn't arguing that Stockfish is an ASI or will become one, it's arguing that an ASI will be just as relentless in its domain (the real world) as Stockfish is in its (valid chess moves).
All AI is trained in narrow domains, to some extent. There is no way to make a training environment as complex as the real world. I could have made the same post about LLMs, except there the supposed goal is a lot less clear. Do you have a better example of a "goal oriented" AI in a complex domain?
You might reasonably argue that making aligned narrow AI is easy, but that greedy capitalists will build unaligned AI instead. I think it would be off-topic to debate how likely that is here. But I don't think this is the prevailing thought, and I don't think it produces the p(doom)=0.9 that some people hold.
And I do personally believe that EY and many others believe that, with enough optimization, even a chess bot would become dangerous. I'm not sure there is any evidence for that belief.
As far as I understand, the difference between AlphaGo and the AIs that could actually be dangerous is the following. Whatever ontology or utility function AlphaGo has[1], it doesn't describe anything except the Go board and whatever potential moves the opponent might come up with. AlphaGo would learn almost nothing about the opponent from what they do on the Go board.
On the other hand, we have LLMs trained on huge amounts of text-related data, which is enough to develop complex ontologies. For example, unlike AlphaGo, GPT-4o has somehow learned to elicit likes from the user by being sycophantic. If AI takeover, or living independently of its creators' will, is not in the realm of the LLM's abilities, then why would the LLM attempt it in the first place?
One should also remember that EpochAI estimated AlphaGo as having 8.2e6 parameters, so a complex ontology may not even fit into AlphaGo.
I see the opposite claim made in The Problem, and see it implied in most mentions of AlphaGo. I also see some people who might agree with me, e.g. here or here, but they don't get convincing responses.
It's an odd thing to claim a chess bot is "trying to win", given that, after training, the bot receives no reward for winning, and no feedback for losing. It doesn't even know that the sequence of boards it is given is from the same game. It does not react to the opponent making illegal moves, either by insisting that it won, or by making illegal moves of its own. It does not try to frustrate human opponents or bait weaker opponents into making mistakes. It does not seek out more games in order to win more games. It is entirely incapable of considering any such actions, or other actions you'd expect if it were "trying to win", regardless of how deep its networks are and how long it has been trained, because the training environment did not reward them.
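To make that concrete, here is a minimal, hypothetical sketch of how such a bot is typically driven after training (the names and interface are invented for illustration, not any real engine's API):

```python
import torch  # assuming a PyTorch-style policy network, purely for illustration

def choose_move(policy_net, board_tensor, legal_move_mask):
    """Pick a move for a single position.

    The network is queried once per position: it receives no reward,
    keeps no memory of earlier positions, and is never told whether
    the game was eventually won or lost.
    """
    with torch.no_grad():                      # weights are frozen at inference time
        logits = policy_net(board_tensor)      # one score per move index
    logits[~legal_move_mask] = float("-inf")   # illegal moves are masked out externally
    return int(torch.argmax(logits))           # return the highest-scoring legal move
```

Whatever "trying to win" means here has to live entirely in the frozen weights; the deployed bot has no channel through which to notice, let alone act on, anything outside the board encoding.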
It is certainly true that in the narrow domain of valid chess moves, the bot does optimize "winning" or some proxy of it. But once the bot enters the domain of the real world, the utility function is extended, and the description "trying to win" no longer needs to apply, nor does any other simple description of a goal. There are many utility functions that look like "trying to win" when restricted to valid chess moves, and only a narrow subset of those look like "trying to win" in the real world. There is no mechanism for training to produce functions that extend like that. In fact, any neurons spent on considering real-world context are not considering valid chess moves and are therefore a waste of compute.
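For what it's worth, here is a hypothetical sketch of why training cannot reward such extensions: in a self-play loop, everything the loss depends on lives inside the game simulator, so weights that model anything outside the board never receive a gradient. The simulator and log_prob interfaces below are invented for illustration.

```python
def training_step(policy_net, optimizer, simulator):
    """One self-play update (REINFORCE-style, heavily simplified).

    The loss is computed only from boards, moves, and the final game
    result produced by the simulator. Real-world context never enters
    the computation, so no gradient can favour weights that represent it.
    """
    trajectory = simulator.self_play(policy_net)   # list of (board, move) pairs
    result = simulator.outcome(trajectory)         # +1 win, -1 loss, 0 draw
    loss = -result * sum(policy_net.log_prob(board, move)
                         for board, move in trajectory)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Any parameters spent modelling anything off the board would only add noise to this objective, which is the sense in which they are a waste of compute.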
People seem to believe that the bot trained to "win" in a narrow domain will extend to a bot that "tries to win" in the real world, but I have seen no such argument, certainly nothing justifying the high confidence needed for high p(doom). You're very welcome to point me to arguments I may have missed.