New Answer

New Comment

1 Answers sorted by
top scoring

Jul 04, 2019*

Interesting question. I found this article https://arxiv.org/abs/1802.07740 together with the papers that cite it https://ui.adsabs.harvard.edu/abs/2018arXiv180207740R/citations as a good starting point.

12 comments, sorted by

top scoring

Click to highlight new comments since: Today at 2:35 AM

[-]habryka6y120

Ramana Kumar and Scott Garrabrant's post "Thoughts on Human Models" provides a bit of context for this:

In this post, we discuss several reasons to be cautious about AGI designs that use human models. We suggest that the AGI safety research community put more effort into developing approaches that work well in the absence of human models, alongside the approaches that rely on human models. This would be a significant addition to the current safety research landscape, especially if we focus on working out and trying concrete approaches as opposed to developing theory. We also acknowledge various reasons why avoiding human models seems difficult.

[-]Shmi6y30

I am not sure that a theory of mind is needed here. If one were to treat humans as a natural phenomenon, living, like the tuberculosis bacillus, or non-living like ice spreading over a lake in freezing temperatures, then the overt behavioral aspects is all that is needed to detect a threat to be eliminated. And then it's trivial to find a means to dispose of the threat, humans are fragile and stupid and have created a lot of ready means of mass destruction.

[-]Raemon6y20

Human behavior is much more complex that ice spreading over a lake. So it's actually simplifying the situation to think in terms of "agents that have goals – what do I predict they want?", in a way that it wouldn't be for ice.

[This comment is no longer endorsed by its author]Reply

[-]Shmi6y30

Every behavior is complex when you look into the details. But the general patterns are often quite simple. And humans are no exception. They expand and take over, easy to predict. Sometimes the expansion stalls for a time, but then resumes. What do you think is so different in overall human patterns from the natural phenomena?

[-]Raemon6y20

If that's true, why do you think humans have theory of mind?

[-]Shmi6y20

Not sure what you are asking and how it is relevant to the general patterns that could trigger an adverse AI response. Also, how much of your stance is triggered by the "humans are special" belief?

[-]Raemon6y50

Ah, yeah I just re-read the opening thread and then re-read your comment think I just agree.

[-]ioannes6y10

And then it's trivial to find a means to dispose of the threat, humans are fragile and stupid and have created a lot of ready means of mass destruction.

If by "a lot of ready means of mass destruction" you're thinking of nukes, it doesn't seem trivial to design a way to use nukes to destroy / neutralize all humans without jeopardizing the AGI's own survival.

We don't have a way of reliably modeling the results of very many simultaneous nuclear blasts, and it seems like the AGI probably wouldn't have a way to reliable model this either unless it ran more empirical tests (which would be easy to notice).

It seems like an AGI wouldn't execute a "kill all humans" plan unless it was confident that executing the plan would in expectation result in a higher chance of its own survival than not executing the plan. I don't see how an AGI could become confident about high-variance "kill all humans" plans like using nukes without having much better predictive models than we do. (And it seems like more empirical data about what multiple simultaneous nuclear explosions do would be required to have better models for this case.)

[-]Shmi6y20

Humans are trivial to kill. Physically, chemically, biologically or psychologically. And a combination of those would be even more effective in collapsing the human population. I will not go here into the details, to avoid arguments and negative attention. And if your argument is that humans are tough to kill, then look into the historic data of population collapse, and that was without any adversarial pressure. Or with, if you consider the indigenous population of the American continent.

[-]Gurkenglas6y30

Does the brute-force minimax algorithm for tic tac toe count? Would a brute-force minimax algorithm for chess count? How about a neural net approximation like AlphaZero?

[-]Pattern6y10

But it seems like it would still need to have a worked-out theory of mind, just to get to the point of understanding that humans are agent-like things that could bear on the AGI's self-preservation.

It could happen before it understands us - if you don't like things that are difficult to predict*, and you find people difficult to predict, then do you dislike people?

(And killing living creatures seems a bit easier than destroying rocks.)

[-]ioannes6y10

Wouldn't an AI following that procedure be really easy to spot? (Because it's not deceptive, and it just starts trying to destroy things it can't predict as it encounters them.)

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

14

[ Question ]

What's state-of-the-art in AI understanding of theory of mind?

14

14

1 Answers sorted by
top scoring

Jul 04, 2019*

14

[ Question ]

What's state-of-the-art in AI understanding of theory of mind?

14

14

1 Answers sorted by top scoring

Jul 04, 2019*

1 Answers sorted by
top scoring