[Linkpost] Will AI avoid exploitation?

by cdkg · 6th Aug 2023 · 1 min read · 22 karma

Tags: Coherence Arguments, Existential risk, AI, Rationality · Frontpage

1 comment

Charlie Steiner · 2y · 42 points

This was great!

In the Avoidance section, I'd like to have seen discussion of the argument that if some specific observer is unable to exploit an AI, then it's useful for that observer to model the AI as maximizing utility. This argument (like other Avoidance arguments) is a little circular, because "exploitation" is defined relative to some scoring function, so we're left with the question of how we learned about the lack of exploitability relative to that specific function. But there's still a substantive kernel: the real AI might be some complicated, flawed process, yet (arguendo) from the perspective of an even-more-flawed observer the best model of it might be simple. E.g. a poor chess player might not have any better model of an above-average chess player than just expecting them to make high-scoring moves.

This also makes me more interested in the author's perspective on other possible models of smarter-than-human decision-making.


As some of you may know, I'm editing a special issue of the journal Philosophical Studies on AI safety (along with @Dan H). I thought I'd share the first paper from the issue, which deals with some issues in AI safety theory that have been frequently discussed on LessWrong.

Here's the abstract:

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

You can find the paper here: https://link.springer.com/article/10.1007/s11098-023-02023-4
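
To make the notion of exploitation concrete, here's a minimal illustrative sketch (my own, not from the paper) of the money-pump idea that usually motivates the argument: an agent with cyclic preferences will pay a small fee for every swap to an option it prefers, and so can be led around the cycle indefinitely, losing money without limit.

```python
# Minimal illustrative money-pump sketch (not from the paper). An agent with
# cyclic strict preferences B > A, C > B, A > C pays a small fee for every
# trade up to something it prefers, so a trader can cycle it A -> B -> C -> A
# forever, extracting unbounded money.

FEE = 0.01

# upgrade[x] is the item the agent strictly prefers to x. Because these
# preferences are cyclic, no utility function can represent them.
upgrade = {"A": "B", "B": "C", "C": "A"}

def money_pump(start_item: str, trades: int) -> tuple[str, float]:
    """Cycle the agent through `trades` fee-paying swaps; return (final item, total paid)."""
    item, paid = start_item, 0.0
    for _ in range(trades):
        item = upgrade[item]  # the agent happily accepts each swap...
        paid += FEE           # ...and pays the small fee each time
    return item, paid

if __name__ == "__main__":
    item, paid = money_pump("A", 999)  # 999 swaps = 333 full laps around the cycle
    print(f"Agent holds {item} again and has paid {paid:.2f} for the privilege.")
```

The paper's question is how much follows in the other direction: whether avoiding this kind of exploitability really forces an agent to act as if it were maximising expected utility.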