LESSWRONG
Personal Blog

[ Parent Question — What are some good examples of incorrigibility? ]

What are some good examples of gaming that is hard to detect?

by SoerenMind
16th May 2019
1 min read

1 Answer

SoerenMind

May 17, 2019

10

For example, an RL agent that learns a policy that looks good to humans but isn't. Adversarial examples that only fool a neural net wouldn't count.

[This comment is no longer endorsed by its author]

1 Related Questions

Parent Question
23What are some good examples of incorrigibility?
RyanCarey, Shmi
2 comments
[-]habryka6y20

Could you clarify this a bit? I assume you are thinking about subsets of specification gaming that would not be obvious if they were happening?

If so, then I guess the adversarial examples in image classification come to mind. They fit specification gaming pretty well, and it took quite a large literature to understand them.

[-]SoerenMind6y10

Thanks, updated.

