LESSWRONG
LW

1147
Abhinav Sharma
0010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Risks from Learned Optimization: Introduction
Abhinav Sharma6y10

Because we don't know when a neural network runs into the Mesa optimization problem we are prone to adversarial attacks ? Like the example with red doors is a neat one. There as a human programmer we thought that the algorithm is learning to read red doors but maybe all it was doing was to learn to distinguish red from everything else. Also isn't every neural network we train today performing some sort of Mesa optimization ?

Reply
No posts to display.