Abhinav Sharma


Risks from Learned Optimization: Introduction

Because we don't know when a neural network runs into the mesa-optimization problem, are we prone to adversarial attacks? The example with red doors is a neat one: there, as human programmers, we thought the algorithm was learning to reach red doors, but maybe all it was doing was learning to distinguish red from everything else. Also, isn't every neural network we train today performing some sort of mesa-optimization?