However, I am pretty pessimistic in general about reliable safeguards against superintelligence by any method, given how exceptionally hard it is to reason about how a system far smarter than me could evade my plans.
To use an imperfect analogy, I could defeat the narrowly superintelligent Stockfish at 'queen-odds chess', where Stockfish starts the game down a queen. Can't we think of interpretability and black-box safeguards as the extra pieces we can use to reliably win against a rogue superintelligence?