Aharon Azulay

https://azuleye.github.io/

Comments
Interpretability Will Not Reliably Find Deceptive AI
Aharon Azulay · 3mo* · 30

However, I am pretty pessimistic in general about reliable safeguards against superintelligence by any method, given how exceptionally hard it is to reason about how a system far smarter than me could evade my plans.

To use an imperfect analogy, I could defeat the narrowly superintelligent Stockfish at 'queen-odds chess', where Stockfish starts the game down a queen.
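
(To make the handicap concrete, here is a minimal sketch of a queen-odds game, assuming the python-chess library and a Stockfish binary named "stockfish" on the PATH; the piece removal and the engine's time limit are illustrative choices of mine, not anything from the post.)

```python
import chess
import chess.engine

# Minimal sketch of queen-odds chess: Stockfish plays Black and starts
# the game without its queen. Assumes python-chess and a local Stockfish
# binary; both the binary name and the 0.1s move time are assumptions.
board = chess.Board()
board.remove_piece_at(chess.D8)  # remove Black's queen for the handicap

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
try:
    while not board.is_game_over():
        if board.turn == chess.WHITE:
            # Human plays White: enter moves in UCI form, e.g. e2e4
            board.push_uci(input("Your move: "))
        else:
            result = engine.play(board, chess.engine.Limit(time=0.1))
            board.push(result.move)
        print(board, "\n")
finally:
    engine.quit()

print("Result:", board.result())
```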

Can't we think of interpretability and black-box safeguards as the extra pieces we can use to reliably win against a rogue superintelligence?
