Principles in AI alignment

Edited by Eliezer Yudkowsky, last updated 17th Feb 2017

A 'principle' of AI alignment is something we want in a broad sense for the whole AI, which has informed narrower design proposals for particular parts or aspects of the AI.

For example:

  • The non-adversarial principle says that the AI should never be searching for a way to defeat our safety measures or do something else we don't want, even if we think this search will come up empty; it's just the wrong thing for us to program computing power to do.
    • This informs the proposal of value alignment: we ought to build an AI that wants to attain the class of outcomes we want to see.
    • This informs the proposal of corrigibility, subproposal utility indifference: if we build a suspend button into the AI, we need to make sure the AI experiences no instrumental pressure to disable the suspend button. (A toy sketch of the indifference idea appears after this list.)
  • The minimality principle says that when we are building the first aligned AGI, we should try to do as little as possible, using the least dangerous cognitive computations we can, while still carrying out a pivotal act that prevents the default outcome of the world being destroyed by the first unaligned AGI.
    • This informs the proposals of taskishness and mild optimization: we are safer if all goals and subgoals of the AI are formulated so that they can be achieved as well as we would prefer using a bounded amount of effort, and the AI only exerts enough effort to do that. (A satisficing sketch appears after this list.)
    • This informs the proposal of behaviorism: it seems like some proposals don't require the AI to understand and predict humans in great detail, just to master engineering; and it seems like we can head off multiple thorny problems by not having the AI try to model humans or other minds in any more detail than necessary.
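
To make the utility indifference idea concrete, here is a toy sketch in Python. The function names, utility functions, and payoff numbers are invented for illustration; this is not MIRI's actual formalism. The idea: add a constant correction to the shutdown branch of the utility function so the agent's expected utility is the same whether or not the suspend button is pressed, leaving it with no instrumental pressure either to disable the button or to press it.

```python
# Toy model of utility indifference (hypothetical utilities and lotteries).

def expected_utility(utility, lottery):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

def indifference_correction(u_normal, u_shutdown, lottery_normal, lottery_shutdown):
    """Constant c such that E[u_shutdown + c] over the shutdown lottery equals
    E[u_normal] over the normal lottery, so neither pressing nor blocking the
    suspend button changes the agent's expected utility."""
    return (expected_utility(u_normal, lottery_normal)
            - expected_utility(u_shutdown, lottery_shutdown))

# Example with invented numbers: the agent values paperclips if left running,
# and safe shutdown if suspended.
u_normal = lambda clips: clips                    # utility = paperclips made
u_shutdown = lambda halted: 1.0 if halted else 0.0
lottery_normal = [(0.5, 10), (0.5, 20)]           # E[u_normal] = 15
lottery_shutdown = [(1.0, True)]                  # E[u_shutdown] = 1

c = indifference_correction(u_normal, u_shutdown, lottery_normal, lottery_shutdown)
# With the correction, both branches have expected utility 15.
assert expected_utility(lambda o: u_shutdown(o) + c, lottery_shutdown) == \
       expected_utility(u_normal, lottery_normal)
```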
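And a minimal sketch of taskishness plus mild optimization as bounded-effort satisficing. The function name, threshold, and search budget are assumptions for illustration, and actual proposals such as quantilization are more careful; the point is only that the agent stops as soon as a plan is good enough, rather than maximizing over every plan it could consider.

```python
import random

def mildly_optimize(candidate_plans, score, good_enough, budget, rng=random):
    """Bounded-effort satisficing search (a stand-in for mild optimization):
    sample at most `budget` plans and return the first one whose score clears
    the `good_enough` threshold, instead of maximizing over all plans."""
    best = None
    for _ in range(budget):
        plan = rng.choice(candidate_plans)
        if score(plan) >= good_enough:
            return plan                      # task achieved; stop searching
        if best is None or score(plan) > score(best):
            best = plan
    return best                              # best plan found within the budget

# Example with invented numbers: plans scored 0-100, where 60 counts as "done".
plans = list(range(101))
pick = mildly_optimize(plans, score=lambda p: p, good_enough=60, budget=10)
```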

Please be guarded about declaring things to be 'principles' unless they have already informed more than one specific design proposal and more than one person thinks they are a good idea. You could call them 'proposed principles' and post them under your own domain if you personally think they are a good idea. There are a lot of possible 'broad design wishes', or things that people think are 'broad design wishes', and the principles that have actually informed specific design proposals would otherwise get lost in the crowd.
