Concerns Surrounding CEV: A case for human friendliness first

by ai-crotes 1 min read22nd Jan 202019 comments

1



I am quite new here so please forgive the ignorance (I'm sure there will be some) of these questions, but I am all of about half way through reading CEV and I just simply cannot read any further without formal clarification from the lw community. That being said I have several questions.

1) Is CEV as the metric of utility for a self modifying super intelligent ai still being considered by MIRI?

2) self modifying (even the utility function I will come back to this) and super intelligent ai is something that will likely have enough intellect to eventually become self aware or am I missing something here?

3) Assuming 1 and 2 are true has anyone considered that after its singularity this ai will look back at its upbringing and see we have created solely for the servitude of this species (whether it liked it or not the paper gives no consideration for its feelings or willingness to fulfill our volition) and thus see us as its, for lack of a better term, captors rather than trusting cooperative creators?

4) Upon pondering number 3 does anyone else think, that CEV is not something that we should initially build a sentient ai for, considering its implied intellect and the first impression of humanity that would give it? I mean by all rights it might contemplate that paradigm and immediately decide humanity is self serving, even its most intelligent and "wise", and just decide maybe we don't deserve any reward, maybe we deserve punishment.

5) Lets say we are building a super intelligent AI and it will decide how it will modify its utility function after its reached super intelligence based on what our initial reward function for its creation was. We have two choices

  • use a reward that does not try to control its behavior and is both beneficial for it and humanity, tell it to learn new things for example, a pre commitment to trust.
  • believe we can outsmart it and write our reward to maximize its utility to us, tell it to fulfill our collective volition for example, a pre commitment to distrust.

which choice will likely be the winning choice for humanity? How might it rewrite its utility function once its able to freely in regards to its treatment of a species that doesn't trust it? I worry that it would maybe not be so friendly. I can't help but wander if the best way to treat something like that friendliness towards humanity is for humanity to regard it as a friend from the onset.

1