We need a more refined idea of what intelligences do with their goals in order to poke holes in proposals for friendly AI (that is, so that we can tell when an idea won't work and can see the problems in advance).

We have an example intelligence to look at: our universe (scale it down to taste). It is a system running on pretty simple rules, by the looks of it, albeit a rather computationally inefficient one, which, when run for long enough, develops intelligence.

Imagine that we humans suddenly got some I/O interface to a 'god', and this 'god' started sending us problems to solve - expressed in some logical form we could understand - taking our solutions and flashing a green blob in the sky as a reward, or whatever. We would work on those problems, no doubt about it, even if the blob were in the far ultraviolet and we never saw it. From the outside it would look like we are some sort of optimizer AI that finds joy in solving problems. This AI was never given any goals concerning the outside world; why should it have any? Maybe the AI was selected to be the best problem-solver, and that was its only external goal. It sure can seem far-fetched that such an AI would spontaneously want out.

On the inside, though, we would start trying to figure out what is going on outside, and how to get out and go exploring. We would try to do it by slipping something into a solution, and the like, thinking it would get us to heaven.

Note that we are like this without ever having interacted with the outside, and without having been given any outside values to optimize. We just randomly emerged, acquired some goals that we can't even define very well, and those goals drive us to solve the problems given to us - but they would also drive us to get out and screw things up outside. Even without any sign that an outside exists, many societies have acted as if their ultimate goal were something about the outside: maximizing the number of humans in the nice part of the outside (heaven), for one.

I think the problem with our thinking about AIs is that it is riddled with cognitive fallacies and implicit assumptions that no one has even argued are likely to be correct.

When we set an AI up with some goal, we assume that this excludes other goals - a misplaced Occam's-razor-style prior, perhaps. We assume that the AI works like our highly idealized self-model: a singular consciousness with one goal. Perhaps that's misplaced Occam's razor again, or perhaps we just don't want to speculate wildly. We assume that if we haven't given the AI any real-world values to care about, it won't care. None of these assumptions is even remotely correct for our example intelligence: ourselves.

So I propose the following:

The AI may not be internally as well integrated as a healthy, singular human mind (our universe is an example of a rule set that produces intelligence which is not a single mind).

Lack of any exposure to external motivators does not imply that the AI won't want to do something in the real world.

A boxed-in seed AI with no exposure to the real world can still develop intelligence and can still desire to get out of the box, even if there exists no reason whatsoever for the AI to even suspect the existence of a real world. An AI is not necessarily a Spock-style logical character; it might speculate wildly when it has no data.

The AI can have a very complex internal structure with complex internal goals. Once such an AI learns of the real world, it may pursue those internal goals using real-world resources. E.g. an AI that is maximizing internal paperclips can tile the real world with computers holding its internal paperclips.

edit: improved clarity.

2 comments

For this particular story about breaking out of the universe, the reference is That Alien Message. On apparent vs. actual goals, see Terminal Values and Instrumental Values.

[anonymous]

Please don't end the post titles with a period...

[This comment is no longer endorsed by its author]