I'm making another post because of complaints that the previous one lacked clarity.

Many Gods refutation: there are too many possible AIs to care about any particular one. Even if you acausally trade with one, another might punish you for not following it.

Instrumental goals for AI: AIs share instrumental goals, so if you donate to AI research, all of them would benefit.

My take is that the many gods refutation still works, because we have no real idea about the mindspace of possible AIs, so an AI could simply punish you for not following it in particular. Also, a small difference in preferences today could result in a vastly different outcome in the future, giving the AI a further motive to punish.

What do you think? Any replies would be appreciated.



I'm confused about the equivalence between "god" and "AI". I know the many gods objection in the context of Pascal's Wager. Are you talking about AI or about gods? Or is the idea that you expect a superintelligent AI in the future which will have the relevant properties of a god?

My initial reaction is that the analogy doesn't work because AI is unlikely to care at all about what you're doing.

Yeah, a superintelligent AI that might have the relevant properties of a god. Also, I meant this as a counter to acausal blackmail.

But AI isn't pulled randomly out of mindspace. Your argument needs to be about the probability distribution of future AIs to work.

(We may not be able to align AI, but you can't jump from that to "and therefore it's like a randomly sampled mind".)

Where can I read about the probability distribution of future AIs? Also, an AI that will exist in the future could be randomly pulled from mindspace, so why not? Isn't the future behavior of an AI pretty much impossible for us to predict?

Any post or article that speculates about what future AIs will be like is about the probability distribution of AIs; it's just usually not framed that way.

I think you're making an error by equivocating between two separate meanings of "impossible to predict":

  • you can make absolutely no statements about the thing
  • you can't predict the thing narrowly

The latter is arguably true, but the former is obviously false. For example, one thing we can confidently predict about AI is that it will add value to the economy (until it perhaps kills everyone). This already shifts the probability distribution massively away from the uniform distribution.

If you actually pulled out a mind at random, it's extremely likely that it wouldn't do anything useful.

Like I think most probability mass comes from a tiny fraction of mind space, probably well below of the entire thing, but the space is so large that even this fraction is massive. But it means many-gods style arguments don't work automatically.
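As a toy illustration of why the skew matters (every number and name here is made up for the example, not taken from the thread): suppose each hypothetical AI rewards exactly one action and penalizes all the others. Under a uniform prior every action has the same expected payoff, so the many-gods cancellation goes through; under a skewed prior it does not. This is only a sketch of the structure of the argument, not a model of any real AI.

```python
def expected_payoffs(prior, reward=1.0, penalty=-1.0):
    """Expected payoff of each action under a prior over hypothetical AIs.

    Assumption for illustration: prior[i] is the probability that AI i is
    the one that ends up existing, and AI i rewards action i while
    penalizing every other action.
    """
    n = len(prior)
    return [
        sum(p * (reward if i == action else penalty) for i, p in enumerate(prior))
        for action in range(n)
    ]


# Uniform prior: no action is favored, so the punishments cancel out.
uniform = [0.25, 0.25, 0.25, 0.25]
print(expected_payoffs(uniform))

# Skewed prior: most of the mass sits on one AI, so one action stands out.
skewed = [0.7, 0.1, 0.1, 0.1]
print(expected_payoffs(skewed))
```

With the uniform prior all four payoffs are equal, while with the skewed prior the first action comes out ahead of the rest. Whether the real distribution is skewed in a way that favors any particular action is exactly what the disagreement in this thread is about.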

What about the paperclip maximizer AI, then? I doubt it adds value to the economy, and it's definitely possible.

If you build an AI to produce paperclips, then you've built it to add value to the economy, and that presumably works until it kills everyone -- that's what I meant. Like, an AI that's built to do some economically useful task but then destroys the world because it optimizes too far. That's still a very strong restriction on the total mind-space; most minds can't do useful things like building paperclips.

(Note that "Paperclip" in "Paperclip maximizer" is a stand-in for anything that doesn't have moral status, so it could also be a car maximizer or whatever. But even literal paperclips have economic value.)

"If you build an AI to produce paperclips" The first AI isn't going to be built to make money right away; it's going to be built for the sole purpose of building it. Then it might go on to do whatever it wants, making paperclips perhaps. But even going by the economy argument, an AI might be built to solve some complex problem, then decide to take over the world and use acausal blackmail, thus turning into a basilisk. It might punish people for following the original Roko's basilisk because it wants to enslave all of humanity. You don't know which one will happen, so it's illogical to follow one since the other might torture you, right?

I tend to consider Roko's basilisk to be an information hazard and wasn't thinking or saying anything specific to that. I was only making the general point that any argument about future AI must take into account that the distribution is extremely skewed. It's possible that the conclusion you're trying to reach works anyway.

Not sure what the LW consensus is, but there's some evidence that Roko's basilisk is a red herring.

Fortunately, a proper understanding of subjunctive dependence tells us that an
optimally-behaving embedded agent doesn’t need to pretend that causation can
happen backward in time. Such a sovereign would not be in control of its source
code, and it can’t execute an updateless strategy if there was nothing there to
not-update on in the first place before that source code was written. So Roko’s
Basilisk is only an information hazard if FDT is poorly understood.

From the post Dissolving Confusion around Functional Decision Theory.

One more thing: there can be an almost infinite number of non-superintelligent or semi-superintelligent AIs, right?

If you're talking about the number of possible superintelligent AIs, then yeah, definitely. (I don't think it's likely that a large number will all be physically instantiated.)

Is the Many Gods refutation written down somewhere in a rigorous way?

This link isn't working for me.

Pascal's Wager and the AI/acausal trade thought experiments are related conceptually, in that they reason about entities arbitrarily more powerful than humans, but they are not intended to prove or discuss similar claims and are subject to very different counterarguments. Your very brief posts do not make me think otherwise. I think you need to make your premises and inferential steps explicit, for our benefit and for yours.