Logical Decision Theories: Our final failsafe?

Noosphere89

This is a linkpost for https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice

Nate Soares has argued that logical decision theories don't actually get us nice things. I disagree with this view, and explain below why we can still get good things even if the AI is unaligned. (Note: this assumes either a solution to ELK or a scenario where we don't need to solve it for arbitrary networks, so that narrow elicitation is enough.)

Cruxes for my view here:

I'm less into hard takeoffs than he is, so I'm not super-worried about intelligences recursively self-improving to the point where humans can't understand them (my probability of FOOM for the first AGI is more like 2-3%). Similarly, I don't think escaping the AI box is nearly as easy as MIRI thinks.
I think ELK or narrow elicitation will probably give us the logical correlations needed to stop "fool the human into a deal" strategies from working.
I view singleton rule as at best unstable, and not likely to occur.
I think the alignment problem is underdetermined; that is, our present state admits multiple futures that we can choose between.
Acausal trade in infinite multiverses behaves strangely enough that it reasonably raises the probability of success.

Now, on to the point about acausal trade:

Fixed points and acausal trade in infinite multiverses

In an infinite multiverse, which ours is likely to be (the universe is probably homogeneous, isotropic and spatially flat, which, assuming a simple topology, implies it is infinite), probability behaves strangely.

This means that things change. One thing that changes is the relative bargaining power of humans and UFAIs, since there are infinitely many of both (assuming alignment doesn't have probability zero).

Suppose humans acausally trade across the Tegmark Level IV multiverse. Then Nate is simply wrong that there aren't enough resources to trade, since the resources are infinite on both sides: humans and their allies, and UFAIs and their allies.

Even in a good old Level I multiverse, infinitely many humans and UFAIs exist.

What if they cooperate acausally between themselves? Well there's an infinite amount of humans vs an infinite amount of UFAIs, and this is an infinity vs infinity scenario, which leads to a perfect tie of 1/2 of the universe a priori.

Similar reasoning applies to Everett branches, and it's why exponentially decaying measure still ends up as infinite measure, and exponentially decaying probability still ends up as probability 1.

Where did Nate and Eliezer go wrong?

Answer: they recognized acausal trade as real, but didn't account for the fact that the quantities involved are infinite, so they got vastly wrong results.

What if they cooperate acausally between themselves? Well there’s an infinite amount of humans vs an infinite amount of UFAIs, and this is an infinity vs infinity scenario

And how do you divide up that infinity between the infinite number of possible UFAIs and future-humanities that could exist? That this procedure gives undefined answers in infinite universes is a sign that it's probably not a good fit for reasoning about them. I think a better answer is something like UDASSA, which can assign different amounts of 'measure' to humanity and UFAIs, giving them potentially unequal bargaining power, even if there are an infinite number of instantiations of both.
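The objection that dividing one infinity by another gives undefined answers can be illustrated with a toy sketch (an editorial illustration, not taken from either participant). It enumerates the same two countably infinite populations, human_0, human_1, ... and ufai_0, ufai_1, ..., in different orders. Every individual appears exactly once in every ordering, yet the limiting share of humans depends entirely on the order chosen, so "1/2" is only one of many possible answers. The function names and block sizes are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice


def enumerate_population(humans_per_block, ufais_per_block):
    """Enumerate the SAME two countably infinite populations
    (human_0, human_1, ... and ufai_0, ufai_1, ...) in repeating blocks.
    Whatever the block sizes, every individual eventually appears exactly once."""
    h, u = count(), count()
    while True:
        for _ in range(humans_per_block):
            yield ("human", next(h))
        for _ in range(ufais_per_block):
            yield ("ufai", next(u))


def human_share(order, n):
    """Fraction of humans among the first n individuals of an enumeration."""
    prefix = islice(order, n)
    return Fraction(sum(kind == "human" for kind, _ in prefix), n)


print(human_share(enumerate_population(1, 1), 3000))  # 1/2
print(human_share(enumerate_population(1, 2), 3000))  # 1/3
print(human_share(enumerate_population(3, 1), 3000))  # 3/4
```

Each ordering is a bijection onto the same set of individuals, so nothing about the populations themselves selects one share over another. A principled weighting, such as a measure over programs, is needed to break the tie.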

Has UDASSA been updated since then? Because they present pretty severe problems for that prior.

And in any case, even if it did work, it only works in the Level I and Level III multiverses, not in the Level II (eternal inflation) or Level IV multiverses.

UDASSA works fine in Level II and Level IV multiverses. Indeed, its domain of applicability is all possible Turing machines, which can be seen as a model of the Level IV multiverse.
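UDASSA's weighting over programs, and how it can give humanity and UFAIs unequal measure even when both are instantiated infinitely often, can be shown with a minimal sketch. UDASSA weights each program on a universal prefix machine by 2^-(program length). This toy swaps in a hypothetical prefix-free code (k ones followed by a zero) and a hypothetical rule that even-indexed programs instantiate a human civilization and odd-indexed ones a UFAI. Neither the code nor the rule comes from UDASSA itself; they are assumptions chosen to make the arithmetic visible.

```python
from fractions import Fraction


def unary_program(k):
    """Hypothetical prefix-free program encoding: k ones followed by a zero."""
    return "1" * k + "0"


def measure(program):
    """UDASSA-style weight: 2^-(program length)."""
    return Fraction(1, 2 ** len(program))


def who_runs(k):
    """Toy assumption: even-indexed programs instantiate a human civilization,
    odd-indexed ones a UFAI. Both kinds occur infinitely often."""
    return "human" if k % 2 == 0 else "ufai"


def partial_measures(n_programs):
    """Sum the measure of the first n_programs programs, split by who they run."""
    totals = {"human": Fraction(0), "ufai": Fraction(0)}
    for k in range(n_programs):
        totals[who_runs(k)] += measure(unary_program(k))
    return totals


totals = partial_measures(60)
# Approximately 2/3 and 1/3: infinitely many instances of each, unequal total measure.
print(float(totals["human"]), float(totals["ufai"]))
```

Under these assumptions the human series is 1/2 + 1/8 + 1/32 + ... = 2/3 and the UFAI series is 1/4 + 1/16 + ... = 1/3. The total stays at most 1 because the code is prefix-free (Kraft's inequality), so the two sides end up with a 2:1 measure ratio despite both being instantiated infinitely often.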

Close, but no cigar. The problem is that while Turing machines can simulate arbitrarily powerful computers, they can't simulate infinitely powerful computers such as hypercomputers, which would be necessary here.

As constructed, I don't see how UDASSA solves the problem.

If Eliezer can get out of an AI box, how likely is it that a superintelligence can't?

Eliezer only got out of a trivial strawman box, one where he knew he was in a box and could have conversations with his jailers. The more realistic steelman boxing scenario is simulation sandboxing, which has little relation to EY's naive boxing and is easy to secure given certain reasonable assumptions.

It seems unwise to assume a superhuman AI couldn’t at least suspect that it’s in a box. We already suspect it, and while it wouldn’t necessarily start off seeing overt examples of computation and simulation as your link points out, neither did humanity before we built such things. As for conversations, hopefully a real AI box wouldn’t involve long chats with a gatekeeper about being let out! But a boxed AI has to transmit some data to the outside world or it might as well not exist. That’s a greater limitation than Eliezer faced, but such an AI would also bring vastly more optimization power.

It seems unwise to assume a superhuman AI couldn’t at least suspect that it’s in a box.

Taboo 'superhuman' and instead be more specific: do you mean an AI that has more knowledge, thinks faster, thinks more clearly, etc.?

Simboxing uses knowledge containment, which allows you to contain an AI with a potentially superhuman architecture (i.e. it would be superhuman if it were educated/trained on our full knowledge, i.e. the internet), while knowledge-constrained instances are only as capable as historical humans.

As an obvious example, imagine taking a superhuman architecture and training it in the world of Pac-Man. The resulting agent is completely harmless (and useless). Taboo intelligence; think about and describe actual capabilities.

neither did humanity before we built such things.

Not only does a simboxed AI lack the precursor concepts to conceive of such things, it lives in a sim in which such things cannot be built or discovered.

But a boxed AI has to transmit some data to the outside world or it might as well not exist.

The point of simboxing is to evaluate architectures, not individual trained agents. Yes obviously data can transmit out, the limitation is more on data transmitting in.