Joar Skalse

My name is pronounced "YOO-ar SKULL-se".


Thoughts from a Two Boxer

As you may know, CDT has a lot of fans in academia. It might be interesting to consider what they have to say about Newcomb's Problem (and other supposed counter-examples to CDT).

In "The Foundations of Causal Decision Theory", James Joyce argues that Newcomb's Problem is "unfair" on the grounds that it treats EDT and CDT agents differently. An EDT agent is given two good choices ($1,000,000 and $1,001,000) whereas a CDT agent is given two bad choices ($0 and $1,000). If you wanted to represent Newcomb's Problem as a Markov Decision Process, you would have to put EDT and CDT agents in different MDPs. Lo and behold, the EDT agent gets more money, but this is (according to Joyce) just because it is given an unfair advantage. Hence Newcomb's Problem isn't really too different from the obviously unfair "decision" problem you gave above; the unfairness is just obfuscated. The fact that EDT outperforms CDT in a situation in which EDT agents are unconditionally given more money than CDT agents is not an interesting objection to CDT, and so Newcomb's Problem is not an interesting objection to CDT (according to Joyce).

It might be worth thinking about this argument. Note that it operates at the level of individual decision problems, and doesn't say anything about whether it's worth taking into account the possibility that different sorts of agents might tend to end up in different sorts of situations. It also presumes a particular way of answering the question of whether two decision problems are "the same" problem.

I also want to note that you don't need perfect predictors, or anything even close to that, to create Newcomblike situations. Even if the Predictor's accuracy is only somewhat better than a coin flip, this is sufficient to make the causal expected utility different from the evidential expected utility. The key property is that the action you take constitutes evidence about the state of the environment, which can happen in many ways.
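To make the point about imperfect predictors concrete, here is a minimal sketch in Python. It assumes the standard payoffs mentioned above ($1,000 in the transparent box, $1,000,000 in the opaque one) and a predictor that is right with probability `accuracy`; the function names and the simplified way of modelling each decision theory are my own illustrative assumptions, not anything taken from Joyce.

```python
SMALL = 1_000        # always in the transparent box
BIG = 1_000_000      # in the opaque box iff one-boxing was predicted


def evidential_eu(action: str, accuracy: float) -> float:
    """EDT: the action is treated as evidence about the prediction."""
    # If the predictor is right with probability `accuracy`, then
    # P(opaque box full | one-box) = accuracy, and
    # P(opaque box full | two-box) = 1 - accuracy.
    p_full = accuracy if action == "one-box" else 1.0 - accuracy
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus


def causal_eu(action: str, p_full: float) -> float:
    """CDT: the box contents are fixed, so p_full does not depend on the action."""
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus


if __name__ == "__main__":
    accuracy = 0.51  # only slightly better than a coin flip
    for action in ("one-box", "two-box"):
        print(action, "EDT:", evidential_eu(action, accuracy))
    for action in ("one-box", "two-box"):
        # Any fixed credence gives the same ranking; 0.5 is arbitrary.
        print(action, "CDT:", causal_eu(action, p_full=0.5))
```

With these payoffs the evidential calculation favours one-boxing whenever `(2 * accuracy - 1) * BIG > SMALL`, i.e. for any accuracy above 0.5005, whereas the causal calculation favours two-boxing by exactly $1,000 regardless of the credence assigned to the box being full.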

Two senses of “optimizer”

I have already (sort of) addressed this point at the bottom of the post. There is a perspective from which any optimizer_1 can (kind of) be thought of as an optimizer_2, but it's unclear how informative this is. It is certainly at least misleading in many cases. Whether or not the distinction is "leaky" in a given case is something that should be carefully examined, not something that should be glossed over.

I also agree with what ofer said.

"even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it "breaks" the isomorphism and does something that might still keep the isomorphism in tact but also does other things you didn't think it would do if the isomorphism were strict"

I agree. Part of the reason why it's valuable to make the distinction is to enable clearer thinking about these sorts of issues.

Two senses of “optimizer”

The fact that a superintelligent AI contains an optimization algorithm does not necessarily mean that this optimization algorithm is itself superintelligent (or that it has access to the world model of the overall system, etc). It might, it might not – it depends on the design of the system.

"the Cartesian boundary is an issue once we talk about self-improving AI."
This presumably depends on a lot of specific facts about how the system is designed.

Two senses of “optimizer”

I agree with everything you have said.

Two senses of “optimizer”

I know these things. Nothing you have said contradicts my point, as far as I can see. The point I am making here is one of conceptual clarification, with the intent of enabling clearer thinking and reasoning.

You seem to be talking about a system that outputs "plans that, if implemented, would achieve X" (roughly), and your point seems to be that such a system would be likely to be or behave like an optimizer_2. I find this claim quite plausible (and fully compatible with the point I'm making).

"It would require a selective blindness to make the superintelligence assume that it is disembodied, and that its computations will continue and produce effects in real world even if its body is destroyed."

Unclear; if anything, it seems like it might be easier to make a Cartesian AI than a non-Cartesian one. But that's a side note.

Two senses of “optimizer”

The argument I quoted does not mention evolution. I'm not saying that the argument can't be patched, I'm saying that the argument is inadequate as stated. I should note, however, that evolution selects organisms based on their ability to do optimisation_2, not their ability to do optimisation_1. It's therefore not clear when and how you can simply "generalise from evolution".

Two senses of “optimizer”

A linear program solver is a system that maximises or minimises a linear function subject to non-strict linear constraints.

Many SAT-solvers are implemented as optimizers. For example, they might try to find an assignment that satisfies as many clauses as possible, or they might try to minimise the size of the clauses using resolution.
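As a toy illustration of a SAT solver framed as an optimizer in this sense, here is a minimal Python sketch that searches for the assignment satisfying as many clauses as possible. It is an exhaustive search written for clarity, not how real solvers work; the clause encoding (signed integers, DIMACS-style) and the function names are assumptions made for the example.

```python
from itertools import product


def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal.

    A literal is a non-zero int: k means variable k, -k means its negation.
    `assignment` maps variable index -> bool.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def max_sat(clauses, num_vars):
    """Exhaustively maximise the number of satisfied clauses."""
    best_assignment, best_score = None, -1
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        score = count_satisfied(clauses, assignment)
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x2) and (not x2 or x3)
    clauses = [[1, 2], [-1, 2], [-2, 3]]
    assignment, score = max_sat(clauses, num_vars=3)
    print(assignment, score, "satisfiable:", score == len(clauses))
```

The formula is satisfiable exactly when the maximum equals the number of clauses, so the decision problem falls out of the optimisation problem.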

Two senses of “optimizer”

I don't think that I'm assuming the existence of some sort of Cartesian boundary, and the distinction between these two senses of "optimizer" does not seem to disappear if you think of a computer as an embedded, causal structure. Could you state more precisely why you think that this is a Cartesian distinction?

Two senses of “optimizer”

I should clarify that I'm not necessarily saying that there can't be cases in which a system that is believed or intended to be an optimizer_1 might become or turn out to be an optimizer_2 – I have not really argued for or against this. What I want to do is enable clearer thinking about this issue, so that one does not slide between these two concepts without noticing.

Two agents can have the same source code and optimise different utility functions

Yes, agents with different preferences are incentivised to cooperate provided that the cost of enforcing cooperation is less than the cost of conflict. Agreeing to adopt a shared utility function via acausal trade might be a very cheap way to enforce cooperation, and some agents might do this just based on their prior. However, this is true for any agents with different preferences, not just agents of the type I described. You could use the same argument to say that you are in general unlikely to find two very intelligent agents with different utility functions.
