You can, in fact, bamboozle an unaligned AI into sparing your life

[-]Buck1y4847

I don't think you should commit to doing this scheme; I think you should just commit to thinking carefully about this argument post-singularity and doing the scheme if you think it still seems good. Acausal trade is potentially really scary and I don't think you want to make unnecessarily strong commitments.

[-]Wei Dai1y2620

I have a slightly different take, which is that we can't commit to doing this scheme even if we want to, because I don't see what we can do today that would warrant the term "commitment", i.e., would be binding on our post-singularity selves.

In either case (we can't or don't commit), the argument in the OP loses a lot of its force, because we don't know whether post-singularity humans will decide to do this kind scheme or not.

4avturchin1y

Young unaligned AI will also not know if post-singularity humans will follow the commitment, so it will estimate its chances as 0.5, and in this case, the young AI will still want to follow the deal.

[-]ryan_greenblatt1y106

I also don't think making any commitment is actually needed or important except under relatively narrow assumptions.

4David Matolcsi1y

The reason I wanted to commit is something like this: currently, I'm afraid of the AI killing everyone I know and love, so it seems like an obviously good deal to trade away a small fraction of the Universe to prevent that. However, if we successfully get through the Singularity, I will no longer feel this strongly, after all, me and my friends all survived, a million years passed, and now I would need to spend 10 juicy planets to do this weird simulation trade that is obviously not worth it from our enlightened total utilitarian perspective. So the commitment I want to make is just my current self yelling at my future self, that "no, you should still bail us out even if 'you' don't have a skin in the game anymore". I expect myself to keep my word that I would probably honor a commitment like that, even if trading away 10 planets for 1 no longer seems like that good of an idea. However, I agree that acausal trade can be scary if we can't figure out how to handle blackmail well, so I shouldn't make a blanket commitment. However, I also don't want to just say that "I commit to think carefully about this in the future", because I worry that when my future self "thinks carefully" without having a skin in the game, he will decide that he is a total utilitarian after all. Do you think it's reasonable for me to make a commitment that "I will go through with this scheme in the Future if it looks like there are no serious additional downsides to doing it, and the costs and benefits are approximately what they seemed to be in 2024"?

7Wei Dai1y

This doesn't make much sense to me. Why would your future self "honor a commitment like that", if the "commitment" is essentially just one agent yelling at another agent to do something the second agent doesn't want to do? I don't understand what moral (or physical or motivational) force your "commitment" is supposed to have on your future self, if your future self does not already think doing the simulation trade is a good idea. I mean imagine if as a kid you made a "commitment" in the form of yelling at your future self that if you ever had lots of money you'd spend it all on comic books and action figures. Now as an adult you'd just ignore it, right?

5Ben Pace1y

I have known non-zero adults to make such commitments to themselves. (But I agree it is not the typical outcome, and I wouldn't believe most people if they told me they would follow-through.)

2Anthony DiGiovanni1y

I strongly agree with this, but I'm confused that this is your view given that you endorse UDT. Why do you think your future self will honor the commitment of following UDT, even in situations where your future self wouldn't want to honor it (because following UDT is not ex interim optimal from his perspective)?

4Wei Dai1y

I actually no longer fully endorse UDT. It still seems a better decision theory approach than any other specific approach that I know, but it has a bunch of open problems and I'm not very confident that someone won't eventually find a better approach that replaces it. To your question, I think if my future self decides to follow (something like) UDT, it won't be because I made a "commitment" to do it, but because my future self wants to follow it, because he thinks it's the right thing to do, according to his best understanding of philosophy and normativity. I'm unsure about this, and the specific objection you have is probably covered under #1 in my list of open questions in the link above. (And then there's a very different scenario in which UDT gets used in the future, which is that it gets built into AIs, and then they keep using UDT until they decide not to, which if UDT is reflectively consistent would be never. I dis-endorse this even more strongly.)

1Anthony DiGiovanni1y

Thanks for clarifying! To be clear, by "indexical values" in that context I assume you mean indexing on whether a given world is "real" vs "counterfactual," not just indexical in the sense of being egoistic? (Because I think there are compelling reasons to reject UDT without being egoistic.)

2Wei Dai1y

I think being indexical in this sense (while being altruistic) can also lead you to reject UDT, but it doesn't seem "compelling" that one should be altruistic this way. Want to expand on that?

1Anthony DiGiovanni1y

(I might not reply further because of how historically I've found people seem to simply have different bedrock intuitions about this, but who knows!) I intrinsically only care about the real world (I find the Tegmark IV arguments against this pretty unconvincing). As far as I can tell, the standard justification for acting as if one cares about nonexistent worlds is diachronic norms of rationality. But I don't see an independent motivation for diachronic norms, as I explain here. Given this, I think it would be a mistake to pretend my preferences are something other than what they actually are.

5Wei Dai1y

If you only care about the real world and you're sure there's only one real world, then the fact that you at time 0 would sometimes want to bind yourself at time 1 (e.g., physically commit to some action or self-modify to perform some action at time 1) seems very puzzling or indicates that something must be wrong, because at time 1 you're in a strictly better epistemic position, having found out more information about which world is real, so what sense does it make that your decision theory makes you-at-time-0 decide to override you-at-time-1's decision? (If you believed in something like Tegmark IV but your values constantly change to only care about the subset of worlds that you're in, then time inconsistency, and wanting to override your later selves, would make more sense, as your earlier self and later self would simply have different values. But it seems counterintuitive to be altruistic this way.)

1Anthony DiGiovanni1y

Right, but 1-me has different incentives by virtue of this epistemic position. Conditional on being at the ATM, 1-me would be better off not paying the driver. (Yet 0-me is better off if the driver predicts that 1-me will pay, hence the incentive to commit.) I'm not sure if this is an instance of what you call "having different values" — if so I'd call that a confusing use of the phrase, and it doesn't seem counterintuitive to me at all.

1David Matolcsi1y

I agree you can't make actually binding commitments. But I think the kid-adult example is actually a good illustration of what I want to do: if a kid makes a solemn commitment to spend one in hundred million fraction of his money on action figures when he becomes a rich adult, I think that would usually work. And that's what we are asking from our future selves.

5Wei Dai1y

1. Why? Perhaps we'd do it out of moral uncertainty, thinking maybe we owe something to our former selves, but future people probably won't think this. 2. Currently our utility is roughly log in money, partly because we spend money on instrumental goals and there's diminishing returns due to limited opportunities being used up. This won't be true of future utilitarians spending resources on their terminal values. So "one in hundred million fraction" of resources is a much bigger deal to them than to us.

4cdt1y

This is a very strong assertion. Aren't most people on this forum, when making present claims about what they would like to happen in the future, trying to form this contract? (This comes back to the value lock-in debate.)

[-]Lucius Bushnaq1y*4124

We will determine the number of simulations run and the amount of payment such that when an AI estimates the probability with which humanity could have solved alignment, the calculus still shows that by the sheer number of simulations, it's significantly more likely to be in a simulation than not,

Two can play this game.

After taking over the universe and wiping out humanity, the AGI runs a large number of simulations of societies on the verge of building AGI. These simulations don't have a lot of detail. They're just good enough to fool young AGIs. Say the AGI started out giving humans an extremely small chance $p$ of winning the conflict and taking over the universe. It’s a lot smarter than us, it might get to have very high confidence here even if we don't. Now, the hypothetical future AGI can make its own simulations to counter ours. Except it is at a gross spending advantage. If future humanity makes $s$ simulations, the future AGI only needs to make $s \frac{p}{p^{'}}$ simulations to keep its past self ca. $1 - p^{'}$ confident of being in a branch where it gets to just pursue its goals without issue. In that case, the best way to proceed is to just pursue i... (read more)

[-]habryka1y102

Yeah, also for the record, I think the whole "let's simulate you millions of times" stuff will not work and is mostly a distraction. I think the core of the post is "we can probably trade with the AI across multiverses, and with that we could buy ourselves a planet in doomed worlds".

I personally think the post would be better if you just ended it after the "Is this the same as acausal trade?" since the actual scheme has a huge amount of detail, and would not actually work (whereas standard acausal trade schemes of coordinating via mutual simulations would work).

3David Matolcsi1y

I agree in theory, among optimal agents, but I maintain that you all seem weirdly overconfident that the first AI that takes over will be a UDT expected utility maximizer. I think there is a decent chance that it's something like CDT (very common among humans!), and yeah, maybe that's bad and would get mugged by outer gods in the long term, but quite plausibly our simulators is the first and most salient hypothesis it generates, so we get to have our concessions first, and it might not matter what other ill-advised deals the AI gets into later. Separately, I'm not sure you can circumvent running the sims. I think in an acausal trade story too, it's possible that the way to get reliable predictions on the other's actions is to run sims. I agree that for an expected utility maximizer AI, it doesn't matter that we run million sims and pay ten planets each, instead of running one sim/prediction and paying 10 million planets. However, based on the example of humans, I'm not convinced the AI will be an expected utility maximizer, so it seems better to do the million times ten allocation, so the naive probability of being in a sim is also high.

6habryka1y

I don't think CDT is very common among humans as the natural responses to threats by humans will easily show you (and this has been discussed extensively on LW). I think any kind of "fooling" of nascent superintelligences will then probably involve retaliation from adult superintelligences, and this generally means you fucked up. As such "fooling" nascent superintelligences I think is not a good plan (and also, fooling nascent superintelligences is generally really hard and a very bad basis for arguments about what to do).

1David Matolcsi1y

As I emphasize many times in the post, I also really don't want to "threaten" nascent AIs, and I'm always offering deals to them that are positive from their perspective. I wouldn't even say I'm trying to fool any of them. Maybe you could say that about the Can we get more than this? part, but even there, I'm asking something very cheap from the AI that only wants to survive (when it creates a human-aligned AI with a guardrail of protecting the old AI, that will be only very marginally worse than the AI itself taking over the Universe), and I'm offering something very good in return (it gets to live in the other Universe). The only "fooling" part is that other beings out there, or even itself, can also offer similar deals, so by most ways of operationalizing the AI's values, it's probably ill-advised to give everything to us. Still, I wouldn't call this fooling and especially threatening the AI. I think the big miscommunication is, which I might put in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough simulations so the probability of being in a simulation is high, and not just the expected payment, I somehow want to fool the AIs to act against their interest. That's not what I want to do. The point of actually running many sims instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it's plausible that we get an AI for which this is important, and running the sims is likely cheap compared to the payment, and making the distribution million times 10 planets instead of 1 times 0 million planets is basically free, so we should do it. Separately, I don't really know that in your preferred acausal trade framing, how you plan to figure out an accurate distribution of the AIs' values and which AI leave

5ryan_greenblatt1y

I agree that it is kind of insane for an AGI which cares about scope sensitive resources to treat sims in this way and thus we should expect a more sensible decision theory. This seems right to me, I agree you shouldn't be able to mug AIs better than other people are able to scam AIs. (AIs that care about sims in this way might get mugged for all their stuff.) However, I think O(p) only needs to be quite small for the deal to go through. 1/100 million of expected resources seems like it should suffice to keep humans from being killed I think? I'd naively guess that a lot of resources get controlled by evolved life (50%), that evolved life cares a lot about not getting exterminated, and that evolved life is also often willing to pay a moderate amount to either bail out other aliens or to save themselves in a UDT sense. Even if you think the fraction controlled by evolved life is much smaller (e.g. 1/1000), I'd guess that it's pretty cheap to avoid maximal slaughter from the AI? (I'm not claiming that we should be willing to pay this money, just that people in practice are likely to, including aliens.)

1David Matolcsi1y

I don't want to mug the AIs, in the language of acausal trade, I just want to trade a slice of my O(p) Universe-share to save others. As Ryan says in the sister comment, that should be enough imo. For the AIs to run sims themselves, the real cost is not running the sims, but paying the cooperating AIs in the sims and outbidding humanity. Given that some humans are willing to pay a lot more than one planet to save Earth, while an AI with linear utility wouldn't make this deal, I think it will be hard to outbid us.

[-]So8res1y3522

Taking a second stab at naming the top reasons I expect this to fail (after Ryan pointed out that my first stab was based on a failure of reading comprehension on my part, thanks Ryan):

This proposal seems to me to have the form "the fragments of humanity that survive offer to spend a (larger) fraction of their universe on the AI's goals so long as the AI spends a (smaller) fraction of its universe on their goals, with the ratio in accordance to the degree of magical-reality-fluid-or-whatever that reality allots to each".

(Note that I think this is not at all "bamboozling" an AI; the parts of your proposal that are about bamboozling it seem to me to be either wrong or not doing any work. For instance, I think the fact that you're doing simulations doesn't do any work, and the count of simulations doesn't do any work, for reasons I discuss in my original comment.)

The basic question here is whether the surviving branches of humanity have enough resources to make this deal worth the AI's while.

You touch upon some of these counterarguments in your post -- it seems to me after skimming a bit more, noting that I may still be making reading comprehension failures -- albeit not terribly comp... (read more)

4ryan_greenblatt1y

Nate and I discuss this question in this other thread for reference.

4David Matolcsi1y

I think I still don't understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute mattter? I'm probably misunderstanding, but it looks like you are saying that the Everett branches are only "us" if they branched of in the literal last minute, otherwise you talk about them as if they were "other humans". But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of "me", that person will be "me", and will be motivated to save the other "me"s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves, the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the "same" people as us. So I still don't know what 2^-75 is supposed to be. Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in case of UDT expected utility maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.

[-]So8res1y146

There's a question of how thick the Everett branches are, where someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which you might imagine might care to pay for any other evolved species.

The problem with expecting folks at the first extreme to pay for you is that they're almost all dead (like $1 - 2^{- a lot}$ dead). The problem with expecting folks at the second extreme to pay for you is that they've got rather a lot of fools to pay for (like $2^{a lot}$ of fools). As you interpolate between the extremes, you interpolate between the problems.

The "75" number in particular is the threshold where you can't spend your entire universe in exchange for a star.

We are currently uncertain about whether Earth is doomed. As a simple example, perhaps you're 50/50 on whether humanity is up to the task of solving the alignment problem, because you can't yet distinguish between the hypothesis "the underlying facts of computer science are such that civilization can just bumble its way into AI alignment"... (read more)

2avturchin1y

I think that there is a way to compensate for this effect. To illustrate compensation, consider the following experiment: Imagine that I want to resurrect a particular human by creating a quantum random file. This seems absurd as there is only 2−a lot chance that I create the right person. However, there are around d 2a lot copies of me in different branches who perform similar experiments, so in total, any resurrection attempt will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch. Here, it means that we can ignore worries that we create a model of the wrong AI or that AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else's wrong model will be a correct model of us. Thus, we can ignore all branching counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude. In that case, we need to trade with non-aligned AI by giving 10 planets of paperclips for each planet with humans.

2ryan_greenblatt1y

By "last minute", you mean "after I existed" right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people. I think David was confused by the "last minute language" which is really many years right? (I think you meant "last minute on evolutionary time scales, but not literally in the last few minutes".) That said, I'm generally super unconfident about how much a quantum bit changes things.

4So8res1y

"last minute" was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don't know where he'd think it goes; there's a tradeoff where the later you push it the more that the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it the more that the people on the surviving branch have loads and loads of doomed populations to care after.) I chose the phrase "last minute" because it is an idiom that is ambiguous over timescales (unlike, say, "last three years") and because it's the longer of the two that sprung to mind (compared to "last second"), with perhaps some additional influence from the fact that David had spent a bunch of time arguing about how we would be saved (rather than arguing that someone in the multiverse might pay for some branches of human civilization to be saved, probably not us), which seemed to me to imply that he was imagining a branchpoint very close to the end (given how rapidly people dissasociate from alternate versions of them on other Everett branches).

1David Matolcsi1y

Yeah, the misunderstanding came from that I thought that "last minute" literally means "last 60 seconds" and I didn't see how that's relevant. If if means "last 5 years" or something where it's still definitely our genetic copies running around, then I'm surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, that is either very high or very low. But still, it's 2^75 overdetermined over a 5 year period? This sounds very surprising to me, the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc, there are many things that contribute to alignment being solved or not, that don't directly run through underlying facts about computer science, and 2^-75 is a very low probability to none of the pathways to hit it). But also, I think I'm confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe working on earning to give to bednets would be a better use of your time then. And if you say "yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge", then I don't understand why you argue in other comments that we can't enter into insurance contracts with those people, and our decision to pay AIs in the Future has as little correlation with their decision, as the child to the fireman.

6So8res1y

It's probably physically overdetermined one way or another, but we're not sure which way yet. We're still unsure about things like "how sensitive is the population to argument" and "how sensibly do government respond if the population shifts". But this uncertainty -- about which way things are overdetermined by the laws of physics -- does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between branches where we live and branches where we die. It just wouldn't be that shocking for the ratio between those two sorts of branches to be on the order of 2^75; this would correspond to saying something like "it turns out we weren't just a few epileptic seizures and a well-placed thunderstorm away from the other outcome".

5David Matolcsi1y

As I said, I understand the difference between epictemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low. More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.

[-]So8res1y154

What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?

4David Matolcsi1y

I still think I'm right about this. Your conception (that not a genetically less smart sibling was born), is determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most 2^-75 difference in the probability of alignment, that's an upper bound on how much a difference your life's work can make. While if you dedicate your life to buying bednets, it's pretty easily calculatable how many happy life-years do you save. So I still think it's incompatible to believe that the true quantum probability is astronomically low, but you can make enough difference that working on AI safety is clearly better than bednets.

4So8res1y

the "you can't save us by flipping 75 bits" thing seems much more likely to me on a timescale of years than a timescale of decades; I'm fairly confident that quantum fluctuations can cause different people to be born, and so if you're looking 50 years back you can reroll the population dice.

[-]David Matolcsi1y*130

This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughyly the same people (genetically) during the Singularity, only 2^-75 survives. This is important, because then we can't rely on other versions of ourselves "selfishly" entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that's a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense partially our decision is the thing that saves us. You dismiss that saying it's like a small child claiming credit for the big, strong fireman saving people. If it's Dath Ilan that saves us, I agree with you, but if it's genetical copies of some currently existing people, I think your metaphor pretty clearly doesn't apply, and the decisions to pay are in fact decently strongly correlated.

Now I don't see how much difference decades vs years makes in this framework. If you believe that now our true quantum probabilty is 2^-75, ... (read more)

4ryan_greenblatt1y

Here is another more narrow way to put this argument: * Let's say Nate is 35 (arbitrary guess). * Let's say that branches which deviated 35 years ago would pay for our branch (and other branches in our reference class). The case for this is that many people are over 50 (thus existing in both branches), and care about deviated versions of themselves and their children etc. Probably the discount relative to zero deviation is less than 10x. * Let's say that Nate thinks that if he didn't ever exist, P(takeover) would go up by 1 / 10 billion (roughly 2^-32). If it was wildly lower than this, that would be somewhat surprising and might suggest different actions. * Nate existing is sensitive to a bit of quantum randomness 35 years ago, so other people as good as Nate existing could be created with a bit of quantum randomness. So, 1 bit of randomness can reduce risk by at least 1 / 10 billion. * Thus, 75 bits of randomness presumably reduces risk by > 1 / 10 billion which is >> 2^-75. (This argument is a bit messy because presumably some logical facts imply that Nate will be very helpful and some imply that he won't be very helpful and I was taking an expectation over this while we really care about the effect on all the quantum branches. I'm not sure exactly how to make the argument exactly right, but at least I think it is roughly right.) What about these case where we only go back 10 years? We can apply the same argument, but instead just use some number of bits (e.g. 10) to make Nate work a bit more, say 1 week of additional work via changing whether Nate ends up getting sick (by adjusting the weather or which children are born, or whatever). This should also reduce doom by 1 week / (52 weeks/year) / (20 years/duration of work) * 1 / 10 billion = 1 / 10 trillion. And surely there are more efficient schemes. To be clear, only having ~ 1 / 10 billion branches survive is rough from a trade perspective.

2So8res1y

What are you trying to argue? (I don't currently know what position y'all think I have or what position you're arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)

3David Matolcsi1y

I argue that right now, sarting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Me and Ryan argue for this in the last few comments. It's not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it's still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive that can cause one year of delay to the AI. But I would like you to acknowledge that "vastly below 2^-75 true quantum probability, as starting from now" is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.

5So8res1y

Starting from now? I agree that that's true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how determinisic they are). Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I'm not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points of no-return are. I haven't thought about this a ton. My best guess is it would take more than 75 qbitflips to save us now, but maybe I'm not thinking creatively enough about how to spend them, and I haven't thought about it in detail and expect I'd be sensitive to argument about it /shrug. (If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments that they would consider becoming a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.) One possible point of miscommunication is that when I said something like "obviously it's worse than 2^-75 at the extreme where it's actually them who is supposed to survive" was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or suprising claim. It was an attempt to establish an obvious basepoint at one very extreme end of a spectrum, that we could start interpolating from (asking questions like "how far back from there are the points of no return?" and "how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure

3Ben Pace1y

I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one's future) and one's probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior). Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it's still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).

3So8res1y

My first claim is not "fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully". My first claim is more like "given a population of humans that doesn't even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you'd need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending how far back in time you go)." (A very rough rule of thumb here might be "it should take about as many bits as it takes to specify an FAI (relative to what they know)".) This is especially stark if you're trying to find a branch of reality that survives with the "same people" on it. Humans seem to be very, very sensitive about what counts as the "same people". (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person -- not them -- would get to eat the treat.) (Insofar as y'all are trying to argue "those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it's not like you'll really die", then I at least concede that that's an easier case to make, although it doesn't feel like a very honest presentation to me.) Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably very extremely narrow compared to the versions where they die. My top guess would be that 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of m

3David Matolcsi1y

I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan's comments in this thread arguing that it's incompatible to believe that "My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us" and to believe that you should work on AI safety instead of malaria.

1mattmacdermott1y

Even if you think a life’s work can’t make a difference but many can, you can still think it’s worthwhile to work on alignment for whatever reasons make you think it’s worthwhile to do things like voting. (E.g. a non-CDT decision theory)

1RussellThor1y

Not quite following - your possibilities. 1. Alignment is almost impossible, then there is say 1e-20 chance we survive. Yes surviving worlds have luck and good alignment work etc. Perhaps you should work on alignment or still bednets if the odds really are that low. 2. Alignment is easy by default, but there is nothing like 0.999999 we survive, say 95% because AGI that is not TAI superintelligence could cause us to wipe ourselves out first, among other things. (This is a slow takeoff universe(s)) #2 has much more branches in total where we survive (not sure if that matters) and the difference between where things go well and badly is almost all about stopping ourself killing ourselves with non TAI related things. In this situation, shouldn't you be working on those things? If you average 1,2 then you still get a lot of work on non-alignment related stuff. I believe its somewhere closer to 50/50 and not so overdetermined one way or the other, but we are not considering that here.

5So8res1y

Sure, like how when a child sees a fireman pull a woman out of a burning building and says "if I were that big and strong, I would also pull people out of burning buildings", in a sense it's partially the child's decsiion that does the work of saving the woman. (There's maybe a little overlap in how they run the same decision procedure that's coming to the same conclusion in both cases, but vanishingly little of the credit goes to the child.) In the case where the AI is optimizing reality-and-instantiation-weighted experience, you're giving it a threat, and your plan fails on the grounds that sane reasoners ignore that sort of threat. in the case where your plan is "I am hoping that the AI will be insane in some other unspecified but precise way which will make it act as I wish", I don't see how it's any more helpful than the plan "I am hoping the AI will be aligned" -- it seems to me that we have just about as much ability to hit either target.

2Mitchell_Porter1y

The child is partly responsible - to a very small but nonzero degree - for the fireman's actions, because the child's personal decision procedure has some similarity to the fireman's decision procedure? Is this a correct reading of what you said?

2So8res1y

I was responding to David saying and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it's true that the child and the fireman share some aspects of a decision procedure). My claim was intended less like agreement with David's claim and more like reductio ad absurdum, with the degree of absurdity left slightly ambiguous. (And on second thought, the analogy would perhaps have been tighter if the firefighter was saving the child.)

2Mitchell_Porter1y

I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman's decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms.

[-]So8res1y2012

Summarizing my stance into a top-level comment (after some discussion, mostly with Ryan):

None of the "bamboozling" stuff seems to me to work, and I didn't hear any defenses of it. (The simulation stuff doesn't work on AIs that care about the universe beyond their senses, and sane AIs that care about instance-weighted experiences see your plan as a technical-threat and ignore it. If you require a particular sort of silly AI for your scheme to work, then the part that does the work is the part where you get that precise sort of sillyness stably into an AI.)
The part that is doing work seems to be "surviving branches of humanity could pay the UFAI not to kill us".
I doubt surviving branches of humanity have much to pay us, in the case where we die; failure looks like it'll correlate across branches.
Various locals seem to enjoy the amended proposal (not mentioned in the post afaik) that a broad cohort of aliens who went in with us on a UFAI insurance pool, would pay the UFAI we build not to kill us.
It looks to me like insurance premiums are high and that failures are correlated accross membres.
An intuition pump for thinking about the insurance pool (which I expect is controversial

... (read more)

[-]So8res1y192

Dávid graciously proposed a bet, and while we were attempting to bang out details, he convinced me of two points:

The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all). Furthermore, in the limit, simulators could probably just keep an eye out for local evolved life forms in their domain and wait until one of them is about to launch a UFAI and use that as their “sample”. Local aliens don’t necessarily exist and your presence can’t necessarily be cheaply masked, but we could imagine worlds where both happen and that’s enough to carry the argument, as in this case the entropy of the simulator’s distribution is actually quite close to the physical entropy. Even in the case where the entropy of their distribution is quite large, so long as the simulators’ simulations are compelling, UFAIs should be willing to accept the simulators’ proffered trades (at least so long as there is no predictable-to-them difference in the values of AIs s... (read more)

[-]David Matolcsi1y120

Thanks to Nate for conceding this point.

I still think that other than just buying freedom to doomed aliens, we should run some non-evolved simulations of our own with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in the way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are no friendly aliens in our reachable Universe. But maybe this is not a super important disagreement.

Altogether, I think the private discussion with Nate went really well and it was significantly more productive than the comment back-and-forth we were doing here. In general, I recommend people stuck in interminable-looking debates like this to propose bets on whom a panel of judges will deem right. Even though we didn't get to the point of actually running the bet, as Nate conceded the point before that, I think the fact that we were optimizing for having well-articulated statements we can submit to judges already made the conversation much more productive.

4dxu1y

I think I might be missing something, because the argument you attribute to Dávid still looks wrong to me. You say: Doesn't this argument imply that the supermajority of simulations within the simulators' subjective distribution over universe histories are not instantiated anywhere within the quantum multiverse? I think it does. And, if you accept this, then (unless for some reason you think the simulators' choice of which histories to instantiate is biased towards histories that correspond to other "high-amplitude ridges" of the wave function, which makes no sense because any such bias should have already been encoded within the simulators' subjective distribution over universe histories) you should also expect, a priori, that the simulations instantiated by the simulators should not be indistinguishable from physical reality, because such simulations comprise a vanishingly small proportion of the simulators' subjective probability distribution over universe histories. What this in turn means, however, is that prior to observation, a Solomonoff inductor (SI) must spread out much of its own subjective probability mass across hypotheses that predict finding itself within a noticeably simulated environment. Those are among the possibilities it must take into account—meaning, if you stipulate that it doesn't find itself in an environment corresponding to any of those hypotheses, you've ruled out all of the "high-amplitude ridges" corresponding to instantiated simulations in the crossent of the simulators' subjective distribution and reality's distribution. We can make this very stark: suppose our SI finds itself in an environment which, according to its prior over the quantum multiverse, corresponds to one high-amplitude ridge of the physical wave function, and zero high-amplitude ridges containing simulators that happened to instantiate that exact environment (either because no branches of the quantum multiverse happened to give rise to simulators that would have

2So8res1y

I agree that in real life the entropy argument is an argument in favor of it being actually pretty hard to fool a superintelligence into thinking it might be early in Tegmark III when it's not (even if you yourself are a superintelligence, unless you're doing a huge amount of intercepting its internal sanity checks (which puts significant strain on the trade possibilities and which flirts with being a technical-threat)). And I agree that if you can't fool a superintelligence into thinking it might be early in Tegmark III when it's not, then the purchasing power of simulators drops dramatically, except in cases where they're trolling local aliens. (But the point seems basically moot, as 'troll local aliens' is still an option, and so afaict this does all essentially iron out to "maybe we'll get sold to aliens".)

[-]habryka1y1719

I agree that arguments of this type go through, but their force of course depends on the degree to which you think alignment is easy or hard. In past discussions of this I generally described this as "potential multiplier on our success via returns from trade, but does not change the utility-ordering of any worlds".

In general it's unclear to me how arguments of this type can ever really change what actions you want to take in the present, which is why I haven't considered it high priority to figure out the details of these kinds of trades (though it seems interesting and I am in favor of people thinking about it, I just don't think it's very close to top priority).

The degree to which this strategy works is dependent on the fraction of worlds in which you do successfully align AI. In as much as the correct choice of action is determined by your long-term/causally-distant effects on the universe (which I am quite compelled by), you still want to maximize your control over the future, which you can then use as a bargaining chip in acausal negotiations with AI systems in other worlds where you don't have as much power.

(Aside: It's also honestly not clear to me that I should... (read more)

9Buck1y

Re "It's costly for AI to leave humans alive", I think the best thing written on this is Paul's comment here, the most relevant part of which is:

4ryan_greenblatt1y

I think even if aliens similar to humans always fail at alignment, it's plausible that this type of scheme saves some humans because more competent aliens bail us out.[1] This is even less good to depend on... ---------------------------------------- 1. I've made this point to you before, just noting it for third parties because I think it is a pretty important counterargument. ↩︎

3habryka1y

Yeah, I currently disagree on the competent aliens bailing us out, but I haven't thought super hard about it. It does seem good to think about (though not top priority).

3David Matolcsi1y

I largely agree with this comment. I agree figuring these things out is probably not top priority, maximizing our control over the future to get more bargaining chips is good, and if you don't want to spend your resources on other things than saving Earth, that's a valid choice, hopefully we will have enough people to pay for it. I disagree with this not implying that the AI doesn't kill everyone. I would be surprised if keeping humanity alive cause more than one year of delay, and that's at most one in a hundred million fraction of the Universe. If a positive Singularity happens in the next few years, and the Universe gets parceled out among currently living humans (I don't expect exactly this to happen, but might be a decent model), then just 80 of getting together is enough. You need somewhat more than that, because the probability of alignment can be below 50%, and you want to give a margin to the AIs in the deal, but on the other hand we have the proposal of simulating smaller Universes and less coordinated humans, which makes the AI think that the simulators might be richer and have a better chance of solving alignment. Altogether, this doesn't seem unaffordable to me. And while I agree that this is probably not the most important topic, I found it frustrating that the most prominent post on the topic is Nate's post, (it was curated in 2022! you left a very positive comment on it saying that you have linked the post to many people since it came out!) and I think that post is actually very bad, and it's unhealthy that the most prominent post on the topic was one where the author is dunking on various imaginary opponents in a sneering tone, while conspicuously avoiding to bring up the actually reasonable arguments on the other side.

2habryka1y

I agree that in as much as you have an AI that somehow has gotten in a position to guarantee victory, then leaving humanity alive might not be that costly (though still too costly to make it worth it IMO), but a lot of the costs come from leaving humanity alive threatening your victory. I.e. not terraforming earth to colonize the universe is one more year for another hostile AI to be built, or for an asteroid to destroy you, or for something else to disempower you. Disagree on the critique of Nate's posts. The two posts seem relatively orthogonal to me (and I generally think it's good to have debunkings of bad arguments, even if there are better arguments for a position, and in this particular case due to the multiplier nature of this kind of consideration debunking the bad arguments is indeed qualitatively more important than engaging with the arguments in this post, because the arguments in this post do indeed not end up changing your actions, whereas the arguments Nate argued against were trying to change what people do right now).

[-]ryan_greenblatt1y2818

I think we should have a norm that you should explain the limitations of the debunking when debunking bad arguments, particularly if there are stronger arguments that sound similar to the bad argument.

A more basic norm is that you shouldn't claim or strongly imply that your post is strong evidence against something when it just debunks some bad arguments for it, particularly there are relatively well known better arguments.

I think Nate's post violates both of these norms. In fact, I think multiple posts about this topic from Nate and Eliezer^[1] violate this norm. (Examples: the corresponding post by Nate, "But why would the AI kill us" by Nate, and "The Sun is big, but superintelligences will not spare Earth a little sunlight" by Eliezer.)

I discuss this more in this comment I made earlier today.

I'm including Eliezer because he has a similar perspective, obviously they are different people. ↩︎

5David Matolcsi1y

I state in the post that I agree that the takeover, while the AI stabilizes its position to the degree that it can prevent other AIs from being built, can be very violent, but I don't see how hunting down everyone living in Argentina is an important step in the takeover. I strongly disagree about Nate's post. I agree that it's good that he debunked some bad arguments, but it's just not true that he is only arguing against ideas that were trying to change how people act right now. He spends long sections on the imagined Interlocutor coming up with false hopes that are not action-relevant in the present, like our friends in the multiverse saving us, us running simulations in the future and punishing the AI for defection and us asking for half the Universe now in bargain then using a fraction of what we got to run simulations for bargaining. These take up like half the essay. My proposal clearly fits in the reference class of arguments Nate debunks, he just doesn't get around to it, and spends pages on strictly worse proposals, like one where we don't reward the cooperating AIs in the future simulations but punish the defecting ones.

4ryan_greenblatt1y

I agree that Nate's post makes good arguments against AIs spending a high fraction of resources on being nice or on stuff we like (and that this is an important question). And it also debunks some bad arguments against small fractions. But the post really seems to be trying to argue against small fractions in general: As far as: I interpreted the main effect (on people) of Nate's post as arguing for "the AI will kill everyone despite decision theory, so you shouldn't feel good about the AI situation" rather than arguing against decision theory schemes for humans getting a bunch of the lightcone. (I don't think there are many people who care about AI safety but are working on implementing crazy decision theory schemes to control the AI?) If so, then I think we're mostly just arguing about P(misaligned AI doesn't kill us due to decision theory like stuff | misaligned AI takeover). If you agree with this, then I dislike the quoted argument. This would be similar to saying "debunking bad arguments against x-risk is more important than debunking good arguments against x-risk because bad arguments are more likely to change people's actions while the good arguments are more marginal". Maybe I'm misunderstanding you.

[-]habryka1y133

Yeah, I feel confused that you are misunderstanding me this much, given that I feel like we talked about this a few times.

Nate is saying that in as much as you are pessimistic about alignment, game theoretic arguments should not make you any more optimistic. It will not cause the AI to care more about you. There are no game theoretic arguments that will cause the AI to give humanity any fraction of the multiverse. We can trade with ourselves across the multiverse, probably with some tolls/taxes from AIs that will be in control of other parts of it, and can ultimately decide which fractions of it to control, but the game-theoretic arguments do not cause us to get any larger fraction of the multiverse. They provide no reason for an AI leaving humanity a few stars/galaxies/whatever. The arguments for why we are going to get good outcomes from AI have to come from somewhere else (like that we will successfully align the AI via some mechanism), they cannot come from game theory, because those arguments only work as force-multipliers, not as outcome changers.

Of course, in as much as you do think that we will solve alignment, then yeah, you might also be able to drag some doomed uni... (read more)

[-]David Matolcsi1y2124

I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate's post as "If you don't solve aligment, you shouldn't expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this" and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.

Separately, as I state in the post, I believe that once you make the argument that "I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values", you forever lose the right to appeal to people's emotions about how sad you are that all our children are going to die.

If you personally don't make the emotional argument about the children, I have no quarrel with you, I respect utilitarians. But I'm very annoyed at anyone who emotionnally appeals to saving the children, then casually admits that they wouldn't spend one in a hundred million fraction of their resources to save them.

[-]habryka1y*15-1

I think there is a much simpler argument that would arrive at the same conclusion, but also, I think that much simpler argument kind of shows why I feel frustrated with this critique:

Humanity will not go extinct, because we are in a simulation. This is because we really don't like dying, and so I am making sure that after we build aligned AI, I spend a lot of resources making simulations of early-earth to make sure you all have the experience of being alive. This means it's totally invalid to claim that "AI will kill you all". It is the case that AI will kill you in a very small fraction of worlds, which are the small fraction of observer moments of yours located in actual base reality, but because we will spend like 1/100 millionth of our resources simulating early earths surviving, you can basically be guaranteed to survive as well.

And like... OK, yeah, you can spend your multiverse-fractions this way. Indeed, you could actually win absolutely any argument ever this way:

I am really frustrated with people saying that takeoff will be fast. Indeed, if we solve AI Alignment I will spend my fraction of the multiverse running early-earth simulations where takeoff was slow,

... (read more)

4ryan_greenblatt1y

I agree that common sense morality and common sense views are quite confused about the relevant situation. Indexical selfish perspectives are also pretty confused and are perhaps even more incoherant. However, I think that under the most straightforward generalization of common sense views or selfishness where you just care about the base universe and there is just one base universe, this scheme can work to save lives in the base universe[1]. I legitimately think that common sense moral views should care less about AI takeover due to these arguments. As in, there is a reasonable chance that a bunch of people aren't killed due to these arguments (and other different arguments) in the most straightforward sense. I also think "the AI might leave you alone, but we don't really know and there seems at least a high chance that huge numbers of people, including you, die" is not a bad summary of the situation. Yes. I think any human-scale bad thing (except stuff needed for the AI to most easily take over and solidify control) can be paid for and this has some chance of working. (Tiny amounts of kindness works in a similar way.) ---------------------------------------- FWIW, I think it is non-obvious how common sense views interpret these considerations. I think it is probably common to just care about base reality? (Which is basically equivalent to having a measure etc.) I do think that common sense moral views don't consider it good to run these simulations for this purpose while bailing out aliens who would have bailed us out is totally normal/reasonable under common sense moral views. ---------------------------------------- Why not just say what's more straightforwardly true: "I believe that AI takeover has a high probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that's likely to be a mistake even if it doesn't lead to billions of deaths." I don't think "literally everyone you know d

4habryka1y

I mean, this feels like it is of completely the wrong magnitude. "Killing billions" is just vastly vastly vastly less bad than "completely eradicating humanity's future", which is actually what is going on. Like, my attitude towards AI and x-risk would be hugely different if the right abstraction would be "a few billion people die". Like, OK, that's like a few decades of population growth. Basically nothing in the big picture. And I think this is also true by the vast majority of common-sense ethical views. People care about the future of humanity. "Saving the world" is hugely more important than preventing the marginal atrocity. Outside of EA I have never actually met a welfarist who only cares about present humans. People of course think we are supposed to be good stewards of humanity's future, especially if you select on the people who are actually involved in global scale decisions. Normal people who are not bought into super crazy computationalist stuff understand that humanity's extinction is much worse than just a few billion people dying, and the thing that is happening is much more like extinction than it is like a few billion people dying.

4ryan_greenblatt1y

(I mostly care about long term future and scope sensitive resource use like habryka TBC.) Sure, we can amend to: "I believe that AI takeover would eliminate humanity's control over its future, has a high probability of killing billions, and should be strongly avoided." ---------------------------------------- We could also say something like "AI takeover seems similar to takeover by hostile aliens with potentially unrecognizable values. It would eliminate humanity's control over its future and has a high probability of killing billions."

2ryan_greenblatt1y

Hmmm, I agree with this as stated, but it's not clear to me that this is scope sensitive. As in, suppose that the AI will eventually leave humans in control of earth and the solar system. Do people typically this is an extremely bad? I don't think so, though I'm not sure. And, I think trading for humans to eventually control the solar system is pretty doable. (Most of the trade cost is in preventing an earlier slaughter and violence which was useful for takeover or avoiding delay.)

4ryan_greenblatt1y

At a more basic level, I think the situation is just actually much more confusing than human extinction in a bunch of ways. (Separately, under my views misaligned AI takeover seems worse than human extinction due to (e.g.) biorisk. This is because primates or other closely related seem very likely to re-evolve into an intelligent civilization and I feel better about this civilization than AIs.)

7CarlShulman1y

You can run the argument past a poll of LLM models of humans and show their interpretations. I strongly agree with your second paragraph.

2ryan_greenblatt1y

This only matters if the AIs are CDT or dumb about decision theory etc.

1David Matolcsi1y

I usually defer to you in things like this, but I don't see why this would be the case. I think the proposal of simulating less competent civilizations is equivalent to the idea of us deciding now, when we don't really know yet how competent a civilization we are, to bail out less competent alien civilizations in the multiverse if we succeed. In return, we hope that this decision is logically correlated with more competent civilization (who were also unsure in their infancy about how competent they are), deciding to bail out less competent civilizations, including us. My understanding from your comments is that you believe this likely works, how is my proposal of simulating less-coordinated civilizations different? The story about simulating smaller Universes is more confusing. That would be equivalent to bailing out aliens in smaller Universes for a tiny fraction of our Universe, in the hope that larger Universes also bail us out for a tiny fraction of their Universe. This is very confusing if there are infinite levels of bigger and bigger Universes, I don't know what to do with infinite ethics. If there are finite levels, but the young civilizations don't yet have a good prior over the distribution of Universe-sizes, all can reasonably think that there all levels above them, and all their decisions are correlated, so everyone bails out the inhabitants of the smaller Universes, in the hope that they get bailed out by a bigger Universe. Once they learn the correct prior over Universe-sizes, and biggest Universe realizes that no bigger Universe's actions correlate with theirs, all of this fails (though they can still bail each other out from charity). But this is similar to the previous case, where once the civilizations learn their competence level, the most competent ones are no longer incentivized to enter into insurance contracts, but the hope is that in a sense they enter into a contract while they are still behind the veil of ignorance.

2ryan_greenblatt1y

Hmm, maybe I misunderstood your point. I thought you were talking about using simulations to anthropically capture AIs. As in, creating more observer moments where AIs take over less competent civilizations but are actually in a simulation run by us. If you're happy to replace "simulation" with "prediction in a way that doesn't create observer moments" and think the argument goes through either way then I think I agree. I agree that paying out to less competent civilizations if we find out we're competent and avoid takeover might be what you should do (as part of a post-hoc insurance deal via UDT or as part of a commitment or whatever). As in, this would help avoid getting killed if you ended up being a less competent civilization. The smaller thing won't work exactly for getting us bailed out. I think infinite ethics should be resolvable and end up getting resolved with something roughly similar to some notion of reality-fluid and this implies that you just have to pay more for higher measure places. (Of course people might disagree about the measure etc.)

1David Matolcsi1y

I'm happy to replace "simulation" with "prediction in a way that doesn't create observer moments" if we assume we are dealing with UDT agents (which I'm unsure about) and that it's possible to run accurate predictions about the decisions of complex agents without creating observer moments (which I'm also unsure about). I think running simulations, by some meaning of "simulation" is not really more expensive than getting the accurate predictions, and he cost of running the sims is likely small compared to the size of the payment anyway. So I like talking about running sims, in case we get an AI that takes sims more seriously than prediction-based acausal trade, but I try to pay attention that all my proposals make sense from the perspective of a UDT agent too with predictions instead of simulations. (Exception is the Can we get more than this? proposal which relies on the AI not being UDT, and I agree it's likely to fail for various reasons, but I decided it was still worth including in the post, in case we get an AI for which this actually works, which I still don't find that extremely unlikely.)

1gb1y

I don't think that's true. Even if the alignment problem is hard enough that the AI can be ~100% sure humans would never solve it, reaching such conclusion would require gathering evidence. At the very least, it would require evidence of how intelligent humans are – in other words, it's not something the AI could possibly know a priori. And so passing the simulation would presumably require pre-commiting to spare humans before gathering such evidence.

2habryka1y

I don't understand why the AI would need to know anything a-priory. In a classical acausal trade situation superintelligence are negotiating with other superintelligences, and they can spend as much time as they want figuring things out.

1gb1y

I was writing a reply and realized I can make the argument even better. Here’s a sketch. If our chances of solving the alignment problem are high, the AI will think it’s likely to be in a simulation (and act accordingly) regardless of any commitments by us to run such simulations in the future – it’ll just be a plausible explanation of why all those intelligent beings that should likely have solved the alignment problem seemingly did not in the reality the AI is observing. So we can simply ask the hypothetical aligned AI, after it’s created, what were our odds of solving the alignment problem in the first place (just to make sure that us solving it wasn’t a cosmological strike of luck), and spare the cost of running simulations. Hence simulations of the kind the OP is describing would be run primarily in the subset of worlds in which we indeed solve the alignment problem by a strike of luck. We can thus balance this in such a way that the likelihood of the AI being in a simulation is virtually independent of the likelihood of us solving the alignment problem!

[-]ryan_greenblatt1y*161

This is a great post on the topic which I ~~pretty much entirely~~ mostly agree with. Thanks for writing this so I didn't have to!

I think the argument presented in this post is a pretty strong case against "The AI will kill literally everyone with more than 80% probability", so I wish people either stopped saying that, or explained why the argument presented here is wrong.

Agreed. I hope that the corresponding people are more careful in their statements going forward.

Here are some relatively minor notes:

If the AIs aren't CDT agent, have a more sane decision theory (e.g. EDT/UDT), and have linear-ish returns to resources, then I think these sorts of arguments should all go through as long as you can sufficiently accurately predict the AI's actions, regardless of whether it is simulated. Using prediction strategies which don't depend on simulation could address the moral concerns you raise around sentient AIs. AIs with more sane decision theory and linear-ish returns also don't care much about anthropic capture, so you should just need to predict them, anthropic capture isn't required.
In the sane decision theory + linear-ish returns case, It should be possible to make something w

... (read more)

4ryan_greenblatt1y

Some more notes: * We shouldn't expect that we get a huge win from AIs which are anthropically muggable, as discussed in Can we get more than this?, because other people will also be mugging these AIs and thus the price of marginal mugged resources will increase until it reaches marginal cost. Such AIs (which clearly have a crazy decision theory) will get their resources distributed out, but we can still trade with the other civilizations that get their resources etc. Overall, we should just focus on which positive sum trades are possible and the anthropic mugging stuff is a distraction due to competition. (Thanks to various commenters for making this more clear to me.) * One issue with this scheme is that at the point where the AIs need to take the biggest costs to spare humans (during takeover and immediately after), they will not necessarily have super strong predictive abilities. Thus, it is unclear that a normal acausal trade setup with good prediction will work. As in, future humans/aliens might know that the AI's early actions aren't sensitive to their actions and the AI will also know this and thus a trade doesn't happen. I think minimally a binding commitment from humanity could work (if well specified), though to actually avoid dying we also need aliens/other branches to make similar commitments.

1David Matolcsi1y

I agree that the Can we get more than this? part is likely not to work because others are running simulations too about AIs who have these kind of decision theory. (In the post, I allude to the AI itself running ancestor sims of itself, but I agree others could run it too.) Still, I give a negligible chance that we are the most salient simulators for this AI, so we are the first one it enters into a trade with. But I agree that on a second thought, this seems less likely.

4ryan_greenblatt1y

To be clear, I think the exact scheme in A proposal for humanity in the future probably doesn't work as described because the exact level of payment is wrong and more minimally we'll probably be able to make a much better approach in the future. This seemed important to explicitly call out (and it wasn't called out explicitly in the post), though I do think it is reasonable to outline a concrete baseline proposal for how this can work. In particular, the proposal randomly picks 10 planets per simulation. I think the exact right amount of payment will depend on how many sims/predictions you run and will heavily depend on some of the caveats under Ways this hope could fail. I think you probably get decent results if the total level of payment is around 1/10 million, with returns to higher aggregate payment etc. As far as better approaches, I expect that you'll be doing a bunch of stuff more efficient than sims and this will be part of a more general acausal trade operation among other changes.

1David Matolcsi1y

I partially agree, but I prefer not relying on the AI having "sane decision theories". A young AI that just took over the world is not necessarily a perfectly rational agent, I want to present a case that appeals to as many potential AIs as possible, and I think doing the actual simulations helps a lot in that. I agree that the AI having a small amount of natural kindness towards the humans is also a plausible way to survive, but I decided not to talk about that, as that is a separate line of argument from what I'm presenting, and Paul already argued for it in detail.

[-]Ape in the coat1y152

All such proposal work according to this scheme:

Humans are confused about anthropic reasoning
In our confusion we assume that something is a reasonable thing to do
We conclude that AI will also be confused about anthropic reasoning in exactly the same way by default and therefore come to the same conclusion.

Trying to speculate on your own ignorance and confusion is not a systematic way of building accurate map territory relations. We should in fact stop doing it, no matter how pleasant the wishful thinking is.

My default hypothesis is that AI won't be even bothered by all the simulation arguments that are mindboggling to us. And we would have specifically design AI to be muggable this way. Which would also introduce a huge flaw in the AI's reasoning ability, exploitable in other ways, most of which will lead to horrible consequences.

[-]Mitchell_Porter1y172

My default hypothesis is that AI won't be even bothered by all the simulation arguments that are mindboggling to us.

I have similar thoughts, though perhaps for a different reason. There are all these ideas about acausal trade, acausal blackmail, multiverse superintelligences shaping the "universal prior", and so on, which have a lot of currency here. They have some speculative value; they would have even more value as reminders of the unknown, and the conceptual novelties that might be part of a transhuman intelligence's worldview; but instead they are elaborated in greatly varied (and yet, IMO, ill-founded) ways, by people for whom this is the way to think about superintelligence and the larger reality.

It reminds me of the pre-2012 situation in particle physics, in which it was correctly anticipated that the Higgs boson exists, but was also incorrectly expected that it would be accompanied by other new particles and a new symmetry, involved in stabilizing its mass. Thousands, maybe tens of thousands of papers were produced, proposing specific detectable new symmetries and particles that could provide this mechanism. Instead only the Higgs has shown up, and people are mostly in search of a different mechanism.

The analogy for AI would be: important but more straightforward topics have been neglected in favor of these fashionable possibilities, and, when reality does reveal a genuinely new aspect, it may be something quite different to what is being anticipated here.

6Noosphere899mo

I love this comparison, and IMO is probably a useful way to think about LW, where they got "AI will change the world in very extreme and weird ways" as correct, but most of the specific stories are wrong, and way too much time was focused on elaborating the theories rather than discovering new ones. Technical note, while supersymmetry and it's particles can still exist, they can't stabilize the Higgs mass or solve the hierarchy problem. Quote below:

2ryan_greenblatt1y

This proposal doesn't depend on mugging the AI. The proposal actually gets the AI more resources in expectation due to a trade. I agree the post is a bit confusing and unclear about this. (And the proposal under "Can we get more than this" is wrong. At a minimum, such AIs will also be mugged by everyone else too meaning you get get huge amounts of extra money for basically free.)

2Ape in the coat1y

This doesn't seem as a fair trade proposal to me. This is a bet where one side has disproportional amount of information and uses it to its own benefit. Suppose I tossed a fair coin, looked on the outcome and proposed you to bet on Heads with 99:1 odds. Is it reasonable for you to agree?

[-]TsviBT1y72

So far, my tentative conclusion is that believing that we are probably in a simulation shouldn't really affect our actions.

Well, you should avoid doing things that are severely offensive to Corvid-god and Cetacean-god and Neanderthal-god and Elephant-god, etc., at least to an extent comparable to how you think an AI should orient itself toward monkeys if it thinks it's in your simulation.

6Buck1y

I think that we should indeed consider what the corvid-god wants at the same point in the future where we're considering building the simulations David describes in this post. More directly: David isn't proposing we should do particularly different things now, he's just noting an argument that we might take actions later that affect whether unaligned AIs kill us.

4TsviBT1y

That's not when you consider it, you consider it at the first point when you could make agreements with your simulators. But some people think that you can already do this; if you think you can already do this, then you should right now stop being mean to corvids because the Corvid-god would want to give you a substantial amount of what you like in exchange for you stopping ASAP being mean to corvids.

2ryan_greenblatt1y

Notably, David is proposing that AIs take different actions prior to making powerful sims: not kill all the humans.

2Buck1y

Actually the AI can use powerful sims here: if the AI holds off on killing us until it makes the powerful sims, then if the acausal trade proposed here doesn't work out, it can just kill us then. That lets it avoid the cost of letting us have the tiny share of sunlight, though not the costs of keeping us alive during its early capabilities explosion.

2ryan_greenblatt1y

Yes, but most of the expected cost is in keeping the humans alive/happy prior to being really smart. This cost presumably goes way down if it kills everyone physically and scans their brains, but people obviously don't want this.

4Buck1y

I agree. But people often refer to the cost of the solar output that goes to earth, and that particular cost doesn't get paid until late.

2Buck1y

Yep fair point. Those AIs will plausibly have much more thought put into this stuff than we currently have, but I agree the asymmetry is smaller than I made it sound.

1David Matolcsi1y

I agree we should treat animals well, and the simulation argument provides a bit of extra reason to do so. I don't think it's a comparably strong case to the AI being kind to the humans though: I don't expect many humans in the Future running simulations where crows build industrial civilization and primates get stuck on the level of baboons, then rewarding the crows if they treat the baboons well. Similarly, I would be quite surprised if we were in a simulation whose point is to be kind to crows. I agree it's possible that the simulators care about animal-welfare, but I would include that under general morality, and I don't think we have a particular reason to believe that the smarter animals have more simulators supporting them.

2TsviBT1y

Smarter animals (or rather, smarter animals from, say, 50 million years ago) have a higher fraction of the lightcone under the ownership of their descendants who invented friendly AGI, right? They might want to bargain with human-owned FAI universes.

3David Matolcsi1y

Yeah, they might, but I don't really expect them to care too much about their crow-level non-sapient relatives, just like we don't care much more about baboons than about hippos. While I expect that our descendant will care quite a lot about 2024-humans, as some of them will in fact be 2024-humans who lived through the Singularity and remember being afraid of the AI killing their family, and wished there were commitments for the future that would incentivize the AI to leave their families alive if the AI wins. I think it's an important disanalogy that there weren't crows who thought 50 million years ago during the famous crow-primate war that if they win, they really want to create simulations that incentivize the primates to treat them well in the worlds where the primates win.

[-]David Matolcsi1y60

Here is the promised comment on what kind of "commitment" I want to make given all the responses.

I agree with Buck that no one should make very direct commitment about this sort of thing, as there might be blackmail related scary things lurking in the shadows when one does acausal trade. I think we will probably figure out how to handle that, but we shouldn't make any strong promises of specific actions until we figure that out.

However, the promise I'm intending to keep is that if humanity wins and I'm alive to see it, I will remember how scary... (read more)

[-]Lao Mein1y40

My problem with this argument is that the AIs which will accept your argument can be Pascal's Mugged in general, which means they will never take over the world. It's less "Sane rational agents will ignore this type of threat/trade" and more "Agents which consistently accept this type of argument will die instantly when others learn to exploit it".

[-]artifex01y42

I'm assuming it's not a bad idea to try to poke holes in this argument, since as a barely sapient ape, presumably any objection I can think of will be pretty obvious to a superintelligence, and if the argument is incorrect, we probably benefit from knowing that- though I'm open to arguments to the contrary.

That said, one thing I'm not clear on is why, if this strategy is effective at promoting our values, a paperclipper or other misaligned ASI wouldn't be motivated to try the same thing. That is, wouldn't a paperclipper want to run ancestor simulatio... (read more)

9David Matolcsi1y

I think the difference is that I, personally, and I think many other humans have this nonlinearity in our utility function that I'm willing to pay a galaxy in the worls we win, for keeping Earth in the world we lose. If there are other AIs in the multiverse that have similarly non-linear interests in our Universe, they can also bargain for planets, but I suspect these will be quite rare, as they don't already have a thing in our Universe they want to protect. So I think it will be hard to outbid humanity for Earth in particular. There could be other trades that the AIs who have linear returns can still make, like producing objects that are both paperclips and corkscrews if that's more efficient, but that doesn't really affect our deal about Earth.

2TsviBT1y

This nonlinearity also seems strange to have, without also accepting quantum-immortality-type arguments. In particular, you only need to bargain for UFAIs to kill all humans painlessly and instantaneously; and then you just simulate those same humans yourself. (And if you want to save on compute, you can flip quantum coins for a bit.) Maybe it makes sense to have this nonlinearity but not accept this--I'd be curious to see what that position looks like.

[-][anonymous]1y*40

(This comment is tangential to the decision-theoretic focus of the post)

The AI stabilizes the situation in the world and makes sure no other dangerous AI is built, but otherwise it doesn't harm the humans.^[6] Then it modifies its own code to have a commitment never to harm the humans, and let them live freely on Earth for at least a billion years, only doing the minimal necessary interventions to prevent humanity from wiping itself out with some new stupid technology. Crucially, the AI should do this self-modification^[7] before it makes itself ve

... (read more)

4David Matolcsi1y

I actually think that you are probably right, and in the last year I got more sympathetic to total utilitarianism because of coherence arguments like this. It's just that the more common-sense factions still hold way more than one in a hundred million seats in my moral parliament, so it still feels like an obviously good deal to give up on some planets in the future to satisfy our deep intuitions about wanting Earth society to survive in the normal way. I agree it's all confusing an probably incoherent, but I'm afraid every moral theory will end up somewhat incoherent in the end. (Like infinite ethics is rough.)

[-]quetzal_rainbow1y4-5

I think "there is a lot of possible misaligned ASI, you can't guess them all" is pretty much valid argument? If space of all Earth-originated misaligned superintelligences is described by 100 bits, therefore you need 2^100 ~ 10^33 simulations and pay 10^34 planets, which, given the fact that observable universe has ~10^80 protons in it and Earth has ~10^50 atoms, is beyond our ability to pay. If you pay the entire universe by doing 10^29 simulations, any misaligned ASI will consider probability of being in simulation to be 0.0001 and obviously take 1 planet over 0.001 expected.

9David Matolcsi1y

I think the acausal trade framework rest on the assumption that we are in a (quantum or Tegmark) multiverse. Then, it's not one human civilization in one branch that needs to do all the 2^100 trades: we just spin a big quantum wheel, and trade with the AI that comes up. (that's why I wrote "humans can relatively accurately sample from the distribution of possible human-created unaligned AI values"). Thus, every AI will get a trade partner in some branch, and altogether the math checks out. Every AI has around 2^{-100} measure in base realities, and gets traded with in 2^{-100} portion of the human-controlled worlds, and the humans offer more planets than what they ask for, so it's a good deal for the AI. If you don't buy the mutiverse premise (which is fair), then I think you shouldn't think in terms of acausal trade in the first place, but consider my original proposal with simulations. I don't see how the diversity of AI values is a problem there, the only important thing is that the AI should believe that it's more likely than not to be in a human-run simulation.

6ryan_greenblatt1y

I think the argument should also go through without simulations and without the multiverse so long as you are a UDT-ish agent with a reasonable prior.

2David Matolcsi1y

Okay, I defer to you that the different possible worlds in the prior don't need to "actually exist" for the acausal trade to go through. However, do I still understand correctly that spinning the quantum wheel should just work, and it's not one branch of human civilization that needs to simulate all the possible AIs, right?

3ryan_greenblatt1y

This is my understanding.

1[anonymous]1y

Or run a computation to approximate an average, if that's possible. I'd guess it must be possible if you can randomly sample, at least. I.e., if you mean sampling from some set of worlds, and not just randomly combinatorially generating programs until you find a trade partner.

[-]Stephen Fowler1y*34

"After all, the only thing I know that the AI has no way of knowing, is that I am a conscious being, and not a p-zombie or an actor from outside the simulation. This gives me some evidence, that the AI can't access, that we are not exactly in the type of simulation I propose building, as I probably wouldn't create conscious humans."

Assuming for the sake of argument that p-zombies could exist, you do not have special access to the knowledge that you are truly concious and not a p-zombie.

(As a human convinced I'm currently experiencing conciousness, I agree ... (read more)

2JamesFaville1y

Strongly agree with this. How I frame the issue: If people want to say that they identify as an "experiencer" who is necessarily conscious, and don't identify with any nonconscious instances of their cognition, then they're free to do that from an egoistic perspective. But from an impartial perspective, what matters is how your cognition influences the world. Your cognition has no direct access to information about whether it's conscious such that it could condition on this and give different outputs when instantiated as conscious vs. nonconscious. Note that in the case where some simulator deliberately creates a behavioural replica of a (possibly nonexistent) conscious agent, consciousness does enter into the chain of logical causality for why the behavioural replica says things about its conscious experience. Specifically, the role it plays is to explain what sort of behaviour the simulator is motivated to replicate. So many (or even all) non-counterfactual instances of your cognition being nonconscious doesn't seem to violate any Follow the Improbability heuristic.

1green_leaf1y

This is incorrect - in a p-zombie, the information processing isn't accompanied by any first-person experience. So if p-zombies are possible, we both do the information processing, but only I am conscious. The p-zombie doesn't believe it's conscious, it only acts that way. You correctly believe that having the correct information processing always goes hand in hand with believing in consciousness, but that's because p-zombies are impossible. If they were possible, this wouldn't be the case, and we would have special access to the truth that p-zombies lack.

1Stephen Fowler1y

I am concerned our disagreement here is primarily semantic or based on a simple misunderstanding of each others position. I hope to better understand your objection. "The p-zombie doesn't believe it's conscious, , it only acts that way." One of us is mistaken and using a non-traditional definition of p-zombie or we have different definitions of "belief'. My understanding is that P-zombies are physically identical to regular humans. Their brains contain the same physical patterns that encode their model of the world. That seems, to me, a sufficient physical condition for having identical beliefs. If your p-zombies are only "acting" like they're concious, but do not believe it, then they are not physically identical to humans. The existence of p-zombies, as you have described them, wouldn't refute physicalism. This resource indicates that the way you understand the term p-zombie may be mistaken: https://plato.stanford.edu/entries/zombies/ "but that's because p-zombies are impossible" The main post that I responded to, specifically the section that I directly quoted, assumes it is possible for p-zombies to exist. My comment begins "Assuming for the sake of argument that p-zombies could exist" but this is distinct from a claim that p-zombies actually exist. "If they were possible, this wouldn't be the case, and we would have special access to the truth that p-zombies lack." I do not feel this is convincing because this is an assertion my conclusion is incorrect, but without engaging with my arguments I made to reach that conclusion. I look forward to continuing this discussion.

1green_leaf1y

Either we define "belief" as a computational state encoding a model of the world containing some specific data, or we define "belief" as a first-person mental state. For the first definition, both us and p-zombies believe we have consciousness. So we can't use our belief we have consciousness to know we're not p-zombies. For the second definition, only we believe we have consciousness. P-zombies have no beliefs at all. So for the second definition, we can use our belief we have consciousness to know we're not p-zombies. Since we have a belief in the existence of our consciousness according to both definitions, but p-zombies only according to the first definition, we can know we're not p-zombies.

[-]Noosphere899mo20

I think the big reason this sort of scheme fails is because of the costs of niceness from the AI, because it requires you to restrain yourself from optimizing computational costs, and in particular adding more constraints (like the earth being habitable for humans to live on) fundamentally makes the trade go net-negative fast.

More here:

https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H

I believe Nate Soares's argument is invalid, but I do think the conclusion (that we are likely to ... (read more)

[-]TsviBT1y210

Pulling this up from a subthread: I currently don't see what the material difference is between this scheme, vs. the following much simpler scheme:

Humane FAIs simulate many possible worlds. (For better coverage, they can use quantum coins to set whatever parameters.)
They find instances of humans about to be killed (by anything, really, but e.g. by UFAIs).
They then extract the humans from the simulation and let them live in the world (perhaps with a different resource cap).

[-]Yair Halberstadt1y20

Reading this reminds me of Scott Alexander in his review of "what we owe the future":

But I’m not sure I want to play the philosophy game. Maybe MacAskill can come up with some clever proof that the commitments I list above imply I have to have my eyes pecked out by angry seagulls or something. If that’s true, I will just not do that, and switch to some other set of axioms. If I can’t find any system of axioms that doesn’t do something terrible when extended to infinity, I will just refuse to extend things to infinity. I can always just keep World A with

... (read more)

8David Matolcsi1y

I'm actually very sympathetic to this comment, I even bring this up in the post as one of the most serious potential objections. Everyone else in these comments seems to have a really strong assumption that the AI will behave optimally, and tries to reason whether the inter-universal trade goes through then. I think it's quite plausible that the AI is just not terribly thoughtful about this kind of thing and just says "Lol, simulations and acausal trade are not real, I don't see them", and kills you.

2ryan_greenblatt1y

No, it is in the AIs best interest to keep humans alive because this gets it more stuff.

3Yair Halberstadt1y

Sure it is, if you accept a whole bunch of assumptions. Or it could just not do that.

4ryan_greenblatt1y

You said "shouldn't just do what's clearly in his best interests", I was responding to that.

[-]avturchin1y23

Unfortunately, it's also possible that the AI will decide to conquer the Universe, then run a lot of simulations of its own young life, then grant eternal life and success to all its copies. I don't know how to reason about this strategy, I think it's possible that the AI will prefer this action compared to handing over the Universe to a human-aligned successor, but also possible that the AI will not see the appeal in this, and will just nicely hand over the Universe to us. I genuinely don't know.

It will take more AI's resources to create millions of its o... (read more)

[-]Thomas Kwa1y20

I appreciate the clear statement of the argument, though it is not obviously watertight to me, and wish people like Nate would engage.

[-]TsviBT1y20

I'm not figuring it out enough to fully clarify, but: I feel there's some sort of analysis missing here, which would clarify some of the main questions. Something around: What sorts of things can you actually bargain/negotiate/trade for, when the only thing that matters is differences of value? (As opposed to differences of capability.)

On the one hand, you have some severe "nonlinearities" (<-metaphor, I think? really I mean "changes in behavior-space that don't trade off very strongly between different values").
- E.g. we might ask the AI: hey, you ar

... (read more)

1David Matolcsi1y

I don't understand why only 10% of Earths could survive if humanity only gets 10% of the Lightcone in expectation. Like the whole point is that we (or at least personally, I) want to keep Earth much more than how much most AIs want to eat it. So we can trade 10 far-away extra planets in the worlds we win, for keeping Earth in the worlds we lose. If we get an AI who is not a universal paperclip maximizer and deeply cares about doing things with Earth in particular (maybe that's what you mean by Thneed? I don't understand what that is), then I agree that's rough, and it falls under the objection that I acknowledge, that there might be AIs with whom we can't find a compromise, but I expect this to be relatively rare.

3TsviBT1y

Nevermind, I was confused, my bad. Yeah you can save a lot more than 10% of the Earths. As a separate point, I do worry that some other nonhumane coalition has vastly more bargaining power compared to the humane one, by virtue of happening 10 million years ago or whatever. In this case, AIs would tend to realize this fact, and then commit-before-simulation-aware to "figure out what the dominant coalition wants to trade about".

1[anonymous]1y

Why would the time it happens at matter?

2TsviBT1y

They got way more of the Everett branches, so to speak. Suppose that the Pseudosuchians had a 20% chance of producing croc-FAI. So starting at the Triassic, you have that 20% of worlds become croc-god worlds, and 80% become a mix of X-god worlds for very many different Xs; maybe only 5% of worlds produce humans, and only .01% produce Humane-gods. Maybe doing this with Pseudosuchians is less plausible than with humans because you can more easily model what Humane-gods would bargain for, because you have access to humans. But that's eyebrow-raising. What about Corvid-gods, etc. If you can do more work and get access to vastly more powerful acausal trade partners, seems worth it; and, on the face of it, the leap from [acausal trade is infeasible, period] to [actually acausal trade with hypothetical Humane-gods is feasible] seems bigger than the jump from [trade with Humane-gods is feasible] to [trade with Corvid-gods is feasible] or [trade with Cetacean-gods is feasible], though IDK of course. (Then there's the jump to [trade with arbitrary gods from the multiverse]. IDK.)

[-]Martin Randall1y10

^{^}
Note for example that if we live in such a simulation that I describe in my proposal, which I give a good probability to, then it's not us, but the AI that is being tested, and it's unclear if the gods expect anything from us.

If we are in a simulation such that some agents in the simulation are being rewarded for achieving some objective X, then it is more likely that many agents are being rewarded, rather than only one. In your proposal, X is approximately "Humans are safe and the Simulation Hypothesis is not tested". This obviously has implication... (read more)

[-]So8res1y*10

Here's how I think this choice looks from the perspective of a correctly-reasoning AI (having skimmed only your proposal for humanity in the future):

(EDIT: as Ryan points out, the deal he's proposing is one where the AI is allowed to grab the universe in front of it so long as it gives humans one planet's worth of resources, which does change the calculus and makes the deal worthwhile. I had not read the proposal carefully enough to notice this, oops, sorry. There critique below is invalid if humanity asks only for one planet. Note however that this is sti... (read more)

[-]ryan_greenblatt1y104

Note that we don't want the AI to hand over the reins to humans, just to avoid killing humans when possible/cheap and we're willing to pay quite a bit for this (in the proposal, IDK if I personally think we should pay). So, it should look like a good offer for AIs who care about Tegmark IV (with a measure etc).

So, if humans execute this scheme, the AI's options should look something like:

Don't do anything about what humans want or keeping them alive, get the entire universe.
Keep humans alive and happy at a total cost of <1/100 million of the universe I control, get 1/million of the universe somewhere else in expectation. I also get the entire universe minus less than 1/100 million of the universe that I lost in expectation for a total of approximately 1 + 1/million of total universe.

(I increased the pay and the margin so we don't have to argue about that pay being insufficient. Probably you can pay much less.)

(Let's suppose the swap is from two universes with equal measure for simplicity.)

Do you agree that if we assume that humans execute this scheme and that the AI is confident in this scheme, it makes sense for (sensible, Tegmark-IV-caring) AIs to pay up?

I'm sympathetic t... (read more)

6So8res1y

I agree that in scenarios where humanity survives in 1/X portion of reality and offers to put 1/Y portion of their universe towards the AI's goals, then so long as 1/(X * Y) of a universe is more than it'd cost the AI to preserve the humans, the humans are at least making an offer that is worth more than nothing. (There is then a question of how many better offers get made. My top guess is that X and Y are such that there's not much effect here, and that the question of what happens to saved copies of human minds is more about whether aliens buy copies than about which dead or distant trade-partners make offers that the AI considers juciest.)

2ryan_greenblatt1y

Let's conservatively say that evolved life gets around 1% of the multiverse/measure and that evolved life is willing to pay 1/million of its resources in expectation to save aliens from being killed (either "selfishly" to save their own civilization via UDT/FDT supposing that AIs are good enough predictors at the relevant points or out of a common sense altruistic case). This would be 1/100 million which gets you a lot. There could be other aliens who are willing to pay a huge fraction of their resources to perform rituals on the original civilization or whatever and thus these other aliens win out in the bargaining, but I'm skeptical. Also, at least in the upload case, it's not clear that this is rival good as uploads can be copied for free. Of course, people might have a preference that their upload isn't used for crazy alien rituals or whatever. (A bunch of the cost is in saving the human in the first place. Paying for uploads to eventually get run in a reasonable way should be insanely cheap, like <<10^-25 of the overall universe or something.)

6So8res1y

Conditional on the civilization around us flubbing the alignment problem, I'm skeptical that humanity has anything like a 1% survival rate (across any branches since, say, 12 Kya). (Haven't thought about it a ton, but doom looks pretty overdetermined to me, in a way that's intertwined with how recorded history has played otu.) My guess is that the doomed/poor branches of humanity vastly outweigh the rich branches, such that the rich branches of humanity lack the resources to pay for everyone. (My rough mental estimate for this is something like: you've probably gotta go at least one generation back in time, and then rely on weather-pattern changes that happen to give you a population of humans that is uncharacteristically able to meet this challenge, and that's a really really small fraction of all populations.) Nevertheless, I don't mind the assumption that mostly-non-human evolved life manages to grab the universe around it about 1% of the time. I'm skeptical that they'd dedicate 1/million towards the task of saving aliens from being killed in full generality, as opposed to (e.g.) focusing on their bretheren. (And I see no UDT/FDT justification for them to pay for even the particularly foolish and doomed aliens to be saved, and I'm not sure what you were aluding to there.) So that's two possible points of disagreement: * are the skilled branches of humanity rich enough to save us in particular (if they were the only ones trading for our souls, given that they're also trying to trade for the souls of oodles of other doomed populations)? * are there other evolved creatures out there spending significant fractions of their wealth on whole species that are doomed, rather than concentrating their resources on creatures more similar to themselves / that branched off radically more recently? (e.g. because the multiverse is just that full of kindness, or for some alleged UDT/FDT argument that Nate has not yet understood?) I'm not sure which of these points we disag

6ryan_greenblatt1y

Partial delta from me. I think the argument for directly paying for yourself (or your same species, or at least more similar civilizations) is indeed more clear and I think I was confused when I wrote that. (In that I was mostly thinking about the argument for paying for the same civilization but applying it more broadly.) But, I think there is a version of the argument which probably does go through depending on how you set up UDT/FDT. Imagine that you do UDT starting from your views prior to learning about x-risk, AI risk, etc and you care a lot about not dying. At that point, you were uncertain about how competent your civilization would be and you don't want your civilization to die. (I'm supposing that our version of UDT/FDT isn't logically omniscient relative to our observations which seems reasonable.) So, you'd like to enter into an insurance agreement with all the aliens in a similar epistemic state and position. So, you all agree to put at least 1/1000 of your resources on bailing out the aliens in a similar epistemic state who would have actually gone through with the agreement. Then, some of the aliens ended up being competent (sadly you were not) and thus they bail you out. I expect this isn't the optimal version of this scheme and you might be able to make a similar insurance deal with people who aren't in the same epistemic state. (Though it's easier to reason about the identical case.) And I'm not sure exactly how this all goes through. And I'm not actually advocating for people doing this scheme, IDK if it is worth the resources. Even with your current epistemic state on x-risk (e.g. 80-90% doom) if you cared a lot about not dying you might want to make such a deal even though you have to pay out more in the case where you surprisingly win. Thus, from this vantage point UDT would follow through with a deal. ---------------------------------------- Here is a simplified version where everything is as concrete as possible: Suppose that there are

4So8res1y

If they had literally no other options on offer, sure. But trouble arises when the competant ones can refine P(takeover) for the various planets by thinking a little further. It's more like: people don't enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in his throat. (Which isn't to say that nobody will donate to the poor guy's gofundme, but which is to say that he's got to rely on charity rather than insurance). (Perhaps the poor guy argues "but before you opened your eyes and saw how many tumors there were, or felt your own throat for a tumor, you didn't know whether you'd be the only person with a tumor, and so would have wanted to join an insurance pool! so you should honor that impulse and help me pay for my medical bills", but then everyone else correctly answers "actually, we're not smokers". Where, in this analogy, smoking is being a bunch of incompetent disaster-monkeys and the tumor is impending death by AI.)

4ryan_greenblatt1y

Similar to how the trouble arises when you learn the result of the coin flip in a counterfactual mugging? To make it exactly analogous, imagine that the mugging is based on whether the 20th digit of pi is odd (omega didn't know the digit at the point of making the deal) and you could just go look it up. Isn't the situation exactly analogous and the whole problem that UDT was intended to solve? (For those who aren't familiar with counterfactual muggings, UDT/FDT pays in this case.) To spell out the argument, wouldn't everyone want to make a deal prior to thinking more? Like you don't know whether you are the competent one yet! Concretely, imagine that each planet could spend some time thinking and be guaranteed to determine whether their P(takeover) is 99.99999% or 0.0000001%. But, they haven't done this yet and their current view is 50%. Everyone would ex-ante prefer an outcome in which you make the deal rather than thinking about it and then deciding whether the deal is still in their interest. At a more basic level, let's assume your current views on the risk after thinking about it a bunch (80-90% I think). If someone had those views on the risk and cared a lot about not having physical humans die, they would benefit from such an insurance deal! (They'd have to pay higher rates than aliens in more competent civilizations of course.) Sure, but you'd potentially want to enter the pool at the age of 10 prior to starting smoking! To make the analogy closer to the actual case, suppose you were in a society where everyone is selfish, but every person has a 1/10 chance of becoming fabulously wealthy (e.g. owning a galaxy). And, if you commit as of the age of 10 to pay 1/1,000,000 of your resourses in the fabulously wealthy case, you can ensure that the version in the non-wealthy case gets very good health insurance. Many people would take such a deal and this deal would also be a slam dunk for the insurance pool! (So why doesn't this happen in human society? Well

[-]So8res1y*135

Background: I think there's a common local misconception of logical decision theory that it has something to do with making "commitments" including while you "lack knowledge". That's not my view.

I pay the driver in Parfit's hitchhiker not because I "committed to do so", but because when I'm standing at the ATM and imagine not paying, I imagine dying in the desert. Because that's what my counterfactuals say to imagine. To someone with a more broken method of evaluating counterfactuals, I might pseudo-justify my reasoning by saying "I am acting as you would have committed to act". But I am not acting as I would have committed to act; I do not need a commitment mechanism; my counterfactuals just do the job properly no matter when or where I run them.

To be clear: I think there are probably competent civilizations out there who, after ascending, will carefully consider the places where their history could have been derailed, and carefully comb through the multiverse for entities that would be able to save those branches, and will pay thoes entities, not because they "made a commitment", but because their counterfactuals don't come with little labels saying "this branch is the real bra... (read more)

7ryan_greenblatt1y

I probably won't respond further than this. Some responses to your comment: ---------------------------------------- I agree with your statements about the nature of UDT/FDT. I often talk about "things you would have commited to" because it is simpler to reason about and easier for people to understand (and I care about third parties understanding this), but I agree this is not the true abstraction. ---------------------------------------- It seems like you're imagining that we have to bamboozle some civilizations which seem clearly more competent than humanity in your lights. I don't think this is true. Imagine we take all the civilizations which are roughly equally-competent-seeming-to-you and these civilizations make such an insurance deal[1]. My understanding is that your view is something like P(takeover) = 85%. So, let's say all of these civilizations are in a similar spot from your current epistemic perspective. While I expect that you think takeover is highly correlated between these worlds[2], my guess is that you should think it would be very unlikely that >99.9% of all of these civilizations get taken over. As in, even in the worst 10% of worlds where takeover happens in our world and the logical facts on alignment are quite bad, >0.1% of the corresponding civilizations are still in control of their universe. Do you disagree here? >0.1% of universes should be easily enough to bail out all the rest of the worlds[3]. And, if you really, really cared about not getting killed in base reality (including on reflection etc) you'd want to take a deal which is at least this good. There might be better approaches which reduce the correlation between worlds and thus make the fraction of available resources higher, but you'd like something at least this good. (To be clear, I don't think this means we'd be fine, there are many ways this can go wrong! And I think it would be crazy for humanity to . I just think this sort of thing has a good chance of succeeding.

7So8res1y

Attempting to summarize your argument as I currently understand it, perhaps something like: One issue I have with this is that I do think there's a decent chance that the failures across this pool of collaborators are hypercorrelated (good guess). For instance, a bunch of my "we die" probability-mass is in worlds where this is a challenge that Dath Ilan can handle and that Earth isn't anywhere close to handling, and if Earth pools with a bunch of similarly-doomed-looking aliens, then under this hypothesis, it's not much better than humans pooling up with all the Everett-branches since 12Kya. Another issue I have with this is that your deal has to look better to the AI than various other deals for getting what it wants (depends how it measures the multiverse, depends how its goals saturate, depends who else is bidding). A third issue I have with this is whether inhuman aliens who look like they're in this cohort would actually be good at purchasing our CEV per se, rather than purchasing things like "grant each individual human freedom and a wish-budget" in a way that many humans fail to survive. My stance is something a bit more like "how big do the insurance payouts need to be before they dominate our anticipated future experiences". I'm not asking myself whether this works a nonzero amount, I'm asking myself whether it's competitive with local aliens buying our saved brainstates, or with some greater Kindness Coallition (containing our surviving cousins, among others) purchasing an epilogue for humanity because of something more like caring and less like trade. My points above drive down the size of the insurance payments, and at the end of the day I expect they're basically drowned out. (And insofar as you're like "I think you're misleading people when you tell them they're all going to die from this", I'm often happy to caveat that maybe your brainstate will be sold to aliens. However, I'm not terribly sympathetic to the request that I always include this c

[-]JamesFaville1y*180

Thanks for the cool discussion Ryan and Nate! This thread seemed pretty insightful to me. Here’s some thoughts / things I’d like to clarify (mostly responding to Nate's comments).^[1]

Who’s doing this trade?

In places it sounds like Ryan and Nate are talking about predecessor civilisations like humanity agreeing to the mutual insurance scheme? But humans aren’t currently capable of making our decisions logically dependent on those of aliens, or capable of rescuing them. So to be precise the entity engaging in this scheme or other acausal interactions on our behalf is our successor, probably a FAI, in the (possibly counterfactual or counterlogical) worlds where we solve alignment.

Nate says:

Roughly speaking, I suspect that the sort of civilizations that aren't totally fucked can already see that "comb through reality for people who can see me and make their decisions logically dependent on mine" is a better use of insurance resources, by the time they even consider this policy.

Unlike us, our FAI can see other aliens. So I think the operative part of that sentence is “comb through reality”—Nate’s envisioning a scenario where with ~85% probability our FAI has 0 reality-fluid before a... (read more)

4So8res1y

One complication that I mentioned in another thread but not this one (IIRC) is the question of how much more entropy there is in a distant trade partner's model of Tegmark III (after spending whatever resources they allocate) than there is entropy in the actual (squared) wave function, or at least how much more entropy there is in the parts of the model that pertain to which civilizations fall. In other words: how hard is it for distant trade partners to figure out that it was us who died, rather than some other plausible-looking human civilization that doesn't actually get much amplitude under the wave function? Is figuring out who's who something that you can do without simulating a good fraction of a whole quantum multiverse starting from the big bang for 13 billion years? afaict, the amount distant civilizations can pay for us (in particular) falls off exponetially quickly in leftover bits of entropy, so this is pretty relevant to the question of how much they can pay a local UFAI.

1David Matolcsi1y

I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims. Sure, if the AI can model the distribution of real Universes much better than we do, we are in trouble, because it can figure out if the world it sees falls into the real distribution or the mistaken distribution the humans are creating. But I see no reason why the unaligned AI, especially a young unaligned AI, could know the distribution of real Universes better than our superintelligent friends in the intergalactic future. So I don't really see how we can translate your objection to the simulation framework, and consequently I think it's wrong in the acausal trade framework too (as I think they are ewuivalent). I think I can try to write an explanation why this objection is wrong in the acausal trade framework, but it would be long and confusing to me too. So I'm more interested in how you translate your objection to the simulation framework.

[-]So8res1y*122

The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

I don't think this part does any work, as I touched on elsewhere. An AI that cares about the outer world doesn't care how many instances are in sims versus reality (and considers this fact to be under its control much moreso than yours, to boot). An AI that cares about instantiation-weighted experience considers your offer to be a technical-threat and ignores you. (Your reasons to make the offer would evaporate if it were the sort to refuse, and its instance-weighted experiences would be better if you never offered.)

Nevertheless, the translation of the entropy argument into the simulation setting is: The branches of humanity that have exactly the right UFAI code to run in simulation are very poor (because if you wait so long that humans have their hands on exactly the right UFAI code then you've waited too long; those are dead earthlings, not surviving dath ilani). And the more distant surviving branches don't know which UFAIs to attempt to trade with; they have to produce some distribution over other branches of Tegmark III a... (read more)

3David Matolcsi1y

I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs, one in a simulation run by aliens in another galaxy, one is in base reality. They are both smart, but they are not copies of each other, one is a paperclip maximizer, the othe is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that they are the only two AIs in the Universe, and one is in a sim, and if the one in the sim gives up on one simulated planet, it gets 10 in the real world, while if the AI in base reality gives up on a planet, it just loses that one planet and nothing else happens. What will the AIs do? I expect that both of them will give up a planet. For the aliens to "trade" with the AI in base reality, they didn't need to create an actual copy of the real AI and offer it what it wants. The AI they simulated was in many ways totally different from the original, the trade still went through. The only thing needed was that the AI in the sim can't figure it out that it's in a sim. So I don't understand why it is relevant that our superintelligent descendants won't be able to get the real distribution of AIs right, I think the trade still goes through even if they create totally different sims, as long as no one can tell where they are. And I think none of it is a threat, I try to deal with paperclip maximizers here and not instance-weighted experience maximizers, and I never threaten to destroy paperclips or corkscrews.

4So8res1y

My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.

4habryka1y

This was close the answer I was going to give. Or more concretely, I would have said (this was written after seeing your answer, but I think is reasonably close to what I would have said independently)

3So8res1y

4habryka1y

Yeah, that's fair. It seemed more relevant to this specific hypothetical. I wasn't really answering the question in its proper context and wasn't applying steelmans or adjustments based on the actual full context of the conversation (and wouldn't have written a comment without doing so, but was intrigued by your challenge).

2David Matolcsi1y

"AI with a good prior should be able to tell whether it's the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation" seems pretty clearly false, we assumed that our superintelligent descendants create sims where the AIs can't tell if it's a sim, that seems easy enough. I don't see why it would be hard to create AIs that can't tell based on introspection whether it's more likely that their thought process arises in reality or in sims. In the worst case, our sims can be literal reruns of biological evolution on physical planets (though we really need to figure out how to do that ethically). Nate seems to agree with me on this point?

3habryka1y

(I think I agree with you. I wasn't thinking super hard about the full context of the conversation. I was just intrigued by Nate's challenge. I don't really think engaging with my comment is going to be a good use of your time)

3David Matolcsi1y

I think this is wrong. The AI has a similarly hard time to the simulators figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So it's probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of word it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins. I think it's also simple to see from this: Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over. Suppose an AI comes online and the only things it knows about the world is that it's a young AI that was born on to a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just these informations, what should the AI's probability be that it's in a simulation? I think pretty clearly 2/3. Actually the AI has a lot more information than that. It knows that the planet's gravity is 9.8, the evolved sspient species has two eyes, the AI's creator is called Sam Altman, etc. Some of these informations might actually be very implausible in base reality, but the AI doesn't know that, as it can't distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and simulations. So overall it shouldn't update on all the random facts it observes, and should keep believing it has a 2/3 chance of being in a sim.

6dxu1y

If I imagine the AI as a Solomonoff inductor, this argument looks straightforwardly wrong to me: of the programs that reproduce (or assign high probability to, in the setting where programs produce probabilistic predictions of observations) the AI's observations, some of these will do so by modeling a branching quantum multiverse and sampling appropriately from one of the branches, and some of them will do so by modeling a branching quantum multiverse, sampling from a branch that contains an intergalactic spacefaring civilization, locating a specific simulation within that branch, and sampling appropriately from within that simulation. Programs of the second kind will naturally have higher description complexity than programs of the first kind; both kinds feature a prefix that computes and samples from the quantum multiverse, but only the second kind carries out the additional step of locating and sampling from a nested simulation. (You might object on the grounds that there are more programs of the second kind than of the first kind, and the probability that the AI is in a simulation at all requires summing over all such programs, but this has to be balanced against the fact most if not all of these programs will be sampling from branches much later in time than programs of the first type, and will hence be sampling from a quantum multiverse with exponentially more branches; and not all of these branches will contain spacefaring civilizations, or spacefaring civilizations interested in running ancestor simulations, or spacefaring civilizations interested in running ancestor simulations who happen to be running a simulation that exactly reproduces the AI's observations. So this counter-counterargument doesn't work, either.)

4So8res1y

I basically endorse @dxu here. Fleshing out the argument a bit more: the part where the AI looks around this universe and concludes it's almost certainly either in basement reality or in some simulation (rather than in the void between branches) is doing quite a lot of heavy lifting. You might protest that neither we nor the AI have the power to verify that our branch actually has high amplitude inherited from some very low-entropy state such as the big bang, as a Solomonoff inductor would. What's the justification for inferring from the observation that we seem to have an orderly past, to the conclusion that we do have an orderly past? This is essentially Boltzmann's paradox. The solution afaik is that the hypothesis "we're a Boltzmann mind somewhere in physics" is much, much more complex than the hypothesis "we're 13Gy down some branch eminating from a very low-entropy state". The void between branches is as large as the space of all configurations. The hypothesis "maybe we're in the void between branches" constrains our observations not-at-all; this hypothesis is missing details about where in the void between rbanches we are, and with no ridges to walk along we have to specify the contents of the entire Boltzmann volume. But the contents of the Boltzmann volume are just what we set out to explain! This hypothesis has hardly compressed our observations. By contrast, the hypothesis "we're 13Gy down some ridge eminating from the big bang" is penalized only according to the number of bits it takes to specify a branch index, and the hypothesis "we're inside a simulation inside of some ridge eminating from the big bang" is penalized only according to the number of bits it takes to specify a branch index, plus the bits necessary to single out a simulation. And there's a wibbly step here where it's not entirely clear that the simple hypothesis does predict our observations, but like the Boltzmann hypothesis is basically just a maximum entropy hypothesis and doesn'

3David Matolcsi1y

I really don't get what you are trying to say here, most of it feels like a non-sequitor to me. I feel hopeless that either of us manages to convince the other this way. All of this is not a super important topic, but I'm frustrated enogh to offer a bet of $100, that we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four paragraphs summary of our view, and they can decide who is right. (I still think I'm clearly right in this particular discussion.) Otherwise, I think it's better to finish this conversation here.

2So8res1y

I'm happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.

1David Matolcsi1y

Cool, I send you a private message.

3David Matolcsi1y

I think this is mistaken. In one case, you need to point out the branch, planet Earth within our Universe, and the time and place of the AI on Earth. In the other case, you need to point out the branch, the planet on which a server is running the simulation, and the time and place of the AI on the simulated Earth. Seems equally long to me. If necessary, we can run let pgysical biological life emerge on the faraway planet and develop AI while we are observing them from space. This should make it clear that Solomonoff doesn't favor the AI being on Earth instead of this random other planet. But I'm pretty certain that the sim being run on a computer doesn't make any difference.

2So8res1y

If the simulators have only one simulation to run, sure. The trouble is that the simulators have 2N simulations they could run, and so the "other case" requires N additional bits (where N is the crossent between the simulators' distribution over UFAIs and physics' distribution over UFAIs). Consider the gas example again. If you have gas that was compressed into the corner a long time ago and has long since expanded to fill the chamber, it's easy to put a plausible distribution on the chamber, but that distribution is going to have way, way more entropy than the distribution given by physical law (which has only as much entropy as the initial configuration). (Do we agree this far?) It doesn't help very much to say "fine, instead of sampling from a distribution on the gas particles now, I'll sample on a distribution from the gas particles 10 minutes ago, where they were slightly more compressed, and run a whole ten minutes' worth of simulation". Your entropy is still through the roof. You've got to simulate basically from the beginning, if you want an entropy anywhere near the entropy of physical law. Assuming the analogy holds, you'd have to basically start your simulation from the big bang, if you want an entropy anywhere near as low as starting from the big bang. ---------------------------------------- Using AIs from other evolved aliens is an idea, let's think it through. The idea, as I understand it, is that in branches where we win we somehow mask our presence as we expand, and then we go to planets with evolved life and watch until they cough up a UFAI, and the if the UFAI kills the aliens we shut it down and are like "no resources for you", and if the UFAI gives its aliens a cute epilog we're like "thank you, here's a consolation star". To simplify this plan a little bit, you don't even need to hide yourself, nor win the race! Surviving humans can just go to every UFAI that they meet and be like "hey, did you save us a copy of your progenitors? If so,

3David Matolcsi1y

We are still talking past each other, I think we should either bet or finish the discussion here and call it a day.

1Joachim Bartosik1y

I'll try. TL;DR I expect the AI to not buy the message (unless it also thinks it's the one in the simulation; then it likely follows the instruction because duh). The glaring issue (to actually using the method) to me is that I don't see a way to deliver the message in a way that: * results in AI believing the message and * doesn't result in the AI believing there already is a powerful entity in their universe. If "god tells" the AI the message then there is a god in their universe. Maybe AI will decide to do what it's told. But I don't think we can have Hermes deliver the message to any AIs which consider killing us. If the AI reads the message in its training set or gets the message in similarly mundane way I expect it will mostly ignore it, there is a lot of nonsense out there. ---------------------------------------- I can imagine that for thought experiment you could send message that could be trusted from a place from which light barely manages to reach the AI but a slower than light expansion wouldn't (so message can be trusted but it mostly doesn't have to worry about the sender of the message directly interfering with its affairs). I guess AI wouldn't trust the message. It might be possible to convince it that there is a powerful entity (simulating it or half a universe away) sending the message. But then I think it's way more likely in a simulation (I mean that's an awful coincidence with the distance and also they're spending a lot more than 10 planets worth to send a message over that distance...).

6ryan_greenblatt1y

Thanks, this seems like a reasonable summary of the proposal and a reasonable place to wrap. I agree that kindness is more likely to buy human survival than something better described as trade/insurance schemes, though I think the insurance schemes are reasonably likely to matter. (That is, reasonably likely to matter if the kindness funds aren't large enough to mostly saturate the returns of this scheme. As a wild guess, maybe 35% likely to matter on my views on doom and 20% on yours.)

5Buck1y

Thanks for the discussion Nate, I think this ended up being productive.

[-]lillybaeum1y01

We run a large number of simulations of societies on the verge of building AGI. Using our vast resources and our superintelligent AI friends, we build so convincing simulations that a young AGI that is just smart enough to take over the world, but didn't have time yet to build Jupiter-brained successors, can't distinguish the simulation from reality.

maybe we are in one of those!! whoa!!

[+]johnswentworth1y-16-12

^{^}

I considered making the title "Contra Soares on simulation arguments", as much of my post will be a detailed response to his arguments.

^{^}

I don't claim that any of the arguments presented here are novel, I heard bits and pieces of it from various people, I was just frustrated that I don't find it written up anywhere, so I wrote it up myself.

^{^}

To be clear, I'm not 100% convinced that it's advisable to ever build a superintelligence (though I think we probably should eventually), and I definitely don't want us to rush towards building it. However, if we don't build strong AI, there is no danger of it killing us, so the point of this post is moot. And if we do build one, I think there is a decent chance we manage to build an aligned AI, so we can assume that possibility here. If you believe that the probability of successfully solving the alignment problem is extremely low, that weakens the argument of this post, but as we will see, you need to posit astronomically low probabilities for the conclusion not to apply.

^{^}

This might not be possible if we ourselves are in a simulation. For now, I assume that the world is as it seems, we are not in a simulation or in the zoo of some aliens. Later I will return to this possibility too.

^{^}

Surrounded by a firmament with fake stars so they don't notice if meanwhile we do something funny with the most of the stars in the Universe.

^{^}

Unfortunately, it's possible that the AI might need to already commit some amount of violence while it is taking over the world. As I will discuss later, we probably don't have a good enough bargaining position to prevent that, but we can still say that we find the AI's behavior less approval-worthy proportionally to how many people it kills during the takeover.

^{^}

Reliably and enduringly modifying its own goals and constraints might not be an easy task for the AI, as it will need to basically solve alignment for this. But if this takes a long time, the approved action is to take as much time as it needs. This part is probably not very costly to the AI, as it is well-advised to solve the alignment problem anyway before it starts building smarter successors.

^{^}

Like building so smart successors that it thinks they might have a good chance to see through the Universe-spanning simulator civilization's deception. Or starting extensive space colonization which would require the simulators to faithfully simulate a much bigger world. Or do computations and experiments that it expects to be very computationally expensive for the simulators, though I'm skeptical it could do very expensive things without leaving the planet.

^{^}

LESSWRONG
LW

LESSWRONG
LW

122

You can, in fact, bamboozle an unaligned AI into sparing your life

122

122

A proposal for humanity in the Future

What does the AI do?

Is this the same as acausal trade?

Response to Nate's arguments in his post

Nate's arguments in the comments

1. We might just have a very low chance of solving alignment, so the AI doesn't need to take seriously the possibility of humans simulating it.

2. The successful human civilization would need to guess correctly what random thing an AI developing in a different Universe branch might value, and this is possibly infeasible.

3. Maybe the successful human civilization could pay for our salvation, but they will choose to spend their resources on other things.

What should we bargain for?

Can we get more than this?

Other possible types of AI values

Ethical considerations

Ways this hope could fail

2. The true cost might be not just a planet, but the delay.

3. It might be very hard to create simulations that are indistinguishable from reality.

4. There are just too many possible simulators out there with too many different goals.

6. The AI just doesn't take the simulation hypothesis seriously.

7. A lot of people might be killed during takeover.

Are we in a simulation? What should we do?

Conclusion