Dawn Drescher

I’m working on Impact Markets – markets to trade nonexcludable goods.

If you’re also interested in less directly optimific things – such as climbing around and on top of boulders or amateurish musings on psychology – then you may enjoy some of the posts I don’t cross-post from my blog, Impartial Priorities.

Pronouns: Ideally they. But he/she and gender-neutral neopronouns are fine too.


My take, which is perhaps a bit naive (acausal stuff, other grabby aliens, etc. aside), is that a conflict needs at least two parties, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though; maybe I should be more worried about those too.

I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use

I’m not a fan of idealizing superintelligences. 10+ years ago, that was the only way to infer any hard information about worst-case scenarios: assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sorts of training regimes through which they might learn bargaining, and what sorts of bargaining solutions they might completely unreflectively employ in specific situations. We can reason about what sorts of training regimes will instill which decision theories in AIs, so why not do the same for bargaining?

If we think we can punt the problem to them, then we need to make sure they reflect on how they bargain and on the game-theoretic implications of that. We may want to train them to seek out gains from trade, as is useful in a generally cooperative environment, rather than to seek out exploits, as would be useful in a more hostile environment.

If we find that we can’t reliably punt the problem to them, we still have the chance to decide on the right (or a random) bargaining solution and train enough AIs to adopt it (more than a third of them? just particularly prominent projects?) to make it the Schelling point for future AIs. But that window will close when they (OpenAI, DeepMind, or similar) finalize the corpus of the training data for the AIs that’ll take over the world.

I don’t care about wars between unaligned AIs, even if they do often have them

Okay. I’m concerned with scenarios where at least one powerful AI is at least as (seemingly) well aligned as GPT-4.

Secondly, you need to assume that the pessimization of the superintelligence’s values would be bad, but in fact I expect it to be just as neutral as the optimization.

Can you rephrase? I don’t follow. It’s probably “pessimization” that throws me off?

why would either of them start the war?

Well, I’m already concerned about finite versions of that – bad enough, to my mind, to warrant a lot of attention. But there are different reasons why that could happen: the one that starts the war could have made any of several mistakes in assessing its opponent, it could make mistakes in the process of readying its weapons, or the victim of the aggression could make mistakes in assessing the aggressor. Naturally, that’s implausible if superintelligences are literally so perfect that they cannot ever make mistakes, but that’s not my starting point. I assume that they’re going to be about as flawed as the NSA, DoD, etc., only in different ways.

Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.

Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.

No, just a value-neutral financial instrument such as escrow. If two people can either fight or trade, but they can’t trade because they don’t trust each other, they’ll fight. That loses out on gains from trade, and one of them ends up dead. But once you invent escrow, there is suddenly, in many cases, an option to do the trade after all, and both can live!
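The escrow point can be sketched as a 2×2 game (my own illustration with made-up payoffs, not the commenter’s model): without escrow, offering to trade exposes you to exploitation, so mutual fighting is the only stable outcome; escrow removes the exploitation payoff, which makes mutual trade an equilibrium.

```python
# Toy 2x2 game: each agent picks "fight" or "trade". Payoff numbers are
# invented purely to illustrate how escrow changes the equilibrium set.
from itertools import product

STRATEGIES = ["fight", "trade"]

def pure_nash_equilibria(payoffs):
    """Pure-strategy Nash equilibria of a 2-player game.
    payoffs[(a, b)] = (row player's payoff, column player's payoff)."""
    eqs = []
    for a, b in product(STRATEGIES, repeat=2):
        # Neither player can gain by unilaterally switching strategies.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(a2, b)][0] for a2 in STRATEGIES)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, b2)][1] for b2 in STRATEGIES)
        if row_ok and col_ok:
            eqs.append((a, b))
    return eqs

# Without escrow, a fighter can exploit a would-be trader (8 vs. -10),
# so mutual trade is unstable and mutual fighting is the only equilibrium.
no_escrow = {
    ("fight", "fight"): (-5, -5),
    ("fight", "trade"): (8, -10),
    ("trade", "fight"): (-10, 8),
    ("trade", "trade"): (5, 5),
}

# With escrow, a one-sided trade offer simply falls through (both just get
# the fighting payoff), so the exploit disappears and mutual trade is stable.
with_escrow = dict(no_escrow)
with_escrow[("fight", "trade")] = (-5, -5)
with_escrow[("trade", "fight")] = (-5, -5)

print(pure_nash_equilibria(no_escrow))    # [('fight', 'fight')]
print(pure_nash_equilibria(with_escrow))  # includes ('trade', 'trade')
```

The mechanism is value-neutral in exactly the sense above: it doesn’t change what either side wants, only which outcomes are reachable without trust.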

I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard-pressed to think of ways in which it could influence thinking about s-risks. I prefer instead to think about the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.

But more importantly, it sounds like you’re contradicting my “tractability” footnote? In it I argue that if there are solutions to some core challenges of cooperative AI – and finding them may not be harder than solving technical alignment – then there is no deployment problem: you can just throw the solutions out there, and it’ll be in the self-interest of every AI, aligned or not, to adopt them.

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable, etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking a taboo around talking about things worse than death can help us become aware of ways in which we may be steering toward s-risks.

It's basically like this 

I see! I have a bunch of friends who would probably consider their lives not worth living. They often express the wish to not have been born, or at least consider their current well-being level to be negative. But I think only one of them might be in such a negative feedback loop, and I’m probably misdiagnosing her here. Two of them are bedridden due to Long Covid and, despite their condition, have amassed a wealth of knowledge on virus-related medicine, probably by googling things on their phones while lying down for ten minutes at a time. Others have tried every depression drug under the sun. Others have multiple therapists. They are much more held back by access and ability than by motivation, even though motivation is probably also hard to come by in that state.

Surely you can see that this isn't common, and the normal response is to just be broken until you die.

Idk, Harold and Maude is sort of like that. I’ve actually done a back-of-the-envelope calculation, which is perhaps uncommon, but the general spirit of the idea seems normal enough to me? Then again I could easily be typical-minding.

I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.

I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight > 0.05 or so for a reduction of my life span by the same duration. I’m not saying that you shouldn’t live forever, but I only want to if my well-being is sufficiently high (around or a bit higher than my current level).

Statement 2 is more worrying to me if taken at face value – but I’m actually not so worried about it in practice. What’s much more common is that people seek power for themselves. Some of them are very successful with it – Ozymandias, Cyrus the Great, Alexander the Great, Jesus, Trajan, … – but they are so much fewer than all the millions and millions of narcissistic egomaniacs that try. Our civilization seems to be pretty resilient against such power grabs.

Corollary: We should keep our civilization resilient. That’s equally important to me because I wouldn’t want someone to assume power and undemocratically condemn all of us to hell to eke out the awful kind of continued existence that comes with it.

The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.

I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3
