Wei Dai

Comments

If someone cares a lot about a strictly zero-sum resource, like land, how do you convince them to 'move out of the zero-sum setting by finding "win win" resolutions'? Like what do you think Ukraine or its allies should have done to reduce the risk of war before Russia invaded? Or what should Taiwan or its allies do now?

Also to bring this thread back to the original topic, what kinds of interventions do you think your position suggests with regard to AI?

Rather conflicts arise when various individuals and populations (justifiably or not) perceive that they are in zero-sum games for limited resources. The solution for this is not “philosophical progress” as much as being able to move out of the zero-sum setting by finding “win win” resolutions for conflict or growing the overall pie instead of arguing how to split it.

I think many of today's wars are at least as much about ideology (like nationalism, liberalism, communism, religion) as about limited resources. I note that Russia and Ukraine both have below replacement birth rates and are rich in natural resources (more than enough to support their declining populations, with Russia at least being one of the biggest exporters of raw materials in the world).

The solution for this is not “philosophical progress” as much as being able to move out of the zero-sum setting by finding “win win” resolutions for conflict or growing the overall pie instead of arguing how to split it.

I think this was part of the rationale for Europe to expand trade relations with Russia in the years before the Ukraine war (e.g. by building/allowing the Nord Stream pipelines), but it ended up not working. Apparently Putin was more interested in some notion of Russian greatness than material comforts for his people.

Similarly the US, China, and Taiwan are deeply enmeshed in positive sum trade relationships that a war would destroy, which ought to make war unthinkable from your perspective, but the risk of war has actually increased (compared to 1980, say, when trade was much less). If China did end up invading Taiwan I think we can assign much of the blame to valuing nationalism (or caring about the "humiliation" of not having a unified nation) too much, which seems a kind of philosophical error to me.

(To be clear, I'm not saying that finding “win win” resolutions for conflict or growing the overall pie are generally not good solutions or not worth trying, just that having wrong values/philosophies clearly plays a big role in many modern big conflicts.)

One way that things could go wrong, not addressed by this playbook: AI may differentially accelerate intellectual progress in a wrong direction, or in other words create opportunities for humanity to make serious mistakes (by accelerating technological progress) faster than wisdom to make right choices (philosophical progress). Specific to the issue of misalignment, suppose we get aligned human-level-ish AI, but it is significantly better at speeding up AI capabilities research than the kinds of intellectual progress needed to continue to minimize misalignment risk, such as (next generation) alignment research and coordination mechanisms between humans, human-AI teams, or AIs aligned to different humans.

I think this suggests the intervention of doing research aimed at improving the philosophical abilities of the AIs that we'll build. (Aside from misalignment risk, it would help with many other AI-related x-risks that I won't go into here, but which collectively outweigh misalignment risk in my mind.)

I guess part of the problem is that the people who are currently most receptive to my message are already deeply enmeshed in other x-risk work, and I don't know how to reach others for whom the message might be helpful (such as academic philosophers just starting to think about AI?). If on reflection you think it would be worth spending some of your time on this, one particularly useful thing might be to do some sort of outreach/field-building, like writing a post or paper describing the problem, presenting it at conferences, and otherwise attracting more attention to it.

(One worry I have about this is, if someone is just starting to think about AI at this late stage, maybe their thinking process just isn't very good, and I don't want them to be working on this topic! But then again maybe there's a bunch of philosophers who have been worried about AI for a while, but have stayed away due to the Overton window thing?)

Here's a link to the part of interview where that quote came from: https://youtu.be/GyFkWb903aU?t=4739 (No opinion on whether you're missing redeeming context; I still need to process Nesov's and your comments.)

Even at 10% p(doom), which I consider to be unreasonably low, it would probably be worth delaying a few years.

Someone with 10% p(doom) may worry that if they got into a coalition with others to delay AI, they couldn't control the delay precisely, and it could easily become more than a few years. Maybe it would be better not to take that risk, from their perspective.

And lots of people have p(doom)<10%. Scott Aaronson just gave 2% for example, and he's probably taken AI risk more seriously than most (currently working on AI safety at OpenAI), so probably the median p(doom) (or effective p(doom) for people who haven't thought about it explicitly) among the whole population is even lower.

I’m just not as confident as you are I guess. Like, maybe the answers to the problems you describe are fairly objective, fairly easy for smart AIs to see, and so all we need to do is make smart AIs that are honest and then proceed cautiously and ask them the right questions.

I think I've tried to take into account uncertainties like this. It seems that in order for my position (that the topic is important and too neglected) to be wrong, one has to reach high confidence that these kinds of problems will be easy for AIs (or humans or AI-human teams) to solve, and I don't see how that kind of conclusion could be reached today. I do have some specific arguments for why the AIs we'll build may be bad at philosophy, but I think those are not very strong arguments so I'm mostly relying on a prior that says we should be worried about and thinking about this until we see good reasons not to. (It seems hard to have strong arguments either way today, given our current state of knowledge about metaphilosophy and future AIs.)

Another argument for my position is that humans have already created a bunch of opportunities for ourselves to make serious philosophical mistakes, like around nuclear weapons, farmed animals, and AI, and we can't solve those problems by just asking smart honest humans the right questions, as there is a lot of disagreement between philosophers on many important questions.

I’m not confident in this skepticism and could imagine becoming much more convinced simply by thinking or hearing about the topic more.

What's stopping you from doing this, if anything? (BTW, beyond the general societal level of neglect, I'm especially puzzled by the lack of interest/engagement on this topic from the many people in EA with formal philosophy backgrounds. If you're already interested in AI and x-risks and philosophy, how is this not an obvious topic to work on or think about?)

Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?

If aging were solved, or looked like it would be solved within the next few decades, it would make efforts to stop or slow down AI development less problematic, both practically and ethically. I think some AI accelerationists might be motivated directly by the prospect of dying/deterioration from old age, and/or view lack of interest/progress on that front as a sign of human inadequacy/stagnation (contributing to their antipathy towards humans). At the same time, the fact that pausing AI development has a large cost in lives of current people means that you have to have a high p(doom) or credence in utilitarianism/longtermism to support it (and risk committing a kind of moral atrocity if you turn out to be wrong).

2 is arguably in that category also, though idk.

2 is important because as tech/AI capabilities increase, the possibilities to "make serious irreversible mistakes due to having incorrect answers to important philosophical questions" seem to open up exponentially. Some examples:

  • premature value lock-in
  • value drift
  • handing over too much control/resources to alien/unaligned agents due to negotiation mistakes
  • mistakes related to commitment races
  • the process of creating/aligning AI being unethical or creating a costly obligation
  • failure to prevent mindcrime inside AIs
  • intentionally doing horrible things at astronomical scale due to having wrong values/philosophies

If your point is that we could delegate solving these problems to aligned AI once we have them, my worry is that AI, including aligned AI, will be much better at creating new philosophical problems (opportunities to make mistakes) than at solving them. The task of reducing this risk (e.g., by solving metaphilosophy or otherwise making sure AIs' philosophical abilities keep up with or outpace their other intellectual abilities) seems super neglected, in part because very few people seem to acknowledge the importance of avoiding errors like the ones listed above.

(BTW I was surprised to see your skepticism about 2, since it feels like I've been talking about it on LW like a broken record, and I don't recall seeing any objections from you before. Would be curious to know if anything I said above is new to you, or you've seen me say similar things before but weren't convinced.)

Setting that aside, it reads to me like the frame-clash happening here is (loosely) between “50% extinction, 50% not-extinction” and “50% extinction, 50% utopia”

Yeah, I think this is a factor. Paul talked a lot about "1/trillion kindness" as the reason for non-extinction, but 1/trillion kindness seems to directly imply a small utopia where existing humans get to live out long and happy lives (even better/longer lives than without AI) so it seemed to me like he was (maybe unintentionally) giving the reader a frame of “50% extinction, 50% small utopia”, while still writing other things under the “50% extinction, 50% not-extinction” frame himself.

I do explicitly flag the loss of control over the future in that same sentence.

In your initial comment you talked a lot about AI respecting the preferences of weak agents (using 1/trillion of its resources), which implies handing back control of a lot of resources to humans, and which, from the selfish or scope-insensitive perspective of typical humans, probably seems almost as good as not losing that control in the first place.

I don’t think the much worse outcomes are closely related to unaligned AI so I don’t think they seem super relevant to my comment or Nate’s post.

If people think that (conditional on unaligned AI) in 50% of worlds everyone dies and the other 50% of worlds typically look like small utopias where existing humans get to live out long and happy lives (because of 1/trillion kindness), then they're naturally going to think that aligned AI can only be better than that. So even if s-risks apply almost equally to both aligned and unaligned AI, I still want people to talk about it when talking about unaligned AIs, or take some other measure to ensure that people aren't potentially misled like this.

(It could be that I'm just worrying too much here, that empirically people who read your top-level comment won't get the impression that close to 50% of worlds with unaligned AIs will look like small utopias. If this is what you think, I guess we could try to find out, or just leave the discussion here.)

where is the upside to the AI from spite during training?

Maybe the AI develops it naturally from multi-agent training (intended to make the AI more competitive in the real world), or maybe the AI developer tried to train some kind of morality (e.g. a sense of fairness or justice) into the AI.

I regret mentioning "lie-to-children" as it seems a distraction from my main point. (I was trying to introspect/explain why I didn't feel as motivated to express disagreement with the OP as you, not intending to advocate or endorse anyone going into "the business of telling lies-told-to-children to adults".)

My main point is that I think "misaligned AI has a 50% chance of killing everyone" isn't alarming enough, given what I think happens in the remaining 50% of worlds, versus what a typical person is likely to infer from this statement, especially after seeing your top-level comment where you talk about "kindness" at length. Can you try to engage more with this concern? (Apologies if you already did, and I missed your point instead.)

I think “misaligned AI has a 50% chance of killing everyone” is practically as alarming as “misaligned AI has a 95% chance of killing everyone,” while being a much more reasonable best guess.

(Addressing this since it seems like it might be relevant to my main point.) I find it very puzzling that you think “misaligned AI has a 50% chance of killing everyone” is practically as alarming as “misaligned AI has a 95% chance of killing everyone”. Intuitively it seems obvious that the latter should be almost twice as alarming as the former. (I tried to find reasons why this intuition might be wrong, but couldn't.) The difference also seems practically relevant (if by "practically as alarming" you mean the difference is not decision/policy relevant). In the grandparent comment I mentioned that the 50% case "might not seem so bad compared to keeping AI development on hold indefinitely which potentially implies a high probability of death from old age" but you didn't seem to engage with this.
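The intuition that the 95% estimate should be almost twice as alarming as the 50% estimate can be made concrete with a toy calculation. The linear-scaling assumption here is mine (an illustration, not anything established in the discussion):

```python
# Toy sketch: suppose "alarm" is proportional to the probability that
# misaligned AI kills everyone (an assumed linear scaling, for illustration).
p_doom_low = 0.50   # "50% chance of killing everyone"
p_doom_high = 0.95  # "95% chance of killing everyone"

ratio = p_doom_high / p_doom_low
print(ratio)  # 1.9, i.e. almost twice as alarming under linear scaling
```

Of course, if alarm is a nonlinear function of p(doom) (e.g. if both probabilities are already past some action threshold), the two estimates could be "practically" equivalent, which may be what Paul has in mind.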
