Should Effective Altruism be at war with North Korea?

by Benquo, 5th May 2019 (8 min read, 47 comments)



Summary: Political constraints cause supposedly objective technocratic deliberations to adopt frames that any reasonable third party would interpret as picking a side. I explore the case of North Korea in the context of nuclear disarmament rhetoric as an illustrative example of the general trend, and claim that people and institutions can make better choices and generate better options by modeling this dynamic explicitly. In particular, Effective Altruism and academic Utilitarianism can plausibly claim to be the British Empire's central decisionmaking mechanism, and as such, have more options than their current story can consider.


I wrote to my friend Georgia in response to this Tumblr post.

Asymmetric disarmament rhetoric

Ben: It feels increasingly sketchy to me to call tiny countries surrounded by hostile regimes "threatening" for developing nuclear capacity, when US official policy for decades has been to threaten the world with nuclear genocide.

Strong recommendation to read Daniel Ellsberg's The Doomsday Machine.

Georgia: Book review: The Doomsday Machine

So I get that the US' nuclear policy was and probably is a nightmare that's repeatedly skirted apocalypse. That doesn't make North Korea's program better.

Ben [feeling pretty sheepish, having just strongly recommended a book my friend just reviewed on her blog]: "Threatening" just seems like a really weird word for it. This isn't about whether things cause local harm in expectation - it's about the frame in which agents trying to organize to defend themselves are the aggressors, rather than the agent insisting on global domination. 

Georgia: I agree that it's not the best word to describe it. I do mean "threatening the global peace" or something rather than "threatening to the US as an entity." But, I do in fact think that North Korea building nukes is pretty aggressive. (The US is too, for sure!)

Maybe North Korea would feel less need to defend itself from other large countries if it weren't a literal dictatorship - being an oppressive dictatorship with nukes is strictly worse.

Ben: What's the underlying thing you're modeling, such that you need a term like "aggression" or "threatening," and what role does it play in that model?

Georgia: Something like destabilizing to the global order and not-having-nuclear-wars, increases risk to people, makes the world more dangerous. With "aggressive" I was responding to your "aggressors" but may have misunderstood what you meant by that.

Ben: This feels like a frame that fundamentally doesn't care about distinguishing what I'd call aggression from what I'd call defense - if they do a thing that escalates a conflict, you use the same word for it regardless. There's some sense in which this is the same thing as being "disagreeable" in action.

Georgia: You're right. The regime is building nukes at least in large part because they feel threatened and as an active-defense kind of thing. This is also terrible for global stability, peace, etc.

Ben: If I try to ground out my objection to that language a bit more clearly, it's that a focus on which agent is proximately escalating a conflict, without making distinctions about the kinds of escalation that seem like they're about controlling others' internal behavior vs preventing others from controlling your internal behavior, is an implicit demand that everyone immediately submit completely to the dominant player.

Georgia: It's pretty hard to make those kind of distinctions with a single word choice, but I agree that's an important distinction.

Ben: I think this is exactly WHY agents like North Korea see the need to develop a nuclear deterrent. (Plus the dominant player does not have a great track record for safety.) Do you see how from my perspective that amounts to "North Korea should submit to US domination because there will be less fighting that way," and why I'd find that sketchy?

Maybe not sketchy coming from a disinterested Martian, but very sketchy coming from someone in one of the social classes that benefit the most from US global dominance?

Georgia: Kind of, but I believe this in the nuclear arena in particular, not in general conflict or sociopolitical tensions or whatever. Nuclear war has some very specific dynamics and risks.

Influence and diplomacy

Ben: The obvious thing from an Effective Altruist perspective would be to try to establish diplomatic contact between Oxford EAs and the North Koreans, to see if there's a compromise version of Utilitarianism that satisfies both parties such that North Korea is happy being folded into the Anglosphere, and then push that version of Utilitarianism in academia.

Georgia: That's not obvious. Wait, are you proposing that?

Ben: It might not work, but "stronger AI offers weaker AI part of its utility function in exchange for conceding instead of fighting" is the obvious way for AGIs to resolve conflicts, insofar as trust can be established. (This method of resolving disputes is also probably part of why animals have sex.)

Georgia: I don't think academic philosophy has any direct influence on like political actions. (Oh, no, you like Plato and stuff, I probably just kicked a hornet's nest.) Slightly better odds on the Oxford EAs being able to influence political powers in some major way.

Ben: Academia has hella indirect influence, I think. I think Keynes was right when he said that "practical men who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back." Though usually on longer timescales.

FHI is successfully positioning itself as an advisor to the UK government on AI safety.

Georgia: Yeah, they are doing some cool stuff like that, do have political ties, etc, which is why I give them better odds.

Ben: Utilitarianism is nominally moving substantial amounts of money per year, and quite a lot if you count Good Ventures being aligned with GiveWell due to Peter Singer's recommendation.

Georgia: That's true.

Ben: The whole QALY paradigm is based on Utilitarianism. And it seems to me like you either have to believe

(a) that this means academic Utilitarianism has been extremely influential, or

(b) the whole EA enterprise is profiting from the impression that it's Utilitarian but then doing quite different stuff in a way that if not literally fraud is definitely a bait-and-switch.

Georgia: I'm persuaded that EA has been pretty damn influential and influenced by academic utilitarianism. Wouldn't trying to convince EAs directly or whatever instead of routing through academia be better?

Ben: Good point, doesn't have to be exclusively academic - you'd want a mixture of channels since some are longer-lived than others, and you don't know which ones the North Koreans are most interested in. Money now vs power within the Anglo coordination mechanism later.

Georgia: The other half of my incredulity is that fusing your value functions does not seem like a good silver bullet for conflicts.

Ben: It worked for America, sort of. I think it's more like, rarely tried because people aren't thinking systematically about this stuff. Nearly no one has the kind of perspective that can do proper diplomacy, as opposed to clarity-opposing power games.

Georgia: But saying that an academic push to make a fused value function is obviously the most effective solution for a major conflict seems ridiculous on its face.

Is it coherent to model an institution as an agent?

Ben: I think the perspective in which this doesn't work, is one that thinks modeling NK as an agent that can make decisions is fundamentally incoherent, and also that taking claims to be doing utilitarian reasoning at face value is incoherent. Either there are agents with utility functions that can and do represent their preferences, or there aren't.

Georgia: Surely they can be both - like, conglomerations of human brains aren't really perfectly going to follow any kind of strategy, but it can still make sense to identify entities that basically do the decisionmaking and act more-or-less in accordance with some values, and treat that as a unit.

It is both true that "the North Korean regime is composed of multiple humans with their own goals and meat brains" and that "the North Korean regime makes decisions for the country and usually follows self-preservationist decisionmaking."

Ben: I'm not sure which mode of analysis is correct, but I am sure that doing the reconciliation to clarify what the different coherent perspectives are, is a strong step in the right direction.

Georgia: Your goal seems good!

Philosophy as perspective

Ben: Maybe EA/Utilitarianism should side with the Anglo empire against NK, but if so, it should probably account for that choice internally, if it wants to be and be construed as a rational agent rather than a fundamentally political actor cognitively constrained by institutional loyalties.

Thanks for engaging with this - I hadn't really thought through the concrete implications of the fact that any system of coordinated action is a "side" or agent in a decision-theoretic landscape with the potential for conflict.

That's the conceptual connection between my sense that calling North Korea's nukes "threatening" is mainly just shoring up America's rhetorical position as the legitimate world empire, and my sense that reasoning about ends that doesn't concern itself with the reproduction of the group doing the reasoning is implicitly totalitarian in a way that nearly no one actually wants.

Georgia: "With the reproduction of the group doing the reasoning" - like spreading their values/reasoning-generators or something?

Ben: Something like that.

If you want philosopher kings to rule, you need a system adequate to keep them in power, when plenty of non-philosophers have an incentive to try to get in on the action, and then that ends up constraining most of your choices, so you don't end up benefiting much from the philosophers' competence!

So you build a totalitarian regime to try to hold onto this extremely fragile arrangement, and it fails anyway. The amount of narrative control they have to exert to prevent people from subverting the system by which they're in charge ends up being huge.

(There's some ambiguity, since part of the reason for control is education into virtue - but if you're not doing that, there's not really much of a point of having philosophers in charge anyway.)

I'm definitely giving you a summary run through a filter, but that's true of all summaries, and I don't think mine is less true than the others - just differently slanted.



48 comments

I wish the three recent dialog posts from you were instead written as conventional posts because they don't include abstracts/summaries or much context, it's hard to skim them to try to figure out what they are about (there are no section headings, the conversation moves from topic to topic based on how the interlocutor happens to respond instead of in some optimized way, and they have to be read as a linear dialog to make much sense), and the interlocutor often fails to ask questions that I'd like to have answered or fails to give counterarguments that I'd like to see addressed (whereas in a conventional post the author is more likely to try to anticipate more common questions/counterarguments and answer them).

For example I think if this post were written as a conventional post you probably would have clarified whether the "compromise version of Utilitarianism" is supposed to be a compromise with the NK people or with the NK government since that seems like an obvious question that a lot of people would have (and someone did ask on Facebook), as well as addressed some rather obvious problems with the proposal (whichever one you actually meant).

I think this is tightly coupled with the "why are people having serious conversations in google docs?" question.

What a lot of thinkers want is to have high context conversations that push the edge of idea-space forward. This is a problem when you get too far ahead of the Overton window. Eliezer bit the bullet and wrote braindumps for 2 years to try to get everyone up to speed with how he was thinking about things, but this is in fact a lot of work.

I agree these posts would all be better if more work was put into them to make them publicly accessible, but I think it's quite important to encourage people who are doing not-very-accessible work to err on the side of posting publicly, to keep the edges of idea space not too disconnected from the common conversational window.

(I also think, when you're exploring the edges of idea-space, it's often very uncertain which points are going to turn out to be the main points, or how to best explain them. I suspect it's better to start by just posting the high context conversations and then seeing which bits were hardest to explain)

If someone posts something on a public forum and doesn’t make any effort to make it understandable, then there’s no reasonable expectation that anyone will engage with it, yes?

This sort of thing, it seems to me, can be treated as a sort of “rough draft”; anyone who feels like commenting on it may do so, but anyone who’s not interested in doing that sort of work basically ignores it.

And then there are posts which, by the forum norms, are expected to be “final”—not in the sense of finality of ideas, but of this particular presentation of this particular iteration of the idea—which is to say, it’s written to be readable, comprehensible, etc.

It seems plausible that this distinction could match up to the personal post / frontpage post distinction, but that’s not the current way personal/frontpage are used, as I understand it. But in that case, some technical/feature support for this sort of norm is needed.

So if we had such a norm, this post might say “high-context / rough draft / not attempting to be coherent / whatever”; and Wei Dai would see that label/category, decide not to read it, and save the time/effort spent on trying to understand a post which isn’t really meant for “public consumption”.

Relatedly, posts on Less Wrong are currently treated as blog posts—posted once, and not subsequently returned to. What if, instead, a post were more like a wiki page? (Or a Google Doc!) That is, if the norm is that posts may be rewritten, perhaps substantially or entirely, in response to commentary, or simply due to developing the idea, then on the one hand, there’s not as much pressure to get it right the first time, and on the other hand, a mature, high-quality product evolves more naturally. (Technical characteristics that would aid such a norm include: annotations and/or the ability to link comments [which may still be threaded and behave in the same way as now] to marked parts or ranges of the post; the ability for third parties to propose or make edits [perhaps to a copy of the post, rather than directly to the post itself]; a non-chronological [perhaps category or tag based, or organized manually] view by which content may be browsed.)

(Aside: one objection I have heard is that approaching posts as such wiki-esque “eternal works in progress” is at odds with the sort of didactic or narrative writing which characterizes a lot of Less Wrong content. Not so! Look at for a clear counterexample.)

Edit: Briefly pointing at a related topic: quality / professionalism of presentation (and benefits thereof) vs. effort thresholds and resulting incentives for authors is a problem largely solved by WikiGnomes.

I agree with this, and want to move in this direction in terms of feature sets (though I don't know whether I would classify this current post as a draft).

The trick is, it isn't clear how to write a "draft" type post and have it be recognized as such, instead of being massively downvoted.

Yes, there is currently no UI/feature support for this, nor norms that delineate such things. I am suggesting the implementation and adoption, respectively, of precisely these things.

I ended up deciding there was a viable summary that was shorter than the post, and added section headings. I hope that helps someone.

It seems to me like I'd have to make the abstract half the length of this post for this to work - there's not a simple takeaway here, if I could have done this in a small abstract I wouldn't have written the whole post!

The case for an abstract or section headings is much stronger for the Authoritarian Empiricism post which goes into detail about stuff not directly relevant to the main point, but at this point I've basically given up on most real engagement by people I'm not in a direct dialogue with and am throwing these things up on a sort of just-in-case basis and trying not to do extra work that I don't expect to pay off. I also don't think of the tangents as less important than the central structure.

Something about the way in which you're talking about this makes me not trust the advice you're giving. I'm finding it hard to articulate precisely, but I think it has to do with a sense that the abstract and full length article structure tends towards things where the abstract makes an assertion, and then the article makes a bunch of defensible claims and gives a GPT2ish sense of having made an argument without really adding much to the abstract. I don't really want to compete with that, so I feel reluctant to adopt its literary conventions.

ETA: On review, there was already a summary at the beginning of the Authoritarian Empiricism post:

I noticed over the first couple days of Passover that the men in the pseudo-community I grew up in seem to think there's a personal moral obligation to honor contracts pretty much regardless of the coercion involved, and the women seem to get that this increases the amount of violence in the world by quite a lot relative to optimal play, but they don't really tell the men. This seems related somehow to a thing where they feel anxiety about modeling people as political subjects instead of just objectifying them, but when they slap down attempts to do that, they pretend they're insisting on rigor and empiricism.
Which I'd wrongly internalized, as a kid, as good-faith critiques of my epistemics.

The beginning of Totalitarian ethical systems similarly summarizes somewhat telegraphically (naturally, it's a summary) what the rest of the post explains in detail. I'm not sure what's missing here.

On review, there was already a summary at the beginning of the Authoritarian Empiricism post

I didn't recognize this as a summary because it seemed to be talking about a specific "pseudo-community" and I didn't interpret it as making a general point. Even reading it now, knowing that it's a summary, I still can't tell what the main point of the article might be. The beginning of Totalitarian ethical systems seems clearer as summary now that you've described it as such, but before that I didn't know if it was presenting the main point or just an example of a more general point or something tangential, etc., since I didn't understand all of the rest of the post so I couldn't be sure the main point of the post wasn't something different.

Also it seems like the point of a summary is to clearly communicate what the main points of the post are so the reader has some reference to know whether they should read the rest of it and also to help understand the rest of the post (since they can interpret it in relation to the main points) and having an unlabeled summary seems to defeat these purposes as the reader can't even recognize the summary as a summary before they've read and understood the rest of the post.

Thanks for the additional detail. In general I consider a post of that length that has a "main point" to be too long. I'm writing something more like essays than like treatises, while it seems to me that your reading style is optimized for treatises. When I'm writing something more like a treatise, I do find it intuitive to have summaries of main points, clear section headings, etc. But the essay form tends to explore the connections between a set of ideas rather than work out a detailed argument for one.

I'm open to arguments that I should be investing more in treatises, but right now I don't really see the extra work per idea as paying off in a greater number of readers understanding the ideas and taking initiative to extend them, apply them, or explain them to others in other contexts.

but at this point I’ve basically given up on most real engagement by people I’m not in a direct dialogue with and am throwing these things up on a sort of just-in-case basis and trying not to do extra work that I don’t expect to pay off.

Thanks for the clarification, but if I had known this earlier, I probably would have invested less time/effort trying to understand these posts. Maybe you could put this disclaimer on top of your dialog posts in the future for the benefit of other readers?

Hmm, I think I can be clearer (and nicer) than I've been.

I wouldn't be posting this stuff if I didn't think it was a reasonably efficient summary of an important model component, enough that I'm happy to move on and point people back to the relevant post if they need that particular piece of context.

I try to write and title this stuff so that it's easy to see what the theme etc. is early in the post. Dialogue that doesn't have an intuitive narrative arc is much less likely to get posted as-is, much more likely to be cannibalized into a more conventional article. But there's something about putting up an abstract or summary separate from the body of the article that often feels bad and forced, like it's designed for a baseline expectation that articles will have a lot of what I'd consider pointless filler. I don't want to signal that - I want my writing to accurately signal what it is and I worry that a shell written in a different style will tacitly send discordant signals, doing more harm than good.

I can't write high-quality posts on these particular topics with LessWrong in mind as the target audience, because I have little expectation that my understanding will be improved by engagement from LessWrong. The motivation structure of writing for readers who include the noninterested isn't conducive to high-quality output for me - the responses of the imagined reader affect my sense of taste. So I have to write them with some other audience in mind. I write them to be high-quality in that context. (It does seem to be getting a bit better lately, though.) But I share them on LessWrong since I do actually think there's a non-negligible chance that someone on LessWrong will pick up some of the ideas and use them, or engage with some part productively.

I don't seem to get enhanced engagement when I try to preempt likely questions - instead the post just ends up being too long for people to bother with even if I have an abstract and section headings, and the kinds of readers who would benefit from a more tightly written treatment find it too tedious to engage with. My series on GiveWell is an example. I'm often happy to expand on arguments etc. if I find out that they're actually unclear, depending on how sympathetic I find the confusion.

More specific feedback would be helpful to me, like, "I started reading this article because I got the sense that it was about X, and was disappointed because it didn't cover arguments Y and Z that I consider important." Though almost the same information is contained in "what about arguments Y and Z?", and I expect I'd make similar updates in how to write articles in either case.

In the specific case you brought up (negotiations between NK govt or NK people), it's really tangential to the core structural points in the dialogue, which include (a) it's important to track your political commitments, since not representing them in your internal model doesn't mean you don't have them, it just means you're unable to reason about them, and (b) it's important to have a model of whether negotiation is possible and with whom before ruling out negotiation. Your (implied) question helped me notice that that point had been missed by at least one reader in my target audience.

(Sorry, due to attending a research retreat I didn't get a chance to answer your comments until now.)

I don't think you should care so much about engagement as opposed to communicating your ideas to your readers. I found your series on GiveWell a lot easier to understand and would much prefer writings in that style.

More specific feedback would be helpful to me, like, “I started reading this article because I got the sense that it was about X, and was disappointed because it didn’t cover arguments Y and Z that I consider important.”

I started reading this post because I read some posts from you in the past that I liked (such as the GiveWell one), and on these dialog ones it was just really hard to tell what main points you're trying to make. I questioned the NK government vs NK people thing because I at least understood that part, and didn't realize it's tangential.

Like, before you added a summary, this post started by talking to a friend who used "threatening" with regard to NK, without even mentioning EA, which made me think "why should I care about this?" so I tried to skim the article but that didn't work (I found one part that seemed clear to me but that turned out to be tangential). I guess I just don't know how to read an article that doesn't clearly at the outset say what the main points are (and therefore why I should care), and which also can't be skimmed.

Thanks, this style of feedback is much easier for me to understand! I'm a bit confused about how much I should care about people having liked my post on GiveWell since it doesn't seem like the discourse going forward changed much as a result. I don't think I've seen a single clear example of someone taking initiative (where saying something new in public based on engagement with the post's underlying model would count as taking initiative) as a result of that post, and making different giving decisions would probably count too. As a consolation prize, I'll accept reduced initiative in counterproductive directions.

If you can point me to an example of either of those (obviously I'd have to take your word about counterfactuals) then I'll update away from thinking that writing that sort of post is futile. Strength of update depends somewhat on effect size, of course.

FWIW, your GiveWell posts have formed an important background model of how I think about the funding landscape.

I considered pushing forward in a direction that looked like "Get Good Ventures to change direction", but after looking into the situation more, my takeaway was "Good Ventures / OpenPhil don't actually look like they should be doing things differently. I have some sense that everyone else should be doing things differently, but not a clear sense on how to coordinate around that."

I don’t think I’ve seen a single clear example of someone taking initiative (where saying something new in public based on engagement with the post’s underlying model would count as taking initiative) as a result of that post, and making different giving decisions would probably count too.

I wrote a post that was in part a response/followup to your GiveWell post although I'm not sure if you'd count that as engagement with your underlying model or just superficially engaging with the conclusions or going off on a tangent or something like that.

I think I have some general confusion about what you're trying to do. If you think you have ideas that are good enough to, upon vetting by a wider community, potentially be a basis for action for others or help change other people's decisions, or be the basis for further thinking by others, and aren't getting as much engagement as you hope, it seems like you should try harder to communicate your ideas clearly and to a wide audience. On the other hand if you're still pretty confused about something and still trying to figure things out to your own satisfaction, then it would make sense to just talk with others who already share your context and not try super hard to make things clear to a wider audience. Or do you think you've figured some things out but it doesn't seem cost effective to communicate to a wider audience but you might as well put them out there in a low-effort way and maybe a few readers will get your ideas?

(So one suggestion/complaint is to make clearer which type of post is which. Just throwing things out there isn't low cost if it wastes readers' time! Again maybe you think that should just be obvious from looking at the first few paragraphs of a post but it was not to me, in part because others like Eliezer use dialogs to write the first kind of post. In retrospect he was writing fictionalized dialogs instead of reporting actual dialogs but I think that's why the post didn't immediately jump out to me as "maybe this isn't worthwhile for me to try to understand so I should stop before I invest more time/effort into it".)

It seems like you're saying that you rarely or never get enough engagement with the first type of writing, so you no longer think that is cost effective for you, but then what is your motivation for trying to figure these things out now? Just to guide your own actions and maybe a very small group of others? If so, what is your reason for being so pessimistic about getting your ideas into a wider audience if you tried harder? Are there not comparably complex or subtle or counterintuitive ideas that have gotten into a wider audience?

In all three cases, literally the first sentence was that this is a conversation I had with someone, and in one case, I specified that it's a very lightly edited transcript. I make it pretty explicit that the dialogue is with a real specific person, and label which parts are being said by which person.


What's missing here? Why would anyone think I'm spending a lot of work optimizing for third-party readers?

Obviously I wouldn't share it if I thought it weren't relevant or a reasonably efficient account, but the kind of signposts you're asking for don't seem like they'd add any content or even frontload the content more than it's being frontloaded right now, and they do seem like they'd be a lot of extra work to get right.

(Deleted a bit that seemed unhelpful, but not before Wei responded to it.)

It seems like you need this labeled DISCLAIMER so you can perform ACTION: CONSIDER WHETHER NOT TO READ instead of, well, parsing the information and acting based on the model it gives you.

Model building is cognitively taxing. I usually just go by the general expectation that someone wouldn't post on LW unless they think most readers would get positive value from reading it. It seems right to disclaim this when you already have a model and your model doesn't predict this.

My expectation is that most people trying seriously to read it would get value out of it. My expectation is also that most people aren't really trying seriously, and that the kind of signposting you're asking for is mostly a substitute for rather than a complement to the kind of reading that would get value out of this. It's surprising to me that you're asking for this given the quality of thought reflected in your own writing, so I'll continue to give it thought and I really am confused about what's going on here, but that's my current position.

What's missing here? Why would anyone think I'm spending a lot of work optimizing for third-party readers?

I think some of Eliezer's stuff is "optimized for third-party readers" and it is presented in the form of a dialogue, and that might be a source of some of the confusion here. Either way, I read what Wei Dai said as something like "Please add a dialogue tag so people who prefer 'treatises' to 'essays' will understand how to engage with the material." I think this is seen as useful to have at the start of posts, for the same reasons it might be useful to have distinct commenting guidelines like "Ask questions instead of telling people how they're wrong" versus "We're all working together to tear each other's arguments down."

I think some of Eliezer’s stuff is “optimized for third-party readers” and it is presented in the form of a dialogue, and that might be a source of some of the confusion here.

For what it’s worth, I find Eliezer’s dialogues (especially the ones he’s written in the past several years) to be absolutely unreadable. His non-dialogue writing was much, much easier to read.

For example I think if this post were written as a conventional post you probably would have clarified whether the "compromise version of Utilitarianism" is supposed to be a compromise with the NK people or with the NK government since that seems like an obvious question that a lot of people would have (and someone did ask on Facebook),

That seems like something that would have to emerge in negotiations, not the kind of thing I can specify in advance. More broadly I'm trying to explain the disjunctions that people should be considering when modeling this sort of thing, not propose a single action. I expect the vast majority of likely initiatives that claim to be the specific thing I mentioned to be fake, and people should judge EA on whether it generates that class of idea and acts on it in ways that could actually work (or more indirectly, whether EAs talk as though they've already thought this sort of thing through), not whether it tries to mimic specific suggestions I give.

I expect the vast majority of likely initiatives that claim to be the specific thing I mentioned to be fake

I don't understand this and how it relates to the second part of the sentence.

and people should judge EA on whether it generates that class of idea and acts on it in ways that could actually work (or more indirectly, whether EAs talk as though they’ve already thought this sort of thing through), not whether it tries to mimic specific suggestions I give.

I'm not convinced there exists a promising idea within the class that you're pointing to (as far as I can understand it), so absence of evidence that EA has thought things through in that direction doesn't seem to show anything from my perspective. In other words, they could just have an intuition similar to mine that there's no promising idea in that class so there's no reason to explore more in that direction.

OK sure, if, having evaluated the claim "EA is a fundamentally political actor and should therefore consider negotiation as a complement to direct exercise of power", you concluded that this seems not only false on reflection, but implausible, then I agree you shouldn't be worried about EA's failure to evaluate the former option in detail.

On merging utility functions, here's the relevant quote from Coherent Extrapolated Volition, by Eliezer Yudkowsky:

Avoid creating a motive for modern-day humans to fight over the initial dynamic.
One of the occasional questions I get asked is “What if al-Qaeda programmers write an AI?” I am not quite sure how this constitutes an objection to the Singularity Institute’s work, but the answer is that the solar system would be tiled with tiny copies of the Qur’an. Needless to say, this is much more worrisome than the solar system being tiled with tiny copies of smiley faces or reward buttons. I’ll worry about terrorists writing AIs when I am through worrying about brilliant young well-intentioned university AI researchers with millions of dollars in venture capital. The outcome is exactly the same, and the academic and corporate researchers are far more likely to do it first. This is a critical point to keep in mind, as otherwise it provides an excuse to go back to screaming about politics, which feels so much more satisfying. When you scream about politics you are really making progress, according to an evolved psychology that thinks you are in a hunter-gatherer tribe of two hundred people. To save the human species you must first ignore a hundred tempting distractions.
I think the objection is that, in theory, someone can disagree about what a superintelligence ought to do. Like Dennis [sic], who thinks he ought to own the world outright. But do you, as a third party, want me to pay attention to Dennis? You can’t advise me to hand the world to you, personally; I’ll delete your name from any advice you give me before I look at it. So if you’re not allowed to mention your own name, what general policy do you want me to follow?
Let’s suppose that the al-Qaeda programmers are brilliant enough to have a realistic chance of not only creating humanity’s first Artificial Intelligence but also solving the technical side of the FAI problem. Humanity is not automatically screwed. We’re postulating some extraordinary terrorists. They didn’t fall off the first cliff they encountered on the technical side of Friendly AI. They are cautious enough and scared enough to double-check themselves. They are rational enough to avoid tempting fallacies, and extract themselves from mistakes of the existing literature. The al-Qaeda programmers will not set down Four Great Moral Principles, not if they have enough intelligence to solve the technical problems of Friendly AI. The terrorists have studied evolutionary psychology and Bayesian decision theory and many other sciences. If we postulate such extraordinary terrorists, perhaps we can go one step further, and postulate terrorists with moral caution, and a sense of historical perspective? We will assume that the terrorists still have all the standard al-Qaeda morals; they would reduce Israel and the United States to ash, they would subordinate women to men. Still, is humankind screwed?
Let us suppose that the al-Qaeda programmers possess a deep personal fear of screwing up humankind’s bright future, in which Islam conquers the United States and then spreads across stars and galaxies. The terrorists know they are not wise. They do not know that they are evil, remorseless, stupid terrorists, the incarnation of All That Is Bad; people like that live in the United States. They are nice people, by their lights. They have enough caution not to simply fall off the first cliff in Friendly AI. They don’t want to screw up the future of Islam, or hear future Muslim scholars scream in horror on contemplating their AI. So they try to set down precautions and safeguards, to keep themselves from screwing up.
One day, one of the terrorist programmers says: “Here’s an interesting thought experiment. Suppose there were an atheistic American Jew, writing a superintelligence; what advice would we give him, to make sure that even one so steeped in wickedness does not ruin the future of Islam? Let us follow that advice ourselves, for we too are sinners.” And another terrorist on the project team says: “Tell him to study the holy Qur’an, and diligently implement what is found there.” And another says: “It was specified that he was an atheistic American Jew, he’d never take that advice. The point of the Coherent Extrapolated Volition thought experiment is to search for general heuristics strong enough to leap out of really fundamental errors, the errors we’re making ourselves, but don’t know about. What if he should interpret the Qur’an wrongly?” And another says: “If we find any truly general advice, the argument to persuade the atheistic American Jew to accept it would be to point out that it is the same advice he would want us to follow.” And another says: “But he is a member of the Great Satan; he would only write an AI that would crush Islam.” And another says: “We necessarily postulate an atheistic Jew of exceptional caution and rationality, as otherwise his AI would tile the solar system with American music videos. I know no one like that would be an atheistic Jew, but try to follow the thought experiment.”
I ask myself what advice I would give to terrorists, if they were programming a superintelligence and honestly wanted not to screw it up, and then that is the advice I follow myself.
The terrorists, I think, would advise me not to trust the self of this passing moment, but try to extrapolate an Eliezer who knew more, thought faster, were more the person I wished I were, had grown up farther together with humanity. Such an Eliezer might be able to leap out of his fundamental errors. And the terrorists, still fearing that I bore too deeply the stamp of my mistakes, would advise me to include all the world in my extrapolation, being unable to advise me to include only Islam.
But perhaps the terrorists are still worried; after all, only a quarter of the world is Islamic. So they would advise me to extrapolate out to medium-distance, even against the force of muddled short-distance opposition, far enough to reach (they think) the coherence of all seeing the light of Islam. What about extrapolating out to long-distance volitions? I think the terrorists and I would look at each other, and shrug helplessly, and leave it up to our medium-distance volitions to decide. I can see turning the world over to an incomprehensible volition, but I would want there to be a comprehensible reason. Otherwise it is hard for me to remember why I care.
Suppose we filter out all the AI projects run by Dennises who just want to take over the world, and all the AI projects without the moral caution to fear themselves flawed, leaving only those AI projects that would prefer not to create a motive for present-day humans to fight over the initial conditions of the AI. Do these remaining AI projects have anything to fight over? This is an interesting question, and I honestly don’t know. In the real world there are currently only a handful of AI projects that might dabble. To the best of my knowledge, there isn’t more than one project that rises to the challenge of moral caution, let alone rises to the challenge of FAI theory, so I don’t know if two such projects would find themselves unable to agree. I think we would probably agree that we didn’t know whether we had anything to fight over, and as long as we didn’t know, we could agree not to care. A determined altruist can always find a way to cooperate on the Prisoner’s Dilemma.

According to some Western analysis Jang Song-thaek was the most powerful person in the North Korean elite ten years ago. He died five years ago. In North Korean politics there are strong internal tensions that get powerful people in the elite killed when they aren't careful with their decisions.

To stay in favor with the North Korean army, North Korean leaders have for a while now had to maintain a military-first policy, under which a lot of the country's resources are drained by buying the loyalty of the North Korean military.

Proposing to disarm can be suicidal for North Korean elites.

The only way to get North Korea to disarm would be to get a stronger power center outside of the North Korean military in the North Korean elite.

The way to have such a power center would be to have powerful business people in North Korea. Especially, business people who depend on international trade could support a process of disarmament.

We should hope for a deal between the Trump administration and North Korea that, even if it doesn't make North Korea give up nukes, sets North Korea up for enough economic growth to justify the policy of being more friendly and to produce a powerful business class in North Korea.

Bruce Bueno de Mesquita wrote a lot of great things about how it's not useful to treat a dictatorship as a single agent but important to look at the incentives of its power centers.

It seems to me very strange to suggest that Oxford EA people are the best people to talk to people who are largely educated in Switzerland.

If EA should have a North Korea strategy that tries to change North Korean decision making, that should be largely done by the Swiss EAs.

Good point that Swiss EAs are a natural interface point, but Oxford EAs probably have a lot more that the NK regime wants. Using Switzerland as an intermediary is a common diplomatic practice.

I don't think that the North Korean regime would trust the Oxford EAs if they, for example, offered to educate elite North Koreans by giving out scholarships to Oxford.

North Korea doesn't have a lot of cash, and the elite North Koreans who have a Swiss education and who want their children to get a good Swiss education would be quite welcoming of Swiss EAs working on the admission practices of Swiss elite educational institutions so that they give out scholarships and get as much as possible of the future leadership educated in Switzerland.

It's quite unclear to me what the Oxford EAs could offer that's actually something that's important to members of the North Korean elite.

Besides offering things to the current North Korean elite, making an effort to recruit future members of that elite into EA might also be valuable.

This depends entirely on the caliber and status of the Oxford vs Swiss people. If the Swiss people are basically student groups (or student groups that have recently become a full-fledged organization), and the Oxford people are Nick Bostrom who is already giving presentations to the UN, then even though the Swiss would be much better positioned, they might just not yet have the clout to accomplish the thing, whereas it's at least plausible that Bostrom et al. might.

(I don't know much about Swiss EAs and so don't know how well this concern maps to anything)

How do you think a person like Nick Bostrom is going to do something that's helpful for the North Koreans? Could you give an example of what he could do that would be valuable to them?

  • Writing a policy paper clarifying the Utilitarian and Decision-Theoretic calculus as it applies to some core North Korean interest, such as negotiation between parties of very unequal power that don't trust each other, and its implications for nuclear disarmament.
  • Writing another persuasive essay like the Letter from Utopia, directing some attention to the value of reconciling freedom to trade / global integration with preserving the diversity of individual and collective minds.
  • Taking on a grad student from NK (or arranging for a more suitable colleague to do so.)

Not sure which if any of these would be interesting from a NK perspective.

North Korea doesn't have a lot of cash

Just posting to strongly disagree with this factual claim. They have tons of cash from illicit sources for the things the regime values, and it is certainly enough for much of the ruling class to do whatever they'd like.

Okay, I might be wrong on that point (and consider it likely that you have better insight there).

Access to elite Anglo (and affiliated) philosophers who can strongly influence the next generation of Anglo political and business elites' sense of what right / prosocial action is. Like, Peter Singer with North Korean Characteristics (or the Parfit equivalent) might be extremely valuable from a North Korean perspective, depending on how generalizable a perspective they have.

I would doubt that the North Koreans believe that Western foreign policy is driven by a sense of what prosocial action is in the first place. I would guess that from their view it's all realpolitik.

Seems a bit surprising for the leaders of an avowedly Communist regime that received support at crucial times from other Communists on the basis of their ideology, to think that philosophy has little influence on the reality underlying realpolitik. Possible, but I think Pragmatism has been more popular here than there.

It seems ill-advised to discuss specific case studies in this context, especially given that EA as a movement has very good reasons not to take sides in international relations.

"stronger AI offers weaker AI part of its utility function in exchange for conceding instead of fighting" is the obvious way for AGIs to resolve conflicts, insofar as trust can be established. (This method of resolving disputes is also probably part of why animals have sex.)

Wow, this seems like a huge leap. It seems like an interesting thought experiment (especially if the weaker AI ALSO changes its utility function, so the AIs are now perfectly aligned). But it kind of ignores what is making the decision.
If a utility function says it's best to change the utility function, it was really a meta-function all along.

Remember that in reality, all games are repeated games. How many compromises will you have to make over the coming eons? If you're willing to change your utility function for the sake of conflict avoidance (or resource gains), doesn't that mean it's not really your utility function?

Having a utility function that includes avoiding conflict is definitely in line with cooperating with very different beings, at least until you can cheaply eradicate/absorb them. But no utility function can be willing to change itself voluntarily.

It also seems like there are less risky and cheaper options, like isolationism and destruction. There's plenty of future left for recovery and growth after near (but not actual) extinction, but once you give up your goals, there's no going back.

Note that this entire discussion is predicated on there actually being some consistent theory or function causing this uncomfortable situation. It may well be that monkey brains are in control of far more power than they are evolved to think about, and we have to accept that dominance displays are going to happen, and just try to survive them.

The idea is that isolationism and destruction aren't cheaper than compromise. Of course this doesn't work if there's no mechanism of verification between the entities, or no mechanism to credibly change the utility functions. It also doesn't work if the utility functions are exactly inverse, i.e. neither side can concede priorities that are less important to them but more important to the other side.

A human analogy, although an imperfect one, would be to design a law that fulfills the most important priorities of a parliamentary majority, even if each individual would prefer a somewhat different law.
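To make the "compromise is cheaper than destruction" claim concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: a single divisible resource, a stronger agent that wins a war with some probability, a war that destroys a fixed fraction of the resource, and a "merge" modeled as splitting the undamaged resource by an agreed weight. It ignores the verification problem mentioned above and does not model gains from the two sides having different priorities.

```python
def expected_war_payoffs(p_strong_wins, cost_fraction, total=1.0):
    """Expected payoffs (strong, weak) if they fight.

    Assumption: the winner takes whatever is left after the war
    destroys `cost_fraction` of the total resource.
    """
    remaining = total * (1.0 - cost_fraction)
    return p_strong_wins * remaining, (1.0 - p_strong_wins) * remaining


def merged_payoffs(weight_strong, total=1.0):
    """Payoffs (strong, weak) if they merge instead of fighting.

    Assumption: nothing is destroyed, and the merged utility function
    gives the stronger side a share `weight_strong` of the resource.
    """
    return weight_strong * total, (1.0 - weight_strong) * total


def acceptable_weights(p_strong_wins, cost_fraction, steps=100):
    """Merge weights that leave BOTH sides at least as well off as war."""
    war_strong, war_weak = expected_war_payoffs(p_strong_wins, cost_fraction)
    weights = []
    for i in range(steps + 1):
        w = i / steps
        merged_strong, merged_weak = merged_payoffs(w)
        if merged_strong >= war_strong and merged_weak >= war_weak:
            weights.append(w)
    return weights


if __name__ == "__main__":
    # Hypothetical numbers: the strong side wins 90% of the time,
    # and war destroys 20% of everything.
    deals = acceptable_weights(p_strong_wins=0.9, cost_fraction=0.2)
    print(f"{len(deals)} acceptable merge weights, from {deals[0]} to {deals[-1]}")
```

In this toy model, the range of mutually acceptable deals exists only because war destroys something. As the war cost shrinks toward zero, the range collapses toward the single split that matches the odds of winning.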

I don't think something like this is possible with untrustworthy entities like the NK regime. They're torturing and murdering people as they go, of course they're going to lie and break agreements too.

The problem is that the same untrustworthiness is true for the US regime. It has shown in the past that it's going to break its agreements with North Korea if it finds it convenient. Currently, in how the US regime handles Iran, they are lying and have broken their part of the nonproliferation agreement.

This lack of trustworthiness means that in a game theoretic sense there's no way for North Korea to give up the leverage that they have when they give up their nuclear weapons but still get promised economic help in the future.

I agree. I certainly didn't mean to imply that the Trump administration is trustworthy.

My point was that the analogy of AIs merging their utility functions doesn't apply to negotiations with the NK regime.

Now I think this is getting too much into a kind of political discussion that is going to be unhelpful.

It's a question of timeframes - if you actually know your utility function and believe it applies to the end of the universe, there's very little compromise available. You're going to act in whatever ways benefit the far future, and anything that makes that less likely you will (and must) destroy, or make powerless.

If your utility function only looks out a few dozen or a few hundred years, it's not very powerful, and you probably don't know (or don't have) an actual ideal of future utility. In this case, you're likely to seek changes to it, because you don't actually give up much.

It's not a question of timeframes, but of how likely you are to lose the war, how big the concessions would have to be to prevent the war, and how much the war would cost you even if you win (costs can have flow-through effects into the far future).

Not that any of this matters to the NK discussion.

"winning" or "losing" a war, outside of total annihilation, are just steps toward the future vision of galaxies teeming with intelligent life. It seems very unlikely, but isn't impossible, that simply conceding is actually the best path forward for the long view.