For what it's worth, I think it's pretty likely that the bureaucratic processes at (e.g.) Google haven't noticed that acknowledging that the race to superintelligence is insane is different in kind from (e.g.) talking about the climate impacts of datacenters, and I wouldn't be surprised if (e.g.) Google issued one of their researchers a warning the first time they mentioned such things, not out of deliberate sketchiness but just out of bureaucratic habit. My guess is that that'd be a great opportunity to push back, spell out the reason why the cases are different, and see whether the company lives up to its alleged principles or codifies its alignmentwashing practices. If you have the opportunity to spur that conversation, I think that'd be real cool of you -- I think there's a decent chance it would spark a bunch of good internal cultural change, and also a decent chance that it would make the issues with staying at the lab much clearer (both internally, and to the public if a news story came of it).
Thanks for the clarification. Yeah, from my perspective, if casually mentioning that you agree with the top scientists & lab heads & many many researchers that this whole situation is crazy causes your host company to revoke your permission to talk about your research publicly (maybe after a warning), then my take is that that's really sketchy and that contributing to a lab like that is probably substantially worse than your next best opportunity (e.g. b/c it sounds like you're engaging in alignmentwashing and b/c your next best opportunity seems like it can't be much worse in terms of direct research).
(I acknowledge that there's room to disagree about whether the second-order effect of safetywashing is outweighed by the second-order effect of having people who care about certain issues existing at the company at all. A very quick gloss of my take there: if the company is preventing you from publicly acknowledging commonly-understood-among-experts key features of the situation, in a scenario where the world is desperately hurting for policymakers and lay people to understand those key features, then I'm extra skeptical that you'll be able to reap the imagined benefits of being a "person on the inside".)
I acknowledge that there are analogous situations where a company would be right to feel annoyed, e.g. if someone were casually bringing up their distantly-related political stances in every podcast. I think that this situation is importantly disanalogous, because (a) many of the most eminent figures in the field are talking about the danger here; and (b) alignment research is used as a primary motivating excuse for why the incredibly risky work should be allowed to continue. There's a sense in which the complicity of alignment researchers is a key enabling factor for the race; if all alignment researchers resigned en masse citing the insanity of the race, then policymakers would be much more likely to go "wait, what the heck?" In a situation like that, I think the implicit approval of alignment researchers is not something to be traded away lightly.
(Fwiw, I personally disclaim any social pressure that people should avoid mentioning or discussing their disagreements; that'd be silly. I am in favor of building upon areas of agreement, and I am in favor of being careful to avoid misleading the public, and I am in favor of people who disagree managing to build coalitions, but I'm not in favor of people feeling like it's time to stfu. I think the "misleading the public" thing is a little delicate, because I think it's easy for onlookers to think experts are saying "i disagree [that the current situation is reckless and crazy and a sane world would put a stop to it]" when in fact experts are trying to say "i disagree [about whether certain technical plans have a middling probability of success, though of course i agree that the current situation is reckless and crazy]", and it can be a bit tricky to grumble about this effect in a fashion that doesn't come across as telling people to stfu about their disagreements. My attempt to thread that needle is to remind people that this misunderstanding is common and important, and thus to suggest that when people have a broad audience, they work to combat this misread :-))
My impression of the lesson from the Shanghai Communique is not "parties should only ever say things everyone else will agree with them on" but rather "when talking to broad audiences, say what you believe; when attempting to collaborate with potential partners, build as much collaboration as you can on areas of agreement."
I don't have much interest in trying to speak for everyone, as opposed to just for myself. Weakening the title seems to me like it only makes sense in a world where I'm trying to represent some sort of intersectional view that most everyone agrees upon, instead of just calling it like I see it. I think the world would be better off if we all just presented our own direct views. I don't think this is in tension with the idea that one should attempt to build as much collaboration as possible in areas of agreement.
For instance: if you present your views to an audience and I have an opportunity to comment, I would encourage you to present your own direct views (rather than something altered in attempts to make it palatable to me). Completely separately, if I were to comment on it, I think it'd be cool of me to emphasize the most important and relevant bits first (which, for most audiences, will be bits of agreement) before moving on to higher-order disagreements. (If you see me failing to do this, I'd appreciate being called out.)
(All that said, I acknowledge that the book would've looked very different -- and that the writing process would have been very different -- if we were trying to build a Coalition of the Concerned and speak for all EAs and LessWrongers, rather than trying to just blurt out the situation as we saw it ourselves. I think "I was not part of the drafting process and I disagree with a bunch of the specifics" is a fine reason to avoid socially rallying behind the book. My understanding of the OP is that it's trying to push for something less like "falsely tell the world that the book represents you, because it's close enough" (which I think would be bad), and more like "when you're interacting with a counterparty that has a lot of relevant key areas of agreement (opening China would make it richer / the AI race is reckless), it's productive to build as much as you can on areas of agreement". And fwiw, for my part, I'm very happy to form coalitions with all those who think the race is insanely reckless and would be better off stopped, even if we don't see eye to eye on the likelihood of alignment success.)
The thing I'm imagining is more like mentioning, almost as an aside, in a friendly tone, that ofc you think the whole situation is ridiculous and that stopping would be better (before & after having whatever other convo you were gonna have about technical alignment ideas or w/e). In a sort of "Carthago delenda est" fashion.
I agree that a host company could reasonably get annoyed if their researchers went on many different podcasts to talk for two hours about how the whole industry is sick. But if casually reminding people "the status quo is insane and we should do something else" at the beginning/end is a fireable offense, in a world where lab heads & Turing award winners & Nobel laureate godfathers of the field are saying this is all ridiculously dangerous, then I think that's real sketchy and that contributing to a lab like that is substantially worse than the next best opportunity. (And similarly if it's an offense that gets you sidelined or disempowered inside the company, even if not exactly fired.)
(To answer your direct Q, re: "Have you ever seen someone prominent pushing a case for "optimism" on the basis of causal trade with aliens / acausal trade?", I have heard "well I don't think it will actually kill everyone because of acausal trade arguments" enough times that I assumed the people discussing those cases thought the argument was substantial. I'd be a bit surprised if none of the ECLW folks thought it was a substantial reason for optimism. My impression from the discussions was that you & others of similar prominence were in that camp. I'm heartened to hear that you think it's insubstantial. I'm a little confused why there's been so much discussion around it if everyone agrees it's insubstantial, but have updated towards it just being a case of people who don't notice/buy that it's washed out by sale to Hubble-volume aliens and who are into pedantry. Sorry for falsely implying that you & others of similar prominence thought the argument was substantial; I update.)
I am personally squeamish about AI alignment researchers staying in their positions in the case where they're allowed to go on podcasts & keep their jobs only if they never say "this is an insane situation and I wish Earth would stop instead (even as I expect it won't and try to make things better)", even when that's what they believe. That starts to feel to me like misleading the Earth in support of the mad scientists who are gambling with all our lives. If that's the price of staying at one of the labs, I start to feel like exiting and giving that as the public reason is a much better option.
In part this is because I think it'd make all sorts of news stories in a way that would shift the Overton window and make it more possible for other researchers later to speak their mind (and shift the internal culture and thus shift the policymaker understanding, etc.), as evidenced by e.g. the case of Daniel Kokotajlo. And in part because I think you'd be able to do similarly good or better work outside of a lab like that. (At a minimum, my guess is you'd be able to continue work at Anthropic, e.g. b/c Evan can apparently say it and continue working there.)
Ty! For the record, my reason for thinking it's fine to say "if anyone builds it, everyone dies" despite some chance of survival is mostly spelled out here. Relative to the beliefs you spell out above, I think the difference is a combination of (a) it sounds like I find the survival scenarios less likely than you do; (b) it sounds like I'm willing to classify more things as "death" than you are.
For examples of (b): I'm pretty happy to describe as "death" cases where the AI makes things that are to humans what dogs are to wolves, or (more likely) makes some other strange optimized thing that has some distorted relationship to humanity, or cases where digitized backups of humanity are sold to aliens, etc. I feel pretty good about describing many exotic scenarios as "we'd die" to a broad audience, especially in a setting with extreme length constraints (like a book title). If I were to caveat with "except maybe backups of us will be sold to aliens", I'd expect most people to be confused and frustrated by my bringing that point up. It looks to me like most of the least-exotic scenarios are ones that route through things that lay audience members pretty squarely call "death".
It looks to me like the even more exotic scenarios (where modern individuals get "afterlives") are in the rough ballpark of quantum immortality / anthropic immortality arguments. AI definitely complicates things and makes some of that stuff more plausible (b/c there's an entity around that can make trades and has a record of your mind), but it still looks like a very small factor to me (washed out e.g. by alien sales) and feels kinda weird and bad to bring it up in a lay conversation, similar to how it'd be weird and bad to bring up quantum immortality if we were trying to stop a car speeding towards a cliff.
FWIW, insofar as people feel like they can't literally support the title because they think that backups of humans will be sold to aliens, I encourage them to say as much in plain language (whenever they're critiquing the title). Like: insofar as folks think the title is causing lay audiences to miss important nuance, I think it's an important second-order nuance that the allegedly-missing nuance is "maybe we'll be sold to aliens", rather than something less exotic than that.
Oh yeah, I agree that (earnest and courageous) attempts to shift the internal culture are probably even better than saying your views publicly (if you're a low-profile researcher).
I still think there's an additional boost from consistently reminding people of your "this is crazy and earth should do something else" views whenever you are (e.g.) on a podcast or otherwise talking about your alignment hopes. Otherwise I think you give off a false impression that the scientists have things under control and think that the race is okay. (I think most listeners to most alignment podcasts or w/e hear lots of cheerful optimism and none of the horror that is rightly associated with a >5% chance of destroying the whole human endeavor, and that this contributes to the culture being stuck in a bad state across many orgs.)
FWIW, it's not a crux for me whether a stop is especially feasible or the best hope to be pursuing. On my model, the world is much more likely to respond in marginally saner ways the more that decision-makers understand the problem. Saying "I think a stop would be better than what we're currently doing, and I beg the world to shut down everyone, including us", if you believe it, helps communicate your beliefs (and thus the truth, insofar as you're good at believing) even if the exact policy proposal doesn't happen. I think the equilibrium where lots and lots of people understand the gravity of the situation is probably better than the current equilibrium in lots of hard-to-articulate and hard-to-predict ways, even if the better equilibrium would not be able to pull off a full stop.
(For an intuition pump: perhaps such a world could pull off "every nation sabotages every other nation's ASI projects for fear of their own lives", as an illustration of how more understanding could help even w/out a treaty.)
I agree that large companies are likely incoherent in this way; that's what I was addressing in my follow-on comment :-). (Short version: I think getting a warning and then pressing the issue is a great way to press the company for consistency on this (important!) issue, and I think that it matters whether the company coheres around "oh yeah, you're right, that is okay" vs whether it coheres around "nope, we do alignmentwashing here".)
With regards to whether senior figures are paying attention: my guess is that if a good chunk of alignment researchers (including high-profile ones such as yourself) are legitimately worried about alignmentwashing and legitimately considering doing their work elsewhere (and insofar as they prefer telling the media if that happens -- not as a threat but because informing the public is the right thing to do) -- then, if it comes to that extremity, I think companies are pretty likely to get the senior figures involved. And I think that if you act in a reasonable, sensible, high-integrity way throughout the process, you're pretty likely to have pretty good effects on the internal culture (either by leaving or by causing the internal policy to change in a visible way that makes it much easier for researchers to speak about this stuff).