Is principled mass-outreach possible, for AGI X-risk?

Nicholas Kross

Over a year ago, Rohin Shah wrote this, about people trying to slow or stop AGI development through mass public outreach about the dangers of AGI:

But it really doesn't seem great that my case for wide-scale outreach being good is "maybe if we create a mass delusion of incorrect beliefs that implies that AGI is risky, then we'll slow down, and the extra years of time will help". So overall my guess is that this is net negative.
(On my beliefs, which I acknowledge not everyone shares, expecting something better than "mass delusion of incorrect beliefs that implies that AGI is risky" if you do wide-scale outreach now is assuming your way out of reality.)

I agree much more with the second paragraph than the first one.

I think there's still an angle for this that few have tried in a really public way. Namely, ignorance and asymmetry. (There is definitely a better term or two for what I'm about to describe, but I forgot it. Probably from Taleb or one of the SSC posts about people being cautious in seemingly-odd ways due to their boundedness.)

One Idea

A high percentage of voting-eligible people in the US... don't vote. An even higher percentage vote in only the presidential elections, or only some presidential elections. I'd bet a lot of money that most of these people aren't working under a Caplan-style non-voting logic, but instead under something like "I'm too busy" or "it doesn't matter to me / either way / from just my vote".

Many of these people, being politically disengaged, would not be well-informed about political issues (or even have strong and/or coherent values related to those issues). What I want to see is an empirical study that asks these people "are you aware of this?" and "does that awareness, in turn, factor into your not-voting?".

I think there's a world, which we might live in, where lots of non-voters believe something akin to "Why should I vote, if I'm clueless about it? Let the others handle this lmao, just like how the ~~nice~~ smart people somewhere make my bills come in."

In a relevant sense, I think there's an epistemically-legitimate and persuasive way to communicate "AGI labs are trying to build something smarter than humans, and you don't have to be an expert (or have much of a gears-level view of what's going on) to think this is scary. If our smartest experts still disagree on this, and the mistake-asymmetry is 'unnecessary slowdown VS human extinction', then it's perfectly fine to say 'shut it down until [someone/some group] figures out what's going on'".

To be clear, there's still a ton of ways to get this wrong, and those who think otherwise are deluding themselves out of reality. I'm claiming that real-human-doable advocacy can get this right, and it's been mostly left untried.

Extreme Care Still Advised If You Do This

Most persuasion, including digital, is one-to-many "broadcast"-style; "going viral" usually just means "some broadcast happened that nobody heard of", like an algorithm suggesting a video to a lot of people at once. Given this, plus anchoring bias, you should expect and be very paranoid about the "first thing people hear = sets the conversation" thing. (Think of how many people's opinions are copypasted from the first ~~classy video essay~~ mass-market John Oliver video they saw about the subject, or the first Fox News commentary on it.)

Not only does the case for X-risk need to be made first, but it needs to be right (even in a restricted way like my above suggestion) the first time. Actually, that's another reason why my restricted-version suggestion should be prioritized, since it's more-explicitly robust to small issues.

(If somebody does this in real life, you need to clearly end on something like "Even if a minor detail like [name a specific X] or [name a specific Y] is wrong, it doesn't change the underlying danger, because the labs are still working towards Earth's next intelligent species, and there's nothing remotely strong about the 'safety' currently in place.")

In closing... am I wrong? Can we do this better?

I'm highly interested in better ideas for the goal of mass-outreach-about-AGI-X-risks, whether or not they're in the vein of my suggestion. I think alignment and EA people are too quick to jump to "mass persuasion will lead to wrong actions, or be too Dark Arts for us, or both". If it's true 90% of the time, that other 10% still seems worth aiming for!

(Few people have communications-imagination in general, and I don't think I personally have that much more of it than others here, but it seems like something that someone could have an unusually high amount of.)

And, of course, I'm (historically) likely to be missing one or more steps of logic that, if I knew them, would change my mind on the feasibility of this project. If you (a media person) want to try any of this, wait a while for contrary comments to come in, and try to interact with them.

This post is mostly copied from my own comment here.

(Epistemic status: mostly observation through heavy fog of war, partly speculation)

From your previous comment:

The "educated savvy left-leaning online person" consensus (as far as I can gather) is something like: "AI art is bad, the real danger is capitalism, and the extinction danger is some kind of fake regulatory-capture hype techbro thing which (if we even bother to look at the LW/EA spaces at all) is adjacent to racists and cryptobros".

So clearly you're aware of / agree with this being a substantial chunk of what's happening in the “mass social media” space, in which case…

Given this, plus anchoring bias, you should expect and be very paranoid about the "first thing people hear = sets the conversation" thing.

… why is this not just “お前はもう死んでいる” ("you are already dead"; that is, you are already cut off from this strategy due to things that happened before you could react) right out of the gate, at least for that (vocal, seemingly influential) subpopulation?

What I observe in many of my less-technical circles (which roughly match the above description) is that as soon as the first word exits your virtual mouth that implies that there's any substance to any underlying technology itself, good or bad or worth giving any thought to at all (and that's what gets you on the metatextual level, the frame-clinging plus some other stuff I want to gesture at but am not sure whether that's safe to do right now), beyond “mass stealing to create a class divide”, you instantly lose. At best everything you say gets interpreted as “so the flood of theft and soulless shit is going to get even worse” (and they do seem to be effectively running on a souls-based model of anticipation even if their overt dialogue isn't theistic, which is part of what creates a big inferential divide to start with). But you don't seem to be suggesting leaning into that spin, so I can't square what you're suggesting with what seem to be shared observations. Also, the less loud and angry people are still strongly focused on “AI being given responsibility it's not ready for”, so as soon as you hint at exceeding human intelligence, you lose (and you don't then get the chance to say “no no, I mean in the future”, you lose before any further words are processed).

Now, I do separately observe a subset of more normie-feeling/working-class people who don't loudly profess the above lines and are willing to e.g. openly use some generative-model art here and there in a way that suggests they don't have the same loud emotions about the current AI-technology explosion. I'm not as sure what main challenges we would run into with that crowd, and maybe that's whom you mean to target. I still think getting taken seriously would be tricky, but they might laugh at you more mirthfully instead of more derisively, and low-key repetition might have an effect. I do kind of worry that even if you start succeeding there, then the x-risk argument can get conflated with the easier-to-spread “art theft”, “laundering bias”, etc. models (either accidentally, or deliberately by adversaries) and then this second crowd maybe gets partly converted to that, partly starts rejecting you for looking too similar to that, and partly gets driven underground by other people protesting their benefiting from the current-day mundane-utility aspect.

I also observe a subset of business-oriented people who want the mundane utility a lot but often especially want to be on the hype train for capital-access or marketing reasons, or at least want to keep their friends and business associates who want that. I think they're kind of constrained in what they can openly say or do and might be receptive to strategic thinking about x-risk but ultimately dead ends for acting on it, though maybe that last part can be changed with strategic shadow consensus building, which is less like mass communication and where you might have more leeway and initial trust to work with. Obviously, if someone is already doing that, we don't necessarily see it posted on LW. There are probably some useful inferences to be drawn from events like the OpenAI board shakeup here, but I don't know what they are right now.

FWIW, I have an underlying intuition here that's something like “if you're going to go Dark Arts, then go big or go home”, but I don't really know how to operationalize that in detail and am generally confused and sad. In general, I think people who have things like “logical connectives are relevant to the content of the text” threaded through enough of their mindset tend to fall into a trap analogous to the “Average Familiarity” xkcd or to Hofstadter's Law when they try truly-mass communication unless they're willing to wrench things around in what are often very painful ways to them, and (per the analogies) that this happens even when they're specifically trying to correct for it.

Now, I do separately observe a subset of more normie-feeling/working-class people who don't loudly profess the above lines and are willing to e.g. openly use some generative-model art here and there in a way that suggests they don't have the same loud emotions about the current AI-technology explosion. I'm not as sure what main challenges we would run into with that crowd, and maybe that's whom you mean to target.

That's... basically what my proposal is? Yeah? People that aren't already terminally-online about AI, but may still use ChatGPT and/or Stable Diffusion for fun or even work. Or (more common) those who don't even have that much interaction, who just see AI as yet another random thingy in the headlines.

Facepalm at self. You're right, of course. I think I confused myself about the overall context after reading the end-note link there and went off at an angle.

Now to leave the comment up for history and in case it contains some useful parts still, while simultaneously thanking the site designers for letting me un-upvote myself. 😛

No worries! I make similar mistakes all the time (just check my comment history ;-;)

And I do think your comment is useful, in the same way that Rohin's original comment (which my post is responding to) is useful :)

FWIW, I have an underlying intuition here that's something like “if you're going to go Dark Arts, then go big or go home”, but I don't really know how to operationalize that in detail and am generally confused and sad. In general, I think people who have things like “logical connectives are relevant to the content of the text” threaded through enough of their mindset tend to fall into a trap analogous to the “Average Familiarity” xkcd or to Hofstadter's Law when they try truly-mass communication unless they're willing to wrench things around in what are often very painful ways to them, and (per the analogies) that this happens even when they're specifically trying to correct for it.

I disagree with the first sentence, but agree strongly with the rest of it. My whole point is that it may be literally possible to make:

mass-audience arguments
about extinction risk from AI
that don't involve lying.

Maybe we mean different things by "Dark Arts" here? I don't actually consider (going hard with messaging like) "This issue is complicated, but you [the audience member] understandably don't want to deal with it, so we should go harder on preventing risk for now based on the everyday risk-avoidance you probably practice yourself." as lying or manipulation. You could call it Dark Arts if you drew the "Dark Arts" cluster really wide, but I would disagree with that cluster-drawing.