Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Co-written by Connor Leahy and Gabe)

We have talked to a whole bunch of people about pauses and moratoriums: members of the AI safety community, investors, business peers, politicians, and more.

Too many claimed to pursue the following approach:

  1. It would be great if AGI progress stopped, but that is infeasible.
  2. Therefore, I will advocate for what I think is feasible, even if it is not ideal. 
  3. The Overton window being what it is, if I claim a belief that is too extreme, or endorse an infeasible policy proposal, people will take me less seriously on the feasible stuff.
  4. Given this, I will be tactical in what I say, even though I will avoid stating outright lies.

Consider if this applies to you, or people close to you.

If it does, let us be clear: hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

Not only is it morally wrong, it makes for a terrible strategy. As it stands, the AI Safety Community itself cannot coordinate to state that we should stop AGI progress right now!

Not only can it not coordinate, the AI Safety Community is defecting, by making it more costly for the people who do say it to do so.

We all feel like we are working on the most important things, and that we are being pragmatic realists.

But remember: If you feel stuck in the Overton window, it is because YOU ARE the Overton window.

1. The AI Safety Community is making our job harder

In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.

Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.

To date, in our efforts to inform, motivate, and coordinate with people, public lying by members of the AI Safety Community has been one of the biggest direct obstacles we have encountered.

The newest example of this is "Responsible Scaling Policies": many AI Safety people are much more vocal about their endorsement of RSPs than about their private belief that, in a saner world, all AGI progress should stop right now.

Because of them, we have been told many times that we are a minority voice, and that most people in the AI Safety community (read: Open Philanthropy-adjacent) disagree that we should stop all AGI progress right now.

That, actually, there is an acceptable way to continue scaling! And given that this makes things easier, if there is indeed an acceptable way to continue scaling, that is what we should do, rather than stop all AGI progress right now!

Recently, Dario Amodei (Anthropic CEO) has used the RSP to frame the moratorium position as the most extreme version of an extreme position, and this is the framing that we have seen used over and over again. ARC mirrors this in its version of the RSP proposal, describing it as a "pragmatic middle ground" between a moratorium and doing nothing.

Obviously, all AGI Racers use this against us when we talk to people.

There are very few people whom we have consistently seen publicly call for a stop to AGI progress. The clearest examples are Eliezer's "Shut it All Down" and Nate's "Fucking stop".

The loudest silence is from Paul Christiano, whose RSPs are being used to safety-wash scaling.

Proving me wrong is very easy. If you do believe that, in a saner world, we would stop all AGI progress right now, you can just write this publicly.

When called out on this, most people we talk to just fumble.

2. Lying for Personal Gain

We talk to many people who publicly lie about their beliefs.

The justifications are always the same: “it doesn’t feel like lying”, “we don’t state things we do not believe”, “we are playing an inside game, so we must be tactical in what we say to gain influence and power”.

Let me call this for what it is: lying for personal gain. If you state things whose main purpose is to get people to think you believe something else, and you do so to gain more influence and power: you are lying for personal gain.

The results of this "influence and power-grabbing" have materialised many times over in the safety-washing of the AGI race. What a coincidence it is that DeepMind, OpenAI and Anthropic are all related to the AI Safety community.

The only benefit we see from this politicking is that the people lying gain more influence, while the time we have left until AGI keeps getting shorter.

Consider what happens when a community rewards the people who gain more influence by lying!

So many people lie, and they screw not only humanity, but one another.

Many AGI corp leaders will privately state that in a saner world, AGI progress should stop, but they will not state it because it would hurt their ability to race against each other!

Safety people will lie so that they can keep ties with labs in order to “pressure them” and seem reasonable to politicians.

Whatever: they just lie to gain more power.

“DO NOT LIE PUBLICLY ABOUT GRAVE MATTERS” is a very strong baseline. If you want to defect, you need a much stronger reason than “it will benefit my personal influence, and I promise I’ll do good things with it”.

And you need to accept the blame when you're called out. You should not muddy the waters by justifying your lies, covering them up, telling people they misunderstood, and trying to maintain more influence within the community.

We have seen so many people taken in by this web of lies: from politicians and journalists, to engineers and intellectuals, all the way to the concerned EA or regular citizen who wants to help, but is confused by our message when it looks like the AI safety community is ok with scaling.

Your lies compound and make the world a worse place.

There is an easy way to fix this situation: we can adopt the norm of publicly stating our true beliefs about grave matters.

If you know someone who claims to believe that in a saner world we should stop all AGI progress, tell them to publicly state their beliefs, unequivocally. Very often, you’ll see them fumbling, caught in politicking. And not that rarely, you’ll see that they actually want to keep racing. In these situations, you might want to stop finding excuses for them.

3. The Spirit of Coordination

A very sad thing that we have personally felt is that many people seem so tangled in these politics that they do not understand what the point of honesty even is.

Indeed, from the inside, it is not obvious that honesty is a good choice. If you are honest, publicly honest, or even adversarially honest, you just make more opponents, have less influence, and can help less.

This is typical deontology vs consequentialism. Should you be honest if, from your point of view, it increases the chances of doom?

The answer is YES.

a) Politicking has many more unintended consequences than expected.

Whenever you lie, you shoot potential allies at random in the back.
Whenever you lie, you make it more acceptable for people around you to lie.

b) Your behavior, especially if you are a leader, a funder or a major employee (first 10 employees, or responsible for >10% of the headcount of the org), ripples down to everyone around you.

People lower in the respectability/authority/status ranks do defer to your behavior.
People outside of these ranks look at you.
Our work toward stopping AGI progress becomes easier whenever a leader/investor/major employee at OpenAI, DeepMind, Anthropic, ARC, Open Philanthropy, etc. states their beliefs about AGI progress more clearly.
 

c) Honesty is Great.

Existential risk from AI is now going mainstream. Academics talk about it. Tech CEOs talk about it. You can now talk about it, not be a weirdo, and gain more allies. Polls show that even non-expert citizens express diverse opinions about superintelligence.

Consider the following timeline:

  • ARC & Open Philanthropy state in a press release: "In a sane world, all AGI progress should stop. If we don't, there's more than a 10% chance we will all die."
  • People at AGI labs working in the safety teams echo this message publicly.
  • AGI labs leaders who think this state it publicly.
  • We start coordinating explicitly against orgs (and groups within orgs) that race.
  • We coordinate on a plan whose final publicly stated goal is to get to a world state that, most of us agree, is not one where humanity's entire existence is at risk.
  • We publicly, relentlessly optimise for this plan, without compromising on our beliefs.

Whenever you lie for personal gain, you fuck up this timeline.

When you start being publicly honest, you will suffer a personal hit in the short term. But we truly believe that, coordinated and honest, we will have timelines much longer than any Scaling Policy will ever get us.

74 comments
[-] paulfchristiano · 6mo

Here is a short post explaining some of my views on responsible scaling policies, regulation, and pauses. I wrote it last week in response to several people asking me to write something. Hopefully this helps clear up what I believe.

I don’t think I’ve ever hidden my views about the dangers of AI or the advantages of scaling more slowly and carefully. I generally aim to give honest answers to questions and present my views straightforwardly. I often point out that catastrophic risk would be lower if we could coordinate to build AI systems later and slower; I usually caveat that doing so seems costly and politically challenging and so I expect it to require clearer evidence of risk.

[-] ryan_greenblatt · 6mo

I think this post is quite misleading and unnecessarily adversarial.

I'm not sure if I want to engage further; I might give examples of this later. (See examples below.)

(COI: I often talk to and am friendly with many of the groups criticized in this post.)

[-] ryan_greenblatt · 6mo

Examples:

  • It seems to conflate scaling pauses (which aren't clearly very useful) with pausing all AI-related progress (hardware, algorithmic development, software). Many people think that scaling pauses aren't clearly that useful due to overhang issues, but hardware pauses are pretty great. However, hardware development and production pauses would clearly be extremely difficult to implement. IMO the sufficient "pause AI" ask is more like "ask nvidia/tsmc/etc to mostly shut down" rather than "ask AGI labs to pause".
  • More generally, the exact type of pause which would actually be better than (e.g.) well implemented RSPs is a non-trivial technical problem which makes this complex to communicate. I think this is a major reason why people don't say stuff like "obviously, a full pause with XYZ characteristics would be better". For instance, if I was running the US, I'd probably slow down scaling considerably, but I'd mostly be interested in implementing safety standards similar to RSPs due to lack of strong international coordination.
  • The post says "many people believe" a "pause is necessary" claim[1], but the exact claim you state probably isn't actually believed by the people you cite b
... (read more)
  • The title doesn't seem supported by the content. The post doesn't argue that people are being cowardly or aren't being strategic (it does argue they are incorrect and seeking power in an immoral way, but this is different).
Shankar Sivarajan · 6mo
As an aside, this seems to be a general trend: I have seen people defend misleading headlines on news articles with suggestions that the title should be judged independent of the content. I disagree.
Eli Tyre · 6mo
Well, the author of an article often doesn't decide the title of the post. The editor does that. So it can be the case that an author wrote a reasonable and nuanced piece, and then the editor added an outrageous click-bait headline.
M. Y. Zuo · 6mo
Yes, but the author wasn't forced at gunpoint, presumably, to work with that particular editor. So then the question can be reframed as: why did the author choose to work with an editor that seems untrustworthy?
Michael Levine · 6mo
Journalists at most news outlets do not choose which editor(s) they work with on a given story, except insofar as they choose to not quit their job. This does not feel like a fair basis on which to hold the journalist responsible for the headline chosen by their editor(s).
M. Y. Zuo · 6mo
Why does it not feel like a fair basis? Maybe if they were deceived into thinking the editor was genuine and trustworthy, but otherwise, if they knew they were working with someone untrustworthy, and they still chose to associate their names together publicly, then obviously it impacts their credibility.
Michael Levine · 6mo
Insofar as a reporter works for an outlet that habitually writes misleading headlines, that does undermine the credibility of the reporter, but that's partly true because outlets that publish grossly misleading headlines tend to take other ethical shortcuts as well. But without that general trend or a broader assessment of an outlet's credibility, it's possible that an otherwise fair story would get a misleading headline through no fault of the reporter, and it would be incorrect to judge the reporter for that (as Eli says above).
DanielFilan · 6mo
Surely if you were running the US, that would be a great position to try to get international coordination on policies you think are best for everyone?
ryan_greenblatt · 6mo
Sure, but seems reasonably likely that it would be hard to get that much international coordination.
DanielFilan · 6mo
Maybe - but you definitely can't get it if you don't even try to communicate the thing you think would be better.
Joe_Collman · 6mo
[I agree with most of this, and think it's a very useful comment; just pointing out disagreements]

I assume this would be a crux with Connor/Gabe (and I think I'm at least much less confident in this than you appear to be).

  • We're already in a world where stopping appears necessary.
  • It's entirely possible we all die before stopping was clearly necessary.
  • What gives you confidence that RSPs would actually trigger a pause?
  • If a lab is stopping for reasons that aren't based on objective conditions in an RSP, then what did the RSP achieve?
  • Absent objective tests that everyone has signed up for, a lab may well not stop, since there'll always be the argument "Well we think that the danger is somewhat high, but it doesn't help if only we pause".
  • It's far from clear that we'll get objective and sufficient conditions for safety (or even for low risk). I don't expect us to - though it'd obviously be nice to be wrong.
  • [EDIT: or rather, ones that allow scaling to continue safely - we already know sufficient conditions for safety: stopping]

I think the objection here is more about what is loosely suggested by the language used, and what is not said - not about logical implications. What is loosely suggested by the ARC Evals language is that it's not sensible to aim for the more "extreme" end of things (pausing), and that this isn't worthy of argument. Perhaps ARC Evals have a great argument, but they don't make one. I think it's fair to say that they argue the middle ground is practical. I don't think it can be claimed they argue it is pragmatic until they address both the viability of other options, and the risks of various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic. It's not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn't think it was the right place for such an argument, then it'd be easy to say that: that this is a c

Thanks for the response; one quick clarification in case this isn't clear.

On:

For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).

I assume this would be a crux with Connor/Gabe (and I think I'm at least much less confident in this than you appear to be).

It's worth noting here that I'm responding to this passage from the text:

In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.

Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.

I'm responding to the "many people believe this" which I think implies that the groups they are critiquing believe this. I want to contest what these people believe, not what is actually true.

Like, many of these people think policy interventions other than a pause reduce X-risk below 10%.

Maybe I think something like (numbers not well considered):

  • P(doom) = 35%
  • P(doom | scaling pa
... (read more)
Joe_Collman · 6mo
Ah okay - thanks. That's clarifying. Agreed that the post is at the very least not clear. In particular, it's obviously not true that [if we don't stop today, there's more than a 10% chance we all die], and I don't think [if we never stop, under any circumstances...] is a case many people would be considering at all. It'd make sense to be much clearer on the 'this' that "many people believe". (and I hope you're correct on P(doom)!)
ryan_greenblatt · 6mo
Yeah, I probably want to walk back my claim a bit. Maybe I want to say "doesn't strongly imply"? It would have been better if ARC evals noted that the conclusion isn't entirely obvious. It doesn't seem like a huge error to me, but maybe I'm underestimating the ripple effects etc.

As an aside, I think it's good for people and organizations (especially AI labs) to clearly state their views on AI risk; see, e.g., my comment here. So I agree with this aspect of the post.

Stating clear views on what ideal government/international policy would look like also seems good.

(And I agree with a bunch of other misc specific points in the post like "we can maybe push the Overton window far" and "avoiding saying true things to retain respectability in order to get more power is sketchy".)

(Edit: from a communication best practices perspective, I wish I had noted where I agree in the parent comment rather than here.)

(Conflict of interest note: I work at ARC, Paul Christiano's org. Paul did not ask me to write this comment. I first heard about the truck (below) from him, though I later ran into it independently online.)

There is an anonymous group of people called Control AI, whose goal is to convince people to be against responsible scaling policies because they insufficiently constrain AI labs' actions. See their Twitter account and website (also anonymous; Edit: now identifies Andrea Miotti of Conjecture as the director). (I first ran into Control AI via this tweet, which uses color-distorting visual effects to portray Anthropic CEO Dario Amodei in an unflattering light, in a way that's reminiscent of political attack ads.)

Control AI has rented a truck that had been circling London's Parliament Square. The truck plays a video of "Dr. Paul Christiano (Made ChatGPT Possible; Government AI adviser)" saying that there's a 10-20% chance of an AI takeover and an overall 50% chance of doom, and of Sam Altman saying that the "bad case" of AGI is "lights out for all of us". The back of the truck says "Responsible Scaling: No checks, No limits, No control". The video of Paul seems to me to be an attack... (read more)


The About Us page from the Control AI website has now been updated to say "Andrea Miotti (also working at Conjecture) is director of the campaign." This wasn't the case on the 18th of October.

Thumbs up for making the connection between the organizations more transparent/clear.

The video of Paul seems to me to be an attack on Paul (but see Twitter discussion here).

This doesn't seem right. As the people in the Twitter discussion you link say, it seems to mostly use Paul as a legitimate source of an x-risk probability, with maybe also a bit of critique of him having nevertheless helped build ChatGPT, but neither seems like an attack in a strictly negative sense. It feels like a relatively normal news snippet or something.

I feel confused about the truck. The video seems fine to me and seems kind of decent advocacy. The quotes used seem like accurate representations of what the people presented believe. The part that seems sad is that it might cause people to think the ones pictured also agree with other things that the responsible scaling website says, which seems misleading.

I don't particularly see a reason to dox the people behind the truck, though I am not totally sure. My bar against doxxing is pretty high, though I do care about people being held accountable for large scale actions they take.

Eric Neyman · 6mo
To elaborate on my feelings about the truck:

  • If it is meant as an attack on Paul, then it feels pretty bad/norm-violating to me. I don't know what general principle I endorse that makes it not okay: maybe something like "don't attack people in a really public and flashy way unless they're super high-profile or hold an important public office"? If you'd like I can poke at the feeling more. Seems like some people in the Twitter thread (Alex Lawsen, Neel Nanda) share the feeling.
  • If I'm wrong and it's not an attack, I still think they should have gotten Paul's consent, and I think the fact that it might be interpreted as an attack (by people seeing the truck) is also relevant.

(Obviously, I think the events "this is at least partially an attack on Paul" and "at least one of the authors of this post are connected to Control AI" are positively correlated, since this post is an attack on Paul. My probabilities are roughly 85% and 97%*, respectively.)

*For a broad-ish definition of "connected to"

That's fair. I think that it would be better for the world if Control AI were not anonymous, and I judge the group negatively for being anonymous. On the other hand, I don't think I endorse them being doxxed. So perhaps my request to Connor and Gabriel is: please share what connection you have to Control AI, if any, and share what more information you have permission to share.

Connor/Gabriel -- if you are connected with Control AI, I think it's important to make this clear, for a few reasons. First, if you're trying to drive policy change, people should know who you are, at minimum so they can engage with you. Second, I think this is particularly true if the policy campaign involves attacks on people who disagree with you. And third, because I think it's useful context for understanding this post.

This seems like a general-purpose case against anonymous political speech that contains criticism ("attacks") of the opposition. But put like that, it seems like there are lots of reasons people might want to speak anonymously (e.g. to shield themselves from unfair blowback). And your given reasons don't seem super persuasive - you can engage with people who say they agree with the message (or do broad-ranged speech of your own), reason 2 isn't actually a reason, and the post was plenty understandable to me without the context.


(Note: I work with Paul at ARC theory. These views are my own and Paul did not ask me to write this comment.)

I think the following norm of civil discourse is super important: do not accuse someone of acting in bad faith, unless you have really strong evidence. An accusation of bad faith makes it basically impossible to proceed with discussion and seek truth together, because if you're treating someone's words as a calculated move in furtherance of their personal agenda, then you can't take those words at face value.

I believe that this post violates this norm pretty egregiously. It begins by saying that hiding your beliefs "is lying". I'm pretty confident that the sort of belief-hiding being discussed in the post is not something most people would label "lying" (see Ryan's comment), and it definitely isn't a central example of lying. (And so in effect it labels a particular behavior "lying" in an attempt to associate it with behaviors generally considered worse.)

The post then confidently asserts that Paul Christiano hides his beliefs in order to promote RSPs. This post presents very little evidence that this is what's going on, and Paul's account seems consistent ... (read more)

Jiro · 6mo
Hiding your beliefs in ways that predictably lead people to believe false things is lying.
[-] 307th · 6mo

I believe you're wrong on your model of AI risk and you have abandoned the niceness/civilization norms that act to protect you from the downside of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than introspect about their arguments deeply enough to get your way out of the hole you're in.

First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully  make clear how wrong your world model is: AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I'd like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I'd hope that on some level you're aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading A... (read more)


Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you're so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it's still true - and what's more, obviously so. I don't know how you and others egged each other into the position that it doesn't matter whether the people working on AI care about AI risk, but it's insane.

I agreed with most of your comment until this line. Is your argument that, there's a lot of nuance to getting safety right, we're plausibly in a world where alignment is hard but possible, and the makers of AGI deeply caring about alignment, being cautious and not racing, etc, could push us over the line of getting alignment to work? I think this argument seems pretty reasonable, but that you're overstating the case here and that this strategy could easily be net bad if you advance capabilities a lot. And "alignment is basically impossible unless something dramatically changes" also seems like a defensible position to me

307th · 6mo
I don't expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version:

There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don't expect people to agree with those specific views. The reason I think it's obvious is that even when I make massive concessions to the anti-capabilities people, these organizations... still seem +EV? Let's make a bunch of concessions:

  1. Alignment will be solved by theoretical work unrelated to capabilities. It can be done just as well at an alignment-only organization with limited funding as it can at a major AGI org with far more funding.
  2. If alignment is solved, that automatically means future ASI will be built using this alignment technique, regardless of whether leading AI orgs actually care about alignment at all. You just publish a paper saying "alignment solution, pls use this, Meta" and Meta will definitely do it.
  3. Alignment will take a significant amount of time - probably decades.
  4. ASI is now imminent; these orgs have reduced timelines to ASI by 1-5 years.
  5. Our best chance of survival is a total stop, which none of the CEOs of these orgs support.

Even given all five of these premises... Demis Hassabis, Dario Amodei, and Sam Altman have all increased the chance of a total stop, by a lot. By more than almost anyone else on the planet, in fact. Yes, even though they don't think it's a good idea right now and have said as much (I think? haven't followed all of their statements on AI pause).

That is, the chance of a total stop is clearly higher in this world than in the counterfactual one where any of Demis/Dario/Sam didn't go into AI capabilities, because a CEO of a leading AI organization saying "yeah I think AI could maybe kill us all" is something that by default would not happen. As I said before, most
rotatingpaguro · 6mo
I have the impression that the big guys started taking AI risk seriously when they saw capabilities that impressed them. So I expect that if Musk, Altman & the rest of the Dreamgrove had not embarked on pushing the frontier faster than it was moving otherwise, at the same capability level AI researchers would have taken it just as seriously. Famous AI scientists already knew about the AI risk arguments; where OpenAI made a difference was not in telling them about AI risk, but in shoving GPT up their nose. I think the public would then have been able to side with Distinguished Serious People raising warnings about the dangers of ultra-intelligent machines even if Big Corp claimed otherwise.

First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully  make clear how wrong your world model is: AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I'd like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I'd hope that on some level you're aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading AI orgs are Google Brain, Meta, and NVIDIA, and their leaders were saying "AI risk? Sounds like sci-fi nonsense!" 

The fact that people disagree with your preferred method of reducing AI risk does not mean that they are EVIL LIARS who are MAKING YOUR JOB HARDER and DOOMING US ALL.

I disagree this is obviously wrong. I think you are not considering the correct counterfactual. From Connor L.'s point of... (read more)

[-] 307th · 6mo

Yeah, fair enough.

But I don't think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn't exist - something he has publicly wished for - that is in fact the most likely situation we would find. We certainly wouldn't find CEOs who advocate for a total stop on AI.

307th · 6mo
(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)
[-] Eli Tyre · 6mo

Man, I agree with almost all the content of this post, but dispute the framing. This seems like maybe an opportunity to write up some related thoughts about transparency in the x-risk ecosystem.

 

A few months ago, I had the opportunity to talk with a number of EA-aligned or x-risk-concerned folks working in policy or policy-adjacent roles as part of a grant evaluation process. My views here are informed by those conversations, but I am overall quite far from the action of AI policy stuff. I try to carefully flag my epistemic state regarding the claims below.

Omission

I think a lot of people, especially in AI governance, are...  

  1. Saying things that they think are true
  2. while leaving out other important things that they think are true, but are also so extreme or weird-sounding that they would lose credibility.

A central example is promoting regulations on frontier AI systems because powerful AI systems could develop bio-weapons that could be misused to wipe out large swaths of humanity. 

I think that most of the people promoting that policy agenda with that argumentation do in fact think that AI-developed bioweapons are a real risk of the next 15 years. And, I guess, many to most... (read more)

[-] habryka · 6mo

If a person was asked point-blank about the risk of AI takeover, and they gave an answer that implied the risk was lower than they think it is, in private, I would consider that a lie

[...]

That said, my guess is that many of the people that I'm thinking of, in these policy positions, if they were asked, point blank, might lie in exactly that way. I have no specific evidence of that, but it does seem like the most likely way many of them would respond, given their overall policy about communicating their beliefs. 

As a relevant piece of evidence here, Jason Matheny, when asked point-blank in a Senate committee hearing about "how concerned should we be about catastrophic risks from AI?" responded with "I don't know", which seems like it qualifies as a lie by the standard you set here (which, to be clear, I don't super agree with and my intention here is partially to poke holes in your definition of a lie, while also sharing object-level relevant information).

See this video 1:39:00 to 1:43:00: https://www.hsgac.senate.gov/hearings/artificial-intelligence-risks-and-opportunities/ 

Quote (slightly paraphrased because transcription is hard): 

Senator Peters: "The last question be

... (read more)

If his beliefs are what I would have expected them to be (e.g. something like "agrees with the basic arguments laid out in Superintelligence, and was motivated to follow his current career trajectory by those arguments"), then this answer is at best misleading and a misrepresentation of his actual models.

Seeing this particular example, I'm on the fence about whether to call it a "lie". He was asked about the state of the world, not about his personal estimates, and he answered in a way that was more about the state of knowable public knowledge rather than his personal estimate. But I agree that seems pretty hair-splitting. 

As it is, I notice that I'm confused. 

Why wouldn't he say something to the effect of the following?

I don't know; this kind of forecasting is very difficult, timelines forecasting is very difficult. I can't speak with confidence one way or the other. However, my best guess from following the literature on this topic for many years is that the catastrophic concerns are credible. I don't know how probable it is, but it does not seem to me that it is merely an outlandish sci-fi scenario that AI will lead to human extinction, and it is not out of the question that

... (read more)

I think your interpretation is fairly uncharitable. If you have further examples of this deceptive pattern from those sympathetic to AI risk I would change my perspective, but the speculation in the post plus this example weren't compelling:

I watched the video and firstly Senator Peters seems to trail off after the quoted part and ends his question by saying "What's your assessment of how fast this is going and when do you think we may be faced with those more challenging issues?". So straightforwardly his question is about timelines, not about risk as you frame it. Indeed Matheny (after two minutes) literally responds "it's a really difficult question. I think whether AGI is nearer or farther than thought ..." (emphasis different to yours), which makes it likely to me that Matheny is expressing uncertainty about timelines, not risk.

Overall I agree that this was an opportunity for Matheny to discuss AI x-risk and plausibly it wasn't the best use of time to discuss the uncertainty of the situation. But saying this is dishonesty doesn't seem well supported.

No, the question was about whether there are apocalyptic risks and on what timeline we should be concerned about apocalyptic risks.

The questioner used the term 'apocalyptic' specifically. Three people answered the question, and the first two both also alluded to 'apocalyptic' risks and sort of said that they didn't really think we need to think about that possibility. Them referring to apocalyptic risks goes to show that it was a key part of what the questioner wanted to understand — to what extent these risks are real and on what timeline we'll need to react to them. My read is not that Matheny actively misled the speaker, but that he avoided answering, which is "hiding" rather than "lying" (I don't agree with the OP that they're identical). 

I think the question was unclear so it was more acceptable to not directly address whether there is apocalyptic risk, but I think many people I know would have definitely said "Oh to be clear I totally disagree with the previous two people, there are definitely apocalyptic risks and we are not prepared for them and cannot deal with them after-the-fact (as you just mentioned being concerned about)."

———

More detail on what happened and... (read more)

Eli Tyre · 6mo
Why, though?  Does he know something we don't? Does he think that if he expresses that those risks are real he'll lose political capital? People won't put him or his friends in positions of power, because he'll be branded as a kook? Is he just in the habit of side-stepping the weird possibilities? This looks to me, from the outside, like an unforced error. They were asking the question, about some core beliefs, pretty directly. It seems like it would help if, in every such instance, the EA people who think that the world might be destroyed by AGI in the next 20 years, say that they think that the world might be destroyed by AGI in the next 20 years.  
habryka · 6mo
As Ben said, this seems incongruent with the responses that the other two people gave, neither of which talked that much about timelines, but did seem to directly respond to the concern about catastrophic/apocalyptic risk from AGI.  I do agree that it's plausible that Matheny somehow understood the question differently from the other two people, and interpreted it in a more timelines focused way, though he also heard the other two people talk, which makes that somewhat less likely. I do agree that the question wasn't asked in the most cogent way.
Arthur Conmy · 6mo
Thanks for checking this! I mostly agree with all of your original comment now (except the first part suggesting it was point blank, but we're quibbling over definitions at this point); this does seem like a case of intentionally not discussing risk.
simeon_c · 6mo
A few other examples off the top of my head:

  • ARC graph on RSPs with the "safe zone" part
  • Anthropic calling ASL-4 accidental risks "speculative"
  • the recent TIME article saying there's no trade-off between progress and safety

More generally, having talked to many AI policy/safety members, I can say it's a very common pattern. On the eve of the FLI open letter, one of the most senior persons in the AI governance & policy x-risk community was explaining that it was stupid to write this letter and that it would make future policy efforts much more difficult, etc.
[-] evhub · 6mo

I agree that it is important to be clear about the potential for catastrophic AI risk, and I am somewhat disappointed in the answer above (though I think calling "I don't know" lying is a bit of a stretch). But on the whole, I think people have been pretty upfront about catastrophic risk, e.g. Dario has given an explicit P(doom) publicly, all the lab heads have signed the CAIS letter, etc.

Notably, though, that's not what the original post is primarily asking for: it's asking for people to clearly state that they agree that we should pause/stop AI development, not to clearly state that they think AI poses a catastrophic risk. I agree that people should clearly state that they think there's a catastrophic risk, but I disagree that people should clearly state that they think we should pause.

Primarily, that's because I don't actually think trying to get governments to enact some sort of a generic pause would make good policy. Analogizing to climate change, I think getting scientists to say publicly that they think climate change is a real risk helped the cause, but putting pressure on scientists to publicly say that environmentalism/degrowth/etc. would solve the problem has substantially hurt the cause (despite the fact that a magic button that halved consumption would probably solve climate change).

[-] Joe_Collman · 6mo

I agree with most of this, but I think the "Let me call this for what it is: lying for personal gain" section is silly and doesn't help your case.

The only sense in which it's clear that it's "for personal gain" is that it's lying to get what you want.
Sure, I'm with you that far - but if what someone wants is [a wonderful future for everyone], then that's hardly what most people would describe as "for personal gain".
By this logic, any instrumental action taken towards an altruistic goal would be "for personal gain".

That's just silly.
It's unhelpful too, since it gives people a somewhat legitimate reason to dismiss the broader point.

Of course it's possible that the longer-term altruistic goal is just a rationalization, and people are after power for its own sake, but I don't buy that this is often true - at least not in any clean [they're doing this and only this] sense. (one could have similar altruistic-goal-is-rationalization suspicions about your actions too)

In many cases, I think overconfidence is sufficient explanation.
And if we get into "Ah, but isn't it interesting that this overconfidence leads to power gain", then I'd agree - but then I claim that you should distinguish [con... (read more)

[-] Vaniver · 6mo

The only sense in which it's clear that it's "for personal gain" is that it's lying to get what you want.
Sure, I'm with you that far - but if what someone wants is [a wonderful future for everyone], then that's hardly what most people would describe as "for personal gain".

If Alice lies in order to get influence, with the hope of later using that influence for altruistic ends, it seems fair to call the influence Alice gets 'personal gain'. After all, it's her sense of altruism that will be promoted, not a generic one.

[-] Joe_Collman · 6mo

This is not what most people mean by "for personal gain". (I'm not disputing that Alice gets personal gain)

Insofar as the influence is required for altruistic ends, aiming for it doesn't imply aiming for personal gain.
Insofar as the influence is not required for altruistic ends, we have no basis to believe Alice was aiming for it.

"You're just doing that for personal gain!" is not generally taken to mean that you may be genuinely doing your best to create a better world for everyone, as you see it, in a way that many would broadly endorse.

In this context, an appropriate standard is the post's own:
Does this "predictably lead people to believe false things"?
Yes, it does. (if they believe it)

"Lying for personal gain" is a predictably misleading description, unless much stronger claims are being made about motivation (and I don't think there's sufficient evidence to back those up).

The "lying" part I can mostly go along with. (though based on a contextual 'duty' to speak out when it's unusually important; and I think I'd still want to label the two situations differently: [not speaking out] and [explicitly lying] may both be undesirable, but they're not the same thing)
(I don't really think in terms of duties, but it's a reasonable shorthand here)

Gabriel Alfour · 6mo
I think you are making a genuine mistake, and that I could have been clearer. There are instrumental actions that favour everyone (raising epistemic standards), and instrumental actions that favour you (making money). The latter are for personal gain, regardless of your end goals.

Sorry for not getting deeper into it in this comment. This is quite a vast topic. I might instead write a longer post about the interactions of deontology & consequentialism, and egoism & altruism.
Joe_Collman · 6mo
(With "this logic" I meant to refer to ["for personal gain" = "getting what you want"]. But this isn't important) If we're sticking to instrumental actions that do favour you (among other things), then the post is still incorrect: [y is one consequence of x] does not imply [x is for y] The "for" says something about motivation. Is an action that happens to be to my benefit necessarily motivated by that? No. (though more often than I'd wish to admit, of course) If you want to claim that it's bad to [Lie in such a way that you get something that benefits you], then make that claim (even though it'd be rather silly - just "lying is bad" is simpler and achieves the same thing). If you're claiming that people doing this are necessarily lying in order to benefit themselves, then you are wrong. (or at least the only way you'd be right is by saying that essentially all actions are motivated by personal gain) If you're claiming that people doing this are in fact lying in order to benefit themselves, then you should either provide some evidence, or lower your confidence in the claim.
Joe_Collman · 6mo
If it's clearer with an example, suppose that the first action on the [most probable to save the world] path happens to get me a million dollars. Suppose that I take this action. Should we then say that I did it "for personal gain"? That I can only have done it "for personal gain"? This seems clearly foolish. That I happen to have gained from an instrumentally-useful-for-the-world action, does not imply that this motivated me. The same applies if I only think this path is the best for the world.
simeon_c · 6mo
I think it still makes sense to have a heuristic of the form "I should have a particularly high bar of confidence if I do something deontologically bad that happens to be good for me personally"
Joe_Collman · 6mo
Agreed - though I wouldn't want to trust that heuristic alone in this area, since in practice the condition won't be [if I do something deontologically bad] but rather something like [if I notice that I'm doing something that I'm inclined to classify as deontologically bad].
[-] evhub · 6mo

I'm happy to state on the record that, if I had a magic button that I could press that would stop all AGI progress for 50 years, I would absolutely press that button. I don't agree with the idea that it's super important to trot everyone out and get them to say that publicly, but I'm happy to say it for myself.

I would like to observe to onlookers that you did in fact say something similar in your post on RSPs. Your very first sentence was: 

Recently, there’s been a lot of discussion and advocacy around AI pauses—which, to be clear, I think is great: pause advocacy pushes in the right direction and works to build a good base of public support for x-risk-relevant regulation.

Nathaniel Monson · 6mo
If I had clear lines in my mind between AGI capabilities progress, AGI alignment progress, and narrow AI progress, I would be 100% with you on stopping AGI capabilities. As it is, though, I don't know how to count things. Is "understanding why neural net training behaves as it does" good or bad? (SLT's goal). Is "determining the necessary structures of intelligence for a given architecture" good or bad? (Some strands of mech interp). Is an LLM narrow or general? How do you tell, or at least approximate? (These are genuine questions, not rhetorical)
[-] Richard_Ngo · 6mo

How do you feel about "In an ideal world, we'd stop all AI progress"? Or "ideally, we'd stop all AI progress"?

Ricardo Meneghin · 6mo
My interpretation of calling something "ideal" is that it presents that thing as unachievable from the start, and it wouldn't be your fault if you failed to achieve that, whereas "in a sane world" clearly describes our current behavior as bad and possible to change.

We should shut it all down.

We can't shut it all down.

The consequences of trying to shut it all down and failing, as we very likely would, could actually raise the odds of human extinction.

Therefore we don't know what to publicly advocate for.

These are the beliefs I hear expressed by most serious AI safety people. They are consistent and honest.

For instance, see https://forum.effectivealtruism.org/posts/JYEAL8g7ArqGoTaX6/ai-pause-will-likely-backfire.

That post makes two good points:

A pause would:

  2) Increasing the chance of a "fast takeoff" in which one or a handful of AIs rapidly and discontinuously become more capable, concentrating immense power in their hands.
  3) Pushing capabilities research underground, and to countries with looser regulations and safety requirements.

Obviously these don't apply to a permanent, complete shutdown. And they're not entirely convincing even for a pause.

My point is that the issue is complicated.

A complete shutdown seems impossible to maintain for all of humanity. Someone is going to build AGI. The question is who and how.

The call for more honesty is appreciated. We should be honest, and include "obviously we should just not do it". But you don't get many words when speaking publicly, so making those your primary point is a questionable strategy.

William the Kiwi · 6mo
Why do you personally think this is correct? Is it that humanity doesn't know how to shut it down? Or is incapable? Or unwilling?
Seth Herd · 6mo
This is a good question. It's worth examining the assumption if it's the basis of our whole plan.

When I say "we", I mean "the currently listening audience", roughly the AI safety community. We don't have the power to convince humanity to shut down AI research. There are a few reasons I think this. The primary one is that humanity isn't a single individual. People have different perspectives. Some will not be likely to change their minds. There are even some individuals for whom building an AGI would actually be a good idea. Those are people who care more about personal gain than they do about the safety or future of humanity. Sociopaths of one sort or another are thought to make up perhaps 10% of the population (the 1% diagnosed are the ones who get caught). For a sociopath, it's a good bet to risk the future of humanity against a chance of becoming the most powerful person alive. There are thought to be a lot of sociopaths in government, even in democratic countries.

So, sooner or later, you're going to see a government or rich individual working on AGI with the vastly improved compute and algorithmic resources that continued advances in hardware and software will bring. The only way to enforce a permanent ban would be to ban computers, or have a global panopticon that monitors what every computer is doing. That might well lead to a repressive global regime that stays in power permanently. That is an S-risk; a scenario in which humanity suffers forever. That's arguably worse than dying in an attempt to achieve AGI.

Those are weak and loose arguments, but I think that describes the core of my and probably many others' thinking on the topic.
rotatingpaguro · 6mo
I am under the impression that, when counting words in public for strategic political reasons, it's better to be a crazy mogul who shouts extreme takes with confidence, to make your positions clear, even if people already know they can't take your word to the letter. But I'm not sure I know who the strategic target is here.

hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

I think people generally lie WAY more than we realize and most lies are lies of omission. I don't think deception is usually the immediate motivation; rather, it's a kind of social convenience. Maintaining social equilibrium is valued over openness or honesty regarding relevant beliefs that may come up in everyday life.

William the Kiwi · 6mo
I would agree that people lie way more than they realise. Many of these lies are self-deception.

ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”

Could you spell out what you mean by "in a sane world"? I suspect a bunch of people you disagree with do not favor a pause due to various empirical facts about the world (e.g., there being competitors like Meta).

hiding your beliefs, in ways that predictably lead people to believe false things, is lying

I think this has got to be tempered by Grice to be accurate. Like, if I don't bring up some unusual fact about my life in a brief conversation (e.g. that I consume iron supplements once a week), this predictably leads people to believe something false about my life (that I do not consume iron supplements once a week), but is not reasonably understood as the bad type of lie - otherwise to be an honest person I'd have to tell everyone tons of minutiae about myself ... (read more)

[-] Dagon · 6mo

Upvoted, and thanks for writing this.  I disagree on multiple dimensions - on the object level, I don't think ANY research topic can be stopped for very long, and I don't think AI specifically gets much safer with any achievable finite pause, compared to a slowdown and standard of care for roughly the same duration.  On the strategy level, I wonder what other topics you'd use as support for your thesis (if you feel extreme measures are correct, advocate for them).  US Gun Control?  Drug legalization or enforcement?  Private capital... (read more)

Counterpoint: we are better off using what political/social capital we have to advocate for more public funding in AI alignment. I think of slowing down AI capabilities research as just a means of buying time to get more AI alignment funding - but essentially useless unless combined with a strong effort to get money into AI alignment.

[-] Max H · 6mo

Hmm, I'm in favor of an immediate stop (and of people being more honest about their beliefs) but in my experience the lying / hiding frame doesn't actually describe many people.

This is maybe even harsher than what you said in some ways, but to me it feels more like even very bright alignment researchers are often confused and getting caught in shell games with alignment, postulating that we'll be able to build "human level" AI, which somehow just doesn't do a bunch of bad things that smart humans are clearly capable of. And if even the most technical peopl... (read more)

People who think that it's deontologically fine to remain silent might not come out and say it.

Consider what happens when a community rewards the people who gain more influence by lying!

This is widely considered a better form of government than hereditary aristocracies.

let us be clear: hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

Not only is it morally wrong, it makes for a terrible strategy. As it stands, the AI Safety Community itself cannot coordinate to state that we should stop AGI progress right now!

Some dynamics and gears in world models are protected secrets, when they should be open-sourced and researched by more people, and other gears are open-sourced and researched by too many peopl... (read more)

I don't see the practical value of a post that starts off with conjecture rather than reality; i.e., "In a saner world...." 

You clearly wish that things were different, that investors and corporate executives would simply stop all progress until ironclad safety mechanisms were in place, but wishing doesn't make it so. 

Isn't the more pressing problem what can be done in the world that we have, rather than in a world that we wish we had? 

I agree with others to a large degree about the framing/tone/specific-words not being great, though I agree with a lot of the post itself, but really that's what this whole post is about: that dressing up your words and saying partial in-the-middle positions can harm the environment of discussion. That saying what you truly believe then lets you argue down from that, rather than doing the arguing down against yourself - and implicitly against all the other people who hold a similar ideal belief as you. I've noticed similar facets of what the post gestures at,... (read more)

Too many claimed to pursue the following approach:

  1. It would be great if AGI progress stopped, but that is infeasible.
  2. Therefore, I will advocate for what I think is feasible, even if it is not ideal. 
  3. The Overton window being what it is, if I claim a belief that is too extreme, or endorse an infeasible policy proposal, people will take me less seriously on the feasible stuff.
  4. Given this, I will be tactical in what I say, even though I will avoid stating outright lies.

I think if applied strictly to people identified by this list, the post is reasonable. I ... (read more)

I think politics often involves bidding for the compromise you think is feasible, rather than what you'd ideally want.

 

What's maybe different in the AI risk case, and others like it, is how you'll be regarded when things go wrong.

 

A hypothetical scenario:

  1. An AI does something destructive, on the order of 9/11
  2. Government over-reacts as usual, and cracks down on the AI companies like the US did on Osama bin Laden, or Israel did on Hamas.
  3. You are like, "Yeah, we knew that was going to happen."
  4. Government to you: "What the fuck? Why didn't you tell us?"