A couple of quick, loosely-related thoughts:
- I think that "doomers" were far too pessimistic about governance before ChatGPT (in ways that I and others predicted beforehand, e.g. in discussions with Ben and Eliezer). I think they should update harder from this mistake than they're currently doing (e.g. updating that they're too biased towards inside-view models and/or fast takeoff and/or high P(doom)).
I think it remains to be seen what the right level of pessimism was. It still seems pretty likely that we'll see not just useless, but actively catastrophically counter-productive interventions from governments in the next handful of years.
But you're absolutely right that I was generally pessimistic about policy interventions from 2018ish through to 2021 or so. 
My main objection was that I wasn't aware of any policies that seemed like they helped, and I was unenthusiastic about the way that EAs seemed to be optimistic about getting into positions of power without (it seemed to me) being very clear with themselves that they didn't have policy ideas to implement.
I felt better about people going into policy to the extent that those people had clarity for themselves, "I don't know what to recommend if I have power. I'm trying to execute one part of a two part plan that involves getting power and then using that to advocate for x-risk mitigating policies. I'm intentionally punting that question to my future self / hoping that other EAs thinking full time about this come up with good ideas." I think I still basically stand by this take. [1]
My main update is it turns out that the basic idea of this post was false. There were developments that were more alarming than "this is business as usual" to a good number of people and that really changed the landscape. 
One procedural update that I've made from that and similar mistakes is just "I shouldn't put as much trust in Eliezer's rhetoric about how the world works, when it isn't backed up by clearly articulated models. I should treat those ideas as plausible hypotheses, and mostly be much more attentive to evidence that I can see directly."
 
Also, I think that this is one instance of the general EA failure mode of pursuing a plan which entails accruing more resources for EA (community building to bring in more people, marketing to bring in more money, politics to acquire power), without a clear personal inside view of what to do with those resources, effectively putting a ton of trust in the EA network to reach correct conclusions about which things help.
There are a bunch of people trusting the EA machine to 1) aim for good things and 2) have good epistemics. They trust it so much they'll go campaign for a guy running for political office without knowing much about him, except that he's an EA. Or they route their plan for positive impact on the world through positively impacting EA itself ("I want to do mental health coaching for EAs" or "I want to build tools for EAs" or going to do ops for this AI org, which 80k recommended (despite not knowing much about what they do).)
This is pretty scary, because it seems like some of those people were not worthy of trust (SBF, in particular, won a huge amount of veneration).
And even in the case of people who are, I believe, earnest geniuses, it is still pretty dangerous to mostly be deferring to them. Paul put a good deal of thought into the impacts of developing RLHF, and he thinks the overall impacts are positive. But that Paul is smart and good does not make it a foregone conclusion that his work is good on net. That's a really hard question to answer, about which I think most people should be pretty circumspect.
It seems to me that there is an army of earnest young people who want to do the most good that they can. They've been told (and believe) that AI risk is the most important problem, but it's a confusing problem depending on technical expertise, famously fraught problems of forecasting the character of not-yet-existent technologies, and a bunch of weird philosophy. The vast majority of those young people don't know how to make progress on the core problems of AI risk directly, or even necessarily identify which work is making progress. But they still want to help, so they commit themselves to eg community building, getting more people to join, everyone taking social cues from the few people that seem to have personal traction on the problem about what kinds of object level things are good to do.
This seems concerning to me. This kind of structure where a bunch of smart young people are building a pile of resources to be controlled mostly by deference to a status hierarchy, where you figure out which thinkers are cool by picking up on the social cues of who is regarded as cool, rather than evaluating their work for yourself...well, it's not so much that I expect it to be coopted, but I just don't expect that overall agglomerated machine to be particularly steered towards the good, whatever values it professes. 
It doesn't have a structure that binds it particularly tightly to what's true. Better than most non-profit communities, worse than many for-profit companies, probably.
It seems more concerning to the extent that many of the object level actions to which the EAs are funneling resources are not just useless, but actively bad. It turns out that being smart enough, as a community, to identify the most important problem in the world, but not smart enough to systematically know how to positively impact that problem is pretty dangerous.
eg the core impacts of people trying to impact x-risk so far include:
- (Maybe? Partially?) causing Deepmind to exist
- (Definitely) causing OpenAI to exist
- (Definitely) causing Anthropic to exist
- Inventing RLHF and accelerating the development of RLHF'd language models
It's pretty unclear to me what the sign of these interventions is. They seem bad on the face of it, but as I've watched things develop I'm not as sure. It depends on pretty complicated questions about second and third order effects, and counterfactuals.
But it seems bad to have an army of earnest young people who, in the name of their do-gooding ideology, shovel resources at the decentralized machine doing these maybe good maybe bad activities, because they're picking up on social cues of who to defer to and what those people think! That doesn't seem very high EV for the world!
(To be clear, I was one of the army of earnest young people. I spent a number of years helping recruit for a secret research program—I didn't even have the most basic information, much less the expertise to assess if it was any good—because I was taking my cues from Anna, who was taking her cues from Eliezer. 
I did that out of a combination of 1) having read Eliezer's philosophy, and having enough philosophical grounding to be really impressed by it, and 2) being ready and willing to buy into a heroic narrative to save the world, which these people were (earnestly) offering me.)
And, procedurally, all this is made somewhat more perverse by the fact that this community, this movement, was branded as the "carefully think through our do gooding" movement. We raised the flag of "let's do careful research and cost benefit analysis to guide our charity", but over time this collapsed into a deferral network, with ideas about what's good to do driven mostly by the status hierarchy. Cruel irony.
 
Well said. I agree with all of these except the last one and the gradual model release one (I think the update should be that letting the public interact with models is great, but whether to do it gradually or in a 'lumpy' way is unclear. E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)
I especially want to reemphasize your point 2.
E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)
That would have pushed back public wakeup equally though, because it was ChatGPT3.5 that caused the wakeup.
Did anyone at OpenAI explicitly say that a factor in their release cadence was getting the public to wake up about the pace of AI research and start demanding regulation? Because this seems more like a post hoc rationalization for the release policy than like an actual intended outcome.
See Sam Altman here:
As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.
A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.
And Sam has been pretty vocal in pushing for regulation in general.
"One of the biggest conspiracies of the last decade" doesn't seem right. The amount of money/influence involved in FTX is dwarfed by the amount of money/influence thrown around by governments in general, and it's easier for factions within governments to enforce secrecy than for corporations to do so. More concretely, I'd say that there were probably several different "conspiratorial" things related to covid in various countries that had much bigger effects; probably several more related to ongoing Russia-Ukraine and Israel-Palestine conflicts; probably several more Trump/Biden-related things; maybe some to do with culture-war stuff; probably a few more prosaic fraud or corruption things that stole tens of billions of dollars, just less publicly (e.g. from big government contracts); a bunch of criminal gangs which also have far more money than FTX did; and almost certainly a bunch that don't fall into any of those categories. (For example, if the CIA is currently doing any stuff comparable to its historical record of messing around with South American countries, that's plausibly far bigger than FTX. Or various NSA surveillance type things are likely a much bigger deal, in terms of impact, than FTX. Oh, and stuff like NotPetya should probably count too.)
There are few programs even within the U.S. government that are larger than $10B and lack very extensive reporting requirements, and those requirements make it quite hard for such programs to be conspiratorial in the relevant ways (they might be ineffective, or the result of various bad equilibria, but I don't think you regularly get conspiracies at this scale).
To calibrate people here, the total budget of the NSA appears to be just around $10B/yr, making it so that even if you classify the whole thing as a conspiracy, at least in terms of expenditure it's still roughly the size of the FTX fraud (though more like 10x larger if you count it over the whole last decade).
To be clear, there is all kinds of stuff going on in the world that is bad, but in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones (though I totally agree there are probably other ones, though by nature it's hard for me to say how many).
(In any case, I changed the word "conspiracy" here to "fraud" which I think gets the same point across, and my guess is we all agree that FTX is among the biggest frauds of the last decade)
There are over 100 companies globally with a market cap of more than 100 billion. If we're indexing on the $10 billion figure, these companies could have a bigger financial impact by doing "conspiracy-type" things that swung their value by <10%. How many of them have actually done that? No idea, but "dozens" doesn't seem implausible (especially when we note that many of them are based in authoritarian countries).
Re NSA: measuring the impact of the NSA in terms of inputs is misleading. The problem is that they're doing very highly-leveraged things like inserting backdoors into software, etc. That's true of politics more generally. It's very easy for politicians to insert clauses into bills that have >$10 billion of impact. How often are the negotiations leading up to that "conspiratorial"? Again, very hard to know.
in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones
This genuinely seems bizarre to me. A quick quote I found from googling:
The United Nations estimated in a 2011 report that worldwide proceeds from drug trafficking and other transnational organized crime were equivalent to 1.5 percent of global GDP, or $870 billion in 2009.
That's something like 100 FTXs per year; we mostly just don't see them. Basically I think that you're conflating legibility with impact. I agree FTX is one of the most legible ways in which people were defrauded this century; I also think it's a tiny blip on the scale of the world as a whole. (Of course, that doesn't make it okay by any means; it was clearly a big fuck-up, there's a lot we can and should learn from it, and a lot of people who were hurt.)
Does sure seem like there are definitional issues here. I do agree that drug trade and similar things bring the economic effects of conspiracy-type things up a lot, and I hadn't considered those, and agree that if you count things in that reference class FTX is a tiny blip.
I think given that, I basically agree with you that FTX isn't that close to one of the biggest conspiracies of the last decade. I do think it's at the top of frauds in the last decade, though that's a narrower category.
I do think it's at the top of frauds in the last decade, though that's a narrower category.
Nikola went from a peak market cap of $66B to ~$1B today, vs. FTX which went from ~$32B to [some unknown but non-negative number].
I also think the Forex scandal counts as bigger (as one reference point, banks paid >$10B in fines), although I'm not exactly sure how one should define the "size" of fraud.[1]
I wouldn't be surprised if there's some precise category in which FTX is the top, but my guess is that you have to define that category fairly precisely.
Wikipedia says "the monetary losses caused by manipulation of the forex market were estimated to represent $11.5 billion per year for Britain’s 20.7 million pension holders alone" which, if anywhere close to true, would make this way bigger than FTX, but I think the methodology behind that number is just guessing that market manipulation made foreign-exchange x% less efficient, and then multiplying through by x%, which isn't a terrible methodology but also isn't super rigorous.
I wasn't intending to say "the literal biggest", though I think it's a decent candidate for the literal biggest. Depending on your definitions I agree things like Nikola or Forex could come out on top. I think it's hard to define things in a way so that it isn't in the top 5.
I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one.
Agree. Most people will naturally buy AGI Safety if they really believe in AGI. No AGI->AGI is the hard part, not AGI->AGI Safety.
- I think that the AI safety community in general (including myself) was too pessimistic about OpenAI's strategy of gradually releasing models (COI: I work at OpenAI), and should update more on that mistake.
I agree with this!
I thought it was obviously dumb, and in retrospect, I don't know.
In the interest of saying more things publicly on this, some relevant thoughts:
In particular, I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk
I’m not sure how to articulate this, exactly, but I want to say something like “it’s not on us to make sure the incentives line up so that lab heads state their true beliefs about the amount of risk they’re putting the entire world in.” Stating their beliefs is just something they should be doing, on a matter this important, no matter the consequences. That’s on them. The counterfactual world—where they keep quiet or are unclear in order to hide their true (and alarming) beliefs about the harm they might impose on everyone—is deceptive. And it is indeed pretty unfortunate that the people who are most clear about this (such as Dario), will get the most pushback. But if people are upset about what they’re saying, then they should still be getting the pushback.
When I was an SRE at Google, we had a motto that I really like, which is: "hope is not a strategy." It would be nice if all the lab heads would be perfectly honest here, but just hoping for that to happen is not an actual strategy.
Furthermore, I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things. Either through explicit regulation or implicit pressure, I think controlling the incentives is absolutely critical and the main lever that you have externally for controlling the actions of large companies.
I don't think aysja was endorsing "hope" as a strategy– at least, that's not how I read it. I read it as "we should hold leaders accountable and make it clear that we think it's important for people to state their true beliefs about important matters."
To be clear, I think it's reasonable for people to discuss the pros and cons of various advocacy tactics, and I think asking "to what extent do I expect X advocacy tactic will affect peoples' incentives to openly state their beliefs?" makes sense.
Separately, though, I think the "accountability frame" is important. Accountability can involve putting pressure on them to express their true beliefs, pushing back when we suspect people are trying to find excuses to hide their beliefs, and making it clear that we think openness and honesty are important virtues even when they might provoke criticism– perhaps especially when they might provoke criticism. I think this is especially important in the case of lab leaders and others who have clear financial interests or power interests in the current AGI development ecosystem.
It's not about hoping that people are honest– it's about upholding standards of honesty, and recognizing that we have some ability to hold people accountable if we suspect that they're not being honest.
I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things
I'm currently most excited about outside-game advocacy that tries to get governments to implement regulations that make good things happen. I think this technically falls under the umbrella of "controlling the incentives through explicit regulation", but I think it's sufficiently different from outside-game advocacy work that is trying to get labs to do things voluntarily.
I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk and reduces the probability of us getting good RSPs from other labs and good RSP-based regulation.
Setting aside my personal models of Connor/Gabe/etc, the only way this action reads to me as making sense is if one feels compelled to go all in on "so-called Responsible Scaling Policies are primarily a fig leaf of responsibility from ML labs, as the only viable responsible option is to regulate them / shut them down". I assign at least 10% to that perspective being accurate, so I am not personally ruling it out as a fine tactic.
I agree it is otherwise disincentivizing[1] in worlds where open discussion and publication of scaling policies (even I cannot bring myself to call them 'responsible') is quite reasonable.
Probably Evan/others agree with this, but I want to explicitly point out that the CEOs of the labs such as Amodei and Altman and Hassabis should answer the question honestly regardless of how it's used by those they're in conflict with; the matter is too important for it to be forgivable that they would otherwise be strategically avoidant in order to prop up their businesses.
There are all kinds of benefits to acting with good faith, and people should not feel licensed to abandon good faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT. 
When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".
Like, for Connor & people who support him (not saying this is you Ben): don't you think it's a little bit suspicious that you ended up in a place where you concluded that the very best use of your time in helping with AI risk was tweet-dunking and infighting among the AI safety community? 
Like, I am quite worried that we will end up with some McCarthy-esque immune reaction to EA people in the US and the UK government where people will be like "wait, what the fuck, how did it happen that this weirdly intense social group with strong shared ideology is now suddenly having such an enormous amount of power in government? Wow, I need to kill this thing with fire, because I don't even know how to track where it is, or who is involved, so paranoia is really the only option".
This is looking increasingly prescient.
[Edit to add context]
Not saying this is happening now, but after the board decisions at OpenAI, I could imagine more people taking notice. Hopefully the sentiment then will just be open discourse and acknowledging that there's now this interesting ideology besides partisan politics and other kinds of lobbying/influence-seeking that are already commonplace. But to get there, I think it's plausible that EA has some communications- and maybe trust-building work to do.
Just for the record, if the current board thing turns out to be something like a play of power from EAs in AI Safety trying to end up more in control (by e.g. planning to facilitate a merger or a much closer collaboration with Anthropic), and the accusations of lying to the board turn out to be a nothing-burger, then I would consider this a very central example of the kind of political play I was worried would happen (and indeed involved Helen who is one of the top EA DC people).
Correspondingly I assign decently high (20-25%) probability to that indeed being what happened, in which case I would really like the people involved to be held accountable (and for us to please stop the current set of strategies that people are running that give rise to this kind of thing).
As you'd probably agree with, it's plausible that Sutskever was able to convince the board about specific concerns based on his understanding of the technology (risk levels and timelines) or his day-to-day experience at OpenAI and direct interactions with Sam Altman. If that's what happened, then it wouldn't be fair [to say that] any EA-minded board members just acted in an ideology-driven way. (Worth pointing out for people who don't know this that Sutskever has no ties to EA; it just seems like he shares concerns about the dangers from AI.)
But let's assume that it comes out that EA board members played a really significant role or were even thinking about something like this before Sutskever brought up concerns. "Play of power" evokes connotations of opportunism and there being no legitimacy for the decision other than that the board thought they could get away with it. This sort of concern you're describing would worry me a whole lot more if OpenAI had a typical board and corporate structure.
However, since they have a legal structure and mission that emphasizes benefitting humanity as a whole and not shareholders, I'd say situations like the one here are (in theory) exactly why the board was set up that way. The board's primary task is overseeing the CEO. To achieve OpenAI's mission, the CEO needs to have the type of personality and thinking habits so he will likely converge toward whatever the best-informed views are about AI risks (and benefits) and how to mitigate (and actualize) them. The CEO shouldn't be someone who is unlikely to engage in the sort of cognition that one would perform if one cared greatly about long-run outcomes rather than near-term status and took seriously the chance of being wrong about one's AI risk and timeline assumptions. Regardless of what's actually true about Altman, it seems like the board came to a negative conclusion about his suitability. In terms of how they made this update, we can envision some different scenarios; some of them would seem unfair to Altman and "ideology-driven" in a sinister way, while others would seem legitimate. (The following scenarios will take for granted that the thing that happened had elements of an "AI safety coup," as opposed to a "Sutskever coup" or "something else entirely." Again, I'm not saying that any of this is confirmed; I'm just going with the hypothesis where the EA involvement has the most potential for controversy.) So, here are three variants of how the board could have updated that Altman is not suitable for the mission:
(1) The responsible board members (could just be a subset of the ones that voted against Altman rather than all four of them) never gave him much of a chance. They learned that Altman is less concerned about AI notkilleveryoneism than they would've liked, so they took an opportunity to try to oust him. (This is bad because it's ideology-driven rather than truth-seeking.)
(2) The responsible board members did give Altman a chance initially, but he deceived them in a smoking-gun-type breach of trust.
(3) The responsible board members did give Altman a chance initially, but they became increasingly disillusioned through a more insincere-vibes-based and gradual erosion of trust, perhaps accompanied by disappointments from empty promises/assurances about, e.g., taking safety testing more seriously for future models, avoiding racing dynamics/avoiding giving out too much info on how to speed up AI through commercialization/rollouts, etc. (I'm only speculating here with the examples I'm giving, but the point is that if the board is unusually active about looking into stuff, it's conceivable that they maybe-justifiably reached this sort of update even without any smoking-gun-type breach of trust.)
Needless to say, (1) would be very bad board behavior and would put EA in a bad light. (2) would be standard stuff about what boards are there for, but seems somewhat unlikely to have happened here based on the board not being able to easily give more info to the public about what Altman did wrong (as well as the impression I get that they don't hold much leverage in the negotiations now). (3) seems most likely to me and also quite complex to make judgments about the specifics, because lots of things can fall into (3). (3) requires an unusually "active/observant" board. This isn't necessarily bad. I basically want to flag that I see lots of (3)-type scenarios where the board acted with integrity and courage, but also (admittedly) probably displayed some inexperience by not preparing for the power struggle that results after a decision like this, and by (possibly?) massively mishandling communications, using wording that may perfectly describe what happened when the description is taken literally, but is very misleading when we apply the norms about how parting ways announcements are normally written in very tactful corporate speak. (See also Eliezer's comment here.) Alternatively, it's also possible that a (3)-type scenario happened, but the specific incremental updates were uncharitable towards Altman due to being tempted by "staging a coup," or stuff like that. It gets messy when you have to evaluate someone's leadership fit where they have a bunch of uncontested talents but also some orange flags and you have to decide what sort of strengths or weaknesses are most essential for the mission.
For me the key variable is whether they took a decision that would have put someone substantially socially closer to them in charge, with some veneer of safety motivation, but where the ultimate variance in their decision would counterfactually be driven by social proximity and pre-existing alliances.
A concrete instance of this would be if the plan with the firing was to facilitate some merge with Anthropic, or to promote someone like Dario to the new CEO position, who the board members (which were chosen by Holden) have a much tighter relationship to.
My current model is that Holden chose them. Tasha in 2018, Helen in 2021 when he left and chose Helen as his successor board member.
I don't know, but I think it was close to a unilateral decision from his side (like I don't think anyone at OpenAI had much reason to trust Helen outside of Holden's endorsement, so my guess is he had a lot of leeway).
Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?
I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).
Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?
My model is that this was a mixture of a reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) and just actually financial ($30M was a substantial amount of money at that point in time).
Sam Altman has many times said he quite respects Holden, so that made up a large fraction of the variance. See e.g. this tweet:
(i used to be annoyed at being the villain of the EAs until i met their heroes*, and now i'm lowkey proud of it
*there are a few EA heroes i think are really great, eg Holden)
[...] reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) [...]
Yes, I think "reputational trade," i.e., something that's beneficial for both parties, is an important part of the story that the media hasn't really picked up on. EAs were focused on the dangers and benefits from AI way before anyone else, so it carries quite some weight when EA opinion leaders put an implicit seal of approval on the new AI company.
There's a tension between 
(1) previously having held back on natural-seeming criticism of OpenAI ("putting the world at risk for profits" or "they plan on wielding this immense power of building god/single-handedly starting something bigger than the next Industrial Revolution/making all jobs obsolete and solving all major problems") because they have the seal of approval from this public good, non-profit, beneficial-mission-focused board structure, 
and
(2) being outraged when this board structure does something that it was arguably intended to do (at least under some circumstances).
(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those latter points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)
(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those latter points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)
Almost all the outrage I am seeing is about how this firing was conducted. I think if the board had a proper report ready that outlined why they think OpenAI was acting recklessly, and if they had properly consulted with relevant stakeholders before doing this, I think the public reaction would be very different.
I agree there are also some random people on the internet who are angry about the board taking any action even though the company is going well in financial terms, but most of the well-informed and reasonable people I've seen are concerned about the way this was rushed and how the initial post seemed to pretty clearly imply that Sam had done some pretty serious deception, without anything to back that up.
Okay, that's fair.
FWIW, I think it's likely that they thought about this decision for quite some time and systematically – I mean the initial announcement did mention something about a "deliberative review process by the board." But yeah, we don't get to see any of what they thought about or who (if anyone) they consulted for gathering further evidence or for verifying claims by Sutskever. Unfortunately, we don't know yet. And I concede that given the little info we have, it takes charitable priors to end up with "my view." (I put it in quotation marks because it's not like I have more than 50% confidence in it. Mostly, I want to flag that this view is still very much on the table.) 
Also, on the part about "imply that Sam had done some pretty serious deception, without anything to back that up with." I'm >75% that either Eliezer nailed it in this tweet, or they actually have evidence about something pretty serious but decided not to disclose it for reasons that have to do with the nature of the thing that happened. (I guess the third option is they self-deceived into thinking their reasons to fire Altman will seem serious/compelling [or at least defensible] to everyone to whom they give more info, when in fact the reasoning is more subtle/subjective/depends on additional assumptions that many others wouldn't share. This could then have become apparent to them when they had to explain their reasoning to OpenAI staff later on, and they aborted the attempt in the middle of it when they noticed it wasn't hitting well, leaving the other party confused. I don't think that would necessarily imply anything bad about the board members' character, though it is worth noting that if someone self-deceives in that way too strongly or too often, it makes for a common malefactor pattern, and obviously it wouldn't reflect well on their judgment in this specific instance. One reason I consider this hypothesis less likely than the others is because it's rare for several people – the four board members – to all make the same mistake about whether their reasoning will seem compelling to others, and for none of them to realize that it's better to err on the side of caution and instead say something like "we noticed we have strong differences in vision with Sam Altman," or something like that.)
My current model is that this is unlikely to have been planned long in advance. For example, for unrelated reasons I was planning to have a call with Helen last week, and she proposed a meeting time of last Thursday (when I responded with my availability for Thursday early in the week, she did not respond). She then did not actually schedule the final meeting time and didn't respond to my last email, but this makes me think that at least early in the week, she did not expect to be busy on Thursday.
There are also some other people who I feel like I would expect to know about this if it had been planned who have been expressing their confusion and bafflement at what is going on on Twitter and various Slacks I am in. I think if this was planned, it was planned as a background thing, and then came to a head suddenly, with maybe 1-2 days notice, but it doesn't seem like more.
Adversarialness, honesty, attribution; re: how to talk in DC
I love what y'all said about this, found it a pleasure to read, and want to share some of my own thoughts, some echoing things you already said.
Let's not treat society as an adversary, rather let's be collaborators/allies and even leaders, helping and improving society and its truth-seeking processes. That doesn't mean we shouldn't have any private thoughts or plans. It does mean society gets to know who we are, who's behind what, and what we're generally up to and aiming for. Hiding attribution and intentions is IMO a way of playing into the adversarial/polarized/worst parts of our society's way of being and doing, and I agree with Oliver that doing so will likely come back to undermine us and what we care about. If we act like a victim/adversary wrt society, it won't work, including because society will see us that way. Let's instead meet society with the respect we want to see in the world, and ask it to step up and do the same for us. Let's pursue plans and intentions that we're happy standing in and being seen in.
I have only very limited experience in DC type conversations, but my sense is there are ways of sharing your real thing, while being cooperative, which likely don't lead to dismissal and robustly don't lead to poisoning the well. Here's perhaps the start of one, which could be made more robust with some workshopping: 1) share your polaris (existential stakes) in a way that they could feasibly understand given where they're at; 2) share your proposals and how you see those as aligned with other near-term AI considerations like the ones they might have; 3) actually listen to and respect the opinions of the people you're talking with, and be willing to go into their frame, remembering that it's not your job to convince or persuade them. (Thinking it's your job to convince or persuade them is probably the main/upstream mistake folks make?) Those things seem to me to likely belong in nearly every conversation. Should you include your "mood" that things in fact are very dire? There's not a strategic/correct answer to this, because strategy/correctness is not what mood is for/about. Share your truth in a way that you feel serves mutual understanding. Share your feelings in a way that you feel serves mutual relating.
I also notice that I am just afraid of what would happen if I were to e.g. write a post that's just like "an overview over the EA-ish/X-risk-ish policy landscape" that names specific people and explains various historical plans. Like I expect it would make me a lot of enemies.
This seems like a bad idea.
Transparency is important, but ideally, we would find ways to increase this without blowing up a bunch of trust within the community. I guess I'd question whether this is really the bottleneck in terms of transparency/public trust.
I'm worried that as a response to FTX we might end up turning this into a much more adversarial space.
And then one of my current stories is that at some point, mostly after FTX when people were fed up with listening to some vague EA conservative consensus, a bunch of people started ignoring that advice and finally started saying things publicly (like the FLI letter, Eliezer's Time piece, the CAIS letter, Ian Hogarth's piece). And then that's the thing that's actually been moving things in the policy space.
My impression is that this was driven by developments in AI, which created enough of a sense that other people might predict that other people would take concern seriously, because they could all just see ChatGPT. And this emboldened people. They had more of a sense of tractability.
And Eliezer, in particular, went on a podcast, and it went better than he anticipated, so he decided to do more outreach.
My impression is that this has basically 0 to do with FTX?
Eliezer and Max and Dan could give their own takes here. My guess is there was a breaking of EA consensus on taking the PR-conservative route, and then these things started happening. I think some of that breaking was due to FTX.
Eliezer started talking about high P(doom) around the Palmcone which I think was more like peak FTX hype. And it seemed like his subsequent comms were part of a trend that began with the MIRI dialogues the year before. I’d bet against FTX collapse being that causal at least for him.
I don't think Death with Dignity or the List O' Doom posts are at all FTX-collapse related. I am talking about things like the Time piece.
Yes, but I’m drawing a line from ‘MIRI dialogues’ through Death With Dignity and modeling Eliezer generally and I think the line just points roughly at the Time piece without FTX.
They often do things of the form "leaving out info, knowing this has misleading effects"
On that, here are a few examples of Conjecture leaving out info in what I think is a misleading way.
(Context: Control AI is an advocacy group, launched and run by Conjecture folks, that is opposing RSPs. I do not want to discuss the substance of Control AI’s arguments -- nor whether RSPs are in fact good or bad, on which question I don’t have a settled view -- but rather what I see as somewhat deceptive rhetoric.)
One, Control AI’s X account features a banner image with a picture of Dario Amodei (“CEO of Anthropic, $2.8 billion raised”) saying, “There’s a one in four chance AI causes human extinction.” That is misleading. What Dario Amodei has said is, “My chance that something goes really quite catastrophically wrong on the scale of human civilisation might be somewhere between 10-25%.” I understand that it is hard to communicate uncertainty in advocacy, but I think it would at least have been more virtuous to use the middle of that range (“one in six chance”), and to refer to “global catastrophe” or something rather than “human extinction”.
Two, Control AI writes that RSPs like Anthropic’s “contain wording allowing companies to opt-out of any safety agreements if they deem that another AI company may beat them in their race to create godlike AI”. I think that, too, is misleading. The closest thing Anthropic’s RSP says is:
However, in a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped (and where AI itself is helpful in such defense), we could envisage a substantial loosening of these restrictions as an emergency response. Such action would only be taken in consultation with governmental authorities, and the compelling case for it would be presented publicly to the extent possible.
Anthropic’s RSP is clearly only meant to permit labs to opt out when any other outcome very likely leads to doom, and for this to be coordinated with the government, with at least some degree of transparency. The scenario is not “DeepMind is beating us to AGI, so we can unilaterally set aside our RSP”, but more like “North Korea is beating us to AGI, so we must cooperatively set aside our RSP”.
Relatedly, Control AI writes that, with RSPs, companies “can decide freely at what point they might be falling behind – and then they alone can choose to ignore the already weak” RSPs. But part of the idea with RSPs is that they are a stepping stone to national or international policy enforced by governments. For example, ARC and Anthropic both explicitly said that they hope RSPs will be turned into standards/regulation prior to the Control AI campaign. (That seems quite plausible to me as a theory of change.) Also, Anthropic commits to only updating its RSP in consultation with its Long-Term Benefit Trust (consisting of five people without any financial interest in Anthropic) -- which may or may not work well, but seems sufficiently different from Anthropic being able to “decide freely” when to ignore its RSP that I think Control AI’s characterisation is misleading. Again, I don't want to discuss the merits of RSPs, I just think Control AI is misrepresenting Anthropic's and others' positions.
Three, Control AI seems to say that Anthropic’s advocacy for RSPs is an instance of safetywashing and regulatory capture. (Connor Leahy: “The primary aim of responsible scaling is to provide a framework which looks like something was done so that politicians can go home and say: ‘We have done something.’ But the actual policy is nothing.” And also: “The AI companies in particular and other organisations around them are trying to capture the summit, lock in a status quo of an unregulated race to disaster.”) I don’t know exactly what Anthropic’s goals are -- I would guess that its leadership is driven by a complex mixture of motivations -- but I doubt it is so clear-cut as Leahy makes it out to be.
To be clear, I think Conjecture has good intentions, and wants the whole AI thing to go well. I am rooting for its safety work and looking forward to seeing updates on CoEm. And again, I personally do not have a settled view on whether RSPs like Anthropic’s are in fact good or bad, or on whether it is good or bad to advocate for them – it could well be that RSPs turn out to be toothless, and would displace better policy – I only take issue with the rhetoric.
(Disclosure: Open Philanthropy funds the organisation I work for, though the above represents only my views, not my employer’s.)
I'm surprised to hear they're posting updates about CoEm.
At a conference held by Connor Leahy, I said that I thought it was very unlikely to work, and asked why they were interested in this research area, and he answered that they were not seriously invested in it.
We didn't develop the topic and it was several months ago, so it's possible that 1- I misremember, 2- they changed their minds, or 3- I appeared adversarial and he didn't feel like debating CoEm. (For example, maybe he actually said that CoEm didn't look promising and this changed recently?)
Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.
Adding a datapoint here: I've been involved in the Control AI campaign, which was run by Andrea Miotti (who also works at Conjecture). Before joining, I had heard some integrity/honesty concerns about Conjecture. So when I decided to join, I decided to be on the lookout for any instances of lying/deception/misleadingness/poor integrity. (Sidenote: At the time, I was also wondering whether Control AI was just essentially a vessel to do Conjecture's bidding. I have updated against this– Control AI reflects Andrea's vision. My impression is that Conjecture folks other than Andrea have basically no influence over what Control AI does, unless they convince Andrea to do something.)
I've been impressed by Andrea's integrity and honesty. I was worried that the campaign might have some sort of "how do we win, even if it misleads people" vibe (in which case I would've protested or left), but there was constantly a strong sense of "are we saying things that are true? Are we saying things that we actually believe? Are we communicating clearly?" I was especially impressed given the high volume of content (it is especially hard to avoid saying untrue/misleading things when you are putting out a lot of content at a fast pace.)
In contrast, integrity/honesty/openness norms feel much less strong in DC. When I was in DC, I think it was fairly common to see people "withhold information for strategic purposes", "present a misleading frame (intentionally)", "focus on saying things you think the other person will want to hear", or "decide not to talk at all because sharing beliefs in general could be bad." It's plausible to me that these are "the highest EV move" in some cases, but if we're focusing on honesty/integrity/openness, I think DC scored much worse. (See also Olivia's missing mood point).
The Bay Area scores well on honesty/integrity IMO, but has its own problems, especially with groupthink/conformity/curiosity-killing. I think the Bay Area tends to do well on honesty/integrity norms (relative to other spaces), but I think these norms are enforced in a way that comes with important tradeoffs. For instance, I think the Bay Area tends to punish people for saying things that are imprecise or "seem dumb", which leads to a lot of groupthink/conformity and a lot of "people just withholding their beliefs so that they don't accidentally say something incorrect and get judged for it." Also, I think high-status people in the Bay Area are often able to "get away with" low openness/transparency/clarity. There are lots of cases where people are like "I believe X because Paul believes X" and then when asked "why does Paul believe X" they're like "idk". This seems like an axis separate from honesty/integrity, but it still leads to pretty icky epistemic discourse.
(This isn't to say that people shouldn't criticize Conjecture– but I think there's a sad thing that happens where it starts to feel like both "sides" are just trying to criticize each other. My current position is much closer to something like "each of these communities has some relative strengths and weaknesses, and each of them has at least 1-3 critical flaws". Whereas in the status quo I think these discussions sometimes end up feeling like members of tribe A calling tribe B low integrity and then tribe B firing back by saying Tribe A is actually low integrity in an even worse way.)
So, I'm often tempted to mention my x risk motivations only briefly, then focus on whatever's inferentially closest and still true. (Classically, this would be "misuse risks, especially from foreign adversaries and terrorists" and "bioweapon and cyberoffensive capabilities coming in the next few years".)
One heuristic that I'm tempted to adopt and recommend is the onion test: your communications don't have to emphasize your weird beliefs, but you want your communications to satisfy the criterion that if your interlocutor became aware of everything you think, they would not be surprised.
This means that when I'm talking with a potential ally, I'll often mostly focus on places where we agree, while also being intentional about flagging that I have disagreements that they could double click on if they wanted.
I'm curious if your sense, Olivia, is that your communications (including the brief communication of x risk) pass the onion test. 
And if not, I'm curious what's hard about meeting that standard. Is this a heuristic that can be made viable in the contexts of eg DC?
Interesting discussion!
Probably somewhat controversially, but I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way.
Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI".
I like that you imagine conversations like that in your head and that they sometimes go well there!
Seems important to select the right journalist if someone were to try this. I feel like the journalist would have to be sympathetic already or at least be a very reasonable and fair-minded person. Unfortunately, some journalists cannot think straight for the life of them and only make jumpy, shallow associations like "seeks influence, so surely this person must be selfish and greedy."
I didn't read the Politico article yet, but given that "altruism" is literally in the name with "EA," I wonder why it needs to be said "and you take seriously the hypothesis that we really aren't doing this to profit off of AI." If a journalist is worth his or her salt, and they write about a movement called EA, shouldn't a bunch of their attention go into the question of why/whether some of these people might be genuine? And if the article takes a different spin and they never even consider that question, doesn't it suggest something is off? (Again, haven't read the article yet – maybe they do consider it or at least leave it open.)
Should I talk about inferentially close things that make them likeliest to embrace the policies I'm putting on their desk,
Here are some of my current principles around this issue.
1) It's fine to contribute to work on policies that you think are good-to-neutral that aren't your core motivation, just so you can get involved with the thing that is your core motivation.
"I am happy to be here helping the Senator achieve an agreement with foreign country Fiction-land on export taxes. I am not here because I personally care a great deal about export taxes (they seem fine to me), but because I want to build a personally strong relationship with representatives of Fiction-land, and I'm happy to help this appears-good-to-me agreement get made to do so."
2) It's not okay to contribute to work on policies that you think are harmful for the world, just so you can get involved with the thing that is your core motivation.
"I am here helping coordinate a ban on building houses in San Francisco which I expect will contribute to homelessness and substantially damage the economic growth of the city, because I would like to build a relationship with the city governance."
3) It's never okay to misrepresent what you believe.
"I am here primarily because I personally care about this policy a great deal" vs "I am here because the policy seems reasonable to me and my good ally John Smith has asked me to help him get it enacted, who I believe honestly cares about it and thinks it will make people's lives better."
4) I think it's okay to spend time talking about things that aren't your favorite thing.
"Let's talk through how we could get this policy passed that is like my 20th favorite thing" or "Let's spend a few hours figuring out how to achieve this goal that the head of our office wants even though I am not personally invested in it."
5) But you should answer honestly about your intentions.
"I'm here because I'd like to prevent us all going extinct from AI, but while I'm building up a career and reputation, in the meantime I'm happy to improve the government's understanding of what's even happening, or improve its communications with companies, or join in on what we're all working on here."
and things like "being gay" where society seems kind of transparently unreasonable about it,
Importantly "being gay" is classed for me as "a person's personal business", sort of irrespective of whether society is reasonable about it or not. I'm inclined to give people leeway to keep private personal info that doesn't have much impact on other people.
Yeah, seems like a reasonable thing to mention.
Can you say a bit more about Eleuther's involvement in the last of these papers? I thought that this was mostly done by people working at Apollo. EleutherAI is credited for providing compute (at the same level as OpenAI for providing GPT-4 credits), but I am surprised that you would claim it as work produced by EleutherAI?
Ooops. It appeared that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.
I separately had a very weird experience with them on the Long Term Future Fund where Connor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund Eleuther AI". This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to
While this anecdote is largely orthogonal to the broader piece, I remembered that this existed randomly today and wanted to mention that Open Phil has recommended a 2.6 M/3 years grant to EleutherAI to pursue interpretability research. It was a really pleasant and very easy experience: Nora Belrose (head of interpretability) and I (head of everything) talked with them about some of our recent and on-going work, such as Eliciting Latent Predictions from Transformers with the Tuned Lens, Eliciting Latent Knowledge from Quirky Language Models, and Sparse Autoencoders Find Highly Interpretable Features in Language Models, which they found very interesting, and once they knew we had shared areas of interest it was a really easy experience.
I had no vibes along the lines of "oh we don't like EleutherAI" or "we don't fund pre-paradigmatic research." It was a surprise to some people at Open Phil that we had areas of overlapping interest, but we spent like half an hour clarifying our research agenda and half an hour talking about what we wanted to do next and people were already excited.
We ended up talking about this in DMs, but the gist of it is:
Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.
If you go with journalists, I'd want to find one who seems really truth-seeking.
I think it would be a very valuable public service to the community to have someone whose job it is to read a journalist’s corpus and check if it seems fair and honest.
I think we could, as a community, have a policy of only talking with journalists who are honest. This seems like a good move pragmatically, because it means coverage of our stuff will be better on average, and it also universalizes really well, so long as “honest” doesn’t morph into “agrees with us about what’s important.”
It seems good and cooperative to disproportionately help high-integrity journalists get sources, and it helps us directly.
Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.
I would have guessed that this is just not the level of trust people operate at. like for most things in policy people don't really act like their opposition is in good faith so there's not much to lose here. (weakly held)
A claim I've heard habryka make before (I don't know myself) is that there are actual rules to the kind of vague-deception that goes on in DC. And something like, while it's a known thing that a politician will say "we're doing policy X" when they don't end up doing policy X, if you misrepresent who you're affiliated with, this is an actual norm violation. (i.e. it's lying about the Simulacrum 3 level, which is the primary level in DC)
My guess is we maybe could have also done that at least a year earlier, and honestly I think given the traction we had in 2015 on a lot of this stuff, with Bill Gates and Elon Musk and Demis, I think there is a decent chance we could have also done a lot of Overton window shifting back then, and us not having done so is I think downstream of a strategy that wanted to maintain lots of social capital with the AI capability companies and random people in governments who would be weirded out by people saying things outside of the Overton window.
Though again, this is just one story, and I also have other stories where it all depended on Chat-GPT and GPT-4 and before then you would have been laughed out of the room if you had brought up any of this stuff (though I do really think the 2015 Superintelligence stuff is decent evidence against that). It's also plausible to me that you need a balance of inside and outside game stuff, and that we've struck a decent balance, and that yeah, having inside and outside game means there will be conflict between the different people involved in the different games, but it's ultimately the right call in the end.
I really want an analysis of this. The alignment and rationality communities were wrong about how tractable getting public & governmental buy-in to AI x-risk would be. But what exactly was the failure? That seems quite important to knowing how to alter decision making and to prevent future failures to grab low-hanging fruit.
I tried writing a fault analysis myself, but I couldn't make much progress and it seems like you have more detailed models than I do. So someone other than me is probably the right person for this.
That said, the dialogues on AI governance and outreach are providing some of what I'm looking for here, and seem useful to anyone who does want to write an analysis. So thank you to everyone who's discussing these topics in public.
"AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1).
Not sure where this slots in, but there's also a sense in which this contains a missing positive mood about how unbelievably good (aligned) AI could or will be, and how much we're losing by not having it earlier.
On the meta level, for unfalsifiable claims (and even falsifiable claims that would take more effort to verify than a normal independent 3rd party adult could spend in say a month) it doesn't really seem to matter whether the person pushing the claims has integrity, beyond a very low bar?
They just need to have enough integrity to pass the threshold of not being some maniac/murderer/persistent troll/etc...
But otherwise, beyond that threshold, there doesn't seem to be much of a downside in treating folks politely while always assuming there's some hidden motivations going on behind the scenes.
And for the falsifiable claims that have reasonable prospects for independent 3rd party verification, assigning credibility, trust, integrity, etc., based on the author's track record of such claims proving to be true is more than sufficient for discussions. And without regard for what hidden motivations there might be.
Maybe this is not sufficient for the community organization/emotional support/etc. side of things, though you'd be the better judge of that.