Ok, so we both had some feelings about the recent Conjecture post on "lots of people in AI Alignment are lying", and the associated marketing campaign and stuff.
I would appreciate some context in which I can think through that, and also a chance to share info we have in the space that might help us figure out what's going on.
I expect this will pretty quickly cause us to end up on some broader questions about how to do advocacy, how much the current social network around AI Alignment should coordinate as a group, how to balance advocacy with research, etc.
Feelings about Conjecture post:
Questions on my mind:
I personally used to have a very similar rant to Conjecture. I'm now more sympathetic to governance people. We could try to tease out why.
This direction seems most interesting to me!
My current feelings in the space are that I am quite sympathetic to some comms-concerns that people in government have and quite unsympathetic to some other stuff, and I would also like to clarify for myself where the lines here are.
Curious whether you have any key set of observations or experiences you had that made you more sympathetic.
I've heard secondhand of at least one instance where a person brought up x risk, then their Congressional office took them less seriously. Other staffers have told me talking about x risk wouldn't play well (without citing specific evidence, but I take their opinions seriously).
I've also personally found it tricky to talk about takeover & existential risks, just because these ideas take a long time to explain, and there are many inferential steps between there and the policies I'm recommending. So, I'm often tempted to mention my x risk motivations only briefly, then focus on whatever's inferentially closest and still true. (Classically, this would be "misuse risks, especially from foreign adversaries and terrorists" and "bioweapon and cyberoffensive capabilities coming in the next few years".)
Separate point which we might want to discuss later
A thing I'm confused about is:
Should I talk about the inferentially close things that make them likeliest to embrace the policies I'm putting on their desk,
Or, should I just bite the bullet of being confusing and start many meetings with "I'm deeply concerned about humanity going extinct in the next decade because of advancing AI which might try to take over the world. It's a lot to explain but the scientists are on my side. Please help." — where the thing I'm trying to emphasize is the tone of worry.
Because I buy that we're systematically misleading people about how worried we are / they should be by not focusing on our actual concerns, and by not talking about them with a tone that conveys how worried we in fact are.
(Additional explanation added post-dialogue:
So, I think for me, I feel totally sympathetic to people finding it hard and often not worth it to explain their x-risk concerns, if they are talking to people who don't really have a handle for that kind of work yet.
Like, I have that experience all the time as well, where I work with various contractors, or do fundraising, and I try my best to explain what my work is about, but it does sure often end up rounded off to some random preconception they have (sometimes that's standard AI ethics, sometimes that's cognitive science and psychology, sometimes that's people thinking I run a web-development startup that's trying to maximize engagement metrics).
The thing that I am much less sympathetic to is people being like "please don't talk about my connections to this EA/X-Risk ecosystem, please don't talk about my beliefs in this space to other people, please don't list me as having been involved with anything in this space publicly, etc."
Or like, the thing with Jason Matheny that I mentioned in a comment the other day where the senator that was asking him a question had already mentioned "apocalyptic risks from AI" and was asking Matheny how likely and how soon that would happen, and Matheny just responded with "I don't know".
Those to me don't read as people having trouble crossing an inferential distance. They read to me more as trying to be intentionally obfuscatory/kind-of-deceptive about their beliefs and affiliations here, and that feels much more like it's crossing lines.)
Regarding "please don't talk about my connections or beliefs":
I'm not sure how bad I feel about this. I usually like openness. But I'm also usually fine with people being very private (but answering direct questions honestly). E.g. it feels weird to write online about two people dating when they'd prefer that info is private. I'm trying to sort out what the difference between that and EA-affiliation is now....
OK, maybe the difference is in why they're being private: "who I'm dating" just feels personal, and "EA association" is about hurting their ability to do stuff. I take that point, and I share some distaste for it.
Anecdote: A few times AI governance people (FWIW: outside DC, who weren't very central to the field) told me that if someone asked how we knew each other, they would downplay our friendship, and they requested I do the same. I felt pretty sad and frustrated by the position this put me in, where I would be seen as defecting for being honest.
All this said, journalists and the political environment can be very adversarial. It seems reasonable and expected to conceal stuff when you know info will be misunderstood if not intentionally distorted. Thoughts?
I think the one thing that feels bad to me is people in DC trying to recruit fellow EAs or X-risk concerned people into important policy positions while also explicitly asking them to keep the strength of their personal connections on the down-low, and punishing people who openly talk about who they knew and their social relationships within DC. I've heard this a lot from a bunch of EAs in DC and it seems to be a relatively common phenomenon.
I am also really worried that it is the kind of thing that, in its course, masks the problems it causes (like, we wouldn't really know if this is causing tons of problems, because people are naturally incentivized to cover up the problems it causes, and because making accusations of this kind of conspiratorial action is really high-stakes and hard), and so if it goes wrong, it will probably go wrong quickly and with a lot of pent-up energy.
I do totally think there are circumstances where I will hide things from other people with my friends, and be quite strategic about it. The classical examples are discussing problems with some kind of authoritarian regime while you are under it (Soviet Russia, Nazi Germany, etc.), and things like "being gay" where society seems kind of transparently unreasonable about it, and in those situations I am opting into other people getting to impose this kind of secrecy and obfuscation request on me. I also feel kind of similar about people being polyamorous, which is pretty relevant to my life, since that still has a pretty huge amount of stigma attached to it.
I do think I experienced a huge shift after the collapse of FTX where I was like "Ok, but after you caused the biggest fraud since Enron, you really lost your conspiracy license. Like, 'the people' (broadly construed) now have a very good reason for wanting to know the social relationships you have, because the last time they didn't pay attention to this, it turned out to be one of the biggest frauds of the last decade".
I care about this particularly much because indeed FTX/Sam was actually really very successful at pushing through regulations and causing governance change, but as far as I can tell he was primarily acting with an interest in regulatory capture (having talked to a bunch of people who seem quite well-informed in this space, it seems that he was less than a year away from basically getting a US-sponsored monopoly on derivatives trading in the US via regulatory capture). And like, I can't distinguish the methods he was using from the methods other EAs are using right now (though I do notice some difference).
(Edited note: I use "conspiracy" and "conspiratorial" a few times in this conversation. I am not hugely happy with that choice of words since it's the kind of word that often imports negative associations without really justifying them. I do somewhat lack a better word for the kind of thing that I am talking about, and I do intend some of the negative associations, but I still want to flag that I think accusing things of being "conspiracies" is a pretty frequent underhanded strategy that people use to cause negative associations with something without there being anything particularly clear to respond to.
In this case I want to be clear that what I mean by "conspiratorial" or "conspiracy" is something like "a bunch of people trying pretty hard to hide the existence of some alliance of relatively well-coordinated people, against the wishes of other people, while doing things like lying, omitting clearly relevant information, exaggerating in misleading ways, and doing so with an aim that is not sanctioned by the larger group they are hiding themselves from".
As I mention in this post, I think some kinds of conspiracies under this definition are totally fine. As an example I bring up later, I totally support anti-nazi conspiracies during WW2.)
Yeah, I was going to say I imagine some people working on policy might argue they're in a situation where hiding it is justified because EA associations have a bunch of stigma.
But given FTX and such, some stigma seems deserved. (Sigh about the regulatory capture part — I wasn't aware of the extent.)
For reference, the relevant regulation to look up is the Digital Commodities Consumer Protection Act.
I'm curious for you to list some things governance people are doing that you think are bad or fine/understandable, so I can see if I disagree with any.
I do think that at a high level, the biggest effect I am tracking from the governance people is that in the last 5 years or so, they were usually the loudest voices that tried to get people to talk less about existential risk publicly, and to stay out of the media, and to not reach out to high-stakes people in various places, because they were worried that doing so would make us look like clowns and would poison the well.
And then one of my current stories is that at some point, mostly after FTX when people were fed up with listening to some vague EA conservative consensus, a bunch of people started ignoring that advice and finally started saying things publicly (like the FLI letter, Eliezer's TIME piece, the CAIS letter, Ian Hogarth's piece). And then that's the thing that's actually been moving things in the policy space.
My guess is we maybe could have also done that at least a year earlier, and honestly, given the traction we had in 2015 on a lot of this stuff, with Bill Gates and Elon Musk and Demis, I think there is a decent chance we could have also done a lot of Overton window shifting back then. I think our not having done so is downstream of a strategy that wanted to maintain lots of social capital with the AI capability companies and random people in governments who would be weirded out by people saying things outside of the Overton window.
Though again, this is just one story, and I also have other stories where it all depended on ChatGPT and GPT-4 and before then you would have been laughed out of the room if you had brought up any of this stuff (though I do really think the 2015 Superintelligence stuff is decent evidence against that). It's also plausible to me that you need a balance of inside and outside game stuff, and that we've struck a decent balance, and that yeah, having inside and outside game means there will be conflict between the different people involved in the different games, but it's ultimately the right call.
While you type, I'm going to reflect on my feelings about various things governance people might do:
The second biggest thing I am tracking is a kind of irrecoverable loss of trust that I am worried will happen between "us" and "the public", or something in that space.
Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.
Like, I am quite worried that we will end up with some McCarthy-esque immune reaction to EA people in the US and the UK government where people will be like "wait, what the fuck, how did it happen that this weirdly intense social group with strong shared ideology is now suddenly having such an enormous amount of power in government? Wow, I need to kill this thing with fire, because I don't even know how to track where it is, or who is involved, so paranoia is really the only option".
On "CAIS/Ian/etc. finally just said they're really worried in public"
I think it's likely the governance folk wouldn't have done this themselves at that time, had no one else done it. So, I'm glad CAIS did.
I'm not convinced people could have loudly said they're worried about extinction from AI pre-ChatGPT without the blowback people feared.
On public concern,
I agree this is possible...
But I'm also worried about making the AI convo so crowded, by engaging with the public a lot, that x-risk doesn't get dealt with. I think a lot of the privacy isn't malicious but practical. "Don't involve a bunch of people in your project who you don't actually expect to contribute, just because they might be unhappy if they're not included".
I'm conflicted between "EA has undeserved stigma" and "after FTX, everyone should take it upon themselves to be very open"
I am kind of interested in talking about this a bit. I feel like it's a thing I've heard a lot, and I guess I don't super buy it. What is the undeserved stigma that EA is supposed to have?
Is your claim that EA's stigma is all deserved?
Laying out the stigmas I notice:
I guess the stigmas seem pretty directionally true about the negatives, and just miss that there is serious thought / positives here.
If journalists said, "This policy person has EA associations. That suggests they've been involved with the community that's thought most deeply about catastrophic AI risk and has historically tried hard and succeeded at doing lots of good. It should also raise some eyebrows, see FTX," then I'd be fine. But usually it's just the second half, and that's why I'm sympathetic to people avoiding discussing their affiliation.
Yeah, I like this analysis, and I think it roughly tracks how I am thinking about it.
I do think the bar for "your concerns about me are so unreasonable that I am going to actively obfuscate any markers of myself that might trigger those concerns" is quite high. Like I think the bar can't be at "well, I thought about these concerns and they are not true", it has to be at "I am seriously concerned that when the flags trigger you will do something quite unreasonable", like they are with the gayness and the communism-dissenter stuff.
Fair enough. This might be a case of governance people overestimating honesty costs / underestimating benefits, which I still think they often directionally do.
(I'll also note: what if all the high-profile people tried defending EA? (Defending in the sense of laying out "Here are the good things; here are the bad things; here's how seriously I think you should take them, all things considered."))
I don't think people even have to defend EA or something. I think there are a lot of people who I think justifiably would like to distance themselves from that identity and social network because they have genuine concerns about it.
But I think a defense would definitely open the door for a conversation that acknowledges that of course there is a real thing here that has a lot of power and influence, and would invite people tracking the structure of that thing and what it might do in the future, and if that happens I am much less concerned about both the negative epistemic effects and the downside risk from this all exploding in my face.
Do you have ideas for ways to make the thing you want here happen? What does it look like? An op-ed from Person X?
Probably somewhat controversially, but I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way.
Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI".
That does seem like kind of a doomed plan, but like, something in the space feels good to me. Maybe we (the community) can work with some journalists we know to write a thing that puts the cards on the table, and isn't just a puff-piece that tries to frame everything in the most positive way, but is genuinely asking hard questions.
Politico: +1 on being glad it came out actually!
I feel mixed about the idea of writing something. If you go with journalists, I'd want to find one who seems really truth-seeking. Also, I could see any piece playing poorly; definitely collect feedback on your draft and avoid sharing info unilaterally.
Would you do this? Would you want some bigger name EA / governance person to do it?
I think I have probably sadly burdened myself with somewhat too much confidentiality to dance this dance correctly, though I am not sure. I might be able to get buy-in from a bunch of people so that I can be free to speak openly here, but it would increase the amount of work a lot, and there's also a decent chance they say no, and then I would need to be super paranoid about which bits I leak when working on this.
As in, you've agreed to keep too much secret?
If so, do you have people in mind who aren't as burdened by this (and who have the relevant context)?
Yeah, too many secrets.
I assume most of the big names have similar confidentiality burdens.
Yeah, ideally it would be one of the big names since that I think would meaningfully cause a shift in how people operate in the space.
Eliezer is great at moving Overton windows like this, but I think he is really uninterested in tracking detailed social dynamics like this, and so doesn't really know what's up.
Do you have time to have some chats with people about the idea or send around a Google doc?
I do feel quite excited about making this happen, though I do think it will be pretty aggressively shut down, and I feel both sad about that, and also have some sympathy in that it does feel like it somewhat inevitably involves catching some people in the cross-fire who were being more private for good reasons, or who are in a more adversarial context where the information here will be used against them in an unfair way, and I still think it's worth it, but it does make me feel like this will be quite hard.
I also notice that I am just afraid of what would happen if I were to e.g. write a post that's just like "an overview of the EA-ish/X-risk-ish policy landscape" that names specific people and explains various historical plans. Like I expect it would make me a lot of enemies.
Same, and some of my fear is "this could unduly make success much harder for the 'good plans'".
Ok, I think I will sit on this plan for a bit. I hadn't really considered it before, and I kind of want to bring it up to a bunch of people in the next few weeks and see whether maybe there is enough support for this to make it happen.
Okay! (For what it's worth, I currently like Eliezer most, if he were willing to get into the social stuff.)
Any info it'd be helpful for me to collect from DC folk?
Oh, I mean I would love any more data on how much this would make DC folk feel like some burden was lifted from their shoulders vs. it would feel like it would just fuck with their plans.
I think my actual plan here would maybe be more like an EA Forum post or something that just goes into a lot of detail on what is going on in DC, and isn't afraid to name specific names or organizations.
I can imagine directly going for an op-ed could also work quite well, and would probably be more convincing to outsiders, though maybe ideally you could have both: someone writes the forum post on the inside, then some external party verifies a bunch of the stuff, digs a bit deeper, and makes some critiques on the basis of that post, and then the veil is broken.
Would the DC post include info that these people have asked/would ask you to keep secret?
Definitely "would", though if I did this I would want to signpost that I am planning to do this quite clearly to anyone I talk to.
I am also burdened with some secrets here, though not that many, and I might be able to free myself from those burdens somehow. Not sure.
Ok I shall ask around in the next 2 weeks. Ping me if I don't send an update by then
Ok, going back a bit to the top-level, I think I would still want to summarize my feelings on the Conjecture thing a bit more.
Like, I guess the thing that I would feel bad about if I didn't say it in a context like this, is to be like "but man, I feel like some of the Conjecture people were like at the top of my list of people trying to do weird epistemically distortive inside-game stuff a few months ago, and this makes them calling out people like this feel quite bad to me".
In general, a huge component of my reaction to that post was something in the space of "Connor and Gabe are kind of on my list to track as people I feel most sketched out by in a bunch of different ways, and kind of in the ways the post complains about", and I feel somewhat bad for having dropped my efforts from a few months ago to do some more investigation here and write up my concerns (mostly because I was kind of hoping a bit that Conjecture would just implode as it ran out of funding and maybe the problem would go away).
(For what it's worth, Conjecture has been pretty outside-game-y in my experience. My guess is this is mostly a matter of "they think outside game is the best tactic, given what others are doing and their resources", but they've also expressed ethical concerns with the inside game approach.)
(For some context on this, Conjecture tried really pretty hard a few months ago to get a bunch of the OpenAI critical comments on this post deleted because they said it would make them look bad to OpenAI and would antagonize people at labs in an unfair way and would mess with their inside-game plans that they assured me were going very well at the time)
(I heard a somewhat different story about this from them, but sure, I still take it as evidence that they're mostly "doing whatever's locally tactical".)
Anyway, I was similarly disappointed by the post just given I think Conjecture has often been lower integrity and less cooperative than others in/around the community. For instance, from what I can tell,
I have a doc detailing my observations that I'm open to sharing privately, if people DM me.
(I discussed these concerns with Conjecture at length before leaving. They gave me substantial space to voice these concerns, which I'm appreciative of, and I did leave our conversations feeling like I understood their POV much better. I'm not going to get into "where I'm sympathetic with Conjecture" here, but I'm often sympathetic. I can't say I ever felt like my concerns were resolved, though.)
I would be interested in your concerns being written up.
I do worry about the EA x Conjecture relationship just being increasingly divisive and time-suck-y.
Here is an email I sent Eliezer on April 2nd this year with one paragraph removed for confidentiality reasons:
This is just an FYI and I don't think you should hugely update on this but I felt like I should let you know that I have had some kind of concerning experiences with a bunch of Conjecture people that currently make me hesitant to interface with them very much and make me think they are somewhat systematically misleading or deceptive. A concrete list of examples:
I had someone reach out to me with the following quote:
Mainly, I asked one of their senior people how they plan to make money because they have a lot of random investors, and he basically said there was no plan, AGI was so near that everyone would either be dead or the investors would no longer care by the time anyone noticed they weren’t seeming to make money. This seems misleading either to the investors or to me — I suspect me, because it would really just be wild if they had no plan to ever try to make money, and in fact they do actually have a product (though it seems to just be Whisper repackaged).
I separately had a very weird experience with them on the Long Term Future Fund where Connor Leahy applied for funding for EleutherAI. We told him we didn't want to fund EleutherAI since it sure mostly seemed like capabilities research, but we would be pretty interested in funding AI Alignment research by some of the same people.
He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund EleutherAI".
This was doubly confusing and misleading because Open Phil had never evaluated a grant to EleutherAI (Asya, who works at Open Phil, was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time, even after, I think, someone explicitly corrected the statement to him.
Another experience I had was Gabe from Conjecture reaching out to LessWrong and trying really quite hard to get us to delete the OpenAI critical comments on this post: https://www.lesswrong.com/posts/3S4nyoNEEuvNsbXt8/common-misconceptions-about-openai
He said he thought people in-general shouldn't criticize OpenAI in public like this because this makes diplomatic relationships much harder, and when Ruby told them we don't delete that kind of criticism he escalated to me and generally tried pretty hard to get me to delete things.
[... One additional thing that's a bit more confidential but of similar nature here...]
None of these are super bad, but they give me an overall sense of wanting to keep a bunch of distance from Conjecture, and trepidation about them becoming something like a major public representative of AI Alignment stuff. When I talked to employees of Conjecture about these concerns, the responses I got also didn't tend to be "oh, no, that's totally out of character", but more like "yeah, I do think there is a lot of naive consequentialism here and I would like your help fixing that".
No response required, happy to answer any follow-up questions. Just figured I would err more on the side of sharing things like this post-FTX.
I wish MIRI was a little more loudly active, since I think doomy people who are increasingly distrustful of moderate EA want another path, and supporting Conjecture seems pretty attractive from a distance.
Again, I'm not sure "dealing with Conjecture" is worth the time though.
Main emotional effects of the post for me
Yeah, that also roughly matches my emotional reaction. I did like the other RSP discussion that happened that week (and liked my dialogue with Ryan which I thought was pretty productive).
Yeah, I share this feeling. I am quite glad MIRI is writing more, but am also definitely worried that somehow Conjecture has positioned itself as being aligned with MIRI in a way that makes me concerned people will end up feeling deceived.
Can you say more about the feeling deceived worry?
(I didn't feel deceived having joined myself, but maybe "Conjecture could've managed my expectations about the work better" and "I wish the EAs with concerns told me so more explicitly instead of giving very vague warnings".)
Well, for better or for worse, I think a lot of people seem to make decisions on the basis of "is this thing a community-sanctioned 'good thing to do (TM)'". I think this way of making decisions is pretty sus, and I feel a bit confused how much I want to take responsibility for people making decisions this way, but because Conjecture and MIRI look similar in a bunch of ways, and Conjecture is kind of explicitly trying to carry the "doomer" flag, I think a lot of people will parse Conjecture as "a community-sanctioned 'good thing to do (TM)'".
I think this kind of thing then tends to fail in one of two ways:
Both make me pretty sad.
Also, even if you are following a less dumb decision-making structure, the world is just really complicated, and especially with tons of people doing hard-to-track behind the scenes work, it is just really hard to figure out who is doing real work or not, and Conjecture has been endorsed by a bunch of different parts of the community for-real (like they received millions of dollars in Jaan funding, for example, IIRC), and I would really like to improve the signal to noise ratio here, and somehow improve the degree to which people's endorsements accurately track whether a thing will be good.
Fair. People did warn me before I joined Conjecture (but it didn't feel very different from warnings I might get before working at MIRI). Also, most people I know in the community are aware Conjecture has a poor reputation.
I'd support and am open to writing a Conjecture post explaining the particulars of
Well, maybe this dialogue will help, if we edit and publish a bunch of it.
In the interest of saying more things publicly on this, some relevant thoughts:
In particular, I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk
I’m not sure how to articulate this, exactly, but I want to say something like “it’s not on us to make sure the incentives line up so that lab heads state their true beliefs about the amount of risk they’re putting the entire world in.” Stating their beliefs is just something they should be doing, on a matter this important, no matter the consequences. That’s on them. The counterfactual world—where they keep quiet or are unclear in order to hide their true (and alarming) beliefs about the harm they might impose on everyone—is deceptive. And it is indeed pretty unfortunate that the people who are most clear about this (such as Dario), will get the most pushback. But if people are upset about what they’re saying, then they should still be getting the pushback.
When I was an SRE at Google, we had a motto that I really like, which is: "hope is not a strategy." It would be nice if all the lab heads would be perfectly honest here, but just hoping for that to happen is not an actual strategy.
Furthermore, I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things. Either through explicit regulation or implicit pressure, I think controlling the incentives is absolutely critical and the main lever that you have externally for controlling the actions of large companies.
I don't think aysja was endorsing "hope" as a strategy; at least, that's not how I read it. I read it as "we should hold leaders accountable and make it clear that we think it's important for people to state their true beliefs about important matters."
To be clear, I think it's reasonable for people to discuss the pros and cons of various advocacy tactics, and I think asking "to what extent do I expect X advocacy tactic will affect peoples' incentives to openly state their beliefs?" makes sense.
Separately, though, I think the "accountability frame" is important. Accountability can involve putting pressure on them to express their true beliefs, pushing back when we suspect people are trying to find excuses to hide their beliefs, and making it clear that we think openness and honesty are important virtues even when they might provoke criticism, perhaps especially when they might provoke criticism. I think this is especially important in the case of lab leaders and others who have clear financial interests or power interests in the current AGI development ecosystem.
It's not about hoping that people are honest; it's about upholding standards of honesty, and recognizing that we have some ability to hold people accountable if we suspect that they're not being honest.
I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things
I'm currently most excited about outside-game advocacy that tries to get governments to implement regulations that make good things happen. I think this technically falls under the umbrella of "controlling the incentives through explicit regulation", but I think it's sufficiently different from outside-game advocacy work that tries to get labs to do things voluntarily that it's worth distinguishing the two.
I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk and reduces the probability of us getting good RSPs from other labs and good RSP-based regulation.
Setting aside my personal models of Connor/Gabe/etc., the only way this action reads to me as making sense is if one feels compelled to go all in on "so-called Responsible Scaling Policies are primarily a fig leaf of responsibility from ML labs, as the only viable responsible option is to regulate them / shut them down". I assign at least 10% to that perspective being accurate, so I am not personally ruling it out as a fine tactic.
I agree it is otherwise disincentivizing in worlds where open discussion and publication of scaling policies (I cannot even bring myself to call them 'responsible') is quite reasonable.
Probably Evan/others agree with this, but I want to explicitly point out that the CEOs of the labs, such as Amodei, Altman, and Hassabis, should answer the question honestly regardless of how it's used by those they're in conflict with; the matter is too important for it to be forgivable for them to be strategically evasive in order to prop up their businesses.
There are all kinds of benefits to acting with good faith, and people should not feel licensed to abandon good faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT. When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".
Like, for Connor & people who support him (not saying this is you Ben): don't you think it's a little bit suspicious that you ended up in a place where you concluded that the very best use of your time in helping with AI risk was tweet-dunking and infighting among the AI safety community?
A couple of quick, loosely-related thoughts:
Well said. I agree with all of these except the last one and the gradual model release one (I think the update should be that letting the public interact with models is great, but whether to do it gradually or in a 'lumpy' way is unclear. E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)
I especially want to reemphasize your point 2.
E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)
That would have pushed back public wakeup equally though, because it was ChatGPT3.5 that caused the wakeup.
Did anyone at OpenAI explicitly say that a factor in their release cadence was getting the public to wake up about the pace of AI research and start demanding regulation? Because this seems more like a post hoc rationalization for the release policy than like an actual intended outcome.
See Sam Altman here:
As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.
A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.
And Sam has been pretty vocal in pushing for regulation in general.
It would have pushed it back, but then the extra shock of going straight to ChatGPT4 would have made up for it, I think. Not sure, obviously.
Then ChatGPT4 would still have had low rate limits, so most people would still have been more informed by ChatGPT3.5.
"One of the biggest conspiracies of the last decade" doesn't seem right. The amount of money/influence involved in FTX is dwarfed by the amount of money/influence thrown around by governments in general, and it's easier for factions within governments to enforce secrecy than for corporations to do so. More concretely, I'd say that there were probably several different "conspiratorial" things related to covid in various countries that had much bigger effects; probably several more related to ongoing Russia-Ukraine and Israel-Palestine conflicts; probably several more Trump/Biden-related things; maybe some to do with culture-war stuff; probably a few more prosaic fraud or corruption things that stole tens of billions of dollars, just less publicly (e.g. from big government contracts); a bunch of criminal gangs which also have far more money than FTX did; and almost certainly a bunch that don't fall into any of those categories. (For example, if the CIA is currently doing any stuff comparable to its historical record of messing around with South American countries, that's plausibly far bigger than FTX. Or various NSA surveillance type things are likely a much bigger deal, in terms of impact, than FTX. Oh, and stuff like NotPetya should probably count too.)
Even within the U.S. government, there are few programs larger than $10B that lack very extensive reporting requirements, and those requirements make it quite hard for them to be conspiratorial in the relevant ways (they might be ineffective, or the result of various bad equilibria, but I don't think you regularly get conspiracies at this scale).
To calibrate people here, the total budget of the NSA appears to be around $10B/yr, so even if you classify the whole thing as a conspiracy, in terms of expenditure it's still roughly the size of the FTX fraud (though more like 10x larger if you count it over the whole last decade).
To be clear, there is all kinds of stuff going on in the world that is bad, but in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones (though I totally agree there are probably other ones; by their nature it's hard for me to say how many).
(In any case, I changed the word "conspiracy" here to "fraud" which I think gets the same point across, and my guess is we all agree that FTX is among the biggest frauds of the last decade)
There are over 100 companies globally with a market cap of more than $100 billion. If we're indexing on the $10 billion figure, these companies could have a bigger financial impact by doing "conspiracy-type" things that swung their value by <10%. How many of them have actually done that? No idea, but "dozens" doesn't seem implausible (especially when we note that many of them are based in authoritarian countries).
Re NSA: measuring the impact of the NSA in terms of inputs is misleading. The problem is that they're doing very highly-leveraged things like inserting backdoors into software, etc. That's true of politics more generally. It's very easy for politicians to insert clauses into bills that have >$10 billion of impact. How often are the negotiations leading up to that "conspiratorial"? Again, very hard to know.
in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones
This genuinely seems bizarre to me. A quick quote I found from googling:
The United Nations estimated in a 2011 report that worldwide proceeds from drug trafficking and other transnational organized crime were equivalent to 1.5 percent of global GDP, or $870 billion in 2009.
That's something like 100 FTXs per year; we mostly just don't see them. Basically I think that you're conflating legibility with impact. I agree FTX is one of the most legible ways in which people were defrauded this century; I also think it's a tiny blip on the scale of the world as a whole. (Of course, that doesn't make it okay by any means; it was clearly a big fuck-up, there's a lot we can and should learn from it, and a lot of people who were hurt.)
It sure does seem like there are definitional issues here. I do agree that drug trade and similar things bring the economic effects of conspiracy-type things up a lot, and I hadn't considered those, and agree that if you count things in that reference class FTX is a tiny blip.
I think given that, I basically agree with you that FTX isn't that close to one of the biggest conspiracies of the last decade. I do think it's at the top of frauds in the last decade, though that's a narrower category.
I do think it's at the top of frauds in the last decade, though that's a narrower category.
Nikola went from a peak market cap of $66B to ~$1B today, vs. FTX which went from ~$32B to [some unknown but non-negative number].
I also think the Forex scandal counts as bigger (as one reference point, banks paid >$10B in fines), although I'm not exactly sure how one should define the "size" of fraud.
I wouldn't be surprised if there's some precise category in which FTX is the top, but my guess is that you have to define that category fairly precisely.
Wikipedia says "the monetary losses caused by manipulation of the forex market were estimated to represent $11.5 billion per year for Britain’s 20.7 million pension holders alone" which, if anywhere close to true, would make this way bigger than FTX, but I think the methodology behind that number is just guessing that market manipulation made foreign-exchange x% less efficient, and then multiplying through by x%, which isn't a terrible methodology but also isn't super rigorous.
I wasn't intending to say "the literal biggest", though I think it's a decent candidate for the literal biggest. Depending on your definitions I agree things like Nikola or Forex could come out on top. I think it's hard to define things in a way so that it isn't in the top 5.
I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one.
Agree. Most people will naturally buy AGI Safety if they really believe in AGI. No AGI->AGI is the hard part, not AGI->AGI Safety.
I agree with all of these (except that I never felt worried about being quoted by Conjecture).
This is looking increasingly prescient.
[Edit to add context]
Not saying this is happening now, but after the board decisions at OpenAI, I could imagine more people taking notice. Hopefully the sentiment then will just be open discourse and acknowledging that there's now this interesting ideology besides partisan politics and other kinds of lobbying/influence-seeking that are already commonplace. But to get there, I think it's plausible that EA has some communications- and maybe trust-building work to do.
Just for the record, if the current board thing turns out to be something like a play of power from EAs in AI Safety trying to end up more in control (by e.g. planning to facilitate a merger or a much closer collaboration with Anthropic), and the accusations of lying to the board turn out to be a nothing-burger, then I would consider this a very central example of the kind of political play I was worried would happen (and indeed involved Helen, who is one of the top EA DC people).
Correspondingly I assign decently high (20-25%) probability to that indeed being what happened, in which case I would really like the people involved to be held accountable (and for us to please stop the current set of strategies that people are running that give rise to this kind of thing).
As you'd probably agree, it's plausible that Sutskever was able to convince the board about specific concerns based on his understanding of the technology (risk levels and timelines) or his day-to-day experience at OpenAI and direct interactions with Sam Altman. If that's what happened, then it wouldn't be fair to say that any EA-minded board members just acted in an ideologically driven way. (Worth pointing out for people who don't know this that Sutskever has no ties to EA; it just seems like he shares concerns about the dangers from AI.)
But let's assume that it comes out that EA board members played a really significant role or were even thinking about something like this before Sutskever brought up concerns. "Play of power" evokes connotations of opportunism and there being no legitimacy for the decision other than that the board thought they could get away with it. This sort of concern you're describing would worry me a whole lot more if OpenAI had a typical board and corporate structure.
However, since they have a legal structure and mission that emphasizes benefitting humanity as a whole and not shareholders, I'd say situations like the one here are (in theory) exactly why the board was set up that way. The board's primary task is overseeing the CEO. To achieve OpenAI's mission, the CEO needs to have the type of personality and thinking habits that make him likely to converge toward whatever the best-informed views are about AI risks (and benefits) and how to mitigate (and actualize) them. The CEO shouldn't be someone who is unlikely to engage in the sort of cognition that one would perform if one cared greatly about long-run outcomes rather than near-term status and took seriously the chance of being wrong about one's AI risk and timeline assumptions.
Regardless of what's actually true about Altman, it seems like the board came to a negative conclusion about his suitability. In terms of how they made this update, we can envision different scenarios; some of them would seem unfair to Altman and "ideology-driven" in a sinister way, while others would seem legitimate. (The following scenarios will take for granted that what happened had elements of an "AI safety coup," as opposed to a "Sutskever coup" or "something else entirely." Again, I'm not saying that any of this is confirmed; I'm just going with the hypothesis where the EA involvement has the most potential for controversy.) So, here are three variants of how the board could have updated that Altman is not suitable for the mission:
(1) The responsible board members (could just be a subset of the ones that voted against Altman rather than all four of them) never gave him much of a chance. They learned that Altman is less concerned about AI notkilleveryoneism than they would've liked, so they took an opportunity to try to oust him. (This is bad because it's ideology-driven rather than truth-seeking.)
(2) The responsible board members did give Altman a chance initially, but he deceived them in a smoking-gun-type breach of trust.
(3) The responsible board members did give Altman a chance initially, but they became increasingly disillusioned through a more insincere-vibes-based and gradual erosion of trust, perhaps accompanied by disappointments from empty promises/assurances about, e.g., taking safety testing more seriously for future models, avoiding racing dynamics/avoiding giving out too much info on how to speed up AI through commercialization/rollouts, etc. (I'm only speculating here with the examples I'm giving, but the point is that if the board is unusually active about looking into stuff, it's conceivable that they maybe-justifiably reached this sort of update even without any smoking-gun-type breach of trust.)
Needless to say, (1) would be very bad board behavior and would put EA in a bad light. (2) would be standard stuff about what boards are there for, but seems somewhat unlikely to have happened here, given that the board hasn't been able to easily give the public more info about what Altman did wrong (as well as the impression I get that they don't hold much leverage in the negotiations now). (3) seems most likely to me, and it is also the hardest to judge, because lots of different specifics can fall under (3). (3) requires an unusually "active/observant" board. This isn't necessarily bad. I basically want to flag that I see lots of (3)-type scenarios where the board acted with integrity and courage, but also (admittedly) probably displayed some inexperience by not preparing for the power struggle that results after a decision like this, and by (possibly?) massively mishandling communications, using wording that may perfectly describe what happened when taken literally, but is very misleading when read by the norms under which parting-ways announcements are normally written in very tactful corporate speak. (See also Eliezer's comment here.)
Alternatively, it's also possible that a (3)-type scenario happened, but the specific incremental updates were uncharitable towards Altman due to being tempted by "staging a coup," or stuff like that. It gets messy when you have to evaluate someone's leadership fit where they have a bunch of uncontested talents but also some orange flags and you have to decide what sort of strengths or weaknesses are most essential for the mission.
For me the key variable is whether they took a decision that would have put someone substantially socially closer to them in charge, with some veneer of safety motivation, but where the ultimate variance in their decision would counterfactually be driven by social proximity and pre-existing alliances.
A concrete instance of this would be if the plan with the firing was to facilitate some merger with Anthropic, or to install someone like Dario as the new CEO, someone to whom the board members (who were chosen by Holden) have a much tighter relationship.
Clarification/history question: How were these board members chosen?
My current model is that Holden chose them. Tasha in 2018, Helen in 2021 when he left and chose Helen as his successor board member.
I don't know, but I think it was close to a unilateral decision from his side (like, I don't think anyone at OpenAI had much reason to trust Helen outside of Holden's endorsement, so my guess is he had a lot of leeway).
Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?
I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).
Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?
My model is that this was a mixture of a reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) and just actually financial ($30M was a substantial amount of money at that point in time).
Sam Altman has many times said he quite respects Holden, so that made up a large fraction of the variance. See e.g. this tweet:
(i used to be annoyed at being the villain of the EAs until i met their heroes*, and now i'm lowkey proud of it *there are a few EA heroes i think are really great, eg Holden)
[...] reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) [...]
Yes, I think "reputational trade," i.e., something that's beneficial for both parties, is an important part of the story that the media hasn't really picked up on. EAs were focused on the dangers and benefits from AI way before anyone else, so it carries quite some weight when EA opinion leaders put an implicit seal of approval on the new AI company.
There's a tension between (1) previously having held back on natural-seeming criticism of OpenAI ("putting the world at risk for profits" or "they plan on wielding this immense power of building god/single-handedly starting something bigger than the next Industrial Revolution/making all jobs obsolete and solving all major problems") because they have the seal of approval from this public good, non-profit, beneficial-mission-focused board structure,
(2) being outraged when this board structure does something that it was arguably intended to do (at least under some circumstances).
(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those latter points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)
Almost all the outrage I am seeing is about how this firing was conducted. I think if the board had a proper report ready that outlined why they think OpenAI was acting recklessly, and if they had properly consulted with relevant stakeholders before doing this, I think the public reaction would be very different.
I agree there are also some random people on the internet who are angry about the board taking any action even though the company is going well in financial terms, but most of the well-informed and reasonable people I've seen are concerned about the way this was rushed and how the initial post seemed to pretty clearly imply that Sam had done some pretty serious deception, without anything to back that up with.
Okay, that's fair.
FWIW, I think it's likely that they thought about this decision for quite some time and systematically; the initial announcement did mention something about a "deliberative review process by the board." But yeah, we don't get to see any of what they thought about or who (if anyone) they consulted for gathering further evidence or for verifying claims by Sutskever. Unfortunately, we don't know yet. And I concede that given the little info we have, it takes charitable priors to end up with "my view." (I put it in quotation marks because it's not like I have more than 50% confidence in it. Mostly, I want to flag that this view is still very much on the table.)
Also, on the part about "imply that Sam had done some pretty serious deception, without anything to back that up with": I'm >75% confident that either Eliezer nailed it in this tweet, or they actually have evidence about something pretty serious but decided not to disclose it for reasons that have to do with the nature of the thing that happened. (I guess the third option is that they self-deceived into thinking their reasons to fire Altman would seem serious/compelling [or at least defensible] to everyone to whom they gave more info, when in fact the reasoning is more subtle/subjective/depends on additional assumptions that many others wouldn't share. This could then have become apparent to them when they had to explain their reasoning to OpenAI staff later on, and they aborted the attempt in the middle of it when they noticed it wasn't landing well, leaving the other party confused. I don't think that would necessarily imply anything bad about the board members' character, though it is worth noting that if someone self-deceives in that way too strongly or too often, it makes for a common malefactor pattern, and obviously it wouldn't reflect well on their judgment in this specific instance.
One reason I consider this hypothesis less likely than the others is that it's rare for several people (the four board members) to all make the same mistake about whether their reasoning will seem compelling to others, and for none of them to realize that it's better to err on the side of caution and instead say something like "we noticed we have strong differences in vision with Sam Altman," or something like that.)
My current model is that this is unlikely to have been planned long in advance. For example, for unrelated reasons I was planning to have a call with Helen last week, and she proposed a meeting time of last Thursday (when I responded with my availability for Thursday early in the week, she did not respond). She then did not actually schedule the final meeting time and didn't respond to my last email, but this makes me think that at least early in the week, she did not expect to be busy on Thursday.
There are also some other people who I feel like I would expect to know about this if it had been planned who have been expressing their confusion and bafflement at what is going on on Twitter and various Slacks I am in. I think if this was planned, it was planned as a background thing, and then came to a head suddenly, with maybe 1-2 days' notice, but it doesn't seem like more.
Adversarialness, honesty, attribution; re: how to talk in DC
I love what y'all said about this, found it a pleasure to read, and want to share some of my own thoughts, some echoing things you already said.
Let's not treat society as an adversary, rather let's be collaborators/allies and even leaders, helping and improving society and its truth-seeking processes. That doesn't mean we shouldn't have any private thoughts or plans. It does mean society gets to know who we are, who's behind what, and what we're generally up to and aiming for. Hiding attribution and intentions is IMO a way of playing into the adversarial/polarized/worst parts of our society's way of being and doing, and I agree with Oliver that doing so will likely come back to undermine us and what we care about. If we act like a victim/adversary wrt society, it won't work, including because society will see us that way. Let's instead meet society with the respect we want to see in the world, and ask it to step up and do the same for us. Let's pursue plans and intentions that we're happy standing in and being seen in.
I have only very limited experience in DC-type conversations, but my sense is there are ways of sharing your real thing, while being cooperative, which likely don't lead to dismissal and robustly don't lead to poisoning the well. Here's perhaps the start of one, which could be made more robust with some workshopping: 1) share your polaris (existential stakes) in a way that they could feasibly understand given where they're at; 2) share your proposals and how you see those as aligned with other near-term AI considerations like the ones they might have; 3) actually listen to and respect the opinions of the people you're talking with, and be willing to go into their frame, remembering that it's not your job to convince or persuade them. (Thinking it's your job to convince or persuade them is probably the main/upstream mistake folks make?) Those things seem to me to likely belong in nearly every conversation. Should you include your "mood" that things in fact are very dire? There's not a strategic/correct answer to this, because strategy/correctness is not what mood is for/about. Share your truth in a way that you feel serves mutual understanding. Share your feelings in a way that you feel serves mutual relating.
This seems like a bad idea.
Transparency is important, but ideally, we would find ways to increase this without blowing up a bunch of trust within the community. I guess I'd question whether this is really the bottleneck in terms of transparency/public trust.
I'm worried that as a response to FTX we might end up turning this into a much more adversarial space.
They often do things of the form "leaving out info, knowing this has misleading effects"
On that, here are a few examples of Conjecture leaving out info in what I think is a misleading way.
(Context: Control AI is an advocacy group, launched and run by Conjecture folks, that is opposing RSPs. I do not want to discuss the substance of Control AI’s arguments -- nor whether RSPs are in fact good or bad, on which question I don’t have a settled view -- but rather what I see as somewhat deceptive rhetoric.)
One, Control AI’s X account features a banner image with a picture of Dario Amodei (“CEO of Anthropic, $2.8 billion raised”) saying, “There’s a one in four chance AI causes human extinction.” That is misleading. What Dario Amodei has said is, “My chance that something goes really quite catastrophically wrong on the scale of human civilisation might be somewhere between 10-25%.” I understand that it is hard to communicate uncertainty in advocacy, but I think it would at least have been more virtuous to use the middle of that range (“one in six chance”), and to refer to “global catastrophe” or something rather than “human extinction”.
Two, Control AI writes that RSPs like Anthropic’s “contain wording allowing companies to opt-out of any safety agreements if they deem that another AI company may beat them in their race to create godlike AI”. I think that, too, is misleading. The closest thing Anthropic’s RSP says is:
However, in a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped (and where AI itself is helpful in such defense), we could envisage a substantial loosening of these restrictions as an emergency response. Such action would only be taken in consultation with governmental authorities, and the compelling case for it would be presented publicly to the extent possible.
Anthropic’s RSP is clearly only meant to permit labs to opt out when any other outcome very likely leads to doom, and for this to be coordinated with the government, with at least some degree of transparency. The scenario is not “DeepMind is beating us to AGI, so we can unilaterally set aside our RSP”, but more like “North Korea is beating us to AGI, so we must cooperatively set aside our RSP”.
Relatedly, Control AI writes that, with RSPs, companies “can decide freely at what point they might be falling behind – and then they alone can choose to ignore the already weak” RSPs. But part of the idea with RSPs is that they are a stepping stone to national or international policy enforced by governments. For example, prior to the Control AI campaign, ARC and Anthropic both explicitly said that they hope RSPs will be turned into standards/regulation. (That seems quite plausible to me as a theory of change.) Also, Anthropic commits to only updating its RSP in consultation with its Long-Term Benefit Trust (consisting of five people without any financial interest in Anthropic), which may or may not work well, but seems sufficiently different from Anthropic being able to “decide freely” when to ignore its RSP that I think Control AI’s characterisation is misleading. Again, I don't want to discuss the merits of RSPs; I just think Control AI is misrepresenting Anthropic's and others' positions.
Three, Control AI seems to say that Anthropic’s advocacy for RSPs is an instance of safetywashing and regulatory capture. (Connor Leahy: “The primary aim of responsible scaling is to provide a framework which looks like something was done so that politicians can go home and say: ‘We have done something.’ But the actual policy is nothing.” And also: “The AI companies in particular and other organisations around them are trying to capture the summit, lock in a status quo of an unregulated race to disaster.”) I don’t know exactly what Anthropic’s goals are -- I would guess that its leadership is driven by a complex mixture of motivations -- but I doubt it is so clear-cut as Leahy makes it out to be.
To be clear, I think Conjecture has good intentions, and wants the whole AI thing to go well. I am rooting for its safety work and looking forward to seeing updates on CoEm. And again, I personally do not have a settled view on whether RSPs like Anthropic’s are in fact good or bad, or on whether it is good or bad to advocate for them – it could well be that RSPs turn out to be toothless, and would displace better policy – I only take issue with the rhetoric.
(Disclosure: Open Philanthropy funds the organisation I work for, though the above represents only my views, not my employer’s.)
I'm surprised to hear they're posting updates about CoEm.
At a conference held by Connor Leahy, I said that I thought it was very unlikely to work and asked why they were interested in this research area, and he answered that they were not seriously invested in it. We didn't develop the topic and it was several months ago, so it's possible that (1) I misremember, (2) they changed their minds, or (3) I appeared adversarial and he didn't feel like debating CoEm. (For example, maybe he actually said that CoEm didn't look promising and this changed recently?) Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.
Context: I came into AI safety (professionally) by way of EA and will be in a policy role in DC next year.
In light of what is looking like a serious blunder from the OpenAI board and the backlash we’ve seen so far from important players in the AI world, I feel extremely concerned about the fact that I am affiliated with EA organizations and (at least in the moment) deeply regret this affiliation. I feel like my ability to be taken seriously will be hurt, and it will possibly also hurt/silo my career long-term.
Time will tell if any of this is true. This comment is just my emotions/fears, but this seems like an appropriate place to express this.
Adding a datapoint here: I've been involved in the Control AI campaign, which was run by Andrea Miotti (who also works at Conjecture). Before joining, I had heard some integrity/honesty concerns about Conjecture. So when I decided to join, I decided to be on the lookout for any instances of lying/deception/misleadingness/poor integrity. (Sidenote: At the time, I was also wondering whether Control AI was just essentially a vessel to do Conjecture's bidding. I have updated against this– Control AI reflects Andrea's vision. My impression is that Conjecture folks other than Andrea have basically no influence over what Control AI does, unless they convince Andrea to do something.)
I've been impressed by Andrea's integrity and honesty. I was worried that the campaign might have some sort of "how do we win, even if it misleads people" vibe (in which case I would've protested or left), but there was constantly a strong sense of "are we saying things that are true? Are we saying things that we actually believe? Are we communicating clearly?" I was especially impressed given the high volume of content (it is especially hard to avoid saying untrue/misleading things when you are putting out a lot of content at a fast pace.)
In contrast, integrity/honesty/openness norms feel much less strong in DC. When I was in DC, I think it was fairly common to see people "withhold information for strategic purposes", "present a misleading frame (intentionally)", "focus on saying things you think the other person will want to hear", or "decide not to talk at all because sharing beliefs in general could be bad." It's plausible to me that these are "the highest EV move" in some cases, but if we're focusing on honesty/integrity/openness, I think DC scored much worse. (See also Olivia's missing mood point).
The Bay Area scores well on honesty/integrity IMO, but has its own problems, especially with groupthink/conformity/curiosity-killing. I think the Bay Area tends to do well on honesty/integrity norms (relative to other spaces), but I think these norms are enforced in a way that comes with important tradeoffs. For instance, I think the Bay Area tends to punish people for saying things that are imprecise or "seem dumb", which leads to a lot of groupthink/conformity and a lot of "people just withholding their beliefs so that they don't accidentally say something incorrect and get judged for it." Also, I think high-status people in the Bay Area are often able to "get away with" low openness/transparency/clarity. There are lots of cases where people are like "I believe X because Paul believes X" and then when asked "why does Paul believe X" they're like "idk". This seems like an axis separate from honesty/integrity, but it still leads to pretty icky epistemic discourse.
(This isn't to say that people shouldn't criticize Conjecture– but I think there's a sad thing that happens where it starts to feel like both "sides" are just trying to criticize each other. My current position is much closer to something like "each of these communities has some relative strengths and weaknesses, and each of them has at least 1-3 critical flaws". Whereas in the status quo I think these discussions sometimes end up feeling like members of tribe A calling tribe B low integrity and then tribe B firing back by saying tribe A is actually low integrity in an even worse way.)
This is probably somewhat controversial, but I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case that there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way. Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI".
I like that you imagine conversations like that in your head and that they sometimes go well there!
Seems important to select the right journalist if someone were to try this. I feel like the journalist would have to be sympathetic already or at least be a very reasonable and fair-minded person. Unfortunately, some journalists cannot think straight for the life of them and only make jumpy, shallow associations like "seeks influence, so surely this person must be selfish and greedy." I haven't read the Politico article yet, but given that "altruism" is literally part of the name "EA," I wonder why it needs to be said "and you take seriously the hypothesis that we really aren't doing this to profit off of AI." If a journalist is worth his or her salt, and they write about a movement called EA, shouldn't a bunch of their attention go into the question of why/whether some of these people might be genuine? And if the article takes a different spin and they never even consider that question, doesn't it suggest something is off? (Again, I haven't read the article yet – maybe they do consider it or at least leave it open.)
Here are some of my current principles around this issue.
1) It's fine to contribute to work on policies that you think are good-to-neutral that aren't your core motivation, just so you can get involved with the thing that is your core motivation.
"I am happy to be here helping the Senator achieve an agreement with foreign country Fiction-land on export taxes. I am not here because I personally care a great deal about export taxes (they seem fine to me), but because I want to build a strong personal relationship with representatives of Fiction-land, and I'm happy to help this appears-good-to-me agreement get made to do so."
2) It's not okay to contribute to work on policies that you think are harmful for the world, just so you can get involved with the thing that is your core motivation.
"I am here helping coordinate a ban on building houses in San Francisco which I expect will contribute to homelessness and substantially damage the economic growth of the city, because I would like to build a relationship with the city governance."
3) It's never okay to misrepresent what you believe.
"I am here primarily because I personally care about this policy a great deal" vs "I am here because the policy seems reasonable to me and my good ally John Smith, who I believe honestly cares about it and thinks it will make people's lives better, has asked me to help him get it enacted."
4) I think it's okay to spend time talking about things that aren't your favorite thing.
"Let's talk through how we could get this policy passed that is like my 20th favorite thing" or "Let's spend a few hours figuring out how to achieve this goal that the head of our office wants even though I am not personally invested in it."
5) But you should answer honestly about your intentions.
"I'm here because I'd like to prevent us all going extinct from AI, but while I'm building up a career and reputation, in the meantime I'm happy to improve the government's understanding of what's even happening, or improve its communications with companies, or join in on what we're all working on here."
I would have guessed that this is just not the level of trust people operate at. Like, for most things in policy people don't really act like their opposition is in good faith, so there's not much to lose here. (weakly held)
A claim I've heard habryka make before (I can't verify it myself) is that there are actual rules to the kind of vague deception that goes on in DC. And something like: while it's a known thing that a politician will say "we're doing policy X" when they don't end up doing policy X, if you misrepresent who you're affiliated with, this is an actual norm violation. (i.e. it's lying at the Simulacrum 3 level, which is the primary level in DC)
My guess is we maybe could have also done that at least a year earlier. Honestly, given the traction we had in 2015 on a lot of this stuff, with Bill Gates and Elon Musk and Demis, I think there is a decent chance we could have also done a lot of Overton window shifting back then, and us not having done so is, I think, downstream of a strategy that wanted to maintain lots of social capital with the AI capability companies and random people in governments who would be weirded out by people saying things outside of the Overton window.
Though again, this is just one story, and I also have other stories where it all depended on ChatGPT and GPT-4 and before then you would have been laughed out of the room if you had brought up any of this stuff (though I do really think the 2015 Superintelligence stuff is decent evidence against that). It's also plausible to me that you need a balance of inside and outside game stuff, and that we've struck a decent balance, and that yeah, having inside and outside game means there will be conflict between the different people involved in the different games, but it's ultimately the right call in the end.
I really want an analysis of this. The alignment and rationality communities were wrong about how tractable getting public & governmental buy-in to AI x-risk would be. But what exactly was the failure? That seems quite important to knowing how to alter decision making and to prevent future failures to grab low-hanging fruit.
I tried writing a fault analysis myself, but I couldn't make much progress, and it seems like you have more detailed models than I do. So someone other than me is probably the right person for this.
That said, the dialogues on AI governance and outreach are providing some of what I'm looking for here, and seem useful to anyone who does want to write an analysis. So thank you to everyone who's discussing these topics in public.
"AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1).
Not sure where this slots in, but there's also a sense in which this contains a missing positive mood about how unbelievably good (aligned) AI could or will be, and how much we're losing by not having it earlier.
On the meta level, for unfalsifiable claims (and even falsifiable claims that would take more effort to verify than a normal independent 3rd party adult could spend in, say, a month) it doesn't really seem to matter whether the person pushing the claims has integrity, beyond a very low bar?
They just need to have enough integrity to pass the threshold of not being some maniac/murderer/persistent troll/etc.
But otherwise, beyond that threshold, there doesn't seem to be much of a downside in treating folks politely while always assuming there's some hidden motivation going on behind the scenes.
And for the falsifiable claims that have reasonable prospects for independent 3rd party verification, assigning credibility, trust, integrity, etc., based on the author's track record of such claims proving to be true is more than sufficient for discussions, without regard for whatever hidden motivations there might be.
Maybe this is not sufficient for the community organization/emotional support/etc. side of things, though you'd be the better judge of that.