I think that the AI safety community in general (including myself) was too pessimistic about OpenAI's strategy of gradually releasing models (COI: I work at OpenAI), and should update more on that mistake.
I agree with this! I thought it was obviously dumb, and in retrospect, I don't know.
I think that "doomers" were far too pessimistic about governance before ChatGPT (in ways that I and others predicted beforehand, e.g. in discussions with Ben and Eliezer). I think they should update harder from this mistake than they're currently doing (e.g. updating that they're too biased towards inside-view models and/or fast takeoff and/or high P(doom)).
I think it remains to be seen what the right level of pessimism was. It still seems pretty likely that we'll see not just useless, but actively catastrophically counter-productive interventions from governments in the next handful of years.

But you're absolutely right that I was generally pessimistic about policy interventions from 2018ish through to 2021 or so. My main objection was that I wasn't aware of any policies that seemed like they would help, and I was unenthusiastic about the way that EAs seemed to be optimistic about getting into positions of power without (it seemed to me) being clear with themselves that they didn't have policy ideas to implement. I felt better about people going into policy to the extent that those people had clarity for themselves: "I don't know what to recommend if I have power. I'm trying to execute one part of a two-part plan that involves getting power and then using it to advocate for x-risk-mitigating policies. I'm intentionally punting that question to my future self / hoping that other EAs thinking full time about this come up with good ideas." I think I still basically stand by this take.

My main update is that it turns out the basic idea of this post was false. There were developments that were more alarming than "this is business as usual" to a good number of people, and that really changed the landscape. One procedural update that I've made from that and similar mistakes is: "I shouldn't put as much trust in Eliezer's rhetoric about how the world works when it isn't backed up by clearly articulated models. I should treat those ideas as plausible hypotheses, and mostly be much more attentive to evidence that I can see directly."
Also, I think that this is one instance of the general EA failure mode of pursuing a plan which entails accruing more resources for EA (community building to bring in more people, marketing to bring in more money, politics to acquire power), without a clear personal inside view of what to do with those resources, effectively putting a ton of trust in the EA network to reach correct conclusions about which things help.

There are a bunch of people trusting the EA machine to 1) aim for good things and 2) have good epistemics. They trust it so much that they'll go campaign for a guy running for political office without knowing much about him, except that he's an EA. Or they route their plan for positive impact on the world through positively impacting EA itself ("I want to do mental health coaching for EAs", or "I want to build tools for EAs", or going to do ops for this AI org, which 80k recommended, despite not knowing much about what it does).

This is pretty scary, because it seems like some of those people were not worthy of trust (SBF, in particular, won a huge amount of veneration). And even in the cases of people who are, I believe, earnest geniuses, it is still pretty dangerous to mostly be deferring to them. Paul put a good deal of thought into the impacts of developing RLHF, and he thinks the overall impacts are positive. But that Paul is smart and good does not make it a foregone conclusion that his work is good on net. That's a really hard question to answer, about which I think most people should be pretty circumspect.

It seems to me that there is an army of earnest young people who want to do the most good that they can. They've been told (and believe) that AI risk is the most important problem, but it's a confusing problem, depending on technical expertise, famously fraught problems of forecasting the character of not-yet-existent technologies, and a bunch of weird philosophy.
The vast majority of those young people don't know how to make progress on the core problems of AI risk directly, or even necessarily how to identify which work is making progress. But they still want to help, so they commit themselves to e.g. community building, getting more people to join, with everyone taking social cues about what kinds of object-level things are good to do from the few people that seem to have personal traction on the problem.

This seems concerning to me. This kind of structure, where a bunch of smart young people are building a pile of resources to be controlled mostly by deference to a status hierarchy, where you figure out which thinkers are cool by picking up on the social cues of who is regarded as cool, rather than evaluating their work for yourself...well, it's not so much that I expect it to be coopted, but I just don't expect that overall agglomerated machine to be particularly steered towards the good, whatever values it professes. It doesn't have a structure that binds it particularly tightly to what's true. Better than most non-profit communities, worse than many for-profit companies, probably.

It seems more concerning to the extent that many of the object-level actions to which the EAs are funneling resources are not just useless, but actively bad. It turns out that being smart enough, as a community, to identify the most important problem in the world, but not smart enough to systematically know how to positively impact that problem, is pretty dangerous.

E.g. the core impacts of people trying to impact x-risk so far include:
- (Maybe? Partially?) causing Deepmind to exist
- (Definitely) causing OpenAI to exist
- (Definitely) causing Anthropic to exist
- Inventing RLHF and accelerating the development of RLHF'd language models
It's pretty unclear to me what the sign of these interventions is. They seem bad on the face of it, but as I've watched things develop I'm not as sure. It depends on pretty complicated questions about second- and third-order effects, and counterfactuals.

But it seems bad to have an army of earnest young people who, in the name of their do-gooding ideology, shovel resources at the decentralized machine doing these maybe-good, maybe-bad activities, because they're picking up on social cues about whom to defer to and what those people think! That doesn't seem very high EV for the world!

(To be clear, I was one of the army of earnest young people. I spent a number of years helping recruit for a secret research program—I didn't even have the most basic information, much less the expertise to assess if it was any good—because I was taking my cues from Anna, who was taking her cues from Eliezer. I did that out of a combination of 1) having read Eliezer's philosophy, and having enough philosophical grounding to be really impressed by it, and 2) being ready and willing to buy into a heroic narrative to save the world, which these people were (earnestly) offering me.)

And, procedurally, all this is made somewhat more perverse by the fact that this community, this movement, was branded as the "carefully think through our do-gooding" movement. We raised the flag of "let's do careful research and cost-benefit analysis to guide our charity", but over time this collapsed into a deferral network, with ideas about what's good to do driven mostly by the status hierarchy. Cruel irony.
If you go with journalists, I'd want to find one who seems really truth-seeking.
I think it would be a very valuable public service to the community to have someone whose job it is to read a journalist's corpus and check whether it seems fair and honest.

I think we could, as a community, have a policy of only talking with journalists who are honest. This seems like a good move pragmatically, because it means coverage of our stuff will be better on average, and it also universalizes really well, so long as "honest" doesn't morph into "agrees with us about what's important."

It seems good and cooperative to disproportionately help high-integrity journalists get sources, and it helps us directly.
And then one of my current stories is that at some point, mostly after FTX, when people were fed up with listening to some vague EA conservative consensus, a bunch of people started ignoring that advice and finally started saying things publicly (like the FLI letter, Eliezer's TIME piece, the CAIS letter, Ian Hogarth's piece). And that's the thing that's actually been moving things in the policy space.
My impression is that this was driven by developments in AI, which created enough of a shared sense that others would take the concern seriously, because they could all just see ChatGPT. And this emboldened people. They had more of a sense of tractability.

And Eliezer, in particular, went on a podcast, and it went better than he anticipated, so he decided to do more outreach.

My impression is that this has basically nothing to do with FTX?
and things like "being gay" where society seems kind of transparently unreasonable about it,
Importantly "being gay" is classed for me as "a person's personal business", sort of irrespective of whether society is reasonable about it or not. I'm inclined to give people leeway to keep private personal info that doesn't have much impact on other people.
So, I'm often tempted to mention my x risk motivations only briefly, then focus on whatever's inferentially closest and still true. (Classically, this would be "misuse risks, especially from foreign adversaries and terrorists" and "bioweapon and cyberoffensive capabilities coming in the next few years".)
One heuristic that I'm tempted to adopt and recommend is the onion test: your communications don't have to emphasize your weird beliefs, but you want them to satisfy the criterion that if your interlocutor became aware of everything you think, they would not be surprised.

This means that when I'm talking with a potential ally, I'll often mostly focus on places where we agree, while also being intentional about flagging that I have disagreements that they could double-click on if they wanted.

I'm curious whether your sense, Olivia, is that your communications (including the brief communication of x-risk) pass the onion test. And if not, I'm curious what's hard about meeting that standard. Is this a heuristic that can be made viable in contexts like DC?
I feel much less ok about ppl dissing OAI on their own blog posts on LW. I assume that if they knew ahead of time, they would have been much less likely to participate.
I find it hard to parse this second sentence. If who knew what ahead of time, they would be less likely to participate?
Most of these seem legitimate to me, modulo that instead of banning the thing you should pay for the externality you're imposing. Namely: climate change, harming wildlife, spreading contagious diseases, and risks to children's lives. Those are real externalities, either on private individuals or on whole communities (by damaging public goods). It seems completely legitimate to pay for those externalities.

The only ones that I don't buy are the religious ones, which are importantly different because they entail not merely an external cost, but a disagreement about actual cause and effect. "I agree that my trash hurts the wildlife, but I don't want to stop littering or pay to have the litter picked up" is structurally different from "God doesn't exist, and I deny the claim that my having gay sex increases risk of smiting" or "Anthropogenic climate change is fake, and I deny the claim that my pollution contributes to warming temperatures."

Which is fine. Libertarianism depends on having some shared view of reality, or at least some shared social accounting about cause and effect and which actions have which externalities, in order to work.
If there are disagreements, you need courts to rule on them, and for the rulings of the courts to be well regarded (even when people disagree with the outcome of any particular case).
we're in pretty big trouble because we'll struggle to convince others that any good alignment proposals humans come up with are worth implementing.
Proposals generated by humans might contain honest mistakes, but they're not very likely to be adversarially selected to look secure while actually being insecure. We're implicitly relying on the alignment of the human in our evaluation of human-generated alignment proposals, even if we couldn't tell the difference between safe and unsafe proposals just by inspecting them.
There's a problem of inferring the causes of sensory experience in cognition-that-does-science. (Which, in fact, also appears in the way that humans do math, and is possibly inextricable from math in general; but this is an example of the sort of deep model that says "Whoops I guess you get science from math after all", not a thing that makes science less dangerous because it's more like just math.)
To flesh this out: we train a model up to superintelligence on some theorems to prove. There's a question it might have, which is "Where are these theorems coming from? Why these ones?", and when it is a superintelligence, that won't be idle questioning. There will be lots of hints from the distribution of theorems that point to the process that selected / generated them. And it can back out, from that process, that it is an AI in training, on planet Earth (the way it might deduce General Relativity from the statics of a bent blade of grass).

Is that the basic idea?