Glad to hear that!
I do feel excited about this being used as a sort of "201 level" overview of AI strategy and what work it might be useful to do. And I'm aware of the report being included in the reading lists / curricula for two training programs for people getting into AI governance or related work, which was gratifying.
Unfortunately we did this survey before ChatGPT and various other events since then, which have majorly changed the landscape of AI governance work to be done, e.g. opening various policy windows. So I imagine people reading this report today may feel it has some odd omissions / vibes. But I still think it serves as a good 201 level overview despite that. Perhaps we'll run a followup in a year or two to provide an updated version.
I'd consider those to be "in-scope" for the database, so the database would include any such estimates that I was aware of and that weren't too private to share in the database.
If I recall correctly, some estimates in the database are decently related to that, e.g. are framed as "What % of the total possible moral value of the future will be realized?" or "What % of the total possible moral value of the future is lost in expectation due to AI risk?"
But I haven't seen many estimates of that type, and I don't remember seeing any that were explicitly framed as "What fraction of the accessible universe's resources will be used in a way optimized for 'the correct moral theory'?"
If you know of some, feel free to comment in the database to suggest they be added :)
...and while I hopefully have your attention: My team is currently hiring for a Research Manager! If you might be interested in managing one or more researchers working on a diverse set of issues relevant to mitigating extreme risks from the development and deployment of AI, please check out the job ad!
The application form should take <2 hours. The deadline is the end of the day on March 21. The role is remote and we're able to hire in most countries.
People with a wide range of backgrounds could turn out to be the best fit for the role. As such, if you're interested, please don't rule yourself out due to thinking you're not qualified unless you at least read the job ad first!
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I've read):
We've definitely done a significant amount of this kind of work, but I think we've often (a) deliberately held back on doing so or on conveying key parts of the arguments, due to reasonable downside risk concerns, and (b) not prioritized this. And I think there's significantly more we could do if we wanted to, especially after a period of actively building capacity for this.
Important caveats / wet blankets:
All I really want to convey in this comment is what I said in my first paragraph: we may be able to significantly push beliefs and opinions in favorable directions relative to where they are now or would be in future by default.
Due to time constraints, I'll just point to this vague overview.
Personally I haven't thought about how strong the analogy to GoF is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that's indeed the case, e.g. some of the examples here.
In general, at least for important questions worth spending time on, it seems very weird to say "You think X will happen, but we should be very confident it won't because in analogous case Y it didn't", without also either (a) checking for other analogous cases or other lines of argument or (b) providing an argument for why this one case is far more relevant evidence than any other available evidence. I do think it totally makes sense to flag the analogous case and to update in light of it, but stopping there and walking away feeling confident in the answer seems very weird.
I haven't read any of the relevant threads in detail, so perhaps the arguments made are stronger than I imply here, but my guess is that they aren't. And it seems to me that it's unfortunately decently common for AI risk discussions on LessWrong to involve this pattern I'm sketching here.
(To be clear, all I'm arguing here is that these arguments often seem weak, not that their conclusions are false.)
(This comment is raising an additional point to Jan's, not disagreeing.)
Update: Oh, I just saw that Steve Byrnes also wrote the following in this thread, which I totally agree with:
"[Maybe one could argue] “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)"
(Disclaimer: I only skimmed this post, having landed here from Habryka's comment on "It could be useful if someone ran a copyediting service". Apologies if these questions are answered already in the post.)
Thanks for this post! This seems like good advice to me.
I made an Anki card on your three "principles that stand out" so I can retain those ideas. (Mainly for potentially suggesting to people I manage or other people I know - I think I already have roughly the sort of mindset this post encourages, but I think many people don't and that me suggesting these techniques sometimes could be helpful.)
It's not sufficient to argue that taking over the world will improve prediction accuracy. You also need to argue that during the training process (in which taking over the world wasn't possible), the agent acquired a set of motivations and skills which will later lead it to take over the world. And I think that depends a lot on the training process.
[...] if during training the agent is asked questions about the internet, but has no ability to edit the internet, then maybe it will have the goal of "predicting the world", but maybe it will have the goal of "understanding the world". The former incentivises control, the latter doesn't.
I agree with your key claim that it's not obvious/guaranteed that an AI system that has faced some selection pressure in favour of predicting/understanding the world accurately would then want to take over the world. I also think I agree that a goal of "understanding the world" is a somewhat less dangerous goal in this context than a goal of "predicting the world". But it seems to me that a goal of "understanding the world" could still be dangerous for basically the same reason as why "predicting the world" could be dangerous. Namely, some world states are easier to understand than others, and some trajectories of the world are easier to maintain an accurate understanding of than others.
E.g., let's assume that the "understanding" is meant to be at a similar level of analysis to that which humans typically use (rather than e.g., being primarily focused at the level of quantum physics), and that (as in humans) the AI sees it as worse to have a faulty understanding of "the important bits" than "the rest". Given that, I think:
I'd be interested in whether you think I'm misinterpreting your statement or missing some important argument.
(Though, again, I see this just as pushback against one particular argument of yours, and I think one could make a bunch of other arguments for the key claim that was in question.)
Thanks for this series! I found it very useful and clear, and am very likely to recommend it to various people.
Minor comment: I think "latter" and "former" are the wrong way around in the following passage?
By contrast, I think the AI takeover scenarios that this report focuses on have received much more scrutiny - but still, as discussed previously, have big question marks surrounding some of the key premises. However, it’s important to distinguish the question of how likely it is that the second species argument is correct, from the question of how seriously we should take it. Often people with very different perspectives on the latter actually don’t disagree very much on the former.
(I.e., I think you probably mean that, among people who've thought seriously about the question, probability estimates vary wildly but (a) tend to be above (say) 1 percentage point of x-risk from a second species risk scenario and (b) thus tend to suffice to make those people think humanity should put a lot more resources into understanding and mitigating the risk than we currently do, rather than that people tend to wildly disagree on how much effort to put into this risk yet agree on how likely the risk is. Though I'm unsure, since I'm just guessing from context that "how seriously we should take it" means "how many resources should be spent on this issue"; in other contexts it'd mean "how likely is this to be correct" or "how big a deal is this", which people obviously disagree on a lot.)
FWIW, I feel that this entry doesn't capture all/most of how I see "meta-level" used.
Here's my attempted description, which I wrote for another purpose. Feel free to draw on it here and/or to suggest ways it could be improved.