In this competition, we (Ought) want to amplify Rohin Shah’s forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:
- Try to update Rohin’s thinking via comments (for example, comments including reasoning, distributions, and information sources). If you don’t want your comment to be considered for the competition, label it ‘aside’.
- Predict what his posterior distribution for the question will be after he has read all the comments and reasoning in this thread
The competition will close on Friday, July 31st. To participate, create your prediction on Elicit, click ‘Save Snapshot to URL,’ and post the snapshot link in a comment on this post. You can provide your reasoning in the ‘Notes’ section of Elicit or in your LessWrong comment. You should have a low bar for making predictions – they don’t have to be perfect.
Here is Rohin’s prior distribution on the question. His reasoning for the prior is in this comment. Rohin spent ~30 minutes creating this distribution based on the beliefs and evidence he already has. He will spend 2-5 hours generating a posterior distribution.
We will award two $200 prizes, in the form of Amazon gift cards:
- Most accurate prediction: We will award $200 to the most accurate prediction of Rohin’s posterior distribution submitted through an Elicit snapshot. This will be determined by estimating KL divergence between Rohin’s final distribution and others’ distributions. If you post more than one snapshot, either your most recent snapshot or the one you identify as your final submission will be evaluated.
- Update to thinking: Rohin will rate each comment from 0 to 5 depending on how much the reasoning updated his thinking. We will randomly select one comment with probability proportional to the points assigned (so a comment rated 5 would be 5 times more likely to receive the prize than a comment rated 1), and the poster of this comment will receive the $200 prize.
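For anyone who wants to see how the two prize mechanisms could work, here is a minimal Python sketch. It is an illustration under assumptions, not the scoring code we will actually run. It assumes each distribution has been discretized into the same bins, that the divergence is taken as KL(Rohin || submission), and that a small smoothing constant `eps` is acceptable. The function names and the example data are hypothetical.

```python
import math
import random


def kl_divergence(p, q, eps=1e-9):
    """Discrete KL(p || q) over matching bins.

    eps is an assumed smoothing constant that avoids log(0) and
    division by zero when a bin has no probability mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_tot, q_tot = sum(p_s), sum(q_s)
    # Renormalize after smoothing so both vectors sum to 1.
    return sum(
        (a / p_tot) * math.log((a / p_tot) / (b / q_tot))
        for a, b in zip(p_s, q_s)
    )


def most_accurate(rohin_posterior, submissions):
    """Return the name whose snapshot has the lowest divergence from Rohin's posterior."""
    return min(
        submissions,
        key=lambda name: kl_divergence(rohin_posterior, submissions[name]),
    )


def pick_comment_winner(ratings, rng=random):
    """Draw one comment with probability proportional to its 0-5 rating."""
    names = [name for name, points in ratings.items() if points > 0]
    weights = [ratings[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical example: three bins, two submissions, three rated comments.
rohin = [0.2, 0.5, 0.3]
subs = {"alice": [0.25, 0.45, 0.30], "bob": [0.6, 0.3, 0.1]}
print(most_accurate(rohin, subs))
print(pick_comment_winner({"c1": 5, "c2": 1, "c3": 0}))
```

A comment rated 0 gets weight 0 and can never be drawn, which matches the proportional rule described above.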
This project is similar in spirit to amplifying epistemic spot checks and other work on scaling up individual judgment through crowdsourcing. As in these projects, we’re hoping to learn about mechanisms for delegating reasoning, this time in the forecasting domain.
The objective is to learn whether mechanisms like this could save people like Rohin work. Rohin wants to know: What would I think if I had more evidence and knew more arguments than I currently do, but still followed the sorts of reasoning principles that I'm unlikely to revise in the course of a comment thread? In real-life applications of amplified forecasting, Rohin would evaluate the arguments in depth and form his own posterior distribution only 1 out of 10 times. The other 9 out of 10 times, he’d just skim the key arguments and adopt the predicted posterior as his new view.
The question is: When will a majority of AGI researchers agree with safety concerns?
Suppose that every year I (Rohin) talk to every top AI researcher about safety (I'm not explaining safety, I'm simply getting their beliefs, perhaps guiding the conversation to the safety concerns in the alignment community). After talking to X, I evaluate:
- (Yes / No) Is X's work related to AGI? (AGI safety counts)
- (Yes / No) Does X broadly understand the main concerns of the safety community?
- (Yes / No) Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, *, *) (i.e. the proportion of AGI-related top researchers who are aware of safety concerns and think we shouldn't build superintelligent AGI before solving them). In how many years will this fraction be >= 0.5?
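To make the counting concrete, here is a small Python sketch of the fraction. The data layout (one tuple of three booleans per researcher, in the order of the three questions above) and the function name are assumptions for illustration only.

```python
def agreement_fraction(answers):
    """Compute #answers(Yes, Yes, Yes) / #answers(Yes, *, *).

    answers: list of (works_on_agi, understands_concerns, agrees) booleans,
    one tuple per researcher.
    """
    agi_related = [a for a in answers if a[0]]
    if not agi_related:
        raise ValueError("no AGI-related researchers in the sample")
    all_yes = sum(
        1 for _, understands, agrees in agi_related if understands and agrees
    )
    return all_yes / len(agi_related)


# Hypothetical sample of four researchers.
sample = [
    (True, True, True),    # counts in numerator and denominator
    (True, True, False),   # denominator only
    (True, False, True),   # denominator only: agreement without understanding
    (False, True, True),   # excluded: work is not AGI-related
]
print(agreement_fraction(sample))
```

Note that a researcher who answers (Yes, No, Yes) stays in the denominator but does not count toward the numerator, so agreeing without understanding the concerns does not move the fraction up.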
For reference, if I were to run this evaluation now, I would be looking for an understanding of reward gaming, instrumental convergence, and the challenges of value learning, but would not be looking for an understanding of wireheading (because I'm not convinced it's a problem we need to worry about) or inner alignment (because the safety community hasn't converged on the importance of inner alignment).
We'll define the set of top AI researchers somewhat arbitrarily as the top 1000 AI researchers in industry by salary and the top 1000 AI researchers in academia by citation count.
If the fraction never reaches 0.5 (e.g. before it does, we build superintelligent AGI and it kills us all, or it is perfectly benevolent and everyone realizes there weren't any safety concerns), the question resolves as >2100.
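The resolution rule can be sketched as follows. This assumes the yearly fractions are available as a mapping from calendar year to fraction, uses the >= 0.5 threshold from the question statement, and treats any crossing after 2100 the same as never crossing. The function name and the sample numbers are hypothetical.

```python
def resolve(fraction_by_year, threshold=0.5, horizon=2100):
    """Return the first year the fraction reaches the threshold, else '>2100'."""
    for year in sorted(fraction_by_year):
        if year > horizon:
            break
        if fraction_by_year[year] >= threshold:
            return str(year)
    # Never reached within the horizon (assumed to cover later crossings too).
    return f">{horizon}"


print(resolve({2030: 0.2, 2040: 0.35, 2050: 0.5}))
print(resolve({2030: 0.2, 2040: 0.35}))
```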
Interpret this reasonably (e.g. a comment to the effect of "your survey will annoy everyone and so they'll be against safety" will be ignored even if true, because it's overfitting to the specific counterfactual survey proposed here and is clearly irrelevant to the spirit of the question).
Rohin Shah is an AI Safety researcher at the Center for Human-Compatible AI (CHAI). He also publishes the Alignment Newsletter. Here is a link to his website where you can find more information about his research and views.
You are welcome to share a snapshot distribution of your own beliefs, but make sure to specify that the snapshot contains your own beliefs and not your prediction of Rohin’s beliefs (snapshots of your own beliefs will not be evaluated for the competition).