So your question is whether (with added newline and capitalization for clarity):
any dissenting views from "AI in median >30 years" and "utter AI ruin <10%" (as expressed in the correct directions of shorter timelines and worse ruin chances; and as said before the ChatGPT moment), were permitted to exercise decision-making power over the flow of substantial amounts of funding;
OR if the weight of reputation and publicity of OpenPhil was at any point put behind promoting those dissenting viewpoints
Re the first part:
Open Phil's decisions were strongly affected by whether they looked good according to worldviews on which "utter AI ruin" is >10% or timelines are <30 years. Many staff believed at the time that worlds with shorter timelines and higher misalignment risk were more tractable to intervene on, and so put additional focus on interventions targeting those worlds; many also believed that risk was >10% and that the median timeline was <30 years. I'm not really sure how to operationalize this, but my sense is that the majority of their AI-safety-related funding was targeted at scenarios with misalignment risk above 10% and timelines shorter than 30 years.
As an example, see Some Background on Our Views Regarding Advanced Artificial Intelligence (2016), where Holden says that his belief that P(AGI before 2036) is above 10% "is important to my stance on the importance of potential risks from advanced artificial intelligence. If I did not hold it, this cause would probably still be a focus area of the Open Philanthropy Project, but holding this view is important to prioritize the cause as highly as we’re planning to." So he's clearly saying that the grantmaking strategy is strongly affected by wanting to target the sub-20-year timelines.
I'm not sure how to translate this into the language you use. Among other issues, it's a little weird to talk about the relative influence of different credences over hypotheses, rather than the relative influence of the hypotheses themselves. The "risk is >10% and timelines are <30 years" hypotheses had a lot of influence, but that could have been true even if all the relevant staff had believed that risk is <10% and timelines are >30 years (so long as they'd also believed that the shorter-timeline, higher-risk worlds were particularly leveraged to intervene on, as many of them in fact do).
Lots of decisions were made that would not have been made under the decision procedure "do whatever's best assuming AGI is more than 30 years away and risk is under 10%"; I think that decision procedure would have massively changed the AI safety work Open Phil did.
I think this suffices to contradict your description of the situation: they explicitly made many of their decisions based on the possibility of shorter timelines than you described. I haven't presented evidence here that something similar is true of their assessment of misalignment risk, but I believe that to be the case as well.
If I persuaded you of the claims I wrote here (only some of which I backed up with evidence), would that be relevant to your overall stance?
All of this is made more complicated by the fact that Open Phil obviously is and was a large organization with many staff and other stakeholders, who believed different things and had different approaches to translating beliefs into decisions, and who have changed over time. So we can't really talk about what "Open Phil believed" coherently.
Re the second part: I think the weight of reputation and publicity was put behind encouraging people to plan for the possibility of AI sooner than 30 years, as I noted above; this doesn't contradict the statement you've made but IMO it is relevant to your broader point.
Section 2.2 in "Some Background..." looks IMO pretty prescient:
The technical advisors I have spoken with the most on this topic are close friends I’ve met through GiveWell and effective altruism: Dario Amodei, Chris Olah and Jacob Steinhardt. They are all relatively junior (as opposed to late-career) researchers; they do not constitute a representative sample of researchers; there are therefore risks in leaning too heavily on their thinking.[...]
There may turn out to be a few broadly applicable AI approaches that lead to rapid progress on an extremely wide variety of intellectual tasks. This intuition seems correlated with (though again, not the same as) an intuition that the human brain makes repeated use of a relatively small set of underlying algorithms, and that by applying the processes, with small modifications, in a variety of contexts, it generates a wide variety of different predictive models, which can end up looking like very different intellectual functions.
[...]Certain areas of AI and machine learning, particularly related to deep neural networks and other deep learning methods, have recently experienced rapid and impressive progress.
[...]Deep learning is a general approach to fitting predictive models to data that can lead to automated generation of extremely complex non-linear models. It seems to be, conceptually, a relatively simple and cross-domain approach to generating such models (though it requires complex computations and generates complex models, and hardware improvements of past decades have been a key factor in being able to employ it effectively). My impression is that the field is still very far away from exploring all the ways in which deep learning might be applied to challenges in AI.
[...]In my view, there is a live possibility that with further exploration of the implications and applications of deep learning – and perhaps a small number (1-3) of future breakthroughs comparable in scope and generality to deep learning – researchers will end up being able to achieve better-than-human performance in a large number of intellectual domains, sufficient to produce transformative AI.
[...]
But broadly speaking, based on these conversations, it seems to me that:
- It is easy to imagine (though far from certain) that headway on a relatively small number of core problems could lead to AI systems equalling or surpassing human performance in a large number of domains.
- The total number of core open problems is not clearly particularly large (though it is highly possible that there are many core problems that the participants simply haven’t thought of).
- Many of the identified core open problems may turn out to have overlapping solutions. Many may turn out to be solved by continued extension and improvement of deep learning methods.
- None appear that they will clearly require large numbers of major breakthroughs, large (decade-scale) amounts of trial and error, or further progress on directly studying the human brain. There are examples of outstanding technical problems, such as unsupervised learning, that could turn out to be very difficult, leading to a dramatic slowdown in progress in the near future, but it isn’t clear that we should confidently expect such a slowdown.
If you imagine the very serious person wearing the expensive suit saying, "But of course we must prepare for cases where the ship sinks sooner and there is a possibility of some passengers drowning", whether or not this is Very Exculpatory depends on the counterfactual for what happens if the guy is not there. I think OpenPhil imagines that if they are not there, even fewer people take MIRI seriously. To me this is not clear, and it looks like the only thing that broke the logjam was ChatGPT, after which the weight and momentum of OpenPhil views were strictly net negative.
One issue among others is that the kind of work you end up funding, when the funding bureaucrats go to the funding-seekers and say, "Well, we mostly think this is many years out and won't kill everyone, but, you know, just in case, we thought we'd fund you to write papers about it", tends to be papers that make net negative contributions.
At many points now, I've been asked in private for a critique of EA / EA's history / EA's impact and I have ad-libbed statements that I feel guilty about because they have not been subjected to EA critique and refutation. I need to write up my take and let you all try to shoot it down.
Before I can or should try to write up that take, I need to fact-check one of my take-central beliefs about how the last couple of decades have gone down. My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.
An exemplar of OpenPhil / Oxford EA reasoning about timelines is that, as late as 2020, their position on timelines seemed to center on Ajeya Cotra's "Biological Anchors" estimate, which put the median timeline to AGI roughly 30 years out. Leadership dissent from this viewpoint, as I recall, generally centered on having longer rather than shorter median timelines.
An exemplar of poor positioning on AI ruin is Joe Carlsmith's "Is Power-Seeking AI an Existential Risk?", which enacted a blatant Multiple Stage Fallacy in order to conclude that this risk was ~5%.
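(As a side note on the arithmetic pattern being alleged here: the sketch below is purely illustrative, with made-up stage values chosen only so that the product lands near 5%; it is not claimed to reproduce the report's actual figures. The usual form of the charge is that decomposing a claim into several stages and assigning each an individually plausible-sounding conditional probability, without fully conditioning on the earlier stages, multiplies out to an artificially low total.)

```python
# Purely illustrative: hypothetical stage probabilities, not the report's actual numbers.
# Multiplying several individually plausible-looking conditional estimates
# drives the overall product down to a few percent.
from math import prod

stage_probabilities = [0.65, 0.80, 0.40, 0.65, 0.40, 0.95]  # made-up values
overall = prod(stage_probabilities)
print(f"Product of the six stages: {overall:.3f}")  # prints ~0.051, i.e. roughly 5%
```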
I recall being told verbally in person by OpenPhil personnel that Cotra and Carlsmith were representative of the OpenPhil view and would be the sort of worldview that controlled MIRI's chances of getting funding from OpenPhil, i.e., we should expect funding decisions to be premised on roughly these views and try to address ourselves to those premises if we wanted funding.
In recent personal conversations in which I exposited my current fault analysis of EA, I've heard people object, "But this wasn't an official OpenPhil view! Why, some people inside OpenPhil discussed different views!" I think they are failing to appreciate the extent to which mere tolerance of dissenting discussion is not central, in an organizational-psychology analysis of what a large faction actually does. But also, EAs have consistently reacted with surprised dismay when I presented my view that these bad beliefs were in effective control. They may have better information than I did; I was an outsider and did not much engage with what I estimated to then be a lost cause. I want to know the true facts of OpenPhil's organizational history whatever they may be.
I therefore throw open to EAs / OpenPhil personnel / the Oxford EAs, the question of whether they have strong or weak evidence that any dissenting views from "AI in median >30 years" and "utter AI ruin <10%" (as expressed in the correct directions of shorter timelines and worse ruin chances; and as said before the ChatGPT moment), were permitted to exercise decision-making power over the flow of substantial amounts of funding; or if the weight of reputation and publicity of OpenPhil was at any point put behind promoting those dissenting viewpoints (in the correct direction, before the ChatGPT moment).
This to me is the crux in whether the takes I have been giving in private were fair to OpenPhil. Tolerance of verbal discussion of dissenting views inside OpenPhil is not a crux. EA forum posts are not a crux even if the bylines include mid-level OpenPhil employees.
Public statements saying "But I do concede 10% AGI probability by 2036", or "conditional on ASI at all, I do assign substantial probability to this broader class of outcomes that includes having a lot of human uploads around and biological humans thereby being sidelined", are not something I see as exculpatory; rather, they are a clear instance of what I see as a larger problem for EA and a primary way it did damage.
(E.g., imagine that your steamship is sinking after hitting an iceberg, and you are yelling for all passengers to get to the lifeboats. As it seems like a few passengers might be starting to pay some little attention, somebody wearing a much more expensive and serious-looking suit than you can afford stands up and begins declaiming about how their own expert analysis does suggest a 10% chance that the ship takes on enough water to sink as early as next week; and that they think this has a 25% chance of producing a broad class of genuinely attention-worthy harms, like many passengers needing to swim to the ship's next destination.)
I have already asked the shoggoths to search for me, and it would probably represent a duplication of effort on your part if you all went off and asked LLMs to search for you independently. I want to know if insiders have contrary evidence that I as an outsider did not know about. If my current take is wrong and unfair, I want to know it; that is not the same as promising to be easy to convince, but I do want to know.
I repeat: You should understand my take to be that of an organizational-psychology cynic who is not per se impressed by the apparent tolerance of dissenting views, people invited to give dissenting talks, dissenters still being invited to parties, et cetera. None of that will surprise me. I do not view it as sufficient to constitute organizational best practice. I will only be surprised by demonstrated past pragmatic power to control the disposition of funding, or public promotion of ideas, contrary to "AGI median in 30 years or longer" and "utter ruin at 10% or lower", before the ChatGPT moment.
(If you doubt my ability to ever concede to evidence about this sort of topic, observe this past case on Twitter where I immediately and without argument conceded that OpenPhil was right and I was wrong, the moment the evidence appeared to be decisive. (The choice of example may seem snarky but is not actually snark; it is not easy for me to find other cases where, according to my own view, clear concrete evidence came out that I was definitely wrong and OpenPhil definitely right; and I did in that case immediately concede.))