Purpose of this post/TL;DR: Many people who argue against the existence of large-scale risks associated with AI, including Marc Andreessen and Yann LeCun, use the following argument template (call it the Argument from Overestimation, AFO): "since people were wrong when they estimated high risk for *insert safe technology that we use today*, AI is probably safe too". I was surprised that this argument is being used by some of the leading figures in the AI Safety debate because, as I argue in this post, its logic relies on survivorship bias. This, in turn, means that the argument begs the question by grouping AI with a skewed sample containing only safe technologies, making it a flawed argument. I conclude that, for the sake of improving the current debate on AI Safety, this argument should be abandoned. 

Epistemic status: While I tend to be more on the “x-risks side” in AI Safety debates (to make this concrete, I was more in agreement with Tegmark and Bengio in the debate I reference), I have attempted to give the most charitable reconstructions of, and strongest responses to, AFO, and I cite concrete examples of the argument in action to avoid attacking a straw man. Note also that the failure of a particular argument in favour of AI safety does not warrant the conclusion that AI is unsafe. The main point of this post is simply that we should not assume safety or unsafety without providing further arguments. I invite feedback and potential improvements on my position in the comments! 

Argument from Overestimation

Consider the following passage from Marc Andreessen’s essay on concerns about AI risk, ‘Why AI Will Save the World’:

“The fear that technology of our own creation will rise up and destroy us is deeply coded into our culture. [Recounts the myth of Prometheus about the technology of fire as an example.] The presumed evolutionary purpose of this mythology is to motivate us to seriously consider potential risks of new technologies – fire, after all, can indeed be used to burn down entire cities. But just as fire was also the foundation of modern civilization as used to keep us warm and safe in a cold and hostile world, this mythology ignores the far greater upside of most – all? – new technologies, and in practice inflames destructive emotion rather than reasoned analysis. Just because premodern man freaked out like this doesn’t mean we have to; we can apply rationality instead.” 

A similar sentiment was expressed by Yann LeCun in the recent Munk Debate on AI, where he listed historical instances in which warnings about a new technology’s negative effects were wide of the mark: 

“Socrates was against writing, he thought people are going to lose their memory. The Catholic church was against the printing press, saying they would lose control of the Dogma. [...] The Ottoman Empire banned the printing press and according to some historian [sic] this is what accelerated their decline.” 

Later on, LeCun spells out his argument while reacting to Yoshua Bengio’s statement that AI is unprecedented as a technology because of its unique capability to design and produce improved copies of itself: 

“That very argument was made for computers, you know, 50 years ago. This is not a new issue. [References the website Pessimists Archive, which keeps a record of newspaper clippings with wrong and sometimes absurd predictions regarding the effects of new technologies.] [Take the example of] the train: you’re not gonna take the train if it’s going 50 km/h, you can't breathe at that speed. [...] Everybody has said this kind of thing about every time there was a technological evolution or cultural evolution.” 

It seems that the implicit argument in these passages is meant to be something like the following. Call it the Argument from Overestimation (AFO): 

P1 People usually overestimate the risks associated with new technologies.

P2 AI is a new technology.

C1 People probably overestimate the risks associated with AI.

The argument may seem inductively strong at first, but notice what all the technologies listed above have in common: they all turned out to be safe. This is a problem, because it shows that the logic behind the argument rests on fallacious reasoning; specifically, it falls prey to survivorship bias. 

Objection from Survivorship Bias

Survivorship bias is a cognitive shortcut that occurs when a more easily noticeable subgroup, which has passed a hidden selection process, is mistaken for the entire group (read more here and here). A hypothetical example can be taken from a medical study attempting to estimate the effectiveness of a treatment. Even an intervention which exhibits relatively high success rates (e.g. high survival rates) may actually be relatively ineffective, because the analysis of its effects often excludes people who did not survive long enough to receive the treatment -- whether because their immune systems were comparatively weaker, because they had worse genetic predispositions for the disease, and so on. Without including this weaker group in the analysis, the measured survival rate of the treatment is most likely skewed upwards, since the selection mechanism operates only on people who are naturally more resistant to the disease. 
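To make the mechanism concrete, here is a minimal simulation sketch of this medical example (the numbers, the `frailty` variable, and both thresholds are made-up assumptions for illustration, not real clinical data). In this toy model the "treatment" does nothing at all, yet filtering on "survived until treatment" still inflates the measured survival rate:

```python
import random

# Toy illustration of survivorship bias (all numbers are made up).
# The "treatment" here has no effect; outcomes depend only on frailty.
random.seed(0)

patients = []
for _ in range(100_000):
    frailty = random.random()                                # 0 = robust, 1 = very frail
    survives_until_treatment = random.random() > frailty * 0.8
    survives_afterwards = random.random() > frailty * 0.6    # independent of any treatment
    patients.append((survives_until_treatment, survives_afterwards))

# Observed sample: only those who lived long enough to be treated.
treated = [p for p in patients if p[0]]
observed_rate = sum(p[1] for p in treated) / len(treated)

# Counterfactual: the rate if nobody had been filtered out.
unbiased_rate = sum(p[1] for p in patients) / len(patients)

print(f"Survival rate among the treated (observed):    {observed_rate:.1%}")
print(f"Survival rate over the whole group (unbiased): {unbiased_rate:.1%}")
```

The observed rate comes out noticeably higher than the unbiased one, purely because the frailest patients were filtered out before the measurement; the same selection effect is at work when AFO samples only technologies that "survived" their risk warnings.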

Andreessen’s and LeCun's arguments are based on similarly skewed reasoning. Think again of the technologies mentioned: fire, writing, printing, trains, computers. All of these technologies passed the selection process of "risk warnings about us were overestimated because we are safe technologies”. However, there is a whole group that did not pass this test: unsafe technologies for which the high risk assessments turned out to be accurate. If someone said "look, we really shouldn't use airships for mass transport because they are dangerous", they estimated the risk correctly, which means airships are automatically excluded from being cited in AFO. Similar examples of unsafe technologies include certain food additives which turned out to be carcinogenic, or specific pesticides that severely damage the human body and the environment. Hence, AFO suffers from survivorship bias because the "risk overestimation" selection process picks out only technologies which actually turned out to be safe.

Due to this bias, AFO therefore begs the question: it aims to prove AI Safety by comparing AI to a sample of exclusively safe technologies. This means that, in reality, AFO works more like this:

P1* People usually overestimate the risks associated with technologies that turned out to be safe. 

P2 AI is a new technology.

C1 People probably overestimate the risks associated with AI.

This formulation makes clear why the argument fails: for C1 to follow, we would have to add the premise that AI, too, turned out to be safe. However, that would beg the question, since the whole point of introducing the parallel was to prove AI Safety. Hence, the argument cannot go through without begging the question, making it a bad argument. Call this critique the Objection from Survivorship Bias. 

Possible Responses to the Objection from Survivorship Bias 

It seems challenging to find responses to this objection. Perhaps one could argue that P1* has historically held true for most technologies and is therefore likely true for AI as well. However, this seems very improbable. It is equivalent to arguing either that (i) most technologies we invent are safe from the get-go, or that (ii) inventors (or social institutions, or society as a whole) are very accurate at estimating risks. (i) seems improbable because, if it were true, it would be very hard to explain why regulatory processes were ever established. (ii) seems a bit more likely, but individuals are known to be quite bad at assessing risk, and larger groups and even political institutions may at least partially aggregate the same biases. 

But perhaps a better response is that we are allowed to assume that AI will be safe as a by-product of current regulatory practice. On this view, we are protected against the discrete risks from AI -- say, misinformation -- because there are regulations covering misinformation, protected against discrimination thanks to anti-discrimination guidelines, and so on. I do think this defense works at least partially, since some negative effects of AI may in fact be buffered by existing regulation. For instance, credit applicants have the Right to Explanation regarding denied credit applications, regardless of whether the decision was made by a human or an algorithm. However, this defense still rests on some questionable assumptions. 

Firstly, why should we assume that non-targeted regulations will behave in a targeted way? AI will likely behave in novel, unexpected ways that we will have to regulate for. Secondly, without concrete arguments for why current regulations are good enough for AI, simply asserting that "current regulations are fine" is a form of confirmation bias. This is especially true given that regulation of algorithms has failed before, as when Meta's own engagement algorithm recommended radical content. Finally, from a more practical standpoint, it is not clear that existing regulations will apply smoothly to AI. For instance, many experts in the EU AI sector point out that, as AI is integrated into more and more products, businesses face double regulatory obligations: is the responsible regulatory body for a product the sector-specific institution, an IT-security institution, or even a new AI body created under the forthcoming AI Act? 

In short, without clear guidelines on how to regulate AI specifically, the tendency will be to create legal friction rather than functioning regulation. Hence, overall it seems that AFO cannot really be defended against the Objection from Survivorship Bias.

Conclusion 

In conclusion, I have argued that many debates about AI Safety revolve around a flawed Argument from Overestimation which suffers from survivorship bias, using Marc Andreessen's essay and Yann LeCun's arguments from the Munk Debate as examples. I have shown that AFO is question-begging due to this bias, as it implicitly groups AI with a sample of exclusively safe technologies, even though AI safety is the very point AFO is trying to prove. Overall, AFO appears to be a flawed argument, and I hope that as the debate on AI Safety progresses, people will be motivated to look for better arguments. All feedback is very welcome in the comments!

Big thank you to Sam Robinson for valuable comments!

 

Cross-posted from the EA Forum: https://forum.effectivealtruism.org/posts/yQHzdmXa7KBB52fBz/ai-risk-and-survivorship-bias-how-andreessen-and-lecun-got 

Comments

[anonymous]:

The "survival bias" argument is overgeneralizing.  For each technology mentioned and many others, the number of wrong ways to use/construct an implementation using the technology greatly exceeds the number of correct ways.  We found the correct ways through systematic iteration.  

As a simple example, fire has escaped engines many times and caused all types of vehicles to burn.  It took methodical and careful iteration to improve engines to the point that this usually doesn't happen, and vehicles have fire suppression systems, firewalls, and many other design elements to deal with this expected risk.  Note that we do give up performance: even combat jet fighters carry the extra weight of fire suppression systems. 

Worst case you burn down Chicago.

Humans would be able to do the same for AI if they are able to iterate on many possible constructions for an AGI, cleaning up the aftermath in cases where they find out about significant flaws late (which is why deception is such a problem).  The "doom" argument is that an AGI can be made that has such an advantage that it kills or disempowers humans before humans, and other AI systems working for humans, can react.  

 

To support the doom argument you need to provide evidence for the main points:
 

(1) that humans can construct an ASI, and provide the information necessary to train it, such that it has a massive margin over human beings, AND

(2) the ASI can run on useful timescales when performing at this level of cognition, doing inference on computational hardware humans can build, AND

(3) whatever resources (robotics, physical tasks performed) the ASI can obtain above the ones required for it to merely exist and satisfy humans (essentially "profit") are enough to kill/disempower humans, AND

(4a) other AGI/ASI built by humans, and humans, are unable to stop it because they are less intelligent, despite potentially having a very large (orders of magnitude) advantage in resources and weapons OR

(4b) humans are scammed and support the ASI

 

If you wanted to argue against doom, or to look for alignment ideas, you could look for ways to limit each of these points.  For example:

(1) Does intelligence actually scale this way, or does it run into diminishing returns?  An accelerationist argument would point to current data across many experiments saying that returns are in fact diminishing, or to theoretical optimal-policy arguments that prove it always has diminishing returns.

       An alignment idea would be to subdivide AGI/ASI systems into smaller, better-defined systems; because of diminishing returns, you would not expect more than a slight performance penalty.

(2) This is a diminishing-returns argument: you need exponentially more compute to get linearly more intelligence.  An accelerationist argument would count how many thousand H100s one running 'instance' of a strong ASI would likely need, and point out that worldwide compute production won't be enough for decades at the current ramp rate (and a pro-doom argument would point out that production can be scaled up by many OOM).

      An alignment idea would be to register and track where high-performance AI chips are purchased and deployed, limiting deployment to licensed data centers, and to audit data centers to ensure all their loads are human customers and they are not harboring an escaped AGI. 

(3)  An accelerationist would argue that humans can prevent doom with sparse systems that are tightly supervised, and that humans will do this naturally because it's what the EMH (efficient market hypothesis) demands: competing AGIs in a free market will not have any spare capacity to kill humans, as they are too busy spending all their resources trying to make money.

        This is sparse/myopia/ "deception?  I ain't got time for that"

(4a)  An accelerationist would argue that humans should race to build many kinds of powerful but restricted AGI/ASI as rapidly as possible, so they can stop AGI doom by having a large stockpile of weapons and capabilities.  

        Note that this is what every alignment lab ends up doing.  I have talked to one person who suggested they should develop combat drones that can be mass-produced as an alignment strategy, i.e. an offensive defense against hostile AGI/ASI by having the capability to deploy very large numbers of essentially smart bombs.  So long as humans retain control, this might be a viable idea...    

(4b) This is an argument that we're screwed and deserve to die; an accelerationist argument would be that if humans are this stupid, they deserve to die.

        I'm not sure how we fight this; it is my biggest fear: that we can win on a technical level, and that it's possible to win without needing unrealistic international cooperation, but we die because we got scammed.  Lightly touching politics: this seems to be an entirely plausible risk.  There are many examples of democracies failing and picking obviously self-interested leaders who are obviously unqualified for the role.

Summary: the accelerationist arguments made in this debate are weak.  I pointed out some stronger ones.  

Hi Gerald, thanks for your comment! Note that I am arguing neither in favour of nor against doom. What I am arguing is the following: when you are trying to prove AI safety, it is not good practice to group AI with technologies that we were able to iteratively improve towards safety. The point is that, without further arguments, you could easily make the reverse argument and it would have roughly equal force:

P1 Many new technologies turn out to be unsafe and impossible to iteratively improve (e.g. airships).

P2 AI is a new technology.

C1 AI is probably unsafe and impossible to iteratively improve.

That is why I argue that this is not a good argument template: through survivorship bias in P1, you‘ll always be able to sneak in whatever it is you’re trying to prove.

With respect to your arguments about doom scenarios, I think they are really interesting and I’d be excited to read a post with your thoughts (maybe you already have one?).