Just up front: I have no qualifications on this so adjust accordingly. I teach AP Calc/AP Stats if you want to know my deal. Putting this down because half the time I'm reading a post I'm thinking to myself "I wish I knew what this person's deal is" :)
People who believe there is a >50% possibility of doom in the next 50 years or so strike me as overconfident. Just to say the general public is obviously way underestimating the risk (0% too low lol), but I believe many people on this site are overestimating the risk.
The complexity of the system is just so high. How can we predict how a superintelligence that hasn't even been created yet will behave? I understand that it's reasonable to assume it will want to accumulate resources, eliminate threats (us lol) etc., but how can anyone be for instance 90%+ sure that it will end with us dead?
There are so many links of reasoning that all have at least a small chance of going awry. Just spit-balling off the top of my head-- I'm sure all of these can be refuted, but like are you 100% sure? Just to say these specifics are not the point of my argument: I'm just arguing that there are a million things that could in theory go wrong, and even if each is unlikely, it's kind of a swiss-cheese defense against doom:
- Maybe when a certain level of intelligence is reached, consciousness comes online, and that affects the behavior of the system in an unpredictable way.
- Maybe alignment works!
- Maybe the system is unbelievably super-intelligent, but for some reason true social reasoning is basically impossible for LLMs and the like, and we need to go down a different, distant path before that becomes possible. We can still easily trick it!
- Maybe superintelligence is subject to rot. Many complex systems just kind of decay, and maybe for some reason this intelligence is unable to constantly update its code and maintain itself perfectly.
- Maybe it's actually worse than it seems, but fledgling AIs go crazy in 2035 and kill 50 people in a sufficiently brutal way that the world wakes up and we destroy every computer we can see with a sledgehammer before scaled up evil-AIs are possible.
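To put rough numbers on the swiss-cheese idea: a minimal sketch, assuming the "maybes" are independent of each other (a strong assumption, and probably false in practice) and using made-up probabilities. The function name `p_at_least_one` and the 5% figures are purely illustrative.

```python
import math

def p_at_least_one(probs):
    """Chance that at least one of several independent long shots pans out.

    Assumes independence, which real-world scenarios likely violate.
    """
    # Probability that every single long shot fails, then take the complement.
    p_all_fail = math.prod(1 - p for p in probs)
    return 1 - p_all_fail

# Five hypothetical escape routes, each given a made-up 5% chance.
print(p_at_least_one([0.05] * 5))
```

Even with each item at only 5%, the combined chance that at least one holds comes out a bit above 22% under these assumptions. The specific number means nothing; the point is only that many small chances add up.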
It just seems like there are a million things that could potentially go wrong. If you ask me to predict what would happen if a hungry tiger came into a room with me and three of my friends, I couldn't do it. There are just too many unknowns. Maybe it eats us? Maybe it is scared of us? Maybe I start crying but Richard saves the day? Who knows! This is that to the infinity power.
Obviously, I understand the basic argument and I totally accept it. I believe that misaligned AI is the largest existential threat we face, and I believe there is a serious threat that I will die because of it, and an even more serious threat that my daughter will die from it before the end of her natural life. I'm frightened, but I believe the people who put p(doom) in the next 100 years above 50% are overconfident.
I think the least controversial statement one can make is that when a transformative AI emerges, all bets are off (the original meaning of Kurzweil's Singularity), which also means that survival of humanity should not be a privileged hypothesis, so it makes sense to say that there is a 50/50 chance of survival. Unless you claim to know something about superintelligence with a reasonable degree of certainty.
This seems both inaccurate and highly controversial. (Controversially, this implies there is nothing that AI alignment can do - not only can we not make AI safer, we couldn't even deliberately make AI more dangerous if we tried)
Accuracy-wise, you may not be able to know much about superintelligences, but even if you were to go with a uniform prior over outcomes, what that looks like depends tremendously on the sample space.
For instance, take the following argument: When transformative AI emerges, all bets are off, which means that any particular number of humans left alive should not be a privileged hypothesis. Thus, it makes sense to consider "number of humans alive after the singularity" to be a uniform distribution between 0 and N, where N is the number of humans in an intergalactic civilisation, so the chance of humanity being wiped out is almost zero.
If we want to use only binary hypotheses instead of numerical ones, I could instead say that each individual human has a 50/50 chance of survival, meaning that when you add these together, roughly half of humanity lives and again the chance of humanity being wiped out is basically zero.
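Both versions are easy to put numbers on. A minimal sketch follows; the function names and the population figures are illustrative assumptions, not part of the argument itself.

```python
import math

def p_extinction_uniform_count(n_max):
    """Uniform prior over 'humans left alive' in {0, 1, ..., n_max}.

    Extinction is the single outcome 0 out of n_max + 1 equally likely ones.
    """
    return 1 / (n_max + 1)

def log10_p_extinction_independent(n_people, p_survive=0.5):
    """log10 of the chance that everyone dies if each person independently
    survives with probability p_survive. Returned as a log because the
    raw probability underflows to zero for any realistic population."""
    return n_people * math.log10(1 - p_survive)

# Assumed figure: roughly 8 billion people (illustrative only).
print(p_extinction_uniform_count(8_000_000_000))
print(log10_p_extinction_independent(8_000_000_000))
```

Under either choice of sample space, "ignorance" yields an extinction probability indistinguishable from zero, which is very far from the 50% that the binary survive-or-not framing gives.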
This is not a good argument, but it isn't obvious to me how its structure differs from your structure.
I see your point, and I agree that the prior/distribution matters. It always does. I guess my initial point is that a long-term prediction in a fast-moving "pre-paradigmatic" field is a fool's errand. As for survival of the species vs a single individual, it is indeed hard to tell. One argument that can be made is that a Thanos-AI does not make a lot of sense. Major forces have major consequences, and whole species and ecosystems have been wiped out before, many times. One can also point out that there are long tails whenever there are lots of disparate variables, so there might be pockets of human or human-like survivors if there is a major calamity, so a full extinction is unlikely. It is really hard to tell long in advance what reference class the AI advances will be in. Maybe we should just call it Knightian uncertainty...
I agree that it is very difficult to make predictions about something that A) is probably a long way away (where "long" here means more than a few years) and B) is likely to change things a great deal no matter what happens.
I think the correct solution to this problem of uncertainty is to reason normally about it but have very wide confidence intervals, rather than anchoring on 50% because X will happen or it won't.
I wonder if someone could create a similar structured argument for the opposite viewpoint. (Disclaimer: I do not endorse a mirrored argument of OP's argument)
You could start with "People who believe there is a >50% possibility of humanity's survival in the next 50 years or so strike me as overconfident.", and then point out that for every plan of humanity's survival, there are a lot of things that could potentially go wrong.
The analogy is not perfect, but to a first approximation, we should expect that things can go wrong in both directions.
It's not symmetric in my view: The person positing a specific non-baseline thing has the burden of proof, and the more elaborate the claim, the higher the burden of proof.
"AI will become a big deal!" faces fewer problems than "AI will change our idea of humanity!" faces fewer problems than "AI will kill us all!" faces fewer problems than "AI will kill us all with nanotechnology!"
He who gets to choose which thing is baseline and which thing gets the burden of proof, is the sovereign.
(That said I agree that burden of proof is on people claiming AGI is a thing, that it is happening soon probably, and that it'll probably be existential catastrophe. But I think the burden of proof is much lighter than the weight of arguments and evidence that has accumulated so far to meet it.)
I'd be interested to hear your take on this article.
Yeah I totally agree with that article-- it's almost tautologically correct in my view, and I agree that the implications are wild
I'm specifically pushing back on the ppl saying it is likely that humanity ends during my daughter's lifetime-- I think that claim specifically is overconfident. If we extend the timeline then my objection collapses.
OK, fair. Well, as I always say these days, quite a lot of my views flow naturally from my AGI timelines. It's reasonable to be skeptical that AGI is coming in about 4 years, but once you buy that premise, basically everything else I believe becomes pretty plausible. In particular, if you think AGI is coming in 2027, it probably seems pretty plausible that humanity will be unprepared & more likely than not that things will go very badly. Would you agree?
It depends on what you mean by "go very badly" but I think I do disagree.
Again, I don't know what I'm talking about, but "AGI" is a little too broad for me. If you told me that you could more or less simulate my brain in a computer program and that this brain had the same allegiances to other AIs and itself that I currently have for other humans, and the same allegiance to humans that I currently have for even dogs (which I absolutely love), then yes I think it's all over and we die.
If you say to me, "FTPickle, I'm not going to define AGI. It is a promise that in 2027 an AGI emerges. Is it more likely than not that humanity is wiped out by this event?" I would gulp and pick 'no.'
Difference between "plausible" and "likely" is huge I think. Again huge caveat that AGI may be more specifically defined than I am aware of.
I'm happy to define it more specifically -- e.g. if you have time, check out What 2026 Looks Like and then imagine that in 2027 the chatbots finally become superhuman at all relevant intellectual domains (including agency / goal-directedness / coherence) whereas before they had been superhuman in some but subhuman in others. That's the sort of scenario I think is likely. It's a further question whether or not the AGIs would be aligned, to be fair. But much has been written on that topic as well.
Based on the five Maybes you suggested might happen, it sounds like you're saying some AI doomers are overconfident because there are a million things that could go potentially right. But there doesn't seem to be a good reason to expect any of those maybes to be likelihoods, and they seem more speculative (e.g. "consciousness comes online") than the reasons well-informed AI doomers think there's a good chance of doom this century.
PS I also have no qualifications on this.
Hmmm. I don't feel like I'm saying that. This isn't the perfect analogy, but it's kind of like AI doomers are looking at an ecosystem and predicting that if you introduce wolves into the system the wolves will become overpopulated and crush everything. There may be excellent reasons to believe this:
I just think that it's too complex to really feel confident, even if you have really excellent reasons to believe it will happen. Maybe wolves do horribly on hills and we didn't know that before we let them loose in this environment etc.
It's not on me to come up with reasons why the wolves won't take over-- simply saying "it's incredibly complex and we shouldn't be too confident about this even though it seems reasonable" is enough in my view
I agree with you that we shouldn't be too confident. But given how sharply capabilities research is accelerating—timelines on TAI are being updated down, not up—and in the absence of any obvious gating factor (e.g. current costs of training LMs) that seems likely to slow things down much if at all, the changeover from a world in which AI can't doom us to one in which it can doom us might happen faster than seems intuitively possible. Here's a quote from Richard Ngo on the 80,000 Hours podcast that I think makes this point (episode link: https://80000hours.org/podcast/episodes/richard-ngo-large-language-models/#transcript):
"I think that a lot of other problems that we’ve faced as a species have been on human timeframes, so you just have a relatively long time to react and a relatively long time to build consensus. And even if you have a few smaller incidents, then things don’t accelerate out of control.
"I think the closest thing we’ve seen to real exponential progress that people have needed to wrap their heads around on a societal level has been COVID, where people just had a lot of difficulty grasping how rapidly the virus could ramp up and how rapidly people needed to respond in order to have meaningful precautions.
"And in AI, it feels like it’s not just one system that’s developing exponentially: you’ve got this whole underlying trend of things getting more and more powerful. So we should expect that people are just going to underestimate what’s happening, and the scale and scope of what’s happening, consistently — just because our brains are not built for visualising the actual effects of fast technological progress or anything near exponential growth in terms of the effects on the world."
I'm not saying Richard is an "AI doomer", but hopefully this helps explain why some researchers think there's a good chance we'll make AI that can ruin the future within the next 50 years.
I agree in the abstract, but I don't think your reasons are mechanical enough in nature to be any reason to chill in action, only reason to chill in emotion. We're going to solve it - but only because we are calmly "panicking" and putting in the work to actually understand and befriend our nonhuman children, and it is going to require the field to continue to grow rapidly, which I think it will in fact continue to do. We have, like, a couple more years, but we don't need to panic to make it happen, because safety is a fundamental component to the next stage of capability.
I do think you have some point with not being able to reliably predict the behavior of a superintelligent-on-everything system, but there's reason to believe that any system that gains will to live - a very common attribute in any physical system, whether it's an organization, a partially completed chemical reaction, an individual organism, a species, a meme - might want to take action to try to survive. it might even want to grow rapidly. these "it might"s are the same kind of maybe as if talking about what a person might want: we can know ahead of time that these are things smart people sometimes want due to experiences or innate features. certainly we don't know exactly what it will want, but the problem with raising kids from a new species nobody has met before is exactly that we don't know how they'll grow up. having a solid idea how a superintelligent system would behave should make us feel better, in fact.
so I don't really disagree in any deep way. but I think your reasons to have hope are in fact reasons to be worried, and the real reason to have hope is that we'll understand in time if we hurry.