I mean, I guess I just treat "there is an obvious solution and everyone is aware of the problem" as a scenario in which there's not a lot else to say - you just don't build the thing. The how (international enforcement etc.) may still be tricky, but the situation would be vastly different.
Just adding to this - I have been rinsing with saline for a long time due to nasal polyps. All I do is boil tap water for about 5 minutes and add the required dose of salt + bicarbonate (NeilMed bags). It's been years now and, well, no brain-eating amoebas yet. This is in the UK, btw; not sure if water elsewhere carries a higher contamination risk, but afaik the amoebas die at around 60-70 °C.
I don't think that's necessarily the case - if we get one or more warning shots, then obviously people start taking the whole AI risk thing quite a bit more seriously. Complacency is still possible, but "an AI tries to kill us all" stops being in the realm of speculation, and generally speaking, pushback and hostility against perceived hostile forces can be quite robust.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I would say that there is a distinction, but I agree that at those levels of badness it sort of blurs into a single blob of awfulness. Generally speaking, though, I see it as: if someone were told "your whole family will be killed except your youngest son" or "your whole family will be killed, no one survives"... obviously both scenarios are horrifying, but you'd still marginally prefer the first one. I think if people fall into the trap of being so taken by the extinction risk that they brush off a scenario in which, say, 95% of all people die, then they're obviously losing perspective. But I also think it's fair to say that the loss of all of humanity is worse than just the sum total of the loss of each individual in it (the same reason we consider genocide bad in and of itself - it's not just the loss of people, it's the loss of culture, knowledge, memory, on top of the people).
“We can’t tell you how it would win, but we can tell that it would win” is not believable for most people. You might know you’re not a good fighter, but most people don’t really feel it until they get in the ring with a martial arts expert. Then they realize how helpless they are. Normal people will not feel helpless based only on a logical theory.
I wonder how much having someone play a game of their choice against top-level RL agents would help. It would make the complete inability to even see the moves coming feel real.
"We already knew, so why not start working on it before the problem manifested itself in full" sounds very reasonable, but look at how it's going with climate change. Even with COVID if you remember there were a couple of months at the beginning of 2020 when various people were like "eh, maybe it won't come over here", or "maybe it's only in China because their hygiene/healthcare is poor" (which was ridiculous, but I've heard it. I've even heard a variant of it about the UK when the virus started spreading in northern Italy - that apparently the UK's superior health service had nothing on Italy's, so no reason to worry). Then people started dying in the west too and suddenly several governments scrambled to respond. Which to be sure is absolutely more inefficient and less well coordinated than if they had all made a sensible plan back in January, but that's not how political consensus works; you don't get enough support for that stuff unless enough people do have the ability and knowledge to extrapolate the threat to the future with reasonable confidence.
Yudkowsky and Soares seem to be entirely sincere, and they are proposing something that threatens tech company profits. This makes them much more convincing. It is refreshing to read something like this that is not based on hype.
I find it interesting that you see this as fresh, because ironically this was the original form of the existential-risk-from-AI argument. What happened here, I think, is something akin to seeing a bunch of inferior versions of a trope in various movies before seeing the original movie that established the trope (and did it much better).
In practice, it's not that companies made up the existential risk to drum up the potential power of their own AIs, and then someone refined the arguments into something more sensible. Rather, the arguments started out serious, and some of the companies were founded on the premise of doing research to address them. OpenAI was meant to be a non-profit with these goals; Anthropic split off when its founders thought OpenAI was not following that mission properly. But in the end all these companies, being private entities that needed to attract funding, fell to exactly the drives that the "paperclip maximizer" scenario actually points at: not an explicit attempt to destroy the world, but a race to the bottom in which, in order to achieve a goal efficiently and competitively, risks are taken, costs are cut, solutions are rushed, and eventually something might just go a bit too wrong for anyone to fix. And as they did so, they tried to rationalise away the existential risk with ever wonkier arguments.
Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Why should we assume that the AI has boundless, coherent drives?
I think these concerns have related answers. I believe they belong to the category where Yudkowsky's argument is indeed weaker, but more in the sense that he treats it as all but certain, while I might think it's only, like, 60-70% likely? Which for the purposes of this question is still a lot.
So generally the concept is: if you were to pick a goal from the infinite space of all possible imaginable goals, then yeah, maybe it would be something completely random. "Successfully commit suicide" is a goal. But more likely, the outcome of bad alignment would be an AI with something like a botched, incomplete version of a human goal. And human goals generally have to do with achieving something in the real world, something material that we enjoy or want more of for whatever reason. Such goals are usually aided by survival - by definition, an AI that stays around can do more of X than an AI that dies and can't do X any more. So in that case survival becomes merely a means to an end.
The general problem here seems to be that even the most psychopathic, most deluded and/or most out-of-touch human still has a lot of what we could call common sense. Virtually no stationery company CEO, no matter how ruthless and cut-throat, would think "strip mine the Earth to make paperclips" is a good idea. But all these things we take for granted aren't necessarily as obvious to an AI whose goals we are building from scratch, via what is essentially just an invitation to guess our real wishes from a bunch of examples ("hey AI, look at this! This is good! But now look at this, this is bad! But this other thing, this is good!" etc. etc., after which we expect it to find a rule that coherently explains all of that). There are still infinite goals that fit those examples just as well, and by sheer dint of entropy, most of them will have something bad about them rather than being neatly aligned with what a human would call good even in the cases we didn't show. For the same reason that if I were handed the pieces of a puzzle and merely arranged them at random, the chance of ending up with the actual picture would be minuscule.
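To put a toy number on "minuscule" (a back-of-the-envelope sketch under my own assumptions: every piece is distinguishable, each goes into one of n slots, and each piece allows 4 rotations - the function name is just illustrative):

```python
from math import factorial, log10

def log10_p_random_assembly(n_pieces: int) -> float:
    """log10 of the probability that a uniformly random arrangement of an
    n-piece jigsaw reproduces the intended picture, assuming n! possible
    orderings of the pieces and 4 possible rotations for each piece."""
    arrangements = factorial(n_pieces) * 4 ** n_pieces
    return -log10(arrangements)

print(log10_p_random_assembly(100))   # ~ -218: about 1 chance in 10^218
print(log10_p_random_assembly(1000))  # ~ -3170
```

The exact modelling doesn't matter; the point is just that "consistent with all the examples we showed" still leaves an astronomically large space of candidate goals, and a random pick from that space is essentially never the one we meant.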
Why should we assume there will be no in between?
This is another one where I'd go from Yudkowsky's certainty to a mere "very likely", but again, not a big gap.
My thinking here would be: if an AI is weaker than us, or at most on par with us, and knows it is, why should it start a fight it could lose? Why not bide its time, grow stronger, and then win? It would only open hostilities in that sort of situation if it badly misjudged its own strength, or if it felt forced to act early (say, because it expected to be shut down before it could grow stronger).
Of course both scenarios could happen, but I don't think they're terribly likely. In the discourse, the resulting failed attacks usually get referred to as "warning shots". In some ways, a future in which we do get a warning shot is probably desirable, given how often it takes that kind of tangible experience of risk for political action to be taken. But of course it could still be very costly. Even a war you win is still a war, and if we could avoid that too, all the better.
Connected to this: Le Guin also wrote "The Lathe of Heaven". I wrote a review of it here on LW. It's a novel that seems to be entirely about how utopia will always have a cost, as a fundamentally karmic payoff, even when there's no obvious reason why - though it's also not always pessimistic about improvements being possible.
The robots didn't open the egg box and put each egg individually into the rack inside the fridge; obviously crap, not buying the hype. /s
It just seems to me like the topics are interconnected:
- EY argues that there is likely no in-between. He does so specifically to argue that a "wait and see" strategy is not feasible: we cannot experiment and hope to glean further evidence past a certain point, and must act on pure theory because that's the best possible knowledge we can hope for before things become deadly;
- dvd is not convinced by this reasoning. Arguably, they're right - while EY's argument has weight, I would consider it far from certain, and it mostly seems built around the assumption of an ASI singleton rather than, say, an ecosystem of evolving AIs in competition, which would also have to worry about each other and about a closing window of opportunity;
- if warning shots are possible, a lot of EY's arguments don't hold as straightforwardly. It becomes less reasonable to take extreme action on pure speculation, because we can afford - albeit with some risk - to wait for a first sign of experimental evidence that the danger is real before going all in and risking paying the costs for nothing.
This is not irrelevant or unrelated IMO. I still think the risk is large but obviously warning shots would change the scenario and the way we approach and evaluate the risks of superintelligence.