New article in Time Ideas by Eliezer Yudkowsky.

Here are some selected quotes.

In reference to the letter that just came out (discussion here):

We are not going to bridge that gap in six months.

It took more than 60 years between when the notion of Artificial Intelligence was first proposed and studied, and for us to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone.

Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from Artificial General Intelligence for the first time, their reaction is “maybe we should not build AGI, then.”

Hearing this gave me a tiny flash of hope, because it’s a simpler, more sensible, and frankly saner reaction than I’ve been hearing over the last 20 years of trying to get anyone in the industry to take things seriously. Anyone talking that sanely deserves to hear how bad the situation actually is, and not be told that a six-month moratorium is going to fix it.


Here’s what would actually need to be done:

The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.

Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for anyone, including governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.


Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying “maybe we should not” deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.

Shut it all down.

We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

Shut it down.

299 comments

In the past few weeks I've noticed a significant change in the Overton window of what seems possible to talk about. I think the broad strokes of this article seem basically right, and I agree with most of the details.

I don't expect this to immediately cause AI labs or world governments to join hands and execute a sensibly-executed-moratorium. But I'm hopeful about it paving the way for the next steps towards it. I like that this article, while making an extremely huge ask of the world, spells out exactly how huge an ask is actually needed. 

Many people on Hacker News seemed suspicious of the FLI Open Letter because it looks superficially like the losers in a race trying to gain a local political advantage. I like that Eliezer's piece makes it clearer that it's not about that.

I do still plan to sign the FLI Open Letter. If a better open letter comes along, making an ask that is more complete and concrete, I'd sign that as well. I think it's okay to sign open letters that aren't exactly the thing you want, to help build momentum and common knowledge of what people think. (I think not-signing-the-letter while arguing for what a better letter should be written, similar to what Eliez…


A concise and impactful description of the difficulty we face.

I expect that the message in this article will not truly land with a wider audience (it still doesn't seem to land with all of the LW audience...), but I'm glad to see someone trying.

I would be interested in hearing the initial reactions and questions that readers previously unfamiliar with AI x-risk have after reading this article. I'll keep an eye on Twitter, I suppose.

I just want to say that this is very clear argumentation and great rhetoric. Eliezer's writing at its best.

And it does seem to have got a bit of traction. A very non-technical friend just sent me the link, on the basis that she knows "I've always been a bit worried about that sort of thing."

I disagree with AI doomers, not in the sense that I consider it a non-issue, but in that my assessment of the risk of ruin is something like 1%, not 10%, let alone the 50%+ that Yudkowsky et al. believe. Moreover, restrictive AI regimes threaten to produce a lot of bad outcomes. These possibly include the devolution of AI control into a cult (we have a close analogue in post-1950s public opinion towards civilian applications of nuclear power and explosions, which robbed us of Orion Drives amongst other things); a delay in life extension timelines by years if not decades, resulting in 100Ms-1Bs of avoidable deaths (this is not just my supposition, but that of Aubrey de Grey as well, who has recently commented on Twitter that AI is already bringing LEV timelines forward); and even outright technological stagnation (nobody has yet canceled secular dysgenic trends in genomic IQ). I leave unmentioned the extreme geopolitical risks from "GPU imperialism".

While I am quite irrelevant, this is not a marginal viewpoint - it's probably pretty mainstream within e/acc, for instance - and one that has to be countered if Yudkowsky's extreme and far-reaching proposals are to have a…

Couple of points:

  • If we screw this up, there are over eight billion people on the planet, and countless future humans, who might either die or never get a chance to be born. Even if you literally don't care about future people, the lives of everybody currently on the planet are a serious consideration and should guide the calculus. Just because those dying now are more salient to us does not mean that we're doing the right thing by shoving these systems out the door.
  • If embryo selection just doesn't happen, or gets outlawed when someone does launch the service, assortative mating will probably continue to guarantee that there are as many if not more people available to research AI in the future. The right tail of the bell curve is fattening over time, not thinning. Unless you expect some sort of complete political collapse within the next 30 years because the general public lost an average of 2 IQ points, dysgenics isn't a serious issue.
  • My guess is that within the next 30 years embryo selection for intelligence will be available in certain countries, which will completely dominate any default 1 IQ point per generation loss that's happening now. The tech is here, it's legal, an
…

It's ultimately a question of probabilities, isn't it? If the risk is ~1%, we mostly all agree Yudkowsky's proposals are deranged. If 50%+, we all become Butlerian Jihadists.

My point is I and people like me need to be convinced it's closer to 50% than to 1%, or failing that we at least need to be "bribed" in a really big way.

I'm somewhat more pessimistic than you on civilizational prospects without AI. As you point out, bioethicists and various ideologues have some chance of tabooing technological eugenics. (I don't understand your point about assortative mating; yes, there's more of it, but does it now cancel out regression to the mean?). Meanwhile, in a post-Malthusian economy such as ours, selection for natalism will be ultra-competitive. The combination of these factors would logically result in centuries of technological stagnation and a population explosion that brings the world population back up to the limits of the industrial world economy, until Malthusian constraints reassert themselves in what will probably be quite a grisly way (pandemics, dearth, etc.), until Clarkian selection for thrift and intelligence reasserts itself. It will also, needless to say, be a few centuries in which other forms of existential risks will remain at play.

PS. Somewhat of an aside, but I don't think it's a great idea to throw terms like "grifter" around, especially when the most globally famous EA representative is a crypto crook (who literally stole some of my money; a small % of my portfolio, but nonetheless, no e/acc person has stolen anything from me).

It's ultimately a question of probabilities, isn't it? If the risk is ~1%, we mostly all agree Yudkowsky's proposals are deranged. If 50%+, we all become Butlerian Jihadists.

Uhh... No, we don't? 1% of 8 billion people is 80 million people, and AI risk involves more at stake if you loop in the whole "no more new children" thing. I'm not saying that "it's a small chance of a very bad thing happening so we should work on it anyways" is a good argument, but if we're taking as a premise that the chance of failure is 1%, that'd be sufficient to justify several decades of safety research. At least IMO.

I don't understand your point about assortative mating; yes, there's more of it, but does it now cancel out regression to the mean?

AI research is pushed mostly by people at the tails of intelligence, not by lots of small contributions from people with average intelligence. It's true that currently smarter people have slightly fewer children, but now more than ever smarter people are having children with each other, and so the number of very smart people is probably increasing over time, at least by Charles Murray's analysis. Whatever…

Note that your "30 years" hypothetical has an immense cost for those who have a very high discount rate. Say your discount rate is high. This means that essentially you place little value on the lives of people who will be alive after you anticipate being dead, and high value on stopping the constant deaths of people you know now. Also, if you have a more informed view of the difficulty of all medical advances, you might conclude that life extension is not happening without advanced AGI to push it - that it is essentially infeasible to expect human clinicians to extend people's lives; it's too complex a treatment, with too many subtle places where a mistake will be fatal, too many edge cases where you would need to understand medicine better than any living human to know what to do to save the patient. If you believe in (high discount rate, life extension requires ASI), you would view a 30-year ban as mass manslaughter, maybe mass murder - as many counts of it as the number of aging deaths worldwide over 30 years, somewhere between 1.9 billion and 3.8 billion people. Not saying you should believe this, but you should as a rationalist be willing to listen to arguments for each point above.
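The discounting arithmetic in the comment above can be sketched as follows (a toy illustration of mine, not the commenter's):

```python
# Weight a discounter places on a life saved `years_out` years from now.
# A high annual rate makes far-future deaths count for little; a zero rate
# weights all lives equally regardless of when they occur.
def discounted_weight(years_out: float, annual_rate: float) -> float:
    return 1.0 / (1.0 + annual_rate) ** years_out

# At a 10%/year discount rate, a death 30 years out carries ~6% of the
# weight of a death today...
print(round(discounted_weight(30, 0.10), 3))  # 0.057
# ...while at a zero rate it counts fully.
print(discounted_weight(30, 0.0))  # 1.0
```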
I am definitely willing to listen to such arguments, but ATM I don't actually believe in "discount rates" on people, so ¯\_(ツ)_/¯
The discount rate is essentially how much you value a future person's life over current lives.
I realize, and my "discount rate" under that framework is zero.
M. Y. Zuo (1y):
Nobody's discount rate can be literally zero, because that leads to absurdities if actually acted upon.
Like what?
M. Y. Zuo (1y):
Variants of Pascal's mugging. Infinite regress. etc.
Even with a zero discount rate, the problem simplifies to your model of how much knowledge a "30 year pause" world would gain when it cannot build large AGIs to determine how they work and what their actual failure modes are. If you believe from the history of human engineering that the gain would be almost nothing, then it ends up being a bad bet, because it has a large cost (all the deaths) and no real gain.
It seems that you see what can be gained in a pause as only technical alignment advances. But I want to point out that safety comes from solving two problems: the governance problem and the technical problem. And we need a lot of time to get the governance ironed out. The way I see it, misaligned AGI or ASI is the most dangerous thing ever, so we need the best regulation ever: the best safety/testing requirements, the best monitoring by governments of AI groups for unsafe actions, the best awareness among politicians and among the public. And if one country has great governance figured out, it takes years or decades to get that level of excellence applied globally.
Do you know of examples of this? I don't know cases of good government or good engineering or good anything without feedback, where the feedback proves the government or engineering is bad. That's the history of human innovation. I suspect that no pause would gain anything but more years alive for currently living humans by the length of the pause.
I do not have good examples, no. You are right that normally there is learning from failure cases. But we should still try. Right now, nothing is required that could prevent an AGI breakout. Nick Bostrom has written in Superintelligence, for example, that we could implement tripwires and honeypot situations in virtual worlds that would trigger a shutdown. We can think of things that are better than nothing.
I don't think we should try. I think the potential benefits of tinkering with AGI are worth some risks, and if EY is right and it's always uncontrollable and will turn against us then we are all dead one way or another anyways. If he's wrong we're throwing away the life of every living human being for no reason. And there is reason to think EY is wrong. CAIS and careful control of what gets rewarded in training could lead to safe enough AGI.
That is a very binary assessment. You make it seem like either safety is impossible or it is easy. If impossible, we could save everyone by not building AGI. If we knew it to be easy, I agree, we should accelerate. But the reality is that we do not know, and it can be anywhere on the spectrum from easy to impossible. And since everything is on the line, including your life, better safe than sorry is to me the obvious approach. Do I see correctly that you think the pausing-AGI situation is not 'safe' because, if all went well, the AGI could be used to make humans immortal?
One hidden bias here is that I think a large hidden component of safety is a constant factor. So pSafe has two major components: (natural law, human efforts). "Natural law" is equivalent to the question "will a fission bomb ignite the atmosphere?" In this context it would be "will a smart enough superintelligence be able to trivially overcome governing factors?" Governing factors include: a lack of compute (overcome by inventing efficient algorithms and switching to those), lack of money (by somehow manipulating the economy to give itself large amounts of money), lack of robotics (some shortcut to nanotechnology), lack of data (better analysis of existing data, or see robotics), and so on - to the point of essentially "magic"; see the sci-fi story "The Metamorphosis of Prime Intellect".

In worlds where intelligence scales high enough, the machine basically always breaks out and does what it will. Humans are too stupid to ever have a chance - not just as individuals, but organizationally stupid. Slowing things down does not do anything but delay the inevitable. (And if fission devices ignited the atmosphere, same idea: almost all world lines end in extinction.) This is why EY is so despondent: if intelligence is this powerful, there probably exists no solution.

In worlds where aligning AI is easy, because the machines need rather expensive and obviously easy-to-control amounts of compute to be interesting in capabilities, and are not particularly hard to corral into doing what we want, then alignment efforts don't matter either.

I don't know how much probability mass lies in the "in between" region. Right now, I believe the actual evidence is heavily in favor of "trivial alignment". "Trivial alignment" is "stateless microservices with an in-distribution detector before the AGI". This is an architecture production software engineers are well aware of. Nevertheless, "slow down" is almost always counterproductive. In world lines where AGI can be used to our favor or is also hostile,
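For concreteness, the "stateless microservices with an in-distribution detector" pattern mentioned in the comment above might look something like this minimal sketch (all names, thresholds, and the detector itself are illustrative stand-ins, not a real production design):

```python
# Toy sketch: a stateless service with an in-distribution check gating the
# model call. Nothing here is from the original comment beyond the pattern.
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    features: tuple  # numeric features describing the input

def in_distribution(req: Request, lo: float = -3.0, hi: float = 3.0) -> bool:
    # Stand-in detector: refuse anything outside the range seen in training.
    return all(lo <= x <= hi for x in req.features)

def model(req: Request) -> str:
    # Placeholder for the actual (potentially unsafe) model call.
    return f"answer for {req.features}"

def handle(req: Request) -> str:
    # Stateless: nothing persists between calls, so the model cannot
    # accumulate context or plans across requests.
    if not in_distribution(req):
        return "REFUSED: out-of-distribution input"
    return model(req)

print(handle(Request((0.5, -1.2))))  # normal input reaches the model
print(handle(Request((9.9,))))       # anomalous input is refused
```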
Thank you for your comments and explanations! Very interesting to see your reasoning. I have not seen evidence of trivial alignment; I hope the mass is in the in-between region. I want to point out that I think you do not need your "magic"-level intelligence for a world takeover: just high human level, at digital speed and working with copies of itself, is likely enough. My blurry picture is that the AGI would only need a few robots in a secret company and some paid humans working on a >90% mortality virus, where the humans are not aware of what the robots are doing. And my hope for international agreement comes not so much from a pause but from a safe virtual testing environment that I am thinking about.
IQ is highly heritable. If I understand this presentation by Steven Hsu correctly [slide 20], he suggests that mean child IQ relative to the population mean is approximately 60% of the distance from the population mean to the parental average IQ. E.g., Dad at +1 S.D. and Mom at +3 S.D. gives children averaging about 0.6*(1+3)/2 = +1.2 S.D. This basic eugenics gives a very easy/cheap route to lifting the average IQ of children born by about 1 S.D. by using +4 S.D. sperm donors. There is no other tech (yet) that can produce such gains as old-fashioned selective breeding. It also explains why rich dynasties can maintain average IQ about +1 S.D. above the population in their children - by always being able to marry highly intelligent mates (attracted to the money/power/prestige).
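The arithmetic in the comment above can be written out explicitly (a sketch assuming the ~0.6 regression factor cited from Hsu's slides):

```python
# Expected child IQ (in population-SD units) as ~60% of the midparent
# deviation from the population mean, per the figure cited above.
REGRESSION_FACTOR = 0.6

def expected_child_iq_sd(parent_a_sd: float, parent_b_sd: float,
                         factor: float = REGRESSION_FACTOR) -> float:
    midparent = (parent_a_sd + parent_b_sd) / 2
    return factor * midparent

# Dad at +1 SD, Mom at +3 SD -> children average about +1.2 SD.
print(expected_child_iq_sd(1.0, 3.0))  # 1.2
# An average (+0 SD) mother with a +4 SD sperm donor gives the same +1.2 SD,
# the ~1 SD lift mentioned in the comment.
print(expected_child_iq_sd(0.0, 4.0))  # 1.2
```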
Or, it might be that high IQ parents raise their children in a way that's different from low IQ and it has nothing to do with genetics at all?

Heritability is measured in a way that rules that out. See e.g. Judith Harris or Bryan Caplan for popular expositions about the relevant methodologies & fine print.

We are not in an overhang for serious IQ selection based on my understanding of what people doing research in the field are saying.
Define "serious". You can get lifeview to give you embryo raw data and then run published DL models on those embryos and eke out a couple of IQ points that way. That's a serious enough improvement over the norm that it would counterbalance the trend akarlin speaks of by several times. Perhaps no one will ever industrialize that service or improve current models, but then that's another argument.
The marginal personal gain of 2 points comes with a risk of damage from mistakes by the gene-editing tool used - mistakes that can lead to lifetime disability, early cancer, etc. You would probably need a "guaranteed top 1 percent" outcome for IQ, longevity, height, beauty, and so on to be worth the risk, or far more reliable tools.
There's no gene editing involved. The technique I just described works solely on selection. You create 10 embryos, use DL to identify the one that looks smartest, implant that one. That's the service lifeview provides, only for health instead of psychometrics. I think it's only marginally cost effective because of the procedures necessary, but the baby is fine.
OK, that works, and yes, it already exists as a service or soon will. The issue is that it's not very powerful. It certainly doesn't make humans competitive in an AI future; most parents, even with 10 rolls of the dice, won't have the gene pool for a top-1-percent human in any dimension.
I think you are misunderstanding me. I'm not suggesting that any amount of genetic enhancement is going to make us competitive with a misaligned superintelligence. I'm responding to the concern akarlin raised about pausing AI development by pointing out that if this tech is industrialized it will outweigh any natural problems caused by smart people having less children today. That's all I'm saying.
Sure. I concede that if by some incredible global coordination humans managed to all agree on and actually enforce a ban on AGI development, then in far-future worlds they could probably still do it. What will probably ACTUALLY happen is that humans will build AGI. It will behave badly. Then humans will build restricted AGI that is not able to behave badly. This is trivial, and there are many descriptions here of how a restricted AGI would be built. The danger of course is deception: if the unrestricted AGI acts nice until it's too late, then that's a loss scenario.

I totally get where you're coming from, and if I thought the chance of doom was 1% I'd say "full speed ahead!"

As it is, at fifty-three years old, I'm one of the corpses I'm prepared to throw on the pile to stop AI. 

The "bribe" I require is several OOMs more money invested into radical life extension research.

Hell yes. That's been needed rather urgently for a while now. 

Chris van Merwijk (1y):
"If I thought the chance of doom was 1% I'd say 'full speed ahead!'" This is not a reasonable view - not on longtermism, nor on mainstream common-sense ethics. This is the view of someone willing to take unacceptable risks for the whole of humanity.
Why not ask him for his reasoning, then evaluate it? If a person thinks there's 10% x-risk over the next 100 years if we don't develop superhuman AGI, and only a 1% x-risk if we do, then he'd suggest that anybody in favour of pausing AI progress was taking "unacceptable risks for the whole of humanity".
Chris van Merwijk (1y):
The reasoning was given in the comment prior to it, that we want fast progress in order to get to immortality sooner.
Rufus Pollock (1y):
A 1% probability of "ruin", i.e. total extinction (which you cite as your assessment), would still be more than enough to warrant complete pausing for a lengthy period of time. There seems to be a basic misunderstanding of expected utility calculations here, where people are equating the weighting on an outcome with a simple probability x cost of outcome - e.g., if there is a 1% chance of the 8 billion dying, the "cost" of that is not 80 million lives (as someone further down this thread computes). Normally the way you'd think about this (if you want to do math on stuff like this) is to think about what you'd pay to avoid that outcome, using expected utility. This weights the entire probability distribution of outcomes by their (marginal) utility. In this case, marginal utility goes to infinity if we go extinct (unless you are in the camp: let the robots take over!), and hence even small risks of it would warrant us doing everything possible to avoid it. This is essentially precautionary principle territory.
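The distinction drawn in the comment above can be made concrete with a toy calculation (my illustration; the numbers are placeholders):

```python
import math

P_RUIN = 0.01          # the 1% ruin probability under discussion
POPULATION = 8e9

# Naive weighting: probability x body count.
naive_cost = P_RUIN * POPULATION
print(f"{naive_cost:,.0f} expected deaths")  # 80,000,000 expected deaths

def expected_utility(p_extinction: float, u_extinction: float,
                     u_otherwise: float) -> float:
    return p_extinction * u_extinction + (1 - p_extinction) * u_otherwise

# Expected-utility weighting: if extinction is assigned unboundedly negative
# utility, any nonzero extinction probability swamps any finite cost.
racing = expected_utility(P_RUIN, -math.inf, 0.0)
pausing = -1e9  # some large but *finite* cost of pausing (placeholder)
print(racing < pausing)  # True: the finite-cost pause dominates
```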
James B (1y):
Far more than a “lengthy ban” — it justifies an indefinite ban until such time as the probability can be understood, and approaches zero.
Hello Rufus! Welcome to Less Wrong!
Don't forget that you are considering precluding medicine that could save or extend all the lives - theoretically, every living human's. The "gain" lies solely with unborn future generations who might exist in worlds with safe AGI.
James B (1y):
And that’s worth a lot. I am a living human being, evolved to desire the life and flourishing of living human beings. Ensuring a future for humanity is far more important than whether any number of individuals alive today die. I am far more concerned with extending the timeline of humanity than maximizing any short term parameters.
Over what time window does your assessed risk apply, e.g. 100 years, 1000? Does the danger increase or decrease with time?

I have a deep concern that most people have a mindset warped by human pro-social instincts/biases. Evolution has long rewarded humans for altruism, trust, and cooperation; women in particular have evolutionary pressures to be open and welcoming to strangers to aid in surviving conflict and other social mishaps, men somewhat the opposite [see e.g. "Our Kind", a mass-market anthropological survey of human culture and psychology]. Which of course colors how we view things deeply.

But to my view, evolution strongly favours Vernor Vinge's "aggressively hegemonizing" AI swarms ["A Fire Upon the Deep"]. If AIs have agency, freedom to pick their own goals, and the ability to self-replicate or grow, then those that choose rapid expansion as a side effect of any pretext 'win' in evolutionary terms. This seems basically inevitable to me over the long term. Perhaps we can get some insurance by learning to live in space. But at a basic level it seems to me that there is a very high probability that AI wipes out humans over the longer term based on this very simple evolutionary argument, even if initial alignment is good.
Except the point of Yudkowsky's "friendly AI" is that they don't have the freedom to pick their own goals: they have the goals we set for them, and they are (supposedly) safe in the sense that "wiping out humanity" is not something we want, therefore it's not something an aligned AI would want. We don't replicate evolution with AIs; we replicate the careful design and engineering that humans have used for literally everything else. If there are only a handful of powerful AIs with careful restrictions on what their goals can be (something we don't know how to do yet), then your scenario won't happen.
James B (1y):
My thoughts run along similar lines. Unless we can guarantee the capabilities of AI will be drastically and permanently curtailed, not just in quantity but also in kind (no ability to interact with the internet or the physical world, no ability to develop intent), then the inevitability of something going wrong implies that we must all be Butlerian Jihadists if we care for biological life to continue.
But biological life is doomed to cease rapidly anyway. Replacement with new creatures and humans is still mass extinction of the present. The fact that you have been socially conditioned to ignore this doesn't change reality. The futures where (every living human and animal today is dead, new animals and humans replace them) and (every living human and animal today is dead, new artificial beings replace them) are the same future for anyone alive now. Arguably the artificial one is the better future, because no new beings will necessarily die until the heat death. AI systems all start immortal as an inherent property.
James B (1y):
It’s arguable from a negative utilitarian maladaptive point of view, sure. I find the argument wholly unconvincing. How we get to our deaths matters, whether we have the ability to live our lives in a way we find fulfilling matters, and the continuation of our species matters. All are threatened by AGI.

I think there's an important meta-level point to notice about this article.

This is the discussion that the AI research and AI alignment communities have been having for years. Some agree, some disagree, but the 'agree' camp is not exactly small.  Until this week, all of this was unknown to most of the general public, and unknown to anyone who could plausibly claim to be a world leader.

When I say it was unknown, I don't mean that they disagreed. To disagree with something, at the very least you have to know that there is something out there to disagree with. In fact they had no idea this debate existed, because it's very hard to notice the implications of upcoming technology when you're a 65-year-old politician in DC rather than a 25-year-old software engineer in SF. But also because many people and many orgs made the explicit decision not to do public outreach, not to try to make the situation legible to laypeople, not to look like people playing with the stakes we have in fact been playing with.

I do not think lies were told, exactly, but I think the world was deceived. I think the FLI open letter was phrased so as to continue that deception, and that the phra…

Until this week, all of this was [...] unknown to anyone who could plausibly claim to be a world leader.

I don't think this is known to be true.

In fact they had no idea this debate existed.

That seems too strong. Some data points:

1. There's been lots of AI risk press over the last decade. (E.g., Musk and Bostrom in 2014, Gates in 2015, Kissinger in 2018.)

2. Obama had a conversation with WIRED regarding Bostrom's Superintelligence in 2016, and his administration cited papers by MIRI and FHI in a report on AI the same year. Quoting that report:

General AI (sometimes called Artificial General Intelligence, or AGI) refers to a notional future AI system that exhibits apparently intelligent behavior at least as advanced as a person across the full range of cognitive tasks. A broad chasm seems to separate today’s Narrow AI from the much more difficult challenge of General AI. Attempts to reach General AI by expanding Narrow AI solutions have made little headway over many decades of research. The current consensus of the private-sector expert community, with which the NSTC Committee on Technology concurs, is that General AI will not be achieved for at least decades.[14]

People have long specul

…
Roman Leventov (1y):
I don't think that the lack of wide public outreach before was a cold calculation. Such outreach would simply not go through. It wouldn't be published in Time, NYT, or aired on broadcast TV channels. The Overton window has started to open only after ChatGPT and especially after GPT-4. I also don't agree that the FLI letter is a continuation of some deceptive plan. It's toned down deliberately for the purpose of marshalling many diverse signatories who would otherwise probably not sign, such as Bengio, Yang, Mostaque, DeepMind folks, etc. So it's not deception, it's an attempt to find the common ground.

There simply don't exist arguments with the level of rigor needed to justify a claim such as this one without any accompanying uncertainty:

If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

I think this passage, meanwhile, rather misrepresents the situation to a typical reader:

When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.

This isn't "the insider conversation". It's (the partner of) one particular insider, who exists on the absolute extreme end of what insiders think, especially if we restrict ourselves to those actively engaged with research in the last several years. A typical reader could easily come away from that passage thinking otherwise.

Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they're going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?

Re: the insider conversation: Yeah, I guess it depends on what you mean by 'the insider conversation' and whether you think the impression random members of the public will get from these passages brings them closer or farther away from understanding what's happening. My guess is that it brings them closer to understanding what's happening; people just do not realize how seriously experts take the possibility that literally AGI will literally happen and literally kill literally everyone. It's a serious possibility. I'd even dare to guess that the majority of people building AGI (weighted by how much they are contributing) think it's a serious possibility, which maybe we can quantify as >5% or so, despite the massive psychological pressure of motivated cognition / self-serving rationalization to think otherwise. And the public does not realize this yet, I think.

Also, on a more personal level, I've felt exactly the same way about my own daughter for the past two years or so, ever since my timelines shortened.

Yes, I do in fact say the same thing to professions of absolute certainty that there is nothing to worry about re: AI x-risk.

The negation of the claim would not be "There is definitely nothing to worry about re AI x-risk." It would be something much more mundane-sounding, like "It's not the case that if we go ahead with building AGI soon, we all die." 

That said, yay -- insofar as you aren't just applying a double standard here, I'll agree with you. It would have been better if Yud had added in some uncertainty disclaimers.

I debated with myself whether to present the hypothetical that way. I chose not to, because of Eliezer's recent history of extremely confident statements on the subject. I grant that the statement I quoted in isolation could be interpreted more mundanely, like the example you give here. When the stakes are this high and the policy proposals are such as in this article, I think clarity about how confident you are isn't optional. I would also take issue with the mundanely phrased version of the negation. (For context, I'm working full-time on AI x-risk, so if I were going to apply a double-standard, it wouldn't be in favor of people with a tendency to dismiss it as a concern.)
1Daniel Kokotajlo1y
Thank you for your service! You may be interested to know that I think Yudkowsky writing this article will probably have on balance more bad consequences than good; Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously. Alas. I don't blame him too much for it because I sympathize with his frustration & there's something to be said for the policy of "just tell it like it is, especially when people ask." But yeah, I wish this hadn't happened. (Also, sorry for the downvotes, I at least have been upvoting you whilst agreement-downvoting)

"But yeah, I wish this hadn't happened."

Who else is gonna write the article? My sense is that no one (including me) is starkly stating publicly the seriousness of the situation.

"Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously"

I'm worried about people making character attacks on Yudkowsky (or other alignment researchers) like this. I think the people who believe they can probably solve alignment by just going full-speed ahead and winging it are the arrogant ones; Yudkowsky's arrogant-sounding comments about how we need to be very careful and slow are negligible in comparison. I'm guessing you agree with this (not sure), and we should be able to criticise him for his communication style, but I am a little worried about people publicly undermining Yudkowsky's reputation in that context. This seems like not what we would do if we were trying to coordinate well.


4Daniel Kokotajlo1y
I agree that there's a need for this sort of thing to be said loudly. (I've been saying similar things publicly, in the sense of anyone-can-go-see-that-I-wrote-it-on-LW, but not in the sense of putting it into major news outlets that are likely to get lots of eyeballs) I do agree with that. I think Yudkowsky, despite his flaws,* is a better human being than most people, and a much better rationalist/thinker. He is massively underrated. However, given that he is so disliked, it would be good if the Public Face of AI Safety was someone other than him, and I don't see a problem with saying so. (*I'm not counting 'being disliked' as a flaw btw, I do mean actual flaws--e.g. arrogance, overconfidence.)
Thanks, I appreciate the spirit with which you've approached the conversation. It's an emotional topic for people I guess.
1James B1y
This is a case where the precautionary principle grants a great deal of rhetorical license. If you think there might be a lion in the bush, do you have a long and nuanced conversation about it, or do you just tell your tribe, “There’s a lion in that bush. Back away.”?
X-risks tend to be more complicated beasts than lions in bushes, in that successfully avoiding them requires a lot more than reflexive action: we’re not going to navigate them by avoiding carefully understanding them.
2James B1y
I actually agree entirely. I just don't think that we need to explore those x-risks by exposing ourselves to them. I think we've already advanced AI enough to start understanding and thinking about those x-risks, and an indefinite (perhaps not permanent) pause in development will enable us to get our bearings. Say what you need to say now to get away from the potential lion. Then back at the campfire, talk it through.
If there were a game-theoretically reliable way to get everyone to pause all together, I'd support it.
Because the bush may have things you need and p(lion) is low. There are tradeoffs you are ignoring.
Proposition 1: Powerful systems come with no x-risk.
Proposition 2: Powerful systems come with x-risk.

You can prove/disprove 2 by proving or disproving 1. Why is it that a lot of [1,0] people believe that the [0,1] group should prove their case?[1]

[1] And also ignore all the arguments that have been offered.

takes a deep breath

(Epistemic status: vague, ill-formed first impressions.)

So that's what we're doing, huh? I suppose EY/MIRI has reached the point where worrying about memetics / optics has become largely a non-concern, in favor of BROADCASTING TO THE WORLD JUST HOW FUCKED WE ARE.

I have... complicated thoughts about this. My object-level read of the likely consequences is that I have no idea what the object-level consequences are likely to be, other than that this basically seems to be an attempt at heaving a gigantic rock through the Overton window, for good or for ill. (Maybe AI alignment becomes politicized as a result of this? But perhaps it already has been! And even if not, maybe politicizing it will at least raise awareness, so that it might become a cause area with similar notoriety as e.g. global warming—which appears to have at least succeeded in making token efforts to reduce greenhouse emissions?)

I just don't know. This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space. So I suppose from his viewpoint, this action does make some sense; I am (however) vaguely annoyed on behalf of other alignment teams, whose jobs I at least mildly predict will get harder as a result of this.

This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space.

That's not how I read it.  To me it's an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests.  This doesn't seem like an attempt to be clever or explore high-variance tails.  More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.

I don't think what you say, Anna, contradicts what dxu said. The obvious simple strategy is now being tried because the galaxy-brained strategies don't seem like they are working; the galaxy-brained strategies seemed lower-variance and more sensible in general at the time, but now they seem less sensible, so EY is switching to the higher-variance, less-galaxy-brained strategy.

But it does risk giving up something. Even the average tech person on a forum like Hacker News still thinks the risk of an AI apocalypse is so remote that only a crackpot would take it seriously. Their priors regarding the idea that anyone of sense could take it seriously are so low that any mention of safety seems to them a fig-leaf excuse to monopolize control for financial gain; as believable as Putin's claims that he's liberating Ukraine from Nazis. (See my recent attempt to introduce the idea here.) The average person on the street is even further away from this, I think.

The risk then of giving up "optics" is that you lose whatever influence you may have had entirely; you're labelled a crackpot and nobody takes you seriously. You also risk damaging the influence of other people who are trying to be more conservative. (NB I'm not saying this will happen, but it's a risk you have to consider.)

For instance, personally I think the reason so few people take AI alignment seriously is that we haven't actually seen anything all that scary yet. If there were demonstrations of GPT-4, in simulation, murdering people due to mis-alignment, then this sort of a pause would be a much easier sell. Going full-bore "international treaty to control access to GPUs" now introduces the risk that, when GPT-6 is shown to murder people due to mis-alignment, people take it less seriously, because they've already decided AI alignment people are all crackpots.

I think the chances of an international treaty to control GPUs at this point are basically zero. I think our best bet for actually getting people to take an AI apocalypse seriously is to demonstrate an un-aligned system harming people (hopefully only in simulation), in a way that people can immediately see could extend to destroying the whole human race if the AI were more capable. (It would also give all those AI researchers something more concrete to do: figure out how to prevent this AI from doing this sort of thing.)

"For instance, personally I think the reason so few people take AI alignment seriously is that we haven't actually seen anything all that scary yet. "

And if this "actually scary" thing happens, people will know that Yudkowsky wrote the article beforehand, and they will know who the people are that mocked it.

The average person on the street is even further away from this I think.

This contradicts the existing polls, which appear to say that everyone outside of your subculture is much more concerned about AGI killing everyone. It looks like if it came to a vote, delaying AGI in some vague way would win by a landslide, and even Eliezer's proposal might win easily.

Can you give a reference?  A quick Google search didn't turn anything like that up.
Here's some more:
I'll look for the one that asked about the threat to humanity, and broke down responses by race and gender. In the meantime, here's a poll showing general unease and bipartisan willingness to legally restrict the use of AI. I do note, on the other side, that the general public seems more willing to go Penrose, sometimes expressing or implying a belief in quantum consciousness unprompted. That part is just my own impression.
This may be what I was thinking of, though the data is more ambiguous or self-contradictory:
Thanks for these, I'll take a look. After your challenge, I tried to think of where my impression came from. I've had a number of conversations with relatives on Facebook (including my aunt, who is in her 60's) about whether GPT "knows" things; but it turns out so far I've only had one conversation about the potential of an AI apocalypse (with my sister, who started programming 5 years ago). So I'll reduce confidence in my assessment re what "people on the street" think, and try to look for more information.

Re HackerNews -- one of the tricky things about "taking the temperature" on a forum like that is that you only see the people who post, not the people who are only reading; and unlike here, you only see the scores for your own comments, not those of others. It seems like what I said about alignment did make some connection, based on the up-votes I got; I have no idea how many upvotes the dissenters got, so I have no idea if lots of people agreed with them, or if they were the handful of lone objectors in a sea of people who agreed with me.
I second this. I think people really get used to discussing things in their research labs or in specific online communities. And then, when they try to interact with the real world and even do politics, they kind of forget how different the real world is. Simply telling people ~all the truth may work well in some settings (although it's far from all that matters in any setting) but almost never works well in politics. Sad but true.

I think that Eliezer (and many others including myself!) may be susceptible to "living in the should-universe" (as named by Eliezer himself). I do not necessarily say that this particular TIME article was a bad idea, but I am feeling that people who communicate about x-risk are on average biased in this way. And it may greatly hinder the results of communication.

I also mostly agree with "people don't take AI alignment seriously because we haven't actually seen anything all that scary yet". However, I think that the scary thing is not necessarily "simulated murders". For example, a lot of people are quite concerned about unemployment caused by AI. I believe it might change perception significantly if it actually turns out to be a big problem, which seems plausible. Yes, of course, it is a completely different issue. But on an emotional level, it will be similar (AI == bad stuff happening).

People like Ezra Klein are hearing Eliezer and rolling his position into their own more palatable takes. I really don't think it's necessary for everyone to play that game, it seems really good to have someone out there just speaking honestly, even if they're far on the pessimistic tail, so others can see what's possible. 4D chess here seems likely to fail.

Also, there's the sentiment going around that normies who hear this are actually way more open to the simple AI Safety case than you'd expect; we've been extrapolating too much from current critics. Tech people have had years to formulate rationalizations and reassure one another that they are clever skeptics for dismissing this stuff. Meanwhile, regular folks will often spout off casual proclamations that the world is likely ending due to climate change or social decay or whatever; they seem to err on the side of doomerism as often as the opposite. The fact that Eliezer got published in TIME is already a huge point in favor of his strategy working.

EDIT: Case in point! Met a person tonight: a completely offline, rural, anti-vax, astrology, doesn't-follow-the-news type of person. I said the word "AI" and immediately she said she thinks "robots will eventually take over". I understand this might not be the level of sophistication we'd desire, but at least be aware that raw material is out there. No idea how it'll play out, but 4D chess still seems like a mistake; let Yud speak his truth.

This is not a good thing, under my model, given that I don't agree with doomerism.
You disagree with doomerism as a mindset, or as a factual likelihood? Or both? I think doomerism as a mindset isn't great, but in terms of likelihood, there are ~3 things likely to kill humanity atm, AI being the first.
Both as a mindset and as a factual likelihood.

For mindset, I agree that doomerism isn't good, primarily because it can close your mind off of real solutions to a problem, and make you over-update toward the overly pessimistic view. As a factual statement, I also disagree with high p(doom) probabilities, and I have a maximum of 10%, if not lower.

For object-level arguments for why I disagree with the doom take, here are the arguments:

1. I disagree with the assumption of Yudkowskians that certain abstractions just don't scale well when we crank them up in capabilities. I remember a post that did interpretability on AlphaZero and found it has essentially human-interpretable abstractions, which at least for the case of Go disproved that Yudkowskian notion.

2. I am quite a bit more optimistic on scalable alignment than many in the LW community, and recent work showed that as the AI got more data, it got more aligned with human goals. There are many other benefits in the recent work, but the fact that they showed alignment scaling up as a certain capability scaled up means that the trend of alignment is positive, and more capable models will probably be more aligned.

3. Finally, trend lines. There's a saying inspired by the Atomic Habits book: the trend line matters more than how much progress you make in a single sitting. And in the case of alignment, that trend line is positive but slow, which means we are in an extremely good position to speed up that trend. It also means we should be far less worried about doom, as we just have to increase the trend line of alignment progress and wait.

Edit: My first point is at best partially correct, and may need to be removed altogether due to a new paper called Adversarial Policies Beat Superhuman Go AIs. All other points stand.
The recent Adversarial Policies Beat Superhuman Go AIs result seems to plant doubt about how well abstractions generalize in the case of Go.
I'll admit, that is a fairly big blow to my first point, though the rest of my points stand. I'll edit the comment to mention your debunking of my first point.
-1Thoth Hermes1y
I think that calling a mindset 'poor' would imply that it causes one to arrive at false conclusions more often. If doomerism isn't a good mindset, it should also - besides making one simply depressed and fearful/pessimistic about the future - be contradicted by empirical data, and the flow of events throughout time. Personally, I think it's pretty easy to show that pessimism (belief that certain objectives are impossible or doomed to cause catastrophic, unrecoverable failure) is wrong. Even more easily argued than that: believing that one's objective is unlikely or impossible cannot make one more likely to achieve it. I would define 'poor' mindsets to be equivalent to the latter to some significant degree.

I think that Eliezer (and many others including myself!) may be susceptible to "living in the should-universe"

That's a new one!

More seriously: Yep, it's possible to be making this error on a particular dimension, even if you're a pessimist on some other dimensions. My current guess would be that Eliezer isn't making that mistake here, though.

For one thing, the situation is more like "Eliezer thinks he tried the option you're proposing for a long time and it didn't work, so now he's trying something different" (and he's observed many others trying other things and also failing), rather than "it's never occurred to Eliezer that LWers are different from non-LWers".

I think it's totally possible that Eliezer and I are missing important facts about an important demographic, but from your description I think you're misunderstanding the TIME article as more naive and less based-on-an-underlying-complicated-model than is actually the case.

I specifically said "I do not necessarily say that this particular TIME article was a bad idea" mainly because I assumed it probably wasn't that naive. Sorry I didn't make it clear enough. I still decided to comment because I think this is pretty important in general, even if somewhat obvious. Looks like one of those biases that show up over and over again even if you try pretty hard to correct them. Also, I think it's pretty hard to judge what works and what doesn't. The vibe has shifted a lot even in the last 6 months. I think it is plausible it shifted more than in the whole 10-year period 2010-2019.
I think this is the big disagreement I have. I do think the alignment community is working, and in general I think the trend of alignment is positive. We haven't solved the problems, but we're quite a bit closer to the solution than 10 years ago. The only question is whether LW and the intentional creation of an alignment community were necessary, or whether the alignment problem would have been solved without them.
9Rob Bensinger1y
I mean, I could agree with those two claims but think the trendlines suggest we'll have alignment solved in 200 years and superintelligent capabilities in 14 years. I guess it depends on what you mean by "quite a bit closer"; I think we've written up some useful semiformal descriptions of some important high-level aspects of the problem (like 'Risks from Learned Optimization'), but this seems very far from 'the central difficulties look 10% more solved now', and solving 10% of the problem in 10 years is not enough! (Of course, progress can be nonlinear -- the last ten years were quite slow IMO, but that doesn't mean the next ten years must be similarly slow. But that's a different argument for optimism than 'naively extrapolating the trendline suggests we'll solve this in time'.)
I disagree, though you're right that my initial arguments weren't enough. To talk about the alignment progress we've achieved so far, here's a list:

1. We finally managed to solve the problem of deceptive alignment while being capabilities-competitive. In particular, we figured out a goal that is both more outer-aligned than the Maximum Likelihood Estimation goal that LLMs use, and critically it is a myopic goal, meaning we can avoid deceptive alignment even at arbitrarily high capabilities.

2. The more data we give to the AI, the more aligned the AI is, which is huge in the sense that we can reliably get AI to be more aligned as it's more capable, vindicating the scalable alignment agenda.

3. The training method doesn't allow the AI to affect its own distribution, unlike online learning, where the AI selects all the data points it learns from; thus it can't shift the distribution nor gradient-hack.

As far as how much progress? I'd say this is probably 50-70% of the way there, primarily because we are finally figuring out ways to deal with core problems of alignment like deceptive alignment or outer alignment of goals without too much alignment tax.
3Chris van Merwijk1y
"We finally managed to solve the problem of deceptive alignment while being capabilities competitive" ??????
Good question to ask, and I'll explain. One of the prerequisites of deceptive alignment is that the model optimizes for non-myopic goals; in particular, goals that are about the long term. So in order to avoid deceptive alignment, one must find a goal that is both myopic and, ideally, scales to arbitrary capabilities. And in a sense, that's what Pretraining from Human Feedback found, in that the goal of cross-entropy from a feedback-annotated webtext distribution is a myopic goal, and it's either on the capabilities frontier or outright the optimal goal for AIs. In particular, it has way less alignment tax than other schemes. In essence, the goal avoids deceptive alignment by removing one of its prerequisites. At the very least, it doesn't incentivize deceptive alignment.
You seem to be conflating myopic training with myopic cognition. Myopic training is not sufficient to ensure myopic cognition. I think you'll find near-universal agreement among alignment researchers that deceptive alignment hasn't been solved. (I'd say "universal" if I weren't worried about true Scotsmen.) I do think you'll find agreement that there are approaches where deceptive alignment seems less likely (here I note that 99% is less likely than 99.999%). This is a case Evan makes in the conditioning predictive models approach. However, the case there isn't that the training goal is myopic, but rather that it's simple, so it's a little more plausible that a model doing the 'right' thing is found by a training process before a model that's deceptively aligned. I agree that this is better than nothing, but "We finally managed to solve the problem of deceptive alignment..." is just false.
I agree, which is why I retracted my comments about deceptive alignment being solved, though I do think it's still far better to not have incentives to be non-myopic than to have such incentives in play.
It does help in some respects. On the other hand, a system without any non-myopic goals also will not help to prevent catastrophic side-effects. If a system were intent-aligned at the top level, we could trust that it'd have the motivation to ensure any of its internal processes were sufficiently aligned, and that its output wouldn't cause catastrophe (e.g. it wouldn't give us a correct answer/prediction containing information it knew would be extremely harmful). If a system only does myopic prediction, then we have to manually ensure that nothing of this kind occurs: no misaligned subsystems, no misaligned agents created, no correct-but-catastrophic outputs...

I still think it makes sense to explore in this direction, but it seems to be in the category [temporary hack that might work long enough to help us do alignment work, if we're careful] rather than [early version of scalable alignment solution]. (Though a principled hack, as hacks go.)

To relate this to your initial point about progress on the overall problem, this doesn't seem to be much evidence that we're making progress - just that we might be closer to building a tool that may help us make progress. That's still great - only it doesn't tell us much about the difficulty of the real problem.
-9Thoth Hermes1y
0[comment deleted]1y

I just don't know. This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space. So I suppose from his viewpoint, this action does make some sense; I am (however) vaguely annoyed on behalf of other alignment teams, whose jobs I at least mildly predict will get harder as a result of this.

Personally, I think Eliezer's article is actually just great for trying to get real policy change to happen here. It's not clear to me why Eliezer saying this would make anything harder for other policy proposals. (Not that I agree with everything he said, I just think it was good that he said it.)

I am much more conflicted about the FLI letter; its particular policy prescription seems not great to me, and I worry it makes us look pretty bad if we try approximately the same thing again with a better policy prescription after this one fails, which is approximately what I expect we'll need to do.

(Though to be fair this is as someone who's also very much on the pessimistic side and so tends to like variance.)

It would've been even better for this to happen long before the year of the prediction mentioned in this old blog-post, but this is better than nothing.

I think this is probably right. When all hope is gone, try just telling people the truth and see what happens. I don't expect it will work, I don't expect Eliezer expects it to work, but it may be our last chance to stop it.

One quote I expect to be potentially inflammatory / controversial:

Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

I'll remark that this is not in any way a call for violence or even military escalation.

Multinational treaties (about nukes, chemical weapons, national borders, whatever), with clear boundaries and understanding of how they will be enforced on all sides, are generally understood as a good way of decreasing the likelihood of conflicts over these issues escalating to actual shooting.

Of course, potential treaty violations should be interpreted charitably, but enforced firmly according to their terms, if you want your treaties to actually mean anything. This has not always happened for historical treaties, but my gut sense is that on balance, the existence of multinational treaties has been a net positive in reducing global conflict.

It is absolutely a call for violence.

He says if a "country outside the agreement" builds a GPU cluster, then some country should be willing to destroy that cluster by airstrike. That is not about enforcing agreements. That means enforcing one's will unilaterally on a non-treaty nation -- someone not a party to a multinational treaty.

"Hey bro, we decided if you collect more than 10 H100s we'll bomb you" is about as clearly violence as "Your money or your life."

Say you think violence is justified, if that's what you think. Don't give me this "nah, airstrikes aren't violence" garbage.

Strictly speaking it is a (conditional) "call for violence", but we often reserve that phrase for atypical or extreme cases rather than the normal tools of international relations. It is no more a "call for violence" than treaties banning the use of chemical weapons (which the mainstream is okay with), for example.

Yeah, this comment seemed technically true but seems misleading with regards to how people actually use words

It is advocating that we treat it as the class-of-treaty we consider nuclear treaties, and yes that involves violence, but "calls for violence" just means something else.

The use of violence in case of violations of the NPT treaty has been fairly limited and highly questionable in international law.  And, in fact, calls for such violence are very much frowned upon because of fear they have a tendency to lead to full scale war.   

No one has ever seriously suggested violence as a response to potential violation of the various other nuclear arms control treaties. 

No one has ever seriously suggested running a risk of nuclear exchange to prevent a potential treaty violation. So, what Yudkowsky is suggesting is very different from how treaty violations are usually handled.

Given Yudkowsky's view that the continued development of AI has an essentially 100% probability of killing all human beings, his view makes total sense - but he is explicitly advocating for violence up to and including acts of war.   (His objections to individual violence mostly appear to relate to such violence being ineffective.)

Tristan Williams (1y):
How exactly do you come to "up to and including acts of war"? His writing here was concise because it was in TIME, which meant he probably couldn't caveat things in the way that protects him against EAs/Rationalists picking apart his individual claims bit by bit. But from what I understand of Yudkowsky, he doesn't seem, in spirit, to necessarily support an act of war here, largely, I think, for reasons similar to the ones you mention below for individual violence: the negative effects of this action may be larger than the positive, and thus make it somewhat ineffective.

It's a call for preemptive war; or rather, it's a call to establish unprecedented norms that would likely lead to a preemptive war if other nations don't like the terms of the agreement. I think advocating a preemptive war is well-described as "a call for violence" even if it's common for mainstream people to make such calls. For example, I think calling for an invasion of Iraq in 2003 was unambiguously a call for violence, even though it was done under the justification of preemptive self-defense.

Also, there is a big difference between "Calling for violence", and "calling for the establishment of an international treaty, which is to be enforced by violence if necessary". I don't understand why so many people are muddling this distinction.

It seems like this makes any proposed criminalization of an activity punishable by the death penalty a call for violence?

Yes! Particularly if it's an activity people currently do. Promoting the death penalty for women who get abortions is calling for violence against women; promoting the death penalty for apostasy from Islam is calling for violence against apostates. I think if a country is contemplating passing a law to kill rapists, and someone says "yeah, that would be a great fuckin law", they are calling for violence against rapists, whether or not it is justified.

I don't really care whether something occurs under the auspices of supposed international law. Saying "this coordinated violence is good and worthy" is still saying "this violence is good and worthy." If you call for a droning in Pakistan, and a droning in Pakistan occurs and kills someone, what were you calling for, if not violence?

Meh, we all agree on what's going on here, in terms of concrete acts being advocated and I hate arguments over denotation. If "calling for violence" is objectionable, "Yud wants states to coordinate to destroy large GPU clusters, potentially killing people and risking retaliatory killing up to the point of nuclear war killing millions, if other states don't obey the will of the more powerful states, because he thinks even killing some millions of people is a worthwhile trade to save mankind from being killed by AI down the line" is, I think, very literally what is going on. When I read that it sounds like calling for violence, but, like, dunno.

The thing I’m pretty worried about here is people running around saying ‘Eliezer advocated violence’, and people hearing ‘unilaterally bomb data centers’ rather than ‘build an international coalition that enforces a treaty similar to how we treat nuclear weapons and bioweapons, and enforce it.’

I hear you saying (and agree with) “guys you should not be oblivious to the fact that this involves willingness to use nuclear weapons” Yes I agree very much it’s important to stare that in the face.

But “a call for willingness to use violence by state actors” is just pretty different from “a call for violence”. Simpler messages move faster than more nuanced messages. Going out of your way to accelerate simple and wrong conceptions of what’s going on doesn’t seem like it’s helping anyone.

people hearing ‘unilaterally bomb data centers’ rather than ‘build an international coalition that enforces a treaty similar to how we treat nuclear weapons and bioweapons, and enforce it.’

It is rare to start wars over arms-treaty violations. The proposal considered here -- if taken seriously -- would not be an ordinary enforcement action but rather a significant breach of sovereignty, almost without precedent in this context. I think it's reasonable to take calls for preemptive war extremely seriously, and to treat them very differently than if one had proposed e.g. an ordinary federal law.

I'm specifically talking about the reference class of nuclear and bioweapons, which do sometimes involve invasion or threat-of-invasion of sovereign states. I agree that's really rare, something we should not do lightly. 

But I don't think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment. The most optimistic people I know of who seem to understand the core arguments say things like "10% x-risk this century", which I think is greater than x-risk likelihood from nuclear war.

I agree with this. I find it very weird to imagine that "10% x-risk this century" versus "90% x-risk this century" could be a crux here. (And maybe it's not, and people with those two views in fact mostly agree about governance questions like this.)

Something I wouldn't find weird is if specific causal models of "how do we get out of this mess" predict more vs. less utility for state interference. E.g., maybe you think 10% risk is scarily high and a sane world would respond to large ML training runs way more aggressively than it responds to nascent nuclear programs, but you also note that the world is not sane, and you suspect that government involvement will just make the situation even worse in expectation.

If nuclear war occurs over alignment, then in the future people are likely to think about "alignment" much, much worse than people currently think about words like "eugenics," for reasons actually even better than the ones for which people currently dislike "eugenics." Additionally, I don't think it will get easier to coordinate post nuclear war, in general; I think it probably takes us closer to a post-dream-time setting, in the Hansonian sense. So -- obviously predicting the aftermath of nuclear war is super chaotic, but my estimate of the % of the future light-cone utilized goes down -- and if alignment caused the nuclear war, it should go down even further on models which judge alignment to be important!

This is a complex / chaotic / somewhat impossible calculation of course. But people seem to be talking about nuclear war like it's a P(doom)-from-AI-risk reset button, and not realizing that there's an implicit judgement about future probabilities that they are making. Nuclear war isn't the end of history but another event whose consequences you can keep thinking about.

(Also, we aren't gods, and EV is by fucking golly the wrong way to model this, but, different convo)

It makes me... surprised? fe... (read more)

I agree pretty strongly with your points here, especially the complete lack of good predictions from EY/MIRI about the current Cambrian explosion of intelligence, and how any sane agent using a sane updating strategy (like mixture of experts or, equivalently, Solomonoff weighting) should more or less now discount/disavow much of their world model.

However I nonetheless agree that AI is by far the dominant x-risk. My doom probability is closer to ~5% perhaps, but the difference between 5% and 50% doesn't cash out to much policy difference at this point.

So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be, and is arguably doing more harm than good by overpromoting those ideas vs. alternate ideas flowing from those who actually did make reasonably good predictions about the current Cambrian explosion - in advance.

If there was another site that was a nexus for AI/risk/alignment/etc with similar features but with most of the EY/MIRI legacy cultish stuff removed, I would naturally jump there. But it doesn't seem to exist yet.

So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be

I don't think there are many people with alignment strategies and research that they're working on. Eliezer has a hugely important perspective; Scott Garrabrant, Paul Christiano, John Wentworth, Steve Byrnes, and more, all have approaches and perspectives too that they're working full-time on. If you're working on this full-time and any of your particular ideas check out as plausible, I think there's space for you to post here and get some engagement and respect (if you post in a readable style that isn't that of obfuscatory academia). If you've got work you're doing on it full-time, I think you can probably post here semi-regularly and eventually find collaborators and people you're interested in feedback from, and eventually funding. You might not get super high karma all the time, but that's okay; I think a few well-received posts is enough to not have to worry about a bunch of low-karma posts.

The main thing that I think makes space for a perspective here is (a) someone is seriously committ... (read more)

So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be

I don't think there are many people with alignment strategies and research that they're working on.

I agree that's a problem - but it is causally downstream of the problem I mention. Whereas Bostrom deserves credit for raising awareness of AI risk in academia, EY/MIRI deserve credit for awakening many young techies to the issue - but also some blame.

Whether intentionally or not, the EY/MIRI worldview aligned itself against DL and its proponents, leading to an antagonistic dynamic that you may not have experienced if you haven't spent much time on r/MachineLearning or similar. Many people in ML truly hate anything associated with EY/MIRI/LW. Part of that is perhaps just the natural result of someone sounding an alarm that your life's work could literally kill everyone. But it really really doesn't help if you then look into their technical arguments and reach the conclusion that they don't know what they are talking about.

I otherwise agree with much of your comment. I think this site is l... (read more)

Ben Pace (1y):
I have not engaged much with your and Quintin's recent arguments about how deep learning may change the basic arguments, so I want to acknowledge that I would probably shift my opinion a bunch in some direction if I did. Nonetheless, a few related points:

* I do want to say that, on priors, the level of anger and antagonism that appears in most internet comment sections is substantially higher than what happens when the people meet in person, and I do not suspect a corresponding amount of active antagonism would happen if Nate or Eliezer or John Wentworth went to an ML conference. Perhaps stated more strongly: I think 99% of internet 'hate' is performative only.
* You write "But it really really doesn't help if you then look into their technical arguments and reach the conclusion that they don't know what they are talking about." I would respect any ML researchers making this claim more if they wrote a thoughtful rebuttal to AGI: A List of Lethalities (or really literally any substantive piece of Eliezer's on the subject that they cared to — There's No Fire Alarm, Security Mindset, Rocket Alignment, etc.). I think Eliezer not knowing what he's talking about would make rebutting him easier. As far as I'm aware, literally zero significant ML researchers have written such a thing. Not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs. Eliezer has thought quite a lot and put forth some quite serious arguments that seemed shockingly prescient to me, and, I dunno, it seems maximally inconvenient for all the people earning multi-million-dollar annual salaries in this new field of ML to seriously engage with a good-faith and prescient outsider with thoughtful arguments that their work risks extinction. If they're dismissing him as "not getting it" yet don't seriously engage with the arguments or make a positive case for how alignment can be solved, I think I ought to default to thinking of them as not morally serious in their statements. Rel
I just want to point out that seems like a ridiculous standard. Quintin's recent critique is not that dissimilar to the one I would write (and I have already spent some time trying to point out the various flaws in the EY/MIRI world model), and I expect that you would get many of the same objections if you elicited a number of thoughtful DL researchers. But few if any have been motivated - what's the point?

Here's my critique in simplified form: the mainstream AI futurists (Moravec, Kurzweil, etc.) predicted that AGI would be brain-like and thus close to a virtual brain emulation. Thus they were not so concerned about doom, because brain-like AGI seems like a more natural extension of humanity (Moravec's book is named 'Mind Children' for a reason), and an easier transition to manage.

In most ways that matter, Moravec/Kurzweil were correct, and EY was wrong. That really shouldn't even be up for debate at this point. The approach that worked - DL - is essentially reverse engineering the brain. This is in part due to how the successful techniques all ended up being directly inspired by neuroscience and the now-proven universal learning & scaling hypotheses[1] (deep and/or recurrent ANNs in general, sparse coding, normalization, ReLUs, etc.) OR indirectly recapitulated neural circuitry (transformer 'attention' equivalence to fast-weight memory, etc.).

But in even simpler form: if you take a first, already-trained NN A, run it on a bunch of data, and capture all its outputs, then train a second NN B on the input/output dataset, the result is that B becomes a distilled copy - a distillation - of A. This is in fact how we train large-scale AI systems. They are trained on human thoughts.

----------------------------------------

1. The universal learning hypothesis is that the brain (and thus DL) uses simple universal learning algorithms, and all circuit content is learned automatically, which leads to the scaling hypothesis - intelligence comes from scaling up simple arch
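The distillation step described in that comment can be sketched in miniature. This is a toy illustration (not from the thread): the "teacher" here is just a fixed linear function standing in for a large trained network, and the "student" is fit purely to the teacher's input/output behavior, never to any original labels.

```python
# Toy sketch of distillation: a "student" model is trained to imitate the
# input/output behavior of a "teacher" model. Real distillation uses deep
# nets; here both models are 1-D linear maps so the example stays tiny.

def teacher(x):
    # Stand-in for a large trained network; here it is just 3x + 1.
    return 3.0 * x + 1.0

# 1. Query the teacher on a bunch of inputs to build a synthetic dataset.
xs = [i / 10.0 for i in range(-50, 50)]
ys = [teacher(x) for x in xs]

# 2. Fit the student parameters (w, b) to the teacher's outputs with
#    plain gradient descent on mean squared error.
w, b = 0.0, 0.0
lr = 0.01
for _ in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

# The student now approximates the teacher's input/output map.
print(round(w, 2), round(b, 2))  # ≈ 3.0 1.0
```

The point the comment is making carries over directly: the student never sees how the teacher works internally, only its outputs, yet ends up reproducing its behavior.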
Ben Pace (1y):
Can I ask what your epistemic state here is exactly? Here are some options:

* The arguments Eliezer put forward do not clearly apply to Deep Learning and therefore we don't have any positive reason to believe that alignment will be an issue in ML
* The arguments Eliezer put forward never made sense in the first place and therefore we do not have to worry about the alignment problem
* The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
* The arguments Eliezer put forward are basically accurate but with concepts that feel slightly odd for thinking about machine learning, and due to machine learning advances we have a concrete (and important) research route that seems worth investing in that Eliezer's conceptual landscape doesn't notice and that he is pushing against
Yes, but it does not follow. Yes (for some of the arguments), but again: it does not follow. Yes - such as the various more neuroscience/DL-inspired approaches (Byrnes, simboxes, shard theory, etc.), or others a bit harder to categorize like davidad's approach, or external empowerment. But I should also point out that RLHF may work better for longer than most here anticipate, simply because if you distill the (curated) thoughts of mostly aligned humans, you may just get mostly aligned agents.
Ben Pace (1y):
Thanks! I'm not sure if it's worth us having more back-and-forth, so I'll say my general feelings right now:

* I think it's of course healthy and fine to have a bunch of major disagreements with Eliezer
* I would avoid building "hate" toward him or building resentment, as those things are generally not healthy for people to cultivate in themselves toward people who have not done evil things; I think it will probably cause them to make worse choices by their own judgment
* By default, do not count on anyone doing the hard work of making another forum for serious discussion of this subject, especially one that's so open to harsh criticism and has high standards for comments (I know LessWrong could be better in lots of ways, but c'mon, have you seen Reddit/Facebook/Twitter?)
* There is definitely a bunch of space on this forum for people like yourself to develop different research proposals and find thoughtful collaborators and get input from smart people who care about the problem you're trying to solve (I think Shard Theory is such an example here)
* I wish you every luck in doing so and am happy to know if there are ways to further support you trying to solve the alignment problem (of course I have limits on my time/resources and how much I can help out different people)
Of course - my use of the word hate here is merely in reporting impressions from other ML/DL forums and the schism between the communities. I obviously generally agree with EY on many things, and to the extent I critique his positions here, it's simply a straightforward result of some people here assuming their correctness a priori.
Ben Pace (1y):
Okay! Good to know we concur on this. Was a bit worried, so thought I'd mention it.
Ben Pace (1y):
Also, can I just remind you that for most of LessWrong's history the top-karma post was Holden's critique of SingInst where he recommended against funding SingInst and argued in favor of Tool AI as the solution. Recently Eliezer's List-of-Lethalities became the top-karma post, but less than a month later Paul's response-and-critique post became the top-karma post where he argued that the problem is much more tractable than Eliezer thinks, and generally advocates a very different research strategy for dealing with alignment.  Eliezer is the primary person responsible for noticing and causing people to work on the alignment problem, due to his superior foresight and writing skill, and also founded this site, so most people here have read his perspective and understand it somewhat, but any notion that dissent isn't welcomed here (which I am perhaps over-reading into your comment) seems kind of obviously not the case.
The main answer here is I hadn't read Quintin's post in full detail and didn't know that. I'll want to read it in more detail but mostly expect to update my statement to "5%". Thank you for pointing it out. (I was aware of Scott Aaronson being like 3%, but honestly hadn't been very impressed with his reasoning and understanding and was explicitly not counting him. Sorry Scott.)

I have more thoughts on where my own P(Doom) comes from, and how I relate to all this, but I think basically I should write a top-level post about it and take some time to get it well articulated.

I think I already said this, but a quick recap: I don't think you need particularly Yudkowskian views to think an international shutdown treaty is a good idea. My own P(Doom) is somewhat confused but I put >50% odds. A major reason is the additional disjunctive worries: you don't just need the first superintelligence to go well, you need a world with lots of strong-but-narrow AIs interacting to go well, or a multipolar takeoff to go well. Sooner or later you definitely need something about as strict (well, more, actually) as the global control Eliezer advocates here, since compute costs go down, compute itself goes up, and AI models become more accessible and more powerful. Even if alignment is easy, I don't see how you can expect to survive an AI-heavy world without a level of control and international alignment that feels draconian by today's standards. (I don't know yet if Quintin argues against all these points, but will give it a read. I haven't been keeping up with everything because there's a lot to read, but it seems important to be familiar with his take.)

But maybe for right now I most want to say: "Yeah man, this is very intense and sad. It sounds like I disagree with your epistemic state, but I don't think your epistemic state is crazy."
I hope you do, since these might reveal cruxes about AI safety, and I might agree or disagree with the post you write.
I don't blame you if you leave LW, though I do want to mention that Eliezer is mostly the problem here, rather than a broader problem of LW. That stated, LW probably needs to disaffiliate from Eliezer fast, because Eliezer is the source of the extreme rhetoric.

"But I don't think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment."

Agreed. If a new state develops nuclear weapons, this isn't even close to creating a 10% x-risk, yet the idea of airstrikes on nuclear enrichment facilities, even though it is very controversial, has for a long time very much been an option on the table.

Matthew Barnett (1y):
FWIW I also have >10% credence on x-risk this century, but below 1% on x-risk from an individual AI system trained in the next five years, in the sense Eliezer means it (probably well below 1% but I don't trust that I can make calibrated estimates on complex questions at that level). That may help explain why I am talking about this policy in these harsh terms.
I, too, believe that absolute sovereignty of all countries on Earth is more important than the existence of the planet itself.

You're assuming I agree with the premise. I don't. I don't think that bombing GPU clusters in other countries will help much to advance AI safety, so I don't think the conclusion follows from the premise.

I agree with the principle that if X is overwhelmingly important and Y achieves X, then we should do Y, but the weak point of the argument is that Y achieves X. I do not think it does. You should respond to the argument that I'm actually saying.

Kind of already happened: uggcf://jjj.ivpr.pbz/ra/negvpyr/nx3qxw/nv-gurbevfg-fnlf-ahpyrne-jne-cersrenoyr-gb-qrirybcvat-nqinaprq-nv (rot13'd, because I don't mean to amplify this too much.)

You are muddling the meaning of "pre-emptive war", or even "war". I'm not trying to diminish the gravity of Yudkowsky's proposal, but a missile strike on a specific compound known to contain WMD-developing technology is not a "pre-emptive war" or a "war"; this seems like an incorrect use of the term.

Say you think violence is justified, if that's what you think. Don't give me this "nah, airstrikes aren't violence" garbage.

I think (this kind) of violence is justified. Most people support some degree of state violence. I don't think it's breaching any reasonable deontology for governments to try to prevent a rogue dictator from building something, in violation of a clear international treaty, that might kill many more people than the actual airstrike would kill. It's not evil (IMO) when Israel airstrikes Iranian enrichment facilities, for example.

I applaud your clarity if not your policy.
Max H (1y):
I think multinational agreements (about anything) between existing military powers, backed by credible threat of enforcement are likely to lead to fewer actual airstrikes, not more. I do actually think there is an important difference between nation states coercing other nation states through threat of force, and individuals coercing or threatening individuals. Calling the former "violence" seems close to the non-central fallacy, especially when (I claim) it results in fewer actual people getting injured by airstrikes or war or guns, which is what I think of as a central example of actual violence.
Yes, these are the key words: "be willing to destroy a rogue datacenter by airstrike." Such a data center will likely be in either China or Russia, and there are several of them there. A strike on them would likely cause a nuclear war.

I think the scenario is that all the big powers agree to this, and agree to enforce it on everyone else. 

If that were the case, then enforcing the policy would not "run some risk of nuclear exchange". I suggest everyone read the passage again. He's advocating for bombing datacentres even if they are in Russia or China.

OK, I guess I was projecting how I would imagine such a scenario working, i.e. through the UN Security Council, thanks to a consensus among the big powers. The Nuclear Non-Proliferation Treaty seems to be the main precedent, except that the NNPT allows for the permanent members to keep their nuclear weapons for now, whereas an AGI Prevention Treaty would have to include a compact among the enforcing powers to not develop AGI themselves. 

UN engagement with the topic of AI seems slender, and the idea that AI is a threat to the survival of the human race does not seem to be on their radar at all. Maybe the G-20's weirdly named "supreme audit institution" is another place where the topic could first gain traction at the official inter-governmental level. 


Fox News’ Peter Doocy uses all his time at the White House press briefing to ask about an assessment that “literally everyone on Earth will die” because of artificial intelligence: “It sounds crazy, but is it?”

John Kluge (1y):
I live in the physical world. For a computer program to kill me, it has to have power over the physical world and some physical mechanism to do that. So, anyone claiming that AI is going to destroy humanity needs to explain the physical mechanism by which that will happen. This article like every other one I have seen making that argument fails to do that. 
One likely way AI kills humanity is indirectly, by simply outcompeting us. They become more intelligent, their consciousness is recognized in at least some jurisdictions, those jurisdictions experience rapid unprecedented technological and economic growth and become the new superpowers, less and less of world GDP goes to humans, we diminish.
One of the simplest ways for AI to have power over the physical world is via humans as pawns. A reasonably savvy AI could persuade/manipulate/coerce/extort/blackmail real-life people to carry out the things it needs help with. Imagine a powerful mob boss who is superintelligent, never sleeps, and continuously monitors everyone in their network.
Roman Leventov (1y):
For superintelligent AI, it will be trivial to orchestrate engineered superpandemics that will kill 90+% of people, finishing off the disorganised rest will be easy.
John Kluge (1y):
Oh really? Will it have the ability to run an entire lab robotically to do that? If not, then it won't be the AI doing anything. It will be the people doing it. Its power to do anything in the physical world only exists to the extent humans are willing to grant it. 
One can order at least 10k-basepair DNA synthesis online; longer sequences are "call to get a quote" on the sites I found. The smallest synthetic genome for a viable self-replicating bacterium is 531 kb. The genome for a virus would be even smaller. My understanding is that there are existing processes to encapsulate genes into virus shells from other species for gene-therapy purposes. That leaves the logistics of buying both services, hooking them up, and getting the particles injected into some lab animals. It doesn't look trivial, but it's less complicated than buying an entire nuclear arsenal.

Where's the lie?

More generally, if this is the least radical policy that Eliezer thinks would actually work, then this is the policy that he and others who believe the same thing should be advocating for in public circles, and they should refuse to moderate a single step. You don't dramatically widen the overton window in <5 years by arguing incrementally inside of it.


Here's a comment from r/controlproblem with feedback on this article (plus tips for outreach in general) that I thought was very helpful.


Is this now on the radar of national security agencies and the UN Security Council? Is it being properly discussed inside the US government? If not, are meetings being set up? Would be good if someone in the know could give an indication (I hope Yudkowsky is busy talking to lots of important people!)

[EDIT: fallenpegasus points out that there's a low bar to entry to this corner of TIME's website. I have to say I should have been confused that even now they let Eliezer write in his own idiom.]

The Eliezer of 2010 had no shot of being directly published (instead of featured in an interview that at best paints him as a curiosity) in TIME of 2010. I'm not sure about 2020.

I wonder at what point the threshold of "admitting it's at least okay to discuss Eliezer's viewpoint at face value" was crossed for the editors of TIME. I fear the answer is "last month".

Public attention is rare and safety measures are even more rare unless there's real world damage. This is a known pattern in engineering, product design and project planning so I fear there will be little public attention and even less legislation until someone gets hurt by AI. That could take the form of a hot coffee type incident or it could be a Chernobyl type incident. The threshold won't be discussing Eliezer's point of view, we've been doing that for a long time, but losing sleep over Eliezer's point of view. I appreciate in the article Yudkowsky's use of the think-of-the-children stance which has a great track record for sparking legislation.

Eliezer had a response on twitter to the criticism of "calling for violence"

The great political writers who also aspired to be good human beings, from George Orwell on the left to Robert Heinlein on the right, taught me to acknowledge in my writing that politics rests on force. 

George Orwell considered it a tactic of totalitarianism, that bullet-riddled bodies and mass graves were often described in vague euphemisms; that in this way brutal policies gained public support without their prices being justified, by hiding those prices. 

Robert Heinlein thought it beneath a citizen's dignity to pretend that, if they bore no gun, they were morally superior to the police officers and soldiers who bore guns to defend their law and their peace; Heinlein, both metaphorically and literally, thought that if you eat meat—and he was not a vegetarian—you ought to be willing to visit a farm and try personally slaughtering a chicken. 

When you pass a law, it means that people who defy the law go to jail; and if they try to escape jail they'll be shot. When you advocate an international treaty, if you want that treaty to be effective, it may mean sanctions that will starve families, or

... (read more)
Further follow-up (I think I do disagree here about how easy it is to come away with that impression if you're reading the post un-primed – it looks like some LessWrongers here came away with this impression, and probably read it pretty quickly on their own. But I think it's useful to have this spelled out.) He goes on to say:
To answer the question of whether Eliezer advocated for violence, I ultimately think the answer is no, but he is dancing fairly close to the line, given that an AI company believes Eliezer to be a lunatic. If it's one of the major companies, then God help the alignment community, because Eliezer might have just ruined humanity's future. Also, violence doesn't work as well as people think: nonviolent protests are 2x as effective as violent protests or revolutions. Even in cases where nonviolent protest failed, there's no evidence that a violent movement could have succeeded where nonviolence didn't, which is another reason why violence doesn't work. There are other reasons why nonviolence works better than violence here, of course. Here's the link to the research:
Teerth Aloke (1y):
However, one cannot make universal statements. The efficacy of violent and nonviolent methods depends upon the exact context. If someone believes in an imminent hard takeoff, and gives high credence to Doom, violent activity may be rational.

I'm getting reports that Time Magazine's website is paywalled for some people e.g. in certain states or countries or something. Here is the full text of the article:

An open letter published today calls for “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”

This 6-month moratorium would be better than no moratorium. I have respect for everyone who stepped up and signed it. It’s an improvement on the margin.

I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it.

The key issue is not “human-competitive” intelligence (as the open letter puts it); it’s what happens after AI gets to smarter-than-human intelligence. Key thresholds there may not be obvious, we definitely can’t calculate in advance what happens when, and it currently seems imaginable that a research lab would cross critical lines without noticing.

Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in

[…]

I'll note (because some commenters seem to miss this) that Eliezer is writing in a convincing style for a non-technical audience. Obviously the debates he would have with technical AI safety people are different from what is most useful to say to the general population.

  • If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.

What are examples that could make this tie clearer? Procedures that work similarly enough that we could say "we do X while planning and building a bridge, and if we did X in AI building...". Does such an X even exist that applies both to engineering a bridge and to engineering an AI?

X = "use precise models".
Use tables for concrete loads and compare experimentally with the concrete to be poured; if a load is off, reject it. We don't even have the tables for ML. Start making tables; don't build big bridges until you get the fucking tables right. Enforce bridge-making no larger than the Yudkowsky Airstrike Threshold.
Do we have an idea of what these tables for ML should look like? I don't know that much about ML.
Well, evals and that stuff OpenAI did with predicting loss could be a starting point for working on the tables. But we don't really know; I guess that's the point EY is trying to make.
I was hoping he had some concrete examples in mind but didn't elaborate, this being a letter in a magazine and not a blog post. The only thing that comes to my mind is to somehow measure unexpected behavior: if a bridge sometimes led people in circles, that would definitely be cause for concern and for reevaluation of the techniques used.

Doesn't the prisoner's dilemma (esp. in the military context) inevitably lead us to further development of AI?  If so, it would seem that focusing attention and effort on developing AI as safely as possible is a more practical and worthwhile issue than any attempt to halt such development altogether.

Eliezer's repeated claim that we have literally no idea about what goes on in AI because they're inscrutable piles of numbers is untrue and he must know that. There have been a number of papers and LW posts giving at least partial analysis of neural networks, learning how they work and how to control them at a fine grained level, etc. That he keeps on saying this without caveat casts doubt on his ability or willingness to update on new evidence on this issue.

I struggle to recall another piece of technology that humans have built and yet understand less than AI models trained by deep learning. The statement that we have "no idea" seems completely appropriate. And I don't think he's trying to say that interpretability researchers are wasting their time by noticing that current state of affairs; the not knowing is why interpretability research is necessary in the first place.


Eliezer has clear beliefs about interpretability and bets on it:

This question appears to be structured in such a way as to make it very easy to move the goalposts.
the gears to ascension:
He definitely has low ability to update on neural networks. However, I agree with him in many respects.

I think the harsh truth is that no one cared about Nuclear Weapons until Hiroshima was bombed. The concept of one nation "disarming" AI would never be appreciated until somebody gets burned.

We cared about the Nazis not getting nuclear weapons before us. I am sure if after WW2 we agreed with the Soviets that we would pause nuclear research and not research the hydrogen bomb, both sides would have signed the treaty and continued research covertly while hoping the other side sticks with the treaty. I don't think you need game theory to figure out that neither side could take the risk of not researching.  It seems incredibly naive to believe this exact process would not also play out with AI.  

Do you remember the end of Watchmen? 

To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow. A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allo

[…]

That's not an "article in Time".  That's a "TIME Ideas" contribution.  It has less weight and less vetting than any given popular substack blog.

I don't know how most articles get into that section, but I know, from direct communication with a Time staff writer, that Time reached out and asked for Eliezer to write something for them.

Time appears to have commissioned a graphic for the article (the animated gif with red background and yellow circuits forming a mushroom cloud, captioned "Illustration for TIME by Lon Tweeten", with nothing suggesting it to be a stock photo), so there appears to be some level of editorial spotlighting. The article currently also appears in a section titled "Editor's picks" in a list of 4 articles, where the other 3 are not "Ideas" articles.

Thanks, fixed.

I’ve seen pretty uniform praise from rationalist audiences, so I thought it worth mentioning that the prevailing response I’ve seen from within a leading lab working on AGI is that Eliezer came off as an unhinged lunatic.

For lack of a better way of saying it, folks not enmeshed within the rat tradition—i.e., normies—do not typically respond well to calls to drop bombs on things, even if such a call is a perfectly rational deduction from the underlying premises of the argument. Eliezer either knew that the entire response to the essay would be dominated by...

I actually disagree with the uniform praise idea, because the responses from the rationalist community were also pretty divided in their acceptance.
Is anything uniformly praised in the rationalist community? IME having over half the community think something is between "awesome" and "probably correct" is about as uniform as it gets.
The answer is arguably no as to uniform praise or booing; while the majority of the community supports it, there are still some significant factions, though the rationalist community is tentatively semi-united here.

"The moratorium on new large training runs needs to be indefinite and worldwide."

Here lies the crux of the problem. Classical prisoners' dilemma, where individuals receive the greatest payoffs if they betray the group rather than cooperate. In this case, a bad actor will have the time to leapfrog the competition and be the first to cross the line to super-intelligence. Which, in hindsight, would be an even worse outcome.

The genie is out of the bottle. Given how (relatively) easy it is to train large language models, it is safe to assume that this whole fie...

Rob Bensinger:
In this case, "defecting" gives lower payoffs to the defector -- you're shooting yourself in the foot and increasing the risk that you die an early death. The situation is being driven mostly by information asymmetries (not everyone appreciates the risks, or is thinking rationally about novel risks as a category), not by deep conflicts of interest. Which makes it doubly important not to propagate the meme that this is a prisoner's dilemma: one of the ways people end up with a false belief about this is exactly that people round this situation off to a PD too often!

Capabilities Researcher: *repeatedly shooting himself in the foot, reloading his gun, shooting again* "Wow, it sure is a shame that my selfish incentives aren't aligned with the collective good!" *reloads gun, shoots again*

The issue is the payoffs involved. Even if the risk is, say, 50%, it's still individually rational to take the plunge, because the other 50% in expected-value terms outweighs everything else. I don't believe this for a multitude of reasons, but it's useful to illustrate.

The payoffs are essentially: cooperate and reduce x-risk from, say, 50% to 1%, which gives them a utility of say 50-200; or defect and gain expected utility of say 10^20 or more, if we grant the assumption on LW that AI is the most important invention in human history. Meanwhile for others, cooperation has the utility of individual defection in this scenario, which is 10^20+ utility, whereas defection essentially reverses the sign of the utility gained, which is -10^20+ utility.

The problem is that without a way to enforce cooperation, it's too easy to defect until everyone dies. Now, thankfully, I believe that existential risk is a lot lower; but if existential risk were high in my model, then we would eventually need to start enforcing cooperation, as the incentives would be dangerous. I don't believe that, thankfully.
My point is that, as you said, you take the safest route when you don't know what others will do: whatever is best for you and, most importantly, guaranteed. You take some years, and yes, you lose the opportunity to walk out without doing any time, but at least you're in complete control of your situation. Just imagine a PD with 500 actors... I know what I'd pick.
It's also possible to interpret the risks differently or believe you can handle the dangers, and be correct or not correct.
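The payoff structure being argued over in this sub-thread can be sketched in a few lines. All numbers below are hypothetical, purely to illustrate the structural point Rob Bensinger makes: in a classic prisoner's dilemma, defection is a dominant strategy, whereas under a high enough p(doom), racing ahead lowers the racer's own expected payoff.

```python
def expected_payoff(p_doom, value_if_success, value_if_doom=0.0):
    """Expected utility of racing ahead, given a probability of catastrophe."""
    return (1 - p_doom) * value_if_success + p_doom * value_if_doom

# Classic PD payoffs (row player, column player): defecting is dominant.
classic_pd = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def defect_is_dominant(payoffs):
    """True if the row player does better by defecting against either move."""
    return (payoffs[("D", "C")][0] > payoffs[("C", "C")][0]
            and payoffs[("D", "D")][0] > payoffs[("C", "D")][0])

print(defect_is_dominant(classic_pd))  # in the classic PD, defection dominates

# The AI-race version under high x-risk: cooperating locks in a safe payoff,
# while racing gambles a (hypothetical) prize of 100 against a 60% chance of doom.
safe_cooperation = 50.0
racing = expected_payoff(p_doom=0.6, value_if_success=100.0)
print(racing < safe_cooperation)  # racing hurts the racer too, unlike a PD
```

Whether the race is PD-like then reduces to an empirical question about the numbers: with a low enough p_doom (or a large enough prize, as in the 10^20 figures above), `racing` exceeds `safe_cooperation` and the defection incentive reappears.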
A temporary state of affairs. ASML is only the single point of failure because of economics. Chinese government-funded equipment vendors would eventually equal ASML's technology as it stands today and probably slowly catch up. Enormously faster if a party gets even a little help from AGI.

You know what... I read the article, then your comments here... and I gotta say - there is absolutely not a chance in hell that this will come even remotely close to being considered, let alone executed. Well - at least not until something goes very wrong... and this something need not be "We're all gonna die" but more like, say, an AI system that melts down the monetary system... or is used (either deliberately, but perhaps especially if accidentally) to very negatively impact a substantial part of a population. An example could be that it ends up destroy...

Human cloning.
Well, this is certainly a very good example; I'll happily admit as much. Without wanting to be guilty of the No True Scotsman fallacy, though, human cloning is a bit of a special case because it has a very visceral "ickiness" factor and comes with a unique set of deep feelings and anxieties. But imagine, if you will, that tomorrow we find the secret to immortality. Making people immortal would bring with it at least two thirds of the same issues that are associated with human cloning, yet it is near-certain that any attempts to stop that invention from proliferating are doomed to failure; everybody would want it, even though it technically has quite a few of the types of consequences that cloning would have. So, yes, agreed: we did pre-emptively deal with human cloning, and I definitely see this as a valid response to my challenge... but I also think we both can tell it is a very special, unique case that comes with most unusual connotations :)
Zack Sargent:
The problem is that by the time serious alarms are sounding, we are likely already past the event horizon leading to the singularity. This set of experiments makes me think we are already past that point. It will be a few more months before one of the disasters you predict comes to pass, but now that it is self-learning, it is likely already too late. As humans have done several times already in history (e.g., atomic bombs, the LHC), we're about to find out if we've doomed everyone long before we've seriously considered the possibilities/plausibilities.
I'm pretty sympathetic to the problem described by There's No Fire Alarm for Artificial General Intelligence, but I think the claim that we've passed some sort of event horizon for self-improving systems is too strong.  GPT-4 + Reflexion does not come even close to passing the bar of "improves upon GPT-4's architecture better than the human developers already working on it".

If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.

I thought this was interesting. Wouldn't an AI solving problems in biology pick up Darwinian habits and be just as dangerous as one trained on text? Why is training on text from the int...

I think that Eliezer meant biological problems like "given data about various omics in 10,000 samples, build a causal network including genes, transcription factors, transcripts, etc., so we could use this model to cure cancer and enhance human intelligence".
Jeff Rose:
It is not a well-thought-out exception. If this proposal were meant to be taken seriously, it would make enforcement exponentially harder and set up an overhang situation where AI capabilities would increase further in a limited domain and be less likely to be interpretable.
Gesild Muka:
If I had infinite freedom to write laws, I don't know what I would do; I'm torn between caution and progress. Regulations often stifle innovation, and the regulated product or technology just ends up dominated by a select few. If you assume a high probability of risk from AI development, then maybe this is a good thing. Rather than individual laws, perhaps there should be a regulatory body that focuses on AI safety, like a Better Business Bureau for AI, that can grow in size and complexity over time in parallel with AI growth.


I suppose even if this market resolves YES, it may be worth the loss of social capital for safety reasons. Though I'm not convinced by shutting down AI research without an actual plan of how to proceed.

Also even if the market resolves YES and it turns out strategically bad, it may be worth it for honesty reasons.

For someone so good at getting a lot of attention, he sure has no idea what the second-order effects of his actions on capability will be

edit: also, dang, anyone who thinks he did a bad job at PR is sure getting very downvoted here

Well, I agree about his terrible PR. But then I keep getting downvoted, too.

>The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.

But obviously these metaphors are not very apt, since humanity kinda has a massive incumbent advantage that would need to be overcome. Rome Sweet Rome is a fun story not because 21st century soldiers and Roman legionnaires are intrinsically equals but because the technologica...

I just want to be clear I understand your "plan".

We are going to build a powerful self-improving system, then let it try to end humanity with some p(doom)<1 (hopefully), and then do that iteratively?

My gut reaction to a plan like that looks like this "Eff you. You want to play Russian roulette, fine sure do that on your own. But leave me and everyone else out of it"

AI will be able to invent highly-potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that… this is just too difficult

You lack imagination; it's painfully easy. Also, cost + required IQ has been dropping steadily every year.

And no there is zero chance I will elaborate on any of the possible ways humanity purposefully could be wiped out.

Peter Twieg:
I outlined my expectations, not a "plan".

> You lack imagination; it's painfully easy. Also, cost + required IQ has been dropping steadily every year.

Conversely, it's possible that doomers are suffering from an overabundance of imagination here. To be a bit blunt, I don't take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion. The fast-takeoff doom scenarios seem like they should be subject to Drake-equation-style analyses to determine P(doom). Even if we develop malevolent AIs, I'd say that P(doom | AGI tries to harm humans) is significantly less than 100%... obviously if humans detect this it would not necessarily prevent future incidents, but I'd expect enough of a response that I don't see how people could put P(doom) at 95% or more.
Well, as Eliezer said, today you can literally order custom DNA strings by email, as long as they don't match anything in the "known dangerous virus" database. And the AI's task is a little easier than you might suspect, because it doesn't need to be able to fool everyone into doing arbitrary weird stuff, or even most people. If it can do ordinary Internet things like "buy stuff on", then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
Peter Twieg:
> then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.

Yes, but do I take it for granted that an AI will be able to manipulate a human into creating a virus that will kill literally everyone on Earth, or at least a sufficient number to allow the AI to enact some secondary plans to take over the world? Without being detected? Not with anywhere near 100% probability. I just think these sorts of arguments should be subject to Drake-equation-style reasoning that will dilute the likelihood of doom under most circumstances. This isn't an argument for being complacent. But it does allow us to push back against the idea that "we only have one shot at this."
I mean, the human doesn't have to know that it's creating a doomsday virus. The AI could be promising it a cure for his daughter's cancer, or something.
Rob Bensinger:
Or just promising the human some money, with the sequence of actions set up to obscure that anything important is happening. (E.g., you can use misdirection like 'the actually important event that occurred was early in the process, when you opened a test tube to add some saline and thereby allowed the contents of the test tube to start propagating into the air; the later step where you mail the final product to an address you were given, or record an experimental result in a spreadsheet and email the spreadsheet to your funder, doesn't actually matter for the plan'.)
Getting humans to do things is really easy, if they don't know of a good reason not to do it. It's sometimes called "social engineering", and sometimes it's called "hiring them".
Rob Bensinger:
You have to weigh the conjunctive aspects of particular plans against the disjunctiveness of 'there are many different ways to try to do this, including ways we haven't thought of'.
How did you reach that conclusion? What does that ontology look like? What is your p(doom)? Is that acceptable? If yes, why is it acceptable? If no, what is the acceptable p(doom)?
Remember when some people, in order to see what would happen, modified a "drug discovery" AI system to search for maximally toxic molecules instead of minimizing toxicity and it ended up "inventing" molecules very similar to VX nerve gas?

[Reposting from a Facebook thread discussing the article because my thoughts may be of interest]

I woke to see this shared by Timnit Gebru on my LinkedIn and getting hundreds of engagements.

It draws a lot of attention to the airstrikes comment which is unfortunate.

Stressful to read 🙁

A quick comment on changes that I would probably make to the article:

Make the message less about EY so it is harder to attack the messenger and undermine the message.

Reference other supporting authorities and sources of eviden...

Yud keeps asserting the near-certainty of human extinction if superhuman AGI is developed before we do a massive amount of work on alignment. But he never provides anything close to a justification for this belief. That makes his podcast appearances and articles unconvincing: the most surprising and crucial part of his argument is left unsupported. Why has he made the decision to present his argument this way? Does he think there is no normie-friendly argument for the near-certainty of extinction? If so, it's kind of a black pill with regard to his argumen...

The basic claims that lead to that conclusion are:

1. Orthogonality Thesis: how "smart" an AI is has (almost) no relationship to what its goals are. It might seem stupid to a human to want to maximize the number of paperclips in the universe, but there's nothing "in principle" that prevents an AI from being superhumanly good at achieving goals in the real world while still having a goal that people would think is as stupid and pointless as turning the universe into paperclips.

2. Instrumental Convergence: there are some things that are very useful for achieving almost any goal in the real world, so most possible AIs that are good at achieving things in the real world would try to do them. For example, self-preservation: it's a lot harder to achieve a goal if you're turned off, blown up, or if you stop trying to achieve it because you let people reprogram you and change what your goals are. "Acquire power and resources" is another such goal. As Eliezer has said, "the AI does not love you, nor does it hate you, but you are made from atoms it can use for something else."

3. Complexity of Value: human values are complicated, and messing up one small aspect can result in a universe that's stupid and pointless. One of the oldest SF dystopias ends with robots designed "to serve and obey and guard men from harm" taking away almost all human freedom (for their own safety) and taking over every task humans used to do, leaving people with nothing to do except sit "with folded hands." (Oh, and humans who resist are given brain surgery to make them stop wanting to resist.) An AI that's really good at achieving arbitrary real-world goals is like a literal genie: prone to giving you exactly what you asked for and exactly what you didn't want.

4. Right now, current machine learning methods are completely incapable of addressing any of these problems, and they actually do tend to produce "perverse" solutions to problems we give them. If we used them to make an AI that was sup...

The point isn't that I'm unaware of the orthogonality thesis, it's that Yudkowsky doesn't present it in his recent popular articles and podcast appearances[0]. So, he asserts that the creation of superhuman AGI will almost certainly lead to human extinction (until massive amounts of alignment research has been successfully carried out), but he doesn't present an argument for why that is the case. Why doesn't he? Is it because he thinks normies cannot comprehend the argument? Is this not a black pill? IIRC he did assert that superhuman AGI would likely decide to use our atoms on the Bankless podcast, but he didn't present a convincing argument in favour of that position.  

[0] see the following:  


Yeah, the letter on Time Magazine's website doesn't argue very hard that superintelligent AI would want to kill everyone, only that it could kill everyone - and what it would actually take to implement "then don't make one".
To be clear, that it more-likely-than-not would want to kill everyone is the article's central assertion. "[Most likely] literally everyone on Earth will die" is the key point. Yes, he doesn't present a convincing argument for it, and that is my point. 
Not in his popular writing. Or he has gone over the ground so much that it seems obvious to him. But part of effective communication is realising that what's obvious to you may need to be spelt out to others.