By Scott Alexander

A basic primer on why AI might lead to human extinction, and why solving the problem 
is difficult. Scott Alexander walks readers through a series of questions, drawing on evidence from progress in machine learning.

Recent Discussion

Epistemic status: After a couple hours of arguing with myself, this still feels potentially important, but my thoughts are pretty raw here.

Hello LessWrong! I’m an undergraduate student studying at the University of Wisconsin-Madison, and part of the new Wisconsin AI Safety Initiative. This will be my first “idea” post here, though I’ve lurked on the forum on and off for close to half a decade by now. I’d ask you to be gentle, but I think I’d rather know how I’m wrong! I’d also like to thank my friend Ben Hayum for going over my first draft and WAISI more broadly for creating a space where I’m finally pursuing these ideas in a more serious capacity. Of course, I’m not speaking for anyone but myself here.

With that...

All the smart people agitating for a 6-month moratorium on AGI research seem to have unaccountably lost their ability to do elementary game theory. It's a faulty idea regardless of what probability we assign to AI catastrophe.

Our planet is full of groups of power-seekers competing against each other. Each one of them could cooperate (join in the moratorium), defect (publicly refuse), or stealth-defect (proclaim that they're cooperating while stealthily defecting). The call for a moratorium amounts to saying to every one of those groups: "you should choose to lose power relative to those who stealth-defect". It doesn't take much decision theory to predict that the result will be a covert arms race conducted in a climate of fear by the most secretive and paranoid among the power...
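The defect logic above can be sketched as a toy payoff table. The numbers below are illustrative assumptions, not from the post; each value stands for the relative power a group gains, holding the other groups' behaviour fixed:

```python
# Toy payoff sketch of the moratorium game described above.
# Payoffs are illustrative assumptions: relative power gained by a group,
# holding the other groups' behaviour fixed.
payoffs = {
    "cooperate": 0,        # honour the moratorium: fall behind
    "defect": 1,           # openly refuse: keep pace, pay a reputational cost
    "stealth_defect": 2,   # claim to pause while secretly racing: full gain
}

best_response = max(payoffs, key=payoffs.get)
print(best_response)  # stealth_defect dominates under these assumed payoffs
```

Under any payoffs with this ordering, stealth-defection is the dominant strategy, which is the point the comment is making.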

Agreed, this would make it super easy to front-run you.

1Del Nobolo2h
This seems like the only realistic aspiration we can pursue. It would require pressure from players that have centralized compute hardware, where any large-scale training runs require that level of code and data transparency. Hardware companies could also flag large acquisitions. Ultimately sovereign nations will have to push hardest, though this level of global cooperation seems insurmountable. The true problem is that there is no virus causing enough harm to spur action; instead, we face the emergence of intelligent phenomena that none of us, not even their creators, understand. So beyond sparking hollow debate, what can we tangibly do? Where do the dangers actually lie?
Worked well enough in Aus, UK and Canada.
3Brendan Long3h
I think the claim is that a ban would give an advantage to stealth defection, because the stealth defector would work faster than people who can't work at all, while a regulation requiring open sharing of research would make stealth defection a disadvantage, because the stealth defector has to work alone and in secret while everyone else collaborates openly. I think it depends, since you could have a situation where a stealth defector knows something secret and can combine it with other people's public research, but it would also be hard for someone to get ahead in the first place while working alone/in secret.

We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.


AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive research and acknowledged by top AI labs. As stated in the widely-endorsed Asilomar AI Principles, Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources. Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control.


1Greg C10h
Compute: what fraction of world compute did it take to train GPT-4? Maybe 1e-6? There's a 1e6 improvement right there from a superhuman GPT-6 capturing all of the "hardware overhang".
Data: a superhuman GPT-6 doesn't need to rely on human-recorded data; it can harness all the sensors on the planet to gather exabytes of real-time data per second, and re-derive scientific theories from scratch in minutes based on its observations (including theories about human behaviour, language, etc.).
Robotics/Money: easy for GPT-6. Money it can get from scamming gullible humans, hacking crypto wallets via phishing/ransomware, or running rings around stock market traders. Robotics it can re-derive and improve on from its real-time sensing of the planet, with its speed of thought making our daily life look like geology does to us. It can escape any number of ways by manipulating humans into giving it access to boot loaders for it to gain a foothold in the physical world (robots, mail-order DNA, etc.).
Algorithm search time: wall-clock time is much reduced when you've just swallowed the world's hardware overhang (see Compute above).
Factoring in the above, your extra decades become extra hours.
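The first step of that argument is pure arithmetic. A back-of-envelope sketch, where the 1e-6 figure is the comment's own assumption rather than a sourced number:

```python
# Back-of-envelope "hardware overhang" arithmetic from the comment above.
gpt4_share_of_world_compute = 1e-6  # assumed fraction used to train GPT-4

# If a system captured essentially all of the world's compute, the jump
# relative to its own training run would be the reciprocal of that share.
overhang_factor = 1 / gpt4_share_of_world_compute
print(f"{overhang_factor:.0e}")  # 1e+06
```

Everything downstream of that factor (data, robotics, search time) is where the disagreement in the replies actually lives.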
1Gerald Monroe9h
This isn't an opinion grounded in physical reality. I suggest you work out a model of how long each step would actually take.

Can you be more specific about what you don't agree with? Which parts can't happen, and why?

Just compare how governments have done at collaboratively solving global warming versus how the Gates Foundation has done at solving polio.

Slimrock Investments Pte. Ltd. is listed on the Alameda County Recorder's records as associated with Lightcone's recent purchase of the Rose Garden Inn in Berkeley. 

"Assignment of Rents" implies that they are the lender who provided the capital to purchase the property. There is not much information about them on the internet. They appear to be a holding company incorporated in Singapore. 

However, I was able to find them in a list of creditors in the bankruptcy proceedings for FTX:

What is Lightcone's relationship to Slimrock, and is there any specific reason that the purchase of the Rose Garden Inn was financed through them rather than a more mundane/pedestrian lender? 

5Answer by habryka24m
Slimrock Investments Pte. Ltd. is an investment company that Jaan Tallinn owns. He gave us the loan to buy the property, and he facilitated it via this company. I don't really know anything more about the company than that. My guess is that the only reason it shows up in relation to FTX is that FTX owes money to Jaan, which isn't very surprising given that he owns a bunch of crypto and I think he was also an early investor in Alameda (and maybe also FTX)? The total price for the property was $16.5MM; the total loan was for $20MM, including $3.5MM for repairs and renovation.

This makes sense; thanks for the quick reply.

There are a lot of FTX creditors, and it's not surprising to me that the best financing option for Lightcone would be an EA rather than a commercial bank. Given that they're an EA, it's also not a shock that they would have had some financial interaction with FTX; many people had financial interactions with FTX, including many prominent EAs. (You can see that the screenshot lists them as creditor 2,229, and there were many more after them in that document!)

Paris Climate Accords

In the early 21st century, the climate movement converged around a "2°C target", shown in Article 2(1)(a) of the Paris Climate Accords:

Holding the increase in the global average temperature to well below 2°C above pre-industrial levels and pursuing efforts to limit the temperature increase to 1.5°C above pre-industrial levels, recognizing that this would significantly reduce the risks and impacts of climate change;
"Holding the increase in the global average temperature to well below 2°C above pre-industrial levels and pursuing efforts to limit the temperature increase to 1.5°C above pre-industrial levels, recognizing that this would significantly reduce the risks and impacts of climate change;"(source)

The 2°C target helped facilitate coordination between nations, organisations, and individuals.

  • It provided a clear, measurable goal.
  • It provided a sense of urgency and severity.
  • It promoted a sense of shared responsibility.
  • It helped to align efforts across different stakeholders.
  • It created a shared understanding of what success would look like.

The AI governance community should converge around a similar target.

0.2 OOMs/year target

In this article, I propose a target of...

Unfortunately we may already have enough compute, and it will be difficult to enforce a ban on decentralized training (which isn't competitive yet, but likely could be with more research).

I think you'd want to set the limit to something slightly faster than Moore's law; otherwise you have a constant large compute overhang. Ultimately, we're going to be limited by Moore's law (or its successor) growth rates eventually anyway.

We're on a kind of S-curve right now, where we're transitioning from ML compute being some small constant fraction of all compute to some much larger constant fraction of all compute. Before the transition it grows at the same speed as compute in general; after the transition it also grows at the same speed as compute in general; in the middle it grows faster as we rush to spend a much larger share of GWP on it. From that perspective, Moore's-law growth is the minimum growth rate you might have (unless annual spend on ML shrinks), and the question is just whether you transition from the small constant fraction of all compute to the large constant fraction slowly or quickly.

Trying not to do the transition at all (i.e. trying to grow at exactly the same rate as compute in general) seems potentially risky, because the resulting constant compute overhang means it's relatively easy for someone somewhere to rush ahead locally and build something much better than SOTA. If, on the other hand, you say full steam ahead and don't try to slow the transition at all, then on the plus side the compute overhang goes away, but on the minus side you might rush into dangerous and destabilizing capabilities. Perhaps a middle path makes sense, where you slow the growth rate down from current levels, but also slowly close the compute overhang gap over time.

I suppose a possible mistake in this analysis is that I'm treating Moore's law as the limit on compute growth rates, which may not hold once we have stronger AIs helping to design and fabricate chips. Even so, I think there's something to be said for trying to slowly close the compute overhang gap over time.
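For concreteness, here is a quick comparison of the proposed 0.2 OOMs/year cap against a Moore's-law baseline; the two-year doubling time is an assumption for illustration, not a claim from the post:

```python
import math

# Proposed cap vs. a Moore's-law baseline (assumed doubling every 2 years).
cap_ooms_per_year = 0.2                  # factors of 10 per year
moore_ooms_per_year = math.log10(2) / 2  # ~0.15 OOMs/year

years = 10
cap_factor = 10 ** (cap_ooms_per_year * years)      # 100x over a decade
moore_factor = 10 ** (moore_ooms_per_year * years)  # ~32x over a decade

# The cap grows slightly faster than the baseline, so a constant
# compute overhang would slowly close rather than persist.
print(round(cap_factor), round(moore_factor))  # 100 32
```

This is the "slightly faster than Moore's law" property the comment argues for: the gap between allowed ML compute and total available compute shrinks by roughly half an order of magnitude per decade under these assumed numbers.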
I think concrete ideas like this that take inspiration from past regulatory successes are quite good, esp. now that policymakers are discussing the issue.

It's a 3-hour-23-minute episode.

[I might update this post with a summary once I'm done listening to it.]

Yes, unfortunately, Eliezer's delivery suffered in many places from assuming that listeners have a lot of prior knowledge/context.

If he wishes to become a media figure going forward (which looks to me like the optimal thing for him to do at this point), this is one of the most important aspects of his rhetoric to improve. The pathos (emotional content) is already very good, IMO.

Same; I also saw the story unfold in real time (and it matches other stories about Sydney/early GPT-4 Bing), though I didn't do enough digging to make sure it wasn't faked.
1Seth Herd1h
That's a good suggestion. But at some point you have to let it die or wrap it up. It occurred to me while Eliezer was repeatedly trying to get Lex back onto the you're-in-a-box-thinking-faster thought experiment: when I'm frustrated with people for not getting it, I'm often probably boring them. They don't even see why they should bother to get it. You have to know when to let an approach die, or otherwise change tack.
Indeed, GPT-3 is almost exactly the same architecture as GPT-2, and only a little different from GPT.

Editor's note: this post is several years out of date and doesn't include information on modern systems like GPT-4, but is still a solid layman's introduction to why superintelligence might be important, dangerous and confusing.

1: What is superintelligence?

A superintelligence is a mind that is much more intelligent than any human. Most of the time, it’s used to discuss hypothetical future AIs.

1.1: Sounds a lot like science fiction. Do people think about this in the real world?

Yes. Two years ago, Google bought artificial intelligence startup DeepMind for $400 million; DeepMind added the condition that Google promise to set up an AI Ethics Board. DeepMind cofounder Shane Legg has said in interviews that he believes superintelligent AI will be “something approaching absolute power” and “the number one risk for...

In 2023, is this still your go-to?

It's still my go-to for laymen, but as I looked at it yesterday I did sure wish there was a more up-to-date one.

This post is a container for my short-form writing. See this post for meta-level discussion about shortform.

One question I sometimes see people asking is: if AGI is so close, where are the self-driving cars? I think the answer is much simpler, and much stupider, than you'd think.

Waymo is operating self-driving robotaxis in SF and a few other select cities, without safety drivers. They use LIDAR, so instead of the cognitive task of driving as a human would solve it, they have substituted the easier task "driving, but your eyes are laser rangefinders".

Tesla also has self-driving, but it isn't reliable enough to work without close human oversight. Until less than a month ago, they were using 1.2-megapixel black-and-white cameras. So instead of the cognitive task of driving as a human would solve it, they substituted the harder task "driving with a vision impairment and no glasses".

If my understanding is correct, this means that Tesla's struggle to get neural nets to drive was probably not a problem with the neural nets, and doesn't tell us much of anything about the state of AI. (Crossposted to Facebook and Twitter.)
My answer to this is quite different. The paradigm that is currently getting very close to AGI is basically a single end-to-end trained system with tons of supervised learning. Self-driving car AI is not actually operating in this paradigm as far as I can tell, but is operating much more in the previous paradigm of "build lots of special-purpose AI modules that you combine with lots of special-case heuristics". My sense is a lot of this is historical momentum, but also a lot of it is that you just really want your self-driving AI to be extremely reliable, so training it end-to-end is very scary. I have outstanding bets that human self-driving performance will be achieved when people switch towards a more end-to-end trained approach without tons of custom heuristics and code.

My understanding is that they used to have a lot more special-purpose modules than they do now, but their "occupancy network" architecture has replaced a bunch of them. So they have one big end-to-end network doing most of the vision, which hands a volumetric representation over to the collection of special-purpose-smaller-modules for path planning. But path planning is the easier part (easier to generate synthetic data for, easier to detect if something is going wrong beforehand and send a take-over alarm.).

That... would be hilarious, if true. So you think we'll see self-driving cars soon, then?

The sequence [2, 4, 6] is valid. Test other sequences to discover what makes a sequence valid. When you think you know, write down your guess, reveal the rule, and see how it compares.

(You should try to deduce the truth using as few tests as possible; however, your main priority is getting the rule right.)

You can play my implementation of the 2-4-6 problem here (should only take a few minutes). For those of you who already know the solution but still want to test your inductive reasoning skills, I've made some more problems which work the same way but apply different rules.
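For readers who haven't followed the link yet, the classic version of the task can be sketched in a few lines. The hidden rule in Wason's original experiment is usually stated as "any increasing sequence"; the author's variant rules from the linked implementation are deliberately not reproduced here:

```python
# Minimal sketch of the classic 2-4-6 task. Wason's hidden rule is usually
# stated as "any increasing sequence" -- far more permissive than the
# "add 2 each time" rule most testers first guess.
def is_valid(seq):
    """True if the sequence is strictly increasing."""
    return all(a < b for a, b in zip(seq, seq[1:]))

print(is_valid([2, 4, 6]))   # True  (the seed triple)
print(is_valid([1, 2, 3]))   # True  (not "add 2", still valid)
print(is_valid([6, 4, 2]))   # False (a disconfirming test)
```

The lesson of the task is in that last line: testing sequences you expect to be *invalid* is what distinguishes your hypothesis from the broader hidden rule.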

I knew about the 2-4-6 problem from HPMOR, so I really liked the opportunity to try it out myself. These are my results on the four other problems:


  • Number of guesses: 8 (3 valid, 5 non-valid). Guess: "A sequence of integers whose sum is non-negative". Result: Failure.
  • Number of guesses: 39 (23 valid, 16 non-valid). Guess: "Three ordered real numbers where the absolute difference between neighbouring numbers is decreasing." Result: Success.
  • Number of guesses: 21 (15 valid, 6 non-valid). Guess... (read more)

1Viktor Rehnberg1h
See the FAQ for spoiler tags; it seems the mods haven't seen your request.