Richard Ngo's case for why artificial general intelligence may pose an existential threat. Written with an aim to incorporate modern advances in machine learning, without taking any previous claims about AGI risk for granted.

Recent Discussion

I sometimes talk to people who are nervous about expressing concerns that AI might overpower humanity. The worry goes: it's a weird belief, it might look too strange to talk about publicly, and people might not take us seriously.

How weird is it, though? Some observations (see Appendix for details):

  • There are articles about AI risk in the NYT, CNBC, TIME, and several other mainstream news outlets. Some of these articles interview experts in the AI safety community, explicitly mention human extinction & other catastrophic risks, and call for government regulation.
  • Famous People Who My Mom Has Heard Of™ have made public statements about AI risk. Examples include Bill Gates, Elon Musk, and Stephen Hawking.
  • The leaders of major AI labs have said things like “[AI] is probably the greatest threat to the continued existence

Another article in The Atlantic today explicitly mentions the existential risk of AI and currently sits as the 9th most popular article on the website.

COVID at least had some policy handles that the government could try to pull: lockdowns, masking, vaccines, etc. What could they even do against AGI?
1 · Chris van Merwijk · 3h
Linking to my post about Dutch TV:

I believe two things about rulers (politicians, CEOs of big orgs):

  1. They give others only as much freedom as is necessary for those others to be useful in achieving the ruler's goals
  2. They don't want actors more powerful than themselves anywhere nearby

From these I intuit that:

  1. Rulers will not support the development of powerful AGI, as it might threaten to overpower them
  2. Rulers might get rid of humans as soon as an AI can achieve their goals more efficiently (but that's a much lower bar for the intelligence and power of the AI than the bar needed to overpower the Ruler)

Thus my immediate fears are not so much about aligning super-human AGI, but about aligning Rulers with the needs of their constituents - for example, a future in which we never get smarter-than-human AIs, but things a bit more powerful...

Rulers will not support development of powerful AGI as it might threaten to overpower them

is probably true, but only because you used the word "powerful" rather than "capable". Rulers would definitely want development of capable AGIs as long as they believe (however incorrectly) in their ability to maintain power/control over those AGIs.

In fact, rulers are likely to be particularly good at cultivating capable underlings that they maintain firm control of. It may cause them to overestimate their ability to do the same for AGI. In fact, if they expect an A... (read more)

0 · Answer by qbolec · 8h
ChatGPT's answer: (I am a bit worried by this given that China seems to restrict AIs more than the US...) I like how ChatGPT can help in operationalizing fuzzy intuitions. I feel an eerie risk that it makes me think even less, and less carefully, and defer to the AI's wisdom more and more... it's very tempting... as if finding an adult you can cede control to.

I am posting this draft as it stands today. Though I like it, it does not completely reflect my current best understanding of optimization, and is not totally polished. But I have been working on it for too long, and the world is moving too quickly. In the future, I might aggressively edit this post to become not-a-draft, or I may abandon it entirely and post a new introduction. (Unfortunately, all of the really juicy content is in other drafts that are way less developed.)

[TODO Acknowledgements, AGISF and SERI MATS]

Among the many phenomena that unfold across the universe, there is one that is not immediately evident from an inspection of the laws of physics, but which can nevertheless determine the fate of the system in which it...

In a polarized political environment like the US, ideas that start out neutral often end up aligned with one side or the other. In cases where there's an important idea or policy that's currently neutral, and multiple potential implementations that are also neutral, it would be much better to get polarization around which implementation to choose than on the core idea. Is there anything we can do to make this more likely?

Let's look at an example where this didn't happen: covid vaccination in the US ended up mostly liberal-aligned. This was pretty unfortunate: the vaccines are very effective against death, and a lot of people died essentially because they had the bad luck to be part of a constituency that ended up opposed to them. It could have gone the other way: Operation Warp Speed...

My vague impression is that for a while the US did have something like this, starting under FDR, but it broke in the post-Nixon era when politicians stopped being able to collude as well.

Not sure what level of playing around you're talking about, but there was also research on mRNA therapeutics as early as the late 1980s.
Right, Wikipedia cites a 1972 paper using viruses to deliver DNA, but no vaccine until 1984. Whereas mRNA in lipids went from delivery in 1989 to a vaccine in 1993-1994. So twenty years on one metric, but ten years on another metric that probably screens off the first one by virtue of coming later. But that's just playing around.

Obstacles artificially created by the FDA are real obstacles. To the extent that the vaccine-hesitant mean anything by "old-fashioned," they mean large-scale experience in humans. More people received vector vaccines in the Oxford trials than in all deployment before. If you want to know about Bell's palsy, that's the only way to find out. On the other hand, if you want years of follow-up, a 2015 trial of vector vaccines could have been a big advantage over mRNA vaccines, although I don't know if they actually followed up after years. With no placebo group, it's not clear what analysis they could make.
You may be right; I'm not very knowledgeable here and digging deeper into this isn't something I'm going to be able to do very well. For the point I was trying to make in the original article, it seems like your other vaccine examples would have been better.

“Consumerism” came up in my recent interview with Elle Griffin of The Post. Here’s what I had to say (off the cuff):

I have to admit, I’ve never 100% understood what “consumerism” is, or what it’s supposed to be. I have a general sense of what people are gesturing at, but it feels like a fake term to me. We’ve always been consumers; every living organism is a consumer. Humans, just like all animals, have always consumed. It’s just that, the way it used to be, we didn’t consume very much. Now we’re more productive, we produce more, we consume more; we’re doing the same thing, only more and better….

The term consumerism gets used as if consumption is something bad. I can understand that, people can


(I) overrated, insofar as you get stuck on a hedonic treadmill,

This is actually a good thing, primarily because such a mechanism is almost certainly key to how we avoid wireheading. In particular, it avoids the problem of RL agents inevitably learning to hack the reward, by always bringing it down to a set point of happiness and avoiding runaway happiness leading to wireheading.
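The set-point idea above can be sketched in a few lines. This is a toy model of hedonic adaptation, not a claim about how any real RL system is built: felt reward is the gap between the stimulus and a slowly adapting baseline, so even a maxed-out ("hacked") stimulus yields diminishing felt reward over time.

```python
# Toy hedonic-adaptation model (illustrative names and rate are assumptions).
# Felt reward = stimulus minus an adaptive baseline; the baseline drifts
# toward whatever stimulus is sustained, pulling felt reward back to zero.

def run_adaptation(stimuli, rate=0.5):
    """Return felt rewards for a sequence of raw stimuli."""
    baseline = 0.0
    felt = []
    for s in stimuli:
        felt.append(s - baseline)          # reward relative to the set point
        baseline += rate * (s - baseline)  # baseline adapts toward stimulus
    return felt

# A constant, maximal stimulus produces steadily shrinking felt reward:
rewards = run_adaptation([10, 10, 10, 10])
```

On this toy picture, permanently pinning the stimulus at its maximum buys only a transient spike, which is the sense in which a set point blunts the payoff of wireheading.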

The dictionary definition of consumerism is:

  1: the theory that an increasing consumption of goods is economically desirable; also: a preoccupation with and an inclination toward the buying of consumer goods
  2: the promotion of the consumer's interests

This is also definition 2.1 from Wikipedia. From context, I believe it's quite clear that we're talking about definition 1b (Merriam-Webster) and 2.1 (Wikipedia). The original post talks about how consumption is good even if frivolous, according to the OP; I believe this makes that quite clear. This is why the definitional issue of consumerism isn't quite relevant, and the definitional issue that is relevant is regarding what's frivolous.

I see this a lot in internet discussion, where the discussion revolves around a concept that is encapsulated by a word with multiple meanings, and a different-but-related meaning of the word keeps being brought up. It muddies the conversation. The discussion is about the concept, not the word; words are but the medium.

Regarding your more on-point criticism, I generally agree. I think the key, so to speak, is two-fold:

  1. Sometimes things just can't be equivalently substituted, not due to the goods/services, but due to the situation. That's just life.
  2. Sometimes the situation or one's mindset, both of which are malleable, are the issue. The situation of amenities being too far away is one born of bad urban planning. 2.5 minutes, your benchmark, is quite short and good; however, I do notice myself going out a lot less since I came to the US (almost a decade ago) because cities are extremely not walkable, so just going to the park is a whole thing. This is something you live with,
When people use consumerism in a derogatory way, they don't mean the idea of any consumption at all; they mean having no ideals or interests beyond consumption.

Tang Yu, an AI robot, was named CEO of a Hong Kong-based video game company called NetDragon Websoft. In addition to outperforming the Hong Kong stock market, the company saw a significant rise in its stock value, all within 8-10 months.

I'm suspicious of the strength of the claim this company is making. I think it's more likely this is a publicity stunt.

First, there are the legal issues. As far as I know, no jurisdiction allows software to serve as an officer of a company, let alone as the CEO. So to whatever extent an AI is calling the shots, there have got to be humans in the loop for legal reasons.

Second, it's sort of unclear what this AI is doing. It sounds more like they just have some fancy analytics software and they're calling it the CEO because they mostly do whatever their analytics say to do?

T... (read more)


As somebody who's been watching AI notkilleveryoneism for a very long time, but is sitting at a bit of a remove from the action, I think I may be able to "see the elephant" better than some people on the inside.

I actually believe I see the big players converging toward something of an unrecognized, perhaps unconscious consensus about how to approach the problem. This really came together in my mind when I saw OpenAI's plugin system for ChatGPT.

I thought I'd summarize what I think are the major points. They're not all universal; obviously some of them are more established than others.

  1. Because AI misbehavior is likely to come from complicated, emergent sources, any attempt to "design it out" is likely to fail.

    Avoid this trap by generating your AI


If this stuff keeps up, the populace is going to need to be inoculated against physical killer robots, not naughty memes. And the immune response is going to need to be more in the line of pitchforks and torches than being able to say you've seen it before. Not that pitchforks and torches would help in the long term, but it might buy some time.

I'm going to ask LWers not to do this in real life, and to oppose any organization or individual that tries to use violence to slow down AI, for the same reason I get really worried around pivotal acts:

If you fail... (read more)

You may have noticed that a lot of people on here are concerned about AI going rogue and doing things like converting everything into paperclips. If you have no effective way of assuring good behavior, but you keep adding capability to each new version of your system, you may find yourself paperclipped. That's generally incompatible with life. This isn't some kind of game where the worst that can happen is that somebody's feelings get hurt.
1 · M. Y. Zuo · 2h
This is only believed by a small portion of the population. Why do you think the aforementioned decision makers share such beliefs?
I doubt they do. And using the unqualified word "believe" implies a level of certainty that nobody probably has. I also doubt that their "beliefs" are directly and decisively responsible for their decisions. They are responding to their daily environments and incentives.

Anyway, regardless of what they believe or of what their decision-making processes are, the bottom line is that they're not doing anything effective to assure good behavior in the things they're building. That's the central point here. Their motivations are mostly an irrelevant side issue, and would only really matter if understanding them provided a path to getting them to modify their actions... which is unlikely.

When I say "literal fear of actual death", what I'm really getting at is that, for whatever reasons, these people ARE ACTING AS IF THAT RISK DID NOT EXIST WHEN IT IN FACT DOES EXIST. I'm not saying they do feel that fear. I'm not even saying they do not feel that fear. I'm saying they ought to feel that fear.

They are also ignoring a bunch of other risks, including many that a lot of them publicly claim they do believe are real. But they're doing this stuff anyway. I don't care if that's caused by what they believe, by their just running on autopilot, or by their being captive to Moloch. The important part is what they are actually doing.

... and, by the way, if they're going to keep doing that, it might be appropriate to remove their ability to act as "decision makers".

(This is a stylized version of a real conversation, where the first part happened as part of a public debate between John Wentworth and Eliezer Yudkowsky, and the second part happened between John and me over the following morning. The below is combined, stylized, and written in my own voice throughout. The specific concrete examples in John's part of the dialog were produced by me. It's over a year old. Sorry for the lag.)


J: It seems to me that the field of alignment doesn't understand the most basic theory of agents, and is missing obvious insights when it comes to modeling the sorts of systems they purport to study.

N: Do tell. (I'm personally sympathetic to claims of the form "none of you idiots have any idea wtf...

1 · Max H · 5h
This recent tweet of Eliezer's crystallized a concept for me which I think is relevant to the concepts of optimization and agents discussed in the dialogue:

In complicated systems in real life, the thing that is better at "preimaging outcomes onto choices" is the scary one, and the interesting/complicated systems are the ones where the choices are complex. Sure, it's true that you can construct toy systems in restricted domains (like the mushrooms-and-peppers one) and define "agents" in these systems which technically violate certain efficiency assumptions. But the reason these examples aren't compelling (to me) is that it's kind of obvious what all the agents in them will do, once you write down their utility functions and the starting resources available to them. There's not much complexity "left over" for interesting decision algorithms.

Two of the real-world examples in this dialogue actually demonstrate the difference between these kinds of systems nicely: I could not step into the shoes of a successful hedge fund trader and, given all the same choices and resources available to the trader, make decisions which result in more money in my trading account than the original trader could. OTOH, if I were some kind of ghost-in-the-machine of a bacterium making ATP, I could (probably) make the same (or better, in cases where that's possible) decisions that the actual bacterium is making, given all the same information and choices available to it. (Though I might need a computer to keep track of all the hormones and blood-glucose levels and feedback loops.)

I can see how both examples might tell us something useful about intelligent systems, but the markets example seems more likely to have something to say about what the actual scary thing looks like.
Am I correct that "knowing what a system thinks is fair" is equivalent to "knowing under which bargaining solution the system acts"?
It seems to me that this is basically solved by "you put probability distributions over all things that you don't actually know and may have disagreement about"

This is for logical coordination? How does it help you with that?

Arguably the most important topic about which a prediction market has yet been run:  Conditional on an okay outcome with AGI, how did that happen?

C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time.

This is the majority of my probability mass, in the 60-90% probability range, in that I believe that alignment is way easier than the majority of LWers think.

Specifically, I believe we have a pretty straightforward path to alignment, if somewhat tedious and slightly difficult.

I also believe that 2 problems of embedded agenc... (read more)

I don't understand the motivation for defining "okay" as 20% max value. The cosmic endowment, and the space of things that could be done with it, is very large compared to anything we can imagine. If we're going to be talking about a subjective "okay" standard, what makes 20% okay, but 0.00002% not-okay? I would expect 0.00002% (e.g., in scenarios where AI "'pension[s] us off,' giv[ing] us [a percentage] in exchange for being parents and tak[ing] the rest of the galaxy for verself", as mentioned in "Creating Friendly AI" (2001)) to subjectively feel great. (To be clear, I understand that there are reasons to not expect to get a pension.)
I sorta had a hard time with this market because the things I think might happen don't perfectly map onto the market options, and usually the closest corresponding option implies some other thing, such that the thing I have in mind isn't really a central example of the market option.
3 · Martin Randall · 5h
On multiple-choice markets you can only sell positions you have already bought. To bid an answer down, you instead need to bid all the other positions up. The yes/no markets work better.
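The "bid everything else up" workaround can be made concrete with a toy multi-outcome market maker. The sketch below uses a logarithmic market scoring rule (LMSR) purely as an illustration; whatever pricing rule the actual platform uses, the mechanism is the same: buying shares of every *other* answer pushes the target answer's implied probability down without ever selling it.

```python
import math

# Toy multiple-choice market using an LMSR price function (the liquidity
# parameter b and share counts are illustrative assumptions, not the
# platform's real numbers).

def prices(shares, b=100.0):
    """Implied probabilities given outstanding share counts per answer."""
    weights = [math.exp(q / b) for q in shares]
    total = sum(weights)
    return [w / total for w in weights]

shares = [0.0, 0.0, 0.0]          # three answers start at equal prices (1/3)
before = prices(shares)[0]

# No way to sell answer 0 short of holding it -- so buy the other answers:
shares = [0.0, 50.0, 50.0]
after = prices(shares)[0]
# Answer 0's implied probability falls even though it was never traded.
```

Since implied probabilities must sum to 1, raising every competitor's price necessarily lowers the target's, which is exactly the workaround described above, and also why it is clumsier and more capital-hungry than simply selling on a yes/no market.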