Richard Ngo's case for why artificial general intelligence may pose an existential threat. Written with an aim to incorporate modern advances in machine learning, without taking any previous claims about AGI risk for granted.
I sometimes talk to people who are nervous about expressing concerns that AI might overpower humanity. The worry is that it's a weird belief: it might look too strange to talk about publicly, and people might not take us seriously.
How weird is it, though? Some observations (see Appendix for details):
I believe two things about rulers (politicians, CEOs of big orgs):
From these I intuit that:
Thus my immediate fears are not so much about aligning super-human AGI, but about aligning Rulers with the needs of their constituents - for example, a future in which we never get smarter-than-human AIs, but things a bit more powerful...
Rulers will not support development of powerful AGI as it might threaten to overpower them
is probably true, but only because you used the word "powerful" rather than "capable". Rulers would definitely want development of capable AGIs as long as they believe (however incorrectly) in their ability to maintain power/control over those AGIs.
In fact, rulers are likely to be particularly good at cultivating capable underlings that they maintain firm control of. This may cause them to overestimate their ability to do the same for AGI. In fact, if they expect an A...
I am posting this draft as it stands today. Though I like it, it does not completely reflect my current best understanding of optimization, and is not totally polished. But I have been working on it for too long, and the world is moving too quickly. In the future, I might aggressively edit this post to become not-a-draft, or I may abandon it entirely and post a new introduction. (Unfortunately, all of the really juicy content is in other drafts that are way less developed.)
[TODO Acknowledgements, AGISF and SERI MATS]
Among the many phenomena that unfold across the universe, there is one that is not immediately evident from an inspection of the laws of physics, but which can nevertheless determine the fate of the system in which it...
In a polarized political environment like the US, ideas that start out neutral often end up aligned with one side or the other. In cases where there's an important idea or policy that's currently neutral, and multiple potential implementations that are also neutral, it would be much better to get polarization around which implementation to choose than on the core idea. Is there anything we can do to make this more likely?
Let's look at an example where this didn't happen: covid vaccination in the US ended up mostly liberal-aligned. This was pretty unfortunate: the vaccines are very effective against death, and a lot of people died essentially because they had the bad luck to be part of a constituency that ended up opposed to them. This could have gone the other way: Operation Warp Speed...
My vague impression is that for a while the US did have something like this, starting under FDR, but it broke in the post-Nixon era when politicians stopped being able to collude as well.
“Consumerism” came up in my recent interview with Elle Griffin of The Post. Here’s what I had to say (off the cuff):
...I have to admit, I’ve never 100% understood what “consumerism” is, or what it’s supposed to be. I have the general sense of what people are gesturing at, but it feels like a fake term to me. We’ve always been consumers, every living organism is a consumer. Humans, just like all animals, have always been consumers. It’s just that, the way it used to be, we didn’t consume very much. Now we’re more productive, we produce more, we consume more, we’re just doing the same thing, only more and better….
The term consumerism gets used as if consumption is something bad. I can understand that, people can
(I) overrated, insofar as you get stuck on a hedonic treadmill,
This is actually a good thing, primarily because such a mechanism is almost certainly key to how we avoid wireheading. In particular, it sidesteps the problem of RL agents inevitably learning to hack their reward: by always pulling happiness back down to a set point, it prevents the runaway happiness that would lead to wireheading.
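As a toy illustration of that set-point idea (this is my own sketch, not anything from the post; the `HomeostaticReward` class and its `set_point`/`decay` parameters are made-up names), a reward that penalizes deviation from a baseline, combined with a signal that relaxes back toward that baseline, leaves nothing for a reward-hacker to run away with:

```python
# Toy sketch (mine, not from the post): a homeostatic, set-point-style reward.
# The internal "happiness" signal relaxes toward a set point, and reward
# penalizes deviation from that set point, so pushing the signal ever higher
# is neither achievable nor rewarded -- the anti-wireheading property
# described above.

class HomeostaticReward:
    def __init__(self, set_point: float = 0.0, decay: float = 0.9):
        self.set_point = set_point  # baseline "happiness" level
        self.decay = decay          # how strongly the signal relaxes back to baseline
        self.signal = set_point     # current internal signal

    def step(self, stimulus: float) -> float:
        # Stimulus nudges the signal, which otherwise decays toward the set point.
        self.signal = self.set_point + self.decay * (self.signal - self.set_point) + stimulus
        # Reward is deviation-based: "more signal" is not "more reward".
        return -abs(self.signal - self.set_point)


if __name__ == "__main__":
    r = HomeostaticReward()
    # Even a constant, maximal stimulus gives a bounded signal and a reward
    # that never improves past the set point -- no runaway to exploit.
    for t in range(10):
        print(t, round(r.step(stimulus=1.0), 3))
```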
Tang Yu, an AI robot, was named CEO of NetDragon Websoft, a Hong Kong-based video game company. Within 8-10 months, the company's stock rose significantly and outperformed the Hong Kong stock market.
I'm suspicious of the strength of the claim this company is making. I think it's more likely this is a publicity stunt.
First, there are the legal issues. As far as I know, no jurisdiction allows software to serve as the officer of a company, let alone as the CEO. So to whatever extent an AI is calling the shots, there's got to be humans in the loop for legal reasons.
Second, it's sort of unclear what this AI is actually doing. It sounds more like they just have some fancy analytics software and they're calling it the CEO because they mostly do whatever the analytics say to do?
T...
As somebody who's been watching AI notkilleveryoneism for a very long time, but is sitting at a bit of a remove from the action, I think I may be able to "see the elephant" better than some people on the inside.
I actually believe I see the big players converging toward something of an unrecognized, perhaps unconscious consensus about how to approach the problem. This really came together in my mind when I saw OpenAI's plugin system for ChatGPT.
I thought I'd summarize what I think are the major points. They're not all universal; obviously some of them are more established than others.
Because AI misbehavior is likely to come from complicated, emergent sources, any attempt to "design it out" is likely to fail.
Avoid this trap by generating your AI
If this stuff keeps up, the populace is going to need to be inoculated against physical killer robots, not naughty memes. And the immune response is going to need to be more in the line of pitchforks and torches than being able to say you've seen it before. Not that pitchforks and torches would help in the long term, but it might buy some time.
I'm going to ask LWers not to do this in real life, and to oppose any organization or individual that tries to use violence to slow down AI, for the same reason I get really worried around pivotal acts:
If you fail...
(This is a stylized version of a real conversation, where the first part happened as part of a public debate between John Wentworth and Eliezer Yudkowsky, and the second part happened between John and me over the following morning. The below is combined, stylized, and written in my own voice throughout. The specific concrete examples in John's part of the dialog were produced by me. It's over a year old. Sorry for the lag.)
J: It seems to me that the field of alignment doesn't understand the most basic theory of agents, and is missing obvious insights when it comes to modeling the sorts of systems they purport to study.
N: Do tell. (I'm personally sympathetic to claims of the form "none of you idiots have any idea wtf...
This is for logical coordination? How does it help you with that?
Arguably the most important topic about which a prediction market has yet been run: Conditional on an okay outcome with AGI, how did that happen?
C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time.
This holds the majority of my probability mass, somewhere in the 60-90% range, because I believe alignment is way easier than the majority of LWers think.
Specifically, I believe we have a pretty straightforward path to alignment, if somewhat tedious and slightly difficult.
I also believe that 2 problems of embedded agenc...
Another article in The Atlantic today explicitly mentions the existential risk of AI and currently sits as the 9th most popular article on the website.