Andrew Ng wants to have a conversation about extinction risk from AI

Leon Lang

This is a linkpost for https://twitter.com/AndrewYNg/status/1665759430552567810

Andrew Ng writes:

I'd like to have a real conversation about whether AI is a risk for human extinction. Honestly, I don't get how AI poses this risk. What are your thoughts? And, who do you think has a thoughtful perspective on how AI poses this risk that I should talk to?

In the attached video, he states that he has a lot of respect for many of the people who signed the letter, and that he will reach out to people whom he thinks have a thoughtful perspective. But he is also interested in further suggestions of whom to talk to.

Given that Andrew Ng is one of the top AI scientists in the world, it seems valuable for someone to think of a way to connect with him.

Hm. I'm sure plenty of people could do a fine job, myself included. But if every such person jumped in, it would be a mess. I assume that if Stuart Russell were the right person for the job, the job would already be over. Plausibly ditto Eliezer.

Rob Miles might be the obvious person for explaining things well. I totally endorse him doing attention-getting things I wouldn't endorse for people like me.

Also probably fine would be people optimized a little more for AI work than for explaining things. Paul Christiano may be the Schelling-point tip of the iceberg of people-kinda-doing-Paul-like-things. Or, trading off even more toward AI work, Yoshua Bengio looks like he might be a solid choice.

A framing I've been thinking about recently is AutoGPT. Obviously it's not very good at navigating the world, but my point is actually about humans: the first things people asked AutoGPT to do were simple tests like "fix this code" or "make a plan for an ad campaign." Soon after, the creator told it to "help humanity." A few days after that, someone else told it to "destroy humanity." I think this is a good way of dividing up the discussion of whether AI poses an existential threat. Taken backwards:

1. There's the sort of risk where a bad actor tells some real-world-navigating AI to destroy humanity. What factors would have to go wrong for them to succeed? This is a good frame question to talk about whether we expect AI to be a powerful technology at all, and how we expect the timescale of progress to compare to the timescales of diffusion of technology and adaptation to technology.
2. There's the sort of risk where someone tells an AI to help humanity, and it goes wrong. Why would it go wrong? Well, human values are complicated and often fragile. This is a good time to talk about what the state of the art is for getting computers to just "do what humans mean" and why that state of the art is lacking. The failure mode that shows up repeatedly is finding unintended optima, and this gets even worse when trying to generalize to totally unseen problems.
3. For most people you just need to stop at two, but the third category is also something people think about. Is there a risk from giving an AI a safe-sounding objective like "fix this code" or "run my ad campaign?" This is a good jumping off point for talking about instrumental goals, the progress we've made in the last few years on "artificial common sense" and how far we still have to go, and mesa-optimization that might cause RL to generalize poorly.

https://www.lesswrong.com/posts/QzkTfj4HGpLEdNjXX/an-artificially-structured-argument-for-expecting-agi-ruin

