[Written for a general audience; may not contain much that is new to LW. Posted here first for comment; now published.]

All powerful new technologies create both benefits and risks: cars, planes, drugs, radiation. AI is on a trajectory to become one of the most powerful technologies we possess; in some scenarios, it becomes by far the most powerful. It therefore will create both extraordinary benefits and extraordinary risks.

What are the risks? Here are several lenses for thinking about AI risks, each putting AI in a different reference class.

As software

AI is software. All software has bugs. Therefore AI will have bugs.

The more complex software is, and the more poorly we understand it, the more likely it is to have bugs. AI is so complex that it cannot be designed, but only “trained”, which means we understand it very poorly. Therefore it is guaranteed to have bugs.

You can find some bugs with testing, but not all. Some bugs can only be found in production. Therefore, AI will have bugs that will only be found in production.

We should think about AI as complicated, buggy code, especially to the extent that it controls important systems (vehicles, factories, power plants).

As a complex system

The behavior of a complex system is highly non-linear, and it is difficult (in practice impossible) to fully understand.

This is especially true of the system’s failure modes. A complex system, such as the financial system, can seem stable but then collapse quickly and with little warning.

We should expect that AI systems will be similarly hard to predict and could easily have similar failure modes.

As an agent with unaligned interests

Today’s most advanced AIs—chatbots and image generators—are not autonomous agents with goal-directed behavior. But such systems will inevitably be created and deployed.

Anytime you have an agent acting on your behalf, you have a principal–agent problem: the agent is ultimately pursuing its own goals, and it can be hard to align those goals with your own.

For instance, the agent may tell you that it is representing your interests while in truth optimizing for something else, like a demagogue who claims to represent the people while actually seeking power and riches.

Or the agent can obey the letter of its goals while violating the spirit, by optimizing for its reward metrics instead of the wider aims those metrics are supposed to advance. An example would be an employee who aims for promotion, or a large bonus, at the expense of the best interests of the company. Referring back to the first lens, AI as software: computers always do exactly what you tell them, but that isn’t always exactly what you want.

Related: any time you have a system of independent agents pursuing their own interests, you need some rules for how they behave to prevent ruinous competition. But some agents will break the rules, and no matter how much you train them, some will learn “follow these rules” and others will simply learn “don’t get caught.”

People already do all of these things: lie, cheat, steal, seek power, game the system. In order to counteract them, we have a variety of social mechanisms: laws and enforcement, reputation and social stigma, checks and balances, limitations on power. At minimum, we shouldn’t give AI any more power or freedom, with any less scrutiny, than we would give a human.

As a separate, advanced culture or species

In the most catastrophic hypothesized AI risk scenarios, the AI acts like a far more advanced culture, or a far more intelligent species.

In the “advanced culture” analogy, AI is like the expansionary Western empires that quickly dominated all other cultures, even relatively advanced China. (This analogy has also been used to hypothesize what would happen on first contact with an advanced alien species.) The best scenario here is that we assimilate into the advanced culture and gain its benefits; the worst is that we are enslaved or wiped out.

In the “intelligent species” analogy, the AI is like humans arriving on the evolutionary scene and quickly dominating Earth. The best scenario here is that we are kept like pets, with a better quality of life than we could achieve for ourselves, even if we aren’t in control anymore; the worst is that we are exploited like livestock, exterminated like pests, or simply accidentally driven extinct through neglect.

These scenarios are an extreme version of the principal–agent problem, in which the agent is far more powerful than the principal.

How much you are worried about existential risk from AI probably depends on how much you regard these scenarios as “far-fetched” vs. “obviously how things will play out.”


I don’t yet have solutions for any of these, but I find these different lenses useful both to appreciate the problem and take it seriously, and to start learning from the past in order to find answers.

I think these lenses could also be useful to help find cruxes in debates. People who disagree about AI risk might disagree about which of these lenses they find plausible or helpful.

5 comments

Great post. I think more people should use the technique of scoping out different lenses for understanding something before settling on their own position. Looking forward to seeing where you end up falling on this issue when you've had more time to think about it.

I'm pretty predisposed towards charismatic soundbite-sized concepts. Not because they fit well on social media or TikTok or anything (perish the thought), but because they're more memorable: 1. they're easier to operationalize in your mind, 2. you're more likely to recall them during a real conversation with someone, and 3. you're more likely to recite the entire concept correctly, or at least the important bits (such as forgetting "a billion von Neumanns" but still being able to say it's a trillion or a million geniuses).

This Yudkowsky tweet seems pretty relevant to lens #4:

Don't think of (powerful) AI as being like a smart guy who has to live on the human Internet. Think of it as an alien civilization initially stuck inside a computer, containing a billion von Neumanns running at 10,000X human speeds, which would like to achieve independence

People already do all of these things: lie, cheat, steal, seek power, game the system. In order to counteract them, we have a variety of social mechanisms: laws and enforcement, reputation and social stigma, checks and balances, limitations on power.

At minimum, we shouldn’t give AI any more power or freedom, with any less scrutiny, than we would give a human.

For your 'third lens', how does the conclusion follow from the existence of such mechanisms?

I'd also add "As a tool": like all tools, it can be used maliciously, disregarding others' wellbeing and attempting to enrich only the user.

AI x-risk is going mainstream. With that, I've seen a lot of ill-informed drive-by takes. Bravo on actually immersing yourself in the arguments. This looks excellent - I love the four perspectives. Next time a family member asks, I will point them towards this post.