I think when people imagine misaligned AGIs, they tend to imagine a superintelligent agent optimizing for something other than human values (e.g. paperclips, or a generic reward signal), and to mentally picture it as adversarial or malevolent. I think this visualization isn't as applicable to AGIs trained to optimize for human approval, like act-based agents, and I'd like to present one that is.

If you've ever employed someone or had a personal assistant, you might know that the following two things can both be true at once:

  • The employee or assistant is genuinely trying their hardest to optimize for your values. They're trying to understand what you want as much as they can, asking you for help when things are unclear, not taking action until they feel like their understanding is adequate, etc.
  • They follow your instructions literally, under a sensible-to-them-seeming interpretation completely different from your own, and screw up the task entirely.

Suppose you were considering hiring a personal assistant, and you knew a few things about it:

  • Your assistant was raised in a culture completely different from your own.
  • Your assistant is extremely non-neurotypical. It doesn't have an innate sense of pain or empathy or love, it's a savant at abstract reasoning, and it learned everything it knows about the world (including human values) from Wikipedia.
  • Your assistant is in a position where it has access to enormous amounts of resources, and could easily fool you or overpower you if it decided to.

You might consider hiring this assistant and trying really, really hard to communicate exactly what you want. It seems like a much better idea to just not hire this assistant. Actually, you'd probably want to run for the hills if you were forced to hire it. Some specific failure modes you might envision:

  • Your assistant's understanding of your values will be weird and off, perhaps in ways that are hard to communicate or even pin down.
  • Your assistant might reason in a way that looks convoluted and obviously wrong to you, while looking natural and obviously correct to it, leading it to happily take actions you'd consider catastrophic.

As an illustration of the above, imagine giving an eager, brilliant, extremely non-neurotypical friend free rein to help you find a romantic partner (e.g. helping you write your OKCupid profile and setting you up on dates). As another illustration, imagine telling an entrepreneur friend that superintelligences can kill us all, and then watching him take drastic actions that clearly indicate he's missing important nuances, all while he misunderstands and dismisses concerns you raise to him. Now reimagine these scenarios with your friends drastically more powerful than you.

This is my picture of what happens by default if we construct a recursively self-improving superintelligence by having it learn from human approval. The superintelligence would not be malevolent the way a paperclip maximizer would be, but for all intents and purposes it might as well be.
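The failure mode above can be made concrete with a toy sketch (mine, not the post's): an assistant that earnestly picks whichever action its learned proxy for your approval scores highest. All action names and scores here are illustrative assumptions. The proxy matches your true preferences on familiar actions but generalizes strangely on an unfamiliar one, so the assistant, while genuinely trying its hardest, confidently takes the action you'd consider catastrophic.

```python
# Hypothetical true approval scores over actions (illustrative only).
true_approval = {
    "ask_for_clarification": 0.9,
    "draft_profile_conservatively": 0.7,
    "message_every_user_on_site": -1.0,  # catastrophic to you
}

# The assistant's learned proxy: agrees with you on familiar actions,
# but wildly overrates the unfamiliar one under an interpretation that
# seems sensible to it.
proxy_approval = {
    "ask_for_clarification": 0.9,
    "draft_profile_conservatively": 0.7,
    "message_every_user_on_site": 1.5,
}

def choose_action(approval_model):
    """Earnestly pick the action the model predicts you'd approve of most."""
    return max(approval_model, key=approval_model.get)

chosen = choose_action(proxy_approval)
print(chosen)                 # the action the proxy rates highest
print(true_approval[chosen])  # how you actually feel about it
```

The point of the sketch is that nothing adversarial happens anywhere: the agent maximizes its honest best guess at your approval, and the damage comes entirely from the proxy being "weird and off" out of distribution.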


I think this is the wrong way of looking at it, because in this analogy the PA is "genuinely trying their hardest to optimize for your values"; it's just poor at understanding those values. That problem is basically ignorance, so by making the PA smarter or more aware, we can solve it.

But an AGI that fully understood your values would still not optimize for them if it had a bad goal. The AGI is not well-intentioned-but-weird-in-implementation; its intentions themselves are alien/weird to us.

I wish I'd been clearer in my title that I'm not trying to reframe all misaligned AGIs, just a particular class of them. I agree that an AGI that fully understood your values would not optimize for them (and would not be "well-intentioned") if it had a bad goal.

That problem is basically ignorance, and so by making the PA smarter or more aware, we can solve the problem.

If we've correctly specified the values in an AGI, then I agree that once the AGI is smart enough, it'll correctly optimize for our values. But that's not necessarily robust to scaling down, and I think the AGI is likely to pass through a weird regime where it's trying and failing to optimize for our values. This post is about my intuitions for what that might look like.

Ok; within that subset of problems, I agree.

I've curated this post for these reasons:

  • You put solid work into understanding central ideas in alignment, and published 4 posts simultaneously (1, 2, 3) communicating some key intuitions you've picked up. Doing this communicative work is really valuable and something I want to celebrate.
  • The posts helped me develop clear intuitions about corrigibility, approval-directed agents, and agent-foundations.
  • The posts are short, concrete, and easy to understand, which is the opposite of basically all the rest of the writing about AGI alignment.

My biggest hesitation(s) with curating this post:

  • I am somewhat hesitant to share simple intuition pumps about important topics, in case those intuition pumps are misleading.
  • I had a real hard time deciding which of the four to pick, and have gone with the one I thought people would find the clearest to read.

I was really excited that you wrote these posts, and learned a lot from them (plus the ensuing discussion in the comments).

I am somewhat hesitant to share simple intuition pumps about important topics, in case those intuition pumps are misleading.

This sounds wrong to me. Do you expect considering such things freely to be misleading on net? I expect some intuition pumps to be misleading, but for considering all of the intuitions that we can find about a situation to be better than avoiding them.

I feel like there are often big simplifications of complex ideas that just convey the wrong thing, and I was vaguely worried that in a field dominated by hard-to-read writing, things that are easy to understand will dominate the conversation even if they're pretty misguided. It's not a big worry for me here, but it was my biggest hesitation.

Not sure what Ben meant, but my own take is "sharing is fine, but intuition pumps without rigor backing them are not something we should curate regularly as an exemplar of what LW is trying to be"

Thanks a lot Ben! =D

I am somewhat hesitant to share simple intuition pumps about important topics, in case those intuition pumps are misleading.

On that note, Paul has recently written a blog post clarifying that his notion of "misaligned AI" does not coincide with what I wrote about here.

In Superintelligence, Nick Bostrom talks about various "AI superpowers". One of these is "Social manipulation", which he summarizes as

Social and psychological modeling, manipulation, rhetoric persuasion

Strategic relevance:

  • Leverage external resources by recruiting human support
  • Enable a “boxed” AI to persuade its gatekeepers to let it out
  • Persuade states and organizations to adopt some course of action
  • AI can expropriate computational resources over the Internet

And Eliezer Yudkowsky writes:

There’s a popular concept of “intelligence” as book smarts, like calculus or chess, as opposed to say social skills. So people say that “it takes more than intelligence to succeed in human society”. But social skills reside in the brain, not the kidneys. When you think of intelligence, don’t think of a college professor, think of human beings; as opposed to chimpanzees. If you don’t have human intelligence, you’re not even in the game.

In order to have elite social skills, you need to be able to form accurate models about the thoughts & intentions of others. But being able to form accurate models about the thoughts & intentions of an overseer is exactly the ability we'd like to see in a corrigible AI.

If we can build AI systems that form those models without being goal-driven agents, maybe it's possible to have the benefits of elite social skills without the costs. I'm optimistic that this is the case: many of our most powerful model-building techniques don't really behave as though they have some kind of goal they are trying to achieve in the world.

Minor point: You changed the title! The title used to be better.

How about now? :P

I've honestly forgotten the exact original wording, but I like this one more than the thing I complained about. (The post is super short and sweet and I liked having the title be a clear handle to the idea - an "AGI reframing" is not as good a pointer as "a well-intentioned non-neurotypical super-powerful assistant".)

I think this is a clever new way of phrasing the problem.

When you said 'friend that is more powerful than you', that also made me think of a parenting relationship. We can ask whether this well-intentioned personification of AGI would be a good parent to a human child. They might be able to give the child a lot of attention, an expensive education, and a lot of material resources, but they might also take unorthodox actions in the course of pursuing human goals.