[ Question ]

What are the high-level approaches to AI alignment?

by Gordon Seidoh Worley · 16th Jun 2020 · AI Alignment Forum · 1 min read

I'm writing a post comparing some high-level approaches to AI alignment in terms of their false positive risk. The trouble is, there's no standard agreement on what the high-level approaches to AI alignment are today, either in terms of what constitutes these high-level approaches or where to draw the lines when categorizing specific ones.

So I'll open it up as a question to get some feedback before I get too far along. What do you consider to be the high-level approaches to AI alignment?

(I'll supply my own partial answer below.)


3 Answers, sorted by top scoring

riceissa

Jun 17, 2020

  • Rohin Shah's talk gives one taxonomy of approaches to AI alignment: https://youtu.be/AMSKIDEbjLY?t=1643 (During Q&A Rohin also mentions some other stuff)
    • This podcast episode also talks about similar things: https://futureoflife.org/2019/04/11/an-overview-of-technical-ai-alignment-with-rohin-shah-part-1/
  • Wei Dai's success stories post is another way to organize the various approaches: https://www.lesswrong.com/posts/bnY3L48TtDrKTzGRb/ai-safety-success-stories
  • I started trying to organize AI alignment agendas myself a while back, but never got far: https://aiwatch.issarice.com/#agendas
  • This post by Jan Leike also has a list of agendas in the Outlook section: https://medium.com/@deepmindsafetyresearch/scalable-agent-alignment-via-reward-modeling-bf4ab06dfd84

Gordon Seidoh Worley

Thanks. Your post specifically is pretty helpful because it addresses one of the things that was tripping me up: which standard names people use for the different methods. Your names do a better job of capturing them than mine did.


evhub

Jun 16, 2020


You might be interested in this post I wrote recently that goes into significant detail on what I see as the major leading proposals for building safe advanced AI under the current machine learning paradigm.

Gordon Seidoh Worley

Actually, this post was not especially helpful for my purpose, and I should have explained why in advance, since I anticipated someone would link it. Although it helpfully lays out a number of proposals people have made, it does more to work out what's going on with each proposal than to find ways they can be grouped together (except incidentally). I even reread the post before asking this question, and it didn't help me improve on the taxonomy I proposed, which I already had in mind a few months ago.


Gordon Seidoh Worley

Jun 16, 2020


My initial thought is that there are at least 3, which I'll give the following names (with short explanations):

  • Iterated Distillation and Amplification (IDA)
    • Build an AI, have it interact with a human, create a new AI based on the interaction of the human and the AI, and repeat until the AI is good enough or it reaches a fixed point and additional iterations don't change it. (A toy code sketch of this loop appears after the list.)
  • Inverse Reinforcement Learning (IRL)
    • Build an AI that tries to infer human values from observations and then acts based on those inferred values. (A second sketch below illustrates this.)
  • Decision Theorized Agent (DTA)
    • Build an AI that uses a decision theory that causes it to make choices that will be aligned with human interests.

All of these are woefully underspecified, so improved summaries that you think more accurately capture these approaches are also appreciated.
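
To make the IDA description concrete, here is a minimal toy sketch of the loop in Python. Everything in it is an illustrative stand-in under my own assumptions: `Model`, `amplify`, `distill`, and `ida` are hypothetical names, not any real library's API, and real distillation would train a new model rather than wrap a function.

```python
from typing import Callable

# A model is anything that maps a question to an answer.
Model = Callable[[str], str]

def amplify(human: Model, model: Model) -> Model:
    """The amplification step: a human answers with the model's help."""
    def amplified(question: str) -> str:
        suggestion = model(question)  # consult the current model
        return human(f"{question} [assistant suggests: {suggestion}]")
    return amplified

def distill(target: Model) -> Model:
    """The distillation step: train a fast model to imitate `target`.
    In this toy sketch, 'training' is just wrapping the target."""
    def distilled(question: str) -> str:
        return target(question)
    return distilled

def ida(human: Model, model: Model, iterations: int = 3) -> Model:
    """Alternate amplification and distillation; a real version would
    also stop early once a fixed point is reached."""
    for _ in range(iterations):
        model = distill(amplify(human, model))
    return model
```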
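
In the same spirit, a hedged toy sketch of the IRL idea: infer reward weights from observed human choices, then act on the inferred reward. The crude perceptron-style update below is just one illustrative way to do the inference, not a claim about how any real IRL system works.

```python
import numpy as np

def infer_reward(choices: list[tuple[np.ndarray, np.ndarray]],
                 n_features: int, lr: float = 0.1, steps: int = 200) -> np.ndarray:
    """Fit weights w so each option the human chose scores higher than
    the option they rejected (a crude preference-based inverse RL)."""
    w = np.zeros(n_features)
    for _ in range(steps):
        for chosen, rejected in choices:
            if w @ chosen <= w @ rejected:      # pair is misranked
                w += lr * (chosen - rejected)   # nudge toward the human's choice
    return w

def act(options: list[np.ndarray], w: np.ndarray) -> np.ndarray:
    """Act on the inferred values: pick the option maximizing w·x."""
    return max(options, key=lambda x: float(w @ x))
```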

Ben Pace

I think the last one seems odd / doesn't make much sense. All agents have a decision theory, including RL-based agents, so it's not a distinctive approach. 

If you were attempting to describe MIRI's work, remember that they're trying to better understand the basic concepts of agency (at both the meta level and the object level), not in order to directly put the new concepts into the AI (just as current AIs do not always have a dedicated section where the 'probability theory' gets written in), but in order to be much less confused about what we're trying to do.

So if you want to describe MIRI's work, you'd call it "getting less confused about the basic concepts" and then later building an AI via a mechanism we'll figure out after getting less confused. Right now it's not engineering, it's science.

Gordon Seidoh Worley

That's true, but there's a natural and historical relationship here with what was in the past termed "seed AI", even if this is not an approach anyone is actively pursuing; that's the kind of thing I was hoping to point at without using that outmoded term.

Rob Bensinger

I agree with Ben and Richard's summaries; see https://www.lesswrong.com/posts/uKbxi2EJ3KBNRDGpL/comment-on-decision-theory. When I think about key distinctions and branching points in alignment, I usually think about things like:

  • Does the approach require human modeling? Lots of risks can be avoided if the system doesn't do human modeling, or if it only does small amounts of human modeling; but this constrains the options for value learning and learning-in-general.
  • Current ML is notoriously opaque. Different approaches try to achieve greater understanding and inspectability to different degrees and in different ways (e.g., embedded agency vs. MIRI's "new research directions" vs. the kind of work OpenAI Clarity does), or try to achieve alignment without needing to crack open the black box.
  • Is the goal to make a task-directed AGI system, vs. an open-ended optimizer?

When you say "there's a natural and historical relationship here with what was in the past termed 'seed AI', even if this is not an approach anyone is actively pursuing", it calls to mind for me the transition from MIRI thinking about open-ended optimizers to instead treating task AGI as the place to start.

Ben Pace

I'm not actually sure what you mean. I think 'seed AI' means something like 'the first case in an iterative/recursive process of self-improvement', which applies pretty well to the iterated amplification setup (which is a recursively self-improving AI) and to lots of the other examples Evan wrote about in his 11-proposals post. It still seems to me to be a pretty general term.

Richard_Ngo

I suspect that nobody is actually pursuing the third one as you've described it. Rather, my impression is that MIRI researchers tend to think of decision theory as a more fundamental problem in understanding AI, not directly related to human interests.

Gordon Seidoh Worley

Based on comments/links so far it seems I should revise the names and add a fourth:

  • IDA = IDA
  • IRL -> Ambitious Value Learning (AVL)
  • DTA -> Embedded Agency (EA)
  • + Brain Emulation (BE)
    • Build AI that either emulates how human brains work or is bootstrapped from human brain emulations.

Comments, sorted by top scoring

Steven Byrnes

Sorta related is my appendix to this article.

Gordon Seidoh Worley

Oh, I forgot about emulation approaches, i.e. bootstrapping AI by "copying" human brains, which you mention. Thanks!


Mentioned in: Comparing AI Alignment Approaches to Minimize False Positive Risk