3 Answers sorted by
top scoring

Jun 17, 2020

Ω6160

Rohin Shah's talk gives one taxonomy of approaches to AI alignment: https://youtu.be/AMSKIDEbjLY?t=1643 (During Q&A Rohin also mentions some other stuff)
- This podcast episode also talks about similar things: https://futureoflife.org/2019/04/11/an-overview-of-technical-ai-alignment-with-rohin-shah-part-1/
Wei Dai's success stories post is another way to organize the various approaches: https://www.lesswrong.com/posts/bnY3L48TtDrKTzGRb/ai-safety-success-stories
I started trying to organize AI alignment agendas myself a while back, but never got far: https://aiwatch.issarice.com/#agendas
This post by Jan Leike also has a list of agendas in the Outlook section: https://medium.com/@deepmindsafetyresearch/scalable-agent-alignment-via-reward-modeling-bf4ab06dfd84

Thanks. Your post specifically is pretty helpful because it helps with one of the things that was tripping me up: what standard names people use for the different methods. Your names do a better job of capturing them than mine did.

evhub

Jun 16, 2020

Ω360

You might be interested in this post I wrote recently that goes into significant detail on what I see as the major leading proposals for building safe advanced AI under the current machine learning paradigm.

Gordon Seidoh Worley

Actually this post was not especially helpful for my purpose and I should have explained why in advance because I anticipated someone would link it. Although it helpfully lays out a number of proposals people have made, it does more to work out what's going on with those proposals rather than find ways they can be grouped together (except incidentally). I even reread this post before posting this question and it didn't help me improve on the taxonomy I proposed, which I already had in mind as of a few months ago.

Gordon Seidoh Worley

Jun 16, 2020

Ω240

My initial thought is that there are at least 3, which I'll give the following names (with short explanations):

Iterated Distillation and Amplification (IDA)
- Build an AI, have it interact with a human, create a new AI based on the interaction of the human and the AI, and repeat until the AI is good enough or it reaches a fixed point and additional iterations don't change it.
Inverse Reinforcement Learning (IRL)
- Build an AI that tries to infer human values from observations and then acts based on those inferred values.
Decision Theorized Agent (DTA)
- Build an AI that uses a decision theory that causes it to make choices that will be aligned with human interests.

All of these are woefully underspecified, so improved summaries that you think accurately explain these approaches are also appreciated.
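To make the loop in the IDA summary above concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: the task (summing 0 through n), the lookup-table "AI", and the names `human_with_model`, `distill`, and `iterate_to_fixed_point` are hypothetical and not taken from any IDA paper or codebase. Real proposals train ML models at the distillation step rather than copying answers into a table.

```python
from typing import Dict, Optional, Tuple

# Toy task (an assumption for illustration): answer "what is 0 + 1 + ... + n?"
Model = Dict[int, int]  # the toy "AI" is a lookup table from n to an answer


def human_with_model(n: int, model: Model) -> Optional[int]:
    """Interaction step: a 'human' who can only do one addition
    consults the current model for the smaller subquestion."""
    if n == 0:
        return 0  # the human knows the base case unaided
    sub_answer = model.get(n - 1)
    if sub_answer is None:
        return None  # the model cannot help yet, so the human is stuck
    return n + sub_answer


def distill(questions: range, model: Model) -> Model:
    """Create a new model that records what the human-plus-model
    system answered on each question it could handle."""
    new_model: Model = {}
    for n in questions:
        answer = human_with_model(n, model)
        if answer is not None:
            new_model[n] = answer
    return new_model


def iterate_to_fixed_point(questions: range, max_rounds: int = 100) -> Tuple[Model, int]:
    """Repeat the interact-then-rebuild loop until a round changes nothing."""
    model: Model = {}  # the initial AI knows nothing
    for rounds in range(1, max_rounds + 1):
        new_model = distill(questions, model)
        if new_model == model:  # fixed point: another iteration changed nothing
            return model, rounds
        model = new_model
    return model, max_rounds


final_model, rounds_used = iterate_to_fixed_point(range(6))
print(final_model[5], rounds_used)  # prints: 15 7
```

Each round extends the table by one question, because the human can only build on what the previous model already answers. The seventh round is the one that detects the fixed point.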

Ben Pace

I think the last one seems odd / doesn't make much sense. All agents have a decision theory, including RL-based agents, so it's not a distinctive approach.

If you were attempting to describe MIRI's work, remember that they're trying to understand basic concepts of agency better (meta level, object level), not in order to directly put the new concepts into the AI (in the same way current AIs do not always have a section for the 'probability theory' to be written in) but in order to be much less confused about what we're trying to do.

So if you want to describe MIRI's work, you'd call it "getting less confused about the basic concepts" and then later building an AI via a mechanism we'll figure out after getting less confused. Right now it's not engineering, it's science.

Gordon Seidoh Worley

That's true, but there's a natural and historical relationship here with what was in the past termed "seed AI", even if this is not an approach anyone is actively pursuing, which is the kind of thing I was hoping to point at without using that outmoded term.

Rob Bensinger

I agree with Ben and Richard's summaries; see https://www.lesswrong.com/posts/uKbxi2EJ3KBNRDGpL/comment-on-decision-theory:

When I think about key distinctions and branching points in alignment, I usually think about things like:
- Does the approach require human modeling? Lots of risks can be avoided if the system doesn't do human modeling, or if it only does small amounts of human modeling; but this constrains the options for value learning and learning-in-general.
- Current ML is notoriously opaque. Different approaches try to achieve greater understanding and inspectability to different degrees and in different ways (e.g., embedded agency vs. MIRI's "new research directions" vs. the kind of work OpenAI Clarity does), or try to achieve alignment without needing to crack open the black box.
- Is the goal to make a task-directed AGI system, vs. an open-ended optimizer?

When you say "there's a natural and historical relationship here with what was in the past termed 'seed AI', even if this is not an approach anyone is actively pursuing", it calls to mind for me the transition from MIRI thinking about open-ended optimizers to instead treating task AGI as the place to start.

Ben Pace

I'm not actually sure what you mean. I think 'seed AI' means something like 'first case in an iterative/recursive process' of self-improvement, which applies pretty well to the iterated amplification setup (which is a recursively self-improving AI) and lots of other examples that Evan wrote about in his 11-examples post. It still seems to me to be a pretty general term.

Richard_Ngo

I suspect that nobody is actually pursuing the third one as you've described it. Rather, my impression is that MIRI researchers tend to think of decision theory as a more fundamental problem in understanding AI, not directly related to human interests.

Gordon Seidoh Worley

Based on comments/links so far it seems I should revise the names and add a fourth:

IDA = IDA
IRL -> Ambitious Value Learning (AVL)
DTA -> Embedded Agency (EA)
+ Brain Emulation (BE)
- Build AI that either emulates how human brains work or is bootstrapped from human brain emulations.


Steven Byrnes

Sorta related is my appendix to this article.

Gordon Seidoh Worley

Oh, I forgot about emulation approaches, i.e. bootstrap AI by "copying" human brains, which you mention. Thanks!


[ Question ]

What are the high-level approaches to AI alignment?
