29 My research priorities for AI control

6th Dec 2015

1 min read

29

I've been thinking about what research projects I should work on, and I've posted my current view. Naturally, I think these are also good projects for other people to work on as well.

Brief summaries of the projects I find most promising:

Elaborating on apprenticeship learning. Imitating human behavior seems especially promising as a scalable approach to AI control, but there are many outstanding problems.
Efficiently using human feedback. The limited availability of human feedback may be a serious bottleneck for realistic approaches to AI control.
Explaining human judgments and disagreements. My preferred approach to AI control requires humans to understand AIs’ plans and beliefs. We don’t know how to solve the analogous problem for humans.
Designing feedback mechanisms for reinforcement learning. A grab bag of problems, united by a need for proxies of hard-to-optimize, implicit objectives.

The post briefly discusses where I am coming from, and links to a good deal more clarification. I'm always interested in additional thoughts and criticisms, since changing my views on these questions would directly influence what I spend my time on.

Personal Blog

29

My research priorities for AI control

5Vaniver

8paulfchristiano

2solipsist

New Comment

3 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:46 PM

[-]Vaniver10y50

I agree that those directions look promising.

My impression is that act-based approaches are good for human replacements, but not good for human meta-replacements. That is, if we consider the problem of fulfilling orders in an Amazon warehouse, we have a number of different problems:

Move a worker and a crate of goods adjacent to each other.
Move a single good from the crate to the order box.
Organize the facility to make the above two as cost-efficient as possible.

The first is already replaced by robots, the second seems like a good candidate for imitation (we need robot eyes and robot hands that are about as good as human eyes and hands, and they can work basically the same way) / low-level control theory.

But the third is a problem of cost functions, models, simuation, calculation, and possibility enumeration. It's where creativity most comes into play, and that's something I'm pessimistic about getting out of interpolative systems instead of extrapolative systems.

There are still major gains made by the reduced scope--a warehouse management system seems easier to make safe than a generic moneymaker--but I think there's a fundamental connection between the opportunity and danger of AI (the ability to extrapolate into previously unseen realms).

[-]paulfchristiano10y80

Some things can be done by imitation based on our current understanding (and will get better as machine learning improves). The interesting part of the project is figuring out how to do the trickier things, which will require new ideas.

It's not clear that imitation impairs your ability to generalize to new domains. An RL agent faces the question: in this new domain, how should I behave to receive rewards? It has not been trained in the domain, but must learn to reason about the domain and figure out what policies will work well. An imitation learner faces the question: in this new domain, how would the expert behave / what behavior would they approve of? The two questions seem similar in difficulty, and indeed you could use the same algorithmic ingredients.

It's also not clear that it's relevant if a task involves thinking about cost functions, models, simulation, calculation, etc... These are techniques one could apply either to achieve a high reward, or to produce actions the expert would approve of / like / do themselves. You might say that at that point these rich internal behaviors must be guided by some non-trivial internal dynamic. But then we will just have the same discussion a level lower.

[-]solipsist10y20

Minor naming feedback. You switched from calling something "supervised learning" to "reinforcement learning". The first images that come to my mind when I hear "reinforcement learning" are TD-Gammon and reward signals. So, when I read "reinforcement learning", I first think of a computer getting smarter through iterative navel-gazing, then think of a computer trying to wirehead itself, then stumble to the meaning I think you intend. I am a lay reader.

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

29

My research priorities for AI control

29

29