
Paul has written about an open problem: can we use active learning to get a good predictor, if all we have is a mixture of a good predictor and some possibly bad predictors? I'd like to propose a refinement to the scheme.

A simplified version of Paul's problem is the following. Suppose we have n predictors. These predictors might, for example, predict how a human would answer a given question. One is known to be good; the rest may be arbitrarily evil. We don't know which predictor is good. How do we use this set of predictors to imitate a human almost as well as the good predictor?

Paul proposes using active learning to identify questions on which these predictors disagree, then asking those questions to the human. This allows the system to eliminate some predictors from consideration when they make bad predictions. However, if any of the predictors is evil, it might intentionally disagree with the good predictor on a "weird" question, one that causes bad results when asked to the human. This seems like a serious problem, since, due to simulation warfare, it does not seem unlikely for one of the predictors to actually be evil.

I propose the following refinement. The system should consider a stream of questions coming in and answer each one before seeing the next. It should examine each question and see if the predictors that have scored well in the past disagree significantly on their predictions of the answer to this question. If so, then it should ask the question to the human, output the human's answer, and give each predictor a score based on how good its prediction of the human's answer was. If not, then it can go ahead and output the consensus prediction. Ideally, we could prove that this algorithm will not ask the human very many questions (preferably, not much more than linear in n, the number of predictors). This setting is similar to the setting of online selective sampling, especially the KWIK ("knows what it knows") variant.
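To make the loop concrete, here is a minimal sketch under some illustrative assumptions that are not part of the proposal itself: answers come from a finite set, each predictor maps a question to a probability vector over that set, disagreement is measured by total-variation distance, and predictors are scored by cumulative log-loss with a fixed elimination margin. The names (`make_selective_sampler`, `ask_human`) and thresholds are hypothetical.

```python
import numpy as np

def make_selective_sampler(predictors, ask_human, tv_threshold=0.1, loss_margin=5.0):
    """KWIK-style selective sampling over a stream of questions (sketch).

    Assumptions (not specified in the post): each predictor maps a question
    to a probability vector over a finite answer set, ask_human(question)
    returns the index of the human's answer, and predictors are scored by
    cumulative log-loss.
    """
    scores = {i: 0.0 for i in range(len(predictors))}
    active = set(scores)  # predictors that have scored well so far

    def answer(question):
        preds = {i: np.asarray(predictors[i](question)) for i in active}
        dists = list(preds.values())

        # Disagreement = max total-variation distance among active predictors.
        disagreement = max(0.5 * np.abs(p - q).sum() for p in dists for q in dists)

        if disagreement <= tv_threshold:
            # Near-consensus: output the shared prediction without querying the human.
            return int(np.mean(dists, axis=0).argmax())

        # Controversial question: ask the human, output their answer, and score
        # each active predictor on how well it predicted that answer.
        human_answer = ask_human(question)
        for i in active:
            scores[i] += -np.log(max(preds[i][human_answer], 1e-12))

        # Drop predictors whose cumulative loss is far worse than the best.
        best = min(scores[i] for i in active)
        survivors = {i for i in active if scores[i] <= best + loss_margin}
        active.intersection_update(survivors)
        return human_answer

    return answer
```

Scoring by log-loss and pruning with a fixed margin is just one choice; any proper scoring rule with a sensible elimination rule fits the same interface. Nothing in the sketch establishes the desired bound on the number of human queries; it only fixes the interface we would want such a bound for.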

A single predictor cannot cause the system to ask the human an arbitrary question. It can bide its time and wait for a particular question to come up naturally, but this does not give it much power: a question that would come up naturally is unlikely to be bad to ask to the human, if the rest of the system is set up correctly.

One problem is that the predictors might disagree at an inopportune time, when we would really prefer for the system not to have to "pause" to ask the human a question. Paul discusses this problem as a general failure mode for counterfactual oversight. I think there's a partial solution: try to predict what questions will come up in the future, finding one that the predictors disagree on. This search should not be too powerful: in particular, it should avoid looking at the internals of the predictors, and it should avoid searching through a huge number of possibilities. If the search for the controversial question is not powerful, then the question will have low mutual information with the predictors' internals, preventing the predictors from exposing the human to highly optimized questions. KWIK-style algorithms generally continue to work in this setting.
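One way to picture that weak search, as a sketch: draw a small, fixed number of candidate questions from a model of the upcoming question distribution (which never reads the predictors' internals), and return the first candidate the currently active predictors disagree on. The generator `sample_question`, the candidate count, and the threshold are assumptions for illustration.

```python
import numpy as np

def find_controversial_question(active_predictors, sample_question,
                                n_candidates=100, tv_threshold=0.1):
    """Weak search for a likely-future question the predictors disagree on (sketch).

    sample_question() is assumed to draw from a model of the natural question
    distribution without access to the predictors' internals. Only a small,
    fixed number of candidates is examined, so the returned question carries
    little mutual information with any predictor's internals.
    """
    for _ in range(n_candidates):
        question = sample_question()
        dists = [np.asarray(p(question)) for p in active_predictors]
        disagreement = max(0.5 * np.abs(a - b).sum() for a in dists for b in dists)
        if disagreement > tv_threshold:
            return question   # ask this one to the human ahead of time
    return None               # no controversial question among the candidates
```

A question found this way can then be routed through the same ask-and-score step as above, at a convenient time rather than in the middle of an episode.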

I'm currently working on an algorithm that can handle this application better than other selective sampling algorithms can; see Paul's post for more details on what makes this application special. I previously made the mistake of attacking this problem without doing enough of a literature search to find the selective sampling literature. I think I will have a better chance of making progress on this problem after reading over this literature more.

Comments

The problem with applying this approach to the kinds of schemes I have been considering is that the behavior of the overall system implicitly depends on its answers to a very large number of questions. See the tree-structured case here.

So I'm not sure that we can avoid the problem you describe by selective sampling.

As you've pointed out, the underlying learning problem is also unsolved. So there are really two apparently distinct problems to be resolved.

It does seem bad if the tree is so large that it contains questions that are unsafe to ask a real human. But I'm not sure why you would want to do this. If the question is unsafe to ask a real human, then it seems like most ways of asking the question within the tree structure are also unsafe. Unless you're doing something like processing the answer to a question using a computer program instead of actually showing the answer to a human?

If none of the questions in (a sample of) the tree is unsafe to ask the human, then there's a simple recursive algorithm that will find where predictors disagree in the tree (whenever these disagreements propagate to the root node). You'd probably want to set it up so that the human takes as input a question X, and returns either an answer to X or a pair of questions Y and Z; Y is asked to a second HCH, and (Z ++ answer to Y) is asked to a third HCH to get the answer to X. To find a disagreement, start by looking at the root node X and seeing if the predictors disagree on its answer; if they do, then see whether they disagree on what the root human does; if they don't disagree on what the root human does, then see if they disagree on the answer to Y; and so on.
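A minimal sketch of that recursive search, assuming questions and answers are strings, "++" is string concatenation, and each predictor exposes two hypothetical methods: predict_answer(q), its predicted final answer to q, and predict_human(q), its prediction of the human's direct response, either ("answer", a) or ("split", Y, Z). These method names and representations are illustrative, not part of the comment.

```python
def find_disagreement(predictors, question):
    """Return a question the predictors disagree on at the human level, or None.

    Sketch only: predictors are assumed to expose predict_answer(q) and
    predict_human(q) as described above, and disagreements are assumed to
    propagate to the root of the tree.
    """
    # If the predictors agree on the final answer, there is nothing to ask.
    if len({p.predict_answer(question) for p in predictors}) == 1:
        return None

    # They disagree on the final answer; first check whether they disagree
    # on what the root human does with this question.
    human_steps = [p.predict_human(question) for p in predictors]
    if len(set(human_steps)) > 1:
        return question  # safe to ask: it already comes up inside the tree

    step = human_steps[0]
    if step[0] == "answer":
        # Agreement that the human answers directly would contradict the
        # root-level disagreement, so this branch should be unreachable.
        return None

    # They agree the human splits the question into (Y, Z), so the
    # disagreement must lie in one of the subquestions.
    _, y, z = step
    found = find_disagreement(predictors, y)
    if found is not None:
        return found

    # They agree on the answer to Y, so recurse into (Z ++ answer to Y).
    y_answer = predictors[0].predict_answer(y)
    return find_disagreement(predictors, z + y_answer)
```

Termination relies on the tree being finite and on the disagreement actually propagating to the root, as the comment assumes.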

EDIT: now that I've thought about it more, it seems like your original active learning proposal will work fine with 100,000 questions. I think I mentally replaced 100,000 with a much larger number at some point, and then criticized using active learning with that many questions.

> If the question is unsafe to ask a real human, then it seems like most ways of asking the question within the tree structure are also unsafe.

I agree, and retract my complaint. The hierarchical structure does make the problem more subtle, but doesn't rule out the approach you outlined.

A second potential problem is that I would really like to synthesize potentially problematic data in advance. This can't quite be done using the technique you suggest, though you could imagine somehow forcing the synthesized data to look much like data that will actually appear (and really you want to do something like that anyway). The situation seems tricky and pretty subtle.