A Safer Oracle Setup?

by Ofer

9th Feb 2018

4 min read

Oracle AI

Personal Blog

Ofer (7y)

Update: The setup described in the OP involves a system that models humans. See this MIRI article for a discussion on some important concerns about such systems.

Stuart_Armstrong (8y)

Hey there!

The main problem is that you're trying to answer the test question "Limited to 100 words, what might humans answer in 2100 when asked how people in 2018 could most effectively improve their wellbeing?", while training on "Limited to 100 words, what might humans answer in 2018 when asked how people in 1950 could most effectively improve their wellbeing?"

This is a major problem, because the training question can be answered by a "look at current medical/social science, and distill answers" approach, while the test question needs extrapolation - they are actually very different questions (today).

You might be interested in some of my ideas on Oracles, that should allow the safe answering of many different types of questions: https://arxiv.org/abs/1711.05541

Ofer (8y)

ETA: Please don't spend time on this comment. I no longer think this setup deserves attention. Thanks again!

Thanks so much for the comment!
I fully agree.

Suppose we first write the code for "agent X", and then we wait for a year. We then invoke X with a dataset containing questions that humanity answered in the past year. Then we take "agent Y" (the output of X) and invoke it with a useful question that we don't know the answer to. If it's a question that we could find the answer to anyway in the very near future (or perhaps even WOULD have found the answer to in the past year had we been luckier), then it's plausible we'll get a useful answer from Y. The larger the dataset is relative to the final n (the length of the code of the Y created at the last iteration), the less plausible it is for any code of that size to always correctly detect that a given useful question is not from our dataset (which is a necessary condition for Y to withhold useful answers on every useful question outside the dataset).
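
A rough way to make the size argument concrete is a counting sketch. It rests on assumptions that are not in the comment above: questions are drawn from a finite universe of N candidates, the dataset is an arbitrary m-element subset of it, and "code length n" is measured in bits. Under those assumptions, a program that decides dataset membership exactly for every possible dataset needs at least log2 C(N, m) bits, so a fixed n is overwhelmed as m grows. The function names below are hypothetical, and real datasets have structure that a short program might exploit, so this is only an illustration of the intuition.

```python
import math


def min_bits_for_exact_membership(universe_size: int, dataset_size: int) -> float:
    """Lower bound (in bits) on the information needed to single out one
    dataset among all subsets of the given size: log2 of C(N, m).

    Assumption: the dataset is an arbitrary subset with no exploitable structure.
    """
    return math.log2(math.comb(universe_size, dataset_size))


def could_detect_exactly(code_bits: int, universe_size: int, dataset_size: int) -> bool:
    """True if code of this size is at least large enough, by the counting
    bound alone, to encode which questions belong to the dataset."""
    return code_bits >= min_bits_for_exact_membership(universe_size, dataset_size)


if __name__ == "__main__":
    n = 128  # hypothetical code length of agent Y, in bits
    for m in (10, 100):
        bound = min_bits_for_exact_membership(1000, m)
        print(f"dataset size {m}: need >= {bound:.1f} bits, "
              f"n = {n} sufficient: {could_detect_exactly(n, 1000, m)}")
```

With a universe of 1000 candidate questions, 128 bits clear the bound for a 10-question dataset but not for a 100-question one, which is the direction of the claim: growing the dataset relative to n makes exact detection of out-of-dataset questions less plausible.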

P.S., I'll fully digest your paper soon (I'm in an intense period of finishing my MSc...), but going over it a few months ago (along with other things you wrote on boxed AIs) had a huge and useful impact on me, so thanks for that too! :)
