Humans consulting HCH

Edited by paulfchristiano, last updated 12th Feb 2016

Consider a human who has access to a question-answering machine. Suppose the machine answers questions by perfectly imitating what the human would do if asked that question.

To make things twice as tricky, suppose the human-to-be-imitated is herself able to consult a question-answering machine, which answers questions by perfectly imitating what the human would do if asked that question…

Let’s call this process HCH, for “Humans Consulting HCH.”

I’ve talked about many variants of this process before, but I find it easier to think about with a nice handle. (Credit to Eliezer for proposing using a recursive acronym.)

HCH is easy to specify very precisely. For now, I think that HCH is our best precisely specified model for “a human’s enlightened judgment.” It’s a pretty problematic model, but we don’t yet have many contenders.
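The recursion in the definition can be sketched in a few lines of Python. This is a toy illustration, not part of the original proposal, and it rests on stated assumptions. The "human" is a hypothetical function `toy_human` that can add at most two numbers unaided and delegates larger sums to her assistant. "Perfect imitation" is modeled by literally calling the same function again. Real HCH has no depth bound; the `depth` parameter is an assumption added only so the sketch terminates.

```python
from typing import Callable

# Hypothetical types: a human answers a question, given a way to ask subquestions.
Ask = Callable[[str], str]
Human = Callable[[str, Ask], str]


def toy_human(question: str, ask: Ask) -> str:
    """Hypothetical human who can add at most two numbers unaided.

    Questions look like "sum 1 2 3 4". Longer sums are split in half,
    and each half is delegated to the assistant via `ask`.
    """
    nums = [int(x) for x in question.split()[1:]]
    if len(nums) <= 2:
        return str(sum(nums))
    mid = len(nums) // 2
    left = ask("sum " + " ".join(map(str, nums[:mid])))
    right = ask("sum " + " ".join(map(str, nums[mid:])))
    if "unknown" in (left, right):
        return "unknown"
    return str(int(left) + int(right))


def hch(human: Human, question: str, depth: int) -> str:
    """Depth-bounded approximation of HCH.

    The human answers `question` while consulting a copy of the same
    process. At depth 0 the assistant just answers "unknown"; this cutoff
    is an assumption of the sketch, not part of HCH itself.
    """
    if depth == 0:
        return human(question, lambda q: "unknown")
    return human(question, lambda q: hch(human, q, depth - 1))


print(hch(toy_human, "sum 1 2 3 4", 1))  # 10
print(hch(toy_human, "sum 1 2 3 4", 0))  # unknown
```

Each extra level of consultation lets this toy human handle sums twice as long, even though no single call ever adds more than two numbers.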

Elaborations

We can also define realizable variants of this inaccessible ideal.

  • For a particular prediction algorithm P, define HCHᴾ as:
    “P’s prediction of humans consulting HCHᴾ”
  • For a reinforcement learning algorithm A, define max-HCHᴬ as:
    “A’s output when maximizing the score assigned to that output by humans consulting max-HCHᴬ”
  • For a given market structure and participants, define HCHᵐᵃʳᵏᵉᵗ as:
    “market estimates about humans consulting HCHᵐᵃʳᵏᵉᵗ”
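The self-reference in HCHᴾ ("P's prediction of humans consulting HCHᴾ") can be made concrete with a toy fixed-point iteration. This is only an illustrative sketch under loud assumptions, not a training procedure described in this post. The predictor P is a lookup table, a hypothetical stand-in for a learned model. The human is the hypothetical `toy_human`, who adds at most two numbers unaided. Each round, P is refit to predict what the human answers when she consults the previous round's P.

```python
from typing import Callable, Dict, List

Ask = Callable[[str], str]
Human = Callable[[str, Ask], str]


def toy_human(question: str, ask: Ask) -> str:
    """Hypothetical human who can add at most two numbers unaided."""
    nums = [int(x) for x in question.split()[1:]]
    if len(nums) <= 2:
        return str(sum(nums))
    mid = len(nums) // 2
    left = ask("sum " + " ".join(map(str, nums[:mid])))
    right = ask("sum " + " ".join(map(str, nums[mid:])))
    if "unknown" in (left, right):
        return "unknown"
    return str(int(left) + int(right))


def train_hch_p(human: Human, questions: List[str], rounds: int) -> Dict[str, str]:
    """Fixed-point sketch of HCH^P with a lookup-table "predictor" P.

    Each round, P is refit to predict the human's answer when she consults
    the previous round's P. Unseen questions map to "unknown".
    """
    table: Dict[str, str] = {}
    for _ in range(rounds):
        # Bind the previous round's table so the human consults the old P.
        def predict(q: str, t: Dict[str, str] = table) -> str:
            return t.get(q, "unknown")

        table = {q: human(q, predict) for q in questions}
    return table


qs = ["sum 1 2", "sum 3 4", "sum 1 2 3 4"]
print(train_hch_p(toy_human, qs, 1)["sum 1 2 3 4"])  # unknown
print(train_hch_p(toy_human, qs, 2)["sum 1 2 3 4"])  # 10
```

In this toy the predictor only ever reproduces what a human consulting the predictor would say, so it cannot exceed the underlying "model". Here that model is a table limited to the training questions.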

Note that e.g. max-HCHᴬ is totally different from “A’s output when maximizing the score assigned to that output by HCH.”

The latter proposal is essentially abstract approval-direction. It seems much more likely to yield good actions than max-HCHᴬ. But we have no way to observe the scores HCH would assign, so we can’t provide any feedback based on them, which makes it impossible to train the abstract version using conventional techniques. On the other hand, it is much easier to implement HCHᴾ, max-HCHᴬ, or HCHᵐᵃʳᵏᵉᵗ.
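The difference can be illustrated with a toy version of max-HCHᴬ. Several assumptions are made only for illustration. The reinforcement learner A is replaced by a brute-force argmax over a hypothetical finite candidate set. The evaluator `toy_score` is a hypothetical human who can add at most two numbers unaided. Recursion is depth-bounded so the sketch terminates. The point is that every score the maximizer sees comes from a human consulting the same system, so it is available as feedback. Swapping `ask` for true HCH would give the abstract approval-directed version, whose scores we cannot compute.

```python
from typing import Callable, List

Ask = Callable[[str], str]


def toy_score(question: str, candidate: str, ask: Ask) -> int:
    """Hypothetical human evaluator for questions like "sum 1 2 3 4".

    She adds at most two numbers unaided; for longer sums she checks the
    candidate against sub-sums obtained by consulting `ask`.
    """
    nums = [int(x) for x in question.split()[1:]]
    if len(nums) <= 2:
        expected = str(sum(nums))
    else:
        mid = len(nums) // 2
        left = ask("sum " + " ".join(map(str, nums[:mid])))
        right = ask("sum " + " ".join(map(str, nums[mid:])))
        if "unknown" in (left, right):
            return 0  # she cannot check anything, so every candidate scores 0
        expected = str(int(left) + int(right))
    return 1 if candidate == expected else 0


def candidates(question: str) -> List[str]:
    """Hypothetical finite action space for the toy learner."""
    return ["unknown"] + [str(n) for n in range(50)]


def max_hch_a(question: str, depth: int) -> str:
    """Toy max-HCH^A: output the candidate that maximizes the score given
    by a human who consults max-HCH^A itself (depth-bounded)."""
    if depth == 0:
        ask: Ask = lambda q: "unknown"
    else:
        ask = lambda q: max_hch_a(q, depth - 1)
    # Brute-force argmax stands in for the learner A; ties go to the first candidate.
    return max(candidates(question), key=lambda c: toy_score(question, c, ask))


print(max_hch_a("sum 1 2 3 4", 1))  # 10
print(max_hch_a("sum 1 2 3 4", 0))  # unknown
```

At depth 0 the evaluator cannot check any answer to a long sum, so every candidate ties. Python's `max` then returns the first one, `"unknown"`.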

Hope

The best case is that HCHᴾ, max-HCHᴬ, and HCHᵐᵃʳᵏᵉᵗ are:

  • As capable as the underlying predictor, reinforcement learner, or market participants.
  • Aligned with the enlightened judgment of the human, e.g. as evaluated by HCH.

(At least when the human is suitably prudent and wise.)

It is clear from the definitions that these systems can’t be any more capable than the underlying predictor/learner/market. I honestly don’t know whether we should expect them to match the underlying capabilities. My intuition is that max-HCHᴬ probably can, but that HCHᴾ and HCHᵐᵃʳᵏᵉᵗ are much dicier.

It is similarly unclear whether the system continues to reflect the human’s judgment. In some sense this is in tension with the desire for capability: the more guarded the human, the less capable the system, but the more likely it is to reflect their interests. The question is whether a prudent human can achieve both goals.
