Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Relating HCH and Logical Induction

paulfchristiano

abramdemski

paulfchristiano

Pongo


There are two salient ways to get better predictions: deliberation and trial+error. HCH is about deliberation, and logical inductors are about trial and error. The benefit of trial and error is that it works eventually. The cost is that it doesn't optimize what you want (unless what you want is the logical induction criterion) and that it will generally get taken over by consequentialists who can exercise malicious influence a constant number of times before the asymptotics assert themselves. The benefit of deliberation is that its preferences are potentially specified indirectly by the original deliberator (rather than externally by the criterion for trial and error) and that if the original deliberator is strong enough they may suppress internal selection pressures, while the cost is that who knows if it works.

> HCH is about deliberation, and logical inductors are about trial and error.

I think that's true of the way I describe the relationship in the OP, but not quite true in reality. I think there's also an aspect of deliberation that's present in logical induction and not in HCH. If we think of HCH as a snapshot of a logical inductor, the logical inductor is "improving over time as a result of thinking longer". This is partly due to trial-and-error, but there's also a deliberative aspect to it.

I mean, partly what I'm saying is that it's hard to draw a sharp line between deliberation and trial-and-error. If you try to draw that line such that logical induction lands on one side, you're putting Bayes' Law on multiple hypotheses on the "trial-and-error" side. But it's obvious that one would want it to be on both sides. It's definitely sort of about trial-and-error, but we also definitely want to apply Bayes' Law in deliberation. Similarly, it might turn out that we want to apply the more general logical-induction updates within deliberation.

But part of what I'm saying is that LIC (logical induction criterion) *is a theory of rational deliberation* in the sense of revising beliefs over time. The LIA (logical induction algorithm) captures the trial-and-error aspect, running lots of programs without knowing which ones are actually needed to satisfy LIC. But the LIC is a normative theory of deliberation, saying that what it means for belief revisions over time to be rational is that they not be too exploitable.

> The cost is that it doesn't optimize what you want (unless what you want is the logical induction criterion) and that it will generally get taken over by consequentialists who can exercise malicious influence a constant number of times before the asymptotics assert themselves.

Yeah, if you take the LIA as a design proposal, it's pretty unhelpful. But if you take the LIC as a model of rational deliberation, you get potentially useful ideas.

> The benefit of deliberation is that its preferences are potentially specified indirectly by the original deliberator (rather than externally by the criterion for trial and error), and that if the original deliberator is strong enough they may suppress internal selection pressures.

For example, the LIC is a context in which we can formally establish a version of "if the deliberator is strong enough they can suppress internal selection pressures".

> Think of it this way. We want to use a BRO to answer questions. We know it's very powerful, but at first, we don't have a clue as to how to answer questions with it. So we implement a Bayesian mixture-of-experts, which we call the "market". Each "trader" is a question-answering strategy: a way to use the BRO to answer questions. We give each possible strategy for using the BRO some weight. However, our "market" is itself a BRO computation. So, each trader has access to the market itself (in addition to many other computations which the BRO can access for them).

But a BRO only has oracle access to machines using smaller BROs, right? So a trader can't access the market?

(I don't think very much directly about the tree-size-limited version of HCH; normally I think of bounded versions like HCH(P) = "Humans consulting P's predictions of HCH(P)".)

> HCH(P) = "Humans consulting M's predictions of HCH(P)"

Should this be "consulting P's predictions"? If not, what are M and P?

If so, should I be thinking of P as the one obtained in the limit of HCH(P) = "Humans consulting P's predictions of HCH(P_previous)"?

I'd like to communicate a simple model of the relationship between logical induction and HCH which I've known about for some time. This is more or less a combination of ideas from Sam, Tsvi, and Scott, but I don't know that any of them endorse the full analogy as I'll state it.

## HCH as Bounded Reflective Oracles

An HCH is a human consulting the computational power of an HCH. It's very natural to model this within a reflective oracle setting, so that we can talk about computations which have oracle access to the output of any other computation. However, reflective oracles (ROs) are too powerful to capture a realistic HCH: access to an RO is tantamount to logical omniscience, since you can ask the RO the output of any computation, no matter how large.

Instead, we can think of HCH as a human with access to a bounded reflective oracle. A bounded reflective oracle (BRO) imposes two limits. First, it limits the time spent by the Turing machine, similar to how HCH requires the human to return an answer within a set time period (such as an hour), delegating any additional computation needed to answer the question to the HCH calls the human makes in that time. Second, it limits the size of the call trees that can be created by the recursive calls an oracle machine makes, similar to how some versions of HCH give the human a limited number of recursive calls, which the human must then allocate among the HCH calls they make, bounding the total size of the HCH call tree.

So, we can think of HCH and BRO as interchangeable: an HCH is just a BRO computation which starts with a simulation of a human with access to a BRO, and with the property that the human only ever makes calls to simulations of themselves-with-access-to-BRO, recursively. Similarly, a BRO-machine can be computed within an HCH if we have the human carry out the steps of the BRO-machine. Whenever the BRO machine makes a BRO call, the human in the HCH makes a corresponding HCH call, asking another human to simulate another BRO machine and report back with the results.
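To make the call-tree bound concrete, here is a toy sketch. It assumes a hypothetical "human" policy, `human_sum`, that answers "what is the sum of this list?" by splitting the list in half and delegating each half to a recursive call. Each delegated call spends one unit of a shared budget, the human divides whatever is left between the two sub-calls, and when the budget runs out the human falls back to a crude guess. None of this is a real HCH or BRO implementation; it only illustrates how a total call budget bounds the tree.

```python
from typing import List


def human_sum(xs: List[int], budget: int) -> int:
    """Toy tree-size-limited HCH: answer 'sum of xs?' using at most `budget` sub-calls.

    The splitting strategy and the fallback guess are illustrative assumptions.
    """
    if not xs:
        return 0
    if len(xs) == 1:
        # Small enough to answer directly within the per-call time limit.
        return xs[0]
    if budget < 2:
        # Not enough calls left to delegate both halves: give a crude estimate.
        return len(xs) * xs[0]
    mid = len(xs) // 2
    remaining = budget - 2  # two calls are spent on the two sub-questions
    left_budget = remaining // 2
    right_budget = remaining - left_budget
    # Each sub-call is another copy of the same human, given its share of the budget.
    return human_sum(xs[:mid], left_budget) + human_sum(xs[mid:], right_budget)


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    for b in (0, 5, 6):
        print(b, human_sum(data, b))
```

With a budget of 6 the call tree is large enough for an exact answer (10); with a budget of 5, one branch runs out of calls and the answer degrades to 9; with no budget at all, the top-level human can only guess (4).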

## BROs Learning to Think

Bounded Oracle Induction (BOI) is a variant of logical induction based on BROs. Standard logical induction constructs market prices each day by finding the equilibrium prices, where all bets are balanced by opposite bets (or every "buy" has a corresponding "sell"). BOI uses BROs to find this equilibrium, so we can think of traders as probability distributions which can be computed via oracle access to the market (just as the market is something we can compute given oracle access to the traders).
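The equilibrium step can be illustrated with a toy, single-sentence market. This is a sketch under simplifying assumptions, not the actual LI or BOI construction: each trader is modeled as a hypothetical demand function of the current price (shares bought, negative for shares sold), total demand is assumed continuous and non-increasing in price, and the clearing price, where buys and sells balance, is found by bisection. Because each demand function takes the market price as input, the traders loosely "see" the market, echoing the mutual oracle access described above.

```python
from typing import Callable, List

# Hypothetical trader model: price in [0, 1] -> shares bought (negative = sold).
Demand = Callable[[float], float]


def clearing_price(traders: List[Demand], iters: int = 60) -> float:
    """Find a price in [0, 1] where total buying and selling balance, by bisection.

    Assumes the total demand is continuous and non-increasing in the price.
    """
    def net(p: float) -> float:
        return sum(t(p) for t in traders)

    lo, hi = 0.0, 1.0
    if net(lo) <= 0:
        return lo  # nobody wants to buy even at price 0
    if net(hi) >= 0:
        return hi  # nobody wants to sell even at price 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if net(mid) > 0:
            lo = mid  # excess buying: the price must rise
        else:
            hi = mid  # excess selling: the price must fall
    return (lo + hi) / 2


def pessimist(p: float) -> float:
    return 3 * (0.2 - p)  # buys below 0.2, sells above, trading 3x as heavily


def optimist(p: float) -> float:
    return 0.8 - p  # buys below 0.8, sells above


if __name__ == "__main__":
    print(clearing_price([pessimist, optimist]))
```

For these two traders the net demand is 1.4 - 4p, so the printed price is approximately 0.35: the market settles between the two opinions, closer to the trader who bets more heavily.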

Think of it this way. We want to use a BRO to answer questions. We know it's very powerful, but at first, we don't have a clue as to how to answer questions with it. So we implement a Bayesian mixture-of-experts, which we call the "market". Each "trader" is a question-answering strategy: a way to use the BRO to answer questions. We give each possible strategy for using the BRO some weight. However, our "market" is itself a BRO computation. So, each trader has access to the market itself (in addition to many other computations which the BRO can access for them). Some traders may mostly trust the market, providing only small adjustments to the market answer. Other traders may attempt to provide their own answers entirely from scratch, without asking the market.

Obviously, our initial market won't be very informative; it's just an arbitrary collection of traders. But we can get feedback, telling us how well we did on some of the questions we tried to answer. We use the logical induction algorithm (the LIA) to update the weights of each trader. This procedure has the advantage of satisfying the logical induction criterion (LIC): the market beliefs over time will not be too exploitable.
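The actual LIA weight update is considerably more involved than anything shown here. As a simpler stand-in for the "Bayesian mixture-of-experts" picture, the sketch below does ordinary Bayesian reweighting on a binary question: each hypothetical trader reports a probability, the market's forecast is the weighted average, and after feedback each trader's weight is multiplied by the probability it assigned to what actually happened. The function names and the dictionary format are assumptions made for illustration.

```python
from typing import Dict


def market_forecast(weights: Dict[str, float], forecasts: Dict[str, float]) -> float:
    """Weighted average of trader forecasts (weights are assumed to sum to 1)."""
    return sum(weights[name] * forecasts[name] for name in weights)


def bayes_update(
    weights: Dict[str, float], forecasts: Dict[str, float], outcome: bool
) -> Dict[str, float]:
    """Reweight traders by the probability each assigned to the observed outcome.

    Plain Bayesian mixture-of-experts update, used here as a stand-in for the LIA.
    """
    unnormalized = {
        name: w * (forecasts[name] if outcome else 1.0 - forecasts[name])
        for name, w in weights.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every trader assigned probability 0 to the outcome")
    return {name: u / total for name, u in unnormalized.items()}


if __name__ == "__main__":
    weights = {"a": 0.5, "b": 0.5}
    forecasts = {"a": 0.9, "b": 0.1}
    print(market_forecast(weights, forecasts))
    weights = bayes_update(weights, forecasts, outcome=True)
    print(weights, market_forecast(weights, forecasts))
```

After one observation in favor of trader `a`, its weight rises from 0.5 to 0.9 and the market forecast moves from 0.5 to 0.82.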

So, a BOI is someone using this strategy to learn to use a BRO. Like HCH, a BOI on any one market day gives its opinion with-access-to-its-own-opinion: to answer any given question, the BOI can ask itself a series of other questions.

Unlike HCH, a BOI has a concept of learning to give better answers over time. An HCH is an amplified version of a fixed person at a fixed moment in time. It does not allow that person to learn better question-answering strategies.

In this view, an HCH is a freeze-frame of the logical-induction deliberation process. All the recursive calls of an HCH, building exponential-sized trees of cognitive labor, are together considered "one cognitive moment" in logical induction terms.

## Notions of Amplification

HCH gives us a notion of amplification assuming black-box access to an agent we want to amplify. Assuming we can steal the human question-answering strategy, HCH gives us a notion of much-better-thought-out answers to our questions. HCH does not rely on any formal notion of rationality, but assumes that the human question-answering strategy is competent in some sense, so that the amplified human which HCH gives us is highly capable.

Logical induction gives us a very different notion of amplification. The LIC is a rationality notion, telling us something about what it is to form better beliefs as we think longer. Assuming some things about the structure of an agent's beliefs, we can run the agent's thinking process forward to get an amplified version of it: what it would think if it considered every question more thoroughly. Unlike HCH, this would require much more than black-box access to the question-answering strategy, however. I don't (currently) know how to take a human and define their amplification within the BOI framework (or LIC/LIA).

It is a theorem that a logical inductor trusts its future opinions more than its present opinions. This gives a formal notion of trust in amplification. We don't get similar reassurances within an HCH framework.
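The HCH-style notion needs only black-box access to the agent being amplified, and it is usually approached by iterating: a human, a human consulting a human, and so on. As a toy sketch (the task, the `unaided` policy, and the function names are illustrative assumptions), each level answers "what is the largest number in this non-empty list?" by splitting the list and asking the previous level about each half. The unaided human answers one-element lists exactly and otherwise just guesses the first element.

```python
from typing import Callable, List

Agent = Callable[[List[int]], int]


def unaided(xs: List[int]) -> int:
    """Level 0: exact for one-element lists, otherwise a guess (the first element)."""
    return xs[0]


def amplify(assistant: Agent) -> Agent:
    """Return a human who answers by consulting `assistant` purely as a black box."""
    def human_with_assistant(xs: List[int]) -> int:
        if len(xs) <= 1:
            return unaided(xs)
        mid = len(xs) // 2
        # Delegate each half to the assistant and combine the two answers.
        return max(assistant(xs[:mid]), assistant(xs[mid:]))
    return human_with_assistant


def level(k: int) -> Agent:
    """Human consulting human consulting ... (k rounds of amplification)."""
    agent: Agent = unaided
    for _ in range(k):
        agent = amplify(agent)
    return agent


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print([level(k)(data) for k in range(5)])
```

On this eight-element list the levels answer 3, 5, 5, 9, 9: once the number of levels reaches log2 of the list length, further rounds of amplification stop changing the answer, a toy version of a fixed point.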

Furthermore, LIC gives us a notion of trust in a process. A logical inductor trusts each successive market state more. The process of logical induction doesn't have any nice termination point, but it does the best it can with its bounded resources at each step (in a certain sense provided by LIC).

HCH gives us an idealized fixed-point, rather than a notion of successive improvement. "HCH consulting HCH" is just HCH. However, we can try to climb up to that fixed-point by iterating: human-consulting-human, human-consulting-human-consulting-human, ... Certainly the hope is to get some kind of trust-in-the-process argument which works under plausible assumptions.

Each and every market day in a BOI is already at the HCH fixed-point of rationality under self-consultation. This makes direct comparisons between the two notions of amplification trickier. One interpretation is that the LIC notion is a notion of amplification for HCH fixed points: once you've bootstrapped up to HCH, how do you go further? Logical induction gives a theory of what it means to get better at that point, and if we obey its rationality notion, we get some nice guarantees about self-trust and endorsing the process of improvement.

Why do we need a notion of further-amplifying-beyond-HCH? Because being the human in HCH is hard: you don't automatically know what strategy to use to answer every question, and there's a significant sense in which you can learn to be better.

## Applications?

I don't know if this analogy between HCH and logical induction is useful for anything. It would be interesting to see a variant of IDA which didn't just approximate an HCH fixed-point, but instead somehow approximated the way a BOI learns to use BROs more effectively over time. It would be very interesting if some assumptions about the human (EG, the assumption that human deliberation eventually notices and rectifies any efficiently computable Dutch-book of the HCH) could guarantee trust properties for the combined notion of amplification, along the lines of the self-trust properties of logical induction.
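For a concrete, deliberately simple picture of an efficiently computable Dutch book, here is a sketch assuming prices are quoted for a sentence phi and for its negation, each share paying 1 if true. Exactly one of the two is true, so one share of each pays exactly 1; if the two prices don't sum to 1, a bettor can lock in a guaranteed profit. The function name and price format are illustrative assumptions, and the logical induction criterion concerns a much richer class of trading strategies over time than this single static check.

```python
from typing import Tuple


def dutch_book(price_phi: float, price_not_phi: float) -> Tuple[str, float]:
    """Return (action, guaranteed_profit) for trading one share each of phi and not-phi.

    Exactly one of phi and not-phi is true, so the pair always pays out exactly 1.
    """
    total = price_phi + price_not_phi
    if total < 1:
        return ("buy both", 1 - total)   # pay `total` now, collect 1 for sure
    if total > 1:
        return ("sell both", total - 1)  # collect `total` now, pay out 1 for sure
    return ("none", 0.0)  # coherent prices: no sure profit from this pair


if __name__ == "__main__":
    print(dutch_book(0.3, 0.5))
    print(dutch_book(0.7, 0.6))
    print(dutch_book(0.25, 0.75))
```

Prices of 0.3 and 0.5 can be exploited by buying both for a sure profit of about 0.2; prices of 0.25 and 0.75 leave nothing to exploit.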

More broadly, it would be really nice if "the logical induction notion of amplification" I've outlined here could be turned into a real notion of amplification in the sense of HCH -- taking information about a human and using it to define an amplified human. (And not just "simulate the human thinking longer".)