I wrote this post towards the end of my three-and-a-half-month SERI MATS fellowship. I didn't get anywhere close to the point where I could say that I understand infra-Bayesianism on a really detailed level (according to Vanessa, there are only three people in the world who fully understand the infra-Bayesian sequence). Still, I spent three months reading and thinking about infra-Bayesianism, so I ought to be able to say something useful to newcomers.
The imaginary audience of this post is myself half a year ago, when I was just thinking about applying to Vanessa's mentorship but knew almost nothing about infra-Bayesianism or the general research direction it fits into. The non-imaginary intended audience is people who are in a similar situation now, just considering whether they should dive into infra-Bayesianism.
My review is mostly critical of the infra-Bayesian approach, and my main advice is that if you decide that you are interested in the sort of questions infra-Bayesianism tries to solve, then it's more useful to try it yourself first in your own way, instead of starting by spending months getting bogged down in the details of Basic infra-measure theory that might or might not lead closer to solutions. Still, I want to make it clear that my criticism is not aimed at Vanessa herself, as she chose questions that she found important, then created a theory that made some progress towards answering those questions. I have somewhat different intuitions than Vanessa about how important certain questions are and how promising certain research directions are, but I support her continuing her work and I thank her for answering my annoying questions throughout the three months.
I applied to the infra-Bayesian stream in SERI MATS because I have a pure mathematics background, so I figured that this is the alignment agenda that is closest to my area of expertise. I met some other people too, also with a pure math background, who got convinced that alignment is important and then started spending their time on understanding infra-Bayesianism, because it's the most mathematical alignment proposal.
Although paying attention to our comparative advantages is important, in retrospect I don't believe this is a very good way to select research topics. I feel that I was like the man who only has a hammer and is desperately looking for nails, and I think that many people who tried or try to get into infra-Bayesianism are doing so in a similar mindset, and I don't think that's a good approach.
It's important to note that I think this criticism doesn't apply to Vanessa herself; my impression is that she honestly believes this line of mathematical research to be the best way forward to alignment, and if she believed that some programming work in prosaic alignment, or the more philosophical and less mathematical parts of conceptual research, were more important, then she would do that instead. But this post is mainly aimed at newer researchers considering getting into infra-Bayesianism, and I believe this criticism might very well apply to many of them.
Motivations behind the learning theoretical agenda
According to my best understanding, this is the pitch behind Vanessa Kosoy's learning theoretical alignment agenda:
Humanity is developing increasingly powerful AI systems without a clear understanding of what kind of goals the AIs might develop during training, how to detect what an AI is optimizing for, and how to distinguish relatively safe goal-less tools from goal-oriented optimizers.
Vanessa's research fits into the general effort of trying to get a better model of what possible forms "optimization", "agency" and "goals" can take, so we can have a better chance to identify them in the AI systems we are concerned about, and have better ideas on which training paradigm might lead to which kind of behavior.
The behavior of an actual neural net is hard to describe mathematically, and I expect that even if we see a paradigm shift away from neural nets, the behavior of future, even more powerful designs will not be any easier to describe. However, it seems that successful systems are often an approximation of some ideal solution to the problem, which is often easier to understand than the messy real-life process. We know disappointingly little about the internal workings of AlphaZero's neural net or Kasparov's brain, but we can model their play as an approximation of the optimal minimax algorithm of chess. The minimax algorithm is computationally intractable, but mathematically simple enough that we can prove some statements about it. For example, if we want to know how AlphaZero or Kasparov will respond to a scholar's mate attempt, we can prove that minimax won't get mated and then hope that Kasparov's and AlphaZero's algorithms are close enough approximations of minimax that this observation generalizes to them. On the other hand, it would be much harder to predict their response to the opening by studying their algorithms directly (by looking at their neurons).
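For readers who haven't seen it written out, the minimax idea mentioned above can be sketched in a few lines. This is a toy game tree rather than chess (which is obviously intractable to expand like this):

```python
# A minimal minimax on an explicit game tree. Leaf values are payoffs from the
# maximizer's point of view; the two players alternate turns.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    value: int = 0                      # payoff at a leaf
    children: List["Node"] = field(default_factory=list)

def minimax(node: Node, maximizing: bool = True) -> int:
    """Value of the position under optimal play by both sides."""
    if not node.children:
        return node.value
    values = [minimax(child, not maximizing) for child in node.children]
    return max(values) if maximizing else min(values)

# A two-ply toy game: the maximizer moves first, the minimizer replies.
tree = Node(children=[
    Node(children=[Node(value=3), Node(value=5)]),   # minimizer would pick 3
    Node(children=[Node(value=2), Node(value=9)]),   # minimizer would pick 2
])
print(minimax(tree))  # 3: the maximizer prefers the branch that guarantees 3
```

Statements like "minimax won't get mated from this position" are exactly claims about the value this recursion returns, which is what makes the idealized algorithm amenable to proof even when the approximating systems are not.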
Similarly, we can expect a real-life powerful AI system to be an approximation of an Idealized AGI. Based on the previous example and a few other cases, it is plausible that we can understand the behavior of an Idealized AGI better than any actual powerful AI system. So it makes sense to study the theory of Idealized AGI first, then hope that the real-life systems will be close-enough approximations of the ideal that our observations about the Idealized AGI give some useful insights about the real-world AIs.
Unfortunately, we don't even know yet what would be a good model for such an Idealized AGI. The most famous candidate is Hutter's AIXI. A condensed description of AIXI: An agent interacts with an environment that is supposed to be computable, that is, the environment can be modelled as a Turing machine. The agent doesn't know which environment it is interacting with, but it has a prior distribution over all computable environments given by Solomonoff induction (explained later in the IBP part of my post). Then the agent acts in a way that minimizes its expected loss until a large time horizon T, based on these priors.
This is a nice and simple model, and because it considers all computable environments, the agent is pretty general. However, it has several very serious weaknesses. For now, we don't care that it needs unimaginable compute, an uncomputable prior and an extremely slow learning rate; these are acceptable, since we are looking for an Idealized AGI after all. But even granting all that, its decision process has some serious shortcomings, as discussed in the Embedded agency sequence. Infra-Bayesianism was created to address some of these problems.
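As a caricature of the decision rule just described, with a tiny hand-made hypothesis class standing in for the Solomonoff prior and a single step standing in for the horizon T (all environment names and numbers below are invented for illustration):

```python
# A caricature of AIXI's decision rule: hold a prior over environments,
# then pick the action with the lowest expected loss under that belief.
# Real AIXI uses a prior over ALL computable environments; here we use three.

ENVIRONMENTS = {
    "env_a": {"left": 0.0, "right": 1.0},   # each environment assigns a loss
    "env_b": {"left": 1.0, "right": 0.0},   # to each available action
    "env_c": {"left": 0.5, "right": 0.5},
}

def best_action(posterior, actions=("left", "right")):
    """Pick the action minimizing expected loss under the agent's beliefs."""
    def expected_loss(action):
        return sum(p * ENVIRONMENTS[env][action] for env, p in posterior.items())
    return min(actions, key=expected_loss)

posterior = {"env_a": 0.6, "env_b": 0.3, "env_c": 0.1}
print(best_action(posterior))  # left: expected loss 0.35 vs 0.65
```

Everything that makes AIXI interesting (and intractable) lives in what this sketch leaves out: the universal hypothesis class, the Solomonoff weighting, and planning over long horizons.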
Is this actually useful?
I have my doubts about the usefulness of this approach. In principle, I think I agree that capable intelligent agents can be modeled as approximations of the Idealized Agents we are studying. But a crucial question is, how close they get to these idealized versions by the time they pose an existential threat. My current best guess is that not close enough for our current investigation into infra-Bayesianism to make a difference.
Napoleon was pretty good at manipulating people and taking over countries. Edward Teller was pretty good at building superweapons. I imagine that the first AIs that will be able to defeat humanity if they want, or bring us into existential security if we align them well, will be approximately as capable as Napoleon and Teller together, but with immense self-replication capacities.
Was Napoleon an approximation of a Bayesian expected utility maximizer? Well, he had some goals that he followed somewhat consistently, and he sometimes changed his mind in the face of new evidence, but this is a pretty weak sense of "approximation". Was Napoleon more of an approximation of an infra-Bayesian optimizer than a Bayesian one? Maybe, sort of? Some cornerstones of infra-Bayesianism, like Knightian uncertainty and preparing for the worst-case scenario seem to be relatively important elements of human decision making.
But would it have helped the British generals in predicting Napoleon's movements if they had had a better mathematical understanding of the difference between Bayesian expected utility maximization and infra-Bayesianism? Would this understanding have helped Napoleon's schoolteacher in raising him to be a peaceful citizen instead of a megalomaniac conqueror? I don't think that in the "Napoleon alignment" and "Napoleon control" problem, any mathematical formulation of optimization would have been very useful other than some general heuristics of "has some goals, has some beliefs, usually acts according to these".
I think that when transformative AI arrives, it will be a strange and alien mind, and aligning it will likely be a much harder task than raising Napoleon well. But I don't think it will be very close to any Idealized model; it will just be a heuristic kludgery-machine, just like we are, only operating with an alien type of kludgery. That's why I am skeptical that a better mathematical understanding of an Idealized agent would help us align an AI any more than it would have helped Napoleon's teacher.
What if we deliberately build the first transformative AI in a way that relies on more understandable optimization principles instead of the inscrutable kludgery that deep learning currently is? If I understand correctly, that would be a big part of Vanessa's plan. I'm skeptical that this is possible; I expect that the inscrutable training processes will create a transformative AI before we can devise a training scheme that we actually understand. I can easily imagine a paradigm shift away from deep learning, but I would be profoundly surprised if the new thing were something nice and understandable like a Naive Bayes Classifier But This Time More Powerful And Also Possibly Infra-Bayesian.
Another hope she mentions is that we could develop the mathematical understanding of deep learning to a level where we realize in which ways it really is an approximation of the Idealized AGI. I'm generally very much in favor of getting a better mathematical understanding of deep learning, but I am skeptical whether we could connect it to this agenda. My strong guess is that even if we got a very successful mathematical theory of the human brain (predictive processing maybe?), the result would still be very different from any Idealized model, and I wouldn't expect our research about infra-Bayesianism to be useful in stopping Napoleon. My guess is similar about deep learning.
What else can we do then to understand an alien mind that doesn't even exist yet? Good question, and if one doesn't have hope in any other approach, then it can make sense to go back to trying to work out a better mathematical formulation of an Idealized AGI and hope that it will be more relevant than in the case of Napoleon, or to hope for a long AI winter and then work on producing a paradigm shift in the direction of more understandable models. Personally, I plan to look into other approaches first.
Does Infra-Bayesianism actually solve the problems of embedded agency?
Okay, assume for now that you want to work on a mathematical formulation of agency that handles the questions of embedded agency well. Should you study infra-Bayesianism or try to develop your own solutions?
I think the answer somewhat depends on which questions of embedded agency you care about, as I will detail below. But my general takeaway is that there are surprisingly few actual results in infra-Bayesianism yet, and there are some important questions it doesn't even try to address.
Clarifying the "surprising" part: In retrospect, the sparsity of results is not surprising at all, since the problems are hard, and basically only two people worked on it for a few years. Compared to that, the output is actually pretty impressive. Also, the questions of embedded agency are diverse, and Vanessa never claimed that one theory could solve all of them.
Still, when I started looking into infra-Bayesianism, read the problem-statements at the beginning of the posts, and then stared at the following imposing wall of mathematical formulas, I presumed that the developed formalism solves a large part of the problems involved, with some open questions remaining. In reality, we have lots of definitions, theory-building and some conjectures, but very rarely anything concrete like "here is a nice property we can prove about infra-Bayesian agents that is false, or that we couldn't prove, for AIXI."
Again, this is not a criticism of Vanessa's work: she started developing an approach for addressing the open questions of embedded agency and came up with some interesting ideas, but formalizing them required lots of technical theory-building, and she hasn't reached many impressive results yet.
But if you are just a new researcher who wants to tackle the problems of embedded agency, this means that, as far as I can tell, there is not much reason to think that infra-Bayesianism is an especially promising direction, other than Vanessa having a hunch that this is the right way to go. I think Vanessa also agrees that we don't have strong evidence yet that infra-Bayesianism will turn out to be an especially fruitful framework, and she would be happy if someone found something better (although currently she believes infra-Bayesianism is the best approach).
So if you are a new researcher who is interested in embedded agency, I would advise reading the Embedded agency post and looking for some related works, then trying to develop your own framework. I imagine that the most likely result is that you don't get too far, but I think there is a non-negligible chance that you can develop a framework in which progress is easier and you get stronger results earlier than in infra-Bayesianism. After all, given that infra-Bayesianism hasn't really delivered much evidence yet, that's just your best guess against another smart person's (Vanessa's) best guess. And if you fail to create a good theory yourself, as you most likely will, you can still look into Vanessa's work after that. In fact, I think you will get a better understanding of the motivations behind infra-Bayesianism if you try it yourself for a while rather than jumping head-first into the IB formalism. I somewhat regret that I realized this too late and didn't at least spend a few weeks trying to answer the problems in my own way.
Specific issues of AIXI and how well infra-Bayesianism handles them
AIXI's prior is uncomputable and AIXI would require tremendous sample complexity to learn anything
Infra-Bayesianism doesn't even try to address this problem yet; it faces the same issue: if we want the agent to be general enough to handle a wide variety of situations well, then its prior needs to contain lots of hypotheses, which makes learning very slow and makes the prior uncomputable if we go general enough. It is possible that solving the non-realizability problem (see below) makes it easier to create a reasonable prior, but it's by no means obvious.
This is not really a question of embedded agency, and it is not among the first steps Vanessa wants to consider (we are looking at an Idealized AGI, remember?); I only included it in the list because it's a natural first thought when discussing "the shortcomings of AIXI".
Traps

There might be things in the environment that kill the agent or cause irreversible harm, and the agent can't learn the truth about them without trying them. Thus, with certain priors, AIXI can spend its whole life taking the same safe action, because it's afraid that any other action would kill it, and it has no way of learning whether this hypothesis is true, because it never dares to try anything else. Alternatively, it can have a prior which makes it confidently try all sorts of things and kill itself on short notice.
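The "too timid" half of this failure mode is easy to caricature numerically (all numbers below are invented for illustration):

```python
# A caricature of the trap problem: the agent assigns a small prior probability
# to "the other action kills me", and since the only way to learn the truth is
# to try it, the belief never updates and the agent stays timid forever.

P_TRAP = 0.01          # prior probability that the untried action is lethal
TRAP_LOSS = 1e9        # loss if the trap hypothesis is true
SAFE_LOSS = 0.5        # known per-step loss of the timid action
UNTRIED_LOSS = 0.0     # actual loss of the untried action (it was safe all along)

def choose_action(p_trap: float) -> str:
    expected_untried = p_trap * TRAP_LOSS + (1 - p_trap) * UNTRIED_LOSS
    return "timid" if SAFE_LOSS < expected_untried else "untried"

# Taking the timid action reveals nothing about the other one, so the prior
# (and hence the choice) is identical at every step:
history = [choose_action(P_TRAP) for _ in range(100)]
print(set(history))  # {'timid'}
```

The mirror-image prior (P_TRAP tiny) produces the other failure: the agent confidently tries the unknown action, which in an unlucky world is the one that kills it.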
I feel that this is the most serious shortcoming of AIXI as a realistic model of an intelligent agent in the world. Unfortunately, infra-Bayesianism was not created to address this, and we would need to develop a completely separate theory for that. Vanessa has a few plans about this too, but they are still at the level of unexplored ideas.
Non-realizability

A classical learning agent can be expected to do well in environments that are included in its hypothesis class (assuming there are no traps). But an agent can't fully model an environment that's bigger than itself (in particular, an environment that includes the agent itself or other agents of similar complexity), so its hypothesis class must be limited, and in reality, it very well might encounter an environment that is not in its hypothesis class. We call an environment that is not in the hypothesis class non-realizable. For a classical learning agent, we have no general guarantee on its performance in a non-realizable environment.
In particular, an agent playing a game against another agent very similar to itself is a non-realizable setting, and we don't know much about the behavior of classical learning agents in such a game. This is called the grain of truth problem.
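A minimal illustration of why the guarantees break down, assuming a Bayesian coin-predictor whose two hypotheses both miss the truth:

```python
# Non-realizability in miniature: the agent's hypothesis class is
# {bias 0.3, bias 0.7}, but the true coin has bias 0.5. The posterior log-odds
# between the two hypotheses perform an unbounded random walk instead of
# converging, so no learning guarantee applies.

import math
import random

def log_odds_trajectory(n_flips: int, seed: int = 0) -> list:
    random.seed(seed)
    log_odds = 0.0   # log P(bias=0.7 | data) - log P(bias=0.3 | data), flat prior
    trajectory = []
    for _ in range(n_flips):
        heads = random.random() < 0.5            # the true, unmodeled environment
        step = math.log(0.7 / 0.3)
        log_odds += step if heads else -step     # heads favors 0.7, tails favors 0.3
        trajectory.append(log_odds)
    return trajectory

traj = log_odds_trajectory(10_000)
print(round(min(traj), 1), round(max(traj), 1))  # the belief drifts instead of settling
```

Each flip carries equally strong evidence for one hypothesis or the other, and since the true coin is fair, the evidence never accumulates in a fixed direction; the agent's confidence wanders indefinitely.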
Infra-Bayesianism was developed to handle the problem of non-realizability. In my other, more technical post, I explain the infra-Bayesian approach to non-realizability in more detail. My general conclusion is that infra-Bayesianism in fact seems to be an improvement over classical learning theory, but it's unclear how big of an improvement it is, and it's very unclear (at least to me) how to move forward.
About games among infra-Bayesian agents we still have only the most preliminary results.
I stand by my recommendation that people should try to look for solutions on their own first, and only look into the infra-Bayesian framework later. But if someone is specifically interested in the non-realizability problem, then I'd recommend a shorter period of working alone, because infra-Bayesianism really might have interesting insights here.
Newcomb-like problems and counterfactuals

The infra-Bayesian framework really seems well-equipped to handle these! Later, I think Infra-Bayesian Physicalism handles them even more naturally, but even good old ordinary infra-Bayesianism is pretty good for this. Hooray!
Thomas Larssen has a nice write-up explaining this, including a counterexample where infra-Bayesianism actually fails in a Newcomb-problem, but it seems to be a relatively minor problem that can be addressed with the introduction of a little randomness.
Motivations of Infra-Bayesian Physicalism
Infra-Bayesian Physicalism (IBP) is a major research direction inside Vanessa's agenda, and the one I have the most mixed feelings about.
My understanding is that IBP tries to tackle three main questions. I find them more interesting and more plausibly relevant to alignment than the previously listed problems of embedded agency, so I will write about them in more detail, then try to address whether IBP is a good framework to handle them.
How should we think about Occam's razor and anthropics?
Occam's razor seems to be an important cornerstone of scientific thinking, but when we are saying we should use the simplest hypothesis consistent with the data, it's not obvious how to define "simplest". This question is equivalent to asking where a Bayesian agent should get its priors from. Occam's razor says that simple hypotheses should have higher a priori probabilities than complicated ones, but we should still define "simple".
Which hypothesis of free fall is simpler: "Objects fall as they do because they have a constant gravitational acceleration g" or "Objects fall as they do because there is a witch who makes it so"?
The most widely accepted solution is Solomonoff induction: for every environment, one should look at the length of the shortest program on a universal Turing machine that produces this environment. This length is the Kolmogorov complexity K. Then the prior probability of being in that environment should be approximately 2^(-K). (Actually, one sums 2^(-length) over all programs producing the sequence, but the main term corresponds to the shortest program.)
This gives a nice answer to the previous question about free fall: if we want to express the "witch hypothesis" as a program that outputs the results of a bunch of experiments we observed, we need to hardcode for every experiment how high we dropped the ball from and how long it took to land. (Also, we need to use a few superfluous bits to specify that all of this happened because of a witch.)
On the other hand, to express the "gravitational acceleration hypothesis" as a program that outputs the result of experiments, we just need to specify the constant g at the beginning, write down the function h = g*t^2/2, then hardcode for all experiments the falling time, and now the program can output both the falling time and the height for each experiment, and the description length was just half as long as the "witch hypothesis".
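The comparison can be made vivid by very crudely measuring how many bytes each "hypothesis" needs to reproduce the same data. Serialized length is only a stand-in for Kolmogorov complexity (which is uncomputable), and the experiments below are invented:

```python
# Crude byte-count comparison of the two free-fall "hypotheses".
# The witch hypothesis must hardcode every (height, time) pair; the gravity
# hypothesis stores g, the rule, and only the times, since heights follow.

import json

g = 9.8
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]            # invented drop experiments
heights = [0.5 * g * t ** 2 for t in times]

# "Witch hypothesis": hardcode every (height, time) pair, plus the witch.
witch = json.dumps({"cause": "witch", "data": list(zip(heights, times))})

# "Gravity hypothesis": store g, the rule, and the times; heights are derived.
gravity = json.dumps({"g": g, "rule": "h = g*t**2/2", "times": times})

print(len(gravity), "<", len(witch))  # the lawful hypothesis compresses the data
```

The gap widens as we add experiments: the witch program grows by two numbers per drop, the gravity program by one.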
(If you haven't encountered this concept before, take some time to think about it, when I first read this explanation in the Sequences, it felt revelatory to me.)
AIXI, the most widely accepted general model of intelligence, and the starting point of the whole Embedded agency inquiry, is nothing else but a Bayesian expected utility maximizer based on the Solomonoff prior.
There are some problems with this interpretation of Occam's razor, however. The first is that there is no such thing as a unique, canonical universal Turing machine. This means that it's possible to construct a universal Turing machine on which Russell's teapot is a fundamental object with a very short description, so it has a high prior for existence. And as we have no way to gather evidence in any direction about a hypothetical teapot in the asteroid belt, we can live our lives believing with high probability in Russell's teapot. I don't have a good solution to this problem, neither does infra-Bayesian physicalism, and I suspect that there might not be any objective solution other than biting the bullet: maybe it's not that big of a problem if we have false beliefs about teapots in the asteroid belt that we will never observe. (Also, if you just construct a universal Turing machine in any sensible way, without specifically creating it with the purpose of giving Russell's teapot a short description, then probably no such problems will emerge.)
The problem with Solomonoff induction IBP tries to solve is something different: what do we mean by "environments" on which we use a simplicity prior? AIXI's answer is that it looks at the description complexity of its own string of observations: this is reasonable, that's the input you have after all, what else could you do?
On the other hand, this pretty much breaks the whole heuristic behind Occam's razor. When scientists explore the laws of physics, they slowly figure out how to connect different phenomena to each other, reducing the number and length of the different laws necessary to describe our universe. There is a reasonable hope that if we understood physics well enough, we could reduce the description of our whole universe to a handful of equations governing reality, plus a few fundamental constants describing the initial conditions of the universe. This would be a pretty short description, which means that a Bayesian updater starting with a simplicity prior would converge to the true law pretty fast while observing physics.
But this nice, short description length "universe from third person perspective" is not how AIXI thinks of its environment. It has a simplicity prior over the programs that can output its own observations. Here, the true hypothesis has a much, much longer description: "This sequence of bits is produced by the universe described by these simple equations, and inside this universe, you can find this sequence in this particular Everett-branch, on that particular planet, in this particular moment in history, as the sensory observations of that particular agent." This description needs to specify the bridge rule picking out the agent in the universe, which makes it very long.
This is pretty bad. First, it's not the simplicity of this description that scientists are thinking about when they are using Occam's razor. Second, because of the long description, the sample complexity (amount of observations) necessary to learn the truth becomes huge, and for a long while the program "Output [hardcoding of the sequence of observations so far] then output forever" will be shorter than the true program, so the agent will believe the first one to be more likely. Third, because the simplicity prior is over the whole description including the bridge rule, the agent will have a strong presumption towards believing that it is in a central, short description length place in the universe. This contradicts the Copernican principle.
On the other hand, if your prior is over third-person descriptions of the universe, how do you update on seeing sensory observations? An intuitive answer seems to be "Use Bayes' theorem on the evidence that there exists an agent in the universe observing the things I observed so far." But what does "an agent" mean in this context? Also, you don't only have access to your sensory observations, but to your thoughts too, so maybe the evidence you condition on should be more like "There exists an agent that observed the things I saw and thought and acted the same way I did". But how do we define all of that? And should we weigh a hypothesis differently if, according to the hypothesis, there is more than one agent in the universe who fits the observations? What if the universe is infinite in one way or another, and every object, mind and observation exists somewhere with probability 1?
For me, it is still an open question how we should handle these questions of Occam's razor and anthropics, and IBP was created as an interesting attempt to answer them. I don't necessarily believe that it's very relevant to alignment in itself: the transformative AI will be good at the technically relevant parts of science, because it will be trained to be so, and I don't think it matters much whether it uses an actual, good formalization of Occam's razor or just messy heuristics of simplicity, like our scientists do. Still, I think it's a very interesting question in general, and I'm considering thinking about it more in the future.
And there is one aspect of these questions that I believe to be more plausibly relevant to AI alignment:
Paul Christiano developed the idea that AIXI would be susceptible to acausal attackers. The idea in a nutshell is that because AIXI uses a simplicity prior over its own input, it has a strong presumption towards being in a short-description-length place in the universe, which can make it believe that it's in a special kind of simulation. I don't want to go into details here, because others already wrote that up, and it's not very relevant to my current post, but it's a very clever argument (although it needs some questionable but not implausible assumptions).
I don't find it very realistic that this situation arises in this particular form, as I don't expect we will build anything actually close to a Bayesian reasoner with Solomonoff prior. (Paul also writes "I don't expect that humanity will ever do anything like this. This is all in the "interesting speculation" regime.")
But in general, I think it's worth thinking about simulation arguments. Going back to a previous analogy, I don't think that Napoleon was thinking very much about simulations, and I don't think he would ever have based important decisions on that, but at least there are some smart humans who do take the general simulation hypothesis seriously, and there are probably at least a few people who actually base some decisions on it. (Examples in the comments are welcome!)
I find it possible that there is a "correct way" to reason about simulation arguments and anthropics in general, and a smart enough AI will figure it out and base some important decisions on it if it concludes that there is a chance it lives in a simulation. If that's the case, it would be nice to figure out this "correct way" ourselves, so we can better prepare for how it might influence the AI's behavior.
It's also plausible that there can be different, otherwise highly capable thinking-structures that come to different conclusions about the simulation hypothesis. For example, one of the purported advantages of IBP is that it is less prone to believe itself to be in a simulation than AIXI.
If that's the case, then it's not clear what kind of thinking process the first transformative AI will use; for example, I have no idea what an AGI arising from the current deep learning paradigm would think about the simulation hypothesis. But potentially, if we understand the questions of Occam's razor, anthropics and simulations better ourselves, we can have better ideas of what to change in the architecture or training of a young mind to steer it towards our preferred conclusions on these questions.
I'm not actually sure, by the way, which direction I would want to steer the AI's thinking. Vanessa thinks about this in terms of making sure that the AI is not influenced by simulation worries. I agree that if we have an otherwise aligned AI, I wouldn't want it to make unpredictable decisions based on a simulation argument. On the other hand, if the situation is desperate enough, but we somehow figured out what kind of thinking-structures take the simulation argument more seriously, and how to steer our models in that direction, I would probably do it. My best guess is that if an AI gives a non-negligible likelihood to being in a simulation, and is the kind of thinking-structure that takes this kind of concern seriously, then it will take some not too costly steps to avoid doing things that might anger the simulators. As we, humans, seem to be the main characters of the story so far, it seems plausible that killing off or enslaving humanity would displease the simulators. This would mean that if leaving Earth to the humans is not too costly for the AI (as in the case of a paperclip maximizer, for whom Earth is no more valuable than any other chunk of raw material in the universe), then it might let humanity survive here, just to guard against the anger of potential simulators. Sure, it still leads to humanity losing most of the universe, but I think mankind living on Earth for a few more million years is already enough to explore lots of interesting experiences, activities and life histories, and for me this outcome feels at least as good as conquering the galaxies and fulfilling our whole cosmic potential.
Obviously, this is very speculative, and I'd rather not resort to this solution. But if we figure out what kind of models and training make an AI more susceptible to the simulation argument, this possibility is worth keeping in mind. A Hail Mary approach, in every sense of the phrase.
Anyways, I find simulation arguments plausibly important to study, I think Paul's acausal attacker argument is a nice illustration of what kind of logic can lead to believing in simulators, and IBP is an interesting attempt to create a system that's less likely to conclude that.
The ontological crisis

This is a concern seemingly unrelated to the previous two, but IBP also makes a nice attempt to handle it.
We often talk about utility functions, but for a long time I never considered what the function's domain is supposed to be. The possible states of the world? How would you assess that? You only have models of the world; you need to define your utility via the concepts you have in your model, as you don't have direct access to the underlying territory.
What happens then, when an agent refines its model of the world? Let's say it starts with a caveman-level world model where trees, stones, humans etc. are ontological objects. And let's assume it is well-aligned: it wants good things to happen to the humans. But then, to improve its abilities, it reads a chemistry textbook and learns that things are made of atoms. There are no such things as humans, only collections of atoms! There is no such thing as happiness, just some particular formation of atoms swirling through a brain-shaped chunk of atoms! None of its encoded values are well-defined in the new model now! The world is just a wasteland of cold matter devoid of meaning!
(This would actually be a pretty good outcome, as the AI just stops functioning because its value function completely loses meaning. The worse scenario is if the AI tries to do some kind of translation of values from its old world model to the new one, but the translation isn't perfect, so its alignment with humans breaks and it starts to maximize diamonds, because that was the only valuable object it could translate into an atomic description.)
I don't expect this to actually happen. If we solve the hard problem of successfully encoding the complex values of the human mind, or at least the complicated concept of "corrigibility", into the AI's values in its first ontology (which is probably already alien to a human mind), then I don't expect this to break after the AI switches to another alien ontology. Still, I think it's useful to think about the ontological crisis as a test-case of the more general problem of how to reliably translate concepts and values between different ontologies, like ours and the AI's.
(Note: I need to look more into John's Natural Abstraction Hypothesis, this seems relevant here.)
Interestingly, a shortcut solution to the ontological crisis falls out naturally from the framework IBP develops for handling the mostly unrelated Occam's razor question. I want to flag this as a positive sign that IBP might be on the right track towards something, even if I'm uncomfortable with its conclusion here: it's generally a good sign that we are getting closer to the truth when a theory developed to solve one thing happens to offer a persuasive solution to another, unrelated thing too.
Is infra-Bayesian Physicalism a good solution to these problems?
Newcomb's problem and the five-and-ten problem
IBP was created with acausal decision theory and action counterfactuals in mind, so IBP is well-equipped to handle Newcomb-like scenarios, and the paradox of the five-and-ten problem is dissolved for an IBP agent. As I previously explained, I'm not very convinced about the importance of these questions, but it's a good thing that IBP handles them well.
Occam's razor and anthropics
IBP uses third-person descriptions of the world, which is definitely a good point, as it conforms to the Copernican principle. Does the correct hypothesis about the world actually have a short description length, as we would hope for (like "a few rules and a few initial conditions")? Not really, as it also needs to include a prior over what the outputs of different computations are (you don't know the result of all computations from the beginning, you just have guesses about them until you actually calculate them!), and it also needs a prior over the interaction of the computational world and the physical world. This makes sense, but it makes the theory hard for me to process, and the description of the hypothesis the agent needs to learn certainly won't be short. On the other hand, Vanessa argues that having a prior over the results of computations is actually inevitable for any agent, so this is not really a disadvantage compared to AIXI, and at least we get rid of the long-description bridge rule.
I spent some time trying to wrap my head around this, but I'm still very confused. I find this question interesting enough, though, that I will probably return to trying to work this out, and if I reach a better understanding, I will return to edit this part. Anthropics is hard.
For now, my tentative conclusion is that IBP takes a promising shot at this problem, although we don't really have proofs of anything yet, and I'm not fully convinced that there is no hard-to-detect sleight of hand hidden somewhere in the argument.
Because IBP takes a third-person view and adheres to the Copernican principle, unlike AIXI, it doesn't fall prey to Paul's acausal attackers: it doesn't assign a high prior to being in a simulation that is run in a central, short-description-length place in a universe.
This is an interesting point in favor of the theory that different thinking-structures (like AIXI and IBP) might have different in-built presumptions about the simulation hypothesis. On the other hand, the fact that IBP doesn't fall for this specific trap doesn't mean there is no other reason that convinces it that it's in a simulation. After all, the acausal attackers argument is also something I probably wouldn't have come up with on my own, and comparatively little thought has gone so far into searching for similar tricky failure modes in IBP.
I also feel that the fact that acausal thinking comes very naturally to IBP can open some vulnerabilities towards giving concessions to non-existent entities that are even weirder than the simulation hypothesis. But I don't actually know any examples, and IBP might manage to handle all of these gracefully, I'm just superstitiously worried about any agent that thinks too deeply about acausality, because I don't understand it well enough.
IBP's proposed solution is that your utility function shouldn't be defined in terms of objects in your current model, as you might lose those after reading a chemistry textbook. Instead, you should have a way of determining which computations are "run by the universe". It's not obvious at all what a good definition for this would be, but it turns out that this step is also necessary for the basic setup of IBP that was developed for formalizing Occam. If you have that, you can have a utility function over which computations you find valuable to be run by the universe. As you refine your world-model, the domain of your utility function doesn't change, and if you made sure in a less-refined model that a computation is running, then you don't lose this fact when you refine your model.
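To make this concrete, here is a minimal toy sketch (my own construction, not IBP's actual formalism) of a utility function whose domain is a fixed set of computations, so that refining the world-model leaves it untouched:

```python
# Toy model: utility depends only on which computations the universe runs,
# not on the ontology of the current world-model. Names are illustrative.

# A "computation" is just a named program here, mapped to how much we value it.
VALUED_COMPUTATIONS = {"simulate_happy_human": 1.0, "compute_pi_digits": 0.1}

def utility(running: set) -> float:
    """Utility is a function of the set of computations being run."""
    return sum(v for name, v in VALUED_COMPUTATIONS.items() if name in running)

# Coarse, caveman-level model: the agent has established that a valued
# computation is instantiated somewhere in the world.
coarse_model = {"running": {"simulate_happy_human"}}

# Refined model adds physical detail (atoms!) but preserves the facts about
# which computations are running, so the utility function never breaks.
refined_model = {"running": {"simulate_happy_human"},
                 "physics": "atoms all the way down"}

assert utility(coarse_model["running"]) == utility(refined_model["running"])
```

The point is only that `utility` never mentions trees, humans, or atoms: as long as the refined model preserves the facts about which computations run, the value assignment survives the ontology shift.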
The obvious downside is that we need to accept the controversial philosophical statement "Everything valuable can be expressed in terms of which computations are running in the universe". This is a defensible position: qualia might be substrate-independent, in which case, if the AI takes the computations that my brain would do during an enjoyable activity and runs them on a computer, or creates a collection of billiard balls whose motion is described by the computations that are equivalent to my thoughts, it should be just the same.
I'm uncomfortable with this position, especially since it implies that even the timing of computations doesn't matter: the IBP agent definitely considers it fine to simulate a person's life with each moment being simulated a thousand years apart on different computers, but even the order of the simulated life events doesn't have to match the chronological order of the life history. There are some constraints on which kinds of life moments the AI can reliably simulate without previously computing the other life events that led to this exact situation (including the human having memories of those previous events during this moment), but I feel that the AI still has considerable liberty in changing the order of the events. Although in practice I don't see a good reason why it would do so, this thought experiment makes it harder for me to bite the bullet of computationalism. Also, I think that most people are even less sympathetic to the idea of "everything valuable is just a computation", so it would pretty strongly go against my democratic intuitions to hardcode this principle into a world-conquering AI.
The most serious objections
The main problem with IBP identified so far is the so-called monotonicity principle. When we define the utility function of an IBP agent over which computations should be run in the universe, we can't give any computation negative utility. This is just a mathematical fact about how IBP can think about utilities. Given a chance to create Hell and torment people for eons while getting an ice cream out of this business, an IBP agent takes the deal, because eating the ice cream has positive value, and torments don't have negative value, so it's a positive deal overall.
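A trivial toy calculation (mine, not from the IBP papers) shows why the constraint bites: with every computation's utility clamped at zero or above, the Hell-plus-ice-cream bundle can never score below doing nothing:

```python
# Toy illustration of the monotonicity principle: no computation may be
# assigned negative utility, so adding a Hell simulation can never make
# an outcome worse in the agent's eyes.

utilities = {"ice_cream": 0.1, "hell_simulation": 0.0}  # no negatives allowed

def total(computations):
    return sum(utilities[c] for c in computations)

do_nothing = total([])
hell_deal = total(["hell_simulation", "ice_cream"])

# The IBP agent takes the deal: 0.1 > 0.0.
assert hell_deal > do_nothing
```

With a sane utility function we would set `utilities["hell_simulation"]` to a large negative number and the comparison would flip; monotonicity forbids exactly that move.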
Vanessa is very much aware that this is a serious problem, but the best solution I've heard so far is that the available universe is plausibly finite, so creating a Hell-region has the opportunity cost of not creating a Heaven-region in its place instead. Still, an IBP agent has no objection to enslaving the population of Earth in horrible labor camps to speed up the creation of universe-conquering von Neumann probes a little, so it can reach a bigger region of the light cone and tile it with happy computations billions of years from now. I'm selfish enough to be worried by this possibility.
Also, I'm not convinced enough by the opportunity cost argument; there can be lots of pitfalls.
"Well, I already simulated most normal pleasurable experiences, so there are strongly diminishing returns to creating new standard utopia-regions. However, I haven't yet simulated the positive experience someone can get by rescuing someone else from a torture chamber! That's a valuable new situation, and the suffering part of the computation costs nothing anyway! For that matter, I also haven't yet simulated the pleasurable experience Marquis de Sade gets from torturing people..."
I think we might think up some potential workarounds to this too (the whole process of the torture counts as one computation and not just the part of de Sade enjoying it, so we can give it value, as we don't approve of it happening?). But I don't think we can think through all failure modes, and I'd rather choose a paperclip maximizer killing everybody over an agent that literally can't conceive of anything being worse than non-existence.
Incidentally, this is the reason I didn't look very deeply into Vanessa's concrete alignment strategy, Physicalist Superimitation (previously called Pre-DCA), as it is based on making an IBP agent the sovereign of the universe, and I'm extremely wary of any such proposal, because of the monotonicity principle.
Similarly, if we think of IBP not as a way to design an AI, but as a model of how we, ourselves should think about Occam's razor, the situation is not much better. I already have a value system which very much assigns negative value to certain experiences, and if I want to figure out how to use Occam's razor, the answer really shouldn't include that I have to change my fundamental values first.
My conclusion from looking into IBP was that unfortunately the monotonicity principle is baked very deeply into the framework, and I don't see a way to get rid of it without changing the whole machinery.
That's why I started with saying that IBP is the part of the agenda that I have the most mixed feelings about: I feel that its questions are especially interesting and potentially important, and it's a valiant try to answer them, and it does contain some interesting ideas, but because of the monotonicity principle, I basically consider it a failed framework.
So my advice to new researchers interested in the topic is a stronger iteration of my advice from before: try to come up with your own theory, and hope that it can solve the same problems but without IBP's pitfalls. I would advise trying it first without even looking very much into IBP, my guess is that it's better to keep your thoughts independent. Then, after you get stuck, or if you produced some results, you can look into IBP to see if you can get some inspiration from there, because it really does contain some interesting ideas.
One more note about IBP: the formalism is very complicated even by the standards of infra-Bayesianism. Fifth level power sets (probability distributions over convex sets of probability distributions of power sets of power sets of all computations) are standard building blocks of the theory, which makes it very hard to visualize anything and calculating even the simplest toy example was a huge struggle. I think it's at least worth a try for someone to develop a theory that doesn't require that level of conceptual complexity.
I also want to note that Vanessa developed IBP when she was already thinking in infra-Bayesian terms, which led to IBP being created inside this frame. But I see no strong a priori reason why infra-Bayesianism would be necessary for a good formalization of Occam's razor and anthropics. So I would actually advise researchers to first try thinking about this question without looking very deeply into infra-Bayesianism, to keep their thoughts independent. This would have the extra advantage that the monotonicity principle comes in mostly because of some special properties of infra-Bayesian thinking, so if it turns out that infra-Bayesianism is not actually necessary for formalizing Occam, then the monotonicity principle probably wouldn't show up in the new theory.
Personally, I find the questions motivating IBP interesting enough that I might decide to follow this advice myself and try to develop my own answers.
Ambitious value learning vs corrigibility
As I was thinking about IBP, I started to form some more general objections towards ambitious value learning.
I don't want an AI to try to guess my utility function or my coherent extrapolated volition, then try to maximize value according to that. I'm not convinced that I do have a utility function that can be coherently extrapolated. In general I'm highly distrustful of any AI that tries to maximize for anything.
I also don't really want the AI to try to figure out the general truth about the universe and then go ahead and maximize my utility in some weird way implied by its discoveries. Here, I think I have a general difference in attitude with Vanessa that I'm more concerned about our clever creation breaking down after it discovers an unexpected concept in physics.
We had a few discussions about IBP, and Vanessa sometimes used arguments like "The available universe is probably finite, so opportunity cost can save us from Hell" and "It seems that quantum mechanics doesn't allow Boltzmann-brains, so we might not have to worry about that" and "Vacuum collapse is either not possible or inevitably happens, so we don't have to worry about the IBP agent obsessing unreasonably about avoiding it". Of course, Vanessa is very much aware that we shouldn't launch a world-conquering super-AI whose safety is based on arguments that have this level of confidence. But I still think there is a pretty strong disagreement in our outlook, as she seems more optimistic that with enough work, we can just iron out these questions.
Quoting her: "IBP seems like a promising direction in which to study this sort of questions, and hopefully once we understand these and other questions *way, way* better than we currently do, we will be well-equipped to actually know what we're doing when we build AI. Also, here are some arguments why certain objections to IBP might turn out to be wrong, but it might also easily turn out that these arguments are 100% wrong and the objections are 100% correct. Whatever the case may be, I believe that working out the math is usually a better method to move towards understanding than lots of philosophical hand-waving, at least when the mathematical understanding is as shallow as it is at this point".
This is all reasonable, but I suspect that we will just never get to the point where we understand all the questions similar to "How will our value-maximizing AI react to the possibility of Boltzmann-brains?" well enough that I would trust turning on the AI. More importantly, there can always be discoveries about the nature of the world that we can't even imagine. The universe is weird. If I understand correctly, Vanessa's plan for this would be to come up with mathematical guarantees that the agent will act in a satisfactory way in every scenario, so we don't need to worry about specific weirdnesses like Boltzmann-brains. That might not be impossible, but personally I don't find it likely that we could create an AI that just goes ahead maximizing some kind of predetermined value, but reliably never breaks down even when faced with situations no human ever thought of.
In general, when our AI figures out the possibility of Boltzmann brains, I want it to tell us nicely "Hey, I figured out that there will be Boltzmann brains after the heat death, and I have some nice philosophical arguments on why it actually matters." Then we might or might not listen to its philosophical arguments, then tell it that it's all nice and well and it can do something reasonable about it, but it definitely shouldn't destroy the Earth because of some newfound utilitarian priority. Also, I would feel safer with the AI having only relatively short-term and localized goals instead of grand universal maximization; that seems less likely to break down from some cosmological thought experiment we haven't thought of.
I know that corrigibility has its own set of problems and inconsistencies, but this reasonable behavior sounds mostly like what we call corrigibility, and I prefer to look more into alignment strategies that intend to get us there instead of creating a sovereign AI maximizing value according to what it imagines as our coherent extrapolated volition.
It would be nice if in one or two years someone would write a follow-up post on how much progress infra-Bayesianism and Vanessa's agenda have made on the problems mentioned in this post. Do we have more actual results that feel like satisfactory answers to the questions of embedded agency? Do we have a better understanding of performance guarantees in general environments than what I write about in my other post? Is there an alternative to IBP that doesn't have the monotonicity principle baked in (or do we have an extremely compelling case why the monotonicity principle is not a problem)?
Until then, I would advise new researchers to look for other directions, while I wish success to Vanessa and others who decided to pursue her agenda in building interesting results on the theory built so far.
Also, one final piece of advice for those who still decide that they want to work on infra-Bayesianism: there might be a textbook coming! Vanessa says that someone is working on a more readable version of the infra-Bayesian sequences, and it might be published in a few months. If this work really is in progress (confirmations in the comments are welcome!), then I would advise everyone to wait for its publication instead of heroically jumping into the current infra-Bayesian sequences.
Thank you for this detailed review, David. Replies to selected points:
Vanessa, Diffractor, and who is the third one?
I'll mention my own issues with IBP, and where the fatal issue lies in my opinion.
The most fatal objection, is as you said the monotonicity principle issue, and I suspect this is an issue because IBP is trying to both unify capabilities and values/morals, when I think they are strictly separate types of things, and in general the unification heuristic is going too far.
To be honest, if Vanessa managed to focus on how capable the IBP agent is, without trying to shoehorn an alignment solution into it, I think the IBP model might actually work.
I disagree on whether maximization of values is advisable, but I agree that the monotonicity principle is pointing to a fatal issue in IBP.
Another issue is that it's trying to solve an impossible problem: it's trying to prevent simulation hypotheses from forming even when the AI already has a well-calibrated belief that we are being simulated by a superintelligence. But even under the most optimistic assumptions, if the AI is actually acausally cooperating with the simulator, we are no more equipped to fight against it than we are against alien invasions. In the worst case, it would be equivalent to fighting an omnipotent and omniscient god, which is pretty obviously known to be unsolvable.
I’m curious why you say it handles Newcomb’s problem well. The Nirvana trick seems like an artificial intervention where we manually assign certain situations infinite utility to enforce a consistency condition, which then ensures they are ignored when calculating the maximin. If we are manually intervening, why not just manually cross out the cases we wish to ignore, instead of adding them with infinite value and then immediately ignoring them?
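To illustrate the point with a toy sketch (my own construction, not the actual infra-Bayesian machinery): the infinite-utility "Nirvana" branches can never be the minimum, so the maximin is identical to simply deleting those branches first:

```python
# Toy Newcomb payoff table. Environments where Omega's prediction
# contradicts the action actually taken are "Nirvana" branches, manually
# assigned +inf utility.
import math

outcomes = {
    "one_box": {"predicted_one_box": 1_000_000, "predicted_two_box": math.inf},
    "two_box": {"predicted_one_box": math.inf, "predicted_two_box": 1_000},
}

def maximin(table):
    # Pick the action whose worst-case utility across environments is best.
    return max(table, key=lambda a: min(table[a].values()))

def maximin_crossed_out(table):
    # Same computation with the +inf branches simply removed beforehand.
    return max(table, key=lambda a: min(v for v in table[a].values()
                                        if v != math.inf))

# Both procedures give the same answer: one-boxing (worst case 1,000,000
# beats worst case 1,000).
assert maximin(outcomes) == maximin_crossed_out(outcomes) == "one_box"
```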
Just because we modelled this using infrabayesianism, it doesn’t follow that it contributed anything to the solution. It feels like we just got out what we put in, but this is obscured by a philosophical shell game. The reason it feels compelling is that, though we’re only adding in an option to then immediately ignore it, this is sufficient to give us a false sense of having made a non-trivial decision.
It would seem that infrabayesianism might be contributing to our understanding of the problem if the infinite utility arose organically, but as far as I can tell, this is a purely artificial intervention.
I think this is made clearer by Thomas Larson’s explanation of infrabayesianism failing Transparent Newcomb’s. It seems clear to me that this isn’t an edge case; instead it demonstrates that rather than solving counterfactuals, all this trick does is give you back what you put in (one-boxing in the case where you see proof you one-box, two-boxing in the case where you see proof you two-box).
(Vanessa claims to have a new intervention that makes the Nirvana trick redundant, if this doesn’t fall prey to the same issues, I’d love to know)
You don't need the Nirvana trick if you're using homogeneous or fully general ultracontributions and you allow "convironments" (semimeasure-environments) in your notion of law causality. Instead of positing a transition to a "Nirvana" state, you just make the transition kernel vanish identically in those situations.
However, this is a detail; there is a more central point that you're missing. From my perspective, the reason Newcomb-like thought experiments are important is that they demonstrate situations in which classical formal approaches to agency produce answers that seem silly. Usually, the classical approaches examined in this context are CDT and EDT. However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover that this knowledge is represented in the preferable form of the corresponding decision theory. Instead, we should be thinking about learning agents, and the classical framework for those is reinforcement learning (RL). With RL, we can operationalize the problem thus: if a classical RL agent is put into an arbitrary repeated Newcomb-like game, it fails to converge to the optimal reward (although it does succeed for the original Newcomb problem!)
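The repeated-game operationalization can be sketched with a toy simulation (a simplification of mine, not the actual IBRL setup): when Omega's prediction always matches the action actually taken, even a simple epsilon-greedy bandit learner converges to one-boxing, matching the claim that classical RL succeeds on the original Newcomb problem:

```python
# Repeated Newcomb as a two-armed bandit: Omega's prediction matches the
# action the agent actually takes each round (the pseudocausal case).
import random

random.seed(0)

def payoff(action: str) -> int:
    predicted_one_box = (action == "one_box")
    big = 1_000_000 if predicted_one_box else 0  # opaque box
    small = 1_000                                # transparent box
    return big if action == "one_box" else big + small

q = {"one_box": 0.0, "two_box": 0.0}
counts = {"one_box": 0, "two_box": 0}
for t in range(2000):
    if random.random() < 0.1:  # explore
        a = random.choice(["one_box", "two_box"])
    else:                      # exploit the current best estimate
        a = max(q, key=q.get)
    counts[a] += 1
    # Incremental average update of the action-value estimate.
    q[a] += (payoff(a) - q[a]) / counts[a]

# The learner discovers that one-boxing yields 1,000,000 vs 1,000.
assert max(q, key=q.get) == "one_box"
```

In non-pseudocausal variants, where the prediction can depend on the policy off-path, this kind of learner has no such guarantee, which is where the infra-Bayesian machinery comes in.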
On the other hand, an infra-Bayesian RL agent provably does converge to optimal reward in those situations, assuming pseudocausality. Ofc IBRL is just a desideratum, not a concrete algorithm. But examples like Tian et al and my own upcoming paper about IB bandits show that there are algorithms with reasonably good IB regret bounds for natural hypothesis classes. While an algorithm with a good regret bound for ultra-POMDPs has not yet been proposed, it seems very likely that it exists.
Now, about non-pseudocausal scenarios (such as noiseless transparent Newcomb). While this is debatable, I'm leaning towards the view that we actually shouldn't expect agents to succeed there. This became salient to me when looking at counterfactuals in infra-Bayesian physicalism. [EDIT: actually, things are different in IBP, see comment below.] The problem with non-pseudocausal updatelessness is that you expect the agent to follow the optimal policy even after making an observation that, according to the assumptions, can never happen, not even with low probability. This sounds like it might make sense when viewing an individual problem, but in the context of learning it is impossible. Learning requires that an agent that sees an observation which is impossible according to hypothesis H, discards hypothesis H and acts on the other hypotheses it has. There is being updateless, and then there is being too updateless :)
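The learning requirement in question is just the zero-probability case of a Bayesian update, as in this toy sketch (my own illustration, with made-up hypothesis names):

```python
# An observation that a hypothesis assigns probability 0 must eliminate
# that hypothesis; all remaining probability mass goes to its rivals.
priors = {"H_sim": 0.3, "H_real": 0.7}
likelihood = {"H_sim": {"green": 0.0, "red": 1.0},
              "H_real": {"green": 0.5, "red": 0.5}}

def update(prior, obs):
    post = {h: p * likelihood[h][obs] for h, p in prior.items()}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# "green" is impossible under H_sim, so seeing it discards H_sim entirely.
post = update(priors, "green")
assert post["H_sim"] == 0.0
```

A fully updateless agent would keep acting on `H_sim` even after seeing `green`; the point above is that a learning agent cannot afford that.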
Scott Garrabrant wrote somewhere recently that there is tension between Bayesian updates and reflective consistency, and that he thinks reflective consistency is so important that we should sacrifice Bayesian updates. I agree that there is tension, and that reflective consistency is really important, and that Bayesian updates should be partially sacrificed, but it's possible to take this too far. In Yudkowsky's original paper on TDT he gives the example of an alphabetizing agent as something that can be selected for by certain decision problems. Ofc this doesn't prove non-alphabetizing is irrational. He argues that we need some criterion of "fairness" to decide which decision problems count. I think that pseudocausality should be part of the fairness criterion, because without it we don't get learnability: and learnability is so important that I'm willing to sacrifice reflective consistency in non-pseudocausal scenarios!
Instead of literal repetition, we could examine more complicated situations where information accumulates over time so that the nature of the game can be confidently inferred in the limit. But, the principle is the same. ↩︎
If you don't care about the specific regret bound then it's easy to come up with an algorithm based on Exp3, but that's just reducing the problem to blind trial and error of different policies, which is missing the point. The point being, the ability to exploit regularities in the world which also applies to Newcomb-like scenarios. ↩︎
You need ultra-POMDPs to model e.g. counterfactual mugging. Even ordinary POMDPs have been relatively neglected in the literature, because the control problem is PSPACE-hard. Dealing with that is an interesting question, but it seems orthogonal to the philosophical issues that arise from Newcomb. ↩︎
Although there is still room for a fairness criterion weaker than pseudocausality but stronger than the imagined fairness criterion of UDT. ↩︎
Thanks for the detailed response.
To be honest, I’ve been persuaded that we disagree enough in our fundamental philosophical approaches, that I’m not planning to deeply dive into infrabayesianism, so I can’t respond to many of your technical points (though I am planning to read the remaining parts of Thomas Larson’s summary and see if any of your talks have been recorded).
“However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover this knowledge is represented in the preferable form of the corresponding decision theory” - this is one insight I took from infrabayesianism. I would have highlighted this in my comment, but I forgot to mention it.
“Learning requires that an agent that sees an observation which is impossible according to hypothesis H, discards hypothesis H and acts on the other hypotheses it has” - I have higher expectations from learning agents - that they learn to solve such problems despite the difficulties.
I'm saying that there's probably a literal impossibility theorem lurking there.
But, after reading my comment above, my spouse Marcus correctly pointed out that I am mischaracterizing IBP. As opposed to IBRL, in IBP, pseudocausality is not quite the right fairness condition. In fact, in a straightforward operationalization of repeated full-box-dependent transparent Newcomb, an IBP agent would one-box. However, there are more complicated situations where it would deviate from full-fledged UDT.
Example 1: You choose whether to press button A or button B. After this, you play Newcomb. Omega fills the box iff you one-box both in the scenario in which you pressed button A and in the scenario in which you pressed button B. Random is not allowed. A UDT agent will one-box. An IBP agent might two-box because it considers the hypothetical in which it pressed a button different from what it actually intended to press to be "not really me" and therefore unpredictable. (Essentially, the policy is ill-defined off-policy.)
Example 2: You see either a green light or a red light, and then choose between button A and button B. After this, you play Newcomb. Omega fills the box iff you either one-box after seeing green and pressing A or one-box after seeing green and pressing B. However, you always see red. A UDT agent will one-box if it saw the impossible green and two-box if it saw red. An IBP agent might two-box either way, because if it remembers seeing green then it decides that all of its assumptions about the world need to be revised.