# 14

Personal Blog

Can a computable human beat a Solomonoff hyperintelligence at making predictions about an incoming sequence of bits? If the sequence is computable, he probably can't. I'm interested in what happens when the sequence can be uncomputable. The answer depends on what you mean by "beat".

Game 1: on each round you and Omega state your probabilities that the next bit will be 1. The logarithm of the probability you assigned to the actual outcome gets added to your score. (This setup is designed to incentivize players to report their true beliefs, see Eliezer's technical explanation.)

Game 2: you both start with a given sum of money. On each round you're allowed to bet some of it on 0 or 1 at 1:1 odds. You cannot go below zero. (This is the "martingale game", for motivation see the section on "constructive martingales" in the Wikipedia article on Martin-Löf randomness.)

Game 3: on each round you call out 0 or 1 for the next bit. If you guess right, you win 1 dollar, otherwise you lose 1 dollar. Going below zero is allowed. (This simple game was suggested by Wei Dai in this thread on one-logic.)

As it turns out, in game 1 you cannot beat Omega by more than an additive constant, even if the input sequence is uncomputable and you know its definition. (I have linked before to Shane Legg's text that can help you rederive this result.) Game 2 is a reformulation of game 1 in disguise, and you cannot beat Omega by more than a multiplicative constant. In game 3 you can beat Omega. More precisely, you can sometimes stay afloat while Omega sinks below zero at a linear rate.

Here's how. First let's set the input sequence to be very malevolent toward Omega: it will always say the reverse of what Omega is projected to say based on the previous bits. As for the human, all he has to do is either always say 0 or always say 1. Intuitively it seems likely that at least one of those strategies will stay afloat, because whenever one of them sinks, the other rises.

So is Solomonoff induction really the shining jewel at the end of all science and progress, or does that depend on the payoff setup? It's not clear to me whether our own universe is computable. In the thread linked above Eliezer argued that we should be trying to approximate Solomonoff inference anyway:

If you're dealing with non-exceptional situations - non-devilish environments - then shouldn't a proof of epistemic error-boundedness generally carry over to a proof of decision error-boundedness?  In other words, are you sure you're not assuming that the environment is a superintelligent adversary which is strictly more superintelligent than you, which is the sort of reasoning that leads people to adopt randomized algorithms?

Eliezer's argument sounds convincing, but to actually work it must rely on some prior over all of math, including uncomputable universes, to justify the rarity of such "devilish" or "adversarial" situations. I don't know of any prior over all of math, and Wei's restating of Berry's paradox (also linked above) seems to show that inventing such a prior is futile. So we seem to lack any formal justification for adopting the Solomonoff distribution in reasoning about physics etc., unless I'm being stupid and missing something really obvious again.

(posting this to discussion because I'm no longer convinced that my mathy posts belong in the toplevel)

# 14

New Comment

It occurs to me that an important question is, if it turns out that our universe actually contains a halting oracle (or even more powerful oracles), can an AI programmed with the Solomonoff prior exploit it properly (i.e., do no worse than a human) to accomplish its goals? This question is not really captured by any of the three games, since they all force the player to make guesses without being able to interact with the universe.

It seems hard to think of a game that does capture this question. Can anyone think of such a game?

It seems that you could design such games by taking any of the games we are discussing and modifying it so the agent is allowed to ask an oracle questions and get answers before making its decision. Predicting the results of such games seems harder.

Actually, I think a human can easily win such games. Consider game 1 with Chaitin's Ω as the string to be predicted. A human can use the halting oracle to compute Ω exactly, whereas the Solomonoff prior treats it like a random string and would converge to answering .5 for the probability of the next bit being 1.

But this isn't completely convincing (for the case that Solomonoff isn't sufficient) because perhaps an AI programmed with the Solomonoff prior can do better (or the human worse) if the halting oracle is a natural part of the environment, instead of something they are given special access to.

whereas the Solomonoff prior treats it like a random string and would converge to answering .5 for the probability of the next bit being 1.

Isn't something like "If you have access to a halting oracle, ask it for Chaitin's Ω" among the computable generators included in the Solomonoff prior? If so, the Solomonoff agent should converge to the correct sequence.

No, at least not in the usual formulation of the Solomonoff prior that I'm familiar with.

An idea from discussion with cousin_it:

What if we let AIXI play game 1, and allow probabilities to be written in some specific non-computable notation, such as a programming language that allows calls to a halting oracle. Does AIXI win this game?

How does AIXI play regular game 1? Finite binary notation isn't enough to specify the uncomputable probability values it wants to output.

If we handwave this difficulty away and say that the AI's output may include infinite sums etc., then there's a simple agent that avoids losing your game by more than a constant: the mixture of all computable consistent mixtures expressible in your notation. Proof of optimality is unchanged.

On the other hand, if only finite notation is allowed, then it's not clear what AIXI is supposed to output at each step. The AIXI paper would advise us to construct the optimal agent tailor-made for your game, using a universal prior over computable input sequences. I have no idea how such an agent would fare on uncomputable input.

First let's set the input sequence to be very malevolent toward Omega: it will always say the reverse of what Omega is projected to say based on the previous bits.

I'm not sure I buy this, but let me make sure I understand correctly. There is some super-super-intelligence generating bits, which is smarter than Omega and also antagonistic to it. That intelligence basically simulates Omega, figures out what Omega will guess, and sends the opposite value.

First, can't Omega just figure out that it's being simulated, and then find a source of pure randomness, and just guess according to that?

Second, can't Omega at least make sure it remains level with the pitiful human, by figuring out what the human will guess and copying that?

Third, how do you know there even exists a stream of bits that will make Omega lose constantly? It's not at all obvious to me that such strings exist. It seems like you're asserting the existence of N-bit strings that have Kolmogorov complexity substantially larger than N.

My Omega uses Solomonoff induction. Its beliefs about the next bit are defined mathematically and depend only on the previous bits.

I'm suspicious of this. My understanding is that true Solomonoff induction is incomputable (because it requires ordering by Kolomogorov complexity.). Thus, you can't just create an algorithm to predict its next move.

edit: That is, just because it is "defined mathematically" doesn't mean we can predict its next move.

The input sequence doesn't have to be computable, it only has to exist.

From what I understand, Omega is defined simply as a function F : L -> {0,1}, where L is the set of all strings over {0,1}, and F((b_1,...,b_n)) is interpreted as its prediction of what the bit b_{n+1} will be given the initial bits b_1,...,b_n. Then you can always define the "malevolent" bit sequence simply as b_{n+1} = ~F((b_1,...,b_n)). F can be wildly uncomputable, of course, so it makes no sense to ask about the Kolmogorov complexity of these bit sequences.

The generalization where Omega also assigns probabilities rather than just making binary guesses seems obvious.

it makes no sense to ask about the Kolmogorov complexity of these bit sequences.

The bit sequences are finite length at each step, so they do have a Kolmogorov complexity.

Game 3 with a bit stream that is maximally malevolent to epistemology X seems to be a fully general counterargument against epistemology X.

Yeah, it's a fully general counterargument against epistemologies that claim to be able to win all games.

But even fully general counterarguments are sometimes right.

True, though I think it is important to note the difference between an argument that a particular decision theory is doing something stupid, and an argument that no decision theory can do better than every other decision theory in every environment.

Sorry for editing my comment.

The point of my post was that we seem to have no justification for using Solomonoff induction if the universe can be uncomputable. Or rather that Eliezer has put forward a justification, but I don't think it's valid. Why should game 1 be a strong argument in favor of Solomonoff, if game 3 isn't allowed to be a strong argument against?

Why should game 1 be a strong argument in favor of Solomonoff, if game 3 isn't allowed to be a strong argument against?

Game 3 is also an argument against all competing epistemologies, but game 1 is an argument in favor of only Solomonoff induction.

It seems easy to invent games that favor specific epistemologies. Or do you think game 1 means something more, that it's a "typical" game in some sense, while game 3 isn't? I'd love to see a result like that.

How about if we restrict attention to games where at any stage the players are allowed to choose a probability distribution over the set of available moves, rather than being forced to choose one move? Is it then possible for Solomonoff induction to lose (whatever 'lose' means) with non-zero probability, in the limit?

In other words, does Solomonoff induction win all variants of game 1 that use different proper scoring rules in place of log score? Nice question. I'm going to sleep, will try to solve it tomorrow unless someone else does it first.

Game 1 is designed to test epistemologies as epistemologies, without considering any decision theories that use that epistemology. Figuring out a good decision theory is harder. What game 3 shows is that even starting with an ideal epistemology, you can't build a decision theory that outperforms all others in all environments.

Some approaches may converge quickly to guessing 50-50 against malevolent sequences, and so can't be beaten by more than a constant in this way.

In game 3, you can't guess 50-50, you have to choose one or the other.

Suppose we are playing any game in which we repeatedly choose an option from some potentially infinite class and then receive a bounded payoff (dependent only on my choice in this round). Then I can do as well as the best computable strategy in expectation (to within a constant) by using multiplicative weights over the space of computable strategies (I've mentioned this before, but didn't realize the difference between these set-ups at the time). Note that this strategy may be randomized, if you only allow me to bet all-or-nothing (or more generally if the space of strategies isn't convex in the suitable sense).

This lets you win probabilistically at game 3, but its not immediately clear whether it lets you win at games 1 and 2, since the payoff is not bounded. I suspect it wins at games 1 and 2, but not very strongly.

Edit: Why should we restrict attention to deterministic strategies? It is immediately clear that no deterministic strategy can outperform all other deterministic strategies, but it seems like compelling evidence in Solomonoff induction's favor if it outperforms any randomized computable strategy.

Five minutes' thought allowed me to prove the following stupid "theorem":

Consider any game (haha). The only restriction is that the game must never make total wins go below zero, as in my game 2. Then there's a general-purpose winning agent: choose one strategy at the outset, sampled from the space of all computable strategies according to some distribution, and then follow it for all eternity. Obviously, this agent's expected accumulated utilities at all times cannot be worse than any individual strategy by more than a multiplicative constant, which is equal to that strategy's weight in the initial distribution.

Perhaps this result is easy in retrospect. Now I'd like to know what happens if utility can become negative (taking the exponent doesn't seem to work), and also how to improve the agent because it looks kinda stupid (even though it solves game 2 about as well as Solomonoff does). Sorry if this all sounds obvious, I've only been studying the topic for several days.

Sorry for the many replies, your comment continues to intrigue me.

Game 2 is different from games 1 and 3 because the available options/payoffs in game 2 also depend on your current utility or previous betting history - you cannot bet more than you've won so far. In general, if such things are allowed, there's no predictor that can win all games up to an additive or multiplicative constant, even when payoffs are bounded. Here's a game that shows why: on round 1 the player chooses x=1 or x=-1, then the universe chooses y=1 or y=-1, then the game effectively ends and the player receives utility x*y on every round thereafter. Whatever mixture of x=1 and x=-1 is chosen by your proposed predictor on the first round, the universe can choose y so as to make you go into the negatives or 0 in expectation, yet there exists a computable human that receives linearly increasing utility in that universe.

Note that my reasoning in the previous comment solves game 2 just fine, but it doesn't work for game 1 or game 3 because they can go into negative utilities. Maybe we're facing different classes of game that call for different solutions? Right now it seems we have a general theorem for game 1 (log score - regular Solomonoff induction), another general theorem for games of type 2 (positive utilities - my other comment), and a third general theorem for games of type 3 (bounded payoffs - your claimed result). We ought to unify some of them.

Good idea. So you're claiming that some sort of "randomized Solomonoff induction" can win all games, including uncomputable ones, that have bounded payoffs per step (and possibly all other games too) against all computable agents who are also allowed to randomize? If true, this looks like a new and important result. We need to pin it down formally.

To recap my position on this, intuitively it seems obvious that we don't want to assign a zero probability to the universe being uncomputable, but there are purportedly arguments showing that even if the universe is uncomputable, an AI programmed with the Solomonoff prior will do nearly as well as any human. I think there are valid counterarguments to all such arguments so far, so it seems reasonable to stick with the intuitive position until we have better arguments one way or another.

Regarding my version of Berry's Paradox, I really don't understand what it shows myself. But it seems clearly related to the fact that mathematical truth cannot be defined using any formal language. There's a cluster of related problems that are all perhaps waiting for some single major insight.

Yes, this is my position as well.

Mathematical truth is something of a mystery, I wrote a post about that sometime ago. I remember you liked it. There hasn't been any progress since then so I'll just leave the link here.

Game 3 leads to a question: Is there some way we can meaningfully define a sequence as non-hostile?

When people analyze algorithms, they generally don't speak of "hostile" inputs that give you license to lay down and die. Instead they speak of average-case and worst-case performance. If you can't show that your favorite agent (e.g. Solomonoff) wins all games, the next best thing is to show that it outperforms other agents on weighted average across all games. But that would require us to invent and justify some prior over all possible input sequences including uncomputable ones, which is difficult and might even be impossible, as I pointed out in the post.

Game 1: on each round you and Omega state your probabilities that the next bit will be 1. The logarithm of the probability you assigned to the actual outcome gets added to your score.
[...]
As it turns out, in game 1 you cannot beat Omega by more than an additive constant, even if the input sequence is uncomputable and you know its definition.

I don't get this. A trivial algorithm always assigning p(0) = p(1) = 1/2 will have its score dropping by |log(1/2)| for each bit. At the same time, with the bit sequence maximally hostile to Omega, where b_{n+1} = ~OmegaPrediction((b_1...b_n)), Omega's score will always drop by at least |log(1/2)| and typically more, sometimes far more (whenever it hyper-computes any probabilities very far from 1/2).

Unless I'm missing something (as I probably am), how can the difference between the score of the trivial algorithm and Omega remain limited to "an additive constant" in any sense of the term?

The drop in Omega's score on each step will approach |log(1/2)| fast enough that the two sums won't differ by more than a constant.

That seems to be equivalent to saying that Omega's probability estimates will approach fifty-fifty fast enough. So if I understand correctly, it's a provable property of Solomonoff induction that faced with a "maximally hostile" bit-string, its probability estimates will converge to fifty-fifty fast enough that it can't blunder worse than a random guesser by more than a constant term in the log(P) game? If true, I find that quite fascinating.

Let's regard Omega's prior as being given by M(x) as shown here. Now let's divide our monotone UTM's programs into the following classes.

1. Ones that just say "Print the following: ... "
2. Every other program.

You can imagine Omega as a Bayesian reasoner trying to decide between the two hypotheses "the data was generated by a program in class 1" and "the data was generated by a program in class 2". Omega's prior will give each of these two hypotheses a non-zero probability.

To "cut to the chase", what happens is that the "extra damage" to the score caused by "class 2" falls off quickly enough, relative to the current posterior probability of "class 2", that the extra loss of score has to be finite.

I see! That's a very good intuitive explanation, thanks for writing it down.

Yes, it's a provable property of Solomonoff induction. It was a surprise to me too when I worked it out some days ago (using Shane Legg's paper linked from the post), but users AlephNeil and paulfchristiano didn't seem surprised, and Eliezer always considered it obvious I guess.

But then it seems to me that Game 3 is unfair, in that it artificially amplifies infinitesimal errors. Faced with a hostile input, a Solomonoff predictor will correctly converge to saying "I don't know," i.e. producing p(0) ~ p(1) ~ 1/2. However, a game that forces it to nevertheless make binary predictions in this situation will force an answer based on the infinitesimal differences between the calculated probabilities and 1/2, and moreover, the correct answers are contrived so that these infinitesimal differences always point in the wrong direction.

If you instead let the Solomonoff predictor just honestly say "I don't know," as in the first two games, the problem disappears.

If you instead let the Solomonoff predictor just honestly say "I don't know," as in the first two games, the problem disappears.

True, though sometimes in real life, you can't just say "I don't know", you have to choose an action. Game 3 is unfair, but real life can be unfair too.

There are serious weaknesses even over computable universes. Since Solomonoff induction doesn't search for ambient dependencies, considering instead explicit dependencies of reward on a particular action-symbol, it counts the same world multiple times if the agent is instantiated multiple times in it and receives the same prefix of the sequence of observations, and considers incorrect counterfactuals in these worlds, CDT-style. So two identical AIXI agents would defect in PD (CDT-style reasoning problem). In Newcomb's problem, its decision depends on the number of various explicit dependencies that carve the problem as two-boxing or one-boxing, so at least it's unclear what it'll do, and there seems to be no reason that it should reliably one-box. Given an anthropic problem where on a toss of a coin in one branch we create very many identical AIXI agents for the purpose of making some decision, while on the other branch there's only one agent (assuming the branches have sufficiently similar K-complexity), probability (complexity) of the branch with more agents would be double-counted as many times, so the consequences of actions in one-agent world won't be taken into consideration compared to consequences in the many-agents world. In Counterfactual Mugging, AIXI would not give money to Omega, since it updates. And so on.

Someone should really write up an UDT-AIXI that doesn't have these problems, I expect it should be straightforward enough (so that the problems like those I listed don't count against the arguments AIXI is used in, as a definition of impressive capability; make it stronger, not laugh at its flaws).

I'd say ambient dependencies are the least of AIXI's problems. It has >99% probability of wireheading, and in >99% of the remaining outcomes it disassembles itself with its mining claws. timtyler put the nail in the coffin quite a while ago.

Someone should really write up an UDT-AIXI that doesn't have these problems

Some days ago I tried to do just that, and stopped because I don't know enough. Now I'm back to studying more math. The most obvious problem is that AIXI is uncomputable, so a UDT-AIXI will probably also be uncomputable. But how does an uncomputable agent get to find itself embedded in a computable universe?

It has >99% probability of wireheading, and in >99% of the remaining outcomes it disassembles itself with its mining claws.

Wireheading is just what reinforcement learning agents are built to do, so it's not actually a problem. Hurting own hardware because anticipation admits quantum suicide is partially the same problem as relying on explicit dependencies, although it's still hard to define reward, to count the worlds that include or not include your instance (with given reward-observations), but this should be solved in any case for UDT-AIXI, and the only way to solve this that I see (which doesn't involve privileging particular physical implementation of the agent, but accepts it on any substrate within a world-program) again involves looking for ambient dependencies (namely, see if a dependence is present, and count a world-program only if it is).

So these problems are also automatically taken care of in UDT-AIXI, to the extent they are problematic.

Wireheading is just what reinforcement learning agents are built to do, so it's not actually a problem.

This comment led me to the following tangential train of thought: AIXI seems to capture the essence of reinforcement learning, but does not feel pain or pleasure. I do not feel morally compelled to help an AIXI-like agent (as opposed to a human) gain positive reinforcements and avoid negative reinforcements (unless it was some part of a trade).

After writing the above, I found this old comment of yours, which seems closely related. But thinking about an AIXI-like agent that has only "wants" and no "likes", I feel myself being pulled towards what you called the "naive view". Do you have any further thoughts on this subject?

The most obvious problem is that AIXI is uncomputable, so a UDT-AIXI will probably also be uncomputable. But how does an uncomputable agent get to find itself embedded in a computable universe?

You are interpreting AIXI too literally. Think of it as advice to aspiring superintelligences, not a magical uncomputable algorithm. It is normative decision theory, definition of the action that should be taken as a mathematical structure, even though it's not possible to have an agent that actually takes only the actions that should be taken, to compute all the necessary properties of this mathematical structure.

An UDT-AIXI would say which action a given reinforcement-learning agent should take. It's all already in the UDT post, just add UDT 1.1 provision that one really shouldn't update on observations and optimize over strategies instead, stipulate that you use a universal prior over programs, and assign utility according to expected reward, where reward is one of the channels of observation, and is counted over world-programs that "contain" corresponding agent. And then do some AIXI-style math to explore properties of the resulting structure.

My email from Nov 15 may be relevant here:

What exactly is the optimization problem to which UDT is the solution?

The question is not trivial, because of the way we define UDT. It assumes a prior over possible programs and a utility function over their execution histories. But once you fix these two mathematical structures, there's nothing left to optimize. Whatever happens, happens. So an answer to the question is bound to involve some new formal tricks. Any ideas to what they may be?

After that I went on to invent just such a formal trick (W/U/A), but it failed to clear things up.

But once you fix these two mathematical structures, there's nothing left to optimize. Whatever happens, happens.

It's a free will/epistemology (morality/truth) clash problem, expressed perhaps in agent-provability. What you'll do is defined by the laws of physics, but you can't infer what you'll do by considering the laws of physics, since there are other relevant (moral) considerations that go into deciding what to do. So you can't really say in the context of discussing decision theory that "whatever happens, happens". It's not a relevant consideration in arriving at a decision.

But it seems to be a relevant consideration when looking at the situation "from the outside" like your proposed UDT-AIXI does, right?

What do you mean? Whatever happens, happens, if you are not deciding. A normative idea of a correct decision can be thought of from the inside, even if it's generally uncomputable, and so only glimpses of the answer can be extracted from it.

From the outside, counterfactual consequences don't appear consistent. If the agent actually chooses action A, the idealized UDT-AIXI thingy will see that choosing action B would have given the agent a billion dollars, and choosing C would have given a trillion. Do you see a way around that?

UDT-AIXI could ask which moral arguments the agent would discover if it had more time to think. It won't of course examine the counterfactuals of a fact known to the context in which the resulting mathematical structure is to be interpreted. You can only use a normative consideration from the inside, so whenever you step outside, you must also shift the decision problem to allow thinking about moral considerations.

counted over world-programs that "contain" corresponding agent.

How do you formalize this? I couldn't figure it out when I tried this.

Select the worlds whose world history is ambiently controlled by the agent, that is the ambient dependence is non-constant, the conclusion of which world-history is implemented by given world-program depends on which strategy we assume the agent implements. Then read out the utility of reward channel from that strategy in that world.

Hmm... This is problematic if the same world contains multiple agent-instances that received different rewards (by following the same strategy but encountering different observations). What is the utility of such a world? This is a necessary question of specifying the decision problem. Perhaps it is a point where the notion of reinforcement learning breaks.

First let's set the input sequence to be very malevolent toward Omega: it will always say the reverse of what Omega is projected to say based on the previous bits.

This is probably a very uninformed question, but couldn't Omega break even vs. a intended-to-be malevolent bit stream by choosing a random sequence?

In the post I wasn't talking about any possible Omega, but about one specific Omega that has a mathematical definition. But I wouldn't call your question uninformed, because it might be the correct question to ask here: paulfchristiano has already suggested that Solomonoff induction augmented with randomizing has the potential to win more games than the regular kind.

[-][anonymous]10y 0

Suppose the data consisted of iid Bernoulli random variables which had probability 2/3 of being 1. In game 3, does Solomonoff induction's success rate converge to 2/3 (with probability 1)?