Idealized rational agents are often discussed as operating in an uncertain environment, or as using probabilities to correctly reason anthropically (especially in MWI or Tegmark style multiverses). I believe that both of these sources of uncertainty are essential for a rational agent, but thinking about them obscures an important (I suspect ultimately more important) type of uncertainty, faced by computationally bounded agents making predictions about a fixed, perfectly understood, deterministic universe. This type of uncertainty is described by Eliezer in his description of TDT, but he doesn't deal there with any of the mathematical theory that would govern such uncertainties. I am somewhat surprised that I have not encountered more discussion of this form of uncertainty and the difficulty of dealing with it mathematically.

Consider a system consisting of two interacting algorithms: a predictor and a universe. Periodically the universe sends observations to the predictor, or asks the predictor to make a prediction (ie to answer some question about the state of the universe). The observer must answer these queries using some bounded resource as dictated by the universe; typically we imagine that the observer is allotted a fixed length of time. We rate the quality of a predictor by measuring how often its predictions agree with what the universe expects.

Now suppose I give you the description of the universe, and ask for the description of a good predictor. In some universes, I believe that there are very good predictors who perform something like Bayesian inference over mathematical statements---despite the fact that any consistent assignment of probabilities gives most statements of interest probability either 0 or 1.

For example, suppose that a particular theorem X would be useful to make future predictions. I would like to know whether or not X is true, but the proof may be too complex to deal with (or X may be independent of whatever axioms I have programmed the predictor to believe, or finding a perfectly valid proof may be too hard a problem...). To this end, the predictor may develop much more powerful rules of inference which allow it to establish the truth of X more directly, but which are only valid heuristically. That is, it may use rules of the form "If A and B are true, C is probably true."

This immediately raises two questions: how can we formally describe such a rule, and how could a predictor ever acquire this sort of probabilistic knowledge?

This does not answer the question of how such knowledge is acquired. I can be included as an axiom in a system: for example, I can believe the axioms of PA with probability 1, that ZFC is consistent with probability 99%, that the RH is true with probability 95%, that the RH is provable in ZFC with probability 90%, or that the results of my inference algorithm are "well-calibrated" in a precise sense with probability 50%. I can also derive probabilistic knowledge from exact knowledge: for example, I can believe that random strings probably have high Kolmogorov complexity; I can believe that if you randomly choose whether to negate a statement, the result is true with probabiltiy 50%. I can also learn acquire new beliefs by performing inference, once I already have some beliefs which are probabilistic. This is particularly important when combined with access to observations from the universe. For example, if I believe that algorithm A probably plays Go optimally, then when I observe A's play it alters many beliefs: it will affect my confidence that A actually plays optimally and all of the beliefs which shaped my belief that A plays optimally, it will affect my beliefs about the optimal play in that position, it may affect my beliefs about other Go playing algorithms I have considered, etc.

The possibility of probabilistic rather than exact beliefs may also allow us to get around hard problems with self-reference. For example, it may be that I can be "quite confident" that "almost everything" I am "quite confident" about is true, while it can never be the case that I am certain that everything I am certain about is true. I feel that questions of this form are currently of fundamental importance to our efforts to understand automated reasoning (especially for agents who are actually embedded in the universe they are reasoning about) which makes me particularly interested in this form of uncertainty. Unfortunately, even to evaluate the possibility of an end-run around incompleteness requires a much deeper understanding of mathematical uncertainty than anyone seems to have yet developed.

I haven't said anything of mathematical substance. What I mean to argue is that:

1. This is an interesting and important mathematical question.

2. There are a lot of mathematical subtleties involved; moreover, they don't just govern weird corner cases. It is generally unclear how to perform such inference or evaluate the correctness of a particular inference / the reasonableness of a set of beliefs. .

I agree this is one of the really central problems in rationality. There was some previous discussion on this topic at http://lesswrong.com/lw/f9/a_request_for_open_problems/bun including links to some books/articles on the topic, but unfortunately none of it goes very far towards solving the problem. I think the lack of further discussion on logical/mathematical uncertainty isn't due to lack of interest, but instead due to lack of new ideas.

It seems like you can start looking at easier issues even if the general picture remains opaque.

For example, suppose I have an algorithm A that assigns probabilities to assertions. Can A axiomatically assign probability >90% to an assertion like "If A assigns probability of 90% to X, then X is true." without automatically concluding X with probability 90%?

Another example: what sort of consistency properties can we consistently enforce of beliefs? That is, we can define a bunch of rules that are satisfied by "reasonable" beliefs, such as P(A or B) <= P(A) + P(B). What sets of rules can we satisfy simultaneously? Which sets of rules should we try to have our beliefs satisfy? Can we even describe inference as having some implicitly defined probability assignment which is updated over time?

Another example: we would like to make ambient control work with probabilistic beliefs. Can we make any progress towards having this work? There seem to be a lot of approachable technical details, which you might be able to solve while still having some abstraction like a black box that directs the inference problem.

Sure, if you see any loose threads around, definitely pull on it.

What I've been thinking about lately is whether it makes sense to assign probabilities to mathematical statements at all. Intuitively it seems "obvious" that it does, but logical uncertainty doesn't clearly satisfy VNM axioms. There do seem to be examples where such probabilities clearly make sense.

For example if you're looking at the output of a pseudorandom number generator with a random seed, which is computationally indistinguishable from a string picked with a uniformly random distribution, it seems clear that you're justified in assigning probability 0.5 to the next bit being 1, even if you've seen enough of the string that a logically omniscient Bayesian can deduce the seed and predict the next bit with near certainty.

Can we find other examples? How far can we generalize them?

I'm curious what you mean by "approachable technical details". Can you give some examples?

For example, suppose you have an algorithm that does inference for some amount of time, and then is able to answer queries of the form "What is the probability of A," implicitly representing a (not very consistent) probability distribution. You could use this algorithm in place of a theorem prover when controlling a constant program, to choose the action that maximizes the expected utility of the implicit conditional probability distribution. I was thinking that there would be some technical issues in formalizing this, but I guess the only issue is in the first example I gave.

That said, there are fundamental seeming questions about UDT which this change might affect. For example, what properties of the inference engine might allow you to prove cooperation between two UDT agents playing a prisoner's dilemma? As far as I know, no such conditions are known for explicit reasoners, and it seems like the difficulties there are fundamental.

For each statement, consider the maximal odds at which you would bet on it? Seems like that rule associates a number to every mathematical statement. I don't know what you call those numbers, but it seems like you should be doing inference with them.

That's exactly what I did in my first post about UDT. Did you miss that part? (See the paragraph just before "Example 1: Counterfactual Mugging".)

Definitely, a lot of issues in UDT seem to hinge on getting a better understanding of logical uncertainty, but I'm still not sure what "approachable technical details" we can try to solve now, as opposed to trying to better understand logical uncertainty in general.

But there is no single odds at which I'm willing to bet on a particular statement. For one thing, I'm risk averse so the odds depends on the amount of the bet. Now if VNM axioms apply, then that risk aversion can be folded into a utility function and my preferences can be expressed as expected utility maximization, so it still makes sense to talk about probabilities. But it's not clear that VNM axioms apply.

The "approachable technical details" I was imagining were of the form "What would an inference algorithm have to look like--how would it have to implicitly represent its probability distribution, and how inconsistent could that distribution be--in order for it to make sense to use it with UDT." After having thought about it more, I realized these questions aren't very extensive and basically boil down to the first example I gave.

Yes. Moreover, I think I had never read your first post about UDT.

I don't quite understand your objection, but probably because I am confused. What I imagine doing, very precisely, is this: using your preferences over outcomes and VNM, make a utility function, defined on outcomes. Using this utility function, offer wagers on mathematical statements (ie, offer bets of the form "You get outcome A; or, you get outcome B if statement X is true, and outcome C if statement X is false.", where A, B, C have known utilities)

It's not clear to me that you often bet on "mathematical statements". I suspect that the character of what exactly you are making decisions about is a more general consideration than any assignment of "probabilities" to those things.

You mean as a matter of practice? If I knew a TOE and the initial conditions of the universe, its possible that every question I care about is a mathematical one. So I care about the probabilities of mathematical questions to exactly the same extent that I care about probabilities of events, right?

Moreover, if I am dealing with another UDT agent, statements about their decisions are mathematical statements. So in the prisoner's dilemma, for example, you are forced to bet on a statement like "The other player in this prisoner's dilemma will cooperate". As far as I can tell from a brief discussion with cousin_it, there is no clear way for an explicit reasoner to be able to prove such a statement with certainty (because of incompleteness issues).

This is technically wrong, see the posts linked from ADT wiki page. While you might be unable to prove what you'll do, and what other agents whose actions depend on yours would do, you could prove such statements if you assume that you'll perform a given possible action, which is how you decide which of the possible actions to pick.

So where you won't be able to unconditionally simulate other agents, you might be able to simulate them conditionally on your own possible decision (with actual decision depending on the conclusions drawn from that simulation).

Fine. You need to bet on statements of the form "If I cooperate, my opponent will cooperate." The difference makes no difference to my post.

None of the posts linked to from the ADT page resolve my concern, which is this: if you have two ADT agents playing the prisoner's dilemma, unless they use very similar theorem provers there is no way they can cooperate. This is a very simple problem, and you have basically no hope of cooperating in any non-trivial situation.

The problem seems to be that there is no way both ADT agents can prove that the others' axioms are consistent (cousin_it raised the concern, but it also seems critical to me). So I can't rule out the possibility that the other ADT agent will randomly prove something like "If I defect, I will get 100000 utility" and then defect regardless of what my action is.

Do you know some way around this difficulty?

This is not needed for the basic cooperation scenario discussed in those posts. It's possible to prove that the other agent performs the same action as you.

(Similarly, if you've proved that 2+2=4, it can still be wrong if your axioms were inconsistent, but this is not a concern that can be addressed on the same level where you're proving things like 2+2=4.)

(You can, see the posts.)

Which post am I supposed to see?

Here is how I imagine ADT working. I search for proofs that [A = X => U >= Y], or something like that, and then output X that maximizes the best provable lower bound on my utility.

In this view, I do not see how to prove cooperation in any non-trivial situation whatsoever. Could you link to a particular argument which shows how to prove cooperation in some situation?

cousin_it's AI cooperation in practice post shows how a ADT-like system can cooperate--one which looks explicitly for proofs of the form "The other guy will do what I do" and then if it finds one acts appropriately. Could you describe how you would generalize this ad-hoc solution to other cooperation problems? It seems like you really want the general version of ADT to be able to work on the prisoner's dilemma, and I don't see how it would work.

AI cooperation in practice gives a specific setup that allows cooperation. The post links to a sketch of a proof for AIs cooperating in the setup. Each of these AIs will do well against arbitrary programs. This is a counterexample to your statement that

What a reduction of "could" could look like gives a more flexible scheme that's applicable to this case. Notion of Preference in Ambient Control gives a logical framework for describing the reasoning process of such agents.

It works in this special case. The special case shows how it could work. I make no further claims.

Sorry, I think we are talking across each other.

The setup in AI cooperation in practice is a specific algorithm that looks like ADT and cooperates. I agree that this is very good to know, and I am familiar with the argument.

I would like to be able to prove that some sort of algorithms running a general form of ADT can cooperate on the prisoner's dilemma. I think I am failing to communicate the distinction between this and cousin it's result.

The issue is that the algorithms considered in cousin it's result aren't using ADT. They are looking for a proof of a particular statement, constructed in an ad-hoc way (how do you generalize the statement "Me and the other AI have the same output"?), and if they find it they cooperate. They are not looking for proofs of statements of the form "If I cooperate, then I will get a certain utility." As far as I can tell, if you were to consider two algorithms actually using ADT, instead of the ADT-like algorithm cousin it describes, then they would

notbe able to find a proof of a statement of the form "My opponent will do the same thing I will." The reason they cannot find such a proof is that they cannot exclude the possibility that the other AI will find a proof such as "By defecting I attain utility 100000." Therefore they cannot prove, under any conditions, that the other AI will cooperate. Unless the other AI is using a very similar algorithm to search for proofs, knowledge of my own output does not help me prove that the other AI won't find a proof of "By defecting I attain utility 100000."The description of what you are talking about helps.

This discussion is related to the problem that an ADT agent considering standard Newcomb's problem could, in principle, find the following complete collection of moral arguments: [two-box => 1000; one-box => 0]. Since as a result, the agent would two-box, both statements are true and therefore permitted to be provable by agent's inference system. We only expect that it'll instead find the arguments [two-box => 1000; one-box => 1000000] out of some unstated assumptions about its inference system that have nothing to do with logical validity of the statements it proves.

And so, since the system is not fully specified by merely saying that it "implements ADT", and its decisions depend on the missing details, we can't prove that systems that "implement ADT" will in fact cooperate. Stating further assumptions is necessary.

(This is basically a proof that definition of ADT agents can't be proven to imply their cooperation in PD.)

I see that being an ADT agent could never be enough to guarantee cooperation.

The thing I would like to see--the thing whose absence strikes me as a serious issue--is a generally applicable instantiation of ADT which cooperates with a slightly modified version of itself.

For example, consider the following agent. It knows its own source code. Two copies of it are then run, given large integer inputs N, M, and play the prisoner's dilemma with each other. The first instance searches for proofs of length <= N that "cooperating -> utility >= A" or "defecting -> utility >= B". It then considers the best lower bound it has proved, and outputs the corresponding decision. The second instance does the same thing, but searching for proofs of length <= M. It seems like if N, M are sufficiently large these two algorithms should cooperate, but they don't.

Is there something confused about my desire for a "less ad hoc" agent which can cooperate on PD? Is there some reason that we should expect to be able to find ADT agents which cooperate in any reasonable situation without finding a less ad hoc algorithm that cooperates on PD? Is there something confused about my belief that inability to consistently assume consistency is the reason for this particular failure to cooperate? Is there likely to be some other way to get around the difficulty?

Edit: To make this more explicit, I feel intuitively that AIs in this situation should be able to cooperate, because I feel as though it is reasonable to make some probabilistic assumption of consistency. If you could formalize this notion, I suspect you would find reasonable agents who cooperate on PD. To me this seems like the easiest way to resolve the problem, because I don't see where else there is to go.

Why not? My argument shows that they don't necessarily cooperate, but I expect that they will cooperate in any natural implementation not specifically contrived to prevent that (although I don't have a proof). You seem to expect to have shown that they necessarily don't cooperate. If we specify ADT agents additionally, then they can of course know each other's additional specification, and so know enough to conditionally prove each other's cooperation. In this particular case, with proof length limits M and N, if they can both prove that faster than the lowest time limit, the fact that the time limits are different shouldn't matter (other than as a technical issue in the proofs).

I expect most reasonable implementations of ADT agents with insane amount of computational resources to act correctly, for reasons analogous to those given in cousin_it's proof (there are some simplifications in the proof that generalize better).

I don't see what consistency has to do with anything here. The problem with possible implementations of ADT agents I pointed out is not caused by unsoundness of their inference system with respect to the intended semantics. All the statements those agents prove are true.

I certainly haven't proved that any particular reasonable ADT agents don't cooperate. But I have convinced myself that cousin_it's argument is unlikely to generalize; I don't see how they could prove conditionally the other would coordinate, because if their axioms are inconsistent the guy with the longer proof length may, even if he searches for proofs in a very natural way, prove that defecting is unconditionally better than cooperating.

Again, that's not the problem. ADT agents can just make the decision the moment they find a complete collection of moral arguments (i.e. a utility function, in the sense of my post), which means that if a given utility function composed of agent-provable moral arguments is found, the decision follows from that.

(If the moral arguments have the form [A=A1 => U=U1], then there can be only one such agent-provable moral argument for each A1, so once the agent found one for each A1, it can as well stop looking. Consequences appear consistent.)

Consider a pair (A,B) of provable (but not necessarily agent-provable) moral arguments, for example A="one-box => utility=1000000" and B="two-box => utility=1000". These are statements provable (true) in the intended semantics, which means that if an agent manages to prove B'="two-box => utility=1001000", and two-box as a result of finding (A,B'), its inference system must be inconsistent (or something). But there is a different way in which an agent can decide to two-box, not requiring an inconsistency. It can just find a moral argument A'="one-box => utility=23" before finding A, and decide to choose two-boxing based on (A',B). Both A' and B would be true.

(I know I'm reiterating the same thing I said before, but you haven't signaled that you've understood the argument, and you keep repeating something about inconsistency that I don't understand the relevance of, without answering my questions about that.)

I now understand what you are saying; thank you for being patient.

What I am confused by now is your optimism. I presented an argument based on incompleteness which convinced me that reasonable ADT agents won't cooperate. In response you pointed out that incompleteness isn't really the problem--there are other failure modes anyway. So why is it that you believe "reasonable" ADT agents will in fact cooperate, when I (unaware of this other failure mode) already believed they wouldn't?

Part of what convinces me that something bad is going on is the following. Consider an agent in Newcomb's problem who searches exhaustively for a proof that two-boxing is better than one-boxing; if he finds one he will two-box, and if he can't he will one-box by default. By an argument like the one given in AI coordination in practice, this agent will two-box.

Your optimism seems to depend heavily on the difference between "enumerating all proofs up to a certain length and looking for the best provable utility guarantees" and "enumerating proofs until you find a complete set of moral arguments, and behaving randomly if you can't." Why do you believe that a complete set of moral arguments is provable in reasonable situations? Do you know of some non-trivial example where this is really the case?

Yes, and this answers your preceding question:

The strategy of "enumerating proofs until you find a complete set of moral arguments" doesn't suffer from the incompleteness issue (whatever it is, if it's indeed there, which I doubt can have the simple form you referred to).

I don't believe it is provable in any reasonable time, but perhaps given enough time it can often be proven. Building a set of mathematical tools for reasoning about this might prove a fruitful exercise, but I have shelved this line of inquiry for the last few months, and wasn't working on it.

My point was that in principle they can cooperate, and the reasons for difficulty in arranging cooperation in practice are not related to incompleteness. Considering conditional statements gets rid of the incompleteness-related problems.

Good post, I think you're looking in the right direction :-)

More generally, you could think about formal systems where each axiom comes with a prior probability. I'm not sure if such things have been studied, but it sounds promising. But the million dollar question here is this: when you're given a new mathematical statement that doesn't seem to be immediately provable or disprovable from your axioms, how do you assign it a probability value? I remember thinking about this a lot without success.

True, but I'm not sure how such things can help you when you're facing "true" logical uncertainty in a deterministic universe. Possibly relevant discussion on MathOverflow: Can randomness add computability?

50%? Sometimes you can infer its likely truth value from the conditions in which you've come upon it, but given that there are about as many false statements as true ones of any form, this default seems right. Then, there could be lots of syntactic heuristics that allow you to adjust this initial estimate.

"Syntactic heuristics" is a nice turn of phrase, but you could have just as well said "you can always say 50%, or maybe use some sort of clever algorithm". Not very helpful.

I don't expect there is any clever simple trick to it. (But I also don't think that assigning probabilities to logical statements is a terribly useful activity, or one casting foundational light on decision theory.)

Can you explain this a bit more? Do you have any reasons for this suspicion?

I don't have any reasons for the suspicion that assigning probabilities to logical statements casts foundational light on decision theory. I don't see how having such an assignment helps any.

I suspect that if you are presented with overwhelming data and have to answer quickly, there is no strategy that tends to beat making the correct inference from a limited set of data, chosen at pseudo-random.

That doesn't constitute a strategy. What is the set of data you use to infer whether the RH is true? To infer whether ZFC is consistent? To decide what play to make in a game of Go against an algorithm you know, but running on a faster computer? To decide whether to one-box or two-box in Newcomb's problem against a flawed simulator? To decide whether to cooperate or defect in the prisoner's dilemma against an opponent you understand completely and who understands you completely?

(These last two problems cannot be posed in the framework I gave, but they do involve broadly similar issues. A deeper understanding of probabilistic reasoning seems to me to be essential to get the "right answer" in these cases.)

I was thinking more of problems along the lines of "here is the entire history of object X's behavior and lots of related stuff, which you do not have enough time to process completely. What will X do next?"

A good set of data for the Riemann hypothesis and similar things would be the history of mathematics and the opinions of mathematicians. How often have similar opinions been accurate/inaccurate? This seems roughly like what I Was talking about, though the inhomogeneity of real-world data means you can certainly beat random picking by going after low-hanging fruit.

The Go problem is interesting. Since Go can be "solved," if the other algorithm is optimal you're screwed, or will play optimally too if your computer is fast enough. If both of those are false, and you have time to prepare, you could train an algorithm specifically to beat the known opponent, which, if done optimally, would again not pick randomly... guess I was wrong about that.

The last two don't seem to exhibit the property you're talking about, and instead the solutions should be fairly complete.

I don't think that's true. There are lots of questions humans deal with that have the following properties:

Typically, they are questions about human affairs rather than the hard sciences.

We answer them, far from perfectly, but still much better than any computer program yet devised.

We do this at least in part by integrating large volumes of weak evidence.

It seems likely - I would go so far as to say beyond reasonable doubt - that no algorithm that tried to instead use precise inference on smaller volumes of evidence could match human performance on such questions, given realistic constraints on available data and thinking time.

I think this free book (A New Kind of Science, by Stephen Wolfram) is all about that. But I haven't read it so I could be wrong. Here is a quote:

This comment seem to be pure mysterious answer to a mysterious question.

That describes all of NKS, as far as I can tell.

Be fair. Some of it's made up of mysterious answers to perfectly simple questions too...

No comment on that. I referred to Xi's citing it in this context.

I think you're posing the problem of induction, which is a classic problem in philosophy.

Whether you call this "the problem of induction" or not, I have read an awful lot of literature on "induction" and very little that speaks to or acknowledges this problem.

I don't think he is. Solomonoff answers the problem of induction (in its classical philosophical form at least) but not the problem this post discusses.

The questions in the post

seemto be sub-problems of the question of: how do you build a computable approximation to Solomonoff induction. The literature on that is the literature that deals with compression.