royf

Comments

An overall schema for the friendly AI problems: self-referential convergence criteria

It seems that your research is coming around to some concepts that are at the basis of mine. Namely, that noise in an optimization process is a constraint on the process, and that the resulting constrained optimization process avoids the nasty properties you describe.

Feel free to contact me if you'd like to discuss this further.

Utility vs Probability: idea synthesis

This is not unlike Neyman-Pearson theory. Surely this will run into the same trouble with more than 2 possible actions.

[LINK] Causal Entropic Forces

Our research group and collaborators, foremost Daniel Polani, have been studying this for many years now. Polani calls an essentially identical concept empowerment. These guys are welcome to the party, and as former outsiders it's understandable (if not totally acceptable) that they wouldn't know about these piles of prior work.

A Little Puzzle about Termination

You have a good and correct point, but it has nothing to do with your question.

a machine can never halt after achieving its goal because it cannot know with full certainty whether it has achieved its goal

This is a misunderstanding of how such a machine might work.

To verify that it completed the task, the machine must match the current state to the desired state. The desired state is any state where the machine has "made 32 paperclips". Now what's a paperclip?

For quite some time we've had the technology to identify a paperclip in an image, if one exists. One lesson we've learned pretty well is this: don't overfit. The paperclip you're going to be tested on is probably not one you've seen before. You'll need to know what features are common in paperclips (and less common in other objects) and how much variability they present. Tolerance to this variability will be necessary for generalization, and this means you can never be sure if you're seeing a paperclip. In this sense there's a limit to how well the user can specify the goal.

So after taking a few images of the paperclips it's made, the machine's major source of (unavoidable) uncertainty will be "is this what the user meant?", not "am I really getting a good image of what's on the table?". Any half-decent implementation will go do other things (such as go ask the user).

Right for the Wrong Reasons

The "world state" of ASH is in fact an "information state" of p("heads")>SOME_THRESHOLD

Actually, I meant p("heads") = 0.999 or something.

(C), if I'm following you, maps roughly to the English phrase "I know for absolutely certain that the coin is almost surely heads".

No, I meant: "I know for absolutely certain that the coin is heads". We agree that this much you can never know. As for getting close to this, for example having the information state (D) where p("heads") = 0.999999: if the world is in the state "heads", (D) is (theoretically) possible; if the world is in the state "ASH", (D) is impossible.

Can you give me some examples of the kinds of cases you have in mind?

Mundane examples may not be as clear, so: suppose we send a coin-flipping machine deep into intergalactic space. After a few billion years it flies permanently beyond our light cone, and then flips the coin.

Now any information state about the coin, other than complete ignorance, is physically impossible. We can still say that the coin is in one of the two states "heads" and "tails", only unknown to us. Alternatively we can say that the coin is in a state of superposition. These two models are epistemologically equivalent.

I prefer the latter, and think many people in this community should agree, based on the spirit of other things they believe: the former model is ontologically more complicated. It's saying more about reality than can be known. It sets the state of the coin as a free-floating property of the world, with nothing to entangle with.

Right for the Wrong Reasons

I probably need to write a top-level post to explain this adequately, but in a nutshell:

I've tossed a coin. Now we can say that the world is in one of two states: "heads" and "tails". This view is consistent with any information state. The information state (A) of maximal ignorance is a uniform distribution over the two states. The information state (B) where heads is twice as likely as tails is the distribution p("heads") = 2/3, p("tails") = 1/3. The information state (C) of knowing for sure that the result is heads is the distribution p("heads") = 1, p("tails") = 0.
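As a rough illustration, the three information states above can be written as plain distributions and compared by how much uncertainty they leave. This is only a sketch: the helper `entropy_bits` and the dictionary layout are my own illustrative choices, not part of the original argument.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# The three information states from the paragraph above.
A = {"heads": 0.5, "tails": 0.5}      # maximal ignorance
B = {"heads": 2 / 3, "tails": 1 / 3}  # heads twice as likely as tails
C = {"heads": 1.0, "tails": 0.0}      # certainty that the result is heads

for name, dist in (("A", A), ("B", B), ("C", C)):
    print(name, round(entropy_bits(dist), 3))
```

State (A) leaves a full bit of uncertainty, (B) somewhat less, and (C) none at all.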

Alternatively, we can say that the world is in one of these two states: "almost surely heads" and "almost surely tails". Now information state (A) is a uniform distribution over these states; (B) is perhaps the distribution p("ASH") = 0.668, p("AST") = 0.332; but (C) is impossible, and so is any information state that is more certain than reality in this strange model.
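A small sketch of why (C) is out of reach in the second model. It assumes that "almost surely heads" means heads with probability 0.999 (the "0.999 or something" figure mentioned elsewhere in this thread) and that "almost surely tails" is symmetric; the function name `p_heads` is hypothetical.

```python
ALMOST_SURE = 0.999  # assumed meaning of "almost surely"; not fixed by the text

def p_heads(p_ash, almost_sure=ALMOST_SURE):
    """Probability of heads implied by a belief p_ash in the world state 'ASH'."""
    return p_ash * almost_sure + (1 - p_ash) * (1 - almost_sure)

print(p_heads(0.5))    # information state (A): uniform over ASH and AST
print(p_heads(0.668))  # information state (B): close to 2/3
print(p_heads(1.0))    # full certainty about ASH still caps out at ALMOST_SURE
```

Under these assumptions, no belief over the two states pushes the probability of heads above 0.999, so an information state like p("heads") = 1 cannot be represented.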

Now, in many cases we can theoretically have information states arbitrarily close to complete certainty. In such cases we must use the first kind of model. So we can agree to just always use the first kind of model, and avoid all this silly complication.

But then there are cases where there are real (physical) reasons why not every information state is possible. In these cases reality is not constrained to be of the first kind, and it could be of the second kind. As a matter of fact, to say that reality is of the first kind - and that probability is only in the mind - is to say more about reality than can possibly be known. This goes against Jaynesianism.

So I completely agree that not knowing something is a property of the map rather than the territory. But an impossibility of any map to know something is a property of the territory.

Right for the Wrong Reasons

To clarify further: likelihood is a relative quantity, like speed - it only has meaning relative to a specific frame of reference.

If you're judging my calibration, the proper frame of reference is what I knew at the time of prediction. I didn't know what the result of the fencing match would be, but I had some evidence for who is more likely to win. The (objective) probability distribution given that (subjective) information state is what I should've used for prediction.

If you're judging my diligence as an evidence seeker, the proper frame of reference is what I would've known after reasonable information gathering. I could've taken some actions to put myself in a different information state, and then my prediction could be better.

But it's unreasonable to expect me to know the result beyond any doubt. Even if Omega is in an information state of perfectly predicting the future, this is never a proper frame of reference by which to judge bounded agents.

And this is the major point on which I'm non-Yudkowskian: since Omega is never a useful frame of reference, I'm not constraining reality to be consistent with it. In this sense, some probabilities are in the territory.

Right for the Wrong Reasons

This is perhaps not the best description of actualism, but I see your point. Actualists would disagree with this part of my comment:

If I believed that "you will win" (no probability qualifier), then in the many universes where you didn't I'm in Bayes Hell.

on the grounds that those other universes don't exist.

But that was just a figure of speech. I don't actually need those other universes to argue against 0 and 1 as probabilities. And if Frequentists don't believe that, there's no place in Bayes Heaven for them.

Right for the Wrong Reasons

we've already seen [...] or [...] in advance

Does this answer your question?

Right for the Wrong Reasons

Predictions are justified not by becoming a reality, but by the likelihood of their becoming a reality [1]. When this likelihood is hard to estimate, we can take their becoming a reality as weak evidence that the likelihood is high. But in the end, after counting all the evidence, it's really only the likelihood itself that matters.

If I predict [...] that I will win [...] and I in fact lose fourteen touches in a row, only to win by forfeit

If I place a bet on you to win and this happens, I'll happily collect my prize, but still feel that I put my money on the wrong athlete. My prior and the signal are rich enough for me to deduce that your victory, although factual, was unlikely. If I believed that you're likely to win, then my belief wasn't "true for the wrong reasons", it was simply false. If I believed that "you will win" (no probability qualifier), then in the many universes where you didn't I'm in Bayes Hell.

Conversely, in the other example, your winning itself is again not the best evidence for its own likelihood. Your scoring 14 touches is. My belief that you're likely to win is true and justified for the right reasons: you're clearly the better athlete.

[1] Where likelihood is measured either given what I know, or what I could know, or what anybody could know - depending on why we're asking the question in the first place.
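
One common way to make the "Bayes Hell" remark concrete is the logarithmic scoring rule, which scores a prediction by the log of the probability it assigned to what actually happened. The sketch below is an illustration under that assumption; the comment itself does not name a scoring rule, and `log_score` is a hypothetical helper.

```python
import math

def log_score(p_win, won):
    """Log of the probability assigned to the outcome that actually occurred."""
    p = p_win if won else 1.0 - p_win
    return math.log(p) if p > 0 else float("-inf")

# A confident but hedged belief pays a finite penalty when it turns out wrong.
print(log_score(0.9, won=False))

# "You will win" with no probability qualifier pays an unbounded one.
print(log_score(1.0, won=False))
```

Under this rule, a belief held with probability 1 cannot recover from a single miss, which is one way to read the argument against 0 and 1 as probabilities.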
