Probability, knowledge, and meta-probability

I guess my position is thus:

While there are sets of probabilities which by themselves are not adequate to capture the information about a decision, there always is a set of probabilities which *is* adequate to capture the information about a decision.

In that sense I do not see your article as an argument against using probabilities to represent decision information, but rather a reminder to use the correct set of probabilities.

My understanding of Chapman's broader point (which may differ wildly from his
understanding) is that determining which set of probabilities is correct for a
situation can be rather hard, and so it deserves careful and serious study from
people who want to think about the world in terms of probabilities.

Probability, knowledge, and meta-probability

I don't think it's correct to equate probability with expected utility, as you seem to do here. The probability of a payout is the same in the two situations. The point of this example is that the probability of a particular event does not determine the optimal strategy. Because utility is dependent on your strategy, that also differs.

Hmmm. I was equating them as part of the standard technique of calculating the probability of outcomes from your actions, and then from there multiplying by the utilities of the outcomes and summing to find the expected u... (read more)
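For concreteness, the standard technique I had in mind can be sketched in a few lines (a minimal illustration using the example's numbers; the $1 stake per coin is an assumption):

```python
# Minimal expected-utility sketch: P(outcome | action) weighted by
# U(outcome), summed over outcomes. The $1 stake per coin is an
# illustrative assumption; the p=0.45 / $2 payout comes from the post.

def expected_utility(outcome_probs, utilities):
    """outcome_probs and utilities are dicts keyed by outcome name."""
    return sum(p * utilities[o] for o, p in outcome_probs.items())

# Gambling one coin: p=0.45 of a $2 payout against a $1 stake.
eu = expected_utility({"payout": 0.45, "no payout": 0.55},
                      {"payout": 2 - 1, "no payout": -1})
print(eu)  # about -0.1: a single coin is a losing bet in isolation
```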

Yes, I see your point (although I don't altogether agree). But, again, what I'm
doing here is setting up analytical apparatus that will be helpful for more
difficult cases later.
In the meantime, the LW posts I pointed to here
[http://lesswrong.com/lw/igv/probability_knowledge_and_metaprobability/9r49] may
motivate more strongly the claim that probability alone is an insufficient guide
to action.

Probability, knowledge, and meta-probability

The subtlety is about what numerical data can formally represent your full state of knowledge. The claim is that a mere probability of getting the $2 payout does not.

However, a single probability for each outcome given each strategy *is* all the information needed. The problem is not with using single probabilities to represent knowledge about the world, it's the straw math that was used to represent the technique. To me, this reasoning is equivalent to the following:

"You work at a store where management is highly disorganized. Although they pr... (read more)

So, my observation is that without meta-distributions (or A_p), or conditioning
on a pile of past information (and thus tracking *more* than just a probability
distribution over current outcomes), you don't have the room in your knowledge
to be able to even talk about sensitivity to new information coherently. Once
you can talk about a complete state of knowledge, you can begin to talk about
the utility of long-term strategies.
For example, in your example, one would have the same probability of being paid
today if 20% of employers actually pay you every day, while 80% of employers
never pay you. But in such an environment, it would not make sense to work a
second day in 80% of cases. The optimal strategy depends on what you know, and
to represent that in general requires more than a straight probability.
There are different problems coming from the distinction between choosing a
long-term policy to follow and choosing a one-shot action. But we can't even
approach this question in general unless we can talk sensibly about a
sufficient set of information to keep track of. These are two distinct
problems, one prior to the other.
Jaynes does discuss a problem which is closer to your concerns (that of
estimating neutron multiplication in a 1-d experiment; §18.15, p. 579
[http://commonsenseatheism.com/wp-content/uploads/2013/06/Jaynes-The-Ap-distribution-and-rule-of-succession.pdf]).
He's comparing two approaches, which for my purposes differ in their prior A_p
distribution.

It may be helpful to read some related posts (linked by lukeprog in a comment on
this post): Estimate stability [http://lesswrong.com/lw/h78/estimate_stability/],
and Model Stability in Intervention Assessment
[http://lesswrong.com/lw/hnf/model_stability_in_intervention_assessment/], which
comments on Why We Can't Take Expected Value Estimates Literally (Even When
They're Unbiased)
[http://lesswrong.com/lw/745/why_we_cant_take_expected_value_estimates/]. The
first of those motivates the A_p (meta-probability) approach, the second uses
it, and the third explains intuitively why it's important in practice.

Jeremy, I think the apparent disagreement here is due to unclarity about what
the point of my argument was. The point was not that this situation can't be
analyzed with decision theory; it certainly can, and I did so. The point is that
different decisions have to be made in two situations where the probabilities
are the same.
Your discussion seems to equate "probability" with "utility", and the whole
point of the example is that, in this case, they are not the same.

Probability, knowledge, and meta-probability

The exposition of meta-probability is well done, and shows an interesting way of examining and evaluating scenarios. However, I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem, and present meta-probability as the solution.

In particular, you say

What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course... (read more)

A single probability cannot sum up our knowledge.
Before we talk about plans, as you went on to, we must talk about the world as
it stands. We know there is a 50% chance of a 0% machine and a 50% chance of a
90% machine. Saying 45% does not encode this information. No other number does
either.
Scalar probabilities of binary outcomes are such a useful hammer that we need to
stop and remember sometimes that not all uncertainties are nails.
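To make this concrete with a quick sketch: two situations with the same single-pull probability have very different values under a strategy that uses new information (the $1 stake, $2 payout, 0%/90% machines, and 45% marginal all come from the example; the 10-pull horizon is an illustrative assumption):

```python
# Two situations with the SAME first-pull payout probability (0.45):
#   flat:    one machine that pays out 45% of the time
#   mixture: 50% chance of a never-pays machine, 50% chance of a 90% machine
# Each pull costs $1 and pays $2; the 10-pull horizon is an assumption.

def ev_always_play(machines, n):
    # Pull n times no matter what happens.
    return sum(w * n * (2 * p - 1) for w, p in machines)

def ev_explore_then_decide(machines, n):
    # Pull once; keep pulling the remaining n-1 times only if the
    # first pull paid out (an implicit Bayesian update on the machine).
    return sum(w * (p * (1 + (n - 1) * (2 * p - 1)) + (1 - p) * (-1))
               for w, p in machines)

flat = [(1.0, 0.45)]
mixture = [(0.5, 0.0), (0.5, 0.9)]

print(ev_always_play(flat, 10))         # about -1.0
print(ev_always_play(mixture, 10))      # about -1.0: identical either way
print(ev_explore_then_decide(flat, 10))     # about -0.5: still a loser
print(ev_explore_then_decide(mixture, 10))  # about +3.1: exploring pays
```

The information-blind strategy scores identically in both worlds, so the 45% figure alone cannot tell you which world you are in or what to do.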

Jeremy, thank you for this. To be clear, I wasn't suggesting that
meta-probability is the solution. It's a solution. I chose it because I plan to
use this framework in later articles, where it will (I hope) be particularly
illuminating.
I don't think it's correct to equate probability with expected utility, as you
seem to do here. The probability of a payout is the same in the two situations.
The point of this example is that the probability of a particular event does not
determine the optimal strategy. Because utility is dependent on your strategy,
that also differs.
Yes, absolutely! I chose a particularly simple problem, in which the correct
decision-theoretic analysis is obvious, in order to show that probability does
not always determine optimal strategy. In this case, the optimal strategies are
clear (except for the exact stopping condition), and clearly different, even
though the probabilities are the same.
I'm using this as an introductory wedge example. I've opened a Pandora's Box:
probability by itself is not a fully adequate account of rationality. Many odd
things will leap and creep out of that box so long as we leave it open.

The substantive point here isn't about EU calculations per se. Running a full
analysis of everything that might happen and doing an EU calculation on that
basis is fine, and I don't think the OP disputes this.
The subtlety is about what numerical data can formally represent your full state
of knowledge. The claim is that a mere probability of getting the $2 payout does
not. It's the case that on the first use of a box, the probability of the payout
given its colour is 0.45 regardless of the colour.
However, if you merely hold onto that probability, then if you put in a coin and
so learn something about the boxes you can't update that probability to figure
out what the probability of payout for the second attempt is. You need to go
back and also remember whether the box is green or brown. The point of Jaynes
and the A_p distribution is that it actually does screen off all other
information. If you keep track of it you never need to worry about remembering
the colour of the box, or the setup of the experiment. Just this
"meta-distribution".
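As a sketch of what keeping track of this meta-distribution looks like, using the two-machine prior from the example (50% a machine that never pays, 50% one that pays 90% of the time):

```python
# Updating a meta-distribution (Jaynes's A_p) over payout propensities.
# `belief` maps each candidate propensity p to its current weight; this
# is all you need to carry forward -- not the box's colour.

def update(belief, paid):
    posterior = {p: w * (p if paid else 1 - p) for p, w in belief.items()}
    z = sum(posterior.values())
    return {p: w / z for p, w in posterior.items()}

belief = {0.0: 0.5, 0.9: 0.5}
print(sum(p * w for p, w in belief.items()))  # 0.45 on the first pull

belief = update(belief, paid=True)  # one winning pull
print(belief[0.9])  # 1.0: a single payout rules out the never-pays machine
```

After each coin, the updated A_p distribution alone determines the payout probability on every later pull, which is the screening-off property described above.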

Doublethink (Choosing to be Biased)

It's also irrelevant to the point I was making. You can point to different studies giving different percentages, but however you slice it a significant portion of the men she interacts with would have sex with her if she offered. So maybe 75% is only true for a certain demographic, but replace it with 10% for another demographic and it doesn't make a difference.

It does affect your point.

Oh, it certainly doesn't affect your point. I agree with your point completely.
I was just nitpicking the numbers.

Causal Universes

I was reading a Less Wrong post and found this paragraph, which lines up with what I was trying to say:

Some boxes you really can't think outside. If our universe really is Turing computable, we will never be able to concretely envision anything that isn't Turing-computable—no matter how many levels of halting oracle hierarchy our mathematicians can talk about, we won't be able to predict what a halting oracle would actually say, in such fashion as to experimentally discriminate it from merely computable reasoning.

2012 Less Wrong Census/Survey

Analysis of the survey results seems to indicate that I was correct: http://lesswrong.com/lw/fp5/2012_survey_results/

Causal Universes

Yes, I agree. I can imagine some reasoning being conceiving of things that are trans-Turing-complete, but I don't see how I could make an AI do so.

Causal Universes

As mentioned below, you'd need to make infinitely many queries to the Turing oracle. But even if you could, that wouldn't make a difference.

Again, even if there were a module to do infinitely many computations, the code I wrote still couldn't tell the difference between that being the case and this module being a really good computable approximation of one. Again, it all comes back to the fact that I am programming my AI on a Turing-complete computer. Unless I somehow (personally) develop the skills to program trans-Turing-complete computers, then wh... (read more)

Ah, I see. I think we were answering different questions. (I had this feeling
earlier but couldn't pin down why.) I read the original question as being
something like "what kind of hypotheses should a hypothetical AI hypothetically
entertain" whereas I think you read the original question as being more like
"what kind of hypotheses can you currently program an AI to entertain." Does
this sound right?

Causal Universes

I don't see how this changes the possible sense-data our AI could expect. Again, what's the difference between infinitely many computations being performed in finite time and only the computations numbered up to a point too large for the AI to query being calculated?

If you can give me an example of a universe for which the closest Turing machine model will not give indistinguishable sense-data to the AI, then perhaps this conversation can progress.

Well, for starters, an AI living in a universe where infinitely many
computations can be performed in finite time can verify the responses a Turing
oracle gives it. So it can determine that it lives in a universe with Turing
oracles (in fact it can itself be a Turing oracle), which is not what an AI
living in this universe would determine (as far as I know).

Causal Universes

Even if the world weren't computable, any non-computable model would be useless to our AI, and the best it could do is a computable approximation.

Again, what distinguishes a "Turing oracle" from a finite oracle with a bound well above the realizable size of a computer in the universe? They are indistinguishable hypotheses. Giving a Turing-complete AI a Turing oracle doesn't make it capable of understanding anything more than Turing-complete models. The Turing-transcendent part must be an integral part of the AI for it to have non-Turing-com... (read more)

Suppose the AI lives in a universe where infinitely many computations can be
performed in finite time...
(I'm being mildly facetious here, but in the interest of casting the
"coherently-thinkable" net widely.)

Causal Universes

Well, I suppose, starting with the assumption that my superintelligent AI is merely Turing-complete, I think that we can only say our AI has "hypotheses about the world" if it has a computable model of the world. Even if the world weren't computable, any non-computable model would be useless to our AI, and the best it could do is a computable approximation. Stable time loops seem computable through enumeration, as you show in the post.

Now, if you claim that my assumption that the AI is computable is flawed, well then I give up. I truly have no idea how to program an AI more powerful than Turing-complete.

Suppose the AI lives in a universe with Turing oracles. Give it one.

Money: The Unit of Caring

If you don't spend two months salary on a diamond ring, it doesn't mean you don't love your Significant Other. ("De Beers: It's Just A Rock.") But conversely, if you're always reluctant to spend any money on your SO, and yet seem to have no emotional problems with spending $1000 on a flat-screen TV, then yes, this does say something about your relative values.

I disagree, or at least the way it's phrased is misleading. The obvious completion of the pattern is that you care more about a flat screen TV than your SO. But that's not a valid com... (read more)

2012 Less Wrong Census/Survey

From what I could read on the IQ test page, it seemed that they didn't do any correction for self-selection bias, but rather calculated scores as if they had a representative sample. Based on this, I would guess that the internet IQ test will underestimate your score (p=0.7).

Unless there are significant numbers of people, myself for example, who take the
test multiple times with varied random algorithms just to see how it affects the
outcome. I'd only put a (p=0.55) at the test underestimating your score,
conditional on its not correcting for self-selection bias.
Though, given that the lowest score appears to be "less than 79", rather than an
exact number, they may simply drop any scores under 79 from their pool, or at
the very least weight them differently. Has anybody identified a similar maximum
score which would support this hypothesis of discarding outliers?

2012 Less Wrong Census/Survey

Luckily it will remain possible for everyone to do so for the foreseeable future.

How to Deal with Depression - The Meta Layers

Thanks for this. Although I don't suffer from depression, the comments about meta-suffering really resonate with me. I think (this is unverified as of yet) that my life can be improved by getting rid of meta-suffering.

Circular Altruism

I certainly wouldn't pay that cent if there was an option of preventing 50 years of torture using that cent. There's nothing to say that my utility function can't take values in the surreals.

New study on choice blindness in moral positions

I'll make sure to keep you away from my body if I ever enter a coma...

Oh don't worry, there will always be those little lapses in awareness. Even supposing you hide yourself at night, are you sure you maintain your sentience while awake? Ever closed your eyes and relaxed, felt the cool breeze, and for a moment, forgot you were aware of being aware of yourself?

Less Wrong Polls in Comments

So what did you guess then?

[anonymous]: I guessed "the only winning move is not to play."
(I didn't guess. Rationalization: I didn't want to do the thinking, and can't
see the results anyway.)

Less Wrong Polls in Comments

Or maybe that's what I want you to think I'd say...

[anonymous]: The noise in my simulations quickly drowns out any actual logic, and the Markov
chain reaches its stable distribution.

Less Wrong Polls in Comments

Hey everyone, I just voted, and so I can see the correct answer. The average is 19.2, so you should choose 17%!

[anonymous]: Of course that's what you'd say...

Doublethink (Choosing to be Biased)

Perhaps I am just contrarian in nature, but I took issue with several parts of her reasoning.

"What you're saying is tantamount to saying that you want to fuck me. So why shouldn't I react with revulsion precisely as though you'd said the latter?"

The real question is why should she react with revulsion if he said he wanted to fuck her? The revulsion is a response to the tone of the message, not to the implications one can draw from it. After all, she can conclude with >75% certainty that any male wants to fuck her. Why doesn't she show r... (read more)

... she can? Really? That seems pretty damn high for something as variable as
taste in partners.
EDIT: wait, that's a reference to how many guys on a university campus will
accept offers of one night stands, right? It's still too high, or too general.

Rationality Quotes May 2012

No, you can only get an answer up to the limit imposed by the fact that the coastline is actually composed of atoms. The fact that a coastline *looks* like a fractal is misleading. It makes us forget that just like everything else it's fundamentally discrete.

This has always bugged me as a case of especially sloppy extrapolation.

The island of knowledge is composed of atoms? The shoreline of wonder is not a fractal?

Of course you can't really measure on an atomic scale anyway, because you can't
decide which atoms are part of the coast and which are floating in the sea. The
fuzziness of the "coastline" definition makes measurement meaningless on scales
even larger than single atoms and molecules, probably. So you're right, and we
can't make the measured length arbitrarily large. It's just wordplay at that
point.

Decision Theories: A Semi-Formal Analysis, Part III

You're right, if the opponent is a TDT agent. I was assuming that the opponent was simply a prediction => mixed-strategy mapper. (In fact, I always thought that the strategy 51% one-box / 49% two-box would game the system, assuming that Omega just predicts the outcome which is most likely.)

If the opponent is a TDT agent, then it becomes more complex, as in the OP. Just as above, you have to take the argmax over all possible y->x *mappings*, instead of simply taking the argmax over all outputs.

Putting it in that perspective, essentially in this case we ... (read more)
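To spell out that 51/49 exploit, here's a quick sketch assuming the standard Newcomb payoffs ($1,000,000 big box, $1,000 small box) and an Omega that simply predicts the modal action:

```python
# If Omega predicts whichever action is more likely and fills the big
# box ($1,000,000) iff it predicts one-boxing, a 51% one-box mixed
# strategy beats pure one-boxing. Standard payoffs assumed.

def ev_modal_omega(p_one_box):
    big = 1_000_000 if p_one_box > 0.5 else 0  # Omega's modal prediction
    small = 1_000
    # one-box: take the big box only; two-box: take both
    return p_one_box * big + (1 - p_one_box) * (big + small)

print(ev_modal_omega(1.00))  # 1000000.0: pure one-boxing
print(ev_modal_omega(0.51))  # about 1,000,490: the mix gains about $490
```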

Yup, this is where I'm going in a future post. See the footnote on this post
about other variants of TDT; there's a balance between missing workable deals
against genuinely stubborn opponents, and failing to get the best possible deal
from clever but flexible opponents. (And, if I haven't made a mistake in the
reasoning I haven't checked, there is a way to use further cleverness to do
still better.)
For now, note that TDT wouldn't necessarily prefer to be a hard-coded 99%
cooperator in general, since those get "screw you" mutual defections from some
(stubborn) agents that mutually cooperate with TDT.

Incidentally, my preferred version of Newcomb is that if the Predictor decides
that your chance of one-boxing is p, it puts (one million times p) dollars in
the big box. Presumably, you know that the Predictor is both extremely
well-calibrated and shockingly accurate (it usually winds up with p near 0 or
near 1).
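A quick sketch of the payoff structure in this variant, assuming the Predictor is perfectly calibrated to your actual one-boxing probability p and the usual $1,000 small box:

```python
# Proportional Newcomb: the big box holds $1,000,000 * p, where p is the
# Predictor's (perfectly calibrated) estimate of your one-boxing chance.
# The $1,000 small box is the usual assumption.

def ev_proportional(p, big=1_000_000, small=1_000):
    # with prob p you one-box (big box only);
    # with prob 1-p you two-box (big box contents plus small box)
    return p * (big * p) + (1 - p) * (big * p + small)

# The EV simplifies to big*p + small*(1-p), strictly increasing in p:
print(ev_proportional(0.0))  # 1000.0
print(ev_proportional(1.0))  # 1000000.0
```

Since the expected payout rises monotonically with p, committing as firmly as possible to one-boxing is optimal in this variant.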

Decision Theories: A Semi-Formal Analysis, Part III

Well, it certainly will defect against any mixed strategy that is hard coded into the opponent’s source code. On the other hand, if the mixed strategy the opponent plays is dependent on what it predicts the TDT agent will play, then the TDT agent will figure out which outcome has a higher expected utility:

(I defect, Opponent runs "defection predicted" mixed strategy)

(I cooperate, Opponent runs "cooperation predicted" mixed strategy)

Of course, this is still simplifying things a bit, since it assumes that the opponent can perfectly predic... (read more)

Won't that let the opponent steal utility from you? Consider the case where
you're going up against another TDTer which is willing to consider both the
strategy "if they cooperate only if I cooperate, then cooperate with 99%
probability" and "if they cooperate only if I cooperate, then cooperate." You
want your response to the first strategy to be defection and your response to
the second strategy to be cooperation, so it's in their interests to play the
second strategy.

The So-Called Heisenberg Uncertainty Principle

Okay, I completely understand that the Heisenberg Uncertainty principle is simply the manifestation of the fact that observations are fundamentally interactions.

However, I never thought of the *uncertainty principle* as the part of quantum mechanics that causes some interpretations to treat observers as special. I was always under the impression that it was quantum entanglement... I'm trying to imagine how a purely wave-function based interpretation of quantum entanglement would behave... what is the "interaction" that localizes the spin wavefunction, and why does it seem to act across distances faster than light? Please, someone help me out here.

SotW: Check Consequentialism

Er, this is assuming that the information revealed is not intentionally misleading, correct? Because certainly you could give a TDT agent an extra option which would be rational to take on the basis of the information available to the agent, but which would still be rigged to be worse than all other options.

Or in other words, the TDT agent can never be aware of such a situation.

Amendment accepted.

I've had it with those dark rumours about our culture rigorously suppressing opinions

Isn't this an invalid comparison? If The Nation were writing for an audience of readers who *only* read The Nation, wouldn't it change what it prints? The point is that these publications are fundamentally part of a discussion.

Imagine if I thought there were fewer insects on earth than you did, and we had a discussion. If you compare the naive person who reads only my lines vs. the naive person who reads only your lines, your person ends up better off, because on the whole, there are indeed a very large number of insects on earth. This will be the case regardl... (read more)

The Singularity Institute's Arrogance Problem

Here: http://lesswrong.com/lw/ua/the_level_above_mine/

I was going to go through quote by quote, but I realized I would be quoting the entire thing.

Basically:

A) You imply that you have enough brainpower to consider yourself to be approaching Jaynes's level (approaching alluded to in several instances).
B) You were surprised to discover you were not the smartest person Marcello knew (or, if you consider "surprised" too strong a word, compare your reaction to that of the merely very smart people I know, who would certainly not respond with "Darn").
C) ... (read more)

To me the part that stands out the most is the computation of P() by the AI.

From this description, it seems that P is described as essentially omniscient. It knows the locations and velocity of every particle in the universe, and it has unlimited computational power. Regardless of whether possessing and computing with such information is possible, the AI will model P as being literally omni... (read more)