DavidS

Newcomblike problems are the norm

"Naively in the actual Newcombe's problem if omega is only correct 1/999,000+epsilon percent of the time…"

I'd like to argue with this by way of a parable. The eccentric billionaire, Mr. Psi, invites you to his mansion for an evening of decision theory challenges. Upon arrival, Mr. Psi's assistant brings you a brandy and interviews you for hours about your life experiences, religious views, favorite philosophers, ethnic and racial background … You are then brought into a room. In front of you is a transparent box with a $1 bill in it, and an opaque box. Mr. Psi explains:

"You may take just the solid box, or both boxes. If I predicted you take one box, then that box contains $1000, otherwise it is empty. I am not as good at this game as my friend Omega, but out of my last 463 games, I predicted "one box" 71 times and was right 40 times out of 71; I picked "two boxes" 392 times and was right 247 times out of 392. To put it another way, those who one-boxed got an average of (40*$1000+145*$0)/185 = $216 and those who two-boxed got an average of (31*$1001+247*$1)/278=$113. "

So, do you one-box?

"Mind if I look through your records?" you say. He waves at a large filing cabinet in the corner. You read through the volumes of records of Mr. Psi's interviews, and discover his accuracy is as he claims. But you also notice something interesting (ROT13): Ze. Cfv vtaberf nyy vagreivrj dhrfgvbaf ohg bar -- ur cynprf $1000 va gur obk sbe gurvfgf naq abg sbe ngurvfgf. link.

Still willing to say you should one-box?

By the way, if it bothers you that the odds of $1000 are less than 50% no matter what, I also could have made Mr. Psi give money to 99/189 one boxers (expected value $524) and only to 132/286 two boxers (expected value $463) just by hfvat gur lrne bs lbhe ovegu (ROT13). This strategy has a smaller difference in expected value, and a smaller success rate for Mr. Psi, but might be more interesting to those of you who are anchoring on $500.

Explanations for Less Wrong articles that you didn't understand

A few years ago, I tried to write a friendly introduction to this technical part.

[Open Thread] Stupid Questions (2014-02-17)

The grammar of the sentence is a bit hard to follow. When I am presenting this paradox to friends (I have interesting friends), I hand them a piece of paper with the following words on it:

Take another piece of paper and copy these words:

"Take another piece of paper and copy these words: "QQQ" Then replace the three consecutive capital letters with another copy of those words. The resulting paragraph will make a false claim."

Then replace the three consecutive capital letters with another copy of those words. The resulting paragraph will make a false claim.

I urge you to carry out the task. You should wind up with a paper that has the exact same words on it as the paper I gave you.

If you believe that the statement on my paper is true, then you should believe that the statement on your paper is false, and vice versa. Yet they are the same statement! Assuming that you think truth or falsehood is a property of grammatical sentences, independent of where they are written, this should bother you. Moreover, unlike the standard liar paradox, the paper I gave never talks about itself, it only talks about a message you will write on some other piece of paper (which does not, in turn, talk about the original message) when you perform some simple typographical operations.

Quine constructed this example to demonstrate the sort of subtleties that come up in order to invent a mathematical formalism that can talk about truth, and can talk about manipulating symbols, without bringing in the liar paradox. (To learn how this problem is solved, take a course on mathematical logic and Goedel's theorem.)

A Fervent Defense of Frequentist Statistics

Well, I was trying to make the simplest possible example. We can of course add the monkey to our pool of experts. But part of the problem of machine learning is figuring out how long we need to watch an expert fail before we go to the monkey.

A Fervent Defense of Frequentist Statistics

Suppose there are two experts, and two horses. Expert 1 always predicts horse A, expert 2 always predicts horse B, the truth is that the winning horse cycles ABABABABABA... The frequentist randomizes choice of expert according to weights; the Bayesian always chooses the expert who currently has more successes, and flips a coin when the experts are tied. (Disclaimer: I am not saying that this is the only possible strategies consistent with these philosophies, I am just saying that that these seem like the simplest possible instantiations of "when I actually choose which person to follow on a given round, I randomize according to my weights, whereas a Bayesian would always want to deterministically pick the person with the highest expected value.")

If the frequentist starts with weights (1,1), then we will have weights (1/2^k, 1/2^k) half the time and (1/2^k, 1/2^{k+1}) half the time. In the former case, we will guess A and B with equal probability and have a success rate of 50%; in the latter case, we will guess A (the most recent winner) with probability 2/3 and B with probability 1/3, for a success rate of 33%. On average, we achieve 5/12 = 41.7% success. Note that 0.583/0.5 = 1.166... < 1.39, as predicted.

On half the other horses, expert 1 has one more correct guess than expert 2, so the Bayesian will lose everyone of these bets. In addition, the Bayesian is guessing at random on the other horses, so his or her total success rate is 25%. Note that the experts are getting 50% success rates. Note that 0.75/0.5 = 1.5 < 2.41, as we expect.

As is usually the case, reducing the penalty from 1/2 to (for example) 0.9, would yield to slower convergence but better ultimate behavior. In the limit where the penalty goes to 1, the frequentist is achieving the same 50% that the "experts" are, while the Bayesian is still stuck at 25%.

Now, of course, one would hope that no algorithm would be so Sphexish as to ignore pure alternation like this. But the point of the randomized weighted majority guarantee is that it holds (up to the correctness of the random number generator) regardless of how much more complicated reality may be than the experts' models.

Even Odds

I thought it was interesting too. As far as I can tell, your result is special to the situation of two bettors and two events. The description I gave describes a betting method when there are more than two alternatives, and that method is strategy proof, but it is not fair, and I can't find a fair version of it.

I am really stumped about what to do when there are three people and a binary question. Naive approaches give no money to the person with the median opinion.

Even Odds

Here is another attempt to present the same algorithm, with the goal of making it easier to memorize:

"Each puts in the square of their surprise, then swap."

To spell this out, I predict that some event will happen with probability 0.1, you say it is 0.25. When it happens, I am 0.9 surprised and you are only 0.75 surprised. So I put down (0.9)^2 * D, you put down (0.75)^2 * D, and we swap our piles of money. Since I was more surprised, I come out the loser on the deal.

"Square of the surprise" is a quantity commonly used to measure the failure rate of predicative agents in machine learning; it is also known as Brier score. So we could describe this rule as "each bettor pays the other his or her Brier score." There was some discussion of the merits of various scoring systems in an earlier post of Coscott's.

Mental Context for Model Theory

The operations on a Rubix cube aren't abelian. Is that just a typo on your part, or am I missing some subtle point you are making?

I'm not sure what you are getting at when you say you don't want to found math on sets. I definitely intended to use the word "set" in a naive sense, so that it is perfectly fine for sets to contain numbers, or rotations of a Rubix cube, or for that matter rocks and flowers. I wasn't trying to imply that the elements of a model had to be recursively constructed from the nullset by the axioms of ZFC. If you prefer "collection of things", I'd be glad to go with that. But I (and more to the point, model theorists) do want to think of a model as a bunch of objects with functions that take as inputs these objects and make other objects, and relations which do and do not hold between various pairs of the objects.

I'm retracting a bunch of the other things I wrote after this because, on reflection, the later material was replying to a misreading of what you wrote in your following post. I still think your de-emphasis on the fact that the model is the universe is very confusing, especially when you then talk about the cardinality of models. (What is the cardinality of a rule for assigning truth values?) But on careful reading, you weren't actually saying something wrong.

Mental Context for Model Theory

The reals can be studied as models of many theories. They (with the operation +, relation = and element 0) are a model of the axioms of an abelian group. They are also a model of the axioms of a group. The reals with (+, *, 0, 1, =) are a model of the axioms of a field. The reals with (+, *, 0, 1, =, <) are a model of the axioms of an ordered field. Etcetera...

Models are things. Theories are collections of statements about things. A model can satisfy many theories; a theory can have many models. I agree completely with So8res statement that it is important to keep the two straight.

In addition, your example of Dedekind completeness is an awkward one because the Dedekind completeness axiom is a good example of the kind of thing you can't say in first order logic. (There are partial ways around this, but I'm trying to keep my replies on the introductory level of this post.) But I can just imagine that you had distinguished the reals and the rationals by saying that, in R, ∃ x : x^2=1+1 is true and in Q it is false, so I don't need to focus on that.

I am curious what kind of analysis you plan to run on the calibration questions. Obvious things to do:

For each user, compute the correlation between their probabilities and the 0-1 vector of right and wrong answers. Then display the correlations in some way (a histogram?).

For each question, compute the mean (or median) of the probability for the correct answers and for the wrong answers, and see how separated they are.

But neither of those feels like a really satisfactory measure of calibration.