I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.

I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.

But can that possibly work? How can someone who isn't already highly accurate identify other people who are highly accurate?

Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree. But it doesn't say that doing so *helps*. Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?

To find out, I built a model of updating in response to the opinions of others. It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians. But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating. It depends only on a very simple condition, established at the start of the simulation. Can you guess what it is?

I'll write another post describing and explaining the results if this post receives a karma score over 10.

That's getting a bit ahead of ourselves, though. This post models only non-Bayesians, and the results are very different.

Here's the model:

- There are G people in a group such as LessWrong.
- There are N problems being discussed simultaneously.
- Problems are binary problems, with an answer of either 1 or 0.
- Each person's opinion on each problem is always known to all people.
- Each person i has an *accuracy*: their probability p_{i} of getting any arbitrary problem correct on the first guess.
- g_{ivt} is what person i believes at time t is the answer to problem v (1 or 0).
- p_{ij} expresses person i's estimate of the probability that an arbitrary belief of person j is correct.
- Without loss of generality, assume the correct answer to every problem is 1.

Algorithm:

    # Loop over T timesteps
    For t = 0 to T-1 {
        # Loop over G people
        For i = 0 to G-1 {
            # Loop over N problems
            For v = 0 to N-1 {
                If (t == 0)
                    # Special initialization for the first timestep
                    If (random in [0..1] < p_{i}) g_{ivt} := 1; Else g_{ivt} := 0
                Else {
                    # Product over all j of the probability that the answer to v is 1 given j's answer and estimated accuracy
                    m1 := ∏_{j} [ p_{ij}g_{jv(t-1)} + (1-p_{ij})(1-g_{jv(t-1)}) ]
                    # Product over all j of the probability that the answer to v is 0 given j's answer and estimated accuracy
                    m0 := ∏_{j} [ p_{ij}(1-g_{jv(t-1)}) + (1-p_{ij})g_{jv(t-1)} ]
                    p1 := m1 / (m0 + m1) # Normalize
                    If (p1 > .5) g_{ivt} := 1; Else g_{ivt} := 0
                }
            }
            # Loop over G other people
            For j = 0 to G-1
                # Compute person i's estimate of person j's accuracy
                p_{ij} := { Σ_{s in [0 .. t]} Σ_{v in [s..N]} [ g_{ivt}g_{jvs} + (1-g_{ivt})(1-g_{jvs}) ] } / N
        }
    }

p1 is the probability that agent i assigns to problem v having the answer 1. Each term p_{ij}g_{jv(t-1)} + (1-p_{ij})(1-g_{jv(t-1)}) is the probability of problem v having answer 1 computed using agent j's beliefs: it equals the probability that j is correct (if j believes the answer is 1), or the probability that j is wrong (if j believes the answer is 0). Agent i assumes that everyone's opinions are independent, and multiplies all these probabilities together. The result, m1, is very small when there are very many agents (m1 is on the order of .5^{G}), so it is normalized by computing a similar product m0 for the probability that v has answer 0, and setting p1 = m1 / (m0 + m1).
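As a rough illustration of the m1/m0 computation just described, here is a minimal Python sketch. It is not the Perl script; the function name `prob_answer_is_one`, the clamping constant, and the use of log-space sums (to sidestep the underflow that tiny products cause when G is large) are all assumptions made for this example.

```python
import math

def prob_answer_is_one(beliefs, trust):
    """Return p1 = m1 / (m0 + m1) for a single problem.

    beliefs[j] is person j's previous answer (0 or 1).
    trust[j] is the observer's estimate p_ij of j's accuracy.
    """
    log_m1 = 0.0
    log_m0 = 0.0
    for g, p in zip(beliefs, trust):
        # Assumption: clamp p away from 0 and 1 so the logs stay finite.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        term_one = p if g == 1 else 1.0 - p  # j's term in the m1 product
        log_m1 += math.log(term_one)
        log_m0 += math.log(1.0 - term_one)   # j's term in the m0 product
    # p1 = m1 / (m0 + m1), evaluated as a numerically stable logistic.
    diff = log_m1 - log_m0
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)
```

For two people with beliefs `[1, 0]` and trust `[0.8, 0.6]`, m1 = 0.8 × 0.4 = 0.32 and m0 = 0.2 × 0.6 = 0.12, so p1 = 0.32 / 0.44.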

The sum of sums to compute p_{ij} (i's opinion of j's accuracy) computes the fraction of problems, summed over all previous time periods, on which person j has agreed with person i's current opinions. It sums over previous time periods because otherwise, p_{ii} = 1. By summing over previous times, if person i ever changes its mind, that will decrease p_{ii}. (The inner sum starts from s instead of 0 to accommodate an addition to the model that I'll make later, in which the true answer to problem *t* is revealed at the end of time t. Problems whose answer is public knowledge should not be considered in the sum after the time they became public knowledge.)
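The agreement count can be sketched in Python as follows. This is an illustration only: the name `estimate_accuracy` is made up, the sketch ignores the refinement of skipping already-revealed problems, and it divides by the number of comparisons actually made, which is an assumption about the intended normalization rather than a transcription of the pseudocode's division by N.

```python
def estimate_accuracy(my_current, their_history):
    """Fraction of j's recorded answers that match i's current answers.

    my_current[v] is person i's current answer to problem v (0 or 1).
    their_history[s][v] is person j's answer to problem v at timestep s.
    """
    agree = 0
    total = 0
    for past_answers in their_history:
        for mine, theirs in zip(my_current, past_answers):
            # g_i * g_j + (1 - g_i) * (1 - g_j) equals 1 exactly when they match.
            agree += mine * theirs + (1 - mine) * (1 - theirs)
            total += 1
    # Assumption: with no history at all, fall back to an uninformative 0.5.
    return agree / total if total else 0.5
```

Calling it with `their_history` set to person i's own history gives p_{ii}; it stays at 1 only as long as i has never changed an answer.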

Now, what distribution should we use for the p_{i}?

There is an infinite supply of problems. Many are so simple that everyone gets them right; many are so hard or incomprehensible that everyone performs randomly on them; and there are many, such as the Monty Hall problem, that *most* people get wrong because of systematic bias in our thinking. The range of population average performance p_{ave} on all possible problems thus falls within [0 .. 1].

I chose to model person accuracy instead of problem difficulty. I say "instead of", because you can use either person accuracy or problem difficulty to set p_{ave}. Since a critical part of what we're modeling is person i's estimate of person j's accuracy, person j should actually *have* an accuracy. I didn't model problem difficulty partly because I assume we only talk about problems of a particular level of difficulty; partly because a person in this model can't distinguish between "Most people disagree with me on this problem; therefore it is difficult" and "Most people disagree with me on this problem; therefore I was wrong about this problem".

Because I assume we talk mainly about high-entropy problems, I set p_{ave} = .5. I do this by drawing p_{i} from [0 .. 1], with a normal distribution with a mean of .5, truncated at .05 and .95. (I used a standard deviation of .15; this isn't important.)
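One way to draw such accuracies is sketched below. It assumes "truncated" means rejecting and redrawing values outside [.05, .95] (clipping to the bounds would be another reading), and the function name `draw_accuracy` is hypothetical.

```python
import random

def draw_accuracy(rng, mean=0.5, sd=0.15, low=0.05, high=0.95):
    """Draw one p_i from a normal distribution truncated to [low, high].

    Assumption: truncation is done by rejection sampling, which keeps
    the distribution symmetric around the mean.
    """
    while True:
        p = rng.gauss(mean, sd)
        if low <= p <= high:
            return p

# Example: accuracies for a group of G = 20 people, with a fixed seed.
rng = random.Random(0)
accuracies = [draw_accuracy(rng) for _ in range(20)]
```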

Because this distribution of p_{i} is symmetric around .5, there is no way to know whether you're living in the world where the right answer is always 1, or where the right answer is always 0. This means there's no way, under this model, for a person to know whether they're a crackpot (usually wrong) or a genius (usually right).

Note that these agents don't satisfy the preconditions for Aumann agreement, because they produce 0/1 decisions instead of probabilities, and because some agents are biased to perform worse than random. It's worth studying non-Bayesian agents before moving on to a model satisfying the preconditions for the theorem, if only because there are so many of them in the real world.

An important property of this model is that, if person i is highly accurate, and knows it, p_{ii} will approach 1, greatly reducing the chance that person i will change their mind about any problem. Thus, the more accurate a person becomes, the less able they are to change their minds when they are wrong - and this is not an error. It's a natural limit on the speed at which one can converge on truth.

An obvious problem is that at t=0, person i will see that it always agrees with itself, and set p_{ii} = 1. By induction, no one will ever change their mind. (I consider this evidence for the model, rather than against it.)

The question of how people ever change their mind is key to this whole study. I use one of these two additions to the model to let people change their mind:

- At the end of each timestep t, the answer to problem number t becomes mutual knowledge to the entire group. (This solves the crackpot/genius problem.)
- Each person has a maximum allowable p_{ij} (including p_{ii}).
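To see why a cap on p_{ij} lets someone change their mind, here is a small worked sketch in Python. The cap of 0.9 and the trust values of 0.7 are illustrative numbers chosen for this example, not values from the simulation.

```python
def p1(beliefs, trust):
    """p1 = m1 / (m0 + m1) for one problem, using plain products."""
    m1 = 1.0
    m0 = 1.0
    for g, p in zip(beliefs, trust):
        m1 *= p if g == 1 else 1.0 - p
        m0 *= 1.0 - p if g == 1 else p
    return m1 / (m0 + m1)

# Person 0 is the observer and believes 1; three others believe 0.
beliefs = [1, 0, 0, 0]

# With p_ii = 1, m0 is multiplied by (1 - p_ii) = 0, so p1 is stuck at 1.
uncapped = [1.0, 0.7, 0.7, 0.7]
stuck = p1(beliefs, uncapped)

# Capping every estimate at a hypothetical maximum of 0.9 lets the
# three dissenters outweigh the observer's trust in itself.
capped = [min(p, 0.9) for p in uncapped]
movable = p1(beliefs, capped)
```

In this example `stuck` is exactly 1.0, while `movable` falls below .5, so the observer would switch to answer 0.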

This model is difficult to solve analytically, so I wrote a Perl script to simulate it.

- What do you think will happen when I run the program, or its variants?
- What other variants would you like to see tested?
- Is there a fundamental problem with the model?

What matters isn't so much finding the right answer, as having the right approach.

At least as far as I'm concerned, that's the main reason to spend much time here. I don't care whether the answer to Sleeping Beauty is 1/2 or 1/3; that's a mere curio.

I care about the general process whereby you can take a vague verbal description like that, map it into a formal expression that preserves the properties that matter, and use that form to check my intuitions. That's of rather more value, since I might learn how my intuitions could mislead me in situations where that matters.

Is there any real-group analog to the answer to problem t becoming mutual knowledge to the entire group? I can't think of a single disagreement here EVER to which the answer has been revealed. Further, I don't expect much revelation until Omega actually shows up.

The Sleeping Beauty problem and the other "paradoxes" of probability are problems that have been selected (in the evolutionary sense) because they contain psychological features that cause people's reasoning to go wrong. People come up with puzzles and problems all the time, but the ones that gain prominence and endure are the ones that are discussed over and over again without resolution: Sleeping Beauty, Newcomb's Box, the two-envelope problem.

So I think there's something valuable to be learned from the fact that these problems are hard. Here a...

I would like you to publish any results you may generate with your script, and promise to upvote them even if the results do not prove anything, as long as they are presented roughly as clearly as this post is.

So... why does this post have such a low rating? Comments? I find it bewildering. If you're interested in LessWrong, you should be interested in finding out under what conditions people become less wrong.

Re: "I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy."

FWIW, I think that was how I originally approached the problem. Rather than trying to solve it directly, I first looked at your response, and Robin Hanson's response. After a little reflection, I concluded that you agreed with each other - and that you had both focused on the key issues - and got the answer right.

At that time, most of the rest o...

This is a really interesting topic; there are heaps of things I want to say about it. I was initially waiting to see what your results were first, to avoid spoilers with my guesses, but that's no way to have a conversation.

First - I think there's an error in the program: When you compute p[i][j] you take a sum then divide by N, but it looks like you should divide by the number of guesses you are adding, which can be more than N since it includes multiple rounds of guesses.

My (inconsistent) thoughts about how the model would behave:

They'd quickly learn th

The Sleeping Beauty Challenge

Maybe I'm naive, but I actually think that we can come close to consensus on the solution to this problem. This is a community of high IQ, aspiring rationalists.

I think it would be a good exercise to use what we know about rationality, evidence, biases, etc. and work this out.

I propose the following:

I will write up my best arguments in favor of the 1/2 solution. I'll keep it shorter than my original post.

Someone representing the thirders will write up their best arguments in favor of the 1/3 solution

Before reading t

I'd like a variant where there is both person accuracy p[i] and problem easiness E[j], and the odds of person i getting the correct answer initially on problem j are p[i] E[j] : (1-p[i])(1-E[j])
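Read as odds, p[i] E[j] : (1-p[i])(1-E[j]) corresponds to a probability of p[i] E[j] / (p[i] E[j] + (1-p[i])(1-E[j])). A minimal Python sketch of that conversion, with a hypothetical function name and assuming both inputs lie strictly between 0 and 1:

```python
def initial_correct_prob(p, e):
    """Probability that a person with accuracy p answers a problem of
    easiness e correctly, given odds p*e : (1-p)*(1-e).

    Assumption: 0 < p < 1 and 0 < e < 1, so the denominator is nonzero.
    """
    right = p * e
    wrong = (1.0 - p) * (1.0 - e)
    return right / (right + wrong)
```

With e = .5 this reduces to the person's accuracy p, and with p = .5 it reduces to the problem's easiness e.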

Ideally the updating procedure for this variant wouldn't treat everyone's opinions as independent, but it would also be interesting to see what happens when it mistakenly does treat them as independent.

The poll on the subject:

http://lesswrong.com/lw/28u/conditioning_on_observers/1ztb

...currently has 75% saying 1/3 and 25% saying 1/2. (12:4)

Collective intelligence in action?

In modeling Bayesians (not described here), I have the problem that saying "I assign this problem probability .5 of being true" really means "I have no information about this problem."

My original model treated that p=.5 as an estimate, so that a bunch of Bayesians who all assign p=.5 to a problem end up respecting each other more, instead of ignoring their own opinions due to having no information about it themselves.

I'm reformulating it to weigh opinions according to the amount of information they claim to have. But what's the right way to do that?

This suggestion contains a classic bootstrapping problem. If only I knew I was good at statistics, then I'd be confident of analysing this problem which tells me whether or not I'm good at statistics. But since I'm not sure, I'm not sure whether this will give me the right answer.

I think I'll stick to counting.


I'm interested in hearing others' responses to these questions:

As for this one:

As you know, that depends on what we want to use the model for. It ignores all sorts of structure in the real world, but that could end up being a feature rather than a bug.