How would you answer this without looking at the csv?
I wrote a post on my prior over Bernoulli distributions, called "Rethinking Laplace's Law of Succession". Laplace's Law of Succession is based on a uniform prior over [0,1], whereas my prior is based on the following mixture distribution:
w1 * logistic-normal(0, sigma^2) + w2 * 0.5(dirac(0) + dirac(1)) + w3 * thomae_{100}(α) + w4 * uniform(0,1)
where:
- The first term captures logistic transformations of normal variables (weight w1), resolving the issue that probabilities should be spread across log-odds
- The second term captures deterministic programs (weight w2), allowing for exactly zero and one
- The third term captures rational probabilities with simple fractions (weight w3), giving weight to simple ratios
- The fourth term captures uniform interval (weight w4), corresponding to Laplace's original prior
The default parameters (w1=0.3, w2=0.1, w3=0.3, w4=0.3, sigma=5, alpha=2) reflect my intuition about the relative frequency of these different types of programs in practice.
Using this prior, we get the result [0.106, 0.348, 0.500, 0.652, 0.894]
The numbers are predictions for P(5th trial = R | k Rs observed in first 4 trials).
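A minimal numerical sketch of how a posterior predictive under this mixture prior could be computed. The exact definition of the thomae_100(α) term is my assumption (weight b^(-α) on each reduced fraction a/b with denominator b up to 100), so the printed numbers are illustrative, not a confirmed reproduction:

```python
import numpy as np
from math import gcd, factorial

# Sketch: posterior predictive P(5th = R | k Rs in first 4 trials) under the
# mixture prior described above. The thomae_100(alpha) term is ASSUMED to put
# weight b**(-alpha) on each reduced fraction a/b with 2 <= b <= 100.
w1, w2, w3, w4 = 0.3, 0.1, 0.3, 0.3
sigma, alpha = 5.0, 2.0

def component_moments(k, n=4):
    """Return (evidence, numerator), where evidence = E[p^k (1-p)^(n-k)]
    and numerator = E[p^(k+1) (1-p)^(n-k)] under the mixture prior."""
    # logistic-normal term: integrate in log-odds space for stability
    z = np.linspace(-40.0, 40.0, 400001)
    p = 1.0 / (1.0 + np.exp(-z))
    phi = np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    dz = z[1] - z[0]
    ev1 = np.sum(p**k * (1 - p)**(n - k) * phi) * dz
    nu1 = np.sum(p**(k + 1) * (1 - p)**(n - k) * phi) * dz

    # point masses at 0 and 1 (deterministic programs)
    ev2 = 0.5 * (k == 0) + 0.5 * (k == n)
    nu2 = 0.5 * (k == n)          # only p = 1 contributes to E[p * lik]

    # assumed Thomae-style prior on simple rationals
    ev3 = nu3 = tot = 0.0
    for b in range(2, 101):
        for a in range(1, b):
            if gcd(a, b) == 1:
                q, wt = a / b, b**(-alpha)
                tot += wt
                ev3 += wt * q**k * (1 - q)**(n - k)
                nu3 += wt * q**(k + 1) * (1 - q)**(n - k)
    ev3, nu3 = ev3 / tot, nu3 / tot

    # uniform(0,1) term: exact Beta integrals
    ev4 = factorial(k) * factorial(n - k) / factorial(n + 1)
    nu4 = factorial(k + 1) * factorial(n - k) / factorial(n + 2)

    return (w1 * ev1 + w2 * ev2 + w3 * ev3 + w4 * ev4,
            w1 * nu1 + w2 * nu2 + w3 * nu3 + w4 * nu4)

def predictive(k):
    ev, nu = component_moments(k)
    return nu / ev

print([round(predictive(k), 3) for k in range(5)])
```

Because every component of the mixture is symmetric under p → 1-p, the predictive satisfies P(R | k) + P(R | 4-k) = 1, with P(R | 2) = 0.5 exactly.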
The five numbers under Laplace's Rule of Succession are [0.167, 0.333, 0.500, 0.667, 0.833], but I think this is too conservative because it underestimates the likelihood of near-deterministic processes.
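For reference, Laplace's Rule of Succession gives (k+1)/(n+2) after observing k successes in n trials, which with n = 4 reproduces the list above:

```python
# Laplace's Rule of Succession: P(next = R | k Rs in n trials) = (k+1)/(n+2)
n = 4
laplace = [(k + 1) / (n + 2) for k in range(n + 1)]
print([round(p, 3) for p in laplace])  # → [0.167, 0.333, 0.5, 0.667, 0.833]
```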
I'm not entirely sure what's being asked here. Is this asking "if we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R?"
Or is it "if we take a random experiment out of the million and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R"? This isn't the same question as the first.
Or is it something else again?
It's asking, "If I draw a histogram of the frequency of R of the fifth trial, with buckets corresponding to the number of Rs in the first four trials, what will the heights of the bars be?"
We are not doing any more experiments. All the experiments have already been done in the 1,000,000 provided experiments. I've just left out the fifth trial from these experiments.
This is almost the same question as, "If we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R," but not quite. Your goal is to predict the marginal frequencies for the experiments I have actually conducted, not any idealized "next experiment". Because 1,000,000 experiments is so many, the two should be close, but they are not quite the same. The actual marginal frequencies will have some noise, for example.
I hope this helps! If you need more explanation, feel free to ask.
Also tried this, and basically ended up with the same answer as commenter One.
Key idea is that we really only care about drawing 5 trials from this process. So we just have to find a probability distribution over 6 outcomes: a count of Rs for our 5 trials, from 0 to 5. 10^6 datapoints is enough to kill a fair amount of noise by self-averaging, so I treated the fact that hiding a random trial has to reproduce the observed 4-trial distribution as a hard constraint. (It's a linear constraint in the probabilities.) Then did maximum entropy optimization subject to that constraint. The output distribution in terms of 5-trial counts looked pretty symmetric and was heavier towards the extremes.
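A sketch of this maximum-entropy approach, using a made-up symmetric 4-trial distribution `f` in place of the real CSV frequencies (the placeholder values are purely illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the max-entropy approach described above. q[i] is the probability
# of i Rs across all 5 trials; f[j] is the observed frequency of j Rs in the
# first 4. The values in f below are a HYPOTHETICAL placeholder (the real
# ones come from the provided CSV).
f = np.array([0.25, 0.17, 0.16, 0.17, 0.25])  # placeholder, sums to 1

# Hiding a uniformly random trial maps 5-trial counts to 4-trial counts:
#   f[j] = (5-j)/5 * q[j] + (j+1)/5 * q[j+1]   -- linear in q
A = np.zeros((5, 6))
for j in range(5):
    A[j, j] = (5 - j) / 5
    A[j, j + 1] = (j + 1) / 5

def neg_entropy(q):
    q = np.clip(q, 1e-12, 1.0)
    return float(np.sum(q * np.log(q)))

res = minimize(neg_entropy, np.full(6, 1 / 6), method="SLSQP",
               bounds=[(0.0, 1.0)] * 6,
               constraints=[{"type": "eq", "fun": lambda q: A @ q - f}])
q = res.x

# Conditional frequencies P(5th = R | j Rs in first 4) follow from q and f
p_r = [(j + 1) / 5 * q[j + 1] / f[j] for j in range(5)]
print(np.round(q, 4), np.round(p_r, 4))
```

Note that sum(q) = 1 is implied by the constraint A q = f whenever f itself sums to 1, so no separate normalization constraint is needed.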
Another quick computation from these values yields the p(R | k) numbers asked for in the question: [0.11118619, 0.32422537, 0.49942029, 0.67519768, 0.88914787]
Explanation:
Hypothesis 1: The data are generated by a beta-binomial distribution, where first a probability x is drawn from a beta(a,b) distribution, and then 5 experiments are run using that probability x. I had my coding assistant write code to solve for the a,b that best fit the observed data and show the resulting distribution for that a,b. It gave (a,b) = (0.6032,0.6040) and a distribution that was close but still meaningfully off given the million experiment sample size (most notably, only .156 of draws from this model had 2 R's compared with the observed .162).
Hypothesis 2: With probability c the data points were drawn from a beta-binomial distribution, and with probability 1-c the experiment instead used p=0.5. This came to mind as a simple process that would result in more experiments with exactly 2 R's out of 4. With my coding assistant writing the code to solve for the 3 parameters a,b,c, this model came extremely close to the observed data - the largest error was .0003 and the difference was not statistically significant. This gave (a,b,c) = (0.5220,0.5227,0.9237).
I could have stopped there, since the fit was good enough so that anything else I'd do would probably only differ in its predictions after a few decimal places, but instead I went on to Hypothesis 3: the beta distribution is symmetric with a=b, so the probability is 0.5 with probability 1-c and drawn from beta(a,a) with probability c. I solved for a,c with more sigfigs than my previous code used (saving the rounding till the end), and found that it was not statistically significantly worse than the asymmetric beta from Hypothesis 2. I decided to go with this one because on priors a symmetric distribution is more likely than an asymmetric distribution that is extremely close to being symmetric. Final result: draw from a beta(0.5223485278, 0.5223485278) distribution with probability 0.9237184759 and use p=0.5 with probability 0.0762815241. This yields the above conditional probabilities out to 6 digits.
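The final model's predictions can be checked directly with Beta-function moments; this sketch computes P(5th = R | k Rs in first 4) as a ratio of prior moments E[p^(k+1) (1-p)^(4-k)] / E[p^k (1-p)^(4-k)] under the stated mixture:

```python
from math import lgamma, exp

# Check of the final model: p ~ Beta(a, a) with probability c, p = 0.5 with
# probability 1 - c, using the parameters quoted above.
a, c = 0.5223485278, 0.9237184759

def beta_fn(x, y):
    return exp(lgamma(x) + lgamma(y) - lgamma(x + y))

def moment(r, s):
    """E[p^r (1-p)^s] under the mixture prior."""
    return c * beta_fn(a + r, a + s) / beta_fn(a, a) + (1 - c) * 0.5**(r + s)

preds = [moment(k + 1, 4 - k) / moment(k, 4 - k) for k in range(5)]
print([round(p, 6) for p in preds])  # should reproduce the answer below
```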
Answer:
[0.111020, 0.324512, 0.5, 0.675488, 0.888980]
I will provide my solution when the market is resolved.
Decided to provide my solution since others have done so as well.
Solution
The public dataset is approximately symmetrical, so it is very likely that the distribution of the Bernoulli rate is also symmetrical (the probability at p equals the probability at 1-p). Let q_k, for k = 0...5, be the probability of getting k Rs over all 5 trials, with q_k = q_{5-k} by symmetry. Then, from the public dataset, the observed frequency f_j of j Rs in the first four trials satisfies f_j = ((5-j)/5) q_j + ((j+1)/5) q_{j+1}. The f_j have standard deviations on the order of 10^-4, which is negligible, so we can treat these as exact linear equations. Solving for the q_k, we can then solve for the marginal frequencies P(5th = R | j Rs in first 4) = ((j+1)/5) q_{j+1} / f_j, etc.
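A minimal sketch of this linear solve, again with a made-up symmetric 4-trial distribution standing in for the real CSV frequencies. The identity f[j] = (5-j)/5 · q[j] + (j+1)/5 · q[j+1] (from hiding one of five exchangeable trials) plus the symmetry constraints q[k] = q[5-k] pin down q uniquely:

```python
import numpy as np

# Sketch of the linear solve described above. f is a HYPOTHETICAL symmetric
# 4-trial count distribution (placeholder for the real CSV frequencies).
f = np.array([0.25, 0.17, 0.16, 0.17, 0.25])  # placeholder, sums to 1

# Rows 0-4: f[j] = (5-j)/5 * q[j] + (j+1)/5 * q[j+1]
# Rows 5-7: symmetry q[k] - q[5-k] = 0 for k = 0, 1, 2
A = np.zeros((8, 6))
b = np.zeros(8)
for j in range(5):
    A[j, j] = (5 - j) / 5
    A[j, j + 1] = (j + 1) / 5
    b[j] = f[j]
for i, k in enumerate(range(3)):
    A[5 + i, k], A[5 + i, 5 - k] = 1.0, -1.0

q, *_ = np.linalg.lstsq(A, b, rcond=None)

# Marginal frequencies P(5th = R | j Rs in first 4)
p_r = [(j + 1) / 5 * q[j + 1] / f[j] for j in range(5)]
print(np.round(q, 4), np.round(p_r, 4))
```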
Not sure if this (experiment set?) is a good test of priors, since I got an exact answer without having to consider priors, other than the data being symmetrical. (This also means that any symmetric distribution for the Bernoulli rate will result in the same answer.) Though @DaemonicSigil has a similar solution without using symmetry, instead using maximum entropy as a prior (if I understand it correctly).
Still, almost all reasonable priors will result in very similar outcomes, differing probably on the order of the standard deviation of the observed frequencies (around 10^-4). This is likely less than, or at least comparable to, the noise in the actual marginal frequencies.
You're mostly right. The other solvers have given pretty much identical distributions.
Some of your distributions are worse than other distributions. If I run 100,000,000 experiments and calculate the frequencies, some of you will be more off at the fourth decimal point.
The market doesn't have that kind of precision, and even if it did, I wouldn't change the resolution criterion. But I can still score you guys myself later on.
I do agree that I should have provided far fewer public experiments. Then it would have been a better test of priors.
You do get one guarantee, though: All the experiments are Bernoulli processes. In particular, the order of the trials is irrelevant.
I think those aren't quite equivalent statements? If I pick my favorite string of bits, and shuffle it by a random permutation, then the probability of each bit being 1 is equal, the order is totally irrelevant (it was chosen at random), but it's not Bernoulli because the trials aren't independent of each other (if you know what my favorite string of bits is, you can learn the final bit as soon as you've observed all the rest.)
That's what "in particular" means, i.e. "the order of the trials is irrelevant" is a particular feature.
Correct, they are not equivalent. The second statement is a consequence of the first. I made this consequence explicit to justify my choice later on to bucket by the number of Rs but not their order.
The first statement, though, is also true. It's your full guarantee.
To clarify, the ground truth P(R) is constrained to be constant over the 5 trials of any given experiment?
No; your distribution gives probabilities [0.253247, 0.168831, 0.155844, 0.168831, 0.253247] for the number of Rs in the first four trials. This predicts that the number of experiments with two Rs is binomially (i.e. approximately normally) distributed with mean ~155844 and standard deviation ~363, but the actual number is 161832, around 16 standard deviations away from the mean.
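The check above is a one-liner to reproduce: under the proposed distribution, the number of experiments with exactly two Rs in the first four trials is binomial with n = 1,000,000 and p = 0.155844.

```python
import math

# Binomial sanity check: mean, standard deviation, and z-score of the
# observed count of two-R experiments under the proposed distribution.
n, p = 1_000_000, 0.155844
mean = n * p
sd = math.sqrt(n * p * (1 - p))
z = (161832 - mean) / sd
print(round(mean), round(sd), round(z, 1))  # → 155844 363 16.5
```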
I have run 1,000,000 experiments. Each experiment consists of 5 trials with binary outcomes, either L (for left) or R (for right).
However, I'm not going to tell you how I've picked my experiments. Maybe I'm just flipping a fair coin each time. Maybe I'm using a biased coin. Or maybe I'm doing something completely different, like dropping a bouncy ball down a mountain and checking whether it hits a red rock or a white rock first--and different experiments are conducted on different mountains. I might be doing some combination of all three.
You do get one guarantee, though: All the experiments are Bernoulli processes. In particular, the order of the trials is irrelevant.
Your goal is to guess the marginal frequencies of the fifth trial. For each k from 0 to 4, you need to tell me the frequency with which the fifth trial is an R, given that k of the outcomes of the first four trials are R.
For example, if every experiment is just flipping a fair coin, then the fifth trial will be an R with probability 0.5, no matter what the first four are. However, if I'm using biased coins, then the frequency of R will increase the more Rs seen.
To help you in your guessing, I have provided a CSV of all the public trials. As an answer, please provide a list like [0.3, 0.4, 0.5, 0.6, 0.7] of your frequencies--the kth element of your list is the marginal frequency over the experiments with k of the first four trials being R.
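As a toy illustration of what is being asked, here is the computation run on simulated data where every experiment is 5 fair-coin flips (one of the possibilities mentioned above); in that case all five answers should come out near 0.5:

```python
import numpy as np

# Toy demonstration of the task on SIMULATED data: 1,000,000 experiments of
# 5 fair-coin flips each (1 = R, 0 = L). Real answers come from the CSV.
rng = np.random.default_rng(0)
trials = rng.integers(0, 2, size=(1_000_000, 5))

k = trials[:, :4].sum(axis=1)   # number of Rs among the first four trials
fifth = trials[:, 4]
answer = [fifth[k == j].mean() for j in range(5)]
print([round(a, 3) for a in answer])
```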
I haven't yet looked at the frequencies myself, but I will do so shortly after posting this. If you want to test your guesses against others, I have created a market on Manifold Markets. I will resolve the market before I reveal the correct frequencies, which will happen in around two weeks, but maybe earlier or later depending on trading volume.
Good luck!