Here are the relevant quotes:
- Gather proposals for a hundred RCTs ...
- Randomly pick 5% of the proposed projects, fund them as written, and pay off the investors who correctly predicted what would happen.
- Take the other 95% of the proposed projects, give the investors their money back, and use the SWEET PREDICTIVE KNOWLEDGE [to take useful actions]
Other than the difference in the proportion of the markets you actually run (1/20 vs 1/1000), this is equivalent.
(It does not discuss liquidity costs, just the randomization as a way to avoid having to take many random actions.)
Just for the record, Dynomight proposed this back in 2022: https://dynomight.net/prediction-market-causation/#commit-to-randomization. (I assume that the idea has been around for longer.)
(Also I would phrase it as being able to use the same money to trade on all 1000 of the markets at once. I think that is equivalent to your free loan.)
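A minimal sketch of that capital-reuse point (the settlement rule and numbers here are my own illustration, not from either proposal):

```python
import random

N = 1000       # conditional markets, one of which will be executed
STAKE = 100.0  # collateral posted once, backing positions in all N markets

def settle(positions: list) -> float:
    """positions[i] = profit/loss in market i if that market executes."""
    executed = random.randrange(len(positions))
    # Every non-executed market refunds its stake, so only one position
    # ever touches the collateral.
    return positions[executed]

positions = [random.uniform(-STAKE, STAKE) for _ in range(N)]
# The worst case over all randomizations is a single market's maximum
# loss, which is why the same $100 can back all 1000 positions at once
# (the "free loan" framing above).
print("worst case:", min(positions), "this run:", settle(positions))
```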
I was able to deduce them by making a scatter-plot of Colleen's vs Liboulen's predictions. You can see that this plot has the points on a "flattened prism" in 3 directions, and by manually counting the shifts you can see that each of the underlying components has 10 possible values.
Once you have that structure, you can pick out points on the extremes and use them to calculate some of the relevant slopes. Finally, I brought in Bella's info and used that to work out the remaining stats. (I used ChatGPT for some help throwing together some linear regressions.)
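Roughly how the scatter-plot step goes, as a fully synthetic sketch (the component structure here is my guess at the kind of thing going on, since I'm not attaching the dataset):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in: each prediction is a sum of shared integer
# components (1-10) plus a little noise, which produces banded structure.
rng = np.random.default_rng(0)
a, b, c = (rng.integers(1, 11, 2000) for _ in range(3))
colleen = a + b + rng.normal(0, 0.2, 2000)
liboulen = a + c + rng.normal(0, 0.2, 2000)

plt.scatter(colleen, liboulen, s=4)  # banded "prism"-like lattice
plt.xlabel("Colleen")
plt.ylabel("Liboulen")
plt.show()

# Slope estimate from an extreme band (e.g. the lowest 5% of Liboulen):
band = liboulen < np.percentile(liboulen, 5)
print(np.polyfit(colleen[band], liboulen[band], 1))
```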
At this point I am throwing everything that I found into a linear regression, because I ran out of time. My pick is:
Candidate 11, with an estimated 0.91 chance of success.
Candidates 19 and 7 would be my next choices, with 0.87 and 0.85 estimated chances of success respectively.
If I had had more time to work on this, I would have liked to look at:
A summary of some interesting results. I am leaving out how I found some of this for now, for brevity's sake.
I have managed to extract 6 integer variables that range from 1 to 10.
3 of them are from the components of (Colleen, Linestra, Liboulen, Bella); the other 3 are from (Fizz, Ister, Ziqual).
Each of them has a very similar histogram, sort of like a truncated normal distribution. A linear regression of them against Holly gives each a coefficient of approximately 1, except for one variable (which I am calling X2 for now), which has a coefficient of roughly -1 (see the sketch below).
All of
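A self-contained sketch of that regression, with synthetic stand-ins for the six extracted components (since I'm not reproducing the actual dataset here):

```python
import numpy as np

# Six integer components in 1..10; Holly modeled as their sum,
# except X2 enters with a negative sign (per the observation above).
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(5000, 6)).astype(float)
signs = np.array([1.0, -1.0, 1.0, 1.0, 1.0, 1.0])  # X2 flips sign
holly = X @ signs + rng.normal(0, 0.5, size=5000)  # small noise

coef, *_ = np.linalg.lstsq(X, holly, rcond=None)
print(np.round(coef, 2))  # ~[ 1. -1.  1.  1.  1.  1.]
```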
A few miscellaneous observations (a small consistency-check sketch follows the list):
- Ister, Ziqual and Fizz seem to have some pretty deterministic structure connecting them.
- Ister always predicts an integer between 51 and 60 inclusive.
- Ziqual's prediction is equal to (Ister - 50) * (an integer from 1 to 10) - (one of 0, 1). Multipliers in the 5 to 7 range are most common.
- Fizz's prediction is less than or equal to (Ister's prediction + 10), and greater than or equal to 44.
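Here is my formalization of those observations as a per-row checker (the function and its signature are my own, just to make the structure explicit):

```python
def consistent(ister: int, ziqual: int, fizz: int) -> bool:
    """Check one row of predictions against the observed structure."""
    if not 51 <= ister <= 60:
        return False
    # Ziqual = (Ister - 50) * m - b for some m in 1..10 and b in {0, 1}
    ziqual_ok = any(
        ziqual == (ister - 50) * m - b
        for m in range(1, 11)
        for b in (0, 1)
    )
    fizz_ok = 44 <= fizz <= ister + 10
    return ziqual_ok and fizz_ok

print(consistent(ister=55, ziqual=30, fizz=60))  # (55-50)*6 - 0 = 30 -> True
```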
Separately, a scatterplot of Liboulen and Colleen's predictions has a lot of structure. [Scatterplot removed.]
Story of a mostly homeless guy who scammed Isaac King out of $300. Isaac sued in small claims court on principle, did all the things, and none of it mattered.
This link goes to Sarah's tweet, not to Isaac's story.
This is not what the article says. It says that BC is re-criminalizing hard drugs.
I am in BC, and have not heard anything about decriminalizing marijuana. I get the sense that it being legal is generally popular. Complaints about drug users are common here, but they are usually not talking about weed.
I would expect that player 2 would be able to win almost all of the time for most normal hash functions, as they could just play randomly for the first 39 turns, and then choose one of the 2^8 available moves. It is very unlikely that all of those hashes are zero. (For commonly used hashes, player 2 could just play randomly the whole game and likely win, since the hash of any value is almost never 0.)
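To make that concrete, here is a toy version of the strategy; the game encoding (one byte per move, 40 moves, SHA-256 as the hash, player 2 winning on a nonzero digest) is my own invention for illustration:

```python
import hashlib
import secrets

# First 39 moves played at random, one byte each.
transcript = secrets.token_bytes(39)

def digest_is_zero(b: bytes) -> bool:
    return hashlib.sha256(b).digest() == bytes(32)  # 32 zero bytes

# Player 2's final move: try all 2**8 candidate bytes and keep any whose
# full-transcript hash is nonzero. For a hash behaving randomly,
# P(a single digest is all zeros) ~ 2**-256, so losing all 256 tries
# is absurdly unlikely.
winning = [m for m in range(256) if not digest_is_zero(transcript + bytes([m]))]
print(len(winning), "of 256 final moves win")
```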
In addition to the object level reasons mentioned by plex, misleading people about the nature of a benchmark is a problem because it is dishonest. Having an agreement to keep this secret indicates that the deception was more likely intentional on OpenAI's part.
Based on the quote from Kirkpatrick, it looks like a clear example of preference falsification, but I do not see any reason to believe that it is internalized preference falsification. Did I miss how the submissive apes were internalizing the preference to not mate? The sentence "This is an easy to understand example of an important general fact about humans: we can be threatened into internalized preference falsification, i.e. preference inversion." makes me think that you intended it as an example of primates internalizing a preference falsification. It ...
As an example of how Manifold reacted to a (crude) attempt at manipulation:
Dr. P (a Manifold user) would create and bet yes on markets for "Will Trump be president on [some date]?" for various dates when there was no plausible way Trump would be president. Other users quickly noticed and set up limit orders to capture this source of free money. Eventually Dr. P's bets were cancelled out quickly enough that they had little to no effect on the probability, and it became hard to find one of those bets to profit from. In the end, Dr. P gave up and their account bec...
One thing that I have seen on Manifold is markets that resolve at a random time, with a distribution such that at any time, the expected time until resolution (from the current day, conditional on not having already resolved) is 6 months. They do not seem particularly common, and are not quite equivalent to a market with a deadline exactly 6 months in the future. (I can't seem to find the market.)
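A minimal sketch of how such a resolution rule can work, assuming a daily resolution check and 30-day months (numbers mine):

```python
import random

P = 1 / 180  # daily chance of resolving; geometric, hence memoryless

def days_until_resolution() -> int:
    days = 0
    while random.random() >= P:
        days += 1
    return days

samples = [days_until_resolution() for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~180 days, i.e. ~6 months

# Conditional on surviving the first 90 days, the expected *remaining*
# time is still ~180 days -- which is what makes this different from a
# market with a fixed deadline 6 months out.
survived = [s - 90 for s in samples if s >= 90]
print(sum(survived) / len(survived))
```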
The timing evidence is thus hostile evidence and updating on it correctly requires superintelligence.
What do you mean by this? It seems trivially false that updating on hostile evidence requires superintelligence; for example, poker players will still use their opponents' bets as evidence about their cards, even though those bets are frequently trying to mislead them in some way.
The evidence being from someone who went against the collective desire does mean that confidently taking it at face value is incorrect, but not that we can't update on it.
The LW staff are necessary to take down the site. If we assume that there are multiple users who are willing to press the button, then the (Shapley-attributed) blame for taking the site down mostly falls on the LW staff, rather than on whoever happens to press the button first.
According to http://shapleyvalue.com/?example=8, if there were 6 people who were willing to push the button, the LW team would deserve 85% of the blame. (Here I am counting the people whose actions facilitate bringing down the site as part of the coalition.)
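That figure checks out with a brute-force Shapley computation; the sketch below (my own) models the game as "the site goes down iff the staff leave the button up and at least one of the 6 willing users presses it":

```python
from itertools import permutations

players = ["staff"] + [f"pusher{i}" for i in range(6)]

def v(coalition: set) -> int:
    # The site goes down iff the staff are in the coalition and at
    # least one willing button-pusher is present.
    return int("staff" in coalition and any(p != "staff" for p in coalition))

shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    seen = set()
    for p in order:
        before = v(seen)
        seen.add(p)
        shapley[p] += v(seen) - before
for p in shapley:
    shapley[p] /= len(orders)

print(round(shapley["staff"], 3))  # 6/7 ~ 0.857, matching the ~85% above
```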
I am not qu...
Here is an example of something that comes close from "The Selfish Gene":
...One of the best-known segregation distorters is the so-called t gene in mice. When a mouse has two t genes it either dies young or is sterile. t is therefore said to be lethal in the homozygous state. If a male mouse has only one t gene it will be a normal, healthy mouse except in one remarkable respect. If you examine such a male's sperms you will find that up to 95 per cent of them contain the t gene, only 5 per cent the normal allele. This is obviously a gross distortion of the 50 ...
I had not thought of self-play as a form of recursive self-improvement, but now that you point it out, it seems like a great fit. Thank you.
I had been assuming (without articulating the assumption) that any recursive self improvement would be improving things at an architectural level, and rather complex (I had pondered improvement of modular components, but the idea was still to improve the whole model). After your example, this assumption seems obviously incorrect.
AlphaGo was improving its training environment, but not any other part of the training process.
The left hand side of the example is deliberately making the mistake described in your article, as a way to build intuition on why it is a mistake.
(Adding instead of averaging in the update summaries was an unintended mistake.)
Thanks for explaining how to summarize updates, it took me a bit to see why averaging works.
When I first saw the equations, it was hard to intuitively grasp why updates work this way. This example made things more intuitive for me:
If an event can have 3 outcomes, and we encounter strong evidence against outcomes B and C, then the update looks like this:
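(Illustrative numbers mine.) With a uniform prior and likelihoods $P(E \mid A) = 1$, $P(E \mid B) = P(E \mid C) = 0.01$:

$$P(A \mid E) = \frac{P(E \mid A)\,P(A)}{\sum_{X \in \{A,B,C\}} P(E \mid X)\,P(X)} = \frac{1 \cdot \frac{1}{3}}{(1 + 0.01 + 0.01) \cdot \frac{1}{3}} \approx 0.98$$

so the posterior over (A, B, C) is roughly (0.98, 0.01, 0.01).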
The information about what hypotheses are in the running is important, and pooling the updates can make the evidence look much weaker than it is.
The figure you are referring to does not need to add up to 100%, since it is showing P[data | aliens] and P[data | no aliens].
P[data | aliens] and P[not data | aliens] need to add to 100%, but that is not on the graph.
As an extreme case where P[A | B] + P[A | C] != 1, consider A = coin did not land on its edge, B = the coin is ordinary, C = the coin is weighted to land heads twice as often as tails.
Then P[A | B] = 0.9999 and P[A | C] = 0.9999 would be reasonable values.