Epistemic status: untested combination of ideas I think are individually solid. This is intended to be a quick description of some obvious ideas rather than a deeply justified novel contribution; I'll cite some things but not be especially careful to review all prior work.
Prediction markets are a simple idea, most commonly implemented as sports betting: who will win a particular game? People place bets, the outcome is determined, and then bookies pay out to the correct predictors. There are also prediction platforms for arbitrary events, such as Metaculus. Here I want to focus on how prediction markets can be useful for the process of doing science.
Why are prediction markets useful, to begin with? Let's consider the example of sports betting again. As an observer of sports, you might have some idea that one team is more likely to beat another, but you likely don't have a clear quantitative sense of how likely. And, as you could discover by listening in on conversations between observers, disagreement about those beliefs is common, often due to conflicting models, observations, and loyalties.
A shared betting market means that prices fluctuate until the available information is aggregated into a 'consensus' number that reflects overall societal uncertainty. A journalist writing about the game doesn't need to be any good at assessing how likely teams are to win, or how representative their sample of interviewed experts is; they can directly report the betting market odds as indicative of overall beliefs. [After all, if someone disagreed with the betting market odds, they would place a bet against them, which would cause the odds to move.]
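This price-moving mechanism can be made concrete with Hanson's logarithmic market scoring rule (LMSR), a standard automated market maker for prediction markets. The sketch below is illustrative only: the class name, liquidity parameter, and example trade are my own choices, not taken from any particular platform.

```python
import math

class LMSRMarket:
    """Logarithmic Market Scoring Rule market maker (Hanson).

    The price acts as a consensus probability: every bet moves it,
    so the final price aggregates the traders' information.
    """

    def __init__(self, b=100.0):
        self.b = b                # liquidity: larger b = prices move less per bet
        self.q = [0.0, 0.0]       # outstanding shares for [outcome, not-outcome]

    def cost(self, q):
        # C(q) = b * ln(sum_i exp(q_i / b))
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def price(self, i=0):
        """Current implied probability of outcome i."""
        z = sum(math.exp(qi / self.b) for qi in self.q)
        return math.exp(self.q[i] / self.b) / z

    def buy(self, i, shares):
        """Buy `shares` of outcome i; returns the cost to the trader."""
        new_q = list(self.q)
        new_q[i] += shares
        fee = self.cost(new_q) - self.cost(self.q)
        self.q = new_q
        return fee

m = LMSRMarket(b=100.0)
print(m.price(0))   # 0.5 before any trades
m.buy(0, 50)        # an informed trader bets that team A wins
print(m.price(0))   # price rises above 0.5 (≈ 0.62)
```

A disagreeing trader would buy the other side, pushing the price back; the price settles where no one thinks a further bet is profitable, which is exactly the "consensus number" described above.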
This has several useful effects:
When a paper is published, readers might want to know: was this a one-off fluke, or would another experiment conducted the same way have the same results? This is a job for a prediction market, and it's worked well before on Many Labs 2, a psychology paper replication project. Peers in a field have a sense of which papers are solid and which are flimsy; prediction markets make that legible and easily shared instead of the product of intuitive expertise learned over lots of research.
Note that you can ask not just "will the results be significant and in the same direction?" but "what will the results be?"--though in that trial, the quantitative market received less trading and performed much worse. This seems like the sort of thing it is possible to eventually develop skill for: sports bets that were once limited to 'who will win?' now often deal with 'will A win by at least this much?', a subtler question that's harder to estimate. We're still in the infancy of explicitly betting on science, so the initial absence of that skill doesn't seem very predictive about the future.
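Claims like "that market performed much worse" presuppose a way to score probabilistic forecasts against binary replicate/fail outcomes; the standard tool is the Brier score. A minimal sketch, with made-up prices and results:

```python
def brier(forecasts, outcomes):
    """Mean squared error of probabilistic forecasts against binary
    outcomes (0 = failed to replicate, 1 = replicated).
    Lower is better; always guessing 0.5 scores 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical final market prices for five replication attempts
market_prices = [0.85, 0.30, 0.70, 0.15, 0.90]
results       = [1,    0,    1,    0,    1]    # what actually replicated
print(brier(market_prices, results))           # ≈ 0.047, well below 0.25
```

In the Many Labs 2 market study, comparisons like this (against a chance baseline and against survey forecasts) are what "worked well" cashes out to.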
The idea of 'replication' markets generalizes; you don't need to have run the first paper! You could write the 'methods' section of a paper and see what results the market expects. This makes more sense in some fields than others: if you write a survey and then give it to a thousand people, your design is unlikely to change in flight, but more exploratory fields go through several uninformative and thus unpublished study designs on the path to a finished paper. These can still be usefully predicted in advance.
The main value of outcome markets is creating a field-wide consensus on undone experiments. For experiments where everyone agrees on the expected result, it may not be necessary to run the experiment at all (and a journal focused on 'novel results' might, looking at the market history, decide not to publish it). For experiments where the field disagrees on the result and there's significant trading, then there's obvious demand to run the experiment and generate the results to settle the bet.
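The funding logic here can be caricatured in a few lines: score each proposed experiment by how much the field disagrees (price near 0.5) and how much is riding on it (trading volume). The scoring formula, function names, and example numbers below are purely illustrative assumptions, not an established metric.

```python
def experiment_priority(price, volume):
    """Crude prioritization: disagreement (price near 0.5) times
    demand (trading volume). High score -> worth running the experiment;
    near-zero score -> the field already agrees on the outcome."""
    disagreement = 1.0 - abs(price - 0.5) * 2.0   # 1 at p=0.5, 0 at p=0 or 1
    return disagreement * volume

# Hypothetical proposal markets: (current price, trading volume)
proposals = {
    "sun rises tomorrow":        (0.999, 10),    # consensus, no demand
    "controversial effect X":    (0.55, 5000),   # live dispute, heavy trading
    "incremental replication Y": (0.90, 800),
}
ranked = sorted(proposals, key=lambda k: experiment_priority(*proposals[k]),
                reverse=True)
print(ranked[0])   # "controversial effect X" -- fund this one first
```

A journal or funder reading the market history would run the same comparison informally: near-consensus proposals get skipped, contested high-volume proposals get run.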
Note that this has two important subpoints: review and disagreement.
Peer review is typically done on studies after they are 'finished', from the author's point of view. This necessarily means that any reviewer comments on how the paper should have been done differently would have been more appreciated earlier in the process (which is where outcome markets move them). Having proposed experiments be available to the field before they are run means that many suggestions can be incorporated (or not). Senior researchers can offer strong guidance to anyone, rather than just to the students who work with them directly.
Second, many fields, despite being motivated by controversies between research groups, write their papers 'in isolation'. They identify an effect that they can demonstrate isn't the result of noise, interpret that effect as supporting their broader hypothesis, and then publish. This is not a very Bayesian way to go about things; what's interesting is not whether you think an outcome-that-happened is likely, but whether you think it's likelier than the other side thinks. If both groups think the sun will rise tomorrow, then publishing a paper about how your hypothesis predicts the sun will rise and you observed the sun rising doesn't do much to advance science (and might set it back, as now the other group needs to spend time writing a response).
Trading volume can fix this, as proposals about whether or not the sun will rise receive basically no disagreements, whereas proposals about controversial topics will receive significant bets from both parties. This pushes research towards double-crux and adversarial collaboration. (Note that, in order to bet lots of their stake on a proposal, the other group needs to believe it's well-designed and will be accurately interpreted, incentivizing more shared designs and fewer attempts at dishonest gotchas.)
Often, people want to 'trust the science', while the science is focused on exploring the uncertain frontier instead of educating the masses. People who give TED talks and publish mass-market books have their views well-understood, probably out of proportion with their reputation among experts. In cases where journalists disagree with the experts, it's the journalists writing the newspapers.
Openly readable markets (even if only 'experts' can bet on them) make for easier transmission of the current scientific consensus (or lack thereof), even on recent and controversial questions (where the tenor of the controversy can be fairly and accurately reported). Hanson discusses how reporting on cold fusion could have more adequately conveyed the scientific community's skepticism if there had been a common and reputable betting market.
For the long term, it seems like publicly accessible markets (rather than closed expert markets) are probably better on net; if the public really wants to know whether a particular fad diet works, they can vote with their dollars to fund studies on it. If a particular expert has set up a small fiefdom where they control peer review and advancement, only the discovery of the price distortions by a larger actor outside the fiefdom (who can afford to out-bet the locals) can allow for their correction.
If you're interested in working on this, the main group that I know of doing things here is the Science Prediction Market Project, and Robin Hanson has been thinking about this for a long time, writing lots of things worth reading.
For many gambling markets, the underlying events are designed to be random or competitions between people, which invites cheating to make things 'easy to predict'. Attempts to guard against that are thus quite important and might dominate thinking on the issue.
For gambling markets about external events or cooperative projects (like science), this is of reduced importance (while still being important!). A researcher might want to raise their personal prestige at the expense of the societal project, by inappropriately promoting their hypotheses or suppressing disagreements or falsifying data. There still need to be strong guardrails in place to make that less likely. Prediction markets can help, here, by rewarding people who choose to fight bad actors and win.
While outcome markets would likely start as play-money or prestige games, you could imagine a future, more-developed version which replaces a lot of the granting pipeline, using the market mechanism itself to determine funding for experiments.
In this vision, time spent writing grant applications (which involves a bunch of politics and misrepresentation) and reviewing papers would be replaced by writing experimental designs and commenting and betting on them. If scientists end up spending less time on meta-work and more time on the object level, it'll be either because the system is more efficient (which isn't obviously the case) or because the work is specialized into different roles: object-level scientists who act as price-takers in the prediction markets and provide very little meta-effort, and meta-level scientists who act as traders and might provide very little object-level effort.
See also the (related but distinct) Visitor's proposal from Moloch's Toolbox, chapter 3 of Inadequate Equilibria:
VISITOR: Two subclasses within the profession of “scientist” are suggesters, whose piloting studies provide the initial suspicions of effects, and replicators whose job it is to confirm the result and nail things down solidly—the exact effect size and so on. When an important suggestive result arises, two replicators step forward to confirm it and nail down the exact conditions for producing it, being forbidden upon their honor to communicate with each other until they submit their findings. If both replicators agree on the particulars, that completes the discovery. The three funding bodies that sustained the suggester and the dual replicators would receive the three places of honor in the announcement. Do I need to explain how part of the function of any civilized society is to appropriately reward those who contribute to the public good?
I agree with the reasoning of this post, and believe it could be a valuable instrument to advance science.
There does exist scientific forecasting on sites like Manifold Markets and Hypermind, but those are not traded for real money the way sports betting is.
One problem I see with scientific prediction markets that use real money is that they may create poor incentives (as you also discuss in your first footnote).
For example, if a group of scientists are convinced hypothesis A is true, and bet on it in a prediction market, they may publish biased papers supporting their hypothesis.
However, this doesn't seem to be a big problem in other betting markets, so with the right design I don't expect the negative effects to be too big.
Futuur is a prediction market with both play-money and real-money options, which can assign different probabilities to the same event. One interesting market they launched asked which of the two would be more accurate, and real money won even with fewer bettors.
I believe play-money markets are more likely to be biased, since bettors have no skin in the game.
In my opinion, the applications of prediction markets are much more general than these. I have a bunch of AI-safety-inspired markets up on Manifold and Metaculus. I'd say the main purpose of these markets is to direct future research and study; I'd phrase this use of markets as "a sub-field prioritization tool". The hope is that markets would help me integrate information such as (1) a methodology's scalability, e.g. in terms of data, compute, and generalizability, (2) a research direction's rate of progress, and (3) the diffusion of a given research direction through the rest of academia and into applications.
Here are a few more markets to give a sense of what other AI research-related markets are out there: Google Chatbot, $100M open-source model, retrieval in gpt-4
From the title I was expecting the proposal to go differently.
How about a scheme where original results/papers run on the existing incentives, but in order to get more replications we fund a prediction market about each one? The question of what we are checking for would be clearer: it is whatever the original paper laid out. If the original paper doesn't lay out how it would be replicated, that is noteworthy in itself. A negative result showing something does not replicate could be as useful as a positive result showing that it does.
If you had an effect that sometimes replicates and sometimes doesn't, betting on slightly different replication attempts would draw attention to which conditions or variables the effect depends on. It could also be interesting to fish out "bought research" that "replicates" too obediently. Oh no, the oil-company-funded climate paper means that question has become more open and "more research is needed"; nobody can guess which papers of the dialogue will be left standing, so I guess we have to hear both sides.
You use the analogy with sports betting multiple times in this post. But science and sports are disanalogous in almost all the relevant ways!
Notably, sports are incredibly limited and well-defined, with explicit rules that literally anyone can learn, quick feedback signals, and unambiguous results. Completely the opposite of science!
The only way I see for the analogy to hold is by defining "science" in a completely impoverished way that puts aside most of what science actually looks like. For example, replication is not that big a part of science; it's just the visible "clean" one. And even then, I expect the clarification of replication issues and of the original meaning to be tricky.
So my reaction to this proposal, like my reaction to any prediction market for things other than sports and games, is that I expect it to be completely irrelevant to the progress of knowledge because of the weakness of such tools. But I would definitely be curious about attempts to explicitly address all the ambiguities of epistemology and science through betting mechanisms. Maybe you know of some posts/works on that?
The only way I see for the analogy to hold is by defining "science" in a completely impoverished way, that puts aside most of what science actually looks like.
I mean, the hope is mostly to replace the way that scientists communicate / what constitutes a "paper" / how grantmaking sometimes works. I agree that most of "science" happens someplace else!
Like, I think for the typical prediction market, people posing questions dramatically underestimate the difficulty of operationalizing them so that they reliably resolve as 'yes' or 'no'. But most scientists writing papers are already running into and resolving that difficulty, in a way that can easily be retooled.
For a replication study to be defined well enough to place a bet on the outcome, the scientific community must already have decided it's a sufficiently worthwhile study to do. And at that juncture, it's hard to see what value a prediction market on the outcome adds. Do you have thoughts on whether this is actually a problem for scientific prediction markets, and if so, what could be done about it?