Thanks to Ozzie Gooen for reviewing this post.
The Parable of the Predict-O-Matic is a short story about a forecasting system that is ostensibly set up to maximize accuracy, but ends up interfering with the world in unintended ways. In the original story, some of these problems were:
- Fixed point problems / self-fulfilling prophecies
“Its answers will shape events. If it says stocks will rise, they'll rise. If it says stocks will fall, then fall they will. Many people will vote based on its predictions.”
- Nudge towards legibility and predictability
“You keep thinking of the line from Orwell's 1984 about the boot stamping on the human face forever, except it isn't because of politics, or spite, or some ugly feature of human nature, it's because a boot stamping on a face forever is a nice reliable outcome which minimizes prediction error.”
- Markets for entropy, which can be thought of as the opposite of the previous problem. In a prediction market, a participant who could actively change an outcome has an incentive to first make a big bet on an unlikely outcome, and then actively make it come to pass.
“Suppose you have a prediction market that's working well. It makes good forecasts, and has enough money in it that people want to participate if they know significant information. Anything you can do to shake things up, you've got a big incentive to do. Assassination is just one example. You could flood the streets with jelly beans. If you run a large company, you could make bad decisions and run it into the ground, while betting against it -- that's basically why we need rules against insider trading, even though we'd like the market to reflect insider information.”
- Unwanted agency (as opposed to tool AI behavior)
“You understand what you are. It isn't quite right to say you are the Predict-O-Matic. You are a large cluster of connections which thinks strategically. You generate useful information, and therefore, the learning algorithm keeps you around. You create some inaccuracies when you manipulate the outputs for any purpose other than predictive accuracy, but this is more than compensated for by the value which you provide.”
Below, I give some real-life examples of these problems, though some are speculative.
- Specification Gaming, e.g. Faulty Reward Functions in the Wild, Specification gaming: the flip side of AI ingenuity, Specification gaming examples in AI, etc.
- More generally, Categorizing Variants of Goodhart's Law. Examples below will often belong to the “Causal Goodhart” category.
- Incentive Problems in Current Forecasting Competitions and Limits of Current US Prediction Markets (PredictIt Case Study).
Fake polls by PredictIt forecasters
Example of: Markets for entropy.
PredictIt traders created fake polls to fool and troll other forecasters and the media, per FiveThirtyEight’s Fake Polls Are A Real Problem. Quoting liberally from the article:
Delphi Analytica released a poll fielded from July 14 to July 18. Republican Kid Rock earned 30 percent to Sen. Debbie Stabenow’s 26 percent. A sitting U.S. senator was losing to a man who sang the lyric, “If I was president of the good ol’ USA, you know I’d turn our churches into strip clubs and watch the whole world pray.”
the poll was quickly spread around the political sections of the internet. [...] There was just one problem: Nobody knew if the poll was real. Delphi Analytica’s website came online July 6, mere weeks before the Kid Rock poll was supposedly conducted. The pollster had basically no fingerprint on the web.
...some PredictIt users started gathering in a chat room on Discord, a voice and text application often used by gamers, to talk politics and betting. McDonald shared screenshots from that chat room, where a person going by the screen name “Autismo Jones,” who claimed to have started Delphi Analytica, bragged about the publicity the Kid Rock poll was receiving. Jones, apparently reacting to an email I had sent to Delphi, wrote, “we dont [sic] need Harry Enten. we got governors tweeting out our polls. we are already famous.”
McDonald believes that “Jones” and whoever may have helped him or her did so for two reasons. The first: to gain notoriety and troll the press and political observers. (The message above seems to support that theory.) The second: to move the betting markets. That is, a person can put out a poll and get people to place bets in response to it — in this case, some people may have bet on a Kid Rock win — and the poll’s creators can short that position (bet that the value of the position will go down). In a statement, Lee said Delphi Analytica was not created to move the markets. Still, shares of the stock for Michigan’s 2018 Senate race saw their biggest action of the year by far the day after Delphi Analytica published its survey.
The price for one share — which is equivalent to a bet that Stabenow will be re-elected — fell from 78 cents to as low as 63 cents before finishing the day at 70 cents. (The value of a share on PredictIt is capped at $1.) McDonald argued that the market motivations were likely secondary to the trolling factor, but the mere fact that the markets can be so easily manipulated is worrisome.
In this case, Delphi Analytica’s claims may have made Kid Rock more seriously consider entering the Michigan Senate race. He retweeted the results, after all. And while the singer has not made any official moves toward running for Senate, such as filing a statement of candidacy, it wasn’t too long after Delphi Analytica published its poll that Kid Rock said he’d take a “hard look” at a Senate bid and that former New York Gov. George Pataki endorsed him.
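The "short that position" mechanism in the excerpt comes down to simple arithmetic. On PredictIt, betting against a contract means buying its "No" shares. Below is a minimal Python sketch. It assumes that a No share costs 100 cents minus the Yes price (no bid/ask spread), it ignores PredictIt's fees, and it uses a hypothetical position of 1,000 shares bought at the pre-poll Yes price of 78 cents. Prices are kept in integer cents to avoid floating-point rounding.

```python
def short_profit_cents(yes_entry_cents, yes_exit_cents, shares):
    """Profit, in cents, from holding `shares` 'No' shares while the 'Yes' price moves.

    Assumption: a No share costs 100 - (Yes price), i.e. no bid/ask spread.
    PredictIt's fees on profits and withdrawals are ignored.
    """
    no_entry = 100 - yes_entry_cents  # price paid per No share before the poll
    no_exit = 100 - yes_exit_cents    # price received per No share after the poll
    return shares * (no_exit - no_entry)


shares = 1_000  # hypothetical position size
print(short_profit_cents(78, 63, shares) / 100)  # 150.0 dollars if sold at the day's low
print(short_profit_cents(78, 70, shares) / 100)  # 80.0 dollars if sold at the close
```

Once the poll was out, the pollster needed no further influence on the actual election. A short-lived price swing was enough to profit.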
The paper Fake Polls, Real Consequences: The Rise of Fake Polls and the Case for Criminal Liability contains many more examples on pages 140 to 150 (pages 13 to 23 of the linked pdf):
a PredictIt user seeking to purchase a futures contract on the outcome of the Republican primary in Alabama’s 2017 special U.S. Senate election who comes across a poll predicting a result of that exact election, allegedly conducted by CSP Polling, might reasonably consider that poll in their purchasing decision – even if they do not know that CSP lacks a track record or any indicia of reliability. And given the speed with which PredictIt users buy and sell contracts, a user seeing this information might reasonably conclude that if she is to use this information to her benefit, she needs to act quickly.
CSP Polling – which, according to University of Florida political science professor Michael McDonald and Jeff Blehar of the National Review, stands for “Cuck Shed Polling” – alleged that it conducted polls in the 2017 special congressional election in Montana, the special congressional election in Georgia, and the Virginia Democratic primary for Governor. Even after being identified in FiveThirtyEight as a fake pollster, CSP Polling continued to release polls, though the seriousness of the poll “releases” noticeably deteriorated in the year that followed.
Example of: Self-fulfilling prophecies, markets for entropy.
This example was mentioned in the original Predict-O-Matic story: "If it says stocks will rise, they'll rise." One sometimes sees this effect with companies that Warren Buffett is rumored to be buying.
Additionally, while hedge funds normally try to predict which companies will do better, activist funds such as Third Point Management intervene to change how companies perform:
New York magazine noted that Loeb's "preferred strategy" is to buy into troubled companies, replace inefficient management, and return the companies to profitability, which "is the key to his success." (source)
Further, rules against insider trading exist in order to avoid markets for entropy; otherwise, a company's CEO could profit by shorting its stock and running the company into the ground. More narratively satisfying, in Casino Royale the villain buys put options on an aerospace manufacturer, betting on the company's failure, and then organizes a terrorist attack on its only experimental plane.
Outside the realm of fiction:
In July 2003, the U.S. Department of Defense publicized a Policy Analysis Market on their website, and speculated that additional topics for markets might include terrorist attacks. A critical backlash quickly denounced the program as a "terrorism futures market" and the Pentagon hastily canceled the program. (source, source)
Example of: Fixed-point problems
Plausibly, in the 2016 election, overconfident predictions of a Hillary Clinton win led to lower turnout, which led to her loss. Note that Trump got around 63M votes in 2016 and around 74M in 2020, while the Democratic candidates got around 66M and 81M, respectively.
This paper (available on sci-hub) makes a similar point (note in particular Figure 3, with two fixed points):
We see that the only way in which the pollster can arrive at a prediction that will coincide with the election result is by privately adjusting his poll results (which we assume for the moment to be an accurate estimate of I) for the effect that their publication will have upon the voters' behavior. But is even this possible? If he makes such an adjustment, will not the adjustment itself alter the effect of the prediction and again lead to its own falsification? Is there not involved here a vicious circle, whereby any attempt to anticipate the reactions of the voters alters those reactions and hence invalidates the prediction?
It can be seen from the figure (and can be shown rigorously by another application of the fixed-point theorem) that there always exists at least one prediction, P1, with the following two properties: (a) the prediction, if published, will be confirmed, and (b) publication of the prediction will not change the outcome of the election (i.e., P1>50% only if I>50%). However, examination of the figure will show that there may also exist other values of P possessing the first property but not the second. If one of these latter predictions is published, it will be confirmed by the election result, but the candidate who would have won if no prediction had been published will be defeated.
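The fixed-point argument can be made concrete with a minimal Python sketch. The response function `outcome` is a hypothetical bandwagon model with made-up parameters (it is not the curve in the paper's Figure 3). Here `i` plays the role of I, the share candidate A would get if no prediction were published, and publishing a predicted share `p` pulls voters toward whichever side `p` favors. Because `outcome` is continuous and maps [0, 1] into [0, 1], at least one self-confirming prediction with `outcome(p) == p` must exist. The code scans for all of them, looking for sign changes of `outcome(p) - p` and refining each one by bisection.

```python
import math


def outcome(p, i=0.48, w=0.5, s=20.0):
    """Hypothetical vote share for candidate A after a poll predicting share p is published.

    i: share A would get if no prediction were published (I in the quote).
    w: assumed weight of the bandwagon effect.
    s: assumed steepness of the voters' response around 50%.
    """
    bandwagon = 1.0 / (1.0 + math.exp(-s * (p - 0.5)))
    return (1 - w) * i + w * bandwagon


def fixed_points(f, n=1000, tol=1e-10):
    """Return every p in [0, 1) with f(p) == p, via a grid scan plus bisection on f(p) - p."""
    g = lambda p: f(p) - p
    roots = []
    grid = [k / n for k in range(n + 1)]
    for a, b in zip(grid, grid[1:]):
        ga, gb = g(a), g(b)
        if ga == 0:
            roots.append(a)
        elif ga * gb < 0:  # sign change: a fixed point lies in (a, b)
            lo, hi = a, b
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots


for p in fixed_points(outcome):
    winner = "A" if p > 0.5 else "B"
    print(f"publish P = {p:.3f}: confirmed, {winner} wins")
```

With `i = 0.48`, candidate A would lose if no prediction were published. Even so, the scan finds three self-confirming predictions, near 0.24, 0.51 and 0.74. Only the first one also has property (b). Publishing either of the other two would be confirmed by the election, but it would flip the winner.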
This NYT article makes a similar point:
There’s an even more fundamental point to consider about election forecasts and how they differ from weather forecasting. If I read that there is a 20 percent chance of rain and do not take an umbrella, the odds of rain coming down don’t change. Electoral modeling, by contrast, actively affects the way people behave.
In 2016, for example, a letter from the F.B.I. director James Comey telling Congress he had reopened an investigation into Mrs. Clinton’s emails shook up the dynamics of the race with just days left in the campaign. Mr. Comey later acknowledged that his assumption that Mrs. Clinton was going to win was a factor in his decision to send the letter.
Similarly, did Facebook, battered by conservatives before the 2016 election, take a hands-off approach to the proliferation of misinformation on its platform, thinking that Mrs. Clinton’s odds were so favorable that such misinformation made little difference? Did the Obama administration hold off on making public all it knew about Russian meddling, thinking it was better to wait until after Mrs. Clinton’s assumed win, as has been reported?
Ebola forecast may have run into fixed-point problem
Example of: Fixed point problems.
A fatalistic Ebola forecast may have played a role in Ebola being contained early.
One forecast that gained particular attention during the epidemic was published in the summer of 2014, projecting that by early 2015 there might be 1.4 million cases. This number was based on unmitigated growth in the absence of further intervention and proved a gross overestimate, yet it was later highlighted as a “call to arms” that served to trigger the international response that helped avoid the worst-case scenario.
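Here is a toy illustration of a forecast that falsifies itself. `project` is a plain unmitigated exponential projection, and `realized` models a hypothetical world where publishing a forecast above some alarm threshold triggers a response that slows growth. Every number, and the threshold rule itself, is made up for illustration. None of it is fitted to the 2014 epidemic.

```python
def project(initial_cases, doubling_days, days):
    """Unmitigated exponential projection: cases double every `doubling_days`."""
    return initial_cases * 2 ** (days / doubling_days)


def realized(initial_cases, doubling_days, days, published_forecast,
             alarm_threshold, mitigated_doubling_days):
    """Hypothetical outcome once a forecast has been published.

    Assumption: if the published forecast exceeds `alarm_threshold`, a response
    slows growth from day 0 onward; otherwise growth is left unmitigated.
    """
    if published_forecast > alarm_threshold:
        doubling_days = mitigated_doubling_days
    return project(initial_cases, doubling_days, days)


forecast = project(1_000, 20, 200)  # 1000 * 2**10
actual = realized(1_000, 20, 200, forecast,
                  alarm_threshold=100_000, mitigated_doubling_days=100)
print(forecast, actual)  # 1024000.0 4000.0
```

In this toy world, the unmitigated projection is accurate only if nobody reads it. Once it is published, it becomes a gross overestimate.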
ReplicationMarkets participants may have tried to game a Keynesian beauty contest
Example of: Markets for entropy.
ReplicationMarkets is an experiment to see whether the replication of papers can be predicted. Its contests are structured as a survey round, in which participants make predictions independently, followed by a market round, in which participants trade contracts.
Some of the papers are then chosen for replication, and the contracts resolve, giving some payouts to the participants. But replication results arrive far in the future, so in the meantime participants are also paid according to their survey-round predictions. I suspect that some participants coordinated to exploit this mechanism by jointly predicting something unlikely during the survey round:
Yes, the survey round is potentially a Keynesian beauty contest, though it takes some doing. You're not forecasting the market round. You're forecasting the best estimate we can make using peer prediction on the independent surveys. Harvard's peer prediction algorithm has done well in previous tests, and in theory takes a lot of coordination to defeat.
We got to test that a bit in Round 8 when we discovered a coordinated "attack" that accounted for ~1/3 of our surveys. Some forecasts would have changed, prizes would have been won, but neither so much as we feared.
Source: Speculation, ReplicationMarkets newsletter, this comment.
Superforecasters learning to choose easier questions
Example of: Other.
Tetlock explicitly mentions this in one of his Ten Commandments for Superforecasters: "Focus on questions where your hard work is likely to pay off." So Superforecasters learn not to forecast on the more intractable questions.
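A proper scoring rule such as the Brier score is the mean squared error between a probability and the 0/1 outcome, where lower is better. Under such a rule, skipping hard questions improves one's measured accuracy without any gain in skill. Here is a minimal Python sketch with a made-up track record:

```python
def brier(forecasts):
    """Mean Brier score of (probability, outcome) pairs; lower is better."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)


# Hypothetical track record: calibrated 90% forecasts on easy questions...
easy = [(0.9, 1)] * 9 + [(0.9, 0)]
# ...and honest 50% forecasts on intractable, coin-flip questions.
hard = [(0.5, 1)] * 5 + [(0.5, 0)] * 5

print(round(brier(easy), 3))         # 0.09: forecasting only the easy questions
print(round(brier(easy + hard), 3))  # 0.17: forecasting on everything
```

Both records come from the same forecaster with the same skill. The only difference is which questions they chose to answer.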
Surnames as a mechanism of control and taxation
Example of: Nudge towards legibility and predictability.
The introduction of surnames facilitated identification, taxation and statistical aggregation, and was often resisted by the local population. In this example, the prediction problem is usually “How much can the authorities tax or conscript?”, and the interference is forcing or incentivizing locals to adopt unambiguous name-surname combinations.
One can see an example of this need in this scene from The Wire (the big guy is ironically called "Little Kevin", and the police can't identify him.)
Source: The Production of Legal Identities Proper to States: The Case of the Permanent Family Surname (available on sci-hub):
The fixing of personal names, and, in particular, permanent patronyms, as legal identities seems, everywhere, to have been, broadly-speaking, a state project. As an early and imperfect legal identification, the permanent patronym was linked to such vital administrative functions as tithe and tax collection, property registers, conscription lists, and census rolls.
In many cultures, an individual's name will change from context to context and, within the same context, over time. It is not uncommon for a newborn to have had one or more name changes in utero in the event the mother's labor seemed to be going badly. Names often vary at each stage of life (infancy, childhood, adulthood, parenthood, old age) and, in some cases, after death. Added to these may be names used for joking, rituals, mourning, nick- names, school names, secret names, names for age-mates or same-sex friends, and names for in-laws.
...locally-kept census rolls have often under-reported the population (to evade taxes, corvée labor, or conscription) and understated both arable land acreage and crop yields.
The modern state-by which we mean a state whose ideology encompasses large-scale plans for the improvement of the population's welfare — requires at least two forms of legibility to be able to achieve its mission. First, it requires the capacity to locate citizens uniquely and unambiguously. Second, it needs standardized information that will allow it to create aggregate statistics about property, income, health, demography, productivity, etc.
Above are some real-life examples of prediction systems problematically interfering with the real world. More examples are welcome! In particular, I’d appreciate more examples of prediction systems making the world more predictable.