Here's a toy example which should make it clearer that the probability assigned to the true state is not the only relevant update.
Let's say that a seeker is searching for something, and doesn't know whether it is in the north, east, south, or west. If the object is in the north, then it is best for the seeker to go towards it (north), worst for the seeker to go directly away from it (south), and intermediate for them to go perpendicular to it (east or west). The seeker meets a witness who knows where the thing is. The majority (2/3) of witnesses want to help the seeker find it and the rest (1/3) want to hinder the seeker's search. And they have common knowledge of all of this.
In this case, the witness can essentially just direct the seeker's search - if the witness says "it's north" then the seeker goes north, since 2/3 of witnesses are honest. So if it's north and the witness wants to hinder the seeker, they can just say "it's south". This seems clearly deceptive - it's hindering the seeker's search as much as possible by messing up their beliefs. But pointing them south does actually lead to a right-direction update on the true state of affairs, with p(north) increasing from 1/4 (the base rate) to 1/3 (the proportion of witnesses who aim to hinder). It's still a successful deception because it increases p(south) from 1/4 to 2/3, and that dominates the seeker's choice.
There are simpler examples where identifying deception seems more straightforward. e.g., If a non-venomous snake takes on the same coloration as a venomous snake, this is intended to increase others' estimates of p(venomous) and reduce their estimates of p(not venomous), which is a straightforward update in the wrong direction.
In the first attempt at a definition of deceptive signalling, it seems like a mistake to only look at the probability assigned to the true state ("causing the receiver to update its probability distribution to be less accurate (operationalized as the logarithm of the probability it assigns to the true state)"). The receiver's actions are based on its full probability distribution, not just the probability assigned to the true state. In the firefly example, P. rey is updating in the right direction on p(predator) (and on p(nothing)), but in the wrong direction on p(mate). And their upward update on p(mate) seems to be what's driving the predator's choice of signal. Some signs of this:
- The predator mimicked the signal that the mates were using, when it could have caused a larger correct update to p(predator) and reversed the incorrect update to p(mate) by choosing any other signal.
- P. redator chose the option that maximized the prey's chances of approaching it, and the prey avoids locations when p(predator) is sufficiently high.
- If we model the prey as acting according to a utility function, the signal caused the prey to update its expected utility estimate in the wrong direction by causing it to update one of its probabilities in the wrong direction (equivalently: the prey updated the weighted average of its probabilities in the wrong direction, where the weights are based on the relevant utilities).
- We could also imagine hypothetical scenarios, like if the predator were magically capable of directly altering the prey's probability estimates rather than being limited to changing its own behavior and allowing the prey to update.
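To make the utility-function framing concrete, here's a toy calculation. All probabilities and utilities are made up for illustration (nothing here comes from the actual firefly case); the point is just that p(predator) can move in the right direction while the expected-utility estimate, and hence the decision, moves in the wrong direction.

```python
# Hypothetical utilities of approaching, given the true state:
utility_of_approach = {"mate": 10, "predator": -100, "nothing": 0}
# Not approaching is worth 0 regardless of the state.

# Invented numbers: before and after seeing the mimicked mate signal.
prior     = {"mate": 0.05, "predator": 0.05, "nothing": 0.90}
posterior = {"mate": 0.85, "predator": 0.08, "nothing": 0.07}

def expected_utility(p):
    """Expected utility of approaching, under probability distribution p."""
    return sum(p[s] * utility_of_approach[s] for s in p)

# p(predator) rises from 0.05 to 0.08: a right-direction update on the
# true state. But the expected utility of approaching flips from negative
# to positive, so the prey approaches anyway.
print(expected_utility(prior))      # negative: prey stays away
print(expected_utility(posterior))  # positive: prey approaches
```

The wrong-direction update on p(mate) carries more utility weight than the right-direction update on p(predator), which is exactly the "weighted average of probabilities" point above.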
I think part of the story is that language is compositional. If someone utters the words "maroon hexagon", you can make a large update in favor of a specific hypothesis even if you haven't previously seen a maroon hexagon, or heard those words together, or judged there to be anything special about that hypothesis. "Maroon" has been sufficiently linked to a specific narrow range of colors, and "hexagon" to a specific shape, so you get to put those inferences together without needing additional coordination with the speaker.
This seems related to the denotation/connotation distinction, where compositional inferences are (typically?) denotations. Although the distinction seems kind of fuzzy, since connotations can (gradually?) become denotations over time, e.g. "goodbye" coming to mean that a departure is imminent, or an image of a red octagon coming to mean "stop" (although I'd say that the words "red octagon" still only have the connotation of "stop"). And "We should get together more often" is interesting because the inferences you can draw from it aren't that related to the inferences you typically draw from the phrases "get together" and "more often".
Hello! I'm wondering if I can translate your book into Russian?
I'm not going to monetize it, and of course I will give the credits.
Yes, you can translate it. Just make it clear that the original content in English is from CFAR, and the translation into Russian is something that you've done independently.
-Dan from CFAR
There's a similar challenge in sports with evaluating athletes' performance. Some pieces of what happens there:
There are many different metrics to summarize/evaluate a player's performance rather than just one score (e.g., see all the tables here). Many are designed to evaluate a particular aspect of a player's performance rather than how well they did overall, and there are also multiple attempts to create a single comprehensive overall rating. Over the past decade or two there have been a bunch of improvements in this, with more and better metrics, including metrics that incorporate different sources of information.
There are common features of different stats that people who follow the analytics are aware of, such as whether they're volume stats (number of successes) or efficiency stats (number of successes per attempt). Some metrics attempt to adjust for factors that aren't under the player's control which can influence the numbers, such as the quality of the opponent, the quality of the player's teammates, the environment of the game (weather & stadium), various sources of randomness, and whether the play happened in "garbage time" (when the game was already basically decided).
Payment is based on negotiations with the people who benefit from the player's performance (their team's owners) rather than being directly dependent on their stats. Their stats do play into the decision, as do other things such as close examinations of particular plays that they made.
The awards for individual performance that people care about the most (e.g., All-Star teams, MVP awards, Hall of Fame) are based on voting (by various pools of voters) rather than being directly based on the statistics. Though again, they're influenced by the statistics and tend to line up pretty closely with the statistics.
The achievements that people care about the most (e.g., winning games & championships) are team achievements rather than individual achievements. In a typical league there might be 30 teams which each have 20 players, and there's a mix of competitiveness between teams and cooperativeness within a team.
Seems like forecasting might benefit from copying some parts of this. For example, instead of having one leaderboard with an overall forecasting score, have several leaderboards for different ways of evaluating forecasts, along with tables where you can compare forecasters on a bunch of them at once and a page for each forecaster where you can see all their metrics, how they rank, and maybe some other stuff like links to their most upvoted comments.
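As a sketch of what multiple leaderboards could look like, here's a minimal example that ranks the same forecasters under two standard scoring rules, the Brier score and the log score. The forecasters and their probabilities are invented for illustration.

```python
import math

# Each record: (forecaster, probability assigned to the event, did it happen?)
forecasts = [
    ("alice", 0.9, True), ("alice", 0.2, False), ("alice", 0.6, True),
    ("bob",   0.7, True), ("bob",   0.5, False), ("bob",   0.8, True),
]

def brier(p, outcome):
    return (p - (1.0 if outcome else 0.0)) ** 2    # lower is better

def log_score(p, outcome):
    return math.log(p if outcome else 1.0 - p)     # higher is better

def leaderboard(score_fn, best=min):
    """Rank forecasters by their mean score, best first."""
    totals = {}
    for name, p, outcome in forecasts:
        totals.setdefault(name, []).append(score_fn(p, outcome))
    means = {name: sum(s) / len(s) for name, s in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=(best is max))

print(leaderboard(brier))            # lower mean Brier score ranks first
print(leaderboard(log_score, max))   # higher mean log score ranks first
```

In a real system you'd want many more metrics (calibration, resolution, question difficulty adjustments, etc.), but the structure is the same: one table of raw forecasts, many views onto it.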
AI Camera Ruins Soccer Game For Fans After Mistaking Referee's Bald Head For Ball
It looks like this includes the fees you pay to PredictIt, but not the taxes you pay to the government.
Do you have an estimate of expected profit per $100 bet (for a few of the most plausible scenarios)?
My impression is that PredictIt is +EV if you make lots of not-too-correlated bets so that your losses can offset your wins (though maybe not by enough to be worth the time & effort), but it's generally -EV (or at best barely +EV) if you deposit to make a one-off bet where you have to pay fees & taxes on your winnings (and don't get any tax benefit from your losses).
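For a rough sense of the one-off-bet case, here's a back-of-the-envelope sketch. It assumes PredictIt's historical fee structure (10% fee on profits, 5% fee on withdrawals) and a made-up flat tax rate with no offsetting loss deduction; the real tax treatment is more complicated, so treat this as illustrative only.

```python
def net_ev_per_100(p_win, price, fee_on_profit=0.10, withdrawal_fee=0.05, tax_rate=0.25):
    """Expected net profit from putting $100 on 'yes' shares at `price`
    (dollars per share), with probability `p_win` of a $1/share payout."""
    shares = 100 / price
    gross_profit = shares * (1 - price)              # profit if the bet wins
    after_fee = gross_profit * (1 - fee_on_profit)   # site takes 10% of profit
    # Withdrawal fee hits the whole balance; tax hits the after-fee profit.
    # A loss on a one-off bet gets no offsetting tax benefit here.
    win_net = (100 + after_fee) * (1 - withdrawal_fee) - 100 - after_fee * tax_rate
    lose_net = -100
    return p_win * win_net + (1 - p_win) * lose_net

# e.g. you think p = 0.70 but the market price is only 0.60:
print(round(net_ev_per_100(0.70, 0.60), 2))  # -4.1: negative even with a 10-point edge
```

Under these assumptions, even a sizable edge over the market price can be eaten entirely by fees and taxes on a one-off bet.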
Related: Limits of Current US Prediction Markets (PredictIt Case Study)
Here's a doc, since I haven't figured out how to do spoilers.
One way to look at this is to pick questions where you're really sure that the two versions of the question should have different answers. For example, questions where the answer is a probability rather than a subjective value. One study some years ago asked some people for the probability that Assad's regime would fall in the next 3 months, and others for the probability that Assad's regime would fall in the next 6 months. As described in the book Superforecasting, non-superforecasters gave essentially identical answers to these two questions (40% and 41%, respectively). So it seems like they were making some sort of error by not taking into account the size of the duration. (Superforecasters gave different answers, 15% and 24%, which did take the duration into account pretty well.)
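One way to see the size of the error: under a constant-hazard assumption (a simplification I'm adding, not something from the study itself), a probability p of the regime falling within 3 months implies a probability 1 - (1 - p)^2 of it falling within 6 months.

```python
def implied_6mo(p_3mo):
    """6-month probability implied by a 3-month probability,
    assuming a constant hazard rate across the two periods."""
    return 1 - (1 - p_3mo) ** 2

# Non-superforecasters: 40% at 3 months implies ~64% at 6 months,
# far from the 41% they actually gave.
print(round(implied_6mo(0.40), 2))  # 0.64
# Superforecasters: 15% at 3 months implies ~28% at 6 months,
# reasonably close to the 24% they actually gave.
print(round(implied_6mo(0.15), 2))  # 0.28
```

The constant-hazard model isn't exactly right (the superforecasters' 24% vs. the implied 28% could reflect a reasonable belief that near-term collapse was more likely than later collapse), but it makes the non-superforecasters' 40%/41% pair look clearly inconsistent.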