Kevin Dorst

See the discussion in §6 of the paper. There are too many variations to run, but it at least shows that the result doesn't depend on knowing the long-run frequency is 50%; if we're uncertain about both the long-run hit rate and about the degree of shiftiness (or whether it's shifty at all), the results still hold.

Does that help?

Mathematica notebook is here! Link in the full paper.

How did you define Switchy and Sticky? The shiftiness needs to build up over at least 2 steps, so the effect won't appear if they are one-step chains like, e.g.:

Switchy = (0.4, 0.6; 0.6, 0.4)

Sticky = (0.6, 0.4; 0.4, 0.6)

But it WILL appear if they build up to (say) 60%-shiftiness over two steps. Eg:

Switchy = (0.4, 0, 0.6, 0; 0.45, 0, 0.55, 0; 0, 0.55, 0, 0.45; 0, 0.6, 0, 0.4)

Sticky = (0.6, 0, 0.4, 0; 0.55, 0, 0.45, 0; 0, 0.45, 0, 0.55; 0, 0.4, 0, 0.6)
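For anyone who wants to poke at the two-step matrices, here is a minimal Python sketch (not the notebook's code). It assumes a state ordering that the matrices don't spell out: heads with a streak of 2+, heads with a streak of 1, tails with a streak of 1, tails with a streak of 2+. Under that reading it checks that each row is a probability distribution and estimates the long-run share of heads and of time spent in the streak-of-2+ states. The helper names `rows_sum_to_one` and `stationary` are made up for illustration.

```python
# Assumed state order (an inference, not stated with the matrices):
#   0 = heads, streak >= 2;  1 = heads, streak 1;
#   2 = tails, streak 1;     3 = tails, streak >= 2
SWITCHY = [
    [0.40, 0.00, 0.60, 0.00],
    [0.45, 0.00, 0.55, 0.00],
    [0.00, 0.55, 0.00, 0.45],
    [0.00, 0.60, 0.00, 0.40],
]
STICKY = [
    [0.60, 0.00, 0.40, 0.00],
    [0.55, 0.00, 0.45, 0.00],
    [0.00, 0.45, 0.00, 0.55],
    [0.00, 0.40, 0.00, 0.60],
]


def rows_sum_to_one(matrix, tol=1e-12):
    """Every row of a transition matrix should sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)


def stationary(matrix, steps=2000):
    """Approximate the stationary distribution by repeated propagation."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


for name, matrix in (("Switchy", SWITCHY), ("Sticky", STICKY)):
    pi = stationary(matrix)
    heads_share = pi[0] + pi[1]        # long-run frequency of heads
    long_streak_share = pi[0] + pi[3]  # time spent at streak >= 2
    print(name, rows_sum_to_one(matrix), round(heads_share, 4),
          round(long_streak_share, 4))
```

On this reading both chains land on heads half the time in the long run, but Sticky spends more of its time in the streak-of-2+ states than Switchy does.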

Would it have helped if I added the attached paragraphs (in the paper, page 3, cut for brevity)?

Frame the conclusion as a disjunction: "either we construe 'gambler's fallacy' narrowly (as by definition irrational) or broadly (as used in the blog post, for expecting switches). If the former, we have little evidence that real people commit the gambler's fallacy. If the latter, then the gambler's fallacy is not a fallacy."

I see the point, though I don't see why we should be too worried about the semantics here. As someone mentioned below, I think the "gambler's fallacy" is a folk term for a pattern of beliefs, and the claim is that Bayesians (with reasonable priors) exhibit the same pattern of beliefs. Some relevant discussion in the full paper (p. 3), which I (perhaps misguidedly) cut for the sake of brevity:

Good question. It's hard to tell exactly, but there's lots of evidence that the rise in "affective polarization" (dislike of the other side) is linked to "partisan sorting" (or "ideological sorting")—the fact that people within political parties increasingly agree on more and more things, and also socially interact with each other more. Lilliana Mason has some good work on this (and Ezra Klein got a lot of his opinions in his book on this from her).

This paper raises some doubts about the link between the two, though. It's hard to know!

I think it depends a bit on what we mean by "rational". But it's standard to define it as "doing the best you CAN, to get to the truth (or, in the case of practical rationality, to get what you want)". We want to put the "can" proviso in there so that we don't say people are irrational for failing to be omniscient. But once we put it in there, things like resource-constraints look a lot like constraints on what you CAN do, and therefore make less-ideal performance rational.

That's controversial, of course, but I do think there's a case to be made that (at least some) "resource-rational" theories ARE ones on which people are being rational.

Interesting! A middle-ground hypothesis is that people are just as (un)reasonable as they've always been, but the internet has given people greater exposure to those who disagree with them.

Nice point! I think I'd say where the critique bites is in the assumption that you're trying to maximize the expectation of q_i. We could care about the variance as well, but once we start listing the things we care about—chance of publishing many papers, chance of going into academia, etc—then it looks like we can rephrase it as a more-complicated expectation-maximizing problem. Let U be the utility function capturing the balance of these other desired traits; it seems like the selectors might just try to maximize E(U_i).

Of course, that's abstract enough that it's a bit hard to say what it'll look like. But whenever it's an expectation-maximizing game, the same dynamics will apply: your estimates of those with more uncertain signals will stay closer to your prior. So I think the same dynamics might emerge. But I'm not totally sure (and it'll no doubt depend on how exactly we incorporate the other parameters), so your point is well-taken! Will think about this. Thanks!
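To make the "stay closer to the prior" point concrete, here is a small sketch. It assumes a normal prior over a candidate's quality and normally distributed signal noise; that model, the numbers, and the function name `posterior_mean` are all illustrative assumptions, not anything specified above.

```python
def posterior_mean(prior_mean, prior_var, signal, noise_var):
    """Posterior mean for a normal prior and a normally distributed signal.

    The signal is weighted by prior_var / (prior_var + noise_var), so a
    noisier signal moves the estimate less far from the prior mean.
    """
    weight = prior_var / (prior_var + noise_var)
    return prior_mean + weight * (signal - prior_mean)


# Hypothetical numbers: both candidates send the same strong signal,
# but one signal is much noisier (less legible) than the other.
prior_mean, prior_var = 50.0, 100.0
signal = 80.0

legible = posterior_mean(prior_mean, prior_var, signal, noise_var=25.0)
illegible = posterior_mean(prior_mean, prior_var, signal, noise_var=400.0)
print(legible, illegible)
```

With these made-up numbers the legible candidate's estimate ends up around 74 and the illegible one's around 56, even though the signals are identical.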

Very nice point! We had definitely thought about the fact that when slots are large and candidates are few, that would give people from less prestigious/legible backgrounds an advantage. (We were speculating idly whether we could come up with uncontroversial examples...)

But I don't think we'd thought about the point that people might intentionally manipulate how legible their application is. That's a very nice point! I'm wondering a bit how to model it. Obviously if the Bayesian selectors know that candidates are doing this and exactly how, they'll try to price it in ("this is illegible" is evidence that it's from a less-qualified candidate). But I can't really see how those dynamics play out yet. Will have to think more about it. Thanks!

Hm, I'm not following your definitions of P and Q. Note that there's no easy closed-form expression (that I know of) for the likelihoods of various sequences for these chains; I had to calculate them using dynamic programming on the Markov chains.
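As a rough illustration of what dynamic programming on such a chain can look like (a sketch, not the notebook's code), the following computes the distribution of the number of heads in n flips. It assumes a compact, hypothetical encoding: each chain is a tuple of continuation probabilities indexed by streak length, with the streak capped at the tuple's length, and the first flip is assumed fair.

```python
from collections import defaultdict

# Hypothetical encoding: cont_probs[k - 1] = P(next flip repeats | streak k),
# with the streak capped at len(cont_probs).
STEADY = (0.50, 0.50)
SWITCHY = (0.45, 0.40)
STICKY = (0.55, 0.60)


def heads_count_distribution(cont_probs, n_flips):
    """P(k heads in n_flips) via DP over (last flip, capped streak, heads)."""
    cap = len(cont_probs)
    # Assumption: the first flip is fair and starts a streak of length 1.
    table = {("H", 1, 1): 0.5, ("T", 1, 0): 0.5}
    for _ in range(n_flips - 1):
        nxt = defaultdict(float)
        for (last, streak, heads), prob in table.items():
            p_cont = cont_probs[streak - 1]
            other = "T" if last == "H" else "H"
            # Streak continues: same face again, streak grows up to the cap.
            nxt[(last, min(streak + 1, cap), heads + (last == "H"))] += prob * p_cont
            # Streak breaks: other face, streak resets to 1.
            nxt[(other, 1, heads + (other == "H"))] += prob * (1 - p_cont)
        table = dict(nxt)
    dist = [0.0] * (n_flips + 1)
    for (_, _, heads), prob in table.items():
        dist[heads] += prob
    return dist


for name, chain in (("Steady", STEADY), ("Switchy", SWITCHY), ("Sticky", STICKY)):
    dist = heads_count_distribution(chain, 10)
    print(name, round(dist[5], 4))
```

The table only ever holds a few states per heads count, so this scales to long sequences where enumerating every sequence would not.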

The relevant effect driving it is that the degree of shiftiness (how far it deviates from a 50%-heads rate) builds up over a streak. In any given case where Switchy and Sticky deviate (say there's a streak of 2, and Switchy has a 30% chance of continuing while Sticky has a 70% chance), they have the same degree of divergence. But Switchy makes it less likely that you'll run into these long, divergent streaks, while Sticky makes it extremely likely. Neither Switchy nor Sticky gives a constant rate of switching; it depends on the streak length. (Compare a hypergeometric distribution.)

Take a look at §4 of the paper and the "Limited data (full sequence): asymmetric closeness and convergence" section of the Mathematica Notebook linked from the paper to see how I calculated their KL divergences. Let me know what you think!
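If it's useful, here is a brute-force Python sketch of the asymmetry; it is not the notebook's calculation, and §4 may set things up differently. It enumerates every sequence of a fixed length and computes KL(chain || Steady). The assumptions are illustrative: chains are encoded as continuation probabilities by capped streak length, the first flip is fair, and the direction of the divergence (data generated by the chain, compared against Steady) is one reading of the streak argument.

```python
from itertools import product
from math import log

# Hypothetical encoding: cont_probs[k - 1] = P(next flip repeats | streak k),
# with the streak capped at len(cont_probs).
SWITCHY_1STEP, STICKY_1STEP = (0.40,), (0.60,)
SWITCHY_2STEP, STICKY_2STEP = (0.45, 0.40), (0.55, 0.60)


def sequence_probability(cont_probs, flips):
    """Probability of one flip sequence; the first flip is assumed fair."""
    cap = len(cont_probs)
    prob, streak = 0.5, 1
    for prev, cur in zip(flips, flips[1:]):
        p_cont = cont_probs[streak - 1]
        if cur == prev:
            prob *= p_cont
            streak = min(streak + 1, cap)
        else:
            prob *= 1 - p_cont
            streak = 1
    return prob


def kl_from_steady(cont_probs, n_flips):
    """KL(chain || Steady) over all length-n_flips sequences, in nats."""
    steady_prob = 0.5 ** n_flips  # every sequence is equally likely under Steady
    total = 0.0
    for flips in product("HT", repeat=n_flips):
        p = sequence_probability(cont_probs, flips)
        if p > 0:
            total += p * log(p / steady_prob)
    return total


n = 10
print("1-step:", kl_from_steady(SWITCHY_1STEP, n), kl_from_steady(STICKY_1STEP, n))
print("2-step:", kl_from_steady(SWITCHY_2STEP, n), kl_from_steady(STICKY_2STEP, n))
```

Under these assumptions the one-step chains come out equally far from Steady, while the two-step Sticky chain comes out farther from Steady than the two-step Switchy chain, because Sticky reaches the high-divergence streak states more often.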