[Added April 26: This post has received a fair amount of mixed-to-negative comments and feedback. I'm still not entirely convinced that this isn't a good test, but I retract my request not to comment on the object-level question here or elsewhere.]


This post describes an exercise that tests some aspects of your rationality skills. In particular, it tests for the ability to find and sift through evidence about a complex real-world event, reason probabilistically, draw conclusions, and communicate your reasoning process in writing.

If left unspoiled, it could potentially be used as part of a job application process for a job where these skills are important, such as a research role at an effective altruism or AI alignment organization.

In other words, this is a proposal for a verification method for rationality skills at the organizational level.

The exercise is similar to the Amanda Knox test from over a decade ago. A conclusion and post-mortem from that test are good background reading, though I believe this one is somewhat more difficult and more time-consuming to give a good answer for.

The exercise

In short, figure out what happened during a controversial high-stakes poker game live-streamed in September 2022. In particular, come up with a probability estimate that Garrett Adelstein or other players were cheated, and if so, how and by whom.

Write up your reasoning and your research process, discussing how you weighted different pieces of evidence and why.

More background, from a report on the incident by a consulting firm hired by the casino where the incident occurred:

On September 29, 2022, one of the most controversial hands in poker history was played on an episode of “Hustler Casino Live,” which streams high-stakes poker games played at Hustler Casino in Gardena, Calif., to a worldwide audience on its YouTube Channel. The hand has subsequently been viewed by hundreds of thousands, if not millions, of poker fans around the globe. In the hand, recreational poker player Robbi Jade Lew called an all-in shove by Garrett Adelstein for her remaining $109,000 with the jack of clubs and four of hearts on a board reading ThTc9c3h. Lew ended up winning a pot of $269,000 when her jack high held against Adelstein’s eight of clubs and seven of clubs. After the hand, Adelstein said he was suspicious of Lew’s play and she agreed to return half the pot to him. In the weeks that have followed, Lew has repeatedly denied wrongdoing and requested that Adelstein return her winnings. In October, Adelstein published a series of allegations on the Two Plus Two poker forum that he said proved Lew “was very likely part of a cheating ring of at least three members,” which he said likely included Lew, a friend of hers who was playing that night and an employee of the company that produces the show. Adelstein and others in the poker community said that there was no logical reason why Lew would call a bet of that size without inside knowledge that she had the better hand. Her hand would lose to any pair as well as many potential bluffs, including ace-high, king-high, queen-high or better jack-high hands. In fact, Adelstein’s exact hand was one of the few combinations of hands that he could have had that she would beat. Lew offered several explanations for her call on Sept. 29 and in the weeks that followed; most of her explanations centered on the fact that she thought Adelstein was bluffing, or as she said on Sept. 29, “You don’t have shit.”

(Full report here, which goes into great detail about the technical aspects of the casino and livestream security system. There may be better introductions to the overall incident elsewhere online.)

Why this is a good test

There are piles of footage, interviews, commentary and other kinds of evidence from many different sources, including those directly involved. It's not obvious what actually happened or how to weigh up all the evidence, and I believe there is no real consensus in the poker world on the matter.

Understanding all the background information about the incident requires a bit of familiarity with poker and the high-stakes scene, but it's probably not too hard to get up to speed, and the ability to quickly learn about an unfamiliar domain is also an important rationality skill.

Many in the poker world who followed the incident closely at the time hold strong and differing views on the matter. I believe the "official" conclusion was that there is insufficient evidence to accuse Robbi of cheating, and she is still invited to play in various high-stakes games.

Plenty of people have posted analyses and their own conclusions about what they think happened. Poker players tend to be more rational and think in terms of probabilities more than the average person, but many of the popular analyses I've seen have been lacking in basic rationality skills and applications of Bayes' theorem.

Relative to the Amanda Knox case, this seems less politically charged and lower-stakes; probably no one is going to jail over the matter at this point no matter what, and the poker world has mostly moved on. Speculating about this in private or in rationality spaces is unlikely to have any negative reputational consequences for the people involved or for LessWrong or EA, though if someone does do a good investigation and posts a conclusion here publicly, it might attract unwanted attention from the high-stakes poker community or others who followed this incident closely.

Digging into this for its own sake might be interesting, but seems pretty low-value to me; I think this question is most useful as a test or exercise, and there might be rationality organizations that would be interested in using it as an actual test or as part of a job application process. To preserve the value of the question as a rationality test, please refrain from posting your own object-level thoughts or conclusions about the incident in the comments here or elsewhere publicly, at least for the time being. [edit: retracted, see above.] Linking to existing publicly-available analyses by non-rationalists is OK.

Comments on whether you think the exercise is useful, how it could be improved, resources for getting up to speed on the incident or general background information, and other ideas for similar exercises are all very welcome.


I think you're underestimating the effort required to understand this scenario for someone who doesn't already follow poker. I am a lifelong player of trick-taking games (casually, at the kitchen table with family members), but I've never played poker, and here's how the play description reads to me:

called an all-in shove

Only a vague idea of what this means, based on the everyday idiom of being "all-in".

with the jack of clubs and four of hearts on a board

Don't know what it means for these to be "on a board".

reading ThTc9c3h

Gibberish.

her jack high held against Adelstein’s eight of clubs and seven of clubs

Only vaguely comprehensible. I don't know poker's hand-scoring rules.

Additional details that are necessary to interpret the situation: is the deck continually shuffled, or are multiple hands played off of the same shuffle? (Implicitly: are there card-counting strategies that provide relevant information?) What are the point rules / rank of hands? How does suit interact with card rank? Is there a concept of trump? What was the sequence of bets leading up to the play in question? How typical is this behavior in high-level play? How high-level are these people? Robbi is called a "recreational" player -- does this mean "top-level amateur" or "low-level pro", or something else?

In the absence of these details, all I really get is "Robbi made a risky play off a mediocre hand, and won big". And yes, this is Bayesian evidence in favor of cheating, but how strong the evidence is depends heavily on all of the unknown details mentioned above. At the same time, the fact that no one identified the means by which the cheating occurred despite heavy scrutiny is Bayesian evidence against cheating.
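To make the two opposing updates concrete, here is a minimal sketch of Bayes' theorem in odds form. Every number in it is a hypothetical placeholder chosen for illustration, not an estimate about the actual incident, and the function name `posterior_probability` is made up for this example. It also assumes the two pieces of evidence are independent given each hypothesis, which may not hold in practice.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability using Bayes' theorem in odds form.

    Each likelihood ratio is P(evidence | cheating) / P(evidence | no cheating).
    Multiplying them together assumes the pieces of evidence are
    conditionally independent given each hypothesis.
    """
    odds = prior / (1.0 - prior)      # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio                 # one multiplicative update per piece of evidence
    return odds / (1.0 + odds)        # convert odds back to probability


# All values below are hypothetical placeholders, NOT estimates of the real case.
PRIOR_CHEATING = 0.01        # assumed base rate of cheating in a given hand
LR_RISKY_CALL = 20.0         # assumed: the call is 20x more likely if cheating
LR_NO_METHOD_FOUND = 0.25    # assumed: finding no mechanism is 4x less likely if cheating

posterior = posterior_probability(
    PRIOR_CHEATING, [LR_RISKY_CALL, LR_NO_METHOD_FOUND]
)
print(f"Posterior probability of cheating: {posterior:.3f}")
```

With these placeholder inputs the two updates partly cancel: the net likelihood ratio is 5, so the posterior lands well above the prior but far from certainty. The conclusion depends almost entirely on the inputs, which is the point about the unknown details above.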

My operational decision would be that this is enough evidence to subject Robbi to heightened scrutiny in future tournaments, but not enough to ban her or claw back her winnings. This is a good test, but maybe not as good as you think it is, due to the amount of uncommon background knowledge required.

(Also, FYI for others: this comment is close to violating my bolded request not to post object-level conclusions or speculations publicly. I'll let this one slide since it's mostly just an initial reaction, but I may ask that similar comments be deleted.)

I may be underestimating the background knowledge and effort required, yes. Understanding the rules of poker and Texas Hold'em in particular is pretty essential for this exercise, so it might be worth writing a longer introduction and explanation that provides some of the required background knowledge.

Though, this is the kind of thing I expect GPT to be a great help with, and so for those unfamiliar with poker, this is also a good test of a different set of skills: using AI tools to get up to speed quickly in an unfamiliar domain.

Here's what GPT-4 said in response to your comment:

Let me try to explain the key terms and concepts in the description of the hand.

"Called an all-in shove" - In poker, a player can bet all their remaining chips (money) in a single move. This is known as going "all-in." When a player calls an all-in shove, they match the bet and put their own chips at risk, deciding to continue playing with their current hand.

"On a board" - In Texas Hold'em poker, a variant played in the described scenario, there are five community cards that are dealt face-up in the middle of the table. These cards are shared by all players and are referred to as the "board." Players combine their two private cards (hole cards) with the community cards to form the best possible hand.

"Reading ThTc9c3h" - This is the composition of the board at the time of the described hand. It indicates that the community cards are the Ten of hearts (Th), Ten of clubs (Tc), Nine of clubs (9c), and Three of hearts (3h).

"Her jack high held against Adelstein’s eight of clubs and seven of clubs" - In poker, the objective is to have the best hand or convince others to fold. In this case, Lew's best hand was a single high card, the Jack, which is better than Adelstein's best hand, which had an Eight as the high card. The term "held" means that Lew's hand remained the best hand after all the community cards were dealt.

As for your other questions:

  • In poker, the deck is shuffled before each hand, so card-counting strategies are not relevant.
  • Poker hands are ranked based on the combination of cards, from high card (weakest) to royal flush (strongest). There is no concept of trump, and suits do not affect hand ranking.
  • The sequence of bets leading up to the play in question and the players' levels were not provided in the original description, but the focus is on the suspicious nature of Lew's call given her weak hand.
  • Robbi being called a "recreational" player usually means she is not a professional poker player and plays casually or for fun.

In summary, the controversy revolves around Robbi making a risky call with a weak hand and winning a large amount of money. This incident raises suspicions of cheating, but without concrete evidence, it's difficult to confirm. As you mentioned, heightened scrutiny for Robbi in future tournaments would be a reasonable course of action, but without more information or evidence, it would be unfair to ban her or take back her winnings.

I think it's pretty good! If anyone wants to learn more, I suggest pasting the description (or other, longer descriptions available online) into ChatGPT and querying interactively. Note: I used GPT-4 for the version above, and I'm not sure how well the free version does on something like this. Bing might do really well with this, since it can query external / up-to-date info on the web.

I think this is a bad exercise or test of rationality skill. First, it's massively time-consuming, as a LOT has been written about it. Second (though perhaps more important), there's no reasonable scoring rubric (so not good as a test), and no feedback loop to improve on (so not good as an exercise).

I have, in fact, followed the topic - I used to play poker at semi-professional levels (played in big games and cashed in many small and medium tourneys, net positive over many years, never actually devoted the energy to make it a big part of my income), and still have close friends in the biz (organizers, authors, and players).  There is a consensus among those I know well enough to have a positive opinion on their honesty and epistemology, but it's complex enough that it's not a very good topic for abstract rationality practice.

More standard prediction contests would seem strictly superior for testing and practice. Pick some Metaculus medium-term predictions, make individual bets, then discuss reasoning and make new bets. Practice crux-finding and input metrics you can use to resolve actual work disagreements.

I used to be quite an active and profitable trader on PredictIt. I've also looked into this incident a bit myself. I think the rationality skills needed to do well in prediction contests are important, but different from the kind needed to investigate a question like this, the Amanda Knox case, or the Sabatini incident.

I have opinions on the object-level here, but I concur that this is probably more of a test of "how familiar are you with what is and is not normal in a high-stakes cash game" and also "how familiar are you with the specific math" than of more general rationality.

(I haven't read the post yet) The mention of the Knox posts made me think of this comment chain about the slowly-growing number of similar posts on LW: https://www.lesswrong.com/posts/YTJp5WBcktBimdxBG/staying-split-sabatini-and-social-justice?commentId=xctop8E3zpuCFjj4p

I don't know if it's worth adding in to your post anywhere, but here it is if you would like it.

Seems difficult to mark answers to this question.

The type of replies you get, and the skills you are testing, would also depend on how long the subject is spending on the test. Did you have a particular time limit in mind?

I think timeboxing it to 3 hours or so would be a good standard; maybe a bit more if you're totally unfamiliar with poker.

I don't think judging responses would be particularly difficult; even if we don't know what actually happened for certain, you can still judge whether someone used valid rules of inference to reach a plausible estimate. (Judging well requires rationality skills too, of course - rationalists should be more easily convinced of true propositions than false ones, and be able to distinguish invalid reasoning from valid reasoning.)

Also, I suspect that most strong rationalists would independently converge to the same probability estimate for approximately the same reasons, if they looked into the matter, which could serve as a baseline.

Also, I suspect that most strong rationalists would independently converge to the same probability estimate for approximately the same reasons, if they looked into the matter, which could serve as a baseline.

ROFL. The very setup of the post (it's controversial and there's no consensus, even among professionals who've spent a lot more than a few hours looking into it) contradicts this. There's also a bunch of private information and priors (such as "what is the base rate of cheating in high-stakes poker" and "what side payments had been made among participants and crew") that are very hard to validate. Even if there were a reasonable base rate, there would still be the question of whether this KIND of cheating (alleged access to the hole-card camera feed) is comparable to other kinds (soft-playing or signaling a compatriot, acting out of turn for information, milder angle-shooting).

The fact that it is controversial among non-rationalists does not mean that it would be similarly controversial among (strong) rationalists.

This is probably not worth their time and too expensive to test, but concretely, I predict that if Duncan Sabien, gwern, Zvi (or other people in this general reference class, or people who did well on the Amanda Knox test for the right reasons, etc.) each spent some hours looking into this, they would independently reach mostly the same conclusions for mostly the same reasons.