Wiki Contributions


D&D.Sci Dungeoncrawling: The Crown of Command Evaluation & Ruleset

Reflections on my attempt:

I’m pleasantly surprised by how well I did in both a general and absolute sense (if you asked me yesterday I would not have put my strategy’s odds of overall success above 20%). Of course, ~half the credit for this victory goes to Measure, whose inferences about the dungeons’ likely populations I was shameless in making use of.

If I’d had more time and energy to spare, I would have looked into how reliably teams which counter all their encounters win, and how character levels affect this outcome; from what I read here, I think that would have been a good next step.

Reflections on the challenge:

  • The problem statement was the most fun-to-read D&D.Sci introduction so far, including (imo) all of my own.
  • I found myself surprisingly uncomfortable playing a villain. (If you don’t get why I’d be bothered by mostly-task-irrelevant skippable flavortext then that makes two of us.)
  • The mechanic of “you have fungible but limited resources to allocate between multiple tasks, all of which have to be completed for a win” turns out to be an extremely good fit for D&D.Sci, and I look forward to using it in several of my scenarios.
  • The difficulty level was in an uncanny valley between “simple task that’s only hard because Inference and Application are inherently hard” and “arbitrary fractal complexity which rivals that of the real world”; I would have liked this game more if it were significantly harder or easier.
  • I got to play another D&D.Sci scenario! Which I didn’t make! And which probably helped me to get better at making D&D.Sci scenarios!

Regarding future feedback:

If you – I here refer both to the esteemed OP and to anyone else with a complete-but-unreleased D&D.Sci game – want me to proof a future challenge before it’s released to the wider public, dm me and I’d be happy to take a look. I’d also be (reluctantly) willing to give (a small amount of) more general support to people perpetuating my genre (even though this would disqualify me from playing the resulting games, and my advice would mostly be variations on “do the things I did but better”).

D&D.Sci Dungeoncrawling: The Crown of Command

Thank you for making this.

Misc. Insights:

  • An adventuring party has a success chance of ~64%. We need to get three of them to win in a row, which at base rates would be only a ~26% chance. This is worrying.
  • It looks like level has almost no impact on chance of success, but there’s a major confounder in that more expensive teams get sent on longer and more arduous journeys: length of a dungeon correlates very strongly with the total price of an expedition, and dungeons with multiple dragons attract a disproportionate number of >level 6 adventurers.
  • Success rates for dungeons with ‘Goblin’ in the name are much lower than average, though so is the average level of the party sent. I think this means the current market is pretty good at pricing in general, but systematically underestimates Goblins.
  • A (very) crude approximation is that the price of an expedition is about 2000gp times the number of encounters it involves. Our adventurers have to survive a total of 23 encounters, and we only have 36000gp to play with. This is worrying.
  • Classes seem about evenly distributed, but there’s a bias towards diversity; there are far fewer teams with two or more of a given class than you’d expect if it were random. However, this bias is, if anything, not strong enough; success rates for parties with four unique classes are much higher than success rates for parties with three. I don’t know to what extent this is because more variety increases the odds that a party will have the right counter to an obstacle, and to what extent class diversity is Inherently Good.
  • Adventuring parties tend to have everyone be about the same level; this tendency is so strong that the sampling bias makes it hard to work out whether it’s a good idea. I guess I’ll trust convention here?
  • Literally all the parties with a gap of >3 between their max and min levels are like that because a high-level Rogue joined a low-level party. I’d suspect that this is Rogues faking being higher-levelled to get more gold, but actually teams like this have an above-average success rate, so I have no idea what’s going on. (Fortunately, I don’t have to, since my strategy makes no use of high-level Rogues.)
  • In general, Clerics are the most useful class, followed by Mages and Fighters.
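The two “This is worrying” bullets above reduce to quick back-of-envelope arithmetic. (The independence assumption in the first calculation is mine, not something the data guarantees.)

```python
# Back-of-envelope checks on the two "worrying" bullets above.
p_single = 0.64                      # observed per-expedition success rate
print(f"P(all three succeed) = {p_single ** 3:.2f}")  # 0.26, if independent

encounters = 23                      # total across the three dungeons
gp_per_encounter = 2000              # crude market rate inferred from the data
budget = 36000
shortfall = encounters * gp_per_encounter - budget
print(f"Shortfall at market rates: {shortfall}gp")    # 10000gp short
```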

The actual backbone of my strategy:

  • A dungeon is a marathon, not a series of sprints; probability of success in later stages is affected by how well a party handled earlier ones. This is shown by the fact that literally all parties managed to defeat their first encounter, and only <0.1% fall to their second (despite the fact that either of these can be Dragons!). The practical implication is that handling ‘easy’ encounters smoothly probably matters, since it means the party will be fresh for the real threats.
  • Specific encounters have specific counters. By finding what distinguishes the average party defeated by a thing from the average party that encounters a thing, I can determine what classes best combat which obstacles.
  • Measure has very cleverly inferred what encounters each dungeon is likely to contain, and I’m not shy about copying their homework. (Thank you, Measure.)
  • Different encounters have vastly different failure probabilities. Dragons are the most dangerous, and Goblin Chieftains are also pretty bad. Our parties will probably have to fight both. This is worrying.
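The counter-finding method in the second bullet can be sketched as follows. Everything here — the party records and the way they’re encoded — is invented toy data for illustration, not the actual dataset:

```python
from collections import Counter

# Sketch: compare class frequencies among parties that MET an encounter
# against parties it DEFEATED. A class over-represented among the defeated
# (relative to everyone who met the encounter) is a poor counter; one
# under-represented among the defeated suggests a good counter.
met = [
    ("Rogue", "Mage", "Cleric", "Fighter"),
    ("Rogue", "Druid", "Cleric", "Ranger"),
    ("Mage", "Mage", "Fighter", "Fighter"),
]
defeated = [("Mage", "Mage", "Fighter", "Fighter")]  # parties that lost

def class_rates(parties):
    """Fraction of all party slots occupied by each class."""
    counts = Counter(cls for party in parties for cls in party)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

met_rates, lost_rates = class_rates(met), class_rates(defeated)
for cls in met_rates:
    # Positive delta: class is rarer among the defeated -> likely a counter.
    print(cls, met_rates[cls] - lost_rates.get(cls, 0.0))
```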


(I reserve the right to change all of these if I come up with a better idea or another commenter shares a new and relevant insight.)

For the Lost Temple of Lemarchand, I’ll send a level 2 Rogue to handle the needletraps*, a level 2 Druid to handle the snakepits, and a level 2 Cleric and a level 2 Mage to handle the various undead.

For the Infernal Den of Cheliax, I’ll send a level 5 Fighter to fight the orcs and the dragon, a level 3 Druid to keep everyone safe from the snakepits and wolves so they’re fresh for the boss fight, and a level 3 Ranger and level 3 Mage to help the Fighter with the dragon (dragons are scary!).

For the Goblin Warrens of Khaz-Gorond, I’ll send a level 4 Fighter to handle the goblin chieftain and the boulders, a level 4 Ranger to handle the rank-and-file goblins, a level 3 Cleric to help the Ranger out, and . . . I guess a level 3 Fighter to support the first one? (I hate to have doubles on a team but there’s no other class that does as well against chiefs and boulders.)

*This is the one place I feel confident Measure made a mistake: “Rogues help with needletraps” is the most reliable inference I ran into in my encounter-countering research, so I don’t get why they’d include a Mage and a Fighter but not a Rogue in Adventuring Party #1.


The odds don’t seem great. The odds of all three adventures concluding successfully really don’t seem great. And that’s assuming all my inferences are correct, which they aren’t. I know my character is set on this path, but if I were faced with a prospect like this in real life, there’s no way I’d bet anything I’d be afraid to lose.

How to generate idea/solutions to solve a problem?

Disclaimer: I've never tried any of these things on real problems, they just seem like obvious answers.

The oldschool answer is to use Tarot or the Oblique Strategies, creatively interpret what you draw in the context of your problem, and evaluate whether that's an improvement over what you're currently doing. Personally, I'm more fond of the modern counterpart, Weird Sun Twitter's "Have you tried X?" meme, as incarnated here, here and here.

Creating a truly formidable Art

How does it work?


I think the name may have given the wrong impression. The 'D&D' part of D&D.Sci is mostly the trappings of the genre, not the substance; monsters, wizards and (simulated) dicerolls yes, anything resembling Actual Roleplaying no.

Since you asked . . . from the top, the typical/intended way to Consume my Product is:

  • Download the dataset provided in the introductory post.
  • Investigate the scenario and decide the best course of action, using the dataset, the problem description, and a vague sense of what tricks you think the GM will/won't use.
  • OPTIONAL: Post about any ambiguities in the problem description or apparent errors in the dataset so the GM can clarify/fix them.
  • OPTIONAL: Post your findings and/or call your decision in advance (for bragging rights, this is best done in the ~week between the problem and the solution being posted).
  • OPTIONAL: Update your analysis/answer based on what other people said.
  • Use your solution in the evaluator the corresponding "Evaluation and Ruleset" post links to; see what happens to your character as a result, and what the odds of that outcome were given your choices.
  • Read the ruleset: see how well your deductions matched reality, and how close your strategy was to the optimal one.
  • OPTIONAL: Read the code used to generate the dataset.
  • OPTIONAL: Post about how well your strategy worked, and what you think you’ve learned from the game.
  • OPTIONAL: Post about what you think was good/bad about the scenario and what you’d like to see more of in future ones.
  • OPTIONAL: Make your own scenario. (aphyer has built two so far – both of which are very good – and various other LWers are planning to run games at some point this year)

I'd shied away from RPG style simulated practice because of the difficulty with embodied integration. I find it far too easy to view my character from the outside and solve their situation like a puzzle, rather than experiencing myself as the character who's actually encountering the confusion and psychological states and trying to navigate them from the inside.

Your Honor, I plead guilty to exactly half of this charge. It’s true that - for example - the player in Voyages of the Gray Swan will not be feeling the terror, desperation and confusion of the character they play, because they aren’t actually having the experience of trying to analyze their way out of being eaten by crabmonsters. As such, they won’t be able to test or develop their making-good-decisions-under-pressure mental musculature: this is a weakness of the genre as it currently exists, and I cop to it.

However, I can tell you from experience that players do get to use their pattern-matching, noticing-confusion, admitting-they’re-wrong, and balancing-priors-against-the-evidence skills, because the scenarios are intentionally weird and messy enough that they have to do those things to reach the best answer. I suppose I’d summarize this by saying I think D&D.Sci players get to practice being rational, but not being not-irrational?

(I once tried to give players a chance to use their not-irrationality skills by writing one of my scenarios as fanfiction of a story with some compelling characters, and inventing a situation where those characters could die, survive, or survive and avoid some of the problems they face in canon, depending on the player's decisions. This completely failed for reasons documented in the Reflections section of the evaluation post, chief among which is that none of my players had read the story my scenario was fanfiction of. I have various tentative plans for (hopefully!) more effective projects with the same goal.)

Creating a truly formidable Art

What would pressure-testing in the context of rationality look like?

Well, honestly, I don't yet know.

I have a few bad examples that don't strike me as entirely wrong . . .


At the risk of being accused of flagrant self-promotion, I also have a few bad examples that don't strike me as entirely wrong. My data science challenges are only tractable to players with the appropriate skillset, and resemble real-life problems the same way mystery novels resemble real-life detective work . . . but if you're looking for novel ways to test for skill at Inferring The Truth And Then Using It, they're probably relevant to your interests.

D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset

This was extremely good. In particular, I like that you managed to make the challenge tractable to both Analysis and Machine Learning. I also appreciated that you included an explicit Real-world Data Science Moral in the wrap-up; I should try to do that more often.

D&D.Sci 4th Edition: League of Defenders of the Storm

. . . I feel oddly proud to have continued the tradition of D&D players getting in-universe names wrong.

D&D.Sci 4th Edition: League of Defenders of the Storm

Thank you for making this.

Regular team:

Nullifying Nightmare, Blaze Boy, Greenery Giant, Tidehollow Tyrant, and . . . yeah, okay, Phoenix Paladin.

(I was on the fence about whether the last spot should go to Paladin or Ranger, but when I saw Measure's answer I decided to let hipsterism be the deciding factor.)

Key Insights:

There seems to be a rock-paper-scissors thing going on here: Earthy fighters have an advantage over Watery fighters, Watery fighters have an advantage over Flamey fighters, and Flamey fighters - kinda, sorta, unreliably - have an advantage over Earthy fighters. (And the Nightmare has an advantage over everyone.)

This is relevant because 3/5 of the opposing team are Earthy fighters, including Greenery Giant, whose strength rivals the Nightmare’s, and whose presence on a team predicts a ~60% chance of victory.

Teams which are slanted too heavily towards a given element have an extremely low win rate. I can't tell to what extent this is because losing the rock-paper-scissors game hurts you more than winning it helps, and to what extent balance is inherently valuable, so I'm playing it safe and not building an entire team of firestarters (also, there are only two Flamey fighters with non-terrible win/loss ratios).

Tangential insights:

I infer from the format of the alternative list that - absent an extremely tricky fakeout - position doesn't matter: A+B+C+D+E is equivalent to E+D+C+B+A.

Different fighters are used with very different frequencies, but this sampling bias doesn't seem to affect my analysis much.

Eyeballing the correlation matrix, it looks like teams are thrown together randomly; no pairs that always show up together, etc. This makes things much simpler, since I can be confident that (for example) GG's apparent power isn't just because people keep using him alongside NN (or vice versa).

There's a random element here. Existence proof: A+B+C+S+V vs A+E+I+T+V happened twice with different outcomes. Given this, I'd want to push Cloud Lightning Gaming to have the match be best-of-five, to decrease randomness' relevance to the outcome.
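The best-of-five suggestion follows from a standard calculation: any per-game edge gets amplified over more games. This sketch assumes independent games with a fixed win probability; p = 0.6 is an arbitrary illustration, not a figure from the dataset:

```python
from math import comb

def p_best_of(n, p):
    """Probability of winning a best-of-n series, given per-game win prob p."""
    need = n // 2 + 1
    # Sum over winning exactly k of n games, for all k that clinch the series.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

p = 0.6
print(p_best_of(1, p))  # 0.6
print(p_best_of(5, p))  # ~0.683 -- the stronger team's edge grows
```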

I appreciate the omission of letters that would let us (accidentally or otherwise) spell out common swearwords.

PVP team:



I made a Sequence for my replayable challenges, but think we should keep the tag. That way people wanting to make posts about D&D.Sci will have something to tag them with.

D&D.Sci Pathfinder: Return of the Gray Swan Evaluation & Ruleset

You may want to include a link to the challenge in this post, so people seeing it on the frontpage know what you're referring to.
