D&D.Sci II Evaluation and Ruleset

by abstractapplic · 5 min read · 17th Jan 2021 · 6 comments



This is a followup to the D&D.Sci post I made earlier this week; if you haven’t already read it, you should do so now before spoiling yourself.

Here is the web interactive I built to let you test your solution; below is a complete explanation of the rules used to generate the dataset. You’ll probably want to test your answer before reading any further.

(The generation process is more complex than last time, so I’m leaving some extremely minor and irrelevant details out: feel free to dig through my code for the full story if this bothers you.)


Item Types

A magic item is either a Weapon (Sword, Longsword, Warhammer, Battleaxe), Tool (Plough, Saw, Hammer, Axe), or Trinket (Ring, Pendant, Amulet).


Every item has an abstraction (Wrath, Prosperity, etc.) assigned to it. Weapons are randomly assigned Weaponish abstractions, Tools are randomly assigned Toolish abstractions, and Trinkets are randomly assigned Weaponish or Toolish or Trinketish abstractions. These abstractions have no effect on other features, and are only relevant to this exercise insofar as they make the demarcation of item types more obviously meaningful.


Weapons are assigned modifiers (+1, +2, etc.) explaining how much more damage the enchantment lets them inflict. This is both easily testable and tightly-regulated in-universe: warriors can just swing their new Battleaxe of Wrath +2 at a training dummy and confirm it does 2 extra damage. This is potentially relevant to us because the amount of extra damage is equal to mana/10 rounded down.
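The modifier rule can be sketched as a one-liner (my reconstruction of the stated formula, not the author's generation code):

```python
# Weapon modifier from mana: extra damage equals mana/10, rounded down
# (a sketch of the rule as stated in the post).
def weapon_modifier(mana: int) -> int:
    return mana // 10

print(weapon_modifier(29))  # 2
print(weapon_modifier(30))  # 3
```

So a Battleaxe of Wrath +2 is guaranteed to hold between 20 and 29 mana.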

Non-Weapon items are also sometimes assigned modifiers; this is a marketing tactic performed ad-hoc by salesmen with shaky or nonexistent justification, and provides no actual information.

Color and Mana

Enchanting an item randomly assigns it a color, and then randomly assigns it an amount of mana via a process based on that color:

  • Red-enchanted items have 1d4*1d4*1d6 mana.
  • Blue-enchanted items have 1d6*1d10 mana.
  • Yellow-enchanted items have 17+1d4 mana.
  • Green-enchanted items have 2*1d20 mana.

The mean amount of mana for each color is close to 20, but the variance differs dramatically between colors.
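The means and variances can be computed exactly by enumerating every equally likely dice outcome. This is my own reconstruction of the formulas above, not the author's actual generation code:

```python
# Exact mean and variance of mana for each enchantment color, computed by
# enumerating all equally likely dice outcomes (a reconstruction of the
# formulas in the post, not the original generation code).
from itertools import product
from statistics import mean, pvariance

def roll_products(*dice):
    """All equally likely products of the given dice, e.g. (4, 4, 6) for 1d4*1d4*1d6."""
    results = []
    for combo in product(*(range(1, sides + 1) for sides in dice)):
        total = 1
        for roll in combo:
            total *= roll
        results.append(total)
    return results

colors = {
    "Red":    roll_products(4, 4, 6),         # 1d4*1d4*1d6
    "Blue":   roll_products(6, 10),           # 1d6*1d10
    "Yellow": [17 + r for r in range(1, 5)],  # 17+1d4
    "Green":  [2 * r for r in range(1, 21)],  # 2*1d20
}

for name, vals in colors.items():
    print(f"{name:6s} mean={mean(vals):6.3f}  variance={pvariance(vals):7.2f}")
```

All four means land between 19.25 and 21.875, while the variances range from 1.25 (Yellow) to roughly 375 (Red) — which is exactly what makes knowing the color so informative.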

The Thaumometer

Wakalix did not read the instructions which came with his Thaumometer, and does not realize it needs to be calibrated based on the size and color of an item. Unfortunately, your lack of magical ability prevents you from adjusting it yourself.

It is currently optimized for Weapons and Tools which glow blue: its readings will only be off by one when applied to them.

When applied to Trinkets which glow blue, it will consistently overestimate the amount of mana present by 22, then be off by one from that position; for example, a Ring which glows blue and has a reading of 52 will have either 29 or 31 mana.

For a non-blue Weapon or Tool, it will roll 1d6*1d10 and report the result, completely disregarding the amount of mana present. For a Trinket, it will add 22 to this figure before reporting.
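The full reading behaviour can be collected into one function (the name and structure are my own; only the rules come from the post):

```python
# A sketch of the Thaumometer's behaviour as described above. The function
# name and structure are assumptions; the reading rules are from the post.
import random

def thaumometer_reading(item_type, color, mana, rng=random):
    """item_type is 'Weapon', 'Tool', or 'Trinket'; returns the displayed reading."""
    trinket_offset = 22 if item_type == "Trinket" else 0
    if color == "Blue":
        # Calibrated case (modulo the Trinket offset): true mana, off by exactly one.
        return mana + trinket_offset + rng.choice([-1, 1])
    # Any other color: a fresh 1d6*1d10 roll that ignores the true mana entirely.
    return rng.randint(1, 6) * rng.randint(1, 10) + trinket_offset
```

So a blue Ring reading 52 implies 29 or 31 mana, while a red Sword's reading tells you nothing at all about its mana.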

Purchase History

On a given day, the caravans (the only local source of magic items) will have 4-5 randomly-generated Weapons, 2-3 randomly-generated Tools, and 5-6 randomly-generated Trinkets. The value of Weapons is affected by their mana content via their modifiers; the value of other item types is completely uncorrelated with mana, since mana level for Trinkets and Tools is both difficult to detect and irrelevant for most customers.

Since before he started his list, Wakalix’s strategy when buying items for sacrifice has been to trust in his wizardly intuition and buy whichever two items he feels are best, disregarding cost; for the last 418 shopping trips, he has purchased two items almost completely at random. His only limitation is that these two items will not share a noun, abstraction, or modifier (he believes this lowers his chances of coming home with two low-mana items); the selection effects this produces are minor enough to neglect.

(If you’re wondering why I included selection effects that didn’t affect the outcome, it’s because creating a dataset entirely without selection effects would cause me to turn to ash and blow away in the wind.)


When sacrificing an item, Wakalix always harvests all of its mana, can always tell exactly how much mana that is, and will always honestly report it. He’s very reliable like that.


Using the Thaumometer, you can deduce the amount of mana every blue-glowing object has to a tolerance of +/-1. You can also make use of the fact that yellow-glowing objects never have less than 18 mana. Using this information, you can guarantee >120 total mana by buying the Pendant of Hope, the Hammer of Capability, the Plough of Plenty, and the Warhammer of Justice +1, leaving you with 55gp profit.
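The blue-reading deduction can be written out explicitly (a sketch under the rules above; the helper function is mine, not part of the original tooling):

```python
# Invert a blue-glowing item's Thaumometer reading to its two possible mana
# values (the rules are from the post; the function itself is an assumption).
def blue_mana_candidates(reading, item_type):
    offset = 22 if item_type == "Trinket" else 0  # Trinkets read 22 high
    base = reading - offset
    return (base - 1, base + 1)

print(blue_mana_candidates(52, "Trinket"))  # the Ring example: (29, 31)
```

Either candidate is within one of the truth, so summing the lower candidates over a basket of blue items gives a guaranteed floor on total mana.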

(It is possible to do better than this by guessing randomly and being lucky. But given the information provided, and given that you’re trying to max your character’s EV and not your chances of topping the leaderboard, the above solution is optimal.)


This challenge was intended primarily as a horrible, unfair trap for those inclined to approach problems by throwing conventional ML algorithms at relevant datasets without doing manual data exploration. My condolences to everyone who got caught out, and my apologies to those who didn’t: I’ll endeavour to make future tricks harder to dodge.

Congratulations to gjm and GuySrinivasan for producing my intended solution. Congratulations also to everyone who produced my intended solution plus some safety margin, to everyone who came up with a solution that made them a profit, and to everyone who chose to send the owl back when they couldn’t find a solution they thought was worth gambling on.

(I really mean that last one, by the way. Making the best choice you can even if it seems boring or counter-intuitive is an important life skill; and also, the fact that people were willing to do this increases the variety of themes I can use in future games.)

I have some mixed feelings about how puzzle-ish this challenge turned out: ideally, I want challenges to have a smoother – or at least more natural – incentive gradient for figuring things out, but I couldn’t find a good way of doing that with this concept. Feedback on this point would be greatly appreciated, as would feedback about anything else about the challenge.


Week-long delays between posting a challenge and posting its solution don’t seem optimal; most of the action in the comments section happens in the first few days after I post, and I don’t want to leave people waiting for most of a week after coming to a final conclusion (or worse, leave anyone nerdsniped for an entire ~168 hours). At the same time, I want to ensure everyone has a chance to investigate before I reveal the ruleset, and I want to be able to clear up questions about the premise while people are still playing. I think the best way to balance this is to hold challenges over shorter periods of time, and call these well in advance: to that end, I hereby commit to posting the next challenge at 7pm UTC on Friday the 5th of March, and to resolving it at 11pm UTC the following Sunday.

(My much more tentative long-term plan is to post a challenge on the first Friday of every month for the rest of this year. We’ll see how that goes.)


Comments

I don't think the trap was horrible and unfair. Rule One of data science is: always look at your freakin' data rather than blindly feeding it into the sausage-making machine and hoping you'll be able to eat what comes out.

For what it's worth, I'm perfectly fine with a week-long delay before the follow-up. I think your proposed solution is okay as well, though I would prefer Monday-Friday rather than Friday-Sunday, since I'm less likely to be at a computer on weekends.

I think the all-or-nothing threshold could have been omitted for a smoother gradient, but I do think it was helpful in this case to highlight the potential do-nothing solution (I seriously considered that Wakalix might be scamming me for some free sacrifice material with a fake/misleading list).

This challenge was intended primarily as a horrible, unfair trap for those inclined to approach problems by throwing conventional ML algorithms at relevant datasets without doing manual data exploration. 

I kinda noticed that when I tried to apply a Random Forest (which, in my experience, almost no one tries to counter in their toy problems) and encountered some really bad results: I couldn't even get enough mana if I tried to go with its predictions, no matter how I handled the data.

Soooo I made a simple feed-forward NN and returned home with ~20 gold according to the interactive web shopping. Using suspiciously powerful black boxes in problems obviously asking you to stop and think is also horrible and unfair, yet here I am.

To be honest, I did explore the data, making a correlation matrix between one-hot-encoded columns and readings/mana, and noticed some of the patterns you describe in the post. But they seemed complex, and I was sleepy, and so I went ahead and hammered in some nails with a microscope.

Very cool challenge idea overall, I greatly enjoyed doing it wrong. Thank you for sharing this on ACX. I'm a long-time lurker on LW, but this made me sign up.

I didn't do this problem, but I can imagine I might have been tripped up by the fact that "hammer" and "axe" are tools and not weapons. In standard DnD terminology, these are often considered "simple weapons"; distinct from "martial weapons" like warhammer and battleaxe, but still within the category of "weapons".

I guess that the "toolish" abstractions might have tipped me off, though. And even if I had made this mistake, it would only have mattered for "simple-weapon" tools with a modifier.

It was very obvious upon filtering to just-hammer or just-axe that the abstractions were suspiciously limited.

Wakalix did not read the instructions which came with his Thaumometer, and does not realize it needs to be calibrated based on the size and color of an item. Unfortunately, your lack of magical ability prevents you from adjusting it yourself.

This wouldn't have helped with solving the problem by itself, but sending this friendly advice along might have been a good idea.