This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset. 

STORY (skippable)

You may have experienced a meteoric rise to be the billionaire CEO of DataCorp, but you can still enjoy the finer things in life.  Like children's card games.

The world-famous gaming company 'Sorcerers of the Shore' is about to release a new card game, supposedly based upon the 'Ancient Sumerian Game of Shadows'.  They're holding an opening tournament next week, and you mean to attend.  

Rumors that the game absorbs the souls of its players, and can destroy them if they lose, are obvious nonsense.  That strange lady who appeared in your DataCorp office out of a cloud of mist and warned you that 'you unleash a power beyond your comprehension' was a transparent con artist.   You're not sure how she got in past your security unnoticed, or how she got back out afterwards, but you don't have time to worry about that!  You need to submit your deck list for the opening tournament!

Unfortunately, the Sorcerers of the Shore have been very tight-lipped about the whole thing.  They haven't actually published the ruleset of the game, only the names of cards available for you to select from.  (They said something about how you should select 'the cards that call out to your soul.'  This is also clearly nonsense.  You don't want cards your soul is somehow compatible with.  You want cards that will win.)

Fortunately, the ruleset is apparently 'unchanged from the ancient Sumerian game', and so you have funded a series of archeological expeditions to look for records of past plays of this game.  (Is this a good use of your DataCorp billions?  Of course!  This is Serious Business!)

Your expeditions have come back with...quite impressive results, actually.  Apparently they've uncovered a wider Ancient Sumerian civilization than anyone anticipated, with more advanced technology than anyone expected.  The archeological community is fascinated, but more importantly for you, you've uncovered a very large dataset of games.  Apparently this game was used by the Sumerians for a wide variety of things - there are records of it being used for gambling, divination, as a justice system in trials, and apparently 'holding back the Great Devourer lest it bring an end to this earth'.  You're not quite sure what this last one means - you think your archaeologists have probably messed up the translation.

Even more fortunately, it appears the Sumerians had some strange ritual around selecting cards for this game - they have done so entirely randomly, giving you a very clean sample.  This is fortunate, because you've just heard that your long-time rival will be attending the tournament as well.  (Some would call it undignified for a billionaire CEO to have an ongoing rivalry with a middle schooler who is 3 feet tall, or 4 if you count his hair.  Some would call it more undignified that you keep losing.)

Your rival has declared his intention to bring a deck with one of every card available.  This pathetic deck should be easy for you to beat - is he just hoping that he can get lucky and draw exactly the right card whenever he needs it?  He does get lucky annoyingly often though - you want to make sure your odds of winning are as high as possible, to make sure you beat him this time even if he gets lucky again.


  • You need to build a deck of 12 cards from the following available cards:
    • Alessin, Adamant Angel
    • Bold Battalion
    • Dreadwing, Darkfire Dragon
    • Evil Emperor Eschatonus, Empyreal Envoy of Entropic End
    • Gentle Guard
    • Horrible Hooligan
    • Kindly Knight
    • Lilac Lotus
    • Murderous Minotaur
    • Patchy Pirate
    • Sword of Shadows
    • Virtuous Vigilante
  • You can include any number of copies of any card.  So '12 copies of Alessin, Adamant Angel' is a valid deck, as is '10 copies of Alessin, Adamant Angel plus 1 of Bold Battalion plus 1 of Dreadwing, Darkfire Dragon'.
  • Your objective is to maximize your win rate against a deck that consists of 1 copy of each card.
  • THIS DATASET lists past games (what cards were played on each side, and who won).
  • The decks used in that dataset were randomly generated - each deck is a random set.*

*Note for nerds: specifically, each possible deck is equally likely.  This is not quite the same as 'pick 12 random cards each time', as that would make '12 copies of Virtuous Vigilante' much less likely than '1 copy each of 12 cards' due to the number of different orderings available for the second.  The takeaway for you is the same either way though - there isn't any hidden structure in the decks you'll see in the dataset, don't waste time looking for it.
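To make the footnote concrete, here is a minimal sketch of one way to draw a deck so that every possible deck is equally likely, using the standard "stars and bars" trick. The names `random_deck`, `CARD_TYPES` and `DECK_SIZE` are made up for illustration; this is not necessarily how the dataset was actually generated.

```python
import random

CARD_TYPES = 12   # distinct cards available
DECK_SIZE = 12    # cards per deck

def random_deck(rng):
    """Return per-card counts for a deck drawn uniformly from all possible decks."""
    # Lay out 12 "stars" (cards) and 11 "bars" (dividers) in 23 slots.
    slots = DECK_SIZE + CARD_TYPES - 1
    bars = sorted(rng.sample(range(slots), CARD_TYPES - 1))
    counts, prev = [], -1
    for b in bars + [slots]:
        counts.append(b - prev - 1)   # stars between consecutive bars
        prev = b
    return counts

rng = random.Random(0)
deck = random_deck(rng)
print(deck, sum(deck))   # 12 counts that add up to 12
```

Each choice of bar positions corresponds to exactly one deck, so picking the positions uniformly picks the deck uniformly. Drawing 12 cards independently would instead favour mixed decks over 12-of-a-kind decks.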


As in a past scenario, you may also submit a PVP deck.  I recommend sending it as a PM to me, but if you don't mind other people seeing it you can just put it in your answer.  The PVP deck with the best overall record (sum of performances against all other submitted teams) will win the right to specify the theme of an upcoming D&D.Sci scenario.  I can't guarantee success, but at some point in the next few months (possibly after other scenarios in the pipeline have been produced) I will try to write a scenario around whatever theme (either a general genre or a specific work) you want.  Our previous winner, simon, selected the SCP Foundation canon as a theme for his scenario.

I don't want the existence of a PVP objective to incentivize people too strongly against posting findings in the chat, so as an effort to reduce the risk of your findings being used against you: if multiple people submit identical PVP decks, I will break the tie in favor of whoever submits it earlier.

I'll aim to post the ruleset and results  on April 4th (giving one week and both weekends for players).  PVP decks should be submitted by April 3rd to give me time to test them.  You may edit a submitted solution at any time before the deadline.  If you find yourself wanting extra time, comment below and I can push these deadlines back.

As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers (type a '>' followed by a '!' at the start of a line to open a spoiler block) that contain information or questions about the dataset. 

Thank you to abstractapplic and RavenclawPrefect, who reviewed drafts of this scenario.  (For the avoidance of doubt, they do not have inside information on the scenario, and are free to play it).


A couple of people have already pointed out similarities to mtg/hearthstone so I thought it might be a good way to think explicitly about the rules/specific cards (some reasoning behind it, but these aren't predictions, just a way to solidify my intuitions):

- Hearthstone-like mana system since no lands (Lotus works like innervate?)
- Creatures regenerate like in mtg, otherwise bigger creatures could just be traded off with 2 or 3 smaller ones and it wouldn't be so big a deal to lotus one of them out
- Starting life total is balanced in a way that good aggro decks and good control (that is lotus ramp) decks have approximately even odds against each other

- Archetypes:
   - Good guy tribal: Battalion is some sort of lord for guards, knights, vigilantes and copies of itself (making it one of the inexpensive cards that doesn't do too badly with lotuses, because it makes sense to ramp out things that lord each other)
       - (intuition stats: guard: 1 mana 1/4, knight: 3 mana 3/3, vigilante: 3 mana 4/2, battalion: 2 mana 2/2 and buff all others by +1/+1. something like this would explain why battalion/guard > battalion/knight > battalion/vigilante)

   - Lotus ramp: Lotus out a big thing, big thing clears the board and wins the game. Dragons seem worse at this than Angels, emperors are slower than both 
       - (intuition stats: Angel: 5 mana 4/6, Dragon: 6 mana 5/6, Emperor: 7 mana 7/7, Lotus: gain 2-3 mana this turn only)

   - Evil equip: Sword of shadows is presumably some kind of equipment you put on Pirates, Hooligans or Minotaurs to give them stats and it can be reused when the creature has died (or maybe it's an anthem sort of thing, but that would go against its name). Pirates have a high winrate in general, and work well with both Swords and Angels, so I would expect Pirate to be overstatted for its cost and slightly defensive-leaning. Minotaurs don't have a great winrate and are not great with Swords, but are the other "low-cost" thing that works reasonably well with lotuses, and game nr. 185896 gives a hint of what minotaur does (mainly, having a minotaur unopposed gives you a short clock, 2-3 turns maybe?). 
       - (intuition stats: Pirate: 1 mana 2/3, Hooligan: 2 mana 3/2, Minotaur: 4 mana 7/1, Sword: 0 or 1 mana +2/+1 equip for 0)

- presumably, Swords technically also work with Dragons and Emperors, but those are big enough that it doesn't make much of a difference, same with Angels and battalions.


- diverse decks do well because it's important to curve out. Yugi's gonna be hitting the perfect curve, which means we have to defend early and outvalue late. With my completely made-up stats Yugi's curve would look something like: Pirate/Sword into Hooligan into Lotus Angel into Minotaur into Knight/Battalion into Dragon into Emperor.

- one way to do the "outvalue late" part of the plan is going with angels and at least 1 emperor. Which means we'll need lotuses. For the "defend early" part, this strongly depends on how exactly lotuses work. It might be that if we have, say, 5 Lotuses and 5 Angels, we can double Lotus into Angel turn 1 which already stops the aggro cold. But if you can only play one Lotus a turn or something, we're gonna need some smaller creatures.

- especially for smaller creatures, exact stats and things like whether double-blocking is a thing are hugely important. The first idea would be to just take a bunch of pirates, but then a pirate with a sword probably just eats those and it's not clear whether the saved life total makes up for the lost value.

- another way is stacking swords on things and hoping that that suffices to at least trade with Yugi's big creatures in the later turns or, depending on the cost of swords, to out-aggro him (seems unlikely against a perfect curve)

- starting Hand size is really hard to speculate on, but we can try anyway: Main observation is that lotuses don't work with aggro cards. If the starting hand size was something like 7, there would be no reason for that. It seems like, unless they have a sword/battalion or some good high-cost cards, aggro decks quickly run out of steam. Aggro can stay aggressive for about as many turns as it can double-spell to keep board control and it would seem that that limit is reached pretty early on, if a lotus decreasing that number by 1 is more relevant than a big boost in turn 1-2 board control. So my guess would be something like 2-3 cards in opening hand, with one drawn every turn. Yugi loves a small hand size, so this is something we'll need to keep in mind.

My PvE list:
5 Lotus
3 Angel
2 Emperor
2 Pirate

One thing to clarify in case of ambiguity in the specification:

 I know the scenario is putting you against an obvious Yugi expy, and that Yugi canonically always draws exactly what he needs to, but despite that your goal is not actually to optimize against an opponent who is perfectly lucky.  Your goal is to optimize against a regular opponent who is playing Yugi's deck - we're assuming that if you can get that win rate high enough it'll help against Yugi.

 I know this is not entirely satisfying, but it was necessary to make sure that the games from the dataset could be meaningfully used to optimize for your use-case.

I just want to express appreciation, because I am outright cackling at the title/story section.

Some initial observations:

Based on the card names, I think most are 'Creatures' but L and S are 'Artifacts'. There are 13 decks which were entirely Artifacts; these had 0% winrate. For comparison there are 364 decks with at most 2 distinct cards; these had 31% winrate. I thus think that Artifacts by themselves are unusable; they must be 'wielded' by some Creature.

Based on the card names, I think there's an even split between 'Good' (A,B,G,K,L,V) and 'Evil' (the rest). ~3k decks were entirely Good and enjoyed a 55% winrate, suggesting some faction synergy there. This stands out from the class of all decks that restrict to any group of <= 6 cards, which gets you <50% winrate (in fact in general having high diversity seems like it works pretty well: the ~2k decks that used 10 different cards had 59% winrate)

Followed up on this idea and noticed that a table of win rate as a function of the number of "evil" cards and "item" cards shows that item cards only benefit evil decks. I considered dragon, emperor, hooligan, minotaur, and pirate to be evil.

  • "No items, all good" wins 55.6%
  • "4 items, no good" wins 58.4%
  • "4+ items, all good" peaks at 37.0% winrate, and drops when adding items
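A table like this could be built roughly as follows. This is a sketch only: the `(counts, won)` record layout, the alphabetical card order and the function name `winrate_table` are assumptions for illustration, and the toy records are made up rather than taken from the dataset.

```python
from collections import defaultdict

CARDS = ["angel", "battalion", "dragon", "emperor", "guard", "hooligan",
         "knight", "lotus", "minotaur", "pirate", "sword", "vigilante"]
EVIL = {"dragon", "emperor", "hooligan", "minotaur", "pirate"}
ITEMS = {"lotus", "sword"}

def winrate_table(records):
    """records: iterable of (counts, won), with counts aligned to CARDS."""
    tally = defaultdict(lambda: [0, 0])   # (evil, items) -> [wins, games]
    for counts, won in records:
        evil = sum(n for card, n in zip(CARDS, counts) if card in EVIL)
        items = sum(n for card, n in zip(CARDS, counts) if card in ITEMS)
        tally[(evil, items)][0] += int(won)
        tally[(evil, items)][1] += 1
    return {key: wins / games for key, (wins, games) in tally.items()}

# Made-up toy records, purely to show the shape of the output.
toy = [
    ([0, 0, 0, 0, 0, 4, 0, 0, 0, 4, 4, 0], True),    # 8 evil, 4 items
    ([0, 0, 0, 0, 0, 4, 0, 0, 0, 4, 4, 0], False),   # same deck, lost
    ([4, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0], True),    # 0 evil, 0 items
]
table = winrate_table(toy)
print(table)
```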

My deck:

2x Angel, 3x Hooligan, 3x Pirate, 4x Sword.

My reasoning:

Absolutely no reasoning was applied in reaching this conclusion; all my attempts to solve this one analytically met dead ends. Instead, I copied the ML-based approach gjm won Defenders of the Storm with - except using gradient descent to search deckspace instead of trying all possible options - and got an answer I have no way to explain or evaluate. I'm very curious to see if this works!

Misc insights:

  • As a general rule, a more diverse deck leads to a better outcome. (Our opponent has the most diverse deck; this is worrying.)
  • Victory seems to be more down to synergies within a player's own deck than counters to their opponent's cards.
  • Apparent synergies: Angels like Lotuses, Emperors like Dragons, Pirates like Swords.
  • Having too many of the same card is always bad, but the diminishing-return point and the level of damage caused by homogeneity both depend on the card. You need >4 Angels for it to start hurting you, but having eight of them sends your odds of victory to <20%; meanwhile, Vigilantes aren't that great, but you can have a deck that's two-thirds composed of them without your win chance going below 40%.
  • The four obviously and inarguably evil cards are bizarrely middle-of-the-road in terms of average effectiveness. Banality of Evil?
  • A GBM with treedepth=2 beats one with treedepth=1 and one with treedepth=3. This (weakly) suggests there are two-way interactions between cards, but no three-way interactions.


I had a transcription error in the original draft of this post: it should have been 3x Hooligan, not 3x Minotaur.

Just the obvious contrarian poke:

Are the decks equally likely? We observe that 412050 decks appear just once, 104483 decks appear twice, etc. Is this distribution compatible with random draws?

There are 342396 rows, with 2 decks each. Solving for the number of valid decks one could make gives me 1352078 (a straightforward application of counting particle arrangements: imagine you have 12 "coins" to place in 12 "card-type boxes").
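The coins-in-boxes count can be checked in a few lines. This is a sketch; the function name `n_decks` is made up, and the brute-force check uses a deliberately tiny case.

```python
from itertools import combinations_with_replacement
from math import comb

def n_decks(card_types, deck_size):
    """Number of distinct decks (multisets) of deck_size cards from card_types kinds."""
    return comb(deck_size + card_types - 1, deck_size)

# Brute-force check on a small case: 4 card types, decks of 3 cards.
small = sum(1 for _ in combinations_with_replacement(range(4), 3))
print(small, n_decks(4, 3))   # both 20

# The scenario: 12 card types, 12-card decks.
print(n_decks(12, 12))        # 1352078
```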

Then I just simulated and eyeballed. If I pick 2*342396 random numbers from 1 to 1352078, how many numbers appear just once? How many twice? The first row below is the observed data; the eleven rows after it are simulated runs:

[412050, 104483, 17805, 2257, 246, 16, 1]

[411712, 104617, 17885, 2245, 213, 23, 1]
[412504, 104559, 17665, 2240, 212, 20, 5]
[412884, 104509, 17528, 2256, 225, 25, 1]
[413170, 104376, 17629, 2190, 217, 23]
[411926, 104692, 17828, 2209, 202, 23, 2]
[411883, 104746, 17624, 2302, 244, 16, 3]
[411866, 104728, 17779, 2258, 190, 24, 1]
[412771, 104106, 17813, 2294, 210, 24]
[412746, 104561, 17640, 2205, 209, 22, 1]
[412083, 104869, 17627, 2205, 237, 13, 1]
[412093, 104377, 17854, 2287, 224, 18, 1]

Which seems compatible enough.
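For anyone wanting to repeat the check, a simulation along these lines should do it. This is a sketch under the assumptions stated above (1352078 equally likely decks, 2 decks per row), not the code actually used for the runs listed; `multiplicity_histogram` is a made-up name.

```python
import random
from collections import Counter

N_DECKS = 1352078      # number of possible decks
N_DRAWS = 2 * 342396   # two decks per row of the dataset

def multiplicity_histogram(n_draws, n_decks, rng):
    """Entry k-1 is the number of decks that were drawn exactly k times."""
    times_seen = Counter(rng.randrange(n_decks) for _ in range(n_draws))
    hist = Counter(times_seen.values())
    return [hist[k] for k in range(1, max(hist) + 1)]

rng = random.Random(0)
hist = multiplicity_histogram(N_DRAWS, N_DECKS, rng)
print(hist)   # compare by eye with the observed row
```

Each run takes about a second in plain Python; a proper goodness-of-fit test would be stricter than eyeballing, but the rows are close enough that it hardly seems necessary.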

I don't know why I expected anyone to take things for granted just because I told them not to bother looking into them.

If there is a place where "pay no attention to that man behind the curtain" is to be disobeyed with prejudice, I'd reckon this is it. ;)

Here are informed but wildly overconfident beliefs:

  • this is a card game with characteristics similar to Magic the Gathering
  • cards are played and have costs to play related to the number of alliterative words in their names
  • Lotus makes it easier to pay more costs, so has synergy with Angels, Dragons, and Emperors
  • Guards and Knights are pretty much the same; lots of them with lots of Battalions have great synergy
  • Hooligans, Minotaurs, and Pirates have some synergy
  • Swords are used by others and used especially well by Pirates
  • Good and Evil work well against each other?
  • Minotaurs kill indiscriminately??
  • Picking cards that call out to your soul can work better than random, at least, because cards that feel similar to a human also have in-game synergy

I know exactly what I want to do to continue analysis, but don't know whether I'll get/make the time to do it. For now, here is my PvE deck:

Angel: 3x (because they're just solid!)
Lotus: 3x (need to turbo out the Angels)
Emperor: 2x (I mean if you already have the Lotuses...)
Dragon: 1x (cannot resist a Dragon)
Vigilante: 3x (just some filler to stabilize)

I have completed the modeling I'm going to do.

I built a NN from scratch! That was fun. It probably has bugs. It has a deck eval with leaky ReLU and (12, 6, 4) arch. Two of those feed into a deck vs deck eval with (8, 3, 1) arch and a sigmoid output.

This NN loves Dragon ramp, it seems. I mean not loves loves, it thinks the best ramp still only beats our rival ~60% of the time.

New PvE deck:

Dragon: 4x (this is a Dragon ramp deck)
Emperor: 2x (with even more top end)
Lotus: 6x (how to ramp)

Does it like Dragon ramp in general, or just against the one-of-everything deck? My guess would be that Angel ramp beats Dragon ramp, but maybe Dragon ramp beats one-of-everything better than Angel ramp does.

Likes Dragon ramp pretty well in general. My best guesses are that:
- I've got bugs (90%)
- Networks are hard to fit (60%)
- Angels appear to do well because if you can ramp into them that's good but also they are friendly with some other pieces, that you don't actually want to have? (25%).

If I look only at decks that contain no Battalions, Guards, Knights, or Vigilantes, it looks like Angels and Dragons are about equally good in low quantities, but that Dragons get less (though still significant) penalty from having many copies.

Here is my very bad approach after spending ~one hour playing around with the data

  1. Filter for decks that fought against a deck similar to the rival's, using a simple measure of distance (the sum of absolute differences between the deck components)
  2. Compute a 'score' of the decks. The score is defined as the sum of 1/deck_distance(deck) * (1 or -1 depending on whether the deck won or lost against the challenger) 
  3. Report the deck with the maximum score
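The three steps above can be sketched as follows. This is an illustration, not the commenter's actual code: the `(deck, opponent, deck_won)` record layout, the `max_distance` cutoff and the `1/(d + 1)` weight (used instead of `1/d` to avoid dividing by zero when the opponent is exactly the rival's deck) are all assumptions.

```python
RIVAL = [1] * 12   # one copy of each card

def distance(deck_a, deck_b):
    """Sum of absolute differences between card counts."""
    return sum(abs(a - b) for a, b in zip(deck_a, deck_b))

def score_decks(games, max_distance=6):
    """games: iterable of (deck, opponent, deck_won); decks are 12 card counts."""
    scores = {}
    for deck, opponent, deck_won in games:
        d = distance(opponent, RIVAL)
        if d > max_distance:          # step 1: keep only rival-like opponents
            continue
        weight = 1.0 / (d + 1)        # step 2: closer opponents count for more
        key = tuple(deck)
        scores[key] = scores.get(key, 0.0) + (weight if deck_won else -weight)
    return scores

# Made-up toy games, purely for illustration.
deck_x = [12] + [0] * 11
deck_y = [6, 6] + [0] * 10
near_rival = [2, 0] + [1] * 10        # distance 2 from the rival's deck
far_deck = [0] * 11 + [12]            # distance 22, filtered out
games = [
    (deck_x, RIVAL, False),
    (deck_y, RIVAL, True),
    (deck_y, near_rival, True),
    (deck_x, far_deck, True),
]
scores = score_decks(games)
best = max(scores, key=scores.get)    # step 3: report the top-scoring deck
print(best, scores[best])
```

Note that a score built this way rewards decks that simply appear often against rival-like opponents, and the ranking can be sensitive to the choice of weight function.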

So my submission would be: [0,1,0,1,0,0,9,0,0,1,0,0]



This is fairly close to my result for worst deck to face a one-of-each deck (I would guess it has less than 10% win rate). Is it possible you flipped a sign somewhere?

EDIT: It in fact has a 31% win rate.

Thank you for bringing this up!

 I think you might be right, since the deck is quite undiverse and according to the rest diversity is important. That being said, I could not find the mistake in the code at a glance :/

Do you have any opinions on [1, 1, 0, 1, 0, 1, 2, 1, 1, 3, 0, 1]? This would be the worst deck amongst the decks that played against a deck similar to the rival's in my code, according to my code.

At first glance that one looks pretty mediocre - not obviously good or bad. I would guess it's slightly worse than one-of-everything.

This one has a 43% win rate.

It looks like your current code is using (max_distance - d) as the discount factor rather than 1/(d + 1). I tried both of those as well as 2^(-d) and got very different results with each. It appears you're also using a threshold to pre-filter decks, so d will probably be much less than max_distance anyway. I'm not sure I entirely follow your code, but it looks like you're just looking at decks that appear in the dataset rather than all possible decks?

Yes, I am looking at decks that appear in the dataset, and more particularly at decks that have faced a deck similar to the rival's.

Good to know that one gets similar results using the different scoring functions.

I guess that maybe the approach does not work that well ¯\_(ツ)_/¯ 

Seeking clarification here: which of these decks are you currently submitting?  If you need more time to decide, let me know.

Ah sorry for the lack of clarity - let's stick to my original submission for PVE

That would be:


Could you try reformatting this, please? It looks like your answer hasn't been successfully spoilered out.

Thank you!

Fixed, thanks!

My PvE deck (EDIT: see reply):

3x Alessin, Adamant Angel
2x Evil Emperor Eschatonus, Empyreal Envoy of Entropic End
3x Lilac Lotus
2x Patchy Pirate
2x Virtuous Vigilante

After further analysis, I will amend my PvE deck to the following:

2x Alessin, Adamant Angel
1x Dreadwing, Darkfire Dragon
2x Evil Emperor Eschatonus, Empyreal Envoy of Entropic End
3x Lilac Lotus
2x Patchy Pirate
1x Sword of Shadows
1x Virtuous Vigilante

(swapped an Angel out for a Dragon and a Vigilante out for a Sword)

Possible alternative strategy (not my PvE deck):

Eschew the ramp plan (Lotus) to go all-in on the Pirate-Sword combo along with a few high-end cards and a Vigilante because why not.

3x Evil Emperor Eschatonus, Empyreal Envoy of Entropic End
4x Patchy Pirate
4x Sword of Shadows
1x Virtuous Vigilante

Seeking clarification here: which of these decks are you currently submitting?  If you need more time to decide, let me know.


I made some progress (right in the nick of time) by...

Massaging the data into a table of every deck we've seen, and whether the deck won its match or lost it (the code is long and boring, so I'm skipping it here), then building the following machinery to quickly analyze restricted subsets of deck-space.

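# decks: one row per deck seen, with a count column per card and a "win" column
q = "1 <= dragon <= 6 and 1 <= lotus <= 6"
decks.query(q)["win"].agg(["mean", "sum", "count"])  # win rate, wins, decks kept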

q is used to filter us down to decks that obey the constraint. We then check the correlation of each card to winrate. Finally, we show how many decks were kept, and what the winrate actually is.

q can be pretty complicated, with expressiveness limits defined by pd.DataFrame.query. A few things that work:

  • (angel + lotus) == 0
  • 1 <= dragon and 1 <= lotus and 4 <= (dragon + lotus)
  • 1 <= dragon and lotus == 0
  • (pirate-1) <= sword <= (pirate+1)
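The snippet above only shows the final aggregation. A self-contained sketch of the whole filter, correlate and summarize step might look like the following; the toy table, the column names and the helper name `analyze` are made up for illustration and are not the commenter's actual code.

```python
import pandas as pd

# Toy stand-in for the real table: one row per deck, card counts plus a 0/1 win flag.
decks = pd.DataFrame({
    "dragon": [0, 1, 2, 3, 0, 1],
    "lotus":  [0, 1, 2, 0, 3, 4],
    "pirate": [12, 10, 8, 9, 9, 7],
    "win":    [0, 1, 0, 0, 0, 1],
})

def analyze(decks, q):
    """Filter decks with a query string, then report card correlations and win stats."""
    subset = decks.query(q)
    # Correlation of each card count with winning, within the filtered subset.
    card_corr = subset.drop(columns="win").corrwith(subset["win"])
    # Win rate, number of wins, and number of decks kept.
    summary = subset["win"].agg(["mean", "sum", "count"])
    return card_corr, summary

card_corr, summary = analyze(decks, "1 <= dragon <= 6 and 1 <= lotus <= 6")
print(card_corr)
print(summary)
```

On subsets this small the correlations are meaningless; the point is only the shape of the pipeline.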

My deck submission (PvE and PvP) is:

4 angels, 3 lotuses, 3 pirates, 2 swords

There doesn't seem to be a killer card that dominates one or more of the others.

Having a balanced deck seems to be important, as the probability of winning consistently declines as the number of distinct cards falls and the maximum number of a single card rises.

The cards clearly aren't equivalent, as some groups of cards do consistently better than others.

I haven't been able to get a grip on what the rules might be, but peering through the fog suggests that the following combination won't be too bad:

 2 Alessin, Adamant Angel
 0 Bold Battalion
 1 Dreadwing, Darkfire Dragon
 2 Evil Emperor Eschatonus, Empyreal Envoy of Entropic End
 0 Gentle Guard
 1 Horrible Hooligan
 0 Kindly Knight
 2 Lilac Lotus
 1 Murderous Minotaur
 1 Patchy Pirate
 0 Sword of Shadows
 2 Virtuous Vigilante

This is my provisional entry for both the main and PVP objectives.

Seems like you want to include A, L, P, V, E in your decks, and avoid B, S, K. Here is the correlation between the quantity of each card and whether the deck won. The ordering is ~similar when computing the inclusion winrate for each card.

My deck against the rival is:

Alessin_Adamant_Angel_Deck_A_Count: 3 

Murderous_Minotaur_Deck_A_Count: 1 

Patchy_Pirate_Deck_A_Count: 6 

Sword_of_Shadows_Deck_A_Count: 2 

This deck has around a 70% probability of beating the rival. I used a LightGBM booster trained on all the data. The out-of-sample accuracy before dumping all the data in was an AUC of 0.66, but the model is well-calibrated, so I'm confident in the ~70% probability.

Hi - having a great time with this! I have a decent answer currently, but want to try a more sophisticated search procedure for the PvP deck. If I submit by 11:59pm CST on Sunday, is that still considered April 3rd? Or are you planning to run things on Sunday, making late Sunday submissions inconvenient?

I'm fine with you submitting very late on Sunday, I'll be putting together code to run the PVP and it won't take much work to swap in a new deck on Monday.  

If you find yourself wanting more time beyond that, I am willing to grant extensions if you want them.
