D&D.Sci 5E: Return of the League of Defenders Evaluation & Ruleset

aphyer

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

A character has three stats, Attack, Defense and Range.

Name	Attack	Defense	Range
Daring Duelist	5	0	1
Bludgeon Bandit	4	1	1
Silent Samurai	3	2	1
Lamellar Legionary	2	3	1
Granite Golem	1	4	1
Flamethrower Felon	4	0	2
Captain Chakram	3	1	2
Jaunty Javelineer	2	2	2
Hammer Hurler	1	3	2
Professor Pyro	3	0	3
Matchlock Marauder	2	1	3
Rugged Ranger	1	2	3
Thunder Tyrant	2	0	4
Amazon Archer	1	1	4
Wily Wizard	1	0	5

Congratulations to abstractapplic, who I think was the first person to identify range differences (albeit with a slight confusion that led to categorizing the 2-range characters as 'long-range' ones).

The most important element of underlying structure is that when playing the game your team will line up in order, with a 'front line', 'mid line' and 'back line' character. Lower-range characters will go at the front, higher-ranged ones behind them. If two characters have the same Range, the higher-Defense one will go in front.

For example, if a blue team of Duelist (A5-D0-R1), Samurai (A3-D2-R1), and Archer (A1-D1-R4) plays against a green team of Bandit (A4-D1-R1), Javelineer (A2-D2-R2) and Hurler (A1-D3-R2), the teams will line up like this:

Amazon Archer	Daring Duelist	Silent Samurai	Bludgeon Bandit	Hammer Hurler	Jaunty Javelineer

with each team's frontline next to the opposing frontline, and the midlines and backlines further away.
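The line-up rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the actual generation code; the `Character` tuple and the `line_up` function are names invented for this sketch.

```python
from typing import NamedTuple


class Character(NamedTuple):
    name: str
    attack: int
    defense: int
    range: int


def line_up(team):
    """Order a team front-to-back: lowest Range first, ties broken by higher Defense."""
    return sorted(team, key=lambda c: (c.range, -c.defense))


blue = [
    Character("Daring Duelist", 5, 0, 1),
    Character("Silent Samurai", 3, 2, 1),
    Character("Amazon Archer", 1, 1, 4),
]
green = [
    Character("Bludgeon Bandit", 4, 1, 1),
    Character("Jaunty Javelineer", 2, 2, 2),
    Character("Hammer Hurler", 1, 3, 2),
]

blue_names = [c.name for c in line_up(blue)]
green_names = [c.name for c in line_up(green)]
print(blue_names)   # front, mid, back
print(green_names)  # front, mid, back
```

Samurai ends up ahead of Duelist, and Hurler ahead of Javelineer, purely because of the Defense tie-break.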

The game is played as follows:

An initiative order is determined (this is based on how skillfully players play, but in the dataset it was generated by ordering the six characters at random).
Each round, in initiative order, each character attacks.
A character does damage equal to their Attack minus their target's Defense, but not less than 1.
- So if a defender has 2 Defense, attackers with 1, 2 or 3 Attack will all do 1 damage to it, Attack 4 will do 2 damage, Attack 5 will do 3 damage.
A character can only attack an opposing character that is within its range, and will preferentially attack the furthest-back character it can. So in the diagram above:
- Silent Samurai can attack only Bludgeon Bandit, and will do 2 damage (3-1).
- Daring Duelist cannot attack any enemy, and will skip its attack.
- Amazon Archer can attack either Bludgeon Bandit or Hammer Hurler, but will choose to attack Hammer Hurler, dealing 1 damage (1-3, min 1).
- Bludgeon Bandit can attack only Silent Samurai, and will do 2 damage (4-2).
- Hammer Hurler can attack only Silent Samurai, and will do 1 damage (1-2, min 1).
- Jaunty Javelineer cannot attack any enemy, and will skip its attack.
Once a character has taken 6 damage, it is KOd. At this point, any characters behind it immediately 'step forward' to fill the spot it vacated.
- So if Silent Samurai in the diagram above is defeated, Daring Duelist will step forward into the front-line slot (from which it will now attack Bludgeon Bandit) and Amazon Archer will step forward into the mid-line slot (from which it will now attack Jaunty Javelineer).
Characters keep attacking (repeating in the same initiative order) until all characters on one team have been KOd, at which point the other team wins.
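For anyone who wants to experiment with these rules, here is a minimal simulator sketch. It is not the linked generation code, and every name in it (`damage`, `pick_target`, `simulate`, `winrate`) is made up for illustration. Two assumptions go beyond the text: the distance between two characters is taken to be (own slot + enemy slot - 1), which matches the worked examples above, and the winrate is computed by enumerating all 720 initiative orders with equal weight.

```python
from itertools import permutations

HP = 6  # damage a character can take before being KO'd


def damage(attack, defense):
    """Attack minus Defense, but never less than 1."""
    return max(1, attack - defense)


def pick_target(pos, rng, enemies):
    """Slot index of the furthest-back living enemy in reach, or None.

    pos is the attacker's 0-based slot among its living teammates.
    Assumed distance to enemy slot q (0-based): pos + q + 1.
    """
    reach = min(rng - pos - 1, len(enemies) - 1)
    return reach if reach >= 0 else None


def simulate(team_a, team_b, order):
    """Play one game and return the winning side (0 or 1).

    Teams are front-to-back lists of (name, attack, defense, range).
    order is a sequence of (side, member_index) pairs.
    """
    teams = [team_a, team_b]
    alive = [list(range(len(team_a))), list(range(len(team_b)))]
    taken = [[0] * len(team_a), [0] * len(team_b)]
    while True:
        for side, i in order:
            if i not in alive[side]:
                continue  # KO'd characters no longer act
            foe = 1 - side
            _, atk, _, rng = teams[side][i]
            slot = pick_target(alive[side].index(i), rng, alive[foe])
            if slot is None:
                continue  # nothing in range: skip the attack
            j = alive[foe][slot]
            taken[foe][j] += damage(atk, teams[foe][j][2])
            if taken[foe][j] >= HP:
                alive[foe].remove(j)  # characters behind step forward
                if not alive[foe]:
                    return side


def winrate(team_a, team_b):
    """Fraction of all initiative orders in which team_a wins."""
    slots = [(s, i) for s in (0, 1) for i in range(3)]
    orders = list(permutations(slots))
    return sum(simulate(team_a, team_b, o) == 0 for o in orders) / len(orders)


npc = [("Silent Samurai", 3, 2, 1), ("Flamethrower Felon", 4, 0, 2), ("Rugged Ranger", 1, 2, 3)]
best = [("Hammer Hurler", 1, 3, 2), ("Professor Pyro", 3, 0, 3), ("Thunder Tyrant", 2, 0, 4)]
print(winrate(best, npc))
```

Teams are passed in already lined up front-to-back, so the ordering rule is not repeated here. If the assumptions hold, other matchups from this post can be checked the same way by swapping in different teams.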

STRATEGY

Broad general strategy was:

Ensure that your characters were all able to attack by bringing sufficient range (at least 2 on your midline and at least 3 on your backline).
Ensure that your characters would 'focus fire', targeting the same opponent to KO one enemy quickly. This was accomplished by bringing sets of adjacent ranges, e.g. one 1-range, one 2-range and one 3-range character.
Try to bring Defense/Attack values that worked well against what the opponent brought (for example, against a 2 Defense opponent you want either 4+ Attack to do >1 damage, or 1 Attack to free up points for other stats).
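To see why adjacent ranges produce focus fire, it helps to compute which enemy slot each of your characters targets at the start of a game. The sketch below does that under the same assumed distance rule (own slot + enemy slot - 1) and assumes a full enemy team of three; `initial_targets` is a hypothetical helper, not something from the generation code.

```python
def initial_targets(ranges):
    """Map (front, mid, back) Range values to the enemy slot each one attacks.

    Slots are 1 = front, 2 = mid, 3 = back. None means nothing is in reach.
    Each character picks the furthest-back enemy slot it can reach.
    """
    targets = []
    for slot, rng in enumerate(ranges, start=1):
        reach = min(rng - slot + 1, 3)
        targets.append(reach if reach >= 1 else None)
    return targets


for comp in [(1, 2, 3), (2, 3, 4), (3, 4, 5), (1, 1, 2), (2, 3, 3)]:
    print(comp, initial_targets(comp))
```

A 1-2-3 team sends all three attacks at the enemy frontline and a 2-3-4 team sends all three at the enemy midline, while a 1-1-2 team starts with only its frontline able to attack.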

The strongest team types I'm aware of were:

By far the most powerful strategy against the broad field was a '1-2-3' team composition, bringing one character from each of those three ranges to focus the front enemy character while having maximum possible Attack/Defense.
- The most important aspect of a 1-2-3 for playing against other 1-2-3s was to bring a durable enough frontline to take at most 1 damage per hit, since both teams would be focusing the opposing frontline.
You could also bring a '2-3-4' composition, sacrificing some amount of Attack/Defense to focus the middle rather than the front opposing character.
- This was generally weaker than a 1-2-3 composition against the broader field.
- However, it was very valuable against compositions where the front line was more durable than the midline, including many 1-2-3s.
In theory you could bring a similar '3-4-5' composition, but in practice the lost Attack/Defense was too much for this to be good except as a niche counter to extremely specific opposing teams.
There was also a somewhat-surprising set of 'Sneaky Duelist' team compositions, with ranges of 1-1-2 or sometimes 1-1-3, that worked well to beat specifically 1-2-3 compositions.
- This team brought a disposable front-line character, e.g. Silent Samurai (to get focused down by the 1-2-3 team but do some damage to their frontline in the meantime) and then put a Duelist in the midline and a 2 or 3-range character (e.g. Jaunty Javelineer) in the backline.
- The Duelist and possibly the backline as well could not attack until the front-line was KOd and they were able to step forward, but once that happened they could usually KO the enemy frontline before their own Duelist was KOd.
- At that point, the 1-2-3 team's midline and backline would step forward and retarget away from the Duelist, and while they focused the character behind the Duelist it would KO them.
- This team was awful against anything that wasn't specifically a 1-2-3 teamcomp (since if the opposing team didn't focus down your frontline you would be left with your midline Duelist unable to attack), but it was favored against most 1-2-3 teams.

The NPC team had:

Frontline Silent Samurai (A3-D2-R1)
Midline Flamethrower Felon (A4-D0-R2)
Backline Rugged Ranger (A1-D2-R3)

All three of the main strategies had some potential viability against the NPC team:

The NPC team itself was a 1-2-3, but a somewhat-suboptimal one with e.g. a more fragile frontline than was optimal. Making your own 1-2-3 team would usually get you a 50% winrate. Optimizing your 1-2-3 team could improve this further. The best possible 1-2-3 team was to:
- Bring a frontline that can survive the enemies' attacks well, with 3 or more Defense (Lamellar Legionary or Granite Golem).
- Bring a midline that can deal >1 damage in a hit to their Silent Samurai (Flamethrower Felon)
- Bring a Range 3 backline (Professor Pyro, Matchlock Marauder, or Rugged Ranger would all work).
- Any of these 6 potential combinations would get an 85% winrate (you deal 4 damage/round to their frontline while they do 3/round to yours, but occasionally you get unlucky and enough of their characters win initiative that they KO your frontline first anyway).
A 2-3-4 team was the best possible strategy against the NPC team, capable of reaching a 100% winrate.
- The ideal team brought Hammer Hurler (A1-D3-R2) in the frontline, Professor Pyro (A3-D0-R3) in the midline, and Thunder Tyrant (A2-D0-R4) in the backline.
- This team would deal 6 damage to the opposing midline Flamethrower Felon in the first round (KOing it) while taking at most 3 on its own frontline.
- Other 2-3-4 teams also worked well, but not quite to the level of a 100% winrate.
A 1-1-2 team could also successfully counter the NPC team.
- The best possible 1-1-2 team brought a Samurai, Duelist, and Felon.
- This teamcomp behaved somewhat strangely, effectively sacrificing the Samurai to get the Duelist and Felon favorable initiative that they could use to rapidly cut through the medium-low durability on the NPC characters.
- While somewhat vulnerable to initiative order, this managed an 89.2% winrate, and several other 1-1-2 teams could also work well (e.g. any team with a Legionary, Duelist and 2-range character was 65-75% favored).
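The per-round damage figures quoted above are easy to reproduce by hand or in code. This small sketch just applies the damage rule to the stat lines in the table; the `hit` helper and the variable names are illustrative only.

```python
def hit(attack, defense):
    """Damage of one attack: Attack minus Defense, minimum 1."""
    return max(1, attack - defense)


# NPC team as (Attack, Defense)
NPC = {
    "Silent Samurai": (3, 2),
    "Flamethrower Felon": (4, 0),
    "Rugged Ranger": (1, 2),
}

# 1-2-3 team Legionary (A2) / Felon (A4) / Pyro (A3), all hitting the NPC frontline Samurai
ours_123 = sum(hit(a, NPC["Silent Samurai"][1]) for a in (2, 4, 3))

# All three NPC characters hitting a 3-Defense frontline (Legionary or Hurler)
theirs = sum(hit(a, 3) for a, _ in NPC.values())

# 2-3-4 team Hurler (A1) / Pyro (A3) / Tyrant (A2), all hitting the NPC midline Felon
ours_234 = sum(hit(a, NPC["Flamethrower Felon"][1]) for a in (1, 3, 2))

print(ours_123, theirs, ours_234)
```

The 2-3-4 team's total against the Felon is exactly the 6 damage needed for a first-round KO, which is why no weaker substitution keeps the 100% winrate.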

PVE LEADERBOARD

Submissions were:

Player	Frontline	Midline	Backline	Winrate
Optimal Play	Hurler (A1-D3-R2)	Professor (A3-D0-R3)	Tyrant (A2-D0-R4)	100%
gjm	Golem (A1-D4-R1)	Felon (A4-D0-R2)	Professor (A3-D0-R3)	85%
Optimal 1-2-3	Legionary or Golem	Felon	Professor, Marauder or Ranger	85%
simon	Bandit (A4-D1-R1)	Duelist (A5-D0-R1)	Javelineer (A2-D2-R2)	64.4%
abstractapplic	Legionary (A2-D3-R1)	Hurler (A1-D3-R2)	Professor (A3-D0-R3)	50%
Yonge	Hurler (A1-D3-R2)	Marauder (A2-D1-R3)	Professor (A3-D0-R3)	26.4%
Random play	??	??	??	17.5%

Two players submitted 1-2-3 teams:

gjm managed to correctly target the enemy team's weakness with a midline Felon, taking the lead overall in PVE with the best possible 1-2-3.
abstractapplic submitted a 1-2-3 team with exactly a 50% winrate (relatively fragile attackers meant that whoever's frontline fell first would basically always lose, and so those games ended up as pure coinflips on initiative rolls).

Two players boldly tried something different:

simon actually discovered 1-1-2 comps at the last minute, and submitted a 1-1-2 comp that managed to be favored against the NPC team.
Yonge came tantalizingly close to a 2-3-4 comp with Hammer Hurler in front that would have done extraordinarily well, but sadly had a 3-range backline that targeted differently from the other characters, for an overall 2-3-3 team that didn't quite work as well as the 1-2-3 approach.

PVP LEADERBOARD

The below should be considered non-final for a few days so people can check and confirm that I've gotten it right.

Again, most players submitted 1-2-3 teams (and everyone included Professor Pyro):

Player	Frontline	Midline	Backline
abstractapplic	Legionary (A2-D3-R1)	Javelineer (A2-D2-R2)	Professor (A3-D0-R3)
gjm*	Golem (A1-D4-R1)	Felon (A4-D0-R2)	Professor (A3-D0-R3)
simon	Legionary (A2-D3-R1)	Captain (A3-D1-R2)	Professor (A3-D0-R3)
Yonge*	Hurler (A1-D3-R2)	Marauder (A2-D1-R3)	Professor (A3-D0-R3)

*Note: gjm and Yonge did not submit distinct PVP teams, so I've used their PVE ones here; these may not have been intended for PVP.

with winrates:

	abstractapplic	simon	Yonge	gjm	Overall
abstractapplic	-	52.8%	100%	80.4%	2.33
simon	47.2%	-	51.7%	96.5%	1.95
Yonge	0%	48.3%	-	94.3%	1.43
gjm	19.6%	3.5%	5.7%	-	0.29
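The Overall column appears to be the sum of each player's three winrates, expressed as a number of expected wins. That reading is inferred from the numbers in the table rather than stated anywhere, so treat this quick check as a sketch:

```python
# Winrates (in percent) of the row player against each opponent, copied from the table
winrates = {
    "abstractapplic": {"simon": 52.8, "Yonge": 100.0, "gjm": 80.4},
    "simon": {"abstractapplic": 47.2, "Yonge": 51.7, "gjm": 96.5},
    "Yonge": {"abstractapplic": 0.0, "simon": 48.3, "gjm": 94.3},
    "gjm": {"abstractapplic": 19.6, "simon": 3.5, "Yonge": 5.7},
}

# Expected number of wins across the three matchups
overall = {p: round(sum(v.values()) / 100, 2) for p, v in winrates.items()}
print(overall)
```

Each pair of mirrored cells also sums to 100%, as it should if every game ends with exactly one winner.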

The 1-2-3 teams all had similarly durable frontlines, and the same Professor backline, but performed very differently based on their midlines.

While gjm's midline Felon was an excellent pick against the PVE team (able to target their 2-defense frontline and do 2 damage a hit), it fared very poorly here, only able to do 1 damage per hit to the durable frontlines it encountered, and vulnerable to being KOd extremely fast (either once its frontline was defeated or to Yonge's team targeting it directly).

simon and particularly abstractapplic submitted more durable midlines that performed better overall, and brought Legionary rather than Golem as their frontline (a slight improvement since no-one brought a Duelist).

Yonge's 2-3-3 team helped highlight the midline difference: with two characters targeting the opposing midline, its results varied enormously with how durable that midline was (ranging from a 94% winrate against gjm to guaranteed defeat against abstractapplic).

Congratulations to abstractapplic for winning the PVP overall! Now ~~abstractapplic gets to specify two scenarios~~ ~~I have to feel twice as guilty about not having finished abstractapplic's requested scenario yet~~ abstractapplic gets to feel twice as much pride in their Data Science Skills!

FEEDBACK REQUEST

As usual, I'm interested to hear feedback on what people thought of this scenario. If you played it, what did you like and what did you not like? If you might have played it but decided not to, what drove you away? What would you like to see more of/less of in future? Do you think the scenario was too complicated to decipher? Or too simple to feel realistic? Or both at once? Do you have any other feedback?

gjm (11mo)

I didn't actually submit a PvP entry. I assume you used my PvE one, but it wasn't intended for PvP use and I am in no way surprised that it came last. I don't particularly object to its having been entered in the PvP tournament, but maybe there should be a note explaining that it was never meant for that?

aphyer (11mo)

Fair enough, edited.

gjm (11mo)

Thanks!

simon (11mo)

Thanks for the scenario, aphyer.

I made a last minute PVE change ~~which didn't get into the results~~, but looks like it would have gotten 64.44% winrate which is still lower than gjm's. Congrats to gjm and abstractapplic. I also had previously changed my PVE selection which also isn't in the results, but that change didn't make any difference - it was still 50%.

Interesting ruleset that has some complicated behaviour, but still allows analysis. I think it was actually quite good in this respect, though even with the extension I didn't really get to a point where I felt I was done.

If I had continued the analysis, my next thing to look at would have been how different candidate PVP teams, plus yonge's PVP team, interacted with different team compositions (classified according to the groups which corresponded to range 1, range 2, and range 3+).

Not sure what I would have ended up concluding from this.

aphyer (11mo)

I actually edited to include your PVE change, you did manage a 64% winrate. Sorry not to give you more time, didn't realize there was work still ongoing.

simon (11mo)

NP aphyer, I didn't ask for any more time, though I was happy to get some extra due to you extending for yonge. I hadn't been particularly focused on it for a while, until trying to get things figured out at the last minute, largely I think due to me having spent a greatly disproportionate-to-value effort on figuring out how to do similarity clustering on a highly reduced (and thus much more random) version of the dataset, and then not knowing what to do with the results once I got them. (though I did learn stuff about finding the similarity clustering, so that was good).

Looks like the clusters I found in the reduced dataset more or less corresponded to:

either an aggressive 2-ranged character or everything fairly tanky (FLR cluster)

tending towards tankier 2-ranged and aggressive 1-ranged (melee) character (HSM cluster, note I had excluded B and D from this dataset)

tending towards more aggression to the back (JGP cluster)

So now I'm trying to figure out why the observed FLR>HSM>JGP>FLR rock-paper scissors effect occurred...

edit: a just-so story (don't know if real reason):

JGP vs FLR: FLR loses the melee first, then likely loses the 2-range since very squishy, then doomed.

FLR vs HSM: HSM loses the melee first. Then FLR might well lose the 2-range first, depending on initiative. FLR would then be splitting damage, but since HSM's 2-range is already damaged and FLR's tank typically isn't that tanky, HSM's 2 range might well die before FLR's backline? dunno, seems weak explanation

HSM vs JGP: HSM loses the melee first. But then, the tanky 2-range of HSM tends to last a while, and the tanky melee of JGP doesn't contribute much. Once JGP loses its 2-range, it splits damage between HSM's remaining characters, while HSM focuses and defeats JGP's squishy backline?

abstractapplic (11mo)

Reflections on my performance:

I failed to stick the landing for PVE; looking at gjm’s work, it seems like what I was most missing was feature-engineering while/before building ML models. I’ll know better next time.

For PVP, I did much better. My strategy was guessing (correctly, as it turned out) that everyone else would include a Professor, noticing that they’re weak to Javelineers, and making sure to include one as my ~~back~~midline.

Reflections on the challenge:

I really appreciated this challenge, largely because I got to use it as an excuse to teach myself to build Neural Nets, and try out an Interpretability idea I had (this went nowhere, but at least failed definitively/interestingly).

I have no criticisms, or at least none which don’t double as compliments. The ruleset was complicated and unwieldy, increasing the rarity of “aha!” moments and natural stopping points during analysis, and making it hard to get an intuitive sense of how a given matchup would shake out (even after the rules were revealed) . . . but that’s exactly what made it such a useful testing ground, and such valuable preparation for real-world problems.

gjm (11mo)

I think calling anything I did "feature engineering" is pretty generous :-). (I haven't checked whether the model still likes FGP without the unprincipled feature-tweaking I did. It might.)
