D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset

[-]abstractapplic4y50

This was extremely good. In particular, I like that you managed to make the challenge tractable to both Analysis and Machine Learning. I also appreciated that you included an explicit Real-world Data Science Moral in the wrap-up; I should try to do that more often.

[-]gjm4y40

Feedback on the scenario: I liked it, but evidently I took the framing story a bit too seriously: I read it as saying that I would probably be wasting my time trying to understand the game mechanics, when actually they weren't so very complicated. I can't complain about how that worked out for me, given that mindlessly putting the given data into a model-fitting machine and cranking the handle produced a very good result; and if the thing had seemed more approachable, I might have avoided it lest it be too much work :-). Still, I do feel a bit bad about not trying to use my brain a bit more and my computer a bit less.

[-]gjm4y40

I strongly suspect that I got rather lucky; at any rate, my model's predicted win-rate for my team was substantially less than the real ~80%, suggesting that the model didn't do a great job of capturing reality.

I wonder whether lsusr had a sign-error bug similar to mine.

[-]simon4y40

Hmm, how about we switch to using a Condorcet method for the PVP ranking?

I screwed up my thinking on whether the sides were different; I will add an edit/reply to my comment on the main post later.

Thanks to Maxwell Peterson for introducing me to R through his post and code. I hope to continue using R later, ideally with a better gears-level understanding than I have currently.

[-]aphyer4y30

True, you are the Condorcet winner. :P

Do you know how you ended up with Oil Ooze on your team? I was expecting to trick a lot of people into submitting Nightmare, but I wasn't expecting the Ooze to show up.

[-]simon4y20

The magic black box supplied to me by Maxwell, after I fiddled with it a tiny bit and supplied it with adjusted data, told me that BGNOT was supposedly the strongest team in general, and that the strongest counter to BGNOT was ABGOT. It also claimed that ABGOT was the strongest counter to gjm's BGNPT. I asked the magic black box how well a few candidate teams, including ABGOT, did against the non-secret competitors already posted, and the numbers it gave looked more generally decent for ABGOT than the other candidates (I was looking for broad-spectrum effectiveness more than average effectiveness) and it also said that ABGOT would have a decent winrate against average teams, so I went with it. I would have liked to make a figure of merit and find the top team for that figure of merit but wasn't able to do so in time.

In other words, the magic black box liked Oil Ooze for some reason.

[-]Maxwell Peterson4y10

When I read in the main post that the inclusion of Oil Ooze was confusing, I thought my magic box might be the guilty one!

[-]Alumium4y00

I'm perfectly happy not to claim any sort of prize.

I only got the elements due to Measure.

I only dropped the Nightmare due to Maxwell Peterson.

Also, I have no idea what I'd even ask for as a scenario.

[-]simon4y10

> Also, I have no idea what I'd even ask for as a scenario.

Neither do I; I was just seeking the glory of (slightly tarnished due to hypothetical rule change) victory.

[-]Alumium4y20

I legitimately did not expect to do that well.

If I have done better than others, it is because I have stood on the shoulders of, uh, ogres at least.

I've gotten two morals from this. The good moral is 'Even if you have no idea what is going on, you can still data science at something.'

The bad moral is 'When in doubt, crib other people's notes (only be sure always to call it please 'research').'

[-]aphyer4y20

I think you're selling yourself a bit short here. You say that you 'only dropped the Nightmare due to Maxwell Peterson' - but Maxwell himself included the Nightmare on his PVP team!

Only two people submitted PVP teams without the Nightmare on them. One of them was you, and the other was using an analysis he didn't understand that led to him including Oil Ooze on his team for no reason he can discern even after the fact. (Sorry, simon).

If you managed to read through other people's findings and get more use out of them than those people themselves did, I think that leads to a well-deserved victory.

[This comment is no longer endorsed by its author]

[-]Alumium4y00

Feedback: I'm biased because I won, but I had a great time. This was very approachable even for a complete beginner, while still having sneaky hidden tricks.

[-]SarahNibs4y20

I was fairly surprised my PvE team did so well. Why did my heuristics basically work?

How well do individuals do? Multiply. How well do teams do, relative to the product of their members? How well do individuals on a team do against the known opponents, pairwise? Okay, try all teams, combine their "vs opponents" scores with "team coherence" scores, pick the max.

Finding how well teams did relative to how their individuals did let me accidentally pick just from balanced teams; asking how well individuals did against the known opponents got rid of Null and biased towards the 6s, 5s, and 1s. I think Maelstrom (Water 3) probably got on there because the team did really well in the data despite Maelstrom's inclusion, which looked the same to me as Maelstrom synergizing well with the rest of the team.

I weighted "individuals vs opposing team" equal to "team synergy"; I wonder if I had left the weights as each individual = each other individual = team synergy, whether that would have correctly axed 2-4s while still axing Null? [tries] Nope, that lets Null in quickly.

[-]SarahNibs4y20

Feedback: I liked this one a lot, in theory. I just found myself with less time than I wanted to actually engage. It had a latent structure that was definitely discoverable, but also definitely plenty obfuscated. Even with time constraints the barrier-to-entry was low enough that I could get something nice in pretty quickly. And the differences between individual, team, vs-that-team, and vs-any-team were interesting to mull over.

[-]Measure4y20

Confirmed PVP rankings with my own test script (my run swapped 5th and 6th place, but they're very close).

[-]Measure4y20

Lol, I just now realized that 5th/6th place are the exact same team.

[-]aphyer4y10

...huh, indeed they are, I guess I missed that.

Power Level	Fire	Water	Earth
1	Volcano Villain	Arch-Alligator	Landslide Lord
2	Oil Ooze	Captain Canoe	Earth Elemental
3	Fire Fox	Maelstrom Mage	Dire Druid
4	Inferno Imp	Siren Sorceress	Quartz Questant
5	Phoenix Paladin	Warrior of Winter	Rock-n-roll Ranger
6	Blaze Boy	Tidehollow Tyrant	Greenery Giant

Entrant(s)	Team	Win Rate
Optimal Play	Fire5, Fire6, Water1, Earth5, Earth6	81.47%
gjm*	Fire5, Fire6, Water1, Earth1, Earth6	80.40%
~~Alumium,~~ simon	Fire5, Fire6, Earth1, Earth5, Earth6	76.53%
GuySrinivasan	Fire5, Fire6, Water3, Earth5, Earth6	72.41%
Measure, Jemist	Fire6, Water6, Earth5, Earth6, Void5	70.01%
abstractapplic	Fire5, Fire6, Water6, Earth6, Void5	66.97%
Maxwell Peterson	Fire5, Water1, Earth1, Earth6, Void5	62.05%
Yonge	Water6, Earth1, Earth5, Earth6, Void5	36.00%
Random Play	5 randomly selected characters	28.55%
lsusr	Fire3, Water5, Earth2, Earth3, Earth4	14.97%

	~~Alumium~~	simon	Maxwell Peterson	Jemist	abstractapplic	gjm	GuySrinivasan	Yonge	Measure	lsusr	Overall Score
~~Alumium~~	–	~~46.29%~~	~~56.57%~~	~~58.27%~~	~~62.69%~~	~~62.87%~~	~~65.60%~~	~~82.14%~~	~~55.99%~~	~~73.95%~~	DQ
simon	~~53.71%~~	–	53.87%	64.97%	67.80%	68.06%	60.75%	69.63%	50.21%	68.09%	5.57
Maxwell Peterson	~~43.43%~~	46.13%	–	54.18%	58.78%	58.53%	61.90%	73.82%	68.37%	84.80%	5.50
Jemist	~~41.73%~~	35.03%	45.82%	–	50.46%	50.49%	49.26%	73.09%	76.70%	87.46%	5.10
abstractapplic	~~37.31%~~	32.20%	41.22%	49.54%	–	50.13%	50.51%	67.15%	64.42%	86.36%	4.79
gjm	~~37.13%~~	31.94%	41.47%	49.51%	49.87%	–	50.21%	67.34%	64.28%	86.50%	4.78
GuySrinivasan	~~34.40%~~	39.25%	38.10%	50.74%	49.49%	49.79%	–	54.93%	57.00%	88.72%	4.62
Yonge	~~17.86%~~	30.37%	26.18%	26.91%	32.85%	32.66%	45.07%	–	77.44%	75.83%	3.65
Measure	~~44.01%~~	49.79%	31.63%	23.30%	35.58%	35.73%	43.00%	22.56%	–	40.80%	3.26
lsusr	~~26.05%~~	31.91%	15.20%	12.54%	13.64%	13.50%	11.28%	24.17%	59.20%	–	2.07
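
The Condorcet suggestion in the comments can be checked directly against the pairwise table above. Below is a minimal Python sketch, assuming the win rates are copied exactly as shown (including Alumium's struck-through row and column). The names `NAMES`, `WIN_PCT`, `overall_score`, and `condorcet_winner` are illustrative and not taken from the original evaluation code. It also assumes, based only on the numbers matching, that Overall Score is the sum of an entrant's nine pairwise win rates divided by 100.

```python
# Pairwise PVP win rates (percent): row entrant vs column entrant,
# copied from the table above. None marks the diagonal.
NAMES = [
    "Alumium", "simon", "Maxwell Peterson", "Jemist", "abstractapplic",
    "gjm", "GuySrinivasan", "Yonge", "Measure", "lsusr",
]
WIN_PCT = [
    [None, 46.29, 56.57, 58.27, 62.69, 62.87, 65.60, 82.14, 55.99, 73.95],
    [53.71, None, 53.87, 64.97, 67.80, 68.06, 60.75, 69.63, 50.21, 68.09],
    [43.43, 46.13, None, 54.18, 58.78, 58.53, 61.90, 73.82, 68.37, 84.80],
    [41.73, 35.03, 45.82, None, 50.46, 50.49, 49.26, 73.09, 76.70, 87.46],
    [37.31, 32.20, 41.22, 49.54, None, 50.13, 50.51, 67.15, 64.42, 86.36],
    [37.13, 31.94, 41.47, 49.51, 49.87, None, 50.21, 67.34, 64.28, 86.50],
    [34.40, 39.25, 38.10, 50.74, 49.49, 49.79, None, 54.93, 57.00, 88.72],
    [17.86, 30.37, 26.18, 26.91, 32.85, 32.66, 45.07, None, 77.44, 75.83],
    [44.01, 49.79, 31.63, 23.30, 35.58, 35.73, 43.00, 22.56, None, 40.80],
    [26.05, 31.91, 15.20, 12.54, 13.64, 13.50, 11.28, 24.17, 59.20, None],
]


def overall_score(name):
    """Assumed scoring rule: summed win rate over all nine opponents, / 100."""
    row = WIN_PCT[NAMES.index(name)]
    return sum(p for p in row if p is not None) / 100


def condorcet_winner(names=NAMES):
    """Return the entrant who wins more than half the time against every
    other entrant in `names`, or None if no such entrant exists."""
    idx = [NAMES.index(n) for n in names]
    for i in idx:
        if all(WIN_PCT[i][j] > 50 for j in idx if j != i):
            return NAMES[i]
    return None


if __name__ == "__main__":
    for name in NAMES:
        print(f"{name:17s} {overall_score(name):.2f}")
    print("Condorcet winner:", condorcet_winner())
```

Under these assumptions the summed scores reproduce the Overall Score column for the nine ranked entrants, and simon is the only entrant above 50% in every pairwise matchup, which agrees with aphyer's reply. A strict `> 50` threshold is one simple choice; a fuller Condorcet method would also need a rule for cycles, which this table happens not to require.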
