D&D.Sci Dungeonbuilding: the Dungeon Tournament

[-]abstractapplic1y50

I still have a bunch of checking to confirm whether this actually works, but I'm getting my preliminary decision down ASAP:

CWB/OOH/XXD (where the Xes are Nothing or Goblins depending on whether I'm Hard-mode-ing)

On the basis that:

Adventurers should prioritize the 'empty' trapped rooms over the ones with Orcs, then end up funnelled into the traps and towards the Hag; Clay Golem and Dragon are our aces, so they're placed in the two locations Adventurers can't complete the course without touching.

[-]abstractapplic1y30

On further inspection it turns out I'm completely wrong about

how traps work.

and it looks like

Dungeoneers can always tell what kinds of fight they'll be getting into: min(feature effect) between 2 and 4 is what decides how they collectively impact Score.

It also looks like

The rankings of effectiveness are different between the Entry Square, the Exit Square, and Everywhere Else; Steel Golems are far and away the best choice for guarding the entrance but 'only' on par with Dragons elsewhere.

Lastly

It looks like there's a weak but solid benefit to dungeoneers having no choice even between similarly strong creatures: a choice of two dragons and a choice of two hags are both a bit scarier than hag-or-dragon. (Though that might just be because multiple of the same strong creature is evidence you're in a well-stocked dungeon? Feature effects are hard to detangle.)

Also

It seems like there's a weirdly strong interaction between the penultimate obstacle and the ultimate obstacle?

[-]abstractapplic1y30

Oh and just for Posterity's sake, marking that I noticed both

the way some Tournaments will have 3 judges and others will have 4

and

the change in distribution somewhere between Tournaments 3000 and 4000

but I have no clue how to make use of these phenomena.

[-]kave1y20

Maybe sometimes a team will die in the dungeon?

[-]abstractapplic1y20

On reflection, I think

my initial guess happened to be close to optimal

because

Adventurers will successfully deduce that a mid-dungeon Trap is less dangerous than a mid-dungeon Orc

and

Hag-then-Dragon seems to make best use of the weird endgame interaction I still don't understand

however

I'm scared Adventurers might choose Orcs-plus-optionality over Boulders

so my new plan is

CBW/OOH/XXD

(and I also suspect

COW/OBH/XXD

might be better because of

the tendency of Adventuring parties to pick Eastern routes over Southern ones when all else is equal

but I don't have the confidence to make that my answer.)

[-]abstractapplic1y20

Did some more tinkering with this scenario. It is remarkably difficult to be 100% confident when determining the basic mechanics of this scenario, i.e.

whether adventuring parties can see more than one room ahead.

And I'm beginning to suspect that

some adventuring parties always take the optimal path, while some others are greedy algorithms just picking the easiest next encounter.

[-]abstractapplic1y20

Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.

The machines seem to think that the best solution I can offer is

BOG/OWH/GCD

and I've

found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute

so I'm making that my answer for now.

[-]abstractapplic1y20

Oh, and as for

the Bonus Objective

if I'm continuing with my current paradigm I'd guess it has something to do with

an apparent interaction between Orcs and Hags which makes a path containing both less dangerous than might otherwise be expected

possibly such that

I could remove the Goblin in Room 7 without making the easiest path any easier

but

I have low confidence in this answer

and

I have no idea how I could get away with purging the second Goblin

[-]kave1y*40

So I did some super dumb modelling.

I was like: let's assume that there aren't interaction effects between the encounters either in the difficulty along a path or in the tendency to co-occur. And let's assume position doesn't matter. Let's also assume that the adventurers choose the minimally difficult path, only moving across room edges.

To estimate the value of an encounter, let's look at how the dungeons where it occurs in one of the two unavoidable locations (1 and 9) differ on average from the overall average.
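A minimal sketch of that estimator, assuming each dungeon is available as a 9-character room string (rooms 1 to 9 in reading order) plus its score. The function name, the data layout, and the toy rows in the usage note are hypothetical, not the real dataset or the ChatGPT-written implementation.

```python
from statistics import mean

def encounter_effects(dungeons):
    """Estimate each encounter's value as described above.

    dungeons: list of (rooms, score) pairs, where rooms is a 9-character
    string giving rooms 1-9 in reading order (assumed layout).
    Returns {encounter: mean score when it sits in room 1 or 9,
             minus the overall mean score}.
    """
    overall = mean(score for _, score in dungeons)
    # Only encounters seen in an unavoidable room (1 or 9) get an estimate.
    seen = {enc for rooms, _ in dungeons for enc in (rooms[0], rooms[8])}
    effects = {}
    for enc in sorted(seen):
        scores = [s for rooms, s in dungeons if enc in (rooms[0], rooms[8])]
        effects[enc] = mean(scores) - overall
    return effects
```

For example, with three made-up rows `[("DNNNNNNNC", 12.0), ("GNNNNNNND", 10.0), ("GNNNNNNNG", 2.0)]` the overall mean is 8, so the sketch returns `{"C": 4.0, "D": 3.0, "G": -2.0}`.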

Assuming ChatGPT did all the implementation correctly, this predictor never overestimates the score by much, though it frequently, and sometimes egregiously, underestimates it.

Anyway, using this model and this pathing assumption, we have DBN/OWH/NOC

We skip the goblins and put our fairly rubbish trap in the middle to stop adventurers picking and choosing which parts of the outside paths they take. The optimal path for the adventurers is DONOC, which has a predicted score of 30.29, which ChatGPT tells me is ~95th percentile.
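The pathing assumption (adventurers take the minimally difficult route, moving only across room edges) can be sketched as a shortest-path search over the 3x3 grid. The per-encounter values below are illustrative placeholders, not the estimates behind the 30.29 figure, so the total will not match; only the mechanics are meant to carry over.

```python
import heapq

# Placeholder difficulty values (assumption, for illustration only).
VALUE = {"N": 0.0, "G": 1.5, "W": 3.0, "O": 3.0, "H": 4.0,
         "B": 4.5, "C": 6.0, "D": 6.0, "S": 7.5}

def easiest_path(layout, value=VALUE):
    """Return (path, total) for the cheapest route from room 1 to room 9.

    layout: rows separated by '/', e.g. "DBN/OWH/NOC".
    Moves go across room edges only (no diagonals); every room entered
    adds its encounter's value, with no interaction effects.
    """
    grid = layout.split("/")
    n = len(grid)
    goal = (n - 1, n - 1)
    first = grid[0][0]
    heap = [(value[first], (0, 0), first)]
    done = set()
    while heap:
        cost, (r, c), path = heapq.heappop(heap)
        if (r, c) == goal:
            return path, cost
        if (r, c) in done:
            continue
        done.add((r, c))
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in done:
                enc = grid[nr][nc]
                heapq.heappush(heap, (cost + value[enc], (nr, nc), path + enc))
    raise ValueError("no path found")
```

With these placeholder values, `easiest_path("DBN/OWH/NOC")` returns `("DONOC", 18.0)`: the same route as above, with a different total because the values differ.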

I'd love to come at this with saner modelling (especially of adventurer behaviour), but I somewhat doubt I will.

[-]kave1y40

I'm guessing encounter 4 (rather than encounter 6) follows encounter 3?

[-]aphyer1y40

The dungeon is laid out as depicted; Room 3 does not border Room 4, and does border Room 6. You don't, however, know what exactly the adventurers are going to do in your dungeon, or which encounters they are going to do in which order. Perhaps you could figure that out from the dataset.

(I've edited the doc to make this clearer).

[-]simon1y*30

Looking like I'll not have figured this out before the time limit despite the extra time, what I have so far:

I'm modeling this as follows, but I haven't fully worked it out, and I'm running into complications and hard-to-explain dungeons that suggest the model might not be exactly correct:

the adventurers go through the dungeons using rightwards and downwards moves only, thus going through 5 rooms in total.
at each room they choose the next room based on a preference order (which I am assuming is deterministic, but possibly dependent on, e.g. what the current room is)
the score is dependent only on the rooms they pass through (but again, am getting complications)
I'm assuming a simple addition of scores to start with, but then adding epicycles (which so far have been based on the previous room, generally)
there is some randomness in the individual score contributions from each encounter.

For the dungeon generation: dungeon generation seems to treat rooms 1-8 equally (room 9 is different and tends to have harder encounters). Encounters of the same types (and some related "themes") tend to be correlated. Scores in each tournament seem to be whole numbers from each judge and averaged between 3 or 4 judges; I am not sure if any tournaments are judged by 2 or 1, but if so they're relatively less common.

In theory, I'd like to plug in a preference model and a score model to a simulator and iterate to refine, but I'm not there yet, still working out plausible scores and preferences.

One possibility for the scores and preference order:

baseline average scores:

Nothing: 0; Goblins: 1.5 (1d2?); Whirling Blade Trap 3; Orcs 3; Hag 4; Boulder Trap 4.5; Clay Golem 6, Dragon 6?, Steel Golem 7.5 (edit: <--- numbers estimated with small, atypical samples (included many Nothing, which is problematic for reasons that become obvious with below edit))

With Goblins and Orcs being increased (doubled?) if following goblins/orcs/any trap? (edit - or golems?) (edit - looking now like it's probably anything but an empty room?)

Plus with the adventurers seemingly avoiding Orcs and Hags more than their difficulty warrants? (I found them to be relatively late in the preference order, then found that they were in practice lower in score, so am having to ad hoc adjust if I keep the assumption that the score contribution and preference order are related. 1.5 multiplier? 2x multiplier? fixed addition?) (I'm assuming a 1.5x multiplier atm since I initially had Hag avoided over anything but orcs, but found one dungeon that looks suspiciously like, but does not prove, Hag being chosen over Dragon (edit: see below for update)) (I suppose +2 would also work) (edit - it looks like the Orc difficulty increase for following a non-empty room only applies to adventurer preference if the current room is also Orcs - violating the assumption that preference is tied to expected difficulty. But for Goblins it seems the preference may indeed depend only on following a non-empty room, though in practice it doesn't matter much since it only affects order wrt WBT).

(edit - see update to preference order below)

Assuming the above is correct, and I'm pretty sure it isn't but hopefully has some relationship with reality, one strategy might be:

CHN/WON/BOD <---obsolete answer

where the idea is to use the encounters the adventurers avoid too much relative to their actual score contributions (Hag, Orcs) to herd the adventurers away from the Nothing rooms. One of the Orcs is left in after a Boulder Trap in the belief that this will make it score higher than the Hag. WBT is left in the preferred path to lead the adventurers along; I don't immediately see a way to avoid this.

EV if above model is correct: 6+3+4.5+6+6=25.5

How I've gotten here (mainly used Claude and Claude-written code, including the analysis tool which is good for prototyping if you don't mind javascript):

found initial basic encounter score contribution estimates from linear regression on whole dungeon
after determining that rooms 1-8 were interchangeable as far as dungeon generation is concerned, looked at room importance to score, guessed the basic model based on that iirc (might have been more complicated than this) (I do remember considering and rejecting a model where each room is selected one at a time from the full set of available rooms, and rejecting any "symmetrical" model based on working out the full path in advance)
initially assumed that adventurers preferred easier encounters based on the initial score estimates
refined preference order based on minimizing variance between same-predicted-sequence-of-encounters dungeons
tried to work out how scores actually work by filtering for specific predicted sequences of encounters and finding their scores
found epicycles from that and started refining model, including preference order adjustments
haven't really finished the above step, epicycles might be because model is wrong/incomplete?
hypothetical todo: apply model to entire dataset, also develop model for variations in score from each encounter, compare to known 3-judge and 4-judge tournaments for full Bayes assessment, refine further with this as feedback

edit: I've now read other people's comments; I did not notice any 1-point jump in scores (didn't check for it), not sure if I would have noticed if it is a judging difference as opposed to a strategy change? (wouldn't notice if just strategy change). Also I did not notice anything special about Steel Golems at the entrance vs. other spots, did not check for any change in distribution of 3 vs 4 judge tournaments, etc.

further analysis after the above:

I've looked at root mean square deviation of predictions from the data for the full dataset (full Bayes seems a bit intimidating to code atm even with AI help). From this it seems the preference order is (there remains a likely possibility for more complications I haven't checked):
~~Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap = Clay Golem = Orcs (current encounter not Orcs) > Dragon > Steel Golem >= Orcs (current encounter Orcs) > Hag~~
Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap > Clay Golem = Orcs (current encounter not Orcs) > Dragon > Orcs (current encounter Orcs) > Hag = Steel Golem

~~where I can't distinguish between Steel Golem being preferred or equal to Orcs with current encounter being Orcs.~~

~~Soo, if Orcs are avoided equally to a Boulder Trap if the current encounter is not Orcs, I need to improve the herding.~~ But also it seems Orcs get doubled by many other encounter types? This could work:
CHN/OBN/WOD <---- current solution

Predicted value is now 6+6+3+6+6=27.
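The tentative model in this comment (right/down moves only, a deterministic preference order at each step, additive scores, and Goblins/Orcs doubled after any non-empty room) can be sketched as a small simulator. This is an illustration, not the Claude-written code used for the analysis: the function names, the rank encoding, and the east-before-south tie-break are assumptions.

```python
# Baseline scores quoted above (rough estimates from the comment).
BASE = {"N": 0.0, "G": 1.5, "W": 3.0, "O": 3.0, "H": 4.0,
        "B": 4.5, "C": 6.0, "D": 6.0, "S": 7.5}

def rank(nxt, cur):
    """Lower rank = more preferred, following the updated preference order."""
    if nxt == "G":
        return 1 if cur in (None, "N") else 2
    if nxt == "O":
        return 6 if cur == "O" else 4
    return {"N": 0, "W": 2, "B": 3, "C": 4, "D": 5, "H": 7, "S": 7}[nxt]

def contribution(enc, prev):
    """Score from one room; Goblins/Orcs double after a non-empty room."""
    value = BASE[enc]
    if enc in ("G", "O") and prev not in (None, "N"):
        value *= 2
    return value

def simulate(layout):
    """Walk the 3x3 dungeon greedily; return (path, predicted score)."""
    grid = layout.split("/")
    r = c = 0
    prev = None
    path, total = [], 0.0
    while True:
        enc = grid[r][c]
        path.append(enc)
        total += contribution(enc, prev)
        if (r, c) == (2, 2):
            return "".join(path), total
        options = []
        if c < 2:  # move right (east); wins ties by assumption
            options.append((rank(grid[r][c + 1], enc), 0, r, c + 1))
        if r < 2:  # move down (south)
            options.append((rank(grid[r + 1][c], enc), 1, r + 1, c))
        _, _, r, c = min(options)
        prev = enc
```

Under these assumptions, `simulate("CHN/WON/BOD")` gives `("CWBOD", 25.5)` and `simulate("CHN/OBN/WOD")` gives `("COWOD", 27.0)`, matching the two hand-computed predictions.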

further edit: also refining the scores, getting probably nonsense (due to missing some dependency of some stuff on something else, probably), but it's looking like maybe every encounter's score depends on whether the previous encounter was Nothing/null. Except traps/golems? Which would explain why Steel Golems are being reported as better in the first slot.

I'm also getting remarkably higher numbers for Hag compared with my earlier method. But I don't immediately see a way to profitably exploit this.

[-]Yonge1y10

I only had time to construct a simple model based on the average value of the score for different encounters in different dungeons. Based on this my submission is:

COG/GOB/WHD

And when Goblins aren't present:

CON/WOB/NHD

[-]Christian Z R1y10

A few notes about strange phenomena in the scores:

1) As already pointed out there is a clear jump in scores by around one point around tournament 3400 (the jump is too small for me to be quite certain when it happened). This might be because of a small change in the rules. So conclusions drawn by data from before this point might be flawed.

2) The scores are either whole numbers, or fractions ending in halves, thirds or quarters. So they might be from taking the mean of either 1, 2, 3 or 4 whole numbers. It is inherently tricky to find out how a given whole number came about. Also the half points might be the mean of either 2 or 4 numbers.
However there seems to be some time dependence in this. The scores ending in .333... or .666... are slightly (but statistically significantly) more likely to be found in the earlier tournaments than the scores ending in 0.25, 0.5 or 0.75 (40% of thirds happen before tournament 3400 compared to 35% of quarters). This might very well be related to the rule change happening around tournament 3400, so later tournaments were more likely to have 2 or 4 judges instead of 3.
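The ambiguity in point 2 can be made concrete with a short check of which judge counts are consistent with a given score, assuming each judge gives a whole number and the score is their plain mean. The function name and the tolerance are illustrative choices.

```python
def possible_judge_counts(score, max_judges=4, tol=1e-6):
    """Return the judge counts k (1..max_judges) for which score * k is a
    whole number, i.e. the score could be the mean of k whole numbers."""
    counts = []
    for k in range(1, max_judges + 1):
        total = score * k
        if abs(total - round(total)) < tol:
            counts.append(k)
    return counts
```

A score of 12.25 is only consistent with 4 judges and 37/3 only with 3, while 12.5 allows 2 or 4 and a whole number like 12.0 allows any count, which is why whole and half scores are the hard cases.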

3) The mean of scores ending in 0.33... and 0.66... seems to be almost the same as the mean of scores ending in 0.25, 0.5 or 0.75, so it cannot be only the change in number of judges that led to the change in mean score.

[-]Christian Z R1y30

Some more insights:

I assume that Adventurers can't walk diagonally. In that case we can try to look at dungeons where the same encounter is present in rooms 2 and 4 (or in rooms 6 and 8), so the adventurers must pass through that exact challenge. I then fit a linear model on this encounter, including rooms 1 and 9 to control for the fact that dungeons with strong encounters in one room are more likely to also have them in others.

Looking at both the case where Enc2==Enc4 and the case where Enc6==Enc8 I got an agreement between the difficulties of the encounters (Nothing < W < G < B < O < S < H=D ). Clay Golems however differed.
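The filtering step can be sketched as follows, assuming each dungeon is given as a 9-character string of rooms 1 to 9 in reading order (a hypothetical representation). It only identifies the forced encounters; fitting the linear model on the selected rows is left out.

```python
def forced_encounters(rooms):
    """Return (after_entrance, before_exit) for one dungeon.

    after_entrance: the encounter shared by rooms 2 and 4, or None.
    before_exit:    the encounter shared by rooms 6 and 8, or None.
    With no diagonal moves, a party leaving room 1 must enter room 2 or 4,
    and must reach room 9 via room 6 or 8, so a shared encounter is forced.
    """
    after_entrance = rooms[1] if rooms[1] == rooms[3] else None
    before_exit = rooms[5] if rooms[5] == rooms[7] else None
    return after_entrance, before_exit
```

For instance, `forced_encounters("DOGOWHGHC")` returns `("O", "H")`, while a dungeon such as `"CWBOOHNND"` has no forced encounter in either position and returns `(None, None)`.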

It also seemed that in general meeting a tough challenge later would make a bigger difference than meeting it early.

So my plan will look the same as abstractapplic's for pretty much the same reasons, except I will place the Boulder Trap after the Whirling Blade, since Boulder Traps seem to be more challenging, and so should be encountered later.

[-]Christian Z R1y10

So my plan is:

CWB/OOH/XXD


"The bane of the mountains, the princess's hand. The loyal protector all at her command. Her lovely, her darling, oh now where are you? Are you gone? Are you lost? No, you're boiled in a stew!"

At this point you made a hasty retreat, because you did not like the way the Hag was eyeing you while talking about boiling you in a stew.


Four different adventuring teams have signed up as judges to run this tournament's dungeons; thankfully, Dungeon Tournament officials will be healing your monsters and resetting your traps between adventuring teams, so you won't have to worry about that.


"My uncle Gobbo has no tongue!" "How does he talk?" "...Terrible! Ahahahaha!" And you don't even want to mention the one about the Mind Goblin.
