D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset

aphyer

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

A dish has three properties: Size, Spiciness and Sweetness.

Dish	Size	Spiciness	Sweetness
Ambrosial Applesauce	0	0	2 (±1)
BBQ Basilisk Brisket	2 (±1)	1 (±1)	1 (±1)
Chili Con Chimera	2 (±1)	2 (±1)	0
Displacer Dumplings^[1]	0	0	0
Ettin Eye Eclairs	1 (±1)	0	4 (±1)
Fiery Formian Fritters	1 (±1)	2 (±1)	0
Geometric Gelatinous Gateau	1 (±1)	0	3 (±1)
Honeyed Hydra Hearts	1 (±1)	0	2 (±1)
Killer Kraken Kebabs	3 (±1)	3 (±1)	0
Mighty Minotaur Meatballs	3 (±1)	0	0
Opulent Owlbear Omelette	2 (±1)	0	0
Pegasus Pinion Pudding	1 (±1)	0	1 (±1)
Roc Roasted Rare	4 (±1)	0	0
Scorching Salamander Stew	2 (±1)	3 (±1)	0
Troll Tenderloin Tartare	1 (±1)	0	0
Vicious Vampire Vindaloo	2 (±1)	4 (±1)	0
Wyvern Wing Wraps	1 (±1)	1 (±1)	0

Cooking is not an exact science, and there's a ±1 variation in each dish (effectively adding 1d3-2 to each of its stats). For example, if you cook Chili Con Chimera (usually Size 2 and Spiciness 2), you might end up with a large mild batch (Size 3 Spiciness 1), or a small spicy batch (Size 1 Spiciness 3), etc. etc. However, any stat that is at 0 will stay there: your Chili will never end up sweet, for example, and your Eclairs will never end up spicy.

The ideal Feast has in total Size 10, Spiciness 5, and Sweetness 5.

Each of these stats increases your score up to this ideal, and then decreases it thereafter. For example, a total Size of 6 will score you 6 points, Size 9 will score you 9 points, Size 12 will score you 8 points (10 minus 2), and Size 15 will score you 5 points (10 minus 5).

STRATEGY

The first order of strategy was to select dishes that tended to generate the correct overall amount of size/spiciness/sweetness.

The subtler element of strategy was to minimize variance by using as few dishes as possible for each stat: reaching e.g. 5 Sweetness using 3 Sweet dishes is worse than reaching 5 Sweetness using 2 Sweet dishes, because there's more variance away from your ideal 5.

The perfectly optimal feast that used as few dishes as possible for each stat was actually very tightly constrained:

Two Sweet dishes are needed to reach 5 Sweetness.
- One of these should be Applesauce, which avoids adding variance to Size by having Size 0 and just adding Sweetness 2.
- The other therefore needs to have Sweetness 3, and so must be Gateau (Size 1 Sweetness 3).
Two Spicy dishes are needed to reach 5 Spiciness.
- These dishes should also give as much Size as possible, so that we can avoid having to include too many more dishes for Size.
- Therefore we bring Chili (Size 2 Spiciness 2) and Kebabs (Size 3 Spiciness 3) to contribute Size 5 along with Spiciness 5 in two dishes.
With 6 Size so far, we can reach Size 10 in one more dish by bringing Roc (for a total of 4 dishes that contribute to Size).
Optionally, we can include Dumplings as well, since they have no effect.
So we bring AC(D?)GKR.

However, there were many other options that would get extremely close in score, just e.g. having very slightly more variance in one stat or another, or accepting a Spiciness/Sweetness of 4 off a single dish (4±1 scores only a tiny bit worse on average than 5±2).

LEADERBOARD

Player	Dishes	Size	Spiciness	Sweetness	Average Score
Optimal Play	AC(D?)GKR)	10 (±4)	5 (±2)	5 (±2)	16.94
James Camacho	AGORTV	10 (±5)	4 (±1)	5 (±2)	16.67
Unnamed	ABDMOPV	10 (±5)	5 (±2)	4 (±3)	16.30
simon	BDGKOPT	10 (±6)	4 (±2)	5 (±3)	16.09
Multicore	ABDGRS	9 (±4)	4 (±2)	6 (±3)	15.89
abstractapplic	ABDHKO	8 (±4)	4 (±2)	5 (±3)	15.52
Yonge	ABFOP	6 (±4)	3 (±2)	4 (±3)	12.63
Entirely Random Play	??	??	??	??	9.88

Congratulations to all players, especially James Camacho.

DATASET GENERATION

The Isamandan feasts in your dataset were generated as follows:

Select 2-10 (4d3-2) dishes at random.
- Some dishes are weighted differently here. For example, Roc Roasted Rare is the traditional main course, and is more common, while Ambrosia is hard to find and so Ambrosial Applesauce is rare.
If the combined average Size of the resulting dishes is <6, this doesn't really count as a Feast, just as a regular dinner, and nobody gets invited over/this doesn't enter your dataset.
- This creates some additional bias towards the large-Size dishes in feasts with few dishes, but not in feasts with many dishes (which will hit the Size threshold anyway).

FEEDBACK REQUEST

As usual, I'm interested to hear any other feedback on what people thought of this scenario. If you played it, what did you like and what did you not like? If you might have played it but decided not to, what drove you away? What would you like to see more of/less of in future? Do you think the scenario was more complicated than you would have liked? Or too simple to have anything interesting/realistic to uncover? Or both at once? Did you like/dislike the story/fluff/theme parts? What complexity/quality scores should I give this scenario in the index?

^{^}
These dumplings Displace themselves out of your stomach after being eaten.

Reflections on my performance

I took this game as an opportunity to demonstrate how my modelling library works. This was in its own way a resounding success: I can’t think of a better demonstration of my methodology’s strengths and weaknesses than getting a uniquely deep & justifiable view of the underlying systems - I really wasn’t expecting to be right about everything in this post (minus arguably the last part) - and then significantly underperforming more mainstream approaches. Still, I think I did acceptably, and a combination of decent judgement and excellent luck happened to let my character get a Perfect Feast, so I’m content.

(I really shouldn’t have skipped the “. . . and then use all the insight you got from playing with interpretable models to make best use of uninterpretable treebased models” step^[1] at the end. I’ll know better next time.)

Reflections on the challenge

I know I liked this one, but I’m uncertain as to how much: there were two major aspects I find myself deeply ambivalent about.

First: the relatively low number of rows, high number of columns and nonzero randomness in output made this look like a data starvation problem, but there turned out to be a pretty tight linkage between predictors and responses, such that it actually was more-or-less fair; in other words, the scenario pretended to be harder and jankier than it was. The effect of this for me was kind of videogamey: I’m not sure whether to consider this impeccable design (“good job mechanically playing into the scrappy-underdog-who-keeps-winning-anyway fantasy!”), lowkey disquieting (“should a game about epistemology be doing that, though?”), or a legitimate extra layer of challenge (“I need to git gud at knowing how gud I need to git.”^[2]).

Second: the challenge was in retrospect pretty cheeseable, but none of the actual players actually cheesed it. I can net a low-variance high-Quality Feast just by ordering historical Feasts by Quality and mimicking one of the ones that managed to get Quality=20^[3]; that said, afaict the only person to make use of an approach like this was me, and I still didn’t lean on it anywhere as hard as I could have^[4]. (This could have been fixed with a trivial extension of the existing ruleset by having one or more foods with high and high-variance Sweetness and/or Spiciness, such that they could sometimes fill one or more of the quotas by themselves; these would be disproportionately represented in the highest rows, but in expectation a terrible choice for players.)

There were also plenty of things I just straightforwardly enjoyed. The writing and the premise were both fun, and underlying mechanics were conceptually beautiful and impeccably implemented. Also, I like a game which lets me show off, and a game which kicks my ass; this one did the former qualitatively and the latter quantitatively, which imo counts as two reasons to think it's good. All told, I’d award this a conflicted-but-approving [almost-certainly-at-least-three-and-plausibly-more-than-that-but-I-don’t-know-how-much]/5 for Quality.

^{^}
Or, potentially, the “construct an entire modelling paradigm around the shape of the problem” step.
^{^}
I’d have been much less likely to skip that final step if I hadn’t thought I had too little data to justify higher model complexity.
^{^}
Two of the four dish combinations with Quality=20 had >16 Quality in expectation.
^{^}
If a tree falls in the forest, and the only person to hear it is over a mile away, is the sound it makes loud?

Two of the four dish combinations with Quality=20 had >16 Quality in expectation.

This is true, but is actually something I looked into making the scenario. The average score of 'pick a random 20-quality feast from the dataset' was 15.38, which players did successfully beat.

There was a related writing-constraint that came from me pushing to simplify the ruleset a bit. The original envisioned ruleset was going to give players an additional rule limiting which dishes they were allowed to include.^[1]

This would have let me give a substantially larger dataset without worrying that grabbing the best-scoring thing out of the dataset would trivially solve the scenario - if 6-7 dishes were banned, it would be easy for none of the top-scoring feast to be allowed for you, and/or for the best-scoring feast you were allowed to be a single bit of random luck that would betray you if you repeated it.

When I removed that rule, I needed to cut down on the dataset size to avoid that being a trivial solution, which is what led to the data-starvation. Overall I think something like that wouldn't have been worth the complexity - just telling players they can include whatever dishes they want is simpler and also feels more realistic in context. Open to other views on that, though.

^{^}
I had a whole bunch of excuses lined up for this too! One of your companions gets seasick and doesn't want to go hunt Kraken...another is Good-aligned and will be angry if you kill a Pegasus...

Fewer rows might not give interpretable/rules-based solutions an advantage. I tried training on only the first 100 or 20 rows, and I got CDEFMW (15.66) and EMOPSV (15.34) as the predicted best meals. Admittedly CDEFMW shows up in the first 100 rows scoring 18 points, but not EMOPSV. Maybe a human with 20 rows could do better by coming up with a lot of hypothetical rules, but it seems tough to beat the black box.

For the last 2 DnDsci's, I've only gotten the tag-notification for the evaluation, but not the first post. I'd like to participate again, but I keep missing it. Anyone else having that problem?

I can't speak for aphyer, but I tend not to tag my own posts (mostly out of a vague "authors don't get to decide what their works are" sentiment). If it's impeding people from playing I'll make a point of tagging my D&D.Sci scenarios as D&D.Sci (when it's part of a very specific genre and also literally has the name of that genre in the post title there's no point in me being ontologically coy); hopefully that will help.

Huh. I believe I tagged them both the same way, but I don't get tag notifications on my own posts. Can someone who isn't me comment on whether they got the notification?

Thanks aphyer for making this scenario and congrats to James Camacho (and Unnamed) for their better solutions.

The underlying mechanics here are not that complicated, but uncovering details of the mechanics seemed deceptively difficult, I guess due to the non-monotonic effects, and randomness entering the model before the non-monotonicity.

It wasn't that hard though to come up with answers that would do OK, just to decipher the mechanics. I guess this is good in some ways (one doesn't just insta-solve it), but I do like to be able to come up with legible understanding while solving these, and it felt pretty hard to do so in this case, so maybe I'd prefer if there were more lower hanging fruit mechanics wise. (So maybe I don't actually prefer simple mechanics as long as some of them are easier to figure out and can be built off of?)

Regarding my own attempt to figure it out:

Thanks to abstractapplic for the comment that tipped me off to the characteristics being quantitative not just a classification, as well as to there possibly being a total food amount characteristic not just spicy/sweet. Multicore's comment on Roc possibly being in a "meaty" category also helped me in regard to the latter.

Ironically though, the model change actually lowered my performance (from 16.30) presumably due to the improved model not taking as much into account penalties for variance that were implicitly present in the earlier interactions + spicy/sweet dish numbers model. I'm still pleased that the "improved" model had more structural resemblance to the actual reality, even if the answer was worse. (the second change in my answer also lowered my expected performance (from 16.13), but I think this was basically coincidental).

I actually did consider the possibility that the characteristics might have variability, including explicitly considering a uniform distribution between bounds, but Claude did an initial probe and dismissed it. I guess I should have pushed Claude on this! But also I'd been refining imperfect models for a while and and didn't want to spend the time/effort required to develop a new model at that time if it didn't instantly seem promising.

I heavily used AI for this scenario. It helps a lot to quickly do stuff that would take a lot more effort without, and in principle should also help with exploring, but I feel like its tendency to focus on what's right in front of it can also distract me from switching approaches. Also I wish I had done more exploration of the data before asking AI to come up with a solution. As it was Claude (4.5 Opus) found the spicy/sweet categories without me having previously done so, which robbed me of having the pleasure of finding them on my own, as other commenters who mentioned them likely did.

I'm not sure if anyone appreciated my insanely long AI-generated comments, I could avoid doing that in the future if people were annoyed by it.

Fwiw, I really liked seeing someone take an AI-heavy approach to one of these.