simon — LessWrong

D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset

Thanks aphyer for making this scenario and congrats to James Camacho (and Unnamed) for their better solutions.

The underlying mechanics here are not that complicated, but uncovering details of the mechanics seemed deceptively difficult, I guess due to the non-monotonic effects, and randomness entering the model before the non-monotonicity.

It wasn't that hard, though, to come up with answers that would do OK; the hard part was deciphering the mechanics. I guess this is good in some ways (one doesn't just insta-solve it), but I do like to be able to come up with legible understanding while solving these, and it felt pretty hard to do so in this case, so maybe I'd prefer if there were more low-hanging fruit mechanics-wise. (So maybe I don't actually prefer simple mechanics, as long as some of them are easier to figure out and can be built off of?)

Regarding my own attempt to figure it out:

Thanks to abstractapplic for the comment that tipped me off to the characteristics being quantitative not just a classification, as well as to there possibly being a total food amount characteristic not just spicy/sweet. Multicore's comment on Roc possibly being in a "meaty" category also helped me in regard to the latter.

Ironically though, the model change actually lowered my performance (from 16.30), presumably because the improved model took less account of the penalties for variance that were implicitly present in the earlier interactions + spicy/sweet dish numbers model. I'm still pleased that the "improved" model had more structural resemblance to the actual reality, even if the answer was worse. (The second change in my answer also lowered my expected performance (from 16.13), but I think this was basically coincidental.)

I actually did consider the possibility that the characteristics might have variability, including explicitly considering a uniform distribution between bounds, but Claude did an initial probe and dismissed it. I guess I should have pushed Claude on this! But also I'd been refining imperfect models for a while and didn't want to spend the time/effort required to develop a new model at that time if it didn't instantly seem promising.

I heavily used AI for this scenario. It helps a lot to quickly do stuff that would take a lot more effort without it, and in principle should also help with exploring, but I feel like its tendency to focus on what's right in front of it can also distract me from switching approaches. Also, I wish I had done more exploration of the data before asking AI to come up with a solution. As it was, Claude (4.5 Opus) found the spicy/sweet categories without me having previously done so, which robbed me of the pleasure of finding them on my own, as other commenters who mentioned them likely did.

I'm not sure if anyone appreciated my insanely long AI-generated comments; I could avoid doing that in the future if people were annoyed by it.

D&D.Sci Thanksgiving: the Festival Feast

simon2mo20

# Feast Quality Model: Full Parameter Specification (simon note: AI generated)

## Executive Summary

This document presents two piecewise linear models for predicting feast quality based on the balance of spicy, sweet, and bulk characteristics. Both models use learned per-dish weights for each dimension, with penalties applied when totals fall outside optimal zones.

**Key Finding:** The "spicy peak" model shows marginal RMSE improvement and is **marginally significant** (p=0.034) by F-test, but **BIC favors the simpler flat model**. Given the mixed evidence, we recommend targeting the **center of the flat model's optimal zones** for maximum robustness.

---

## Model Comparison

| Metric | All-Flat Model | Spicy Peak Model |

|--------|----------------|------------------|

| RMSE | 1.9635 | 1.9583 |

| Parameters | 64 | 67 |

| AIC | 2440.9 | 2437.9 |

| BIC | 2789.5 | 2802.8 |

**Statistical Test (F-test):**

- F-statistic: 2.892

- Degrees of freedom: (3, 1647)

- **p-value: 0.034**

**Interpretation:** The peak model's improvement is marginally significant (p=0.034), but BIC favors the simpler flat model (delta = -13.3). This mixed evidence suggests caution in adopting the more complex model.
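The disagreement between AIC and BIC can be checked from the table itself. The sketch below assumes both criteria were computed from the same log-likelihood and that the sample size is 1714 (the residual degrees of freedom plus the 67 peak-model parameters); the function and variable names are made up for illustration.

```python
import math

N_OBS = 1714  # 1647 residual df + 67 parameters in the peak model


def bic_from_aic(aic, k, n):
    """BIC implied by an AIC value, assuming both use the same log-likelihood."""
    # AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, so BIC = AIC + k (ln(n) - 2).
    return aic + k * (math.log(n) - 2.0)


flat_bic = bic_from_aic(2440.9, 64, N_OBS)
peak_bic = bic_from_aic(2437.9, 67, N_OBS)

# Nested-model F statistic from the two RMSEs (the common 1/n factor cancels).
rmse_flat, rmse_peak = 1.9635, 1.9583
extra_params, resid_df = 67 - 64, N_OBS - 67
f_stat = ((rmse_flat**2 - rmse_peak**2) / extra_params) / (rmse_peak**2 / resid_df)

print(f"BIC flat {flat_bic:.1f}, peak {peak_bic:.1f}")
print(f"Extra-parameter cost: AIC {2 * extra_params}, BIC {extra_params * math.log(N_OBS):.1f}")
print(f"F from rounded RMSEs: {f_stat:.2f}")
```

Under these assumptions the three extra parameters cost 6 points of AIC but roughly 22 points of BIC, while the fit improvement is worth about 9, which is why the two criteria point in opposite directions. The F statistic recomputed from the rounded RMSEs comes out near 2.9, in line with the reported 2.892.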

---

## Model 1: All-Flat (3-Piece Piecewise Linear)

### Structure

For each dimension (spicy, sweet, bulk):

- **Below optimal zone:** Linear penalty with slope

- **In optimal zone:** No penalty (flat)

- **Above optimal zone:** Linear penalty with slope

```

Quality = Intercept + SpicyEffect + SweetEffect + BulkEffect

SpicyEffect = -slope_low * max(0, t1 - total_spicy) - slope_high * max(0, total_spicy - t2)

SweetEffect = -slope_low * max(0, t1 - total_sweet) - slope_high * max(0, total_sweet - t2)

BulkEffect = -slope_low * max(0, t1 - total_bulk) - slope_high * max(0, total_bulk - t2)

```

### Optimal Zones and Penalties

| Dimension | Lower Threshold (t1) | Upper Threshold (t2) | Slope Below | Slope Above |

|-----------|---------------------|---------------------|-------------|-------------|

| Spicy | 3.213 | 3.558 | 1.573 | 0.899 |

| Sweet | 3.068 | 4.461 | 1.593 | 1.273 |

| Bulk | 7.136 | 8.677 | 0.877 | 1.028 |

**Intercept:** 16.742
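As a rough illustration, the penalty structure above can be written as a small Python function. The parameter values are copied from the table; the function names and the example totals are hypothetical, and the two slope columns are assumed to be in the order slope_low, slope_high.

```python
# Flat-model parameters copied from the table above:
# (lower threshold t1, upper threshold t2, slope below, slope above).
# The order of the two slope columns is an assumption.
FLAT_PARAMS = {
    "spicy": (3.213, 3.558, 1.573, 0.899),
    "sweet": (3.068, 4.461, 1.593, 1.273),
    "bulk": (7.136, 8.677, 0.877, 1.028),
}
INTERCEPT = 16.742


def zone_effect(total, t1, t2, slope_low, slope_high):
    """Penalty (<= 0) for one dimension: zero inside [t1, t2], linear outside."""
    return -slope_low * max(0.0, t1 - total) - slope_high * max(0.0, total - t2)


def flat_quality(totals):
    """Predicted quality for a dict of per-dimension totals."""
    return INTERCEPT + sum(
        zone_effect(totals[dim], *params) for dim, params in FLAT_PARAMS.items()
    )


# Hypothetical totals: too spicy, sweet inside its zone, not enough bulk.
example = {"spicy": 5.0, "sweet": 3.5, "bulk": 6.0}
print(round(flat_quality(example), 3))

# Any feast inside all three zones scores exactly the intercept.
print(flat_quality({"spicy": 3.4, "sweet": 3.8, "bulk": 8.0}))
```

The second call shows where the degeneracy comes from: every combination of totals inside the three zones gets the same prediction.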

### Per-Dish Weights

| Dish | Spicy Weight | Sweet Weight | Bulk Weight |

|------|-------------|--------------|-------------|

| Ambrosial Applesauce | 0.000 | 1.456 | 0.000 |

| BBQ Basilisk Brisket | 0.821 | 0.831 | 1.404 |

| Chili Con Chimera | 1.304 | 0.078 | 1.502 |

| Displacer Dumplings | 0.052 | 0.075 | 0.086 |

| Ettin Eye Eclairs | 0.124 | 2.815 | 0.371 |

| Fiery Formian Fritters | 1.337 | 0.000 | 0.918 |

| Geometric Gelatinous Gateau | 0.176 | 2.078 | 0.583 |

| Honeyed Hydra Hearts | 0.000 | 1.537 | 0.851 |

| Killer Kraken Kebabs | 2.146 | 0.054 | 2.434 |

| Mighty Minotaur Meatballs | 0.069 | 0.181 | 2.233 |

| Opulent Owlbear Omelette | 0.008 | 0.016 | 1.788 |

| Pegasus Pinion Pudding | 0.124 | 0.621 | 0.786 |

| Roc Roasted Rare | 0.004 | 0.290 | 3.493 |

| Scorching Salamander Stew | 2.138 | 0.035 | 1.520 |

| Troll Tenderloin Tartare | 0.062 | 0.000 | 0.960 |

| Vicious Vampire Vindaloo | 3.015 | 0.148 | 1.570 |

| Wyvern Wing Wraps | 0.657 | 0.181 | 0.749 |

---

## Model 2: Spicy Peak (4-Piece for Spicy, 3-Piece for Sweet/Bulk)

### Structure

For spicy: 4-piece continuous function with potential peak

- Piece 1: spicy < t1 -> slope s1

- Piece 2: t1 <= spicy < t2 -> slope s2 (if positive, quality increases)

- Piece 3: t2 <= spicy < t3 -> slope s3 (if negative, quality decreases = peak at t2)

- Piece 4: spicy >= t3 -> slope s4

For sweet/bulk: Same 3-piece flat structure as Model 1.

### Spicy Parameters (4-Piece)

| Parameter | Value | Interpretation |

|-----------|-------|----------------|

| t1 (breakpoint 1) | 1.016 | Below this: slope s1 |

| t2 (breakpoint 2) | 2.697 | **Peak location** (if s2>0, s3<0) |

| t3 (breakpoint 3) | 5.872 | Above this: slope s4 |

| s1 (slope 1) | 2.043 | Slope for spicy < t1 |

| s2 (slope 2) | 1.856 | Slope for t1 <= spicy < t2 |

| s3 (slope 3) | -1.253 | Slope for t2 <= spicy < t3 |

| s4 (slope 4) | -0.268 | Slope for spicy >= t3 |

**Peak Location:** ~2.70 (where slope changes from positive to negative)

### Sweet/Bulk Parameters (3-Piece)

| Dimension | Lower Threshold (t1) | Upper Threshold (t2) | Slope Below | Slope Above |

|-----------|-----------------|-----------------|-------------|-------------|

| Sweet | 3.028 | 4.591 | 1.600 | 1.299 |

| Bulk | 6.692 | 8.486 | 0.980 | 1.126 |

**Intercept:** 13.752

### Per-Dish Weights (Peak Model)

| Dish | Spicy Weight | Sweet Weight | Bulk Weight |

|------|-------------|--------------|-------------|

| Ambrosial Applesauce | 0.000 | 1.452 | 0.000 |

| BBQ Basilisk Brisket | 0.642 | 0.871 | 1.395 |

| Chili Con Chimera | 1.020 | 0.071 | 1.427 |

| Displacer Dumplings | 0.036 | 0.062 | 0.089 |

| Ettin Eye Eclairs | 0.070 | 2.904 | 0.410 |

| Fiery Formian Fritters | 1.079 | 0.000 | 0.973 |

| Geometric Gelatinous Gateau | 0.139 | 2.104 | 0.522 |

| Honeyed Hydra Hearts | 0.000 | 1.515 | 0.745 |

| Killer Kraken Kebabs | 1.693 | 0.042 | 2.322 |

| Mighty Minotaur Meatballs | 0.000 | 0.182 | 2.228 |

| Opulent Owlbear Omelette | 0.052 | 0.010 | 1.673 |

| Pegasus Pinion Pudding | 0.125 | 0.592 | 0.653 |

| Roc Roasted Rare | 0.000 | 0.279 | 3.229 |

| Scorching Salamander Stew | 1.691 | 0.041 | 1.614 |

| Troll Tenderloin Tartare | 0.071 | 0.000 | 0.914 |

| Vicious Vampire Vindaloo | 2.361 | 0.163 | 1.545 |

| Wyvern Wing Wraps | 0.518 | 0.157 | 0.711 |

---

## Optimal Feast Recommendations

### Degeneracy Analysis

Both models have **degenerate optimal solutions** - multiple feasts achieve the same (or nearly the same) predicted quality:

| Model | Best Score | # Degenerate Solutions |

|-------|------------|------------------------|

| Flat Model | 16.74 | 135 |

| Peak Model | 16.87 | 6 |

| In All Flat Zones | 16.74 | 119 |

To break this degeneracy, we select the feast **closest to the center** of each optimal zone, providing maximum robustness against model uncertainty.

### Target Centers (Flat Model)

| Dimension | Optimal Zone | Center | Width |

|-----------|--------------|--------|-------|

| Spicy | [3.21, 3.56] | 3.386 | 0.344 |

| Sweet | [3.07, 4.46] | 3.764 | 1.393 |

| Bulk | [7.14, 8.68] | 7.907 | 1.540 |

---

## RECOMMENDED FEAST (Center-Targeted)

This feast is in all optimal zones AND closest to the center of each zone.

| Dimension | Total | Target Center | Distance |

|-----------|-------|---------------|-----------|

| Spicy | 3.388 | 3.386 | 0.003 |

| Sweet | 3.675 | 3.764 | 0.089 |

| Bulk | 8.041 | 7.907 | 0.135 |

**Predicted Quality:** 16.74 (flat model), 16.80 (peak model)

**Normalized Distance from Center:** 0.160

**Dishes (7):**

- BBQ Basilisk Brisket

- Displacer Dumplings

- Geometric Gelatinous Gateau

- Killer Kraken Kebabs

- Opulent Owlbear Omelette

- Pegasus Pinion Pudding

- Troll Tenderloin Tartare
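The totals and the normalized distance for this feast can be reproduced from the Model 1 tables. This is a sketch, not the original fitting code: the weights and zones are copied from above, and the definition of "normalized distance" as the sum over dimensions of distance-from-center divided by zone width is a guess that happens to land close to the reported 0.160.

```python
# Flat-model per-dish weights (spicy, sweet, bulk) for the seven recommended
# dishes, copied from the Model 1 weight table.
WEIGHTS = {
    "BBQ Basilisk Brisket": (0.821, 0.831, 1.404),
    "Displacer Dumplings": (0.052, 0.075, 0.086),
    "Geometric Gelatinous Gateau": (0.176, 2.078, 0.583),
    "Killer Kraken Kebabs": (2.146, 0.054, 2.434),
    "Opulent Owlbear Omelette": (0.008, 0.016, 1.788),
    "Pegasus Pinion Pudding": (0.124, 0.621, 0.786),
    "Troll Tenderloin Tartare": (0.062, 0.000, 0.960),
}
# Flat-model optimal zones (t1, t2) for spicy, sweet, bulk.
ZONES = [(3.213, 3.558), (3.068, 4.461), (7.136, 8.677)]

# Sum each dimension over the chosen dishes.
totals = [sum(w[i] for w in WEIGHTS.values()) for i in range(3)]

# Inside every zone means the flat model predicts exactly the intercept (16.742).
in_all_zones = all(lo <= t <= hi for t, (lo, hi) in zip(totals, ZONES))

# Distance from each zone center, scaled by the zone width, then summed.
# Summing (rather than, say, taking a Euclidean norm) is an assumption.
normalized = sum(
    abs(t - (lo + hi) / 2) / (hi - lo) for t, (lo, hi) in zip(totals, ZONES)
)

print([round(t, 3) for t in totals], in_all_zones, round(normalized, 3))
```

Small differences in the last digit against the table are expected, since the published weights and thresholds are rounded to three decimals.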

---

## Alternative Recommendations

### Best by Flat Model

*(135 feasts tied at this score)*

**Example:** Predicted Quality: 16.74

- Spicy: 3.388, Sweet: 3.675, Bulk: 8.041

- Dishes (7): BBQ Basilisk Brisket, Displacer Dumplings, Geometric Gelatinous Gateau, Killer Kraken Kebabs, Opulent Owlbear Omelette, Pegasus Pinion Pudding, Troll Tenderloin Tartare

### Best by Peak Model

*(6 feasts tied at this score)*

**Example:** Predicted Quality: 16.87

- Spicy: 2.695, Sweet: 3.138, Bulk: 6.864

- Dishes (5): Geometric Gelatinous Gateau, Pegasus Pinion Pudding, Roc Roasted Rare, Troll Tenderloin Tartare, Vicious Vampire Vindaloo
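The peak-model scores quoted in this document (16.87 for this feast, 16.80 for the center-targeted one) can be approximately reproduced from the Model 2 tables. The sketch below makes one assumption that the specification does not state: the 4-piece spicy curve is anchored so that its value is zero at t1. With that anchoring the numbers line up; the helper names are hypothetical.

```python
# Spicy breakpoints and slopes copied from the 4-piece parameter table.
T1, T2, T3 = 1.016, 2.697, 5.872
S1, S2, S3, S4 = 2.043, 1.856, -1.253, -0.268
INTERCEPT = 13.752
# Sweet and bulk flat zones: (t1, t2, slope below, slope above).
SWEET = (3.028, 4.591, 1.600, 1.299)
BULK = (6.692, 8.486, 0.980, 1.126)

# Peak-model weights (spicy, sweet, bulk) for the dishes in the two feasts.
W = {
    "BBQ Basilisk Brisket": (0.642, 0.871, 1.395),
    "Displacer Dumplings": (0.036, 0.062, 0.089),
    "Geometric Gelatinous Gateau": (0.139, 2.104, 0.522),
    "Killer Kraken Kebabs": (1.693, 0.042, 2.322),
    "Opulent Owlbear Omelette": (0.052, 0.010, 1.673),
    "Pegasus Pinion Pudding": (0.125, 0.592, 0.653),
    "Roc Roasted Rare": (0.000, 0.279, 3.229),
    "Troll Tenderloin Tartare": (0.071, 0.000, 0.914),
    "Vicious Vampire Vindaloo": (2.361, 0.163, 1.545),
}


def spicy_effect(x):
    """Continuous 4-piece curve, ASSUMED to be zero at x = T1."""
    if x < T1:
        return S1 * (x - T1)
    if x < T2:
        return S2 * (x - T1)
    peak = S2 * (T2 - T1)
    if x < T3:
        return peak + S3 * (x - T2)
    return peak + S3 * (T3 - T2) + S4 * (x - T3)


def flat_effect(x, t1, t2, slope_low, slope_high):
    """Zero inside [t1, t2], linear penalty outside."""
    return -slope_low * max(0.0, t1 - x) - slope_high * max(0.0, x - t2)


def peak_quality(dishes):
    spicy, sweet, bulk = (sum(W[d][i] for d in dishes) for i in range(3))
    return INTERCEPT + spicy_effect(spicy) + flat_effect(sweet, *SWEET) + flat_effect(bulk, *BULK)


peak_best = ["Geometric Gelatinous Gateau", "Pegasus Pinion Pudding", "Roc Roasted Rare",
             "Troll Tenderloin Tartare", "Vicious Vampire Vindaloo"]
centered = ["BBQ Basilisk Brisket", "Displacer Dumplings", "Geometric Gelatinous Gateau",
            "Killer Kraken Kebabs", "Opulent Owlbear Omelette", "Pegasus Pinion Pudding",
            "Troll Tenderloin Tartare"]
print(round(peak_quality(peak_best), 2), round(peak_quality(centered), 2))
```

The peak-best feast sits almost exactly on the t2 breakpoint, which is why its score is so sensitive to whether the peak is real.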

---

## Caveats and Limitations

1. **Peak model has mixed evidence:** The F-test p-value of 0.034 is below 0.05, but BIC favors the flat model. The evidence is not conclusive.

2. **High degeneracy:** 119 different feasts fall within all optimal zones of the flat model. The center-targeting approach provides a principled way to select among them.

3. **Weights are continuous:** The per-dish weights are fitted continuous values. Integer or simple-fraction approximations may exist but are not explored here.

4. **Variance heterogeneity:** Residual variance increases when above optimal thresholds (especially for bulk). This is not captured in the point predictions above.

5. **Model uncertainty:** All predictions have associated uncertainty (~2.0 quality points RMSE). The differences between top feasts are often smaller than this uncertainty.

---

## Summary

**Use the center-targeted recommendation** for maximum robustness:

**BBQ Basilisk Brisket, Displacer Dumplings, Geometric Gelatinous Gateau, Killer Kraken Kebabs, Opulent Owlbear Omelette, Pegasus Pinion Pudding, Troll Tenderloin Tartare**

This feast:

- Falls within all optimal zones of the flat model

- Is closest to the center of each zone (normalized distance: 0.160)

- Scores well on both flat (16.74) and peak (16.80) models


simon2mo20

further followup:

So I have (had AI generate) a new better model, in which the sweet/spicy/bulk dimensions have continuous values, each independently having a piecewise linear effect on feast quality. In one version, the piecewise linear functions have a flat middle, with drop offs at higher and lower values. This results in a range in which all feasts are equivalent. In another version, there's a peak at low spiciness and we try to pick near that spiciness peak. A combined hedge version has us pick as low a spiciness value as possible within the flat range. The resulting feast pick is:

~~**['Geometric Gelatinous Gateau', 'Pegasus Pinion Pudding', 'Roc Roasted Rare', 'Troll Tenderloin Tartare', 'Vicious Vampire Vindaloo']**~~

~~Which I guess is now my pick.~~

I'm still not that satisfied with it: it's pretty ad hoc, with non-integer values, and I haven't properly understood the error sources (there's some theory that there might be a base error distribution with possibly stochastic penalties for going too high in each dimension and possibly deterministic penalties for going too low). Anyway, full model specification of these questionable models in a reply post again for ease of separately hiding it.

edit: changed my mind. I'm not going to hedge towards the alternate model with the peak, I'll instead hedge toward the center of the (claimed in the simpler model) flat zone, which seems safer. It also seems dumb because probably I don't need to go all the way to the center to be reasonably safe, so probably I could hedge to lower spiciness without penalty. Still, I'm at this point considering myself "done" (in the sense of not feeling it worth it to keep going, not in the sense that I couldn't keep going or am satisfied). I will see what I failed to account for soon enough! The centered feast, which is now my pick, is:

**BBQ Basilisk Brisket, Displacer Dumplings, Geometric Gelatinous Gateau, Killer Kraken Kebabs, Opulent Owlbear Omelette, Pegasus Pinion Pudding, Troll Tenderloin Tartare**

The reply post will contain the full description of the models discussed here.


simon2mo*20

followup after reading other answers:

abstractapplic says that it's total quantity of sweetness/spiciness that matters, and I feel like that's almost certainly true. Probably the hearty/ethereal thing is quantitative too if it exists (it may or may not relate to total food quantity; I note Claude thought Roc had x2 food contribution, after I suggested it as a possible explanation for only Roc feasts going down to 2 ingredients, and some tests seemed to corroborate). I had some AI generated quantitative stuff (including sweet, spicy and hearty/ethereal axes), but I'm not sure it was even used in the actual model before the AI pushed me to an interactions model that couldn't possibly be the simplest model but at least beat an initial sweet count and spicy count model. Anyway, Multicore also pointed out that total ingredients matter in a non-linear way, which the interactions model probably doesn't take into account in an optimal way (maybe not even the spicy/sweet counts). So I'm even more sure that I'm missing the optimal model, which likely does include optimal levels of spiciness and sweetness and total food quantity that all aren't based on a simple count.

Edit: I'm following up on this...


simon2mo*00

# Feast Analysis Summary (simon note - AI generated, also obsolete. I've un-upvoted it which might hide it)

This document summarizes findings from analyzing feast quality variance, optimal feast selection, and ingredient distribution patterns.

---

## 1. Noise Model Analysis

### Approach

We analyzed duplicate feast configurations (same dishes, multiple observations) to understand the underlying noise distribution. With 67 n=2 pairs and 10 n=3 triplets (excluding WWW), we computed spread distributions and compared them to various hypothesized noise models using full likelihood calculations.

### Findings

**Spread distribution (observed, no WWW):**

| Spread | Count | Percentage |

|--------|-------|------------|

| 0 | 10 | 14.9% |

| 1 | 23 | 34.3% |

| 2 | 20 | 29.9% |

| 3 | 3 | 4.5% |

| 4+ | 11 | 16.4% |

**Model comparison:**

- A **U[-1,0,1] + penalty** model (one-sided negative shock) fits reasonably well

- Best-fit parameters: penalty size ~3-4, probability ~15-20%

- Chi-square test (p=0.23) does not reject this model

- However, with only n=67 duplicates, sampling noise is substantial and other models may also be consistent with the data
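For reference, the spread distribution that such a noise model predicts for a pair of duplicate feasts can be enumerated exactly. The sketch below reads "U[-1,0,1] + penalty" as a uniform draw from {-1, 0, 1} plus a fixed negative shock applied with probability p; that reading, the function names, and the specific values (penalty 3, p = 0.15, picked from the best-fit ranges quoted above) are assumptions.

```python
from fractions import Fraction
from itertools import product


def noise_pmf(penalty, p):
    """PMF of one draw: uniform on {-1, 0, 1}, minus `penalty` with probability p."""
    pmf = {}
    for u in (-1, 0, 1):
        for shock, prob in ((0, 1 - p), (-penalty, p)):
            pmf[u + shock] = pmf.get(u + shock, 0) + Fraction(1, 3) * prob
    return pmf


def pair_spread_pmf(penalty, p):
    """PMF of |x1 - x2| for two independent draws from the noise model."""
    pmf = noise_pmf(penalty, p)
    spread = {}
    for (a, pa), (b, pb) in product(pmf.items(), repeat=2):
        spread[abs(a - b)] = spread.get(abs(a - b), 0) + pa * pb
    return spread


spread = pair_spread_pmf(penalty=3, p=Fraction(15, 100))
for k in sorted(spread):
    print(k, round(float(spread[k]), 3))
```

This covers pairs only; triplets would need the range of three draws instead of an absolute difference. Comparing the printed probabilities against the observed percentages gives a feel for how loosely 67 duplicates constrain the model.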

**Heteroscedasticity by dish:**

- Earlier analysis suggested WWW and SSS may increase variance

- Evidence is suggestive but not conclusive due to limited duplicate data when split by dish presence

### Caveats

- Comparing duplicate spreads to model residuals is complicated: residuals include model error, which inflates variance

- Bootstrap resampling from duplicates showed the mismatch between observed and predicted spread distributions could plausibly be sampling noise

- The "true" noise model remains uncertain

---

## 2. Joint Mean-Variance Model

### Approach

We fit a joint maximum likelihood model:

- **Mean model**: Linear + pairwise interactions (154 parameters)

- **Variance model**: log(σ) as linear function of dishes (18 parameters)

This avoids the problems of two-stage estimation where mean and variance are fit separately.

### Findings

**Heteroscedasticity is statistically significant:**

- Likelihood ratio test: LR = 63.2, df = 17, p < 0.0001

- AIC favors heteroscedastic model (-29), BIC is mixed (+63)
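The parameter counts and the AIC/BIC deltas above follow from simple bookkeeping, sketched here. The decomposition of the 154 mean parameters into intercept, main effects, and pairwise terms is an assumption that matches the stated totals; 1714 is the number of observations quoted in this section.

```python
import math
from math import comb

n_dishes, n_obs = 17, 1714

# Assumed breakdown: intercept + one main effect per dish + one term per pair.
mean_params = 1 + n_dishes + comb(n_dishes, 2)
# Intercept + one log-sigma coefficient per dish.
var_params = 1 + n_dishes

# Likelihood ratio test of heteroscedastic vs constant-variance model.
lr, df = 63.2, 17
delta_aic = 2 * df - lr               # negative favors the larger model
delta_bic = df * math.log(n_obs) - lr  # positive favors the smaller model

print(mean_params, var_params, round(delta_aic, 1), round(delta_bic, 1))
```

BIC charges about 7.4 points per extra parameter at this sample size versus 2 for AIC, so 17 extra variance parameters need a much larger likelihood gain to pay for themselves under BIC.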

**Dishes that increase variance (σ multiplier, from heteroscedastic-only model):**

| Dish | σ multiplier |

|------|--------------|

| Honeyed Hydra Hearts | 1.22x |

| Roc Roasted Rare | 1.17x |

| Killer Kraken Kebabs | 1.14x |

| Scorching Salamander Stew | 1.14x |

| Chili Con Chimera | 1.14x |

**Dishes that decrease variance:**

| Dish | σ multiplier |

|------|--------------|

| Displacer Dumplings | 0.90x |

Predicted σ ranges from 1.50 to 3.85 depending on dish combination.

**Note:** When count features are added to the model, some of these effects are absorbed by the count terms (e.g., Roc's effect drops from 1.17x to 1.03x). See Section 5 for the combined model's variance estimates.

### Caveats

- The optimizer did not fully converge for some regularization values

- Effect sizes are modest (most dishes change σ by <20%)

- With 172 total parameters and 1714 observations, overfitting is a concern despite regularization

---

## 3. Pairwise Interaction Structure

### Approach

We categorized dishes into flavor groups and examined whether interaction coefficients align with these categories.

**Well-supported categories:**

- SPICY: Chili Con Chimera, Fiery Formian Fritters, Killer Kraken Kebabs, Scorching Salamander Stew, Vicious Vampire Vindaloo

- SWEET: Ambrosial Applesauce, Ettin Eye Eclairs, Geometric Gelatinous Gateau, Honeyed Hydra Hearts, Pegasus Pinion Pudding

**Remaining dishes** (no clear category): BBQ Basilisk Brisket, Displacer Dumplings, Mighty Minotaur Meatballs, Opulent Owlbear Omelette, Roc Roasted Rare, Troll Tenderloin Tartare, Wyvern Wing Wraps

### Findings

**Mean interaction coefficient by category:**

| Interaction Type | N | Mean Coef |

|------------------|---|-----------|

| SPICY-SPICY | 10 | **-2.29** |

| SWEET-SWEET | 10 | **-1.62** |

| SPICY-SWEET | 25 | -0.09 |

**Statistical test:** Same-category (SPICY or SWEET) vs other interactions: t = -6.80, p < 0.0001
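A minimal sketch of the grouping behind this table, assuming the category lists given above. The helper names are hypothetical, and no fitted coefficients are included; `mean_by_type` only shows how a coefficient dictionary would be averaged per pair type.

```python
from itertools import combinations

SPICY = {"Chili Con Chimera", "Fiery Formian Fritters", "Killer Kraken Kebabs",
         "Scorching Salamander Stew", "Vicious Vampire Vindaloo"}
SWEET = {"Ambrosial Applesauce", "Ettin Eye Eclairs", "Geometric Gelatinous Gateau",
         "Honeyed Hydra Hearts", "Pegasus Pinion Pudding"}
OTHER = {"BBQ Basilisk Brisket", "Displacer Dumplings", "Mighty Minotaur Meatballs",
         "Opulent Owlbear Omelette", "Roc Roasted Rare", "Troll Tenderloin Tartare",
         "Wyvern Wing Wraps"}


def category(dish):
    if dish in SPICY:
        return "SPICY"
    if dish in SWEET:
        return "SWEET"
    return "OTHER"


def pair_type(a, b):
    """Order-independent label such as 'SPICY-SWEET'."""
    return "-".join(sorted((category(a), category(b))))


def mean_by_type(coefs):
    """Average coefficient per pair type; `coefs` maps (dish_a, dish_b) -> value."""
    sums = {}
    for (a, b), value in coefs.items():
        total, n = sums.get(pair_type(a, b), (0.0, 0))
        sums[pair_type(a, b)] = (total + value, n + 1)
    return {k: total / n for k, (total, n) in sums.items()}


# Count how many of the 136 dish pairs fall into each type.
counts = {}
for a, b in combinations(sorted(SPICY | SWEET | OTHER), 2):
    counts[pair_type(a, b)] = counts.get(pair_type(a, b), 0) + 1
print(counts)
```

The counts for SPICY-SPICY, SWEET-SWEET, and SPICY-SWEET come out as 10, 10, and 25, matching the N column.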

**Most negative interactions:**

1. Ettin Eye Eclairs × Geometric Gelatinous Gateau: -4.33 (SWEET-SWEET)

2. Killer Kraken Kebabs × Vicious Vampire Vindaloo: -4.17 (SPICY-SPICY)

3. Scorching Salamander Stew × Vicious Vampire Vindaloo: -3.57 (SPICY-SPICY)

### Validation of SPICY/SWEET Categories

We verified the category assignments by checking each dish's mean interaction with SPICY vs SWEET groups:

- All SPICY dishes have much more negative interactions with other SPICY dishes than with SWEET dishes

- All SWEET dishes have much more negative interactions with other SWEET dishes than with SPICY dishes

- All 10 SPICY-SPICY interactions are negative (-0.84 to -4.17)

- All 10 SWEET-SWEET interactions are negative (-0.04 to -4.33)

**Pegasus Pinion Pudding** is the weakest SWEET member (mean interaction with SWEET group only -0.58), but still fits the pattern.

### Remaining Dishes

The 7 dishes not in SPICY or SWEET do not form clear additional categories:

- **Roc Roasted Rare** has negative interactions with almost everything (mean -1.13 with other remaining dishes)

- **Displacer Dumplings** is the only dish with mostly neutral/positive interactions (+0.06 mean)

- The others (BBQ, Mighty Minotaur, Opulent Owlbear, Troll, Wyvern) have mixed negative interactions but don't cluster into coherent groups

**Roc Roasted Rare** has negative interactions with *both* SPICY (-1.66) and SWEET (-0.65) groups, suggesting it may be a "universal intensifier" that doesn't combine well with strongly-flavored dishes.

### Interpretation

- Combining multiple SPICY or multiple SWEET dishes incurs a quality penalty

- Cross-category combinations (SPICY + SWEET) are roughly neutral

- No strong evidence for additional flavor categories beyond SPICY and SWEET

---

## 4. Model with Count Features

### Approach

We extended the pairwise model to include count features:

- Total number of dishes

- Number of SPICY dishes (and squared term)

- Number of SWEET dishes (and squared term)

This allows the model to capture effects like "too many spicy dishes" beyond pairwise interactions.

### Findings

**Model improvement is statistically significant:**

- Likelihood ratio test: χ² = 232.6, df = 9, p < 0.0001

- AIC improved by 214.6

**Marginal effect of adding SPICY dishes:**

| Change | Effect on Mean |

|--------|----------------|

| 0→1 SPICY | **+1.84** |

| 1→2 SPICY | +0.12 |

| 2→3 SPICY | **-1.60** |

| 3→4 SPICY | -3.32 |

**Marginal effect of adding SWEET dishes:**

| Change | Effect on Mean |

|--------|----------------|

| 0→1 SWEET | +0.02 |

| 1→2 SWEET | **-1.05** |

| 2→3 SWEET | -2.12 |

**Marginal effect of total dish count:**

| Change | Effect on Mean |

|--------|----------------|

| 5→6 total | +0.74 |

| 6→7 total | +0.03 |

| 7→8 total | -0.68 |
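The SPICY and SWEET marginal effects fall in equal steps, which is what a linear-plus-squared count term produces. As a hedged sketch, assuming the effect has the form f(n) = a*n + b*n^2 with no other contribution, the two coefficients can be backed out from the tables and the implied cumulative effect inspected. The function name is made up.

```python
def fit_quadratic(marginals):
    """Recover a, b in f(n) = a*n + b*n**2 from the first two marginal effects.

    f(n+1) - f(n) = a + b*(2n + 1), so consecutive marginals differ by 2b.
    """
    b = (marginals[1] - marginals[0]) / 2
    a = marginals[0] - b
    return a, b


spicy_marginals = [1.84, 0.12, -1.60, -3.32]  # 0->1, 1->2, 2->3, 3->4
sweet_marginals = [0.02, -1.05, -2.12]        # 0->1, 1->2, 2->3

for name, marginals in (("SPICY", spicy_marginals), ("SWEET", sweet_marginals)):
    a, b = fit_quadratic(marginals)
    # Marginals implied by the recovered coefficients, for comparison with the table.
    implied = [a + b * (2 * n + 1) for n in range(len(marginals))]
    # Cumulative effect of having n dishes of this type.
    cumulative = [a * n + b * n * n for n in range(len(marginals) + 1)]
    print(name, round(a, 3), round(b, 3),
          [round(x, 2) for x in implied], [round(x, 2) for x in cumulative])
```

Under this reading the cumulative SPICY effect is highest at one or two dishes, and the cumulative SWEET effect is essentially flat at one dish and negative after that.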

### Interpretation

- First SPICY dish adds significant value; second is roughly neutral; third and beyond hurt quality

- The first SWEET dish is roughly neutral; the second and beyond hurt quality

- Optimal total dish count appears to be around 6-7

### Residual Pattern

Adding count features modestly improves the model:

- RMSE: 2.025 → 2.005 (1% improvement)

- Skew: -0.111 → -0.080 (slightly more symmetric)

- Systematic biases by SPICY/SWEET count are reduced but not eliminated

---

## 5. Optimal Feast Selection

### Approach

Using the combined heteroscedastic + count-features model (which models both dish-dependent variance and SPICY/SWEET count effects), we searched all 5-7 dish combinations to optimize:

- (a) Expected quality (mean)

- (b) 90th percentile (high upside)

- (c) 10th percentile (reliable floor)

### Results

| Objective | Mean | σ | Q10 | Q90 |

|-----------|------|---|-----|-----|

| Best Mean | **17.51** | 2.09 | 14.83 | 20.19 |

| Best Q90 (upside) | 17.40 | 2.48 | 14.22 | **20.58** |

| Best Q10 (reliable) | 17.48 | 1.89 | **15.05** | 19.91 |
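The Q10 and Q90 columns are consistent with Gaussian quantiles of the predicted mean and σ. The sketch below assumes a normal predictive distribution, which the table values match to within rounding; the dictionary name is made up.

```python
from statistics import NormalDist

# (mean, sigma) copied from the results table.
ROWS = {
    "Best Mean": (17.51, 2.09),
    "Best Q90 (upside)": (17.40, 2.48),
    "Best Q10 (reliable)": (17.48, 1.89),
}

quantiles = {}
for name, (mean, sigma) in ROWS.items():
    dist = NormalDist(mean, sigma)
    # 10th and 90th percentiles of a normal with this mean and sigma.
    quantiles[name] = (dist.inv_cdf(0.10), dist.inv_cdf(0.90))
    print(name, round(quantiles[name][0], 2), round(quantiles[name][1], 2))
```

Because the three means differ by only about 0.1, the ranking by Q10 or Q90 is driven almost entirely by σ, so it inherits whatever uncertainty there is in the variance model.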

**(a) Best Mean:**

- Ambrosial Applesauce

- BBQ Basilisk Brisket

- Geometric Gelatinous Gateau

- Opulent Owlbear Omelette

- Roc Roasted Rare

- Vicious Vampire Vindaloo

(6 dishes: 1 SPICY, 2 SWEET)

**(b) Best Q90 (high upside):**

- Ambrosial Applesauce

- BBQ Basilisk Brisket

- Fiery Formian Fritters

- Honeyed Hydra Hearts

- Killer Kraken Kebabs

- Opulent Owlbear Omelette

- Pegasus Pinion Pudding

(7 dishes: 2 SPICY, 3 SWEET — includes high-variance Honeyed Hydra Hearts)

**(c) Best Q10 (reliable):**

- Ambrosial Applesauce

- BBQ Basilisk Brisket

- Geometric Gelatinous Gateau

- Roc Roasted Rare

- Troll Tenderloin Tartare

- Vicious Vampire Vindaloo

(6 dishes: 1 SPICY, 2 SWEET — includes low-variance Troll Tenderloin)

**Core dishes (in all three):**

- Ambrosial Applesauce

- BBQ Basilisk Brisket

- Vicious Vampire Vindaloo

**Dishes in Best Mean and Best Q10 (but not Q90):**

- Geometric Gelatinous Gateau

- Roc Roasted Rare

**High-variance dishes (for Q90 upside):** Honeyed Hydra Hearts, Killer Kraken Kebabs, Fiery Formian Fritters

**Low-variance dish (for Q10 reliability):** Troll Tenderloin Tartare (σ x0.96), Displacer Dumplings (σ x0.88, but not in optimal)

### Variance Effects by Dish

The model estimates these variance multipliers per dish:

| Dish | σ multiplier | Notes |

|------|--------------|-------|

| Honeyed Hydra Hearts | 1.07x | Highest variance |

| Chili Con Chimera | 1.06x | |

| Opulent Owlbear Omelette | 1.06x | |

| Roc Roasted Rare | **1.03x** | Slight increase, not decrease |

| Troll Tenderloin Tartare | 0.96x | Lower variance |

| Displacer Dumplings | 0.88x | Lowest variance |

**Note on Roc:** Earlier analysis suggested Roc might decrease variance, but the combined model shows Roc has a **slight variance increase** (σ x1.03). The confusion arose because:

- The heteroscedastic-only model found Roc σ x1.17

- With count features, some of Roc's apparent variance effect is absorbed by the count terms

- Roc appears in Best Mean and Best Q10 due to its high base quality value, not low variance

### Model Sensitivity

Different model specifications produce somewhat different optimal feasts, suggesting substantial model uncertainty. However, core dishes (Ambrosial, BBQ, Vicious Vampire) appear consistently.

### Caveats

- None of these optimal feasts appear exactly in the data

- Model predictions are extrapolations; actual performance may differ

- The optimizer showed convergence warnings, though restarting improved the solution

---

## 6. Validation Against Observed Data

### Highest quality feasts in data (Quality = 20)

- Ambrosial, Chili, Honeyed Hydra, Killer Kraken, Mighty Minotaur, Roc (6 dishes)

- Ambrosial, Honeyed Hydra, Killer Kraken, Roc, Scorching Salamander (5 dishes)

- Ambrosial, Chili, Honeyed Hydra, Roc, Scorching Salamander (5 dishes)

**Notable:** These include multiple SPICY dishes, which the model predicts should have negative interactions. This could indicate:

1. These are lucky high-variance outcomes (model predicts lower mean but high upside)

2. The model is missing something

3. Small sample of quality=20 feasts (only 4 observations)

### Highest-mean duplicate configurations

| Mean | Qualities | Dishes |

|------|-----------|--------|

| 18.0 | [17, 19] | Ambrosial, Geometric, Opulent, Roc, Vicious Vampire |

| 18.0 | [18, 18] | Ambrosial, Honeyed Hydra, Killer Kraken, Roc, Vicious Vampire |

| 17.5 | [16, 19] | Chili, Ettin, Opulent, Roc, Scorching |

**Roc Roasted Rare** appears in nearly all high-quality duplicate configurations, and the combined model's Best Mean and Best Q10 feasts now include it. This suggests Roc's high base value compensates for its negative interactions.

---

## 7. Ingredient Distribution Patterns

### Global Variance Constraint

The total number of ingredients per feast has **compressed variance** compared to what independent selection would produce.

| Metric | Observed | Expected (if independent) |

|--------|----------|---------------------------|

| Mean ingredients | 5.37 | 5.37 |

| Std dev | 1.27 | 1.89 |

| Variance ratio | 0.43 | 1.00 |

Ingredient selection appears to target a relatively fixed total count (~5-6 ingredients), creating **negative correlations between all ingredients**. When one ingredient is present, others are slightly less likely.

### Roc Roasted Rare "Counts as 2"

When treating Roc as contributing 2 to the effective count:

- Minimum effective count = 3 (never violated)

- Mean effective count = 5.88

- The variance compression is consistent with targeting this effective count

### No Hard Constraints on Ingredient Co-occurrence

After properly adjusting for the variance constraint and multiple testing, **no statistically significant hard constraints** were found on which ingredients can appear together.

**The "4-tuple constraint" is likely noise:**

We initially found that {Ambrosial Applesauce, Ettin Eye Eclairs, Honeyed Hydra Hearts, Wyvern Wing Wraps} never all appear together (max 3 of 4).

| Test | p-value |

|------|---------|

| Raw (vs independence) | 0.0006 |

| Bonferroni-corrected (2380 4-tuples) | 1.0 |

| Vs variance-constrained simulation | 0.016 |

With 2380 possible 4-tuples, finding one with this property is expected ~43% of the time by chance. This is **probably not a real constraint**.
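The Bonferroni row is plain arithmetic on the raw p-value and the number of candidate 4-tuples, as this short sketch shows (variable names are illustrative):

```python
from math import comb

n_dishes = 17
n_tuples = comb(n_dishes, 4)  # number of distinct 4-dish subsets

raw_p = 0.0006
# Bonferroni: multiply by the number of tests and cap at 1.
bonferroni_p = min(1.0, raw_p * n_tuples)

print(n_tuples, bonferroni_p)
```

The uncapped product is well above 1, so the correction leaves no evidence at all against independence for this particular 4-tuple.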

### Duplicate Feasts

110 of 1714 feasts are duplicates (6.4%), with 1604 unique combinations.

| Size | Observed Duplicates | Expected (Uniform) | Expected (Non-Uniform) |

|------|---------------|-------------------|-------------------------|

| 2 | 4 | 0.3 | 0.4 |

| 3 | 18 | 5.7 | 7.5 |

| 4 | 50 | 22.8 | 31.3 |

| 5 | 29 | 20.6 | 29.6 |

| 6 | 7 | 8.3 | 12.3 |

Small feasts (size 2-4) have **more duplicates than expected** even after accounting for non-uniform ingredient probabilities. The variance constraint likely causes this: small feasts are in the tail of the distribution, so only the most probable combinations occur.

### Positive Correlations (Unusual)

Under a pure variance constraint, all ingredient correlations should be negative. However, several **positive correlations** were found:

| Pair | Correlation |

|------|-------------|

| Displacer Dumplings + Mighty Minotaur Meatballs | +0.050 |

| Ambrosial Applesauce + Roc Roasted Rare | +0.028 |

| Ambrosial Applesauce + Scorching Salamander Stew | +0.024 |

| Displacer Dumplings + Fiery Formian Fritters | +0.021 |

| Displacer Dumplings + Killer Kraken Kebabs | +0.018 |

| Fiery Formian Fritters + Killer Kraken Kebabs | +0.018 |

| Ettin Eye Eclairs + Roc Roasted Rare | +0.017 |

These pairs appear together more often than the variance constraint would predict. This could indicate thematic groupings, ingredient affinities, or statistical noise (correlations are small, ~0.02-0.05).

### SPICY and SWEET Groups: Selection vs Quality

**SPICY**: Killer Kraken Kebabs, Scorching Salamander Stew, Vicious Vampire Vindaloo, Fiery Formian Fritters, Chili Con Chimera

**SWEET**: Ambrosial Applesauce, Ettin Eye Eclairs, Geometric Gelatinous Gateau, Honeyed Hydra Hearts

Do these have co-occurrence constraints?

- **SPICY**: All 5 appear together in 5 feasts (no hard constraint)

- **SWEET**: All 4 appear together in 6 feasts (no hard constraint)

**Conclusion**: These groups affect **quality** (pairwise conflicts reduce quality) but do NOT create hard constraints on ingredient selection. The slight under-representation is explained by the variance constraint.

### Time-Based Changes

Roc Roasted Rare and Honeyed Hydra Hearts show apparent frequency changes over time:

| Ingredient | Raw p-value | Permutation p-value |

|------------|-------------|---------------------|

| Roc | 0.0006 | 0.032 |

| Hydra | 0.004 | 0.137 |

With 17 ingredients, some apparent changes are expected by chance. The permutation p-values (accounting for multiple testing over split points) are weak. There may be a real change in Roc frequency, but the evidence is weak after accounting for multiple testing.

---

## 8. Summary of Key Findings

### Quality Model

1. **Noise structure**: Evidence is consistent with a discrete noise model (U[-1,0,1] + occasional penalty), but sample size limits certainty

2. **Heteroscedasticity**: Statistically significant; some dishes (Honeyed Hydra Hearts, Roc) increase σ by ~15-20% in the heteroscedastic-only model

3. **Interaction structure**: SPICY-SPICY and SWEET-SWEET combinations have strong negative interactions (~-2 to -4 quality points); cross-category mixing is neutral. The SPICY/SWEET category assignments are well-supported by the interaction data.

4. **Count effects**: Adding count features significantly improves the model. First SPICY dish adds ~1.8 points; 2+ SPICY or 2+ SWEET dishes show diminishing/negative returns. Optimal total count is ~6-7 dishes.

5. **Optimal feasts**: Core dishes include Ambrosial Applesauce, BBQ Basilisk Brisket, and Vicious Vampire Vindaloo. Best Mean and Best Q10 feasts include Roc Roasted Rare; Best Q90 swaps in high-variance dishes instead. Different model specifications produce somewhat different recommendations.

6. **Roc Roasted Rare**: Appears frequently in high-quality observations and is included in Best Mean and Best Q10 optimal feasts. Despite negative interactions with both SPICY and SWEET groups, its high base value compensates.

### Ingredient Selection

1. **Target effective count**: Selection targets ~5.9 effective ingredients (where Roc counts as 2)

2. **Variance compression**: The selection process constrains variance to ~43% of what independence would produce

3. **No hard constraints**: Any ingredient combination is possible; apparent constraints are explained by variance compression

4. **Possible weak affinities**: A few ingredient pairs (Displacer+Mighty, Ambrosial+Roc) may have slight positive associations, but evidence is weak

5. **Possible time change**: Roc frequency may have increased around index 1134, but evidence is marginal
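
As a rough illustration of what "variance compression" measures, the sketch below compares the observed variance of the effective dish count with the variance expected if each dish were included independently at its observed frequency. The function name, the `weights` mapping, and the independence formula are assumptions; the report does not spell out its exact calculation.

```python
from statistics import mean, pvariance


def variance_compression(feasts, weights=None):
    """Return (mean effective count, observed variance / independence variance).

    feasts:  list of sets of dish names, one set per feast.
    weights: optional dict of dish -> effective count (default 1 per dish),
             e.g. {"Roc Roasted Rare": 2} if Roc counts double.
    """
    weights = weights or {}
    n = len(feasts)
    dishes = sorted(set().union(*feasts))
    counts = [sum(weights.get(d, 1) for d in feast) for feast in feasts]

    # Under independence, Var(sum of w_i * X_i) = sum of w_i^2 * p_i * (1 - p_i)
    independent_var = 0.0
    for d in dishes:
        p = sum(d in feast for feast in feasts) / n
        w = weights.get(d, 1)
        independent_var += w * w * p * (1 - p)

    return mean(counts), pvariance(counts) / independent_var
```

A ratio near 1 is what independent selection would give; a ratio well below 1 (the report quotes roughly 0.43) means feast sizes cluster around the target more tightly than chance would.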

---

## Limitations

### Quality Model

- Model has many parameters relative to data (160-181 parameters vs 1714 observations)

- Optimal feasts are model extrapolations, not observed configurations

- Different model specifications produce different optimal feasts

- Noise model analysis limited by small duplicate sample (n=67)

- Some optimizer convergence issues in joint model fitting

### Ingredient Distribution

- Sample size (1714) limits power to detect subtle effects

- Simulation-based null models may not perfectly match true data-generating process

- Small positive correlations (~0.02-0.05) are at the edge of detectability

Whelp that was even longer than I feared, and has some formatting issues. For completeness I'll mention that there was some speculation of a hearty/ethereal aspect as well as the sweet and spicy categories, where e.g. (IIRC) Roc is hearty and Vampire is ethereal.

D&D.Sci Thanksgiving: the Festival Feast

simon2mo*20

Thanks for the extension aphyer.

And after the extension, I wound up just doing more LLM analysis (having already done some beforehand) and while it's probably better than nothing, I would not be surprised at all if there were a simple ruleset which wasn't found due to being outside the considered hypothesis space. Anyway, with some trepidation I'll go with Claude's Q10 feast from the summary by Claude 4.5 Opus, which I'll put in a reply comment so it can be separately collapsed. That feast is:
~~Ambrosial Applesauce, BBQ Basilisk Brisket, Geometric Gelatinous Gateau, Roc Roasted Rare, Troll Tenderloin Tartare, Vicious Vampire Vindaloo.~~

Edit: now
~~**['Geometric Gelatinous Gateau', 'Pegasus Pinion Pudding', 'Roc Roasted Rare', 'Troll Tenderloin Tartare', 'Vicious Vampire Vindaloo']**~~
actually
**BBQ Basilisk Brisket, Displacer Dumplings, Geometric Gelatinous Gateau, Killer Kraken Kebabs, Opulent Owlbear Omelette, Pegasus Pinion Pudding, Troll Tenderloin Tartare**
following revised model in followup comment.

AI summary in reply comment:

Don't let people buy credit with borrowed funds

simon3mo20

In my view:

- Facilitation of stag-hunt-like cooperation is really useful, because cooperating to do stuff beyond the capabilities of individuals is useful but hard.
- The dynamic you discuss in the post applies to stag hunt facilitation, because a stag hunt's success depends on the willingness of others to provide more resources (up to some point where the hunt can generate more).
- The difference between, e.g., Theranos and standard entrepreneurship does not lie in the dynamic you discuss in the post. It lies in how egregiously Elizabeth Holmes was lying relative to the standard level of misleadingness. (And of course, more honesty would be better...)
- It would of course be very valuable to determine if a stag hunt will pay off or will fail! But the difference between the two does not lie in the dynamic you discuss in the post (which applies to both ultimately successful and unsuccessful stag hunts).

Don't let people buy credit with borrowed funds

simon3mo112

aimed in a relatively broad sense at people who care about how well societies and groups of people function.

I don't think allowing financial fraud is a thing current institutions mostly want? The difficulty is more in figuring out how to stop it without stopping legitimate activity as well (A lot of successful entrepreneurship will look kind of a lot like this, I think). If you are calling for normal speculative investment to be banned, it's very likely not worth the loss of innovation. (It may make sense perhaps to be more strict about what level of falsehood leads to fraud prosecution, but I would keep it to banning false claims).

D&D.Sci: Serial Healers

simon5mo20

My accusations, at least so far:

Danny Nova for curing:

Babblepox (always present in same sector)
Bumblepox, Scramblepox (always present in same sector OR Azeru in adjacent sector)
Gurglepox (always present in same sector OR Dankon Ground in opposite sector)
Chucklepox (Danny Nova in same sector for about half of cures. All other cases Lomerius Xardus was present somewhere (todo: investigate if more patterns than mere presence))
Rumblepox (suspiciously high ratio of presence/absence in Calderia when cured, but plenty more cases unexplained - TODO: explain more)
~~Disquietingly Serene Bowel Syndrome (on weak evidence, see below)~~

Azeru for curing Bumblepox, Scramblepox (see Danny Nova)

Lomerius Xardus for curing Chucklepox (see Danny Nova)

Boltholopew and Moon Finder, collectively, for curing Disease Syndrome, Parachordia, Problems Disorder (each one positioned in different adjacent sectors to the curing event)

Tehami Darke on weak evidence (see below) for curing Disquietingly Serene Bowel Syndrome

Dankon Ground for curing:

Gurglepox (present in opposite sector OR Danny Nova in same sector)
Mildly But Persistently Itchy Throat (always present in opposite sector)

Zancro for curing Scraped Elbow, Scraped Knee (always present in same sector)

Nettie Silver for curing Smokesickness (always present in same sector. It seems she is always present in Calderia, but her location and the Smokesickness curing location, when it occurs, do vary)
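
For what it's worth, the "always present in same sector" checks above can be sketched as a simple coverage test. The record layout here (tuples of day, sector, disease, plus a per-day healer location map) is hypothetical and not the scenario's actual data format.

```python
def unexplained_cures(cures, locations, healer, disease):
    """List the (day, sector) cures of `disease` that happened without `healer`
    in the same sector on that day.

    cures:     iterable of (day, sector, disease) tuples (hypothetical layout).
    locations: dict of day -> {healer name: sector} (hypothetical layout).
    """
    return [
        (day, sector)
        for day, sector, cured in cures
        if cured == disease and locations.get(day, {}).get(healer) != sector
    ]
```

An empty result means the rule explains every cure of that disease; anything left over is a case that needs another healer or a different positional rule (adjacent or opposite sector).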

I have not figured out The Shivers.

Disquietingly Serene Bowel Syndrome seems tricky. Cures only started on Day 390. Tehami Darke came to Calderia on Day 162, Lomerius Xardus first arrived on day 494, Zeledin Zura first arrived on day 427, Gouberi first arrived on day 397, Ricewined first arrived on day 994. None of which is particularly suggestive of a causal connection though the p-values would be very low assuming (falsely) that each day was independent. Cures seem more likely to occur in the sector where Tehami Darke is present, which is pretty weak without further investigation (e.g. is he hanging out in some sector where it's more likely to occur anyway? I haven't checked). Conditioned on day 390+, the average number of cures is slightly higher on days when Danny Nova is present, also weak, but the prior for Danny being involved is high, so ~~might as well accuse him as well~~. (edit: no, seems there is no upside to this accusation given the scoring rule and the high confidence Danny Nova has been curing other stuff)

D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset

simon9mo*20

Interesting link on symbolic regression. I actually tried to get an AI to write me something similar a while back^[1] (not knowing that the concept was out there and foolishly not asking, though in retrospect it obviously would be).

From your response to kave:

calculate a quantity then use that as a new variable going forward

In terms of the tree structure used in symbolic regression (including my own attempt), I would characterize this as wanting to preserve a subtree and letting the rest of the tree vary.

Possible issues:

If the coding modifies the trees leaf-first, trees with different roots but common subtrees aren't treated as close to each other. This is an issue that my own version would likely have had even if actually implemented^[2]. However, I think PySR might at least partially address this issue (It uses genetic programming and the pictures in the associated paper seem to indicate that it is generating trees which at least sometimes preserve subtrees.) (Though the genetic programming approach is likely to make it hard to find the very simplest solutions in practice imo.^[3])
Even if you are treating trees with common subtrees as close to each other, if your evaluation of trees is only comparing final calculated values on the entire dataset, then it's hard to make the call "I know this subtree is important even if I don't know the rest of the tree" because the results are not likely to be all that close unless you already have a reasonable guess for the rest of the tree. One partial (heh) answer might be to award part marks to solutions that work well for some of the data even if wildly off for other parts. Careful thinking might be required to do this in a way that doesn't backfire horribly, though. Hmm - or maybe you CAN do that in the existing paradigm by including if/then nodes in the tree? Say, a node that has three child nodes/subtrees, and chooses between two of them based on the value of the third? And then (in some genetic-programming-like approach perhaps) explore what happens if you copy those subtrees elsewhere, or existing subtrees into new if-then nodes? (I can imagine the horrific unreadable mess already though...)
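
A toy sketch of the if/then node idea, assuming a made-up expression-tree representation (none of these class names come from PySR or any other library). The `IfPositive` node has three children and picks between two of them based on the sign of the third:

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class IfPositive:
    condition: "Expr"  # third subtree: decides which branch is used
    if_true: "Expr"
    if_false: "Expr"


Expr = Union[Var, Const, BinOp, IfPositive]


def evaluate(expr: Expr, row: Mapping[str, float]) -> float:
    """Evaluate an expression tree on one data row."""
    if isinstance(expr, Var):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, BinOp):
        left, right = evaluate(expr.left, row), evaluate(expr.right, row)
        if expr.op == "+":
            return left + right
        if expr.op == "*":
            return left * right
        raise ValueError(f"unknown operator: {expr.op}")
    if isinstance(expr, IfPositive):
        branch = expr.if_true if evaluate(expr.condition, row) > 0 else expr.if_false
        return evaluate(branch, row)
    raise TypeError(f"unknown node: {expr!r}")


# "2 * x when x > 10, otherwise x": one subtree can be right for part of the data
example = IfPositive(
    condition=BinOp("+", Var("x"), Const(-10)),
    if_true=BinOp("*", Var("x"), Const(2)),
    if_false=Var("x"),
)
```

With a node like this, a subtree that fits only some of the rows can survive inside one branch while the other branch is still being searched, which is roughly the "part marks" effect without changing the scoring rule.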

edited to add: it might be more appropriate to say that I had been planning on asking an AI to code something, but the initial prototype was sufficiently lame, and gave me enough insight into the difficulties ahead, that I didn't continue. Claude chat link if anyone's interested.

edited to further add: hmm, what you are wanting ("new variable") is probably not just preserving a subtree, but for the mutation system to be able to copy that subtree to other parts of the tree (and the complexity calculator to not give too much penalty to that, I guess). Interestingly, it seems that PySR's backend at least (SymbolicRegression.jl) does have the capability to do this already, using a "form_random_connection!" mutation function that apparently allows the same subtree to appear as child of multiple parents, making a DAG instead of a tree. In general, I've been pretty impressed looking at SymbolicRegression.jl. Maybe other symbolic regression software is as feature-rich, but I haven't checked.
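
To illustrate the complexity point, here is a small sketch (a hypothetical `Node` class, not SymbolicRegression.jl's actual data structure or complexity measure) in which the same subtree object is a child of two parents, and complexity can be counted either per occurrence or per unique node:

```python
class Node:
    """Minimal expression node; children may be shared between parents."""

    def __init__(self, op, *children):
        self.op = op
        self.children = children


def tree_size(node):
    """Count nodes as if every occurrence of a shared subtree were a separate copy."""
    return 1 + sum(tree_size(child) for child in node.children)


def dag_size(node, seen=None):
    """Count each distinct node object once, so a reused subtree is paid for once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(dag_size(child, seen) for child in node.children)


# (x * y) + sqrt(x * y), with the product built once and referenced twice
product = Node("*", Node("x"), Node("y"))
expr = Node("+", product, Node("sqrt", product))
```

Here `tree_size(expr)` counts the product twice while `dag_size(expr)` counts it once, which is the sense in which a reused "new variable" need not be heavily penalized.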

^[1] Apparently November 2024. Feels longer ago somehow.
^[2] I hadn't actually gone beyond breadth-first search though.
^[3] This is informed by (a tiny amount of) practical experience. After SarahNibs' comment suggested genetic programming would have worked on the "Arena of Data", I attempted genetic programming on it and on my initial attempt got ... a horrific unreadable mess. Maybe it wasn't "halfway decently regularized" but I updated my intuition to say: complicated ways to do things so greatly outnumber the simple ways that anything too reliant on randomness is not likely to find the simple way.
