D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]

abstractapplic

This is a followup to the D&D.Sci post I made ten days ago; if you haven’t already read it, you should do so now before spoiling yourself.

Here is the web interactive I built to let you evaluate your solution; below is an explanation of the rules used to generate the dataset (my full generation code is available here, in case you’re curious about details I omitted). You’ll probably want to test your answer before reading any further.

Ruleset

Turtle Types

There are three types of turtle present in the swamp: normal turtles, clone turtles, and vampire turtles.

Clone turtles are magically-constructed beasts who are mostly identical. They always have six shell segments, bizarrely consistent physiology, and a weight of exactly 20.4lb. Harold is a clone turtle.

Vampire turtles can be identified by their gray skin and fangs. They’re mostly like regular turtles, but their flesh no longer obeys gravity, which has some important implications for your modelling exercise. Flint is a vampire turtle.

Turtle Characteristics

Age

Most of the other factors are based on the hidden variable Age. The Age distribution is based on turtles having an Age/200 chance of dying every year. Additionally, turtles under the age of 20 are prevented from leaving their homes until maturity, meaning they will be absent from both your records and the Tyrant’s menagerie.
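As a rough illustration of what this death rule implies, the sketch below computes the age distribution of turtles old enough to show up in the records. It is not the actual generation code. It assumes a constant birth rate, one death check per year using the turtle's age at the start of that year, and a hard cutoff at maturity; `age_distribution` and its parameters are hypothetical names chosen for this example.

```python
def age_distribution(min_age=20, max_age=200):
    """Relative frequency of each observable age, under the stated assumptions."""
    survival = 1.0  # probability of a newborn still being alive at this age
    weights = {}
    for age in range(max_age + 1):
        if age >= min_age:
            weights[age] = survival
        # Assumed timing: the Age/200 death check happens once per year.
        survival *= 1 - age / 200
    total = sum(weights.values())
    return {age: weight / total for age, weight in weights.items()}


if __name__ == "__main__":
    dist = age_distribution()
    mean_age = sum(age * p for age, p in dist.items())
    print(f"mean observable age: {mean_age:.1f}")
    print(f"share of turtles aged 20-29: {sum(dist[a] for a in range(20, 30)):.3f}")
```

Under these assumptions the distribution is heavily weighted toward young adults, since each extra year of age multiplies the surviving fraction by a factor below one.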

Wrinkles

Every non-clone turtle has an [Age]% chance of getting a new wrinkle each year.

Scars

Every non-clone turtle has a 10% chance of getting a new scar each year.

Shell Segments

A non-clone turtle is born with 7 shell segments; each year, they have a 1 in [current number of shell segments] chance of getting a new one.
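A minimal simulation of this growth rule, for readers who want to see how segment count relates to Age. It is a sketch rather than the generation code: it assumes one growth check per completed year of life, and the function name and `rng` parameter are hypothetical.

```python
import random


def simulate_shell_segments(age, rng):
    """Simulate the segment count of a non-clone turtle of the given age."""
    segments = 7  # every non-clone turtle is born with 7 segments
    for _ in range(age):
        # Each year: a 1 in [current number of segments] chance of a new one.
        if rng.random() < 1 / segments:
            segments += 1
    return segments


if __name__ == "__main__":
    rng = random.Random(0)
    for age in (20, 40, 80):
        samples = [simulate_shell_segments(age, rng) for _ in range(5000)]
        print(age, sum(samples) / len(samples))
```

Because the chance of growth shrinks as the shell grows, the square of the segment count rises by a little over 2 per year in expectation, so segment count grows roughly like the square root of Age rather than linearly.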

Color

Turtles are born green; they turn grayish-green at some point between the ages of 23 and 34, then turn greenish-gray at some point between the ages of 35 and 46.

Miscellaneous Abnormalities

About half of turtles sneak into the high-magic parts of the swamp at least once during their adolescence. This mutates them, producing min(1d8, 1d10, 1d10, 1d12) Miscellaneous Abnormalities.

This factor is uncorrelated with Age in the dataset, since turtles in your sample have done all the sneaking out they’re going to. (Whoever heard of a sneaky mutated turtle not being a teenager?)
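The min(1d8, 1d10, 1d10, 1d12) roll can be worked out exactly rather than simulated. The sketch below does so with exact fractions; it covers only turtles that did mutate, and the names `DICE` and `min_of_dice_pmf` are made up for this example.

```python
from fractions import Fraction

DICE = (8, 10, 10, 12)  # sides of the dice in min(1d8, 1d10, 1d10, 1d12)


def min_of_dice_pmf(dice=DICE):
    """Exact probability of each possible minimum across independent dice."""

    def at_least(k):
        # P(every die shows k or more)
        p = Fraction(1)
        for sides in dice:
            p *= Fraction(max(0, sides - k + 1), sides)
        return p

    return {k: at_least(k) - at_least(k + 1) for k in range(1, min(dice) + 1)}


if __name__ == "__main__":
    pmf = min_of_dice_pmf()
    for k, p in pmf.items():
        print(k, float(p))
    print("mean:", float(sum(k * p for k, p in pmf.items())))
```

Taking the minimum of four dice pushes the distribution strongly toward low counts: a single abnormality is the most common outcome among mutated turtles, and eight abnormalities is rare.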

Nostril Size

Nostril Size has nothing to do with anything (... aside from providing a weak and redundant piece of evidence about clone turtles).

Turtle Weight

The weight of a regular turtle is given by the sum of their flesh weight, shell weight, and mutation weight. (A vampire turtle only has shell weight; a clone turtle is always exactly 20.4lb.)

Flesh Weight

The unmutated flesh weight of a turtle is given by (20+[Age]+[Age]d6)/10 lb.

Shell Weight

The shell weight of a turtle is given by (5+2*[Shell Segments]+[Shell Segments]d4)/10 lb. (This means that shell weight is the only variable you should use when calculating the weight of a vampire turtle.)

Mutation Weight

A mutated turtle has 1d(20*[# of Abnormalities])/10 lb of extra weight. (This means each abnormality increases expected weight by about 1lb, and greatly increases expected variance).
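The three weight formulas above can be put together as follows. This is a sketch written from the rules as stated, not a copy of the generation code; every function name, the `kind` strings, and the `rng` parameter are hypothetical. The expected values use the standard die means (3.5 for a d6, 2.5 for a d4, (N+1)/2 for a dN).

```python
import random


def roll(n_dice, sides, rng):
    """Sum of n_dice rolls of a die with the given number of sides."""
    return sum(rng.randint(1, sides) for _ in range(n_dice))


def flesh_weight(age, rng):
    return (20 + age + roll(age, 6, rng)) / 10


def shell_weight(segments, rng):
    return (5 + 2 * segments + roll(segments, 4, rng)) / 10


def mutation_weight(abnormalities, rng):
    if abnormalities == 0:
        return 0.0  # unmutated turtles carry no extra weight
    return rng.randint(1, 20 * abnormalities) / 10


def turtle_weight(kind, age, segments, abnormalities, rng):
    """Sample a weight in lb; kind is 'normal', 'vampire' or 'clone'."""
    if kind == "clone":
        return 20.4
    if kind == "vampire":
        return shell_weight(segments, rng)  # flesh no longer obeys gravity
    return (flesh_weight(age, rng) + shell_weight(segments, rng)
            + mutation_weight(abnormalities, rng))


def expected_weight(kind, age, segments, abnormalities):
    """Mean weight in lb given the (normally hidden) Age and visible traits."""
    if kind == "clone":
        return 20.4
    shell = (5 + 2 * segments + 2.5 * segments) / 10
    if kind == "vampire":
        return shell
    flesh = (20 + age + 3.5 * age) / 10
    mutation = (20 * abnormalities + 1) / 2 / 10 if abnormalities > 0 else 0.0
    return flesh + shell + mutation
```

In expectation this works out to 2 + 0.45 lb per year of Age for flesh, 0.5 + 0.45 lb per segment for shell, and 0.05 + 1 lb per abnormality for mutated turtles. Age is hidden from players, so in practice it has to be inferred from wrinkles, color, and the other visible traits.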

Strategy

The optimal^[1] predictions and decisions are as follows:

Turtle	Average Weight (lb)	Optimal Prediction (lb)
Abigail	20.1	22.5
Bertrand	17.3	18.9
Chartreuse	22.7	25.9
Dontanien	19.3	21.0
Espera	16.6	18.0
Flint	6.8	7.3
Gunther	25.7	30.6
Harold	20.4	20.4
Irene	21.5	23.9
Jacqueline	18.5	20.2

Leaderboard

Player	EV(gp)
Perfect Play (to within 0.1lb)	1723.17
gjm	1718.54
Malentropic Gizmo	1718.39
aphyer	1716.57
simon	1683.60
qwertyasdef	1674.54
Yonge^[2]	1420.00
Just predicting 20lb for everything	809.65

Reflections

The intended theme of this game was modelling in the presence of asymmetric payoffs. When mistakes in one direction are ‘punished’ more stringently than mistakes in another – by the conditions at play, or by local Mad Tyrants – it becomes reasonable to provide predictions slanted in the safer direction; and when the uncertainty of a given prediction is greater, the optimal size of this skew grows proportionately.
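To make the point about skew concrete, here is a small sketch under an assumed payoff: a linear penalty in which each pound of underestimate costs 8 times as much as each pound of overestimate. That 8:1 ratio is an illustrative assumption, not a statement of the Tyrant's exact terms, and the function names are hypothetical. With linear penalties like these, the cost-minimising guess is a high quantile of the weight distribution (the 8/9 quantile for an 8:1 ratio) rather than its mean.

```python
import random


def expected_cost(prediction, samples, over_cost=1.0, under_cost=8.0):
    """Average penalty of one prediction against sampled true weights."""
    total = 0.0
    for actual in samples:
        error = prediction - actual
        # Overestimates and underestimates are charged at different rates.
        total += over_cost * error if error > 0 else under_cost * -error
    return total / len(samples)


def best_prediction(samples, over_cost=1.0, under_cost=8.0):
    """Cheapest prediction; for piecewise-linear cost it is one of the samples."""
    candidates = sorted(set(samples))
    return min(candidates,
               key=lambda p: expected_cost(p, samples, over_cost, under_cost))


if __name__ == "__main__":
    rng = random.Random(0)
    for spread in (1.0, 3.0):
        samples = [rng.gauss(20.0, spread) for _ in range(300)]
        mean = sum(samples) / len(samples)
        print(f"spread {spread}: mean {mean:.2f}, best guess {best_prediction(samples):.2f}")
```

A turtle whose weight is known exactly (like a clone) gets no skew at all, while wider distributions push the best guess further above the mean, which matches the pattern in the Strategy table.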

(This isn’t even getting into the really interesting kinds of asymmetric payoffs. For example, when deciding how much to bid in a blind auction, bidding much too frugally has the same ‘punishment’ as bidding slightly too frugally – you just don’t get the lot – whereas large mistakes in the too-generous direction continue to hurt you relative to small mistakes^[3].)

The actual theme, from my point of view, turned out to be ‘diminishing returns’^[4]: successful players’ scores were very close together (congratulations in particular to gjm, Malentropic Gizmo, and aphyer), with each extra epicycle of their reasoning resulting in markedly less benefit. I think this is ‘fair’ in the sense that any coherent system is ‘fair’, but suspect engineering a more consistently steep input-output curve would have made for a better game. Feedback on this point, and on all other points, would be greatly appreciated.

Scheduling

My current, tentative plan is to run the next challenge from the 19th to the 29th of this month, but I could very easily be persuaded to delay its release if that would be inconvenient for anyone or if enough people believe there should be a larger gap between releases. Please share your thoughts!

ETA: I have once again underestimated how long making a challenge will take, and overestimated how much time and energy I will be able to devote to it. I now expect it to be ready by the 26th; I don't know how accurate I expect this expectation to be.

^[1] At least, according to my Bayesian turtle-weight-guess-optimization code; let me know if you find any bugs.

^[2] Yonge conscientiously objected to skewing estimates in an attempt to squeeze more money from a Mad Tyrant, reasonably deciding that ~300gp-in-expectation isn't worth sacrificing your reputation and intellectual integrity (especially when you already have ~1400gp-in-expectation incoming).

^[3] A significant part of my day job is attempting to accommodate this effect.

^[4] On a meta level, this theme is reversed. There were a lot of minor changes I made to the premise which resulted in significant improvements (originally, the Tyrant only had the one turtle; this would have sucked). In retrospect, I can see a lot of very easy ways I could have made this game slightly better still (the most galling: I could have increased the variation in age - and thereby made modellable effects more meaningful - by changing one character in the generation code), but by that point I was tired of tweaking the traits and tendencies of the Tyrant’s turtles. I guess the lesson here is “make the core premise & associated codebase simple and strong enough that you don’t end up prematurely experiencing Tweak Fatigue”?

Malentropic Gizmo (4mo)

I enjoyed the exercise, thanks!

My solution for the common turtles was setting up the digital cradle such that the mind forged inside was compelled to serve my interests (I wrote a custom loss function for the NN). I used 0.5*segments+x for the vampire one (where I used the x which had the best average gp result for the example vampire population). Annoyingly, I don't remember what I changed between my previous and my current solution, but the previous one was much better 🥲

Looking forward to the next challenge!

abstractapplic (4mo)

You're welcome, and thank you for playing.

(I wrote a custom loss function for the NN)

I'm curious how you defined that. (i.e. was it "gradient = x for rows where predicted>actual, gradient = -8x for rows where actual>predicted", or something finickier?)