This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.

You steel your nerves as the Mad Tyrant[1] peers at you from his throne. In theory, you have nothing to worry about: since the Ninety Degree Revolution last year, His Malevolence[2] has had his power sharply curtailed, and his bizarre and capricious behavior has shifted from homicidally vicious to merely annoying. So while everyone agrees he’s still getting the hang of this whole “Constitutional Despotism”[3] thing, and while he did drag you before him in irons when he heard a Data Scientist was traveling through his territory, you’re still reasonably confident you’ll be leaving with all your limbs attached (probably even to the same parts of your torso).

Your voice wavering only slightly, you politely inquire as to why you were summoned.

He tells you that he needs help with a scientific problem: he’s recently acquired several pet turtles (by picking at random from a nearby magic swamp), and wants to know how heavy each of them is, without putting his Precious Beasts[4] to the trouble of weighing them. To encourage you to bring your best, he will be penalizing you 10gp for each pound you overestimate by-

(An advisor with robes like noontime in summer rushes to the Tyrant’s side and whispers something urgent in his ear before scuttling away.)

-which will be deducted from the 2000gp stipend he will of course be awarding you for undertaking this task, because compelling unpaid labor from foreign nationals is no longer the done thing.

(The bright-robed advisor visibly sighs in relief.)

However, he snarls with a sudden ferocity, if you dare to insult his turtles by underestimating their weight, he will have you executed-

(An advisor with robes like the space between stars rushes to the Tyrant’s other side and whispers something urgent in his other ear before scuttling away.)

-that is, he’ll have you maimed-

(The Tyrant looks briefly to the dark-robed advisor, who shakes their head sadly.)

-lightly tortured-

(Another sad head-shake.)

-he’ll deduct 80gp-

(An encouraging gesture.)

-for each pound you underestimate by-

(An approving nod.)

-and he’ll also commission an unflattering portrait of you to hang in his throne room.

(The dark-robed advisor gives the Tyrant a big smile and two thumbs up.)

The meeting apparently having been concluded to his satisfaction, the guards see you out. Some time, some help, some adverse reactions to ambient magic[5], and several waterlogged sets of clothes later, you have a dataset representing a random sample[6] of the other turtles in that swamp. You also convince some palace officials to give reliable testimony on some characteristics of the Tyrant’s pets, though no-one is willing to provide any actual measurements[7].

What numbers will you give the Tyrant?


I’ll post an interactive you can use to test your choices, along with an explanation of how I generated the dataset, sometime on Tuesday 9th April or Wednesday 10th April. I’m giving you nine days, but the task shouldn’t take more than an evening or two; use Excel, R, Python, Tiger Instincts, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.

If you want to investigate collaboratively and/or call your choices in advance, feel free to do so in the comments; however, please use spoiler blocks or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.


Notes:

  • You may assume that you are wealthy and courageous enough to prioritize maximizing Expected Value, though the value you assign to providing honest estimates and to the possibility of being unflatteringly depicted is entirely up to you.
  • To provide an example of the scoring function: if you predict 10.1lb for a turtle which is actually 11.3lb, you'll be penalized 96gp; if you predict 13.7lb for that same turtle, you'll be penalized 24gp.
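
The scoring rule can be written down as a short Python sketch. The function and constant names here are illustrative, not part of the scenario; the two calls reproduce the example above.

```python
OVER_COST_GP = 10   # gp deducted per pound of overestimate
UNDER_COST_GP = 80  # gp deducted per pound of underestimate


def penalty_gp(predicted_lb: float, actual_lb: float) -> float:
    """Gold deducted for one turtle under the Tyrant's asymmetric rule."""
    error = predicted_lb - actual_lb
    if error >= 0:
        return OVER_COST_GP * error
    return UNDER_COST_GP * -error


print(round(penalty_gp(10.1, 11.3), 2))  # underestimate by 1.2 lb -> 96.0
print(round(penalty_gp(13.7, 11.3), 2))  # overestimate by 2.4 lb -> 24.0
```
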

  1. You checked, that’s his actual job title.
  2. You checked, that’s his actual preferred term of address.
  3. You checked, that’s the actual name of their new system of government: between this and the fact they’re voluntarily keeping him on the throne, you’re beginning to suspect this population deserves their ruler.
  4. You can hear him enunciate the capital letters.
  5. There’s so much thaumatic interference in the swamp, you wouldn’t be surprised if these creatures’ biology was completely uncorrelated with that of ordinary turtles.
  6. Of course, you made sure to mark each turtle, to avoid counting it twice.
  7. While gathering this information, you ask some courtiers how the Tyrant could determine the accuracy of your estimates. They reply that the Tyrant will simply weigh his turtles: while His Malevolence is too honorable to subject his turtles to measurement merely for the sake of satisfying his curiosity, he will absolutely do it in order to determine whether a suspicious outsider is impugning his pets. They don’t seem to see anything amiss with this logic; your suspicion that they deserve their ruler swiftly matures into a conviction.


I did some initial exploration of the dataset and came to similar conclusions as others on the thread.

I then decided this was a good excuse to finally learn how to use LightGBM, one of the best-in-class tools for creating decision trees, and widely used in the data science industry. In other words, let's make the computer do the fun part!

The goal was to output something like:

If color = blurple: weight is 1234
Else
  If segments > 42: weight is 2345
  Else weight is 3456

What I actually got:

Fangs: ~17 pounds
No fangs: a big tree that outputs in the range of 18-19.5 pounds

I used default settings, transformed color/fangs/nostrils into 0-N categorical variables and marked them accordingly, then basically did "give me a regression with a single tree and 15 leaves".

As others have mentioned, all gray turtles have fangs and weigh noticeably less (4-7 pounds), so this is obvious nonsense.

This tool is supposedly the non-AI state-of-the-art. It confidently fails with out-of-the-box settings. I remain baffled as to how anyone in tech ever gets anything done, myself included.

I think this is because

LightGBM and its kin are tools for creating decision forests, not decision trees. If you use standard hyperparameters while creating a single-tree model then they will under-train, resulting in the "predict in a way that's correlated with reality but ridiculously conservative in its deviations from the average" behavior you see here. Setting num_boost_round (or whatever parameter decides the number of trees) to 200 or so should go some way to fixing that problem (while giving you the new problem of having produced an incomprehensible-to-humans black-box model which can only be evaluated by its output).
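
The under-training effect can be reproduced without LightGBM at all. The sketch below is a hand-rolled toy, not LightGBM's actual algorithm: it boosts one-split "trees" on a fang flag with a shrinkage rate of 0.1 (LightGBM's default `learning_rate`), on invented data where 10% of turtles are fanged and weigh 5 lb while the rest weigh 20 lb. The function name and the data are assumptions made for illustration.

```python
def boosted_predictions(fangs, weights, rounds, learning_rate=0.1):
    """Toy gradient boosting for squared error with a single split on `fangs`.

    Returns {fang_flag: predicted weight} after `rounds` boosting rounds.
    """
    mean_weight = sum(weights) / len(weights)
    pred = {0: mean_weight, 1: mean_weight}  # every model starts at the global average
    for _ in range(rounds):
        for flag in (0, 1):
            residuals = [w - pred[flag] for f, w in zip(fangs, weights) if f == flag]
            leaf_value = sum(residuals) / len(residuals)
            pred[flag] += learning_rate * leaf_value  # each tree only closes 10% of the gap
    return pred


# Invented data: 100 fanged turtles at 5 lb, 900 fangless turtles at 20 lb.
fangs = [1] * 100 + [0] * 900
weights = [5.0] * 100 + [20.0] * 900

one_tree = boosted_predictions(fangs, weights, rounds=1)
many_trees = boosted_predictions(fangs, weights, rounds=200)
print(one_tree)
print(many_trees)
```

With one round, the fanged prediction moves only a tenth of the way from the 18.5 lb average towards 5 lb, landing at about 17.15 lb, which resembles the "~17 pounds" output reported above. After 200 rounds the predictions are effectively 5 lb and 20 lb.
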

(I would have said this sooner but helping a player while the challenge was still running seemed like a bad look.)

gjm

With rather little confidence, I estimate for turtles A-J respectively:

22.93, 18.91, 25.47, 21.54, 17.79, 7.24, 30.36, 20.40, 24.25, 20.69 lb

Justification, such as it is:

The first thing I notice on eyeballing some histograms is that we seem to have three different distributions here: one normal-ish with weights < 10lb, one maybe lognormal-ish with weights > 20lb, and a sharp spike at exactly 20.4lb. Looking at some turtles with weight 20.4lb, it becomes apparent that 6-shell-segment turtles are special; they all have no wrinkles, green colour, no fangs, normal nostrils, no misc abnormalities, and a weight of 20.4lb. So that takes care of Harold. Then the small/large distinction seems to go along with (gray, fangs) versus (not-gray, no fangs). Among the fanged gray turtles, I didn't find any obvious sign of relationships between weight and anything other than number of shell segments, but there there's a clear linear relationship. Variability of weight doesn't seem interestingly dependent on anything. Residuals of the model a + b*segs look plausibly normal. So that takes care of Flint. The other pets are all green or grayish-green so I'll ignore the greenish-gray ones. These look like different populations again, though not so drastically different. Within each population it looks as if there's a plausibly-linear dependence of weight on the various quantitative features; nostrils seem irrelevant; no obvious sign of interactions or nonlinearities. The coefficients of wrinkles and segments are very close to a 1:2 ratio and I was tempted to force that in the name of model simplicity, but I decided not to. The coefficient of misc abs is very close to 1 and I was tempted to force that too but again decided not to. Given the estimated mean, the residuals now look pretty normally distributed -- the skewness seems to be an artefact of the distribution of parameters -- with stddev plausibly looking like a + b*mean. The same goes for the grayish-green turtles, but with different coefficients everywhere (except that the misc abs coeff looks like 1 lb/abnormality again). 
Finally, if we have a normally distributed estimate of a turtle's weight then the expected monetary loss is minimized if we estimate mu + 1.221*sigma.
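
For anyone wanting to check the 1.221 figure: with a cost of 10gp per pound over and 80gp per pound under, the loss-minimizing estimate is the 80/(80+10) = 8/9 quantile of the weight distribution. A minimal sketch using only the standard library, assuming a normal distribution as above (the helper names are mine):

```python
from statistics import NormalDist

OVER_COST_GP = 10
UNDER_COST_GP = 80

# The loss-minimizing estimate sits at the quantile under / (under + over).
target_quantile = UNDER_COST_GP / (UNDER_COST_GP + OVER_COST_GP)  # 8/9
z_optimal = NormalDist().inv_cdf(target_quantile)


def expected_loss_gp(z: float, sigma: float = 1.0) -> float:
    """Expected penalty when estimating mu + z*sigma for a N(mu, sigma) weight."""
    std = NormalDist()
    pdf, cdf = std.pdf(z), std.cdf(z)
    expected_over = sigma * (pdf + z * cdf)         # E[max(estimate - weight, 0)]
    expected_under = sigma * (pdf - z * (1 - cdf))  # E[max(weight - estimate, 0)]
    return OVER_COST_GP * expected_over + UNDER_COST_GP * expected_under


# Brute-force cross-check over a grid of z values from 0.000 to 3.000.
z_grid_best = min((i / 1000 for i in range(3001)), key=expected_loss_gp)
print(round(z_optimal, 3), z_grid_best)
```
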

I assume

that there's a more principled generation process, which on past form will probably involve rolling variable numbers of dice with variable numbers of sides, but I didn't try to identify it.

I will be moderately unsurprised if

it turns out that there are subtle interactions that I completely missed that would enable us to predict some of the turtles' weights with much better accuracy. I haven't looked very hard for such things. In particular, although I found no sign that nostril size relates to anything else it wouldn't be very surprising if it turns out that it does. Though it might not! Not everything you can measure actually turns out to be relevant! Oh, and I also saw some hints of interactions among the green turtles between scar-count and the numbers of wrinkles and shell segments, though my brief attempts to follow that up didn't go anywhere useful.

Tools used: Python, Pandas, statsmodels, matplotlib+seaborn. I haven't so far seen evidence that this would benefit much from

 fancier models like random forests etc.

I've been loving reading these for a while and figured I'd give it a shot for once.

Random early observations

  1. Focusing just on gray turtles for now because they're outliers on every metric.
  2. All gray turtles and only gray turtles have fangs.
  3. Weight is approximately Shell Segments / 2 for gray turtles.
  4. Nothing else seems obviously correlated.

Edit

There are way too many green turtles with 6 shell segments, and they all have no wrinkles, normal nostril size, no miscellaneous abnormalities, and weight 20.4.

Tentative guesses:

Nothing else is standing out to me so I just threw some linear regression at it and added 1.22 times the standard deviation of the residual to be safe.

  1. Abigail: 21.9 lb
  2. Bertrand: 19.5 lb
  3. Chartreuse: 25.4 lb
  4. Dontanien: 21.8 lb
  5. Espera: 19.1 lb
  6. Flint: 7.3 lb
  7. Gunther: 27.4 lb
  8. Harold: 20.4 lb
  9. Irene: 24.4 lb
  10. Jacqueline: 21.0 lb

I completely ignored the greenish-gray turtles because His Malevolence didn't have any and there weren't that many of them in the data; I hope that wasn't a mistake. It bothers me that I can't figure out anything regarding nostril size. From a meta perspective, I feel like there wouldn't be two irrelevant columns given that fangs was already redundant. Everything else was at least somewhat correlated with weight.

Thanks abstractapplic! Initial observations:

There are multiple subpopulations, and at least some that are clearly disjoint.

The 3167 fanged turtles are all gray, and only fanged turtles are gray. Fanged turtles always weigh 8.6lb or less. Within the fanged turtles it seems shell segment number is pretty decently correlated with weight. wrinkles and scars have weaker correlations with weight but also correlate to shell segment number so not sure they have independent effect, will have to disentangle.

Non-fanged turtles always weigh 13.0 lbs or more. There are no turtles weighing between 8.6lb and 13.0lb.

The 5404 turtles with exactly 6 shell segments all have 0 wrinkles or anomalies, are green, have no fangs, have normal sized nostrils, and weigh exactly 20.4lb. None of that is unique to 6-shell-segment turtles, but that last bit makes guessing Harold's weight pretty easy.

Among the 21460 turtles that don't belong in either of those groups, all of the numerical characteristics correlate with weight, and notably the number of abnormalities doesn't seem to correlate with the other numerical characteristics, so it likely has some independent effect. Grayer colours tend to have higher weight, but also correlate with other things that seem to affect weight, so will have to disentangle.

edit: both qwertyasdef and Malentropic Gizmo identified these groups before my comment including 6-segment weight, and qwertyasdef also remarked on the correlation of shell segment number to weight among fanged turtles. 

updates:

In the fanged subset:

I didn't find anything that affects weight of fanged turtles independently of shell segment number. The apparent effect from wrinkles and scars appears to be mediated by shell segment number. Any non-shell-segment-number effects on weight are either subtle or confusingly change directions to mostly cancel out in the large scale statistics.

Using linear regression, if you force intercept=0, then you get a slope close to 0.5 (i.e. avg weight= 0.5*(number of shell segments) as suggested by qwertyasdef), and that's tempting to go for for the round number, but if you don't force intercept=0 then 0 intercept is well outside the error bars for the intercept (though it's still low, 0.376-0.545 at 95% confidence). If you don't force intercept=0 then the slope is more like 0.45 than 0.5. There is also a decent amount of variation which increases in a manner that could be plausibly linear with the number of shell segments (not really that great-looking a fit to a straight line with intercept 0 but plausibly close enough, I didn't do the math). Plausibly this could be modeled by each shell segment having a weight drawn from a distribution (average 0.45) and the total weight being the sum of the weights for each segment. If we assume some distribution in discrete 0.1lb increments, the per-segment variance looks to be roughly the amount supplied by a d4. 

So, I am now modeling fanged turtle weight as 0.5 base weight plus a contribution of 0.1*(1d4+2) for each segment. And no, I am not very confident that's anything to do with the real answer, but it seems plausible at least and seems to fit pretty well.

The sole fanged turtle among the Tyrant's pets, Flint, has a massive 14 shell segments and at that number of segments the cumulative probability of the weight being at or below the estimated value passes the 8/9 threshold at 7.3 lbs, so that's my estimate for Flint.
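
The 7.3 lb figure can be reproduced by building the exact weight distribution under this guessed model and finding where the cumulative probability first reaches 8/9. This is a sketch of that calculation only; the dice model itself is my speculation, and the function names are made up for illustration. Weights are kept as integer tenths of a pound to avoid floating-point drift.

```python
from collections import Counter
from fractions import Fraction


def fanged_weight_distribution(segments: int) -> dict:
    """Exact distribution of weight (in tenths of a pound) under the guessed model:
    0.5 lb base plus 0.1 * (1d4 + 2) lb for each shell segment."""
    dist = Counter({5: Fraction(1)})  # base weight: 5 tenths of a pound
    for _ in range(segments):
        step = Counter()
        for weight, prob in dist.items():
            for roll in (1, 2, 3, 4):
                step[weight + roll + 2] += prob * Fraction(1, 4)
        dist = step
    return dict(dist)


def smallest_estimate_reaching(dist: dict, quantile: Fraction) -> int:
    """Smallest weight (in tenths of a pound) whose cumulative probability >= quantile."""
    cumulative = Fraction(0)
    for weight in sorted(dist):
        cumulative += dist[weight]
        if cumulative >= quantile:
            return weight
    raise ValueError("quantile must be at most 1")


flint = fanged_weight_distribution(segments=14)
flint_estimate_tenths = smallest_estimate_reaching(flint, Fraction(8, 9))
print(flint_estimate_tenths / 10)
```
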

In the non-fanged, more than 6 segment main subset:

Shell segment number doesn't seem to be the dominant contributor here, all the numerical characteristics correlate with weight, will investigate further.

Abnormalities don't seem to affect or be affected by anything but weight. This is not only useful to know for separating abnormality-related and other effects on weight, but also implies (I think) that nothing is downstream of weight causally, since that would make weight act as a link for correlations with other things. 

This doesn't rule out the possibility of some other variable (e.g age) that other weight-related characteristics might be downstream of. More investigation to come. I'm now holding reading others' comments (beyond what I read at the time of my initial comment) until I have a more complete answer myself.

So I had some results I didn't feel were complete enough to make a comment on (in the sense that subjectively I kept on feeling that there was some follow-on thing I should check to verify it or make sense of it), then got sidetracked by various stuff, including planning and now going on a sacred pilgrimage to see the eclipse. Anyway:

all of these results relate to the "main group" (non-fanged, 7-or-more segment turtles):

Everything seems to have some independent relation with weight (except nostril size afaik, but I didn't particularly test nostril size). When you control for other stuff, wrinkles and scars (especially scars) become less important relative to segments. 

The effect of abnormalities seems suspiciously close to 1 lb on average per abnormality (so, subjectively I think it might be 1). Adding abnormalities has an effect that looks like smoothing (in a biased manner so as to increase the average weight): the weight distribution peak gets spread out, but the outliers don't get proportionately spread out.  I had trouble finding a smoothing function* that I was satisfied exactly replicated the effect on the weight distribution however. This could be due to it not being a smoothing function, me not guessing the correct form, or me guessing the correct form and getting fooled by randomness into thinking it doesn't quite fit.

For green turtles with zero miscellaneous abnormalities, the distribution of scars looked somewhat close to a Poisson distribution. For the same turtles, the distribution of wrinkles on the other hand looked similar but kind of spread out a bit...like the effect of a smoothing function. And they both get spread out more with different colours. Hmm. Same spreading happens to some extent with segments as the colours change.

On the other hand, segment distribution seemed narrower than Poisson, even one with a shifted axis, and the abnormality distribution definitely looks nothing like Poisson (peaks at 0, diminishes far slower than a 0-peak Poisson).

Anyway, on the basis of not very much clear evidence but on seeming plausibility, some wild speculation:

I speculate there is a hidden variable, age. Effect of wrinkles and greyer colour (among non-fanged turtles) could be a proxy for age, and not a direct effect (names of those characteristics are also suggestive). Scars is likely a weaker proxy for age and also no direct effect. I guess segments likely do have some direct effect, while also being a (weak, like scars) proxy for age. Abnormalities clearly have a direct effect. Have not properly tested interactions between these supposed direct effects (age, segments, abnormalities), but if abnormality effect doesn't stack additively with the other effects, it would be harder for the 1-lb-per-abnormality size of the abnormality effect to be a non-coincidence.

So, further wild speculation: the age effect on weight could also be a smoothing function (though it looks like the high-weight tail is thicker for greenish-gray; does that suggest it is not a smoothing function?).

unknown: is there an inherent uncertainty in the weight given the characteristics, or does there merely appear to be because of the age proxies being unreliable indicators of age? is that even distinguishable? 

* by smoothing function I think I mean another random variable that you add to the first one, this other random variable takes on a range of values within a relatively narrow range. (e.g. uniform distribution from 0.0 to 2.0, or e.g. 50% chance of being 0.2, 50% chance of being 1.8).

Anyway, this all feels figure-outable even though I haven't figured it out yet. Some guesses where I throw out most of the above information (apart from prioritization of characteristics) because I haven't organized it to generate an estimator, and just guess ad hoc based on similar datapoints, plus Flint and Harold copied from above:

Abigail 21.6, Bertrand 19.3, Chartreuse 27.7, Dontanien 20.5, Espera 17.6, Flint 7.3, Gunther 28.9, Harold 20.4, Irene 26.1, Jacqueline 19.7

Two things I saw:

  1. The 'Fangs for some reason' column is not needed, because every gray turtle has a fang and no other color has any fang.
  2. There are a lot of turtles (around 5404 more than expected) with the following characteristics: (20.4lb weight, no wrinkles, 6 shell segments, green, normal nostril size, no miscellaneous abnormalities)

My Solution (this might change before the end):

[23.14, 19.24, 25.98, 21.52, 18.17, 7.40, 31.15, 20.40, 24.0, 20.52]

Previous solution:

22.652468, 18.932825, 25.491783, 20.964714, 18.029692, 7.4, 30.246178, 20.4, 24.039215, 20.40147

More of... whatever this is on LessWrong, please! Great humor! Imma go open sheets now and optimally estimate turtle weights (as one does on a good friday night). 

Edit: hot damn, you've got a whole sequence of this stuff!

Note: I'll be unavoidably and unexpectedly busy at the start of next week, and so will have to delay resolution of this challenge until either Tuesday or Wednesday (probably Tuesday). I'd apologise for the inconvenience but I'm pretty sure no-one minds.

EDITED TO ADD FINAL ANSWER:

  • Abigail: 23.0lb
  • Bertrand: 19.0lb
  • Chartreuse: 26.2lb
  • Dontanien: 21.1lb
  • Espera: 17.3lb
  • Flint: 7.3lb
  • Gunther: 30.0lb
  • Harold: 20.4lb
  • Irene: 23.7lb
  • Jacqueline: 20.0lb

Getting started with my favorite first step of calculating a bunch of correlations:

  • It turns out that all fanged turtles are gray, and all gray turtles are fanged.
  • This suggests some kind of speciation by color.
  • When we break down by color:
    • Grayish-green and greenish-gray turtles show near-identical patterns - I assume those are the same species and you've just categorized them a couple of different ways:
      • Weight is positively correlated with wrinkles, scars, shell segments and abnormalities.
      • The first three of these are positively correlated with one another, and probably reflect some hidden 'age' variable.  Number of abnormalities is not correlated with the others, and seems to do its own thing.  (Turtles grow larger with age, and also weigh more per extra mutant tentacle they have grown?)
      • Nostril size has no effect.
    • Gray turtles work differently:
      • They show the same pattern of wrinkles, scars and shell segments being positively correlated.
      • However, weight in this case seems to be almost entirely determined by # of shell segments.
      • Perhaps these turtles grow at a more predictable rate, with one shell segment per year that adds a regular amount of weight?
    • And green turtles also work differently:
      • The correlations between wrinkles, scars and shell segments have broken down.
      • Additionally, those variables have only small correlations with weight.
      • The most predictive variable towards weight is the # of abnormalities.
      • Perhaps these are strange mutant ninja turtles of some kind that are perpetually teenage and don't have a regular growth lifecycle?
  • It looks like these three species of turtle behave differently enough that I'm probably going to end up modelling the three of them all separately (except maybe the what-I'm-assuming-is-age effect that shows up on both gray and mixed-color turtles).
  • My planned next step is to try three independent simple regressions and see how predictive they are for each of those three types of turtle.

When we look at the distributions of variables individually, there's a startling number (5-6k out of 30k) of green turtles with 6 shell segments (the lowest number, never seen otherwise), zero wrinkles, and zero abnormalities, that weigh exactly 20.4lb.  

They do have varying numbers of scars, though, which makes me incline more towards 'some very particular turtle subspecies' and less towards 'one very friendly turtle that figured out that it can get extra attention by wiping off the mark you put on it and coming by again'.

Harold from the King's pets matches this pattern (and thus presumably is one of these strange clone turtles).

Removing those and looking at the rest of the universe:

  • The remaining green turtles now resemble the grayish-green and greenish-gray turtles, making me draw the following three species:
    • Fanged Gray Turtles.
    • Six-Segmented Harold Clones.
    • All Other Turtles.
  • Most variables are now reasonably smoothly-distributed:
    • Scars and Wrinkles look Poisson-like.
    • Abnormalities peak at 0 and fall off: that might also be a poisson distribution, just from a lower mean, or might be something else.
    • Weights are bimodal (with one peak around 5-6lb for the Fanged Gray Turtles, and one wider peak around 15-25lb for All Other Turtles).

The Fanged Gray Turtle seems relatively simple, so we look at that first.

The weight of a Fanged Gray Turtle seems well-approximated by (0.425 + 0.4568*#segments) lb.

This leaves behind a residual that looks roughly like a normal distribution with stdev ~0.357lb.  I'm not able to find any interaction of this residual with any other properties of the turtles - scars, mutations, etc. all seem unpredictive for the Fanged Gray Turtle.

Some quick math reveals that the Tyrant's asymmetric payoff distribution encourages us to overestimate a turtle's weight by ~1.22 standard deviations.  Therefore, we're going to bump up all our weight estimates by 0.435lb in order to flatter His Tyranny.  

(We could bump them up a bit further if we thought that reducing the odds of him having an unflattering portrait of us was worth trading off money for.  However, I actually think we can plausibly use that to extract more money: whatever itinerant artist he kidnaps to do that portrait, we can demand that they give us part of their commission in exchange for us being helpful and sitting for the portrait!  Kaching!)

There's only one Fanged Gray Turtle among the Tyrant's pets: Flint, with 14 Shell Segments.  Our best guess of Flint's true weight is 6.8lb, but we're going to overestimate this to 7.3lb in order to optimize our payoff.
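
As a quick check of the arithmetic, here is a small Python sketch using the fitted numbers quoted above (intercept, slope, and residual stdev) and assuming the residual is normal. The variable and function names are just for illustration.

```python
from statistics import NormalDist

INTERCEPT_LB = 0.425
SLOPE_LB_PER_SEGMENT = 0.4568
RESIDUAL_STDEV_LB = 0.357

z = NormalDist().inv_cdf(8 / 9)      # about 1.22 standard deviations
margin_lb = z * RESIDUAL_STDEV_LB    # safety margin added to every estimate


def fanged_gray_estimate(segments: int) -> tuple[float, float]:
    """Return (best guess, payoff-optimized estimate) in pounds."""
    best_guess = INTERCEPT_LB + SLOPE_LB_PER_SEGMENT * segments
    return best_guess, best_guess + margin_lb


best, padded = fanged_gray_estimate(14)  # Flint has 14 shell segments
print(f"best guess {best:.2f} lb, padded estimate {padded:.2f} lb")
```

The unrounded padded estimate comes out slightly below 7.3 lb, at roughly 7.26 lb.
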

 

And two (low-priority) questions for the GM:

  1. Are we unusually careful and competent at weighing turtles in a way that the Tyrant is not likely to be?  If he is careless about weighing his turtles, and introduces additional error, that increased variance makes us want to slightly increase how far we overestimate by.
  2. What level of granularity are we able to give the Tyrant in our weight estimates?  I think that an estimate of 7.25lb for Flint is slightly higher-payoff than 7.3lb in expectation, but don't know if that's something I'm allowed to give.

Clarifications:

The Tyrant will weigh his Precious Beasts with the same level of diligence you would: no more, no less.

You can predict weights with as fine a granularity as you like; if you want to claim a turtle has a weight of 12.345678lb, that's fine.

A simple linear regression analysis on the remaining turtles (everything that isn't a Fanged Gray Turtle or a Six-Segmented Harold Clone) gives the following formula:

  • 10.56lb base weight if green...
  • +2.02lb if grayish-green,
  • +5.47lb if greenish-gray,
  • +0.359lb/Wrinkle
  • +0.142lb/Scar
  • +0.598lb/Segment
  • +1.000lb/Abnormality

This does a reasonable job of prediction, but has a residual with a fairly-large ~2lb standard deviation.  Our standard-deviation math suggests that this means we should give the Tyrant answers overestimating each turtle by 2.4-2.5lb, and should expect to lose on average ~35gp/turtle to error.
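
To make the formula and the margin concrete, here is a rough Python sketch. It treats the ~2lb residual as exactly 2.0 lb, constant, and normal, which is a simplifying assumption, and the example turtle is hypothetical (not one of the Tyrant's pets).

```python
from statistics import NormalDist

BASE_LB = 10.56
COLOR_BONUS_LB = {"green": 0.0, "grayish-green": 2.02, "greenish-gray": 5.47}
RESIDUAL_STDEV_LB = 2.0  # "~2lb" above, treated here as constant and normal


def predicted_weight_lb(color, wrinkles, scars, segments, abnormalities):
    """Point estimate from the regression coefficients listed above."""
    return (BASE_LB + COLOR_BONUS_LB[color]
            + 0.359 * wrinkles + 0.142 * scars
            + 0.598 * segments + 1.000 * abnormalities)


std_normal = NormalDist()
z = std_normal.inv_cdf(8 / 9)
margin_lb = z * RESIDUAL_STDEV_LB
# At the optimal quantile, the expected penalty for a normal residual
# simplifies to (over_cost + under_cost) * sigma * pdf(z).
expected_loss_gp = (10 + 80) * RESIDUAL_STDEV_LB * std_normal.pdf(z)

# Hypothetical turtle, invented for illustration.
example = predicted_weight_lb("grayish-green", wrinkles=3, scars=2,
                              segments=10, abnormalities=1)
print(f"{example:.2f} lb point estimate, +{margin_lb:.2f} lb margin, "
      f"~{expected_loss_gp:.0f} gp expected loss")
```
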

That seems like we might be able to improve on it, but I'm not sure how.  I haven't been able to find any useful interactions yet.  There does seem to be an obvious explanation of all the traits except Abnormalities being driven by some hidden Age variable: old turtles start getting grayish, are wrinklier, have grown more shell segments and accumulated more scars, and are larger.  However, I'm not sure how actionable this is for us.  

The one thing it does look like I can do is adjust the amount of overestimation I do: it does seem that our estimate is less accurate as turtles get older and larger, and so rather than overestimating by 2.44lb for every turtle I should overestimate the larger ones by more and the smaller by less.  That's not going to be a very large improvement, though.  I feel like there ought to be something else to do, but haven't found anything yet.

Haven't found anything particularly good, but I've probably gone as far as I'll go.  I've done some analysis trying to predict how much variance we expect from each turtle so that I know how much to overestimate, and for the non-special turtles I'm predicting:
 

Abigail: 23.0lb

Bertrand: 19.0lb

Chartreuse: 26.2lb

Dontanien: 21.1lb

Espera: 17.3lb

(Flint is already estimated as a gray turtle as 7.3lb)

Gunther: 30.0lb

(Harold is already estimated as a six-segmented clone as 20.4lb)

Irene: 23.7lb

Jacqueline: 20.0lb

I'm rounding these to 0.1lb even though I'm allowed to go more granular, because if the Tyrant weighs to the same precision we do he will also be rounding to 0.1lb, which means we don't gain anything from more precision (estimating 7.25lb gives a payoff exactly halfway between estimating 7.3 and 7.2).
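
The halfway claim follows from the penalty being piecewise linear in the estimate, with kinks only at possible true weights. If true weights fall on a 0.1lb grid, the expected penalty is a straight line between 7.2 and 7.3. A small sketch with an invented weight distribution (the probabilities are made up; weights are in hundredths of a pound to keep the grid exact):

```python
def expected_penalty(estimate_hundredths: int, weight_probs: dict) -> float:
    """Expected penalty in gp; estimate and weights are in hundredths of a pound."""
    total = 0.0
    for weight, prob in weight_probs.items():
        error_lb = (estimate_hundredths - weight) / 100
        total += prob * (10 * error_lb if error_lb >= 0 else -80 * error_lb)
    return total


# Invented distribution over weights recorded to the nearest 0.1 lb.
weight_probs = {660: 0.2, 690: 0.3, 720: 0.3, 730: 0.1, 760: 0.1}

low, mid, high = (expected_penalty(e, weight_probs) for e in (720, 725, 730))
print(low, mid, high)
```
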

I'll put these estimates in the parent comment for ease of GM extraction.

The one interesting thing I've turned up is that Abnormalities appear to convey a very large amount of variance: each abnormality adds ~1lb of average weight, but actually slightly over 1lb of stdev-weight.  I suspect that abnormalities are adding weight in a highly-random way: my weight estimates for Espera, Irene and Jacqueline (0-abnormality turtles) are relatively low as a result because my confidence was higher, while my estimate for Gunther (6 abnormalities?) has a lot more safety margin built in.

Grey turtles have a much lower weight than most (3.9 to 7.9).
Greyish green turtles have a lower weight than the green ones, though the ranges overlap. Lowest 13, highest 42.9.
There is a big spike in the number of green turtles with a weight of 20.4
Suggests we are dealing with multiple distinct species.

 
The spike in green turtles with a weight of 20.4 all have 6 shell segments. 
No green turtle with 6 shell segments has a weight other than 20.4.
Therefore Harold has a weight of 20.4

 
All gray turtles have fangs, and no other coloured turtles do. Means we can ignore this as any effect will be entirely contained in the colour.

There appears to be a slight increase in weight with the number of wrinkles, scars, shell segments, and miscellaneous abnormalities, though the rate of increase depends on shell colour, and to a lesser extent on nostril size.
 
Fitting a linear model explains just under 80 percent of the variation for grey turtles, and a little over 50 percent for the rest.
There is no obvious pattern to the deviations, and there is clearly a lot of randomness as a lot of identical turtles have widely differing weights.

My best estimate for the weights of the turtles based on the linear model is as follows:
Abigail      20.0
Bertrand     17.3
Chartreuse   22.8
Dontanien    19.2
Espera       16.5
Flint         6.8
Gunther      25.5
Harold       20.4
Irene        21.7
Jacqueline   18.6

If I wanted to maximise my income from the constitutional despot I should bump up the estimates a bit; however, I don't need the money, and frankly my reputation as an honest scholar is worth more than a few gp. And who knows, if enough people ignore perverse incentives like this he may stop offering them and become a less wrong constitutional despot? I can dream at least. As for the unflattering portrait, you can always judge someone by the quality of the people they have offended. Coming from him that is going to be seen as a compliment by the people that I care about, not an insult. So I will just give him my best estimates and move on.