This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset. 

STORY (Skippable)

You were saddled with debt, and despair, and regret;

But you left it behind to embark,

With a visiting ship who were planning a trip,

Hunting some strange sea-beasts they call . . . “Snark”?

 

(After climbing aboard and departing the shore,

Your life is if anything worse.

The grog makes you groggy; the sea makes you soggy;

The songs leave you thinking in verse.)

 

Snark-hunting, you find, is a peaceful pastime.

By now, every crew knows the way,

To - with ease! - guarantee their success and safety,

As they seek, and they lure, and they slay.

 

A single exception proves the above rule:

While with most Snarks, Snark-hunting’s a breeze,

If your Snark is a Boojum, it hunts you right back,

And slays you with similar ease.

 

When you learn the detail that a Snark-hunt can fail,

You speak with the crew and insist,

On wielding Science skills to discern which Snarks kill,

And becoming the ship’s Analyst.

 

(The Butcher and Baker and Barnacle-Scraper,

All tell you that simply won’t do.

“You’ll be telling us what is a Boojum, what’s not;

Thus, you’re our Boojumologist; true?”

 

When you meekly protest you’ve not earned that address,

They rebut: “As you clearly percieve,

The Bellman can’t ring, the Belter can’t sing,

And the Beaver knows not how to Beave.”)

 

Notwithstanding contention viz naming conventions,

Your task is as clear as could be:

Using data from trips made by similar ships,

Choose which Snarks to fight, which to flee.

DATA & OBJECTIVES

  • You have a dataset of Snark sightings. This tells you whether a sighted Snark was hunted and whether – if hunted – it turned out to be a Boojum. (To clarify, the order of operations is “someone sights a Snark” “they take notes on its behavior and relay this information to the nearest Snark-hunting ship” “the ship’s crew decide whether to hunt it”).
  • You have a list of Snarks your ship can choose to go after. The ship must hunt six of them for the venture to be profitable; the crew won’t accept a plan targeting fewer than six. If your ship hunts a Boojum, everyone aboard vanishes, including you.
  • Your objective can be to choose the six Snarks which maximize probability of survival, maximize the EV of the number of Snark bodies in your possession at the end of the trip (i.e. max([Snarks hunted] * P(Survival))), or do anything inbetween.
  • (You could also optimize for killing the crew as quickly and reliably as possible, if you like. Or you could ignore the 'challenge' part of the challenge entirely, and just try to characterize the dataset. I’m not the boss of you.)

BONUS TASK

If you post a selection of Snarks before the deadline, and your selection is on the efficient frontier – that is, if you’re the first to submit that selection, and if no other submission has both better EV and better survival probability – you will be entitled to receive [UNSPECIFIED BENEFIT (UNDERWHELMING)] at [UNSPECIFIED TIME (DISTANT)]. Every player can make one submission; if you post multiple plans, please specify clearly which of these is for the Bonus Task.


I’ll post an interactive you can use to test your choices, along with an explanation of how I generated the dataset, sometime on Monday the 12th. I’m giving you nine days, but the task shouldn’t take more than an evening or two; use Excel, R, Python, recollections from a previous timeloop, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.

If you want to investigate collaboratively and/or call your decisions in advance, feel free to do so in the comments; however, please use spoiler tags or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.

New to LessWrong?

New Comment
9 comments, sorted by Click to highlight new comments since: Today at 10:34 AM

I did not get make a chance to work on this but I really want to. I commit to making my own solution before Monday January 9th and to avoid looking at the posted solution until I've made my own.

I tried for a while to identify

the histogram of snarks by waking-time as a mixture of gaussians

but was unable to make much progress; my guess is that either

there are 2+ classes with very high variance, polluting everything, or maybe some snarks have a different distribution with a longer right tail than normal.

I did note that

Looks like we don't hunt Crumbling and are strongly biased against hunting Blunt, so maybe I'll just keep to conventional wisdom there. Of the remaining, very close to exactly 3% aren't hunted, with no discernible correlations? Maybe that's just the baseline "don't hunt this" chance.

So I fell back to

First remove all Crumbling and Blunt, then Just pretend you can treat every variable as independent, then pretend you can treat every pair of variables as independent after accounting for first-order effects. Grab log-likelihoods and smush them all together.

Which led me to

Not sure how many to hunt, but in this order:
['V', 'Y', 'G', 'P', 'Q', 'L', 'H', 'W', 'C', 'M', 'B', 'N', 'J', 'R', 'D', 'X', 'K', 'F']

The two suggested criteria would lead to either
['V', 'Y', 'G', 'P', 'Q', 'L'] (maximize my survival), or
['V', 'Y', 'G', 'P', 'Q', 'L', 'H', 'W', 'C', 'M', 'B', 'N', 'J', 'R', 'D'] (maximize snark count EV)

Had I submitted in time, I probably would have chosen to stop at either H, W, or C, because my estimated boojum probabilities have an inflection point there. If I were trying to "beat" everyone else, I'd have stopped at C; if I were ignoring everyone else, I'd have stopped at H, which, interestingly, is only one more snark than the bare minimum.

Trying to pick a word of different length to other people for prizes, my final answer is 

ABGHLPQVWY

Didn't find much out, but a couple notes:

Taste generally seems to be upstream of other factors, and to screen them off a bit, e.g. later-waking Snarks are better but a lot of this effect is that later-waking Snarks are more likely to have good tastes.

Crumbling Snarks are never hunted at all.  We don't have any idea what Crumbling means, so can't tell for sure whether this is 'they will definitely eat you and people know this' or 'people are silly', but they do seem to look bad on some other metrics and I've avoided them to be safe.

My approach

As hinted in my other comment, I believe that this problem is simply a gaussian mixture model. 

I assume there are d species of snarks. Species i is defined by a set of variables:

  •  the probability of a snark to belong to this species
  •  the parameters of the gaussian of this species
  •  the probability for members of this species to kill
  •  the vector giving, for each phenotype k , the probability for a member of species i to have phenotype k. Remember that there are 675 phenotypes in total, but I believe that there are way less species, thus there must be some phenotypic variation within each species

On top of those variables, there is also a hidden matrix  if and only if snark i belongs to species j. We want to estimate all of those numbers

I must confess I don't really remember how the approach I implemented is called, but basically it has to do with alternating estimations:

  • Assuming known , we can compute the MLE for 
    •  is just the weighted average of the times of sights, \(\\) is simply the standard deviation
    •  is the proportion of each species in 
    •  is the proportion of each phenotype in each species
    •  is a bit tricky, I estimated it using only the data for which the crew actually tried to hunt. This is not a problem if nobody is capable of knowing what a Boojan looks like. This might be a problem if, for any reason, the decision to give up or not is not random, and especially if the crew has access to information hidden to us. Anyhow, I had to make an assumption
  • Assuming known , we can estimate  by computing for each observation , the likelihood of  belonging to each species, and then making  proportional to this likelihood
  • By randomly initializing everything, and going back and forth between the two estimates, we should end up with a pretty satisfying result

This approach is super slow, especially because I am trying several numbers of species , I will reply to this comment when I get results

So, how many species are there?

Hum, about 12 ? Probably at least 8, maybe up to 20, or even more, I don't really know

One interesting point, it looks like there is one species, the Goojam, (or maybe 2 Goojam species), which always attacks, while all other species never attack. I.e.  only contains ones and zeros. This is not something I anticipated, it appeared when running my code, but it makes sense, so this might confirm the approach

 

My submission

BPGYHQ should be the set of 6 letters minimizing the probability of death

The letters, in increasing order of danger, are BPGYHQTSVINRDMKLOJUACWXEF, with BPGYHQTSVINRDMKLOJUACW (without XEF) being the solution maximizing the expected return, at least according to my models

If this is true, the pareto frontier is exactly the set of prefixes of this word, but since one can only submit one word for the bonus task, I will submit BPGYHQTS, a word of length 8 with a very low probability of death

It is remarkable to see that I find similar results to Yonge, despite having wildly different approaches!

I haven't had a chance to analyse any effects that depend on Waking-Time yet. But based on the other data every other Snark that was hunted that matched  one of B/G/H/P/Q/Y was successfully hunted so if this was for real I would play it safe and go with:

B G H P Q Y

To maximise value:

A B C D G H J K L M N P Q R V W Y

Looks like it will yield an expected value of just over 14, albeit with an uncomfortably low 84 percent chance of survival. This is my provisonal entry to the bonus task if I don't get a chance to analyse Waking-Time dependant effects.

My selection:

YNVGQRPHDBL

(ordered from best to not-as-best looking to me) 

my total ordering from best to worst would have been (this is not my selection:)

YNVGQRPHDBLJMCASWXOIKTUFE

where I think EV starts to drop after about 19 picks

further commentary to be included in reply

techniques used:

baseline  was just the numbers of snarks vs. the number of boojums for cases where all data was the same except waking time, then I fudged based on:

  • if there are few or no boojums, see what numbers you get if you relax the non-taste characteristics one by one
  • is the waking time typical for a boojum? for a snark? can relax multiple non-taste characteristics for more data
  • If Crumbling, I assumed that the early part of the Crumbling distribution is probably snarks and the late is probably boojums, see below. Though, I wasn't actually confident enough to pick any Crumbling ones.

Also, I tried to figure stuff out about the nature of the dataset but didn't end up with particularly actionable information; I did come to a strong suspicion based on comparing the  Crumbling data with the Blunt data that the late Crumbling ones are boojums and the early ones are snarks, but I wasn't confident enough in this to actually pick any Crumbling ones.

Also, early on  I was making a poem in homage to the OP poem as I went along, which was kind of silly I guess, but included below. I didn't want to post it at the time since I hadn't finished either the poem or the analysis which I was writing about...and it is still not updated with my later analysis, but posting anyway. I never did follow up the issue I was reporting on at the end...

The Snarks, so it seems, know just when to wake up;

From one beyond seven, the sun's nigh still up!

But the graph of those times, it hops and it hops:

A possible sign of some smaller sub-pops!

One-forty, two-thirty, two-fifty or so,

These are some peaks one can spot on the go.

But there looks like a broad one whose lateness abounds,

Though it could be a mishmash of all groups not found.

Four-fifty the top of another small mound.

 

And the Boojums, you see, don't quite seem to respect

The waking-time patterns that one would expect.

 

But enough of the times - on to matters of Taste!

It looks like the Crumbling ones all go to waste!

But the Crumbling wake times! - one peak is at two,

And it doesn't line up with the Snarks one can view.

Another one peaks around four thirty-five,

Are they from the broad set of Snarks found alive?

But that seems to peak quite a lot before then,

Unless I am missing a time where again

Some Snark group decidedly tends to revive,

So why take the risk if you like to survive?

 

The Snarks that are Artless are heaped to be broad

With some showing up in the two thirty pod,

Especially when they are Crisp as a pier,

But looking at those that are Neat and are Clear,

The peaks don't line up! Could it be that these here ...

The unsolved issue being - does each sub-population have a consistent wake time distribution regardless of other characteristics individual members may have, or do the other characteristics alter the wake times? I was implicitly assuming the former, but I didn't follow up to really convince myself that this was the case.

A few observations:

Big spoiler (but short)

This is a gaussian mixture

Big spoiler (but long)

Each observation of snarks contains several pieces of information:

  • The time at which the snark was sighted. I will come back to this one very soon.
  • Several traits of the snark. I call them the phenotype of the snark. 
    • The first two adjectives each have 5 possibilities. This gives 25 combinations of what I call the main phenotype.
    • The last three characteristics each have 3 possibilities, giving 27 combinations. I call this part the secondary phenotype.
  • Whether the snark was a Boojum. This last one is a bit messy because there are 2 booleans, but only 3 possibilities
    • either the boat didn't check
    • or checked, and it wasn't a Boojum
    • or checked, and it was a Boojum

Now that we know that, we could try to observe how each trait impacts the dataset. For example, the most common main phenotype is Hollow, yet Crisp. We could try to plot a histogram of the times at which each phenotype 0 appeared.

Aaaaand..... It's obviously the mixture of two gaussians. This hints several things:

  • Snarks can be divided into species.
  • The density of sights for a given species is a gaussian.
  • Main phenotype 0 is made of 2 species. Maybe we could refine this observation by including secondary phenotype into the analysis. Maybe one can deduce or guess the species from the phenotype.