This is a post (hopefully eventually a short series of posts) detailing my analysis of the recently released D&D.Sci scenario by abstractapplic. I'm documenting what I do as I go. If you intend to play the scenario yourself without help, you should do that before reading this. If you want to use this information in your solutions, go ahead.
I start by grabbing the raw data and saving it as a csv, then import it into Python and Excel to play with it. I'm not trying to do any particular analysis at this point, just trying to get familiar with the data and see if anything jumps out.
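For the "get familiar with the data" step, a first pass in Python could look something like the sketch below. It is not the exact code used for this post: the inline sample stands in for the real CSV, and its values (including the "Bad" label) are made-up placeholders. It reports the range of each numeric column and the value counts of each text column.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the real export. The column names are
# borrowed from this post; the values are placeholders, not real data.
SAMPLE_CSV = """Longitude,Feng Shui,Local Value of Pi
-12.5,Adequate,3.139
101.2,Adequate,3.152
-170.0,Bad,3.141
"""

def summarize(csv_text):
    """Return {column: (min, max)} for numeric columns, {column: Counter} otherwise."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        try:
            numbers = [float(v) for v in values]
            summary[col] = (min(numbers), max(numbers))
        except ValueError:
            # At least one value is not a number: treat the column as categorical.
            summary[col] = Counter(values)
    return summary

for column, info in summarize(SAMPLE_CSV).items():
    print(column, info)
```

For the real file, `open(path, newline="")` would replace the `io.StringIO` wrapper.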
We'll go through the columns one by one, and make some charts in Excel for visualization[1].
Longitude is relatively evenly-distributed from -180 to +180.
Latitude can vary from -90 to +90 but is bimodal, usually taking on values around ±45 - we rarely land near the equator or near the poles.
And Shortitude and Deltitude look the same as Latitude:
This might be what you would naturally expect from landing at a random point on a 4D sphere. Values near ±90 being less likely certainly happens naturally, since those are 'the poles' and take up less area. I don't have a good enough intuition for this to know if values near 0 are also naturally rare once you add more dimensions, or if we're avoiding the 'equator' for some reason.
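One way to probe that intuition is to simulate it. The sketch below draws uniform points on a 4D unit sphere and converts them to angles using standard hyperspherical coordinates, shifted to the -90..+90 range. That convention is an assumption: the scenario does not say how its four coordinates are defined, and a different parametrisation would give different distributions. All function names here are made up for illustration.

```python
import math
import random

def random_point_on_3_sphere(rng):
    """Uniform point on the unit sphere in 4D: normalise four Gaussian draws."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]

def clamp(v):
    """Guard acos against tiny floating-point overshoot past +-1."""
    return max(-1.0, min(1.0, v))

def latitude_like_angles(point):
    """ASSUMED convention: the two polar hyperspherical angles, in degrees,
    shifted so that 0 is the 'equator' and +-90 are the 'poles'."""
    x1, x2, x3, x4 = point
    phi1 = math.acos(clamp(x1))
    rest = math.sqrt(x2 * x2 + x3 * x3 + x4 * x4)
    phi2 = math.acos(clamp(x2 / rest))
    return 90.0 - math.degrees(phi1), 90.0 - math.degrees(phi2)

def band_fraction(angles, lo, hi):
    """Fraction of angles whose absolute value lies in [lo, hi)."""
    return sum(lo <= abs(a) < hi for a in angles) / len(angles)

rng = random.Random(0)
pairs = [latitude_like_angles(random_point_on_3_sphere(rng)) for _ in range(20_000)]
first = [p[0] for p in pairs]
second = [p[1] for p in pairs]

# Compare two bands of equal total width: around 0 and around +-45.
for name, angles in (("first polar angle", first), ("second polar angle", second)):
    near_equator = band_fraction(angles, 0.0, 15.0)
    near_45 = band_fraction(angles, 37.5, 52.5)
    print(f"{name}: |angle| < 15 -> {near_equator:.3f}, 37.5..52.5 -> {near_45:.3f}")
```

Under this assumed convention, both latitude-like angles are most common near 0 (their densities are proportional to cos² and cos of the angle), so geometry alone would not explain a dip at the 'equator'. If the scenario uses some other coordinate system, this conclusion may not carry over.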
Strange Smell is often Somewhat present, but only rarely EXTREMELY present:
Air Tastes Like has a few different values, some common and some rare:
Feng Shui is generally Adequate, sometimes bad, and very rarely good:
Weird Sounds has five possible values, of which we see up to three at a time:
Eerie Silence appears exactly when no other sounds appear.
Skittering and Buzzing never happen together (though they are the most common sounds separately).
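Rules like these are easy to check mechanically. The sketch below assumes each Weird Sounds cell is a comma-separated list of names, which may not match the real file. Only "Eerie Silence", "Skittering" and "Buzzing" are named in this post; any other name in the examples is a guess.

```python
SILENCE = "Eerie Silence"

def parse_sounds(cell):
    """Split a cell like 'Buzzing, Humming' into a set of sound names.
    The comma-separated format is an assumption about the raw data."""
    return {part.strip() for part in cell.split(",") if part.strip()}

def check_row(sounds):
    """Return a list of rule violations for one row's set of sounds."""
    problems = []
    others = sounds - {SILENCE}
    # Silence should be present exactly when no other sound is.
    if (SILENCE in sounds) == bool(others):
        problems.append("Eerie Silence should appear exactly when nothing else does")
    if {"Skittering", "Buzzing"} <= sounds:
        problems.append("Skittering and Buzzing appear together")
    if len(others) > 3:
        problems.append("more than three sounds at once")
    return problems

# Made-up example cells ("Humming" is a guessed name).
for cell in ("Eerie Silence", "Buzzing, Humming", "Skittering, Buzzing"):
    print(repr(cell), "->", check_row(parse_sounds(cell)) or "ok")
```

Running `check_row` over every row and collecting the non-empty results would confirm whether the rules hold across the whole dataset or only usually.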
Local Value of Pi is distributed in what looks like a very neat distribution:
3.141 + (1d41-1d41)/1000 isn't literally accurate, since we see more decimal places than that, but it's probably a reasonable way of thinking about it?
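To see what shape that dice formula gives, here is a quick simulation. It models only the approximation above, not the true generating process, which is unknown. The difference of two d41 rolls is triangular, peaked at 0 and bounded by ±40, so simulated values run from 3.101 to 3.181.

```python
import random
from collections import Counter

def simulated_pi(rng):
    """One draw of 3.141 + (1d41 - 1d41) / 1000."""
    return 3.141 + (rng.randint(1, 41) - rng.randint(1, 41)) / 1000

rng = random.Random(0)
draws = [simulated_pi(rng) for _ in range(100_000)]

# Recover the integer offset (in thousandths) of each draw from 3.141.
offsets = Counter(round((d - 3.141) * 1000) for d in draws)

# Coarse text histogram in buckets of ten thousandths.
for start in range(-40, 41, 10):
    count = sum(offsets[o] for o in range(start, min(start + 10, 41)))
    print(f"{start:+4d}: {'#' * (count // 500)}")
```

Comparing this histogram against the real column would show how close the approximation is.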
Murphy's Constant has a very weird distribution:
I'm going to guess something like this:
Performance is not looking great:
Across 10k+ sites, there have been 2 where performance was >=100%.
We do admittedly have 110k precleared sites to choose from. At the same success rate, that suggests there will be ~20 sites in the full possible data that perform acceptably, which in turn leaves very little leeway in our task. If we can't almost perfectly identify every factor that affects performance, we're not going to realistically be able to find 12 sites at 100%+.
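The arithmetic behind that ~20 is just a rate scaled up. The sketch below rounds "10k+" down to 10,000, so it comes out slightly high. With only two successes observed, the underlying rate is very uncertain, so treat the result as an order-of-magnitude figure.

```python
def expected_good_sites(good_seen, sites_seen, sites_available):
    """Scale the observed success rate up to the pool of available sites."""
    return good_seen / sites_seen * sites_available

estimate = expected_good_sites(good_seen=2, sites_seen=10_000, sites_available=110_000)
print(f"roughly {estimate:.0f} acceptable sites expected")
```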
I tweaked a few of the columns to be more user-friendly:
and then output the result as a new file that I think is more convenient.
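As an illustration of this kind of tweak, the sketch below turns the ordered text columns into numbers and splits Air Tastes Like into one 0/1 column per taste. It is a guess guided by the abbreviated column names in the correlation matrix further down, not a record of the actual changes. The category labels marked as guesses are hypothetical, since only "Somewhat", "EXTREMELY" and "Adequate" are spelled out in this post.

```python
# Ordered categories mapped to numbers. "None", "Bad" and "Good" are guessed labels.
SMELL_LEVELS = {"None": 0, "Somewhat": 1, "EXTREMELY": 2}
FENG_LEVELS = {"Bad": -1, "Adequate": 0, "Good": 1}

# Guessed full names behind the Appl / Burn / Copp / Mint abbreviations.
TASTES = ("Apples", "Burnt Toast", "Copper", "Mint")

def encode_row(row):
    """Convert one raw row (a dict of strings) into numeric features."""
    encoded = {
        "Smell": SMELL_LEVELS[row["Strange Smell"]],
        "Feng": FENG_LEVELS[row["Feng Shui"]],
    }
    # One 0/1 flag per taste, keyed by its four-letter abbreviation.
    for taste in TASTES:
        encoded[taste[:4]] = int(row["Air Tastes Like"] == taste)
    return encoded

# Made-up example row.
example = {"Strange Smell": "EXTREMELY", "Feng Shui": "Adequate", "Air Tastes Like": "Mint"}
print(encode_row(example))
```

The same flag-per-value approach would apply to Weird Sounds, where several values can be present at once.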
One of the simplest useful things we can do with this data is build a correlation matrix.[2]
| | Lo | La | Sh | De | Pi | Mu | Smell | Feng | Appl | Burn | Copp | Mint | Hum | Skit | Sque | Buzz | Perf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lo | | 0.01 | -0.01 | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 | -0.01 | -0.01 | 0.02 | 0.00 | 0.00 | 0.00 | -0.01 | 0.00 | 0.07 |
| La | 0.01 | | 0.00 | -0.01 | 0.00 | 0.00 | 0.02 | 0.01 | 0.01 | -0.02 | 0.00 | 0.00 | 0.00 | 0.02 | -0.01 | -0.02 | 0.00 |
| Sh | -0.01 | 0.00 | | 0.01 | -0.01 | 0.00 | -0.01 | -0.01 | 0.00 | -0.01 | -0.01 | 0.01 | -0.02 | 0.01 | -0.01 | -0.01 | 0.01 |
| De | 0.00 | -0.01 | 0.01 | | 0.00 | 0.02 | 0.00 | 0.00 | -0.01 | 0.01 | 0.02 | -0.01 | -0.01 | -0.01 | -0.01 | 0.01 | 0.00 |
| Pi | 0.00 | 0.00 | -0.01 | 0.00 | | -0.01 | 0.00 | -0.01 | -0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | -0.01 | 0.11 |
| Mu | 0.00 | 0.00 | 0.00 | 0.02 | -0.01 | | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | -0.01 | -0.01 | -0.40 |
| Smell | 0.00 | 0.02 | -0.01 | 0.00 | 0.00 | 0.00 | | -0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | -0.02 | -0.01 | 0.01 | -0.16 |
| Feng | -0.01 | 0.01 | -0.01 | 0.00 | -0.01 | 0.01 | -0.01 | | 0.01 | -0.02 | -0.01 | 0.01 | -0.01 | 0.01 | 0.00 | -0.01 | 0.10 |
| Appl | -0.01 | 0.01 | 0.00 | -0.01 | -0.02 | 0.00 | 0.00 | 0.01 | | -0.15 | -0.11 | -0.45 | -0.01 | 0.00 | 0.00 | 0.00 | -0.07 |
| Burn | -0.01 | -0.02 | -0.01 | 0.01 | 0.01 | 0.00 | 0.00 | -0.02 | -0.15 | | -0.05 | -0.23 | 0.00 | -0.01 | 0.00 | 0.01 | 0.09 |
| Copp | 0.02 | 0.00 | -0.01 | 0.02 | 0.00 | 0.00 | 0.00 | -0.01 | -0.11 | -0.05 | | -0.16 | 0.02 | 0.01 | 0.00 | 0.00 | 0.04 |
| Mint | 0.00 | 0.00 | 0.01 | -0.01 | 0.00 | 0.00 | 0.00 | 0.01 | -0.45 | -0.23 | -0.16 | | -0.01 | 0.01 | 0.00 | -0.01 | 0.26 |
| Hum | 0.00 | 0.00 | -0.02 | -0.01 | 0.00 | 0.01 | 0.01 | -0.01 | -0.01 | 0.00 | 0.02 | -0.01 | | 0.01 | 0.01 | 0.00 | -0.47 |
| Skit | 0.00 | 0.02 | 0.01 | -0.01 | 0.01 | 0.00 | -0.02 | 0.01 | 0.00 | -0.01 | 0.01 | 0.01 | 0.01 | | 0.00 | -0.53 | 0.06 |
| Sque | -0.01 | -0.01 | -0.01 | -0.01 | 0.00 | -0.01 | -0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | | 0.01 | -0.29 |
| Buzz | 0.00 | -0.02 | -0.01 | 0.01 | -0.01 | -0.01 | 0.01 | -0.01 | 0.00 | 0.01 | 0.00 | -0.01 | 0.00 | -0.53 | 0.01 | | -0.17 |
| Perf | 0.07 | 0.00 | 0.01 | 0.00 | 0.11 | -0.40 | -0.16 | 0.10 | -0.07 | 0.09 | 0.04 | 0.26 | -0.47 | 0.06 | -0.29 | -0.17 | |
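The matrix above came out of Excel. For anyone who would rather build one in Python, here is a minimal sketch using plain Pearson correlation and only the standard library. The tiny dataset is made up purely to exercise the code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map each column name to its correlation with every other column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names if b != a}
            for a in names}

# Made-up data: two mutually exclusive 0/1 flags plus a numeric outcome.
columns = {
    "Mint": [1, 1, 0, 0, 0, 0],
    "Appl": [0, 0, 1, 1, 0, 0],
    "Perf": [40.0, 35.0, 10.0, 15.0, 20.0, 25.0],
}

for name, row in correlation_matrix(columns).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy data the two flags correlate at -0.5 with each other simply because a row can only have one of them. Presumably the same mechanical effect is behind the negative correlations among the Air Tastes Like columns in the real matrix.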
A few things stand out:
There are two main limitations to this analysis.
If I try using the information here to make my initial guesses:
We have a very large number of possible sites we can try. Even after several requirements:
we still have 2628 entries. My guesswork score, based on Murphy's Constant, Pi, and a tiny bit of Longitude, suggests trying the following sites:
6123
10709
11789
16118
23695
24728
29720
33672
36008
48703
53187
61818
However, when I apply the same logic to the main dataset and look at how generators in sites like this actually scored, they tend to be in the 50-90% range. This is much better than the overall average of 23%, but obviously not actually good enough that I should risk my neck on them.
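That sanity check can be sketched as a small backtest: apply a selection rule to the sites with known results and compare the selected sites against the overall average. Everything specific here is hypothetical. The rows, the column keys and the thresholds are placeholders, and the rule is not the actual guesswork score. It only follows the signs in the correlation matrix (lower Murphy's Constant and higher Pi go with better performance).

```python
def backtest(rows, predicate):
    """Compare performance of the rows a rule selects against all rows."""
    chosen = [r["Performance"] for r in rows if predicate(r)]
    overall = [r["Performance"] for r in rows]
    return {
        "n_chosen": len(chosen),
        "mean_chosen": sum(chosen) / len(chosen) if chosen else None,
        "mean_overall": sum(overall) / len(overall),
        "share_at_100": (sum(p >= 100 for p in chosen) / len(chosen)) if chosen else None,
    }

def looks_promising(row):
    """HYPOTHETICAL rule with placeholder thresholds, for illustration only."""
    return row["Murphy"] < 3.0 and row["Pi"] > 3.141

# Made-up rows standing in for the historical dataset.
history = [
    {"Murphy": 2.0, "Pi": 3.150, "Performance": 80.0},
    {"Murphy": 2.5, "Pi": 3.160, "Performance": 60.0},
    {"Murphy": 6.0, "Pi": 3.120, "Performance": 10.0},
    {"Murphy": 5.0, "Pi": 3.150, "Performance": 20.0},
]

print(backtest(history, looks_promising))
```

The `share_at_100` figure is the one that matters for the task: a rule can beat the average comfortably and still almost never select a site that reaches 100%.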
Presumably digging deeper into Longitude/Latitude/Shortitude/Deltitude will provide more detail. I'll do that at some point, and will try to get a writeup of what I've done.
[1] (Personal preference note: Excel is a very bad programming language that gets used to do a bad job of solving simple programming tasks by people who really ought to just use Python, but a very good data visualization tool.)
[2] These are quite useful, and really startlingly easy to make in Excel if you use paste-transpose and mixed absolute referencing.
[3] (I hate this particular phrasing of it, because the word 'imply' can be used to mean either 'prove' or 'suggest', and while correlation doesn't prove causation it definitely does suggest causation.)