I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).
AI has solved DEFCON! Oh no!
That would be a convenient resolution to the Mugging, but it seems unlikely to actually be true. By the time you get up to numbers around $1 million, the probability of you being paid is very low, but most of that probability comes from situations like 'Elon Musk is playing a prank on me,' and in many of those situations you could also get paid $2 million.
It seems likely that 'probability of payment given offer of $2 million' is substantially more than half of 'probability of payment given offer of $1 million'.
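To make the comparison concrete, here is a toy calculation. The specific probabilities are invented for illustration, not claims about the actual scenario; the only point is that if doubling the offer less-than-halves the chance of payment, expected value rises with offer size:

```python
# Toy numbers (assumptions, not estimates): P(pay | $2M offer) is
# more than half of P(pay | $1M offer), so the larger offer has the
# larger expected value despite being less likely to pay out.
offers = [1_000_000, 2_000_000]
p_pay = [1e-6, 0.7e-6]  # hypothetical payment probabilities

for offer, p in zip(offers, p_pay):
    print(f"offer ${offer:,}: P(pay)={p:.1e}, EV=${offer * p:.2f}")
```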
I liked this one a lot. I imagined that 'train a linear classifier' would be the next step, but didn't do it due to laziness: it looks like that would probably have worked.
I do feel like my approach should have worked worse than it did. I did most of my evaluation by ignoring the scores and looking only at your historical classifications, and in the one place where I let scores guide me into overriding my initial model (moving Student P from Humblescrumble to Serpentyne), it turned out I was incorrect and the initial model would have scored better (oops).
I think there are several similar markets - the one I was looking at was at https://manifold.markets/Gabrielle/will-russia-use-chemical-or-biologi-e790d5158f6a and lacks such a comment.
EDITED: Ah, you are correct and I am wrong, the text you posted is present, it's just down in the comments section rather than under the question itself. That does make this question less bad, though it's still a bit weird that the question had to wait for someone to ask the creator that (and, again, the ambiguity remains).
I'll update the doc with links to reduce confusion - did not do that originally out of a mix of not wanting to point too aggressively at people who wrote those questions and feeling lazy.
Current model of how your mistakes work:
Your mistakes have always taken the form of giving random answers to a random set of students. You did not e.g. get worse at solving difficult problems earlier, and then gradually lose the ability to solve easy problems as well.
The probability of you giving a random answer began at 10% in 1511 (you did not allocate perfectly even then). Starting in 1700, it began to increase linearly, reaching 100% in 2000.
This logic is based on: student 37 strongly suggesting that you can make classification mistakes early, and even in obvious cases; and looking at '% of INT<10 students in Thought-Talon' and '% of COU<10 students in Dragonslayer' as relatively unambiguous mistakes we can track the frequency of.
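Written out explicitly, the error model above is a flat floor followed by a linear ramp. A sketch (the breakpoints are the ones stated in my comment; nothing else is assumed):

```python
def p_random_answer(year: int) -> float:
    """Modelled probability that the Hat sorts a given student at random.

    Flat 10% from 1511 through 1700, then a linear ramp reaching
    100% in 2000 (the parameters stated above).
    """
    if year <= 1700:
        return 0.10
    if year >= 2000:
        return 1.00
    return 0.10 + 0.90 * (year - 1700) / (2000 - 1700)

print(p_random_answer(1850))  # halfway up the ramp: 0.55
```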
No objection to you commenting. The main risk on my end is that my fundamental contrariness will lead me to disagree with you wherever possible, so if you do end up being right about everything you can lure me into being wrong just to disagree with you.
P is a very odd statblock, with huge Patience and incredibly low Courage and Integrity. (P-eter Pettigrew?) I might trust your models more than my approach on students like B, who have middle-of-the-road stats but happen to be sitting near a house boundary. I'm less sure how much I trust your models on extreme cases like P, and think there might be more benefit there to an approach that just looks at a dozen or so students with similar statblocks rather than trying to extrapolate a model out to those far values.
Based on poking at the score figures, I think I'm currently going to move student P from Humblescrumble to Serpentyne but not touch the other ambiguous ones:
Thought-Talon: A, J, O, S
Serpentyne: C, F, P
Dragonslayer: D, G, H, K, N, Q
Humblescrumble: B, E, I, L, M, R, T
Robustness analysis: seeing how the above changes when we tweak various aspects of the algorithm.
I'm not certain whether this will end up changing my views, but K in particular looks very close between Dragonslayer and Serpentyne, and P plausibly better in Serpentyne.
Good catch, fixed.
Had trouble making further progress using that method, realized I was being silly about this and there was a much easier starting solution:
Rather than trying to figure out anything whatsoever about scores, we're trying for now just to mimic what we did in the past.
Define a metric of 'distance' between two people equal to the sum of the absolute values of the differences between their stats.
To evaluate a person:
*these numbers may be varied to optimize. For example, moving the year threshold earlier makes you more certain that the students you find were correctly sorted...at the expense of making them be selected from a smaller population and so be further away from the person you're evaluating. I may twiddle these numbers in future and see if I can do better.
We can test this algorithm by trying it on the students from 1511 (and using students from 1512-1699 to find close matches). When we do this:
Sadly this method provides no insight whatsoever into the underlying world. We're copying what we did in the past, but we're not actually learning anything. I still think it's better than any explicit model I've built so far.
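For concreteness, the mimicry procedure above is just nearest-neighbour classification under the distance metric I defined. A minimal sketch (the stat names and the choice of k are illustrative assumptions, not the tuned values):

```python
from collections import Counter

def distance(a: dict, b: dict) -> int:
    """Sum of absolute stat differences (the metric defined above)."""
    return sum(abs(a[stat] - b[stat]) for stat in a)

def predict_house(student: dict, history: list, k: int = 12) -> str:
    """Assign the house most common among the k nearest historical
    students. `history` is (stats, house) pairs from years we trust;
    k=12 is an illustrative choice, not a tuned one.
    """
    nearest = sorted(history, key=lambda rec: distance(student, rec[0]))[:k]
    return Counter(house for _, house in nearest).most_common(1)[0][0]

# Tiny made-up example: a brave, not-bookish student lands with
# the similar historical students.
history = [({"COU": 9, "INT": 3}, "Dragonslayer"),
           ({"COU": 8, "INT": 2}, "Dragonslayer"),
           ({"COU": 2, "INT": 9}, "Thought-Talon")]
print(predict_house({"COU": 9, "INT": 2}, history, k=3))  # Dragonslayer
```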
This gives the following current allocations for our students (still subject to future meddling):
Thought-Talon: A, J, O, S
Serpentyne: C, F*
Dragonslayer: D, G*, H, K*, N*, Q*
Humblescrumble: B*, E*, I, L, M*, P*, R, T
where entries marked with a * are those where the nearby students were a somewhat close split, and unmarked entries are those where the nearby students were almost all in the same house.
And some questions for the GM based on something I ran into doing this (if you think these are questions you're not comfortable answering that's fine, but if they were meant to be clear one way or the other from the prompt please let me know):
The problem statement says we were 'impressively competent' at assigning students when first enchanted.