On the other hand, grading by word count, by incentivizing writing more, might lead to better outcomes through more practice.

It's probably not the most effective thing possible, but would the teachers actually be able to pull off a better method?

Just reviewed PreDCA, and while I consider it a clever solution to the wrong* problem, I don't see how it requires perfect physical modelling (maybe getting a perfect answer would require perfect modelling, but an approximate answer does not).

QACI seemed obviously overcomplicated to me; I don't care enough to check whether it requires perfect modelling, though I'm not sure I would call it MIRI-style.

*To clarify: I think PreDCA is trying to solve both (a) "how does the AI identify a target to help" and (b) "how does it help them". It is a clever solution to problem (a) for a clean-sheet, from-scratch AI, but that is the wrong problem, since an AI would more likely be trained as a non-agent or weaker agent before being made a strong agent, and so would already know what humans are. And it is a clever but inadequate attempt to solve (b), which is one of the right problems.

Can you link a specific example of what you are criticizing here? Which formalisms require simulating the universe from the beginning to identify a specific being (as opposed to, e.g., some probabilistic approximation)?

(Trying to think of examples: maybe Aumann's theorem qualifies? But I'm not sure, and I'm not sure what else does.)

Yes, but:

  1. If you were trying to design something that acts like agent 2 and were stuck in a mindset of "it must be maximizing some utility function, let's just think in utility-function terms", you might find it difficult.
  2. (Side point) I'm not sure how much the arguments in Eliezer's linked post actually apply outside the consequentialist context, so I'm not convinced that coherence necessarily implies a possible utility function for non-consequentialist agents.
  3. It might be that the closest thing to what we want that we can figure out how to make isn't coherent. In which case we would face a choice between
    a. making it and hoping that its likely self-modification towards coherence won't ruin its alignment, or
    b. making something else that is coherent to start with but is less aligned.

While (a) is risky, (b) seems worse to me.

Agree on SGCA, if only because something is likely to self-modify into one; disagree that expected utility maximization is necessarily the most productive way to think about it.

Consider the following two hypothetical agents:

Agent 1 follows the deontological rule of choosing the action that maximizes some expected utility function.

Agent 2 maximizes expected utility, where utility is defined as how well an objective god's-eye-view observer would rate Agent 2's conformance to some deontological rule.

Obviously agent 1 is more naturally expressed in utilitarian terms and agent 2 in deontological terms, though each can be expressed either way, and both can be coherent.
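
As a toy sketch (my own made-up action set and numbers, not from any source), the same point in code: two maximizers whose "utility" lives in different vocabularies:

```python
# Toy illustration: each action maps to (outcome_value, follows_rule).
# Both agents below are argmax-ers; what differs is what they argmax over.
actions = {
    "help": (10, True),
    "lie_to_help": (12, False),
    "do_nothing": (0, True),
}

# Agent 1: follows the deontological rule "choose the action that
# maximizes expected outcome value".
agent1_choice = max(actions, key=lambda a: actions[a][0])

# Agent 2: maximizes "utility", defined as an observer's rating of
# conformance to a deontological rule (rule-following first,
# outcome value only as a tie-breaker).
agent2_choice = max(actions, key=lambda a: (actions[a][1], actions[a][0]))
```

Here agent 1 picks the rule-breaking high-value action while agent 2 picks the best rule-conforming one, even though both are coherent maximizers.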

Now, when we try to define what decision procedure an aligned AI could follow, it might turn out that there's no easy way to express what we want it to do in purely utilitarian terms, but it might be easier in some other terms.

I especially think that's likely to be the case for corrigibility, but also for alignment generally.

Thanks aphyer, and thanks for enabling me to spend a lot of time learning Haskell instead of efficiently working towards solving this.

Since I have now been officially recognized as "the best" by the all-important Creator of This and Many Other Scenarios, I have some advice for others who might be inclined to participate and to prove your claim wrong:

  1. Just do it
  2. Be curious and have fun
  3. Don't set your standards for techniques or tools too high; just try something -- anything you don't already know the answer to provides new information
  4. Always keep in mind that any result you do get is incomplete and may well be misleading
  5. ...but also that the difference between map and territory is something you may be able to reason about or probe
  6. ...and also that it's always an option to switch to trying something else
  7. Don't be afraid to be "subjective" - these scenarios are for fun, learning, and of course getting the best result, not for providing some "objective" justification for a decision-maker. Use that natural neural net!

About the difficulty of the scenario:

I think the quantity of data was a major source of friction here; the scenario would likely have seen more participation and more success if both the number of class/aspect combos and the number of games had been much smaller.

While the option of a reduced dataset was provided, it's hard to get oneself to do something obviously worse when an "objectively" superior choice is available. I guess I ignored my own advice above!

I'm not going to request any more extra time at this point, since I can't promise myself or anyone else that I'd finish the things I wanted to do in any reasonable amount of time; I'll stop now and try those things later. In the end I looked at the data in LibreOffice Calc for a bit and came up with:

It looks like Sylph of Breath and Maid of Hope do decently well in pairs with Mage of Time, Knight of Blood, and each other in 4-player groups. This could be completely bogus if more-than-2-way interactions are important, but I'll go with it for the main question and also for Troll Bonus 1, since they sound nice and I can "Hope" that Space isn't required.

For Troll Bonus 2, I took the games with 4 or more players whose class/aspect combos came from the team in question, and somewhat arbitrarily weighted them by 1.5^(players with combos from the team - players with other combos). Then I took the ratio of weighted wins to weighted losses. The resulting ratios:

Prince of Hope   0.306
Maid of Time     0.407
Mage of Doom     0.420
Witch of Life    0.424
Seer of Mind     0.431
Page of Breath   0.446
Thief of Light   0.452
Knight of Blood  0.494
Rogue of Heart   0.495
Sylph of Space   0.513
Bard of Rage     0.536
Heir of Void     0.550
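
For what it's worth, the weighting scheme above can be sketched roughly like this (hypothetical data layout: each game is the set of class/aspect combos present plus a win flag, with the 4-plus-player filtering assumed done beforehand):

```python
# Weight each game by 1.5^(players with combos from the team - other players),
# then take the ratio of weighted wins to weighted losses.
TEAM = {"Prince of Hope", "Maid of Time", "Mage of Doom"}  # hypothetical team

# Made-up example games: (combos present, team won?)
games = [
    ({"Prince of Hope", "Maid of Time", "Mage of Doom", "Heir of Void"}, True),
    ({"Prince of Hope", "Bard of Rage", "Seer of Mind", "Witch of Life"}, False),
    ({"Maid of Time", "Mage of Doom", "Prince of Hope", "Thief of Light"}, False),
]

def weighted_ratio(games, team):
    wins = losses = 0.0
    for combos, won in games:
        in_team = sum(c in team for c in combos)
        weight = 1.5 ** (in_team - (len(combos) - in_team))
        if won:
            wins += weight
        else:
            losses += weight
    return wins / losses
```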

The Prince of Hope is a notable outlier and prime target for elimination by the troll asking the question. This is also the class/aspect combo with the worst winrate in the dataset as a whole, btw.

Troll Bonus 3: I feel like I still don't know much about this dataset, but it does seem that class/aspect combos should be chosen with a consistent theme. E.g. if you are going to be an impressive-sounding "Prince" or "Heir" you don't need "Hope" and should go for something suitably edgy like "Void". "Blood" tends to go with classes that sound like physical attackers. Casters might be better with mental-sounding aspects, etc.

On the vaccine chart: if people are less likely to get vaccinated when they are already close to dying, then all-cause mortality will spike among the unvaccinated and drop among the vaccinated while the vaccines are being rolled out, regardless of any Covid-related effects. That's not likely to account for the full spike, but it means you can't straightforwardly attribute the full difference to the vaccines.

A confidence interval is just an upper and lower bound according to some probability threshold. I.e., it's just probabilities, and doesn't require some super-special technique.
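
For example (a minimal sketch with made-up numbers), a 90% interval is just the 5th and 95th percentiles of whatever samples represent your uncertainty:

```python
import random

# Stand-in for "samples from your belief distribution" - here just a
# seeded normal for reproducibility.
random.seed(0)
samples = sorted(random.gauss(100, 15) for _ in range(10_000))

def interval(sorted_samples, level=0.90):
    # Lower and upper bounds at the given probability threshold:
    # cut off (1 - level) / 2 from each tail.
    tail = (1 - level) / 2
    lo = sorted_samples[int(tail * len(sorted_samples))]
    hi = sorted_samples[int((1 - tail) * len(sorted_samples)) - 1]
    return lo, hi

lo, hi = interval(samples)
```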

Regarding the Dirichlet process: 

Reading the wiki article, it seems to be designed for a particular class of problems rather than being a general solution to all problems. So it would make sense to use it when your problem falls in that class, but not otherwise.
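
For concreteness, that class of problems is roughly "clustering when the number of clusters isn't fixed in advance". A minimal sketch of the Chinese-restaurant-process view of a Dirichlet process (my own toy example, not from the article):

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Assign n points to clusters: each new point joins an existing
    cluster with probability proportional to its size, or starts a new
    cluster with probability proportional to alpha."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points in cluster k
    labels = []
    for _ in range(n):
        r = rng.uniform(0, sum(counts) + alpha)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1
                labels.append(k)
                break
            r -= c
        else:
            # r fell in the alpha-sized slice: open a new cluster
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels
```

Larger `alpha` yields more clusters; the number of clusters grows with the data rather than being set up front, which is the kind of problem the machinery is built for.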
