LW1.0 username Manfred. PhD in condensed matter physics. I think and write independently about value learning.
Nicely written. But... no? Obviously no?
Direct Instruction, for one, is a teaching method that actually has studies behind it.
How about reading a fun book and then remembering the plot?
Spaced repetition on flashcards of utter pointless trivia seems to work quite well for its intended purpose.
Learning how to operate a machine just from reading the manual is a key skill for both soldiers and grad students.
Yeah, not a ton. For what I think is the obvious reason: real-world agents are complicated and hard to reason about.
Though search up "tiling agents" for some MIRI work in this vein.
Dunno. My random guess would be Meaningness, but it's probably not.
I almost stopped reading after Alice's first sentence because https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target
The rest was better, though I think the more typical framing of this argument is stronger: what this is really about is models in RL. The thought experiment can be brought closer to real-life AI by talking about model-based RL, and more tenuous arguments can be made about whether learning a model is convergent even for nominally model-free RL.
Cool idea. I think in most cases on the list, you'll have some combination of information asymmetry and an illiquid market that make this not that useful.
Take used car sales. I put my car up for sale and three prospective buyers come to check it out. We are now basically the only four people in the world with inside knowledge about the car. If the car is an especially good deal, each buyer wants to keep that information to themselves rather than broadcast it to me or to the other buyers. I dunno man, it seems like a stretch to say that the four of us are going to find a prediction market worth it.
Yup, this all seems basically right. Though in reality I'm not that worried about the "we might outlaw some good actions" half of the dilemma. In real-world settings, actions are so multi-faceted that being able to outlaw a class of actions based on any simple property would be a research triumph.
Also see https://www.lesswrong.com/posts/LR8yhJCBffky8X3Az/using-predictors-in-corrigible-systems or https://www.lesswrong.com/posts/qpZTWb2wvgSt5WQ4H/defining-myopia for successor lines of reasoning.
I have taken the survey. Or at least the parts I can remember with my aging brain.
Yes, since you don't expect the copy of you on planet A to go anywhere, it would be paradoxical to decrease your probability that you're on planet A.
Which is why you have a 100% chance of being on planet A. At least in the third-person, we-live-in-a-causal-universe, things-go-places sense. Sure, in the subjective, internal sense, the copy of you that's on planet A can have a probability distribution over what's outside their door. But in the sense physics cares about you have a 100% probability of being on planet A both before and after the split, so nothing went anywhere.
Subjectively, you always expected your estimate of what's outside the door to change at the time of the split. It doesn't require causal interaction at the time of the split because you're just using information about timing. A lot like how if I know the bus schedule, my probability of the bus being near my house "acausally" changes over time - except weirder because an extra copy of you is added to the universe.
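The bus analogy can be put in a few lines of Python. All the numbers here are made up for illustration (the schedule, the "nearby" window, the jitter); the point is just that the estimate moves purely as a function of the clock, with no causal interaction with the bus.

```python
# Hypothetical sketch: knowing a bus schedule lets my probability
# estimate change over time "acausally" -- I'm only conditioning
# on the clock, never on an observation of the bus.

BUS_SCHEDULE = [8.0, 8.5, 9.0]   # made-up arrival times (hours)
WINDOW = 0.05                    # bus is "near my house" for ~3 minutes

def p_bus_nearby(t, jitter=0.1):
    """P(bus is near my house at clock time t).

    Crude model: each scheduled arrival is smeared uniformly over
    +/- jitter hours, so while t is inside that band the bus spends
    a WINDOW-sized fraction of it near my house."""
    p = 0.0
    for arrival in BUS_SCHEDULE:
        if abs(t - arrival) <= jitter:
            p = max(p, WINDOW / (2 * jitter))
    return min(p, 1.0)

# The estimate rises and falls as the clock ticks, with no new
# causal input about the bus itself:
for t in [7.8, 8.0, 8.2, 8.5]:
    print(t, p_bus_nearby(t))
```

The copy case is the same move plus one wrinkle: the timing information you condition on is "the split has happened," rather than "it's 8:00."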
I think there's certainly a question people want to ask when they talk about things like Q1 and Q2, but the standard way of asking them isn't right. If there is no magic essence of "you" zipping around from place to place in the universe, then the probability of "you" waking up in your body can only be 1.
My advice: rather than trying to hold the universe fixed and asking where "you" goes, hold the subjective information you have fixed and ask what the outside universe is like. When you walk out of the door, do you expect to see planet A or planet B? Etc.
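To make the recommended framing concrete, here's a minimal sketch of the credence-over-what's-outside-the-door question. The 50/50 split after copying is an indifference assumption I'm supplying for illustration, not a theorem; the structural point is that the question "what's outside my door?" stays well-posed while "where did 'you' go?" doesn't.

```python
# Sketch: hold the subjective information fixed and ask what the
# outside universe is like. The 50/50 post-split credence is an
# indifference assumption, not derived from anything.

def p_outside_door(after_split: bool) -> dict:
    """Subjective credence over what's outside the door.

    Before the split there is one copy, on planet A, and it knows it.
    After the split, copies with identical memories exist on A and B;
    with nothing internal to distinguish them, indifference assigns
    each planet equal credence."""
    if not after_split:
        return {"A": 1.0, "B": 0.0}
    return {"A": 0.5, "B": 0.5}
```

Note that nothing in this function tracks an essence moving between bodies; the credence changes at the split purely because "the split has happened" is new information.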
Neither, and that's ok.