The Principle of Predicted Improvement
I made a conjecture I think is cool. Mark Sellke proved it. I don't know what else to do with it, so I will explain why I think it's cool and give the proof here. Hopefully, you will think it's cool, too.

Suppose we are trying to assign as much probability as possible to whichever of several hypotheses is true. The law of conservation of expected evidence tells us that, for any hypothesis, we should expect to assign the same probability to that hypothesis after observing a test result as we assign to it now. Suppose that $H$ takes values $h_i$. We can express the law of conservation of expected evidence as, for any fixed $h_i$:

$$\mathbb{E}[P(H = h_i \mid D)] = P(H = h_i)$$

In English, this says that the probability we should expect to assign to $h_i$ after observing the value of $D$ equals the probability we assign to $h_i$ before we observe the value of $D$. (A short numerical sketch of this law appears at the end of this post.)

This law raises a question. If all I want is to assign as much probability to the true hypothesis as possible, and I should expect to assign the same probability I currently assign to each hypothesis after getting a new piece of data, why would I ever collect more data? A. J. Ayer pointed out this puzzle in "The Conception of Probability as a Logical Relation" (I unfortunately cannot find a link).

I. J. Good solved Ayer's puzzle in "On the Principle of Total Evidence". Good shows that if I need to act on a hypothesis, the expected value of acting after gaining an extra piece of data is always greater than or equal to the expected value of acting without it. (A sketch of Good's result also appears below.)

Although there is nothing wrong with Good's solution, I found it somewhat unsatisfying. Ayer's puzzle is purely epistemic, and while there is nothing wrong with a pragmatic solution to an epistemic puzzle, I still felt that there should be a solution that makes no reference to acts or utility at all. Herein I present a theorem that I think constitutes such a solution. I have decided to call it the principle of predicted improvement.
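To make the law concrete, here is a minimal numerical sketch in Python. The two-hypothesis, two-outcome setup and all the numbers are made-up illustrations, not anything from the proof; it just checks that averaging the posterior over possible observations recovers the prior.

```python
# Minimal check of conservation of expected evidence with made-up numbers:
# two hypotheses h1, h2 and a binary observation D in {d0, d1}.

# Prior over hypotheses.
prior = {"h1": 0.3, "h2": 0.7}

# Likelihoods P(D = d | H = h).
likelihood = {
    "h1": {"d0": 0.8, "d1": 0.2},
    "h2": {"d0": 0.4, "d1": 0.6},
}

# Marginal probability of each outcome: P(D = d) = sum_h P(d | h) P(h).
p_d = {d: sum(likelihood[h][d] * prior[h] for h in prior) for d in ("d0", "d1")}

# Posterior P(H = h | D = d) by Bayes' rule.
def posterior(h, d):
    return likelihood[h][d] * prior[h] / p_d[d]

# Expected posterior for each hypothesis: E[P(H = h | D)] = sum_d P(d) P(h | d).
for h in prior:
    expected_posterior = sum(p_d[d] * posterior(h, d) for d in p_d)
    print(h, round(expected_posterior, 6), prior[h])  # the two columns match
```

Whatever priors and likelihoods you plug in, the two printed numbers agree: summing P(d) P(h|d) over d is just the law of total probability, which collapses back to P(h).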
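Good's pragmatic answer can be checked the same way. The sketch below adds a made-up utility table (again, purely illustrative): choosing the best action after observing D, under each posterior, is worth at least as much in expectation as committing to the best action under the prior.

```python
# Sketch of Good's value-of-information result with made-up numbers.

prior = {"h1": 0.3, "h2": 0.7}
likelihood = {
    "h1": {"d0": 0.8, "d1": 0.2},
    "h2": {"d0": 0.4, "d1": 0.6},
}
# Utility of taking action a when hypothesis h is true (illustrative).
utility = {
    "a1": {"h1": 10.0, "h2": 0.0},
    "a2": {"h1": 0.0, "h2": 5.0},
}

p_d = {d: sum(likelihood[h][d] * prior[h] for h in prior) for d in ("d0", "d1")}

def posterior(h, d):
    return likelihood[h][d] * prior[h] / p_d[d]

def expected_utility(action, dist):
    return sum(utility[action][h] * dist[h] for h in dist)

# Acting now: take the best action under the prior.
value_now = max(expected_utility(a, prior) for a in utility)

# Acting after observing D: best action under each posterior, averaged over outcomes.
value_after = sum(
    p_d[d] * max(expected_utility(a, {h: posterior(h, d) for h in prior}) for a in utility)
    for d in p_d
)

print(value_now, value_after)  # here 3.5 vs 4.5; value_after >= value_now always
```

With these particular numbers, the agent who waits for the observation nets an expected utility of 4.5 against 3.5 for acting immediately, and Good's theorem guarantees the inequality never goes the other way.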