I'm a final-year PhD student at the University of Amsterdam working on AI Safety and Alignment, specifically on safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/
I repeatedly refer people to this post, and they repeatedly tell me that it explains a great many conversations in their real lives in a way they previously found hard to pin down. It's a great post.
Agreed that the post is not about causality.
It sounds bizarre to me that you say you don't have this experience. Here is an example of this behavior happening to me recently:
It then invented another DOI.
This is very common behavior in my experience.
Good idea, I now added the following to the opening paragraphs of the section doing the comparisons:
Importantly, due to Theorem 4, this means that the Solomonoff prior and the a priori prior lead, up to a constant, to the same predictions on sequences. The advantages of the priors that we analyze are thus not statements about their induced predictive distributions.
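For concreteness, here is one way to make "the same predictions up to a constant" precise, assuming Theorem 4 establishes mutual dominance between the two priors (the constants $c, c'$ here are placeholders, not quantities from the post):

```latex
% Assumed mutual dominance between the Solomonoff prior M and the
% a priori prior \xi: there are constants c, c' > 0 with
%   M(x) \ge c\,\xi(x) \quad\text{and}\quad \xi(x) \ge c'\,M(x)
% for all finite strings x. Then their log-losses on any prefix
% x_{1:n} differ by at most a constant, uniformly in n:
\left| \log_2 \frac{1}{M(x_{1:n})} - \log_2 \frac{1}{\xi(x_{1:n})} \right|
  \le \max\!\left( \log_2 \tfrac{1}{c},\; \log_2 \tfrac{1}{c'} \right).
```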
I answered in the parallel thread, which is probably getting down to the crux now. To add a few more points:
Okay, I think I overstated the extent to which the difference in priors matters in the previous comments and crossed out "practical".
Basically, I was right that the prior that gives 100% to the mixture cannot update: it gives all its weight to the mixture no matter how much data comes in. However, the mixture itself can update with more data and shift between world 0 and world 1.
I can see that this feels perhaps very syntactic, but in my mind the two priors still feel different. One of them is saying "The world first samples a bit indicating whether the world will continue with world 0 or world 1", and the other one is saying "I am uncertain on whether we live in world 0 or world 1".
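To illustrate the distinction, here is a minimal numerical sketch (the coin biases and data are hypothetical, chosen only for illustration): prior A puts 100% on the single mixture distribution and never updates that weight, while prior B is uncertain between world 0 and world 1 and updates; yet both induce the exact same predictive distribution.

```python
# Hypothetical two-world setup: world 0 is an i.i.d. coin with bias 0.2,
# world 1 an i.i.d. coin with bias 0.8.
# Prior A: 100% weight on the mixture nu = 0.5*mu0 + 0.5*mu1.
# Prior B: 50/50 weight on mu0 and mu1, updated by Bayes' rule.

def seq_prob(p, data):
    """Probability of a binary sequence under an i.i.d. coin with bias p."""
    prob = 1.0
    for bit in data:
        prob *= p if bit == 1 else 1 - p
    return prob

data = [1, 1, 1, 0, 1]  # observed bits (made up for the example)
p0, p1 = 0.2, 0.8
m0, m1 = seq_prob(p0, data), seq_prob(p1, data)

# Prior B: the posterior over the two worlds shifts with the data.
post0 = 0.5 * m0 / (0.5 * m0 + 0.5 * m1)
post1 = 1 - post0

# Prior A: the weight on nu stays 1 forever -- there is nothing to update.
weight_on_nu = 1.0

# Both give the same predictive probability for the next bit:
pred_B = post0 * p0 + post1 * p1               # Bayes-mixture predictive
nu_data = 0.5 * m0 + 0.5 * m1                  # nu(data)
nu_data_then_1 = 0.5 * m0 * p0 + 0.5 * m1 * p1 # nu(data followed by 1)
pred_A = nu_data_then_1 / nu_data              # nu's conditional predictive

print(post0, post1)    # posterior has shifted heavily toward world 1
print(pred_A, pred_B)  # the two predictives coincide
```

So the difference really is in the bookkeeping, not in the predictions: the mixture's conditional probability is algebraically identical to the Bayes posterior predictive.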
Yes. There are lots of different settings one could consider, e.g.:
For all of these cases, one can compare different notions of complexity (plain K-complexity, prefix complexity, monotone complexity, where applicable) with algorithmic probability. My sense is that the correspondence is exact only for universal prefix machines and finite strings, but I didn't consider all settings.
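For reference, the exact correspondence in the prefix-machine, finite-string setting is Levin's coding theorem:

```latex
% Levin's coding theorem: for a universal prefix machine U with
% prefix complexity K and algorithmic (a priori) probability
%   m(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|},
% the two notions agree up to an additive constant:
K(x) = -\log_2 m(x) + O(1).
```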
18-month postdoc position in Singular Learning Theory for Machine Learning Models in Amsterdam: https://werkenbij.uva.nl/en/vacancies/postdoc-position-in-singular-learning-theory-for-machine-learning-models-netherlands-14741
The PI, Patrick Forré, is an experienced mathematician with a background in arithmetic geometry, and he also has extensive experience in machine learning. I recommend applying! Feel free to ask me questions if you want; Patrick has been my PhD advisor.