Are there any theorems that use SLT to quantify out-of-distribution generalization?
There is one now, though whether you still want to count this as part of SLT or not is a matter of definition.
I’ve said this many times in conversations, but I don’t think I’ve ever written it out explicitly in public, so:
I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.
Yes, performing that subtraction on inequality (1.1) does yield the stated result. So, since the total KL divergence summed over the first n data points is bounded by the same constant for any n, and KL divergences are never negative, the per-step KL divergence must go to zero for large n fast enough for the sum not to diverge to infinity, which means that on average it has to fall faster than 1/n.
Though note that in real life, where n is finite, the per-step KL divergence can still go to zero very unevenly; it doesn't have to be monotonic.
For example, you might have a per-step KL divergence of roughly zero from shortly after the start of the sequence up to some data point n_1, then suddenly see a small upward spike at n_1. A way this might happen is if the first n_1 data points the inductor receives come from one data distribution, and the subsequent data points are drawn from a very different distribution. If there is a program q that is shorter than the true program p (so K(q) < K(p)) and that can predict the data labels for the first distribution but not the second distribution, whereas p can predict both distributions, the inductor would favour q over p and assign it higher probability until it starts seeing data from the second distribution. It might make up to roughly K(q) bits of prediction error early on before its posterior becomes largely dominated by programs whose predictions match the data so far. After that, the KL divergence would go to zero for a while, because everything is getting predicted accurately. Then, at n_1, when we switch to the second data distribution, the KL divergence would go up again for a while, until the inductor has added roughly another K(p) - K(q) bits of prediction error to the total, so that the total stays within the overall bound. From then on, the inductor would make predictions that match p, and so the KL divergence would go back down to zero again and this time stay zero permanently.
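Here's a minimal numerical sketch of that story (my own toy illustration, with made-up description lengths, not anything from the original exchange): a Bayesian mixture over three stand-in hypotheses, a very short uninformative "junk" program, the shorter program q, and the true program p, with prior weights proportional to 2^-K. The per-step KL divergence spikes briefly at the start and again at the distribution switch, while the running total stays within a budget of roughly K(p) bits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for programs in the story above (description lengths made up):
#   junk: very short program that predicts 0.5 regardless of the data
#   q   : short program that only models the first data distribution
#   p   : longer "true" program that models both distributions
K = np.array([1, 6, 14])              # description lengths in bits
n_switch, n_total = 300, 600          # the data distribution switches at n_1 = 300

# True probability of label 1: first distribution, then a very different one.
p_true = np.where(np.arange(n_total) < n_switch, 0.9, 0.1)

# Each hypothesis' predicted probability of label 1 at every step.
preds = np.vstack([
    np.full(n_total, 0.5),            # junk: uninformative
    np.full(n_total, 0.9),            # q: always predicts the first distribution's statistics
    p_true,                           # p: matches the truth throughout
])

log_w = -K * np.log(2.0)              # log prior weights proportional to 2^-K
kl = np.zeros(n_total)                # per-step KL(truth || mixture), in bits

for n in range(n_total):
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                  # posterior over the three hypotheses
    mix = float(w @ preds[:, n])                  # mixture's prediction for P(label = 1)
    pt = p_true[n]
    kl[n] = pt * np.log2(pt / mix) + (1 - pt) * np.log2((1 - pt) / (1 - mix))
    y = rng.random() < pt                         # sample the true label
    lik = preds[:, n] if y else 1.0 - preds[:, n]
    log_w += np.log(lik)                          # Bayesian update on the observed label

print(f"total KL: {kl.sum():.2f} bits (expected total is bounded by about K(p) = {K[2]} bits)")
print("per-step KL, first steps:      ", kl[:4].round(3))
print("per-step KL, around the switch:", kl[n_switch:n_switch + 4].round(3))
print("per-step KL, at the end:       ", kl[-4:].round(4))
```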
I think a potential drawback of this strategy is that people tend to become more hesitant to argue with you. Their instincts tell them you’re a high-status person they can’t afford to offend or risk looking stupid in front of. If you seem less confident, less cool, and less high-status, the mental barrier for others to be disagreeable, share weird ideas, or voice confusion in your presence is lower.
I try to remember to show off some uncoolness and uncertainty for this reason, especially around more junior people. I used to have a big seal plushie on my desk in the office, partially because I just like cute stuffed animals, but also to try to signal that I am approachable and non-threatening and can be safely disagreed with.
I don’t think quantum immortality changes anything. You can rephrase this in terms of standard probability theory, condition on them continuing to have subjective experience, and still get to the same calculus.
I agree that quantum mechanics is not really central for this on a philosophical level. You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you. It's just that it seems at present very unclear and arguable whether the physical universe is in fact anywhere near that large, whereas I would claim that a universal wavefunction which constantly decoheres into different branches containing different versions of us is pretty strongly implied to be a thing by the laws of physics as we currently understand them.
However, only considering the branches in which you survive, or conditioning on having subjective experience after the suicide attempt, ignores the counterfactual suffering prevented in all the branches (or probability mass) in which you did die. Those branches may each be less unpleasant than the ones in which you survived, but there are many, many more of them! Ignoring them biases the reasoning toward rare survival tails that don't actually dominate the expected utility.
It is very late here and I should really sleep instead of discussing this, so I won't be able to reply as in-depth as this probably merits. But, basically, I would claim that this is not the right way to do expected utility calculations when it comes to ensembles of identical or almost-identical minds.
A series of thought experiments might help illustrate part of where my position comes from:
In real life, if you die in the vast majority of branches caused by some event, i.e. that's where the majority of the amplitude is, but you survive in some, the calculation for your anticipated experience would seem to not include the branches where you die for the same reason it doesn't include the dead copies in thought experiments 2 and 3.
(I think Eliezer may have written about this somewhere as well using pretty similar arguments, maybe in the quantum physics sequence, but I can't find it right now.)
I don't think it proves too much. Informed decision-making comes in degrees, and some domains are just harder? Like, I think my threshold for leaving people free to make their own mistakes if they are the only ones harmed by them is very low, compared to where the human population average seems to be at the moment. But my threshold is, in fact, greater than zero.
For example, there are a bunch of things I think bystanders should generally prevent four-year-old human children from doing, even if the children insist that they want to do them. I know that stopping four-year-old children from doing these things will be detrimental in some cases, and that having such policies is degrading to the children's agency. I remember what it was like being four years old and feeling miserable because of kindergarten teachers who controlled my day and thought they knew what was best for me. I still think the tradeoff is worth it on net in some cases.
I just think that the suicide thing happens to be a case where doing informed decision-making is maybe just too tough for way too many humans and thus some form of ban could plausibly be worth it on net. Sports betting is another case where I was eventually convinced that maybe a legal ban of some form could be worth it.
I think very very many people are not making an informed decision when they decide to commit suicide.
For example, I think quantum immortality is quite plausibly a thing. Very few people know about quantum immortality and even fewer have seriously thought about it. This means that almost everyone on the planet might have a very mistaken model of what suicide actually does to their anticipated experience.[1] Also, many people are religious and believe in a pleasant afterlife. Many people considering suicide are mentally ill in a way that compromises their decision making. Many people think transhumanism is impossible and won't arrange for their brain to be frozen for that reason.
I agree that there is some threshold on the fraction of ill-considered suicides relative to total suicides such that suicide should be legal if we were below that threshold. I used to think we were maybe below that threshold. After I began studying physics at uni and so started taking quantum immortality more seriously, I switched to thinking we are maybe above the threshold.
You might find yourself in a branch where your suicide attempt failed, but a lot of your body and mind were still destroyed. If you keep exponentially decreasing the amplitude of your anticipated future experience in the universal wave function further, you might eventually find that it is now dominated by contributions from weird places and branches far-off in spacetime or configuration space that were formerly negligible, like aliens simulating you for some negotiation or other purpose.
I don't really know yet how to reason well about what exactly the most likely observed outcome would be here. I do expect that by default, without understanding and careful engineering of a kind our civilisation doesn't remotely have the capability for yet, it'd tend to be very Not Good.
Assuming that the bits-to-parameters encoding can be relaxed, there's some literature on redundant computations in neural networks. If the feature vectors in a weight matrix aren't linearly independent, for example, the same computation can be "spread" over many linearly dependent features, with the result that there are no free parameters even though the total amount of computational work being done is the same.
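As a toy illustration of the kind of redundancy described above (a made-up sketch, not from any particular paper): the same rank-1 linear computation written once with visibly unused parameters, and once "spread" across several linearly dependent rows so that no parameter is zero, even though the function computed is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 4

# One underlying "feature": the composite map is f(x) = c * <v, x>  (rank 1).
v = rng.standard_normal(d_in)
c = 1.7

# Version A: the feature sits in a single row; the remaining rows and readout
# weights are zero, i.e. visibly unused / free to vary.
W_a = np.zeros((d_hidden, d_in))
W_a[0] = v
a = np.array([c, 0.0, 0.0, 0.0])

# Version B: the same rank-1 computation spread over all rows. Every row is a
# nonzero multiple of v and every readout weight is nonzero, yet the composite
# linear map is identical because sum_i b_i * alpha_i = c.
alpha = np.array([0.5, -1.0, 2.0, 0.25])
b = rng.standard_normal(d_hidden)
b[-1] = (c - b[:-1] @ alpha[:-1]) / alpha[-1]   # enforce sum_i b_i * alpha_i = c
W_b = np.outer(alpha, v)

x = rng.standard_normal(d_in)
print(a @ (W_a @ x), b @ (W_b @ x), c * (v @ x))   # all three agree

# No entry of W_b or b is zero, but the map is still effectively rank 1: the
# simplicity shows up as directions in parameter space we can move along without
# changing the function (e.g. rescale row i by t and b_i by 1/t), rather than
# as parameters that are literally unused.
```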
There are a few other cases like this where we know how various specific forms of simplicity in the computation map onto freedom in the parameters. But those are not enough in this case. We need more freedom than that.
If every bit of every weight were somehow used to store one bit of the code, excepting those weights used to simulate the UTM, that should suffice to derive the conjecture, yes.[1]
I think that's maybe even harder than what I tried to do though. It's theoretically fine if our scheme is kind of inefficient in terms of how much code it can store in a given number of parameters, so long as the leftover parameter description bits are free to vary.
There'd be some extra trickiness in that, under these definitions, the parameters are technically real numbers and thus have infinitely many bits of storage capacity, though in real life they're of course actually finite-precision floating-point numbers.
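To make the finite-precision version concrete, here is one crude and deliberately inefficient packing scheme, purely as a sketch and not the construction discussed above: the hypothetical store/load helpers below write a code's bits into the low-order mantissa bits of float32 parameters, so those bits carry the code while all the higher-order bits of every parameter remain free to vary.

```python
import numpy as np

BITS_PER_PARAM = 8   # how many low-order mantissa bits of each float32 we claim

def store(code_bits, params):
    """Write code_bits (0s and 1s, a multiple of BITS_PER_PARAM long) into the
    lowest mantissa bits of each float32 parameter; higher-order bits untouched."""
    raw = params.astype(np.float32).view(np.uint32)
    mask = np.uint32((1 << BITS_PER_PARAM) - 1)
    for i in range(0, len(code_bits), BITS_PER_PARAM):
        value = int("".join(map(str, code_bits[i:i + BITS_PER_PARAM])), 2)
        j = i // BITS_PER_PARAM
        raw[j] = (raw[j] & ~mask) | np.uint32(value)
    return raw.view(np.float32)

def load(params, n_bits):
    """Read the stored bits back out of the parameters."""
    raw = params.view(np.uint32)
    bits = []
    for j in range(n_bits // BITS_PER_PARAM):
        chunk = format(int(raw[j]) & ((1 << BITS_PER_PARAM) - 1), f"0{BITS_PER_PARAM}b")
        bits.extend(int(b) for b in chunk)
    return bits

rng = np.random.default_rng(0)
code = [int(b) for b in rng.integers(0, 2, size=40)]   # 40 "code bits" to hide
params = rng.standard_normal(16).astype(np.float32)    # 16 toy parameters

packed = store(code, params)
assert load(packed, len(code)) == code          # the code is exactly recoverable
print(np.max(np.abs(packed - params)))          # the parameters barely changed
```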
That may be true[1]. But it doesn't seem like a particularly useful answer?
"The optimization target is the optimization target."
For the outer optimiser that builds the AI