Lucius Bushnaq

AI notkilleveryoneism researcher, focused on interpretability. 

Personal account, opinions are my own. 

I have signed no contracts or agreements whose existence I cannot mention.

Comments (sorted by newest)
Alexander Gietelink Oldenziel's Shortform
Lucius Bushnaq · 17h

That may be true[1]. But it doesn't seem like a particularly useful answer?

"The optimization target is the optimization target."

  1. ^

    For the outer optimiser that builds the AI.

Neural networks generalize because of this one weird trick
Lucius Bushnaq · 1d

Are there any theorems that use SLT to quantify out-of-distribution generalization?

There is one now, though whether you still want to count this as part of SLT or not is a matter of definition.

Safety researchers should take a public stance
Lucius Bushnaq · 4d

I’ve said this many times in conversations, but I don’t think I’ve ever written it out explicitly in public, so:

I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.

From SLT to AIT: NN generalisation out-of-distribution
Lucius Bushnaq · 5d

Yes, subtracting $\sum_n H(P_\mu(\cdot\mid x_n))$ from inequality (1.1) does yield $\sum_{n=1}^{N} D_{\mathrm{KL}}(P_\mu(\cdot\mid x_n), P_{M_1}(\cdot\mid x_n, D_{<n})) \le C(\mu, M_1)$. So, since the total KL divergence summed over the first $N$ data points is bounded by the same constant for any $N$, and KL divergences are never negative, $D_{\mathrm{KL}}(P_\mu(\cdot\mid x_n), P_{M_1}(\cdot\mid x_n, D_{<n}))$ must go to zero for large $n$ fast enough for the sum to not diverge to infinity, which implies it has to go to zero faster than $1/n$.
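Spelling that step out, with (1.1) understood as the bound of the total cross-entropy by the total true entropy plus the constant (writing $H(p,q)$ for cross-entropy):

$$\sum_{n=1}^{N} H\big(P_\mu(\cdot\mid x_n),\, P_{M_1}(\cdot\mid x_n, D_{<n})\big) \;\le\; \sum_{n=1}^{N} H\big(P_\mu(\cdot\mid x_n)\big) + C(\mu, M_1),$$

and since $D_{\mathrm{KL}}(p,q) = H(p,q) - H(p)$, subtracting $\sum_{n=1}^{N} H\big(P_\mu(\cdot\mid x_n)\big)$ from both sides gives

$$\sum_{n=1}^{N} D_{\mathrm{KL}}\big(P_\mu(\cdot\mid x_n),\, P_{M_1}(\cdot\mid x_n, D_{<n})\big) \;\le\; C(\mu, M_1) \quad \text{for every } N.$$

Nonnegative terms with partial sums bounded uniformly in $N$ mean the series converges, so the terms go to zero, and in particular they cannot stay above $c/n$ for any fixed $c > 0$.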

Though note that in real life, where $N$ is finite, $D_{\mathrm{KL}}(P_\mu(\cdot\mid x_n), P_{M_1}(\cdot\mid x_n, D_{<n}))$ can still go to zero very unevenly; it doesn't have to be monotonic.

For example, you might have $D_{\mathrm{KL}}(P_\mu(\cdot\mid x_n), P_{M_1}(\cdot\mid x_n, D_{<n})) = 0$ from $n = 10^3$ to $n = 10^6$, then suddenly see a small upward spike at $n = 10^6 + 1$. A way this might happen is if the first $10^6$ data points the inductor receives come from one data distribution, and the subsequent data points are drawn from a very different distribution. If there is a program $\mu'$ that is shorter than $\mu$ (so $C(\mu', M_1) < C(\mu, M_1)$) and that can predict the data labels for the first distribution but not the second distribution, whereas $\mu$ can predict both distributions, the inductor would favour $\mu'$ over $\mu$ and assign it higher probability until it starts seeing data from the second distribution. It might make up to $C(\mu', M_1)$ bits of prediction error early on before its posterior becomes largely dominated by predictions that match $\mu'$ at $n = 10^3$. After that, the KL divergence would go to zero for a while because everything is getting predicted accurately. Then, at $n = 10^6 + 1$, when we switch to the second data distribution, the KL divergence would go up again for a while, until the inductor has added another $\le C(\mu, M_1) - C(\mu', M_1)$ bits of prediction error to the total KL divergence. From then on the inductor would make predictions that match $\mu$, and so the KL divergence would go back down to zero again and this time stay zero permanently.
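Here's a toy numerical sketch of that dynamic, in case it helps (entirely made-up numbers: two hypotheses, hypothetical description lengths of 10 and 20 bits, and a switch after 300 data points; with only two hypotheses in the mixture it only shows the second spike, the one at the distribution switch):

```python
import numpy as np

# Toy Bayesian mixture over two hypotheses predicting binary labels.
# "short" only models the first data distribution; "full" models both.
# Prior weights ~ 2^(-description length); the description lengths and
# regime statistics below are made up purely for illustration.
len_short, len_full = 10, 20
log_post = np.log(np.array([2.0 ** -len_short, 2.0 ** -len_full]))  # log-posterior, initialised at the log-prior

def predict(hypothesis, regime):
    """P(label = 1) under each hypothesis, given the current data regime."""
    if hypothesis == "short":
        return 0.9                      # always assumes regime-1 statistics
    return 0.9 if regime == 1 else 0.1  # "full" tracks the true regime

rng = np.random.default_rng(0)
switch = 300                            # step at which the data distribution changes
kl_per_step = []

for n in range(600):
    regime = 1 if n < switch else 2
    p_true = predict("full", regime)    # true label distribution this step
    # Posterior-weighted mixture prediction.
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    p_mix = w[0] * predict("short", regime) + w[1] * predict("full", regime)
    # Per-step KL divergence of the true distribution from the mixture's prediction.
    kl_per_step.append(p_true * np.log(p_true / p_mix)
                       + (1 - p_true) * np.log((1 - p_true) / (1 - p_mix)))
    # Sample a label and update the posterior.
    y = rng.random() < p_true
    for i, h in enumerate(["short", "full"]):
        p1 = predict(h, regime)
        log_post[i] += np.log(p1 if y else 1.0 - p1)

print(f"total KL over the run: {sum(kl_per_step):.2f} nats "
      f"(compare C(full) - C(short) = {len_full - len_short} bits "
      f"= {(len_full - len_short) * np.log(2):.2f} nats)")
print(f"KL just before / after the switch: "
      f"{kl_per_step[switch - 1]:.3f} / {kl_per_step[switch]:.3f} nats")
```

The per-step KL is zero while both hypotheses agree, spikes when the data distribution switches, and the total extra error accumulated after the switch is on the order of the description-length gap between the two hypotheses.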

How To Dress To Improve Your Epistemics
Lucius Bushnaq · 7d

I think a potential drawback of this strategy is that people tend to become more hesitant to argue with you. Their instincts tell them you’re a high-status person they can’t afford to offend or risk looking stupid in front of. If you seem less confident, less cool, and less high-status, the mental barrier for others to be disagreeable, share weird ideas, or voice confusion in your presence is lower.

I try to remember to show off some uncoolness and uncertainty for this reason, especially around more junior people. I used to have a big seal plushie on my desk in the office, partially because I just like cute stuffed animals, but also to try to signal that I am approachable and non-threatening and can be safely disagreed with.

Mikhail Samin's Shortform
Lucius Bushnaq · 16d

I don’t think quantum immortality changes anything. You can reframe this in terms of standard probability theory, condition on them continuing to have subjective experience, and still get to the same calculus.

I agree that quantum mechanics is not really central for this on a philosophical level. You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you. It's just that it seems at present very unclear and arguable whether the physical universe is in fact anywhere near that large, whereas I would claim that a universal wavefunction which constantly decoheres into different branches containing different versions of us is pretty strongly implied to be a thing by the laws of physics as we currently understand them. 


> However, only considering the branches in which you survive, or conditioning on having subjective experience after the suicide attempt, ignores the counterfactual suffering prevented in all the branches (or probability mass) in which you did die, which may be less unpleasant than the branches in which you survived, but are many many more in number! Ignoring those branches biases the reasoning toward rare survival tails that don’t dominate the actual expected utility.

It is very late here and I should really sleep instead of discussing this, so I won't be able to reply as in-depth as this probably merits. But, basically, I would claim that this is not the right way to do expected utility calculations when it comes to ensembles of identical or almost-identical minds.

A series of thought experiments might help illustrate part of where my position comes from:

  1. Imagine someone tells you that they will put you to sleep and then make two copies of you, identical down to the molecular level. They will place you in a room with blue walls. They will place one copy of you in a room with red walls, and the other copy in another room with blue walls. Then they will wake all three of you up.

    What color do you anticipate seeing after you wake up, and with what probability? 

    I'd say 2/3 blue, 1/3 red. Because there will now be three versions of me, and until I look at the walls I won't know which one I am.
  2. Imagine someone tells you that they will put you to sleep and then make two copies of you. One copy will not include a brain; it's just a dead body with an empty skull. The other copy will be identical to you down to the molecular level. Then they will place you in a room with blue walls, and the living copy in a room with red walls. Then they will wake you and the living copy up.

    What color do you anticipate seeing after you wake up, and with what probability? Is there a 1/3 probability that you 'die' and don't experience waking up because you might end up 'being' the corpse-copy?

    I'd say 1/2 blue, 1/2 red, and there is clearly no probability of me 'dying' and not experiencing waking up. It's just a bunch of biomass that happens to be shaped like me.
  3. As in 2, but the corpse-copy is created fully intact rather than without a brain, and its brain is then destroyed while it is still unconscious. Should that change our anticipated experience? Do we now have a 1/3 chance of dying in the sense that we might not experience waking up? Is there some other relevant sense in which we die, even if it does not affect our anticipated experience?

    I'd say no and no. This scenario is identical to 2 in terms of the relevant information processing that is actually occurring. The corpse-copy will have a brain, but it will never get to use it, so it won't affect my anticipated experience in any way. Adding more dead copies doesn't change my anticipated experience either. My best-scoring prediction will be that I have a 1/2 chance of waking up to see red walls, and a 1/2 chance of waking up to see blue walls.
     

In real life, if you die in the vast majority of branches caused by some event, i.e. that's where the majority of the amplitude is, but you survive in some, the calculation for your anticipated experience would seem to exclude the branches where you die, for the same reason it excludes the dead copies in thought experiments 2 and 3.
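To make the counting explicit, here's a minimal toy formalisation (purely illustrative, nothing deep): anticipated experience gets computed only over the copies or branches that actually wake up and process anything, renormalised by their weight, and dead copies or dead branches simply drop out.

```python
from collections import defaultdict

def anticipated_experience(branches):
    """branches: list of (observation, weight, wakes_up). Copies/branches that
    never wake up contribute nothing; the rest are renormalised by weight."""
    live = [(obs, w) for obs, w, wakes_up in branches if wakes_up]
    total = sum(w for _, w in live)
    dist = defaultdict(float)
    for obs, w in live:
        dist[obs] += w / total
    return dict(dist)

# Thought experiment 1: original plus two molecule-identical copies, all wake up.
print(anticipated_experience([("blue", 1, True), ("red", 1, True), ("blue", 1, True)]))
# ≈ {'blue': 0.67, 'red': 0.33}

# Thought experiments 2 and 3: one copy never wakes up (no brain / brain destroyed).
print(anticipated_experience([("blue", 1, True), ("red", 1, True), ("corpse", 1, False)]))
# {'blue': 0.5, 'red': 0.5}

# The real-life analogue: almost all of the amplitude is in branches where you die.
print(anticipated_experience([("dead", 0.999, False), ("survived, injured", 0.001, True)]))
# {'survived, injured': 1.0}
```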

(I think Eliezer may have written about this somewhere as well using pretty similar arguments, maybe in the quantum physics sequence, but I can't find it right now.)

Mikhail Samin's Shortform
Lucius Bushnaq · 16d

I don't think it proves too much. Informed decision-making comes in degrees, and some domains are just harder? Like, I think my threshold for leaving people free to make their own mistakes if they are the only ones harmed by them is very low, compared to where the human population average seems to be at the moment. But my threshold is, in fact, greater than zero.

For example, there are a bunch of things I think bystanders should generally prevent four-year-old human children from doing, even if the children insist that they want to do them. I know that stopping four-year-old children from doing these things will be detrimental in some cases, and that having such policies is degrading to the children's agency. I remember what it was like being four years old and feeling miserable because of kindergarten teachers who controlled my day and thought they knew what was best for me. I still think the tradeoff is worth it on net in some cases.

I just think that the suicide thing happens to be a case where doing informed decision-making is maybe just too tough for way too many humans and thus some form of ban could plausibly be worth it on net. Sports betting is another case where I was eventually convinced that maybe a legal ban of some form could be worth it.

Mikhail Samin's Shortform
Lucius Bushnaq · 16d

I think very very many people are not making an informed decision when they decide to commit suicide. 

For example, I think quantum immortality is quite plausibly a thing. Very few people know about quantum immortality and even fewer have seriously thought about it.  This means that almost everyone on the planet might have a very mistaken model of what suicide actually does to their anticipated experience.[1] Also, many people are religious and believe in a pleasant afterlife. Many people considering suicide are mentally ill in a way that compromises their decision making. Many people think transhumanism is impossible and won't arrange for their brain to be frozen for that reason.

I agree that there is some threshold on the fraction of ill-considered suicides relative to total suicides such that suicide should be legal if we were below that threshold. I used to think we were maybe below that threshold. After I began studying physics at uni and so started taking quantum immortality more seriously, I switched to thinking we are maybe above the threshold. 

  1. ^

    You might find yourself in a branch where your suicide attempt failed, but a lot of your body and mind were still destroyed. If you keep exponentially decreasing the amplitude of your anticipated future experience in the universal wave function further, you might eventually find that it is now dominated by contributions from weird places and branches far-off in spacetime or configuration space that were formerly negligible, like aliens simulating you for some negotiation or other purpose. 

    I don't really know yet how to reason well about what exactly the most likely observed outcome would be here. I do expect that by default, without understanding and careful engineering that our civilisation doesn't remotely have the capability for yet, it'd tend to be very Not Good.
     

From SLT to AIT: NN generalisation out-of-distribution
Lucius Bushnaq · 18d

Assuming that the bits to parameters encoding can be relaxed, there's some literature about redundant computations in neural networks. If the feature vectors in a weight matrix aren't linearly independent, for example, the same computation can be "spread" over many linearly dependent features, with the result that there are no free parameters but the total amount of computational work is the same.

There are a few other cases like this where we know how various specific forms of simplicity in the computation map onto freedom in the parameters. But those are not enough in this case. We need more freedom than that.
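A made-up linear toy of the kind of redundancy I mean here (not taken from that literature): the same map can be computed through one hidden feature, or "spread" across several hidden features whose input weights are all scalar multiples of the same direction. Perturbing any single parameter of the spread version changes the function, so none of its parameters are free, yet the computation being expressed is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 8
v = rng.standard_normal(d_in)    # input feature direction
u = rng.standard_normal(d_out)   # output direction

# Network A: a single hidden unit reads the feature <v, x> and writes it out along u.
W_in_A = v[None, :]              # shape (1, d_in)
W_out_A = u[:, None]             # shape (d_out, 1)

# Network B: three hidden units whose input weights are all scalar multiples of v
# (linearly dependent features), with output weights rescaled so the overall map
# is unchanged. No single parameter can be varied without changing the function,
# but the computation expressed is still just the one feature read three times.
scales = np.array([0.5, 1.0, 2.0])
W_in_B = scales[:, None] * v[None, :]           # shape (3, d_in)
W_out_B = (u[:, None] / scales[None, :]) / 3.0  # shape (d_out, 3)

x = rng.standard_normal(d_in)
print(np.allclose(W_out_A @ (W_in_A @ x), W_out_B @ (W_in_B @ x)))  # True: same map
```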

From SLT to AIT: NN generalisation out-of-distribution
Lucius Bushnaq · 18d

If every bit of every weight were somehow used to store one bit of p, excepting those weights used to simulate the UTM, that should suffice to derive the conjecture, yes.[1]

I think that's maybe even harder than what I tried to do though. It's theoretically fine if our scheme is kind of inefficient in terms of how much code it can store in a given number of parameters, so long as the leftover parameter description bits are free to vary.

  1. ^

    There'd be some extra trickiness in that under these definitions, the parameters are technically real numbers and thus have infinitely many bits of storage capacity, though in real life they're of course actually finite-precision floating-point numbers.
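As a purely illustrative sketch of that last point (not the construction from the post): with finite-precision float32 weights, one could in principle stash program bits in the low-order mantissa bits of a weight while barely perturbing its value.

```python
import struct

def pack_bits_into_weight(w: float, bits: str) -> float:
    """Overwrite the lowest len(bits) mantissa bits of a float32 weight with
    the given program bits, leaving the rest of the weight untouched."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", w))
    n = len(bits)
    as_int = (as_int & ~((1 << n) - 1)) | int(bits, 2)
    (w_packed,) = struct.unpack("<f", struct.pack("<I", as_int))
    return w_packed

def read_bits_from_weight(w: float, n: int) -> str:
    """Recover the n program bits stored in the lowest mantissa bits."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", w))
    return format(as_int & ((1 << n) - 1), f"0{n}b")

w = 0.7315
w_packed = pack_bits_into_weight(w, "10110")
print(w, w_packed)                         # nearly identical values
print(read_bits_from_weight(w_packed, 5))  # '10110' recovered
```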

Wikitag Contributions

Modularity · 3 years ago · (+22/-89)
Posts

114 · From SLT to AIT: NN generalisation out-of-distribution · Ω · 21d · 8 comments
72 · Circuits in Superposition 2: Now with Less Wrong Math · Ω · 3mo · 0 comments
47 · [Paper] Stochastic Parameter Decomposition · Ω · 3mo · 15 comments
41 · Proof idea: SLT to AIT · Ω · 7mo · 15 comments
25 · Can we infer the search space of a local optimiser? · Q · 8mo · 5 comments
108 · Attribution-based parameter decomposition · Ω · 8mo · 22 comments
150 · Activation space interpretability may be doomed · Ω · 9mo · 35 comments
71 · Intricacies of Feature Geometry in Large Language Models · 10mo · 0 comments
45 · Deep Learning is cheap Solomonoff induction? · 10mo · 1 comment
131 · Circuits in Superposition: Compressing many small neural networks into one · Ω · 1y · 9 comments