Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.

Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, maybe you could retitle it to something better?

Now I have to click to find out what the link is even about, which is also clickbait-y.

Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.

I've typed up some math notes on how much MSE loss we should expect for random embeddings, and for some alternative embeddings, when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.

Note that none of these embeddings are optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to random embeddings but can be optimised further. But I also believe that MSE loss doesn't prefer this solution very much, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.

This does not mean we should not expect superposition in real networks.

Many networks use other loss functions, e.g. cross-entropy.
Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modeling the dynamics in the middle of the network.

Setup and notation

$T$ features
$D$ neurons
$z$ active features

Assuming:

$z ≪ D < T$

True feature values:

$y = 1$ for active features
$y = 0$ for inactive features

Using random embedding directions (superposition)

Estimated values:

$\hat{y} = a + \epsilon$, where $E[\epsilon^{2}] = (z - 1) a^{2} / D$, for active features
$\hat{y} = \epsilon$, where $E[\epsilon^{2}] = z a^{2} / D$, for inactive features

Total Mean Squared Error (MSE)

$$\mathrm{MSE}_{\text{rand}} = z\left((1 - a)^{2} + (z - 1)\frac{a^{2}}{D}\right) + (T - z)\, z\, \frac{a^{2}}{D} \approx z (1 - a)^{2} + z \frac{T}{D} a^{2}$$

This is minimised by

$$a = \frac{D}{T + D}$$
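This value comes from setting the derivative of the approximate loss with respect to $a$ to zero:

$$\frac{d}{da}\left[z (1 - a)^{2} + z \frac{T}{D} a^{2}\right] = -2z (1 - a) + 2z \frac{T}{D} a = 0 \quad\Rightarrow\quad a \left(1 + \frac{T}{D}\right) = 1 \quad\Rightarrow\quad a = \frac{D}{T + D}$$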

Making MSE

$$\mathrm{MSE}_{\text{rand}} = z \frac{T}{T + D} = z \left(1 - \frac{D}{T + D}\right)$$
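A rough Monte Carlo check of the random-embedding numbers, as a sketch: the sizes (T = 100, D = 25, z = 3), the seed and the helper names are illustrative choices, not part of the notes. Directions are normalised Gaussian vectors, i.e. uniform on the unit sphere, for which the squared overlap of two different directions averages to 1/D; that is the assumption behind the noise terms $E[\epsilon^2]$. Each feature is read out by projecting the neuron activations back onto its own direction.

```python
import math
import random


def random_unit_vectors(n, dim, rng):
    """Draw n directions uniformly from the unit sphere in `dim` dimensions."""
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def simulate_random_embedding(T, D, z, a, trials=2000, seed=0):
    """Monte Carlo estimate of the summed squared error over all T features.

    Each trial switches on z random features (value 1), writes
    a * (sum of their directions) into the D neurons, and reads every
    feature back by projecting onto its own direction.
    """
    rng = random.Random(seed)
    vecs = random_unit_vectors(T, D, rng)
    # Overlaps v_i . v_j, so each read-out is just a sum of z overlaps.
    gram = [[sum(p * q for p, q in zip(u, v)) for v in vecs] for u in vecs]
    total = 0.0
    for _ in range(trials):
        active = set(rng.sample(range(T), z))
        for i in range(T):
            y_hat = a * sum(gram[i][j] for j in active)
            y = 1.0 if i in active else 0.0
            total += (y - y_hat) ** 2
    return total / trials


T, D, z = 100, 25, 3  # hypothetical sizes with z << D < T
a = D / (T + D)       # the minimising scale
estimate = simulate_random_embedding(T, D, z, a)
# First expression for MSE_rand, before (T - 1) is rounded to T.
exact = z * ((1 - a) ** 2 + (z - 1) * a ** 2 / D) + (T - z) * z * a ** 2 / D
approx = z * T / (T + D)
print(f"simulated {estimate:.3f}  exact {exact:.3f}  approx {approx:.3f}")
```

With these sizes the simulated loss should land close to both the exact sum and the approximation $zT/(T + D)$; the small gap between the last two is the $T - 1$ versus $T$ rounding.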

One feature per neuron

We embed a single feature in each neuron, and the rest of the features are simply not represented.

Estimated values:

$\hat{y} = y$ for represented features
$\hat{y} = 0$ for non-represented features

Total Mean Squared Error (MSE)

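Each active feature is unrepresented with probability $(T - D)/T$, and then contributes an error of $(1 - 0)^{2} = 1$, so

$$\mathrm{MSE}_{\text{single}} = z \frac{T - D}{T} = z \left(1 - \frac{D}{T}\right)$$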
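A small Monte Carlo sketch of the one-feature-per-neuron case, with hypothetical sizes (T = 100, D = 25, z = 3) and with features 0 to D − 1 taken as the represented ones (an arbitrary labelling). Represented features are read out exactly and inactive features are read out as 0, so the only error comes from active features outside the represented set; the average should sit close to $z(T - D)/T = 2.25$.

```python
import random


def simulate_single(T, D, z, trials=20000, seed=0):
    """Summed squared error when features 0..D-1 each get their own neuron
    and features D..T-1 are not represented at all (always read out as 0)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = rng.sample(range(T), z)
        # Represented features: error 0. Inactive features: read out as 0,
        # error 0. Active but unrepresented: read out as 0 instead of 1,
        # costing (1 - 0)^2 = 1 each.
        total += sum(1 for i in active if i >= D)
    return total / trials


T, D, z = 100, 25, 3  # hypothetical sizes with z << D < T
print(simulate_single(T, D, z), z * (T - D) / T)
```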

One neuron per feature

We embed each feature in a single neuron.

$\hat{y} = a \sum y$, where the sum is over all features that share the same neuron

We assume that the probability of two co-active features sharing the same neuron is small enough to ignore. We also assume that every neuron is used, with the features spread evenly so that each neuron carries $T/D$ features. Then for any active feature, the number of inactive features that get wrongly activated (the others sharing its neuron) is $\frac{T}{D} - 1 = \frac{T - D}{D}$, giving us the MSE loss for this case as

$$\mathrm{MSE}_{\text{multi}} = z \left((1 - a)^{2} + \left(\frac{T}{D} - 1\right) a^{2}\right)$$

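We can already see that this is smaller than $\mathrm{MSE}_{\text{rand}}$ for the same $a$, since the interference term has $\frac{T}{D} - 1$ where the random embedding has $\frac{T}{D}$. But let's also calculate what the minimum value is. $\mathrm{MSE}_{\text{multi}}$ is minimised by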

$$a = \frac{D}{T}$$
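This follows from setting the derivative with respect to $a$ to zero:

$$\frac{d}{da}\, z \left((1 - a)^{2} + \left(\frac{T}{D} - 1\right) a^{2}\right) = -2z (1 - a) + 2z \left(\frac{T}{D} - 1\right) a = 0 \quad\Rightarrow\quad \frac{T}{D}\, a = 1 \quad\Rightarrow\quad a = \frac{D}{T}$$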

Making MSE

$$\mathrm{MSE}_{\text{multi}} = z \left(1 - \frac{D}{T}\right)$$
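A Monte Carlo sketch of the one-neuron-per-feature scheme, under stated assumptions: the sizes T = 1000, D = 250, z = 3 are hypothetical and chosen so that two active features rarely share a neuron, and assigning feature i to neuron i mod D is just one concrete way (my choice) to spread the features evenly. Only neurons holding an active feature need to be visited, since every other feature is inactive and reads out 0. The script also scans a grid of $a$ values to check where the closed-form loss $z((1 - a)^2 + (T/D - 1)a^2)$ bottoms out.

```python
import random


def mse_multi(T, D, z, a):
    """Closed-form loss: z((1 - a)^2 + (T/D - 1) a^2)."""
    return z * ((1 - a) ** 2 + (T / D - 1) * a ** 2)


def simulate_multi(T, D, z, a, trials=20000, seed=0):
    """Monte Carlo estimate of the summed squared error when feature i is
    stored in neuron i % D, so each neuron carries T / D features
    (assumes T is divisible by D)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        active = set(rng.sample(range(T), z))
        # Neuron value = number of active features stored in that neuron.
        load = {}
        for i in active:
            load[i % D] = load.get(i % D, 0) + 1
        # Silent neurons read out 0 for features that are inactive anyway,
        # so only features sharing a neuron with an active one add error.
        for n, count in load.items():
            for i in range(n, T, D):  # every feature stored on neuron n
                y = 1.0 if i in active else 0.0
                total += (y - a * count) ** 2
    return total / trials


T, D, z = 1000, 250, 3  # hypothetical sizes; co-activation on a neuron is rare
a = D / T
print(simulate_multi(T, D, z, a), mse_multi(T, D, z, a), z * (1 - D / T))

# Coarse grid search over a: the formula should bottom out at D / T = 0.25.
best_a = min((k / 1000 for k in range(1, 1000)), key=lambda s: mse_multi(T, D, z, s))
print(best_a)
```

The simulated value should come out slightly below the formula, because at this $a$ the rare co-activations that the derivation ignores happen to lower the loss a little.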

I'm not sure that answered your question, but maybe you can ask a more specific one now.

The thing I was after was: what is the actual, concrete causal chain from rationality training to you getting better at debugging?

I currently think the answer is that the rationality training made you motivated, and that was the missing part that stopped you from getting better before. Let me know if you think I'm missing something important.

Interesting. Reading your comment makes me notice that I'm more motivated to learn object level skills than meta level skills.

"meta level" != "rationality".
E.g. I would count most of the CFAR curriculum as object level skills. But the training you're working on seems more like meta level skills.

I expect motivation to be super central to which learning methods work. There have been a number of posts on ACX about school (including 2 that are part of the review contest). The common theme is that the main bottleneck is students' motivation.

I didn't improve much at debugging until I got generally serious about rationality training.

Can you expand on this please?

NSFW question

How do you maintain breath control on someone who is panicking?

I've tried a bit of holding someone's mouth and nose, from both sides of the experience, and haven't figured out a way that actually stops the person from breathing if they try hard enough.

No, I don't think what you say matches my experience. My anxiety was pointing straight at the thing I needed, although I acknowledge I did not put forward enough details for this to be clear to you.

But it did not tell me exactly how much I would need. So it's more like you're hungry, you eat some, notice that you're still hungry, and then start to wonder if eating is actually what you need, or if this hunger feeling is about something else.

I don't know what you mean by "generic safety net" or "safety in the literal sense". I assumed based on context that we're not talking about physical safety.

I mean things like: I'm not lonely and I expect to continue not to be lonely, because I found people I like who reliably also want me around.

I don't know what is true for the typical person, and I'm definitely not a typical person.

With those caveats, what you describe is not true for me. To feel ok, I need to have a handful of close friends that I see regularly. This provides some sort of validation, among other things. If I have this, my social anxiety is low. If I don't have this, my anxiety is high and causes lots of problems.

It might look like my anxiety was resistant to being cured by more safety, because it took me a long time to find the people I need. Before I found people of my approximate neurotype, I was so far from being ok that it was unclear to me that the thing I could clearly feel I was missing was something that could exist.

And it's not the case that the further from the safe situation I am, the more anxiety I feel. It's more like a step function.

Also, sometimes the anxiety needs some time to fully update on a new situation. This looks like the anxiety coming back. Then I focus on the evidence that things are actually ok, or ask for some help to do this, and then it goes away. This does not work if things are not actually ok.

I can see how this could look like anxiety is conserved across a lot of different data points, and I don't know how someone can tell the difference until they have experienced sufficient safety.

Maybe the reason people stick to what they are good at is not a lack of motivation to explore, but the lack of a safety net for exploring. This seems to explain all your observations, if you assume most people are much more anxious than you. In this case, what other people need in order to grow is more acceptance in their life, not more pushing.

I disagree that it's hard, in the relevant context.

It's hard to communicate this to someone who doesn't have a distinction between the two concepts in their head. It's also hard to communicate this to someone who is too quick to jump to conclusions about what you mean, and who also has bad priors about you. This is enough of a problem that I don't recommend offering discernments to people you don't know well. But that's also kind of a moot point, since I think it's bad to offer unsolicited advice to people you don't know well, for other reasons.

But with someone like a romantic partner, or a close friend, with whom you'd have lots of long-form conversation, I don't think it's hard.

You can in fact just say: "I love you as you are, and among the things I love about you is the desire to grow stronger. I've noticed a way you could be stronger, do you want to hear it now or later?"

Or if you have established the words "discernment vs judgment", you can just preface any suggestion for improvement with "discernment". Or whatever communication style works for you.

Later in the relationship, you might not even have to clarify, and the person will just have the correct prior that you're expressing a discernment, not a judgment.

LESSWRONG