Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.

I'm surprised that instrumental convergence wasn't covered in the book. I didn't even notice it was left out until reading this review.

Here are some alternative sources, in case anyone prefers text over video:

Is there anyone in London who wants to collaborate with me on some Comp-in-Sup research?

Prerequisites are knowing some Linear Algebra and some Python/PyTorch.

I like it! "SiLT" is also easier to say than "Es El Te"

Thanks for the link. I have read it, but it was long ago, so thanks for the reminder. It's related in a helpful way.

I just checked in with myself about what the post above was for. I think it's part rant, part me clarifying my thoughts by writing them down, and hopefully getting some reflections back. It's also because maybe someone will find it useful, but that too may secretly be about me: creating more conversation partners who track the things I think are important.

If I were writing a proper LW blog post, then [who is this for] should primarily be the reader.

But in a shortform like this I feel like I'm allowed to do what I want. And people can also take away what they want. Tracking [who is this for] is much more important when people are in a live conversation, because that is a more trapped situation, requiring more consideration.

There is also the type of conversation where the other person pretends that it is about me, but actually it is about their need to feel like a good person. These situations are awful and terrible, and I will not play along.

When I'm in a conversation, I often track who the conversation is for, i.e. who this conversation is primarily serving in this moment.

If I'm ranting, then this conversation is for me, to let me release some tension. I will express myself in a way that feels good to me.
If I'm sharing useful information, then the conversation is for the other person. I will express myself in a way that makes the information clear and accessible to them, and also pay attention to whether they even want this information.
I can explain myself because I need to be seen, or because another person wants to understand me. But these are different things.
Sometimes more than one person is upset and has needs, and then you have to pick who gets help first. Whoever goes first, the conversation is now for them, until they feel sufficiently better to switch. And if neither person can set aside their needs, you should probably not talk right now, or bring in help.

I don't know how frequently or reliably I do this, because it's not deliberate, i.e. I never decided to do this. I just do it sometimes, because [who is this for?] is often a required input for my speech generator.

Do you usually track this? In what types of conversations? Do you think other people usually track this?

I vaguely remember having played with these rules, with you, more than once.

Another change (starting from the standard rules) that I think might speed games up is the ability to spend multiple funding tokens to publish a paper out of turn. But I've only run this once, requiring three tokens, and no one took advantage of it.

Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, maybe you can re-title it to something better?

Now I have to click to find out what the link is even about, which is also clickbait-y.

Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.

I've typed up some math notes on how much MSE loss we should expect for random embeddings, and for some alternative embeddings, when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.

Note that none of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to random embeddings but can be optimised more. But I also believe that MSE loss doesn't prefer this solution very much, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.

This does not mean we should not expect superposition in real networks.

Many networks use other loss functions, e.g. cross-entropy.
Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modelling the dynamics in the middle of the network.

Setup and notation

$T$ features
$D$ neurons
$z$ active features

Assuming:

$z ≪ D < T$

True feature values:

$y = 1$ for active features
$y = 0$ for inactive features

Using random embedding directions (superposition)

Estimated values:

$\hat{y} = a + \epsilon$ where $E[\epsilon^{2}] = (z - 1) a^{2} / D$ for active features
$\hat{y} = \epsilon$ where $E[\epsilon^{2}] = z a^{2} / D$ for inactive features

Total Mean Squared Error (MSE)

$$MSE_{rand} = z \left( (1 - a)^{2} + (z - 1) \frac{a^{2}}{D} \right) + (T - z) z \frac{a^{2}}{D} \approx z (1 - a)^{2} + z \frac{T}{D} a^{2}$$

This is minimised by

$$a = \frac{D}{T + D}$$

Making MSE

$$MSE_{rand} = z \frac{T}{T + D} = z \left( 1 - \frac{D}{T + D} \right)$$
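
A rough numerical check of the random-embedding estimate, as a sketch under assumptions that are mine and not stated above: embedding directions are uniformly random unit vectors in $D$ dimensions, the neuron activations are the sum of the active features' directions, and each feature is read out as $a$ times the dot product with its own direction. The function names are made up for illustration, and it uses only the standard library (NumPy or PyTorch would be faster).

```python
import math
import random

def random_unit_vector(dim, rng):
    """Uniformly random direction on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def simulate_mse_rand(T, D, z, a, trials=150, seed=0):
    """Monte Carlo estimate of the squared error summed over all T features."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Fresh random embedding: one unit vector per feature.
        E = [random_unit_vector(D, rng) for _ in range(T)]
        active = set(rng.sample(range(T), z))
        # Neuron activations: sum of the directions of the active features.
        x = [sum(E[i][d] for i in active) for d in range(D)]
        for j in range(T):
            y = 1.0 if j in active else 0.0
            # Assumed readout: scaled projection onto the feature's direction.
            y_hat = a * sum(e * xd for e, xd in zip(E[j], x))
            total += (y_hat - y) ** 2
    return total / trials

def predicted_mse_rand(T, D, z, a):
    """Approximate closed form: z (1 - a)^2 + z (T / D) a^2."""
    return z * (1 - a) ** 2 + z * (T / D) * a ** 2

if __name__ == "__main__":
    T, D, z = 120, 40, 3
    a = D / (T + D)
    print(simulate_mse_rand(T, D, z, a), predicted_mse_rand(T, D, z, a))
```

With small $T$ and $D$ the two numbers should agree only roughly, since the closed form drops terms of relative size $1/T$.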

One feature per neuron

We embed a single feature in each neuron, and the rest of the features are simply not represented.

Estimated values:

$\hat{y} = y$ for represented features
$\hat{y} = 0$ for non-represented features

Total Mean Squared Error (MSE)

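Assuming the active features are drawn uniformly from all $T$ features, each active feature is unrepresented with probability $\frac{T - D}{T}$, and then contributes a squared error of 1:

$$MSE_{single} = z \frac{T - D}{T}$$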
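
A quick sanity check on this case, as a sketch under my own assumption that the $z$ active features are drawn uniformly at random from all $T$ features. Each active feature then lacks a dedicated neuron with probability $(T - D)/T$ and costs a squared error of 1, so the simulated average should land near $z (T - D)/T$. The function name is hypothetical.

```python
import random

def simulate_mse_single(T, D, z, trials=20000, seed=0):
    """Average summed squared error when only features 0..D-1 get a neuron."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = rng.sample(range(T), z)
        # Represented features are read out exactly (error 0).
        # Every other active feature is estimated as 0 (squared error 1).
        total += sum(1 for i in active if i >= D)
    return total / trials

if __name__ == "__main__":
    T, D, z = 120, 40, 3
    print(simulate_mse_single(T, D, z), z * (T - D) / T)
```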

One neuron per feature

We embed each feature in a single neuron.

$\hat{y} = a \sum y$ where the sum is over all features that share the same neuron

We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for any active feature, the expected number of inactive features that will be wrongfully activated is $\frac{T - D}{D}$, giving us the MSE loss for this case as

$$MSE_{multi} = z \left( (1 - a)^{2} + \left( \frac{T}{D} - 1 \right) a^{2} \right)$$

We can already see that, for the same $a$, this is smaller than $MSE_{rand}$, but let's also calculate what the minimum value is. $MSE_{multi}$ is minimised by

$$a = \frac{D}{T}$$

Making MSE

$$MSE_{multi} = z \left( 1 - \frac{D}{T} \right)$$
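
To see how much the "ignore co-activation" assumption matters, here is a small simulation sketch. The assignment rule (feature `i` goes to neuron `i % D`) and the function names are my own choices for illustration. It does not ignore collisions, so any gap to the closed form comes from two active features landing on the same neuron.

```python
import random

def simulate_mse_multi(T, D, z, a, trials=500, seed=0):
    """Average summed squared error when each feature lives in one shared neuron."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        active = set(rng.sample(range(T), z))
        # Feature i is stored in neuron i % D, so every neuron is used
        # and each neuron is shared by about T / D features.
        neuron = [0.0] * D
        for i in active:
            neuron[i % D] += 1.0
        for i in range(T):
            y = 1.0 if i in active else 0.0
            y_hat = a * neuron[i % D]  # estimate is a times the neuron's activation
            total += (y_hat - y) ** 2
    return total / trials

def predicted_mse_multi(T, D, z, a):
    """Closed form that ignores collisions: z ((1 - a)^2 + (T / D - 1) a^2)."""
    return z * ((1 - a) ** 2 + (T / D - 1) * a ** 2)

if __name__ == "__main__":
    T, D, z = 1200, 400, 3
    a = D / T
    print(simulate_mse_multi(T, D, z, a), predicted_mse_multi(T, D, z, a))
```

When $z \ll D$ collisions are rare and the two values should nearly coincide. Shrinking $D$ while keeping $z$ fixed makes the collision effect visible.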
