Is there anyone in London who wants to collaborate with me on some Comp-in-Sup research?
Prerequisites are knowing some linear algebra and some Python/PyTorch.
I like it! "SiLT" is also easier to say than "Es El Te"
Thanks for the link. I have read it, but it was long ago, so thanks for the reminder. It's related in a helpful way.
I just checked in with myself about what the post above was for. I think it's part rant, part me clarifying my thoughts by writing them down, and hopefully getting some reflections back. And it's also because maybe someone will find it useful, but that's also maybe secretly about me, wanting to create more conversation partners who track the things I think are important.
If I were writing a proper LW blogpost, then [who is this for] should primarily be the reader.
But in a shortform like this I feel like I'm allowed to do what I want. And people can also take away what they want. Tracking [who is this for] is much more important when people are in a live conversation, because that is a more trapped situation, requiring more consideration.
There is also the type of conversation where the other person pretends that it is about me, but actually it is about their need to feel like a good person. These situations are awful and terrible, and I will not play along.
When I'm in a conversation, I often track who the conversation is for, i.e. who this conversation is primarily serving in this moment.
I don't know how frequently or reliably I do this, because it's not deliberate; i.e. I never decided to do this, I just do it sometimes, because [who is this for?] is often a required input for my speech generator.
Do you usually track this? In what types of conversations? Do you think other people usually track this?
I vaguely remember having played with these rules, with you, more than once.
Another change (starting from the standard rules) that I think might speed games up is the ability to spend multiple funding tokens to publish a paper out of turn. But I've only run this once, requiring three tokens, and no one took advantage of it.
Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, maybe you can re-title it to something better?
Now I have to click to find out what the link is even about, which is also clickbait-y.
I've typed up some math notes on how much MSE loss we should expect for random embeddings, and some other alternative embeddings, when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.
Note that neither of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost-orthogonal directions, which is similar to random embeddings but can be optimised further. But I also believe that MSE loss doesn't prefer this solution very strongly, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.
This does not mean we should not expect superposition in real networks.
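As a rough sanity check on the random-embedding case, here is a small PyTorch sketch that estimates the MSE numerically. The setup (binary features, activation probability p, a tunable readout scale c) is my own illustrative choice and not necessarily identical to the assumptions stated in the notes below.

```python
import torch

torch.manual_seed(0)

# Assumed toy setup (my own choice, not necessarily the notes' exact assumptions):
# n sparse features, d < n neurons, each feature independently active with
# probability p, taking value 1 when active.
n, d, p = 1000, 100, 0.01
n_samples = 20_000

# Random embedding: each feature gets a random unit-norm direction in R^d.
W = torch.randn(d, n)
W = W / W.norm(dim=0, keepdim=True)

# Sparse binary feature vectors.
x = (torch.rand(n_samples, n) < p).float()

# Linear encode/decode, with a readout scale c: x_hat = c * W^T W x.
h = x @ W.T          # neuron activations, shape (n_samples, d)
x_tilde = h @ W      # unscaled readout, shape (n_samples, n)

# Closed-form scale c minimising E[(c * x_tilde - x)^2].
c = (x_tilde * x).sum() / (x_tilde * x_tilde).sum()

mse = ((c * x_tilde - x) ** 2).sum(dim=1).mean()
print(f"optimal readout scale c ~ {c.item():.3f}")
print(f"total MSE per sample    ~ {mse.item():.3f}")
print(f"MSE of always predicting zero: {(x ** 2).sum(dim=1).mean().item():.3f}")
```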
Assuming:
True feature values:
Estimated values:
Total Mean Squared Error (MSE)
This is minimised by
Making MSE
We embed a single feature in each neuron, and the rest of the features are just not represented.
Estimated values:
Total Mean Squared Error (MSE)
We embed each feature in a single neuron.
We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for any active neuron, the expected number of inactive features that will be wrongfully activated is , giving us the MSE loss for this case as
We can already see that this is smaller than , but let's also calculate what the minimum value is. is minimised by
Making MSE
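Here is the same kind of numerical sketch for the two alternative embeddings described above: dropping all but d features, versus writing every feature to exactly one (shared) neuron with a scaled readout. Again, the binary-feature assumption, the parameters, and the "feature i goes to neuron i mod d" assignment are my own illustrative choices, not the exact setup from the notes.

```python
import torch

torch.manual_seed(0)

# Same toy setup as the sketch above (again my own illustrative parameters):
# n sparse binary features, d neurons, activation probability p.
n, d, p = 1000, 100, 0.01
n_samples = 20_000
x = (torch.rand(n_samples, n) < p).float()

# Embedding A: one feature per neuron, the remaining n - d features are dropped.
x_hat_a = x.clone()
x_hat_a[:, d:] = 0.0   # features d..n-1 are simply not represented
mse_a = ((x_hat_a - x) ** 2).sum(dim=1).mean()

# Embedding B: every feature is written to exactly one neuron, so n/d features
# share each neuron; the neuron's value is copied back to all of its features,
# scaled by a factor c chosen to minimise MSE.
assignment = torch.arange(n) % d            # feature i -> neuron i mod d
E = torch.zeros(n, d)
E[torch.arange(n), assignment] = 1.0        # one-hot write matrix
h = x @ E                                   # neuron activations, (n_samples, d)
x_tilde = h @ E.T                           # unscaled readout, (n_samples, n)
c = (x_tilde * x).sum() / (x_tilde * x_tilde).sum()
mse_b = ((c * x_tilde - x) ** 2).sum(dim=1).mean()

print(f"MSE, drop all but d features: {mse_a.item():.3f}")
print(f"MSE, shared-neuron embedding: {mse_b.item():.3f} (readout scale c ~ {c.item():.3f})")
```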
I'm surprised that instrumental convergence wasn't covered in the book. I didn't even notice it was left out until reading this review.
Here are some alternative sources if anyone prefers text over video: