Linda Linsefors

Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.

Posts (sorted by new)
  • 18 · AI Safety Camp 10 Outputs · 2mo · 0
  • 72 · Circuits in Superposition 2: Now with Less Wrong Math · Ω · 4mo · 0
  • 25 · Is the output of the softmax in a single transformer attention head usually winner-takes-all? · Q · 9mo · 1
  • 36 · Theory of Change for AI Safety Camp · 9mo · 3
  • 36 · We don't want to post again "This might be the last AI Safety Camp" · 9mo · 17
  • 60 · Funding Case: AI Safety Camp 11 · 10mo · 4
  • 38 · AI Safety Camp 10 · Ω · 1y · 9
  • 17 · Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025) · Ω · 1y · 2
  • 75 · AISC9 has ended and there will be an AISC10 · Ω · 1y · 4
  • 46 · Some costs of superposition · Ω · 2y · 11
  • 3 · Linda Linsefors's Shortform · Ω · 6y · 112

Comments (sorted by newest)
If Anyone Builds It Everyone Dies, a semi-outsider review
Linda Linsefors8h20

I'm surprised that instrumental convergence wasn't covered in the book. I didn't even notice it was left out until reading this review.

Here are some alternative sources in case anyone prefers text over video:

  • https://www.lesswrong.com/w/instrumental-convergence
  • https://en.wikipedia.org/wiki/Instrumental_convergence
Linda Linsefors's Shortform
Linda Linsefors22d20

Is there anyone in London who wants to collaborate with me on some Comp-in-Sup research?

Prerequisites are knowing some linear algebra and some Python/PyTorch.

Mateusz Bagiński's Shortform
Linda Linsefors1mo42

I like it! "SiLT" is also easier to say than "Es El Te"

Linda Linsefors's Shortform
Linda Linsefors1mo20

Thanks for the link. I have read it, but it was long ago, so thanks for the reminder. It's related in a helpful way.

Linda Linsefors's Shortform
Linda Linsefors1mo20

I just checked in with myself about what the post above was for. I think it's part rant, part me clarifying my thoughts by writing them down and hopefully getting some reflections back. And it's also because maybe someone will find it useful, but that's also maybe secretly about me, wanting to create more conversation partners who track the things I think are important.

If I were writing a proper LW blogpost, then [who is this for] should primarily be the reader.

But in a shortform like this, I feel like I'm allowed to do what I want. And also, people can take away what they want. Tracking [who is this for] is much more important when people are in a live conversation, because that is a more trapped situation, requiring more consideration.

Linda Linsefors's Shortform
Linda Linsefors1mo20

There is also the type of conversation where the other person pretends that it is about me, but actually it is about their need to feel like a good person. These situations are awful and terrible, and I will not play along.

Linda Linsefors's Shortform
Linda Linsefors1mo30

When I'm in a conversation I often track who the conversation is for, i.e. who this conversation is primarily serving in this moment.

  • If I'm ranting, then this conversation is for me, to let me release some tension. I will express myself in a way that feels good to me.
  • If I'm sharing useful information, then the conversation is for the other person. I will express myself in a way that makes the information clear and accessible for them, and also pay attention to whether they even want this information.
  • I can explain myself because I need to be seen, or because another person wants to understand me. But these are different things.
  • Sometimes more than one person is upset and has needs, and then you have to pick who gets help first. Whoever goes first, the conversation is now for them, until they feel sufficiently better to switch. And if neither person can set aside their needs, you should probably not talk right now, or bring in help.

I don't know how frequently or reliably I do this, because it's not deliberate; I never decided to do this, I just do it sometimes, because [who is this for?] is often a required input for my speech generator.

Do you usually track this? In what types of conversations? Do you think other people usually track this?

Zendo for large groups
Linda Linsefors1mo41

I vaguely remember having played with these rules, with you, more than once.

Another change (starting from the standard rules) that I think might speed games up is the ability to spend multiple funding tokens to publish a paper out of turn. But I've only run this once, with the cost set at three tokens, and no one took advantage of it.

the gears to ascenscion's Shortform
Linda Linsefors2mo20

Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, you could re-title it to something better, maybe?

Now I have to click to find out what the link is even about, which is also clickbait-y.

Linda Linsefors's Shortform
Linda Linsefors2moΩ240

Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.

I've typed up some math notes on how much MSE loss we should expect for random embeddings, and for some alternative embeddings, when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.

Note that neither of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to random embeddings but can be optimised further. But I also believe that MSE loss doesn't prefer this solution very much, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.

This does not mean we should not expect superposition in real networks:

  1. Many networks use other loss functions, e.g. cross-entropy.
  2. Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modeling the dynamics in the middle of the network.

 

Setup and notation

  • T features
  • D neurons
  • z active features

Assuming:

  • z ≪ D < T

True feature values:

  • y = 1 for active features
  • y = 0 for inactive features

 

Using random embedding directions (superposition)

Estimated values:

  • $\hat{y} = a + \epsilon$,  where  $E[\epsilon^2] = (z-1)\,a^2/D$,  for active features
  • $\hat{y} = \epsilon$,  where  $E[\epsilon^2] = z\,a^2/D$,  for inactive features

Total Mean Squared Error (MSE)

$$\mathrm{MSE}_{\mathrm{rand}} = z\left((1-a)^2 + \frac{(z-1)\,a^2}{D}\right) + (T-z)\,\frac{z\,a^2}{D} \;\approx\; z(1-a)^2 + z\,\frac{T}{D}\,a^2$$

Minimising over $a$ (using the approximate form) gives

$$a = \frac{D}{T+D}$$

Making the MSE

$$\mathrm{MSE}_{\mathrm{rand}} = \frac{zT}{T+D} = z\left(1 - \frac{D}{T+D}\right)$$
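As a sanity check on this estimate, here is a small simulation sketch. It assumes a particular toy setup (random unit embedding vectors, a hidden state that is just the sum of the active features' embeddings, and a linear dot-product readout scaled by $a$), so treat it as an illustration rather than the exact setup of these notes.

import numpy as np

rng = np.random.default_rng(0)

T, D, z = 1000, 100, 5         # features, neurons, active features (z << D < T)
a = D / (T + D)                # readout scale that should minimise the MSE
n_trials = 2000

# One random (roughly orthogonal) unit embedding vector per feature.
W = rng.standard_normal((T, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)

total_se = 0.0
for _ in range(n_trials):
    active = rng.choice(T, size=z, replace=False)
    y = np.zeros(T)
    y[active] = 1.0
    h = W[active].sum(axis=0)                 # superposed hidden state
    y_hat = a * (W @ h)                       # linear readout for every feature
    total_se += np.sum((y - y_hat) ** 2)

print("simulated MSE    :", total_se / n_trials)
print("estimate zT/(T+D):", z * T / (T + D))  # about 4.5 for these numbers

With these numbers the simulated loss should land close to $zT/(T+D) \approx 4.5$.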

 

One feature per neuron

We embed a single feature in each neuron, and the rest of the features are just not represented.

Estimated values:

  • $\hat{y} = y$  for represented features
  • $\hat{y} = 0$  for non-represented features

Total Mean Squared Error (MSE)

$$\mathrm{MSE}_{\mathrm{single}} = z\,\frac{T-D}{D}$$

 

One neuron per feature

We embed each feature in a single neuron.

  • $\hat{y} = a \sum y$,  where the sum is over all features that share the same neuron

We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for each active feature, the expected number of inactive features that will be wrongfully activated is $\frac{T-D}{D}$, giving us the MSE loss for this case as

$$\mathrm{MSE}_{\mathrm{multi}} = z\left((1-a)^2 + \left(\frac{T}{D} - 1\right)a^2\right)$$

We can already see that this is smaller than $\mathrm{MSE}_{\mathrm{rand}}$, but let's also calculate what the minimum value is. $\mathrm{MSE}_{\mathrm{multi}}$ is minimised by

$$a = \frac{D}{T}$$

Making the MSE

$$\mathrm{MSE}_{\mathrm{multi}} = z\left(1 - \frac{D}{T}\right)$$
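The same kind of sanity check works for this scheme, assuming the $T$ features are assigned to the $D$ neurons round-robin, each neuron's value is the sum of its assigned features, and each feature is read back from its own neuron with scale $a$:

import numpy as np

rng = np.random.default_rng(0)

T, D, z = 1000, 100, 5            # features, neurons, active features
a = D / T                         # readout scale that should minimise MSE_multi
n_trials = 2000

neuron_of = np.arange(T) % D      # round-robin assignment: T/D features per neuron

total_se = 0.0
for _ in range(n_trials):
    active = rng.choice(T, size=z, replace=False)
    y = np.zeros(T)
    y[active] = 1.0
    neuron_vals = np.bincount(neuron_of, weights=y, minlength=D)  # each neuron sums its features
    y_hat = a * neuron_vals[neuron_of]                            # read each feature off its own neuron
    total_se += np.sum((y - y_hat) ** 2)

print("simulated MSE      :", total_se / n_trials)
print("estimate z(1 - D/T):", z * (1 - D / T))   # 4.5 for these numbers

The simulated loss should come out close to $z(1 - D/T)$, up to the co-activation effects that the derivation above ignores.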
Wikitag Contributions

  • Comp-In-Sup · 22 days ago · (+419)
  • Outer Alignment · 2 years ago · (+9/-80)
  • Inner Alignment · 2 years ago · (+13/-84)