Victor Lecomte — LessWrong

Sorry for the late answer! I agree with your assessment of the TMS paper. In our case, the L1 regularization is strong enough that the encodings do completely align with the canonical basis: in the experiments that gave the "Polysemantic neurons vs hidden neurons" graph, we observe that all weights are either 0 or close to 1 or -1. And I think that all solutions which minimize the loss (with L1-regularization included) align with the canonical basis.

Incidental polysemanticity

Victor Lecomte2yΩ341

Thanks for the feedback!

In particular, I think it is likely very sensitive to the implicit assumption that feature i and feature j never co-occur on a single input.

Definitely! I still think that this assumption is fairly realistic because in practice, most pairs of unrelated features would co-occur only very rarely, and I expect the winner-take-all dynamic to dominate most of the time. But I agree that it would be nice to quantify this and test it out.

Overall my expectation would be that without the L1 regularization on activations (and with the training dataset as described in this post), you'd get a complicated mess where every neuron is highly polysemantic, i.e. even more polysemanticity than described in this post. Why is that wrong?

If there is no L1 regularization on activations, then every hidden neuron would indeed be highly "polysemantic" in the sense that it has nonzero weights for each input feature. But on the other hand, the whole encoding space would become rotationally symmetric, and when that's the case it feels like polysemanticity shouldn't be about individual neurons (since the canonical basis is not special anymore) and instead about the angles that different encodings form. In particular, as long as m ≥ n, the space of optimal solutions for this setup requires the encodings to form angles of at least 90° with each other, and it's unclear whether we should call this polysemantic.

So one of the reasons why we need L1 regularization is to break the rotational symmetry and create a privileged basis: that way, it's actually meaningful to ask whether a particular hidden neuron is representing more than one feature.
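To make the symmetry argument concrete, here is a minimal NumPy sketch. It assumes a tied-weight autoencoder with a ReLU on the output, trained on one-hot inputs, which is only one reading of the setup discussed here. The names `reconstruction_loss`, `l1_penalty` and `random_orthogonal` are hypothetical helpers, not code from the post. Under those assumptions the reconstruction loss depends only on the dot products between encodings, so it is unchanged when all encodings are rotated together, whereas the L1 penalty on activations is smallest when the encodings sit on the canonical axes.

```python
import numpy as np


def reconstruction_loss(W):
    """Squared error over all one-hot inputs (assumed setup, not the post's code).

    W has shape (n_features, m_neurons); row i is the encoding of feature i.
    Input e_i gives hidden vector W[i] and output ReLU(W @ W[i]).
    """
    gram = W @ W.T                 # pairwise dot products between encodings
    out = np.maximum(gram, 0.0)    # ReLU on the output layer
    target = np.eye(W.shape[0])    # each one-hot input should be reproduced
    return float(np.sum((out - target) ** 2))


def l1_penalty(W):
    """L1 norm of hidden activations, summed over all one-hot inputs."""
    return float(np.abs(W).sum())


def random_orthogonal(m, rng):
    """Random orthogonal m x m matrix (a rotation, possibly with a reflection)."""
    q, _ = np.linalg.qr(rng.normal(size=(m, m)))
    return q


rng = np.random.default_rng(0)
n, m = 4, 6                        # illustrative sizes with m >= n

W_aligned = np.eye(n, m)           # each feature gets its own neuron
W_rotated = W_aligned @ random_orthogonal(m, rng)

loss_aligned = reconstruction_loss(W_aligned)
loss_rotated = reconstruction_loss(W_rotated)
l1_aligned = l1_penalty(W_aligned)
l1_rotated = l1_penalty(W_rotated)

print("reconstruction loss:", loss_aligned, loss_rotated)
print("L1 penalty:         ", l1_aligned, l1_rotated)
```

In this sketch both weight matrices reconstruct the inputs equally well, since the rotated encodings are still mutually orthogonal, but only the axis-aligned one keeps the L1 term at its minimum. That is the sense in which the regularizer picks out a privileged basis.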

Incidental polysemanticity

Victor Lecomte2y31

Thank you, it's fixed now!

My views on “doom”

Victor Lecomte2y342

More geometric (but less faithful):

My Assessment of the Chinese AI Safety Community

Victor Lecomte2y*10

I would still like to try and understand, if that's okay. :)

Would you say the following captures some of it?

When you're a kid, altruism/volunteering is what adults / teachers / the government keep telling you "nice kids" do, so it's perceived as something uncool that you need to grow out of, and is only done by people who don't think for themselves and don't realize how the world really works.

My Assessment of the Chinese AI Safety Community

Victor Lecomte2y*10

It sounds like you're skeptical about EA field building because most Chinese people find "changing the world" childishly naive and impractical. Do you think object-level x-risk field building is equally doomed?

For example, if 看理想 (an offshoot of publishing house 理想国 that produces audio programs about culture and society) came out with an audio program about x-risks (say, 1-2 episodes about each of nuclear war, climate change, AI safety, biosecurity), I think the audience would be fairly receptive to it. 梁文道, one of the leaders of 看理想, has shared on his podcast 八分 (at the end of episode 114) that a big part of his worldview is “先天下之忧而忧，后天下之乐而乐” (“worry before the people fear something will happen, and be happy only after the people are happy”), a well-known quote that describes ideals of Confucian governance and has similarities with EA ideals.

In general, I guess I would have expected Chinese people to be pretty receptive to altruism given the emphasis on contributing (贡献) to the greater good in the party line (e.g. studying the good example of Lei Feng), which gets reflected a lot in media/textbooks/etc. But maybe I spend too much time consuming Chinese media and not enough time talking to actual Chinese people.
