Bary Levy

Comments

Cross layer superposition

 

Had a bit of time to think about this. Ultimately, because superposition as we know it is a property of the latent space rather than of the neurons in the layer, it's not clear to me that this is the right question to be asking. What do you imagine an experimental result would look like?

I want to generally encourage this kind of experiment-and-publish-quickly project. This might require a post of its own, but as someone with a background in both hacking and entrepreneurship, I think this kind of quick feedback loop is an incredible strength of both, and I hope it can be used to accelerate scientific progress, which is exactly what we need in alignment.

Might also be interesting to look at this from a Learned Helplessness point of view, especially with helicopter parenting. Perhaps children aren't learning to solve their own problems independently. I wouldn't be surprised if this contributes to the mental health epidemic.

One factor in why children are becoming less independent in the US might be car-centric city design. With unsafe streets and no way to walk to school, friends' houses, or after-school activities, parents have no choice but to drive them around. Not Just Bikes has a great video on this:

https://youtu.be/oHlpmxLTxpw

I've seen the term "AI Explainability" floating around in the mainstream ML community. Is there a major difference between that and what we in the AI Safety community call Interpretability?