LESSWRONG
Bary Levy

Posts

15 · ChatGPT Plugins - The Beginning of the End

Comments

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Bary Levy · 4d · 10

Cross layer superposition

Had a bit of time to think about this. Ultimately, because superposition as we know it is a property of the latent space rather than of the neurons in the layer, it's not clear to me that this is the right question to be asking. What do you imagine an experimental result would look like?

Tiny Mech Interp Projects: Emergent Positional Embeddings of Words

Bary Levy · 1y · 45

I want to generally encourage this kind of experiment-and-publish-quickly project. This might require a post of its own, but as someone with a background in both hacking and entrepreneurship, I think this kind of quick feedback loop is an incredible strength of both. I hope it can be used to accelerate scientific progress, which is exactly what we need in alignment.

Spectrum of Independence

Bary Levy · 2y · 10

Might also be interesting to look at this from a Learned Helplessness point of view, especially with helicopter parenting. Perhaps children aren't learning to solve their own problems independently. I wouldn't be surprised if this contributes to the mental health epidemic.

Spectrum of Independence

Bary Levy · 2y · 33

One factor in why children are becoming less independent in the US might be car-centric city design. With unsafe streets and no way to walk to school, friends' houses, or after-school activities, parents have no choice but to drive them around. Not Just Bikes has a great video on this:

https://youtu.be/oHlpmxLTxpw

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Bary Levy · 2y · 32

I've seen the term "AI Explainability" floating around in the mainstream ML community. Is there a major difference between that and what we in the AI Safety community call Interpretability?