Zechen Zhang
Comments

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Zechen Zhang · 2y · 30

This is an interesting point: when we ran our causality studies across layers, we also found that the board-state features in the middle layers are the ones mostly used causally, not those in the deep layers. However, probe accuracy does increase with depth.

I'm not sure how this squares with the fact that SAEs also find more of these features in the middle layers. The "natural" features the SAEs find in the last few layers need not contain much information about the full board state, just the partial information needed to make the move decision.
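As an illustration of what "finding" a board-state feature can mean operationally, one way to count matches is to compare SAE decoder directions against supervised probe directions by cosine similarity. The directions below are synthetic, and the threshold and matching rule are assumptions for illustration, not the post's exact methodology:

```python
import numpy as np

def count_matched_features(sae_dirs, probe_dirs, threshold=0.9):
    """Count probe directions matched by at least one SAE decoder direction.

    A probe direction counts as "found" if some SAE feature direction has
    absolute cosine similarity above `threshold` with it (hypothetical rule).
    """
    S = sae_dirs / np.linalg.norm(sae_dirs, axis=1, keepdims=True)
    P = probe_dirs / np.linalg.norm(probe_dirs, axis=1, keepdims=True)
    sims = np.abs(P @ S.T)                 # (n_probes, n_sae_features)
    return int((sims.max(axis=1) > threshold).sum())

rng = np.random.default_rng(1)
d = 64
probe_dirs = rng.normal(size=(180, d))     # 180 board-state probe directions
sae_dirs = rng.normal(size=(512, d))       # a hypothetical SAE dictionary
sae_dirs[:9] = probe_dirs[:9]              # suppose 9 features reproduce probes
count_matched_features(sae_dirs, probe_dirs)  # → 9
```

In high dimension, unrelated random directions have near-zero cosine similarity, so only the planted features clear the threshold here; with real activations the interesting cases are the partial, sub-threshold alignments.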

We Need To Know About Continual Learning
Zechen Zhang · 2y · 10

I would argue continual learning is one route to long-term planning and agency rather than a necessary one. An LLM augmented with long-term memory retrieval can do long-term planning, assuming the base model is already powerful enough; and agency can emerge naturally from the simulator.

I'm not convinced continual learning is even the most likely path to AGI.

Posts

7 · Science for the Possible World · 3y · 0
7 · New Speaker Series on AI Alignment Starting March 3 · 4y · 1