RM
The helix is already pretty long, so maybe layernorm is responsible?
E.g. to do position-independent look-back, we want the geometry of the embedding to be invariant under some Euclidean embedding of the 1D translation group. If you have enough space, the natural choice is a line. But if you only have a bounded region to work with and you want to keep the individual position embeddings a certain distance apart, you are forced to "curl" the line up into a more complex representation (screw transformations) because you need the position-embedding cu...
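A small numerical illustration of the "curling" argument, under assumptions that are mine and not the comment's: positions are embedded as a product of circles, one (cos, sin) pair per frequency, as in sinusoidal position encodings. The frequencies and the helper names (`line_embedding`, `curled_embedding`, `shift_matrix`) are hypothetical. Every point has the same norm, so the curve stays in a bounded region (roughly the kind of constraint layernorm imposes). Even so, the distance between two positions depends only on their offset, and shifting every position by a fixed offset is a rotation, i.e. an isometry.

```python
import numpy as np


def line_embedding(t, direction):
    """Straight-line embedding: position t maps to t * direction (norm grows without bound)."""
    return t * np.asarray(direction, dtype=float)


def curled_embedding(t, freqs):
    """Bounded embedding: one (cos, sin) circle per frequency.

    The norm is always sqrt(len(freqs)), whatever the position t.
    """
    angles = np.asarray(freqs, dtype=float) * t
    return np.concatenate([np.cos(angles), np.sin(angles)])


def shift_matrix(delta, freqs):
    """Orthogonal matrix R with R @ curled_embedding(t) == curled_embedding(t + delta)."""
    freqs = np.asarray(freqs, dtype=float)
    k = len(freqs)
    c, s = np.cos(freqs * delta), np.sin(freqs * delta)
    r = np.zeros((2 * k, 2 * k))
    # Rotate each frequency's (cos, sin) plane by the angle freq * delta.
    r[:k, :k] = np.diag(c)
    r[:k, k:] = -np.diag(s)
    r[k:, :k] = np.diag(s)
    r[k:, k:] = np.diag(c)
    return r


freqs = np.array([1.0, 0.3, 0.05])  # hypothetical frequencies
for a, b in [(3, 8), (103, 108)]:
    # Same offset (5) gives the same distance, regardless of absolute position.
    print(a, b, np.linalg.norm(curled_embedding(a, freqs) - curled_embedding(b, freqs)))
```

With a single frequency the curve closes into a circle and positions one period apart collide; adding slower frequencies keeps distant positions separated, which is one way to see why the bounded curve ends up more complicated than a circle. Locally, the slowest component changes almost linearly and plays roughly the role of a helix axis that the faster circles wind around.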
What CCS does, conceptually, is find a direction in latent space that distinguishes between true and false statements. That direction doesn't have to be truth (restricted to stored model knowledge, etc.), and the paper doesn't really test for false positives, so it's hard to know how robust the method is. In particular, I want to know that the probe doesn't respond to statements that aren't truth-apt. It seems worth brainstorming properties that true/false statements could mostly share that are not actually truth. A couple of examples come to mind.
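For concreteness, a minimal sketch of the CCS objective from the CCS paper (Burns et al., 2022): a linear probe p(x) = sigmoid(θ·x + b) is scored on contrast pairs by a consistency term, (p(x⁺) − (1 − p(x⁻)))², plus a confidence term, min(p(x⁺), p(x⁻))². The normalization step only roughly follows the paper, and the synthetic data and helper names (`make_pairs`, `flag`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def normalize(x, eps=1e-8):
    """Mean-center and scale each feature column (applied to each half of the pairs separately)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def ccs_loss(theta, bias, x_pos, x_neg):
    """CCS-style loss for a linear probe on contrast pairs.

    x_pos[i] and x_neg[i] are hidden states for the "true" and "false"
    phrasings of the same statement i; both have shape (n, d).
    """
    p_pos = sigmoid(x_pos @ theta + bias)
    p_neg = sigmoid(x_neg @ theta + bias)
    # Consistency: probabilities for the two phrasings should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the trivial solution p_pos = p_neg = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def make_pairs(n=200, d=5, seed=0):
    """Hypothetical synthetic data: column 0 flips sign within each pair, the rest is noise."""
    rng = np.random.default_rng(seed)
    flag = rng.choice([-1.0, 1.0], size=n)  # some binary property; need not be truth
    x_pos = 0.1 * rng.normal(size=(n, d))
    x_neg = 0.1 * rng.normal(size=(n, d))
    x_pos[:, 0] += flag
    x_neg[:, 0] -= flag
    return normalize(x_pos), normalize(x_neg)
```

Nothing in the loss refers to truth: it only rewards a direction along which the two halves of each pair land on opposite sides with high confidence. A probe along column 0 of `make_pairs()` gets near-zero loss whatever `flag` actually encodes (and so does its negation), which is exactly why false positives are a live concern.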
Would it be possible to set up a wiki or something similar for this? Pedagogy seems easier to crowdsource than hardcore research, and the sequences-style posting here doesn't really lend itself to incremental improvement. I think this project will be unnecessarily restricted by requiring one person to understand the whole theory before putting out any contributions. The few existing expository posts on infrabayes could already be polished and joined into a much better introduction if a collaborative editing format allowed for it. Feedback in the comments could be integrated directly into the documents instead of requiring readers to work through a long comment thread.
People here describe themselves as "pessimistic" about a variety of aspects of AI risk on a very regular basis, so this seems like an isolated demand for rigor.