DebugMyBrain
Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments
Alignment from equivariance II - language equivariance as a way of figuring out what an AI "means"
DebugMyBrain · 1d · 10

Hey, 
I really like the idea and have been thinking about something similar lately, which is how I found your posts. However, I think it would be interesting not only to look at the inputs/outputs of the LLM, but also at the feature activations and their "dynamics" along a longer input/chain of thought.
To me, the real problem here would be defining good quantities/observables to investigate for equivariance, as this seems much fuzzier and more ill-defined than the nice, simple representation of an image in the hidden layers of a CNN.
Would love to read your thoughts on this, because I really do think that working through this and looking at some toy models might be a worthwhile endeavour.
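
As a very rough illustration of what one such observable could look like (the model choice, the mean-pooling, and the cosine-similarity metric below are just placeholder assumptions on my part, not anything from your post): compare per-layer summaries of the hidden states for a prompt and its translation, and see how "equivariant" the layers are under the change of language.

```python
# Minimal sketch: one candidate observable for language equivariance.
# Compare per-layer mean-pooled hidden states of a sentence and its translation.
# Model name, pooling, and similarity metric are illustrative assumptions only.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # any multilingual encoder would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_means(text):
    """Mean-pooled hidden state per layer: a crude summary of where the text lands."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: one (1, seq_len, dim) tensor per layer (incl. embeddings)
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

en = layer_means("The cat sat on the mat.")
de = layer_means("Die Katze saß auf der Matte.")

for i, (a, b) in enumerate(zip(en, de)):
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {i:2d}: cosine similarity {cos:.3f}")
```

The same skeleton would extend to the "dynamics" idea by tracking how these per-layer summaries evolve token by token along a chain of thought, rather than pooling over the whole input.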

Cheers

Burdensome Details
DebugMyBrain · 2mo · 10

I believe there is a difference between, e.g., Kahneman's experimental subjects and someone mesmerized by a futurist. The latter believes they have actually gained an argument for the plausibility of the supposed outcome. This is only fully irrational if there has been no gain of information about the system whose future states I am assigning probabilities to.
