DebugMyBrain
Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments
Alignment from equivariance II - language equivariance as a way of figuring out what an AI "means"
DebugMyBrain · 1d · 10

Hey, 
I really like the idea and have been thinking about something similar lately, which is how I found your posts. However, I think it would be interesting not only to look at the inputs/outputs of the LLM, but also at the feature activations and their "dynamics" along a longer input/chain of thought.
To me, the real problem here would be defining good quantities/observables to investigate for equivariance, as this seems much fuzzier and more ill-defined than the nice, simple representation of an image in the hidden layers of a CNN.
Would love to read your thoughts on this, because I really do think that working through this and looking at some toy models might be a worthwhile endeavour.
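
As a very rough illustration of what one such observable could look like (the model choice, the mean-pooling, and the cosine-similarity metric below are just placeholder assumptions on my part, not anything from your post): compare per-layer summaries of the hidden states for a prompt and its translation, and see how "equivariant" the layers are under the change of language.

```python
# Minimal sketch: one candidate observable for language equivariance.
# Compare per-layer mean-pooled hidden states of a sentence and its translation.
# Model name, pooling, and similarity metric are illustrative assumptions only.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # any multilingual encoder would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_means(text):
    """Mean-pooled hidden state per layer: a crude summary of where the text lands."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: one (1, seq_len, dim) tensor per layer (incl. embeddings)
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

en = layer_means("The cat sat on the mat.")
de = layer_means("Die Katze saß auf der Matte.")

for i, (a, b) in enumerate(zip(en, de)):
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {i:2d}: cosine similarity {cos:.3f}")
```

The same skeleton would extend to the "dynamics" idea by tracking how these per-layer summaries evolve token by token along a chain of thought, rather than pooling over the whole input.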

Cheers

Burdensome Details
DebugMyBrain · 2mo · 10

I believe there is a difference between, e.g., Kahneman's experimental subjects and someone mesmerized by a futurist. The latter believes they have actually gained an argument for the plausibility of the supposed outcome. This is only fully irrational if there has been no gain of information about the system whose future states I am assigning probabilities to.
