Moughees Ahmed

Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments
Transformers Represent Belief State Geometry in their Residual Stream
Moughees Ahmed · 1y · 10

Excited to see what you come up with! 

Plausibly, a model trained on the entirety of human output should be able to decipher more hidden states - ones that are not obvious to us but might be obvious in latent space. That could mean models end up being very good at augmenting our existing understanding of fields, but not at creating new ones from scratch.
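
For concreteness, here is a minimal sketch of the kind of linear probe the post uses to look for latent structure in the residual stream. The array names (`activations`, `belief_states`) and the random stand-in data are assumptions for illustration; in the post's actual setup the activations come from a transformer trained on emissions of a hidden Markov model, and the targets are the ground-truth Bayesian belief states over its hidden states.

```python
# Sketch: linearly probe residual-stream activations for belief-state geometry.
# `activations` and `belief_states` are hypothetical stand-ins for real data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
activations = rng.normal(size=(10_000, 64))             # (n_samples, d_model)
belief_states = rng.dirichlet(np.ones(3), size=10_000)  # points on the belief simplex

# Fit an affine map from the residual stream onto the belief simplex.
probe = LinearRegression().fit(activations, belief_states)

# A high R^2 (ideally on held-out data) would suggest the belief-state
# geometry is linearly embedded in the residual stream. It will be ~0 here,
# since the stand-in data is random.
print("R^2:", probe.score(activations, belief_states))
```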

Transformers Represent Belief State Geometry in their Residual Stream
Moughees Ahmed · 1y · 20

This might be an adjacent question, but suppose this is true and comprehensively explains the belief-updating process. What does it say, if anything, about whether transformers can produce new (undiscovered) knowledge or states? If they can't observe a novel state - something that doesn't exist in the training data - does that mean they can never discover new knowledge on their own?
