LESSWRONG

519
Erik Garrison
2010
Comments

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Erik Garrison · 1y · 30

Could this affect distributed training schemes that assume rotational invariance?
