Erik Garrison — LessWrong
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Erik Garrison
2y
Could this affect distributed training methods that assume rotational invariance?
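The premise of the question can be illustrated with a minimal sketch. This is an illustrative assumption, not code from the post. Plain SGD's update commutes with a rotation of parameter space, while Adam's per-coordinate normalization does not. The helper names below are hypothetical. Only Adam's first, bias-corrected step is modeled, where each coordinate reduces to roughly `-lr * sign(g)`.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def sgd_step(grad, lr=0.1):
    """Plain SGD update: a scalar multiple of the gradient (linear in grad)."""
    return [-lr * g for g in grad]

def adam_first_step(grad, lr=0.1, eps=1e-8):
    """Hypothetical model of Adam's first step. After bias correction
    m_hat = g and v_hat = g**2, so each coordinate is -lr * g / (|g| + eps)."""
    return [-lr * g / (math.sqrt(g * g) + eps) for g in grad]

def commutes_with_rotation(step, grad, theta, tol=1e-6):
    """True if step(R g) == R step(g), i.e. the update is rotation-equivariant."""
    a = step(rotate(grad, theta))
    b = rotate(step(grad), theta)
    return all(abs(x - y) < tol for x, y in zip(a, b))

grad = [1.0, 0.2]
theta = math.pi / 5
print(commutes_with_rotation(sgd_step, grad, theta))         # True
print(commutes_with_rotation(adam_first_step, grad, theta))  # False
```

Because Adam's update depends on which axes the coordinates lie along, any scheme that rotates or re-bases parameters before the optimizer step would no longer produce an equivalent update.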