Erik Garrison
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Erik Garrison
1y
Could this affect distributed training schemes that assume rotational invariance?
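As background for the question, here is a rough sketch (assuming NumPy) of the property at stake. Plain SGD is rotation-equivariant: taking a step in a rotated basis gives the same result as rotating the step. Adam is not, because it normalizes each coordinate separately. The sketch compares one Adam step from zero moment estimates, taken on a randomly rotated gradient, against the rotated version of the unrotated step. The function names are hypothetical, and the sketch only illustrates Adam's per-coordinate normalization. It says nothing about how any particular distributed-training scheme behaves.

```python
import numpy as np

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam update at step t=1, starting from zero moment estimates.
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad**2
    m_hat = m / (1 - beta1)   # bias-corrected first moment (equals grad here)
    v_hat = v / (1 - beta2)   # bias-corrected second moment (equals grad**2 here)
    # Per-coordinate normalization: roughly -lr * sign(grad) on the first step.
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

def sgd_step(grad, lr=1e-3):
    # Plain SGD step, linear in the gradient.
    return -lr * grad

def random_rotation(n, rng):
    # QR of a Gaussian matrix, with column signs fixed, gives a random orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
n = 8
g = rng.standard_normal(n)
Q = random_rotation(n, rng)

# Equivariance check: does stepping in the rotated basis equal rotating the step?
sgd_gap = np.abs(sgd_step(Q @ g) - Q @ sgd_step(g)).max()
adam_gap = np.abs(adam_first_step(Q @ g) - Q @ adam_first_step(g)).max()
print(f"SGD  gap: {sgd_gap:.2e}")
print(f"Adam gap: {adam_gap:.2e}")
```

The SGD gap is at floating-point noise level. The Adam gap is on the order of the learning rate, because `sign(Q @ g)` generally differs from `Q @ sign(g)`. Any method that relies on rotating parameters or gradients without changing the optimizer's trajectory would have to account for this basis dependence.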