No Really, Attention is ALL You Need - Attention can do feedforward networks — LessWrong