Addendum: More Efficient FFNs via Attention — LessWrong