What’s the backward-forward FLOP ratio for Neural Networks?
Zenin Easa Panthakkalakath · 3y

I am not sure the calculation in Appendix B is quite accurate; if I am misreading it, I would appreciate a better explanation.
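For reference, here is the Adam update as I read it, which matches the per-parameter equations in PyTorch's torch.optim.Adam documentation (the superscript t denotes the timestep):

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= m_t \,/\, (1-\beta_1^t) \\
\hat{v}_t &= v_t \,/\, (1-\beta_2^t) \\
\theta_t &= \theta_{t-1} - \eta\, \hat{m}_t \,/\, (\sqrt{\hat{v}_t} + \epsilon)
\end{aligned}
$$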

In the first line (the calculation of 'm'), we can clearly see that there are 4 operations. Now, we could assume that (1 - beta1) is pre-calculated, in which case there are only 3 operations.

If we accept that argument, then the calculations of 'm_hat' and 'v_hat' should each be counted as only 1 operation. I also find the transpose in those lines strange: PyTorch's documentation gives the same set of equations, but beta1 and beta2 are scalars there, so the superscript presumably denotes exponentiation by the timestep t rather than a transpose.
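To make my counting explicit, here is a minimal sketch (my own, not from the post) of the per-parameter operation count under both conventions; the exact tallies are my assumption about how Appendix B intends to count:

```python
def adam_step_flops_per_param(precompute_scalars: bool = True) -> int:
    """Count elementwise FLOPs for one Adam update of a single parameter.

    Scalars such as (1 - beta1) and (1 - beta1**t) depend only on the step
    number t, so when precompute_scalars is True they are computed once per
    step and contribute ~0 FLOPs per parameter.
    """
    # m = beta1*m + (1 - beta1)*g: 2 multiplies + 1 add, plus 1 subtract
    # if (1 - beta1) is recomputed for every parameter.
    m_ops = 3 if precompute_scalars else 4
    # v = beta2*v + (1 - beta2)*g*g: same as above, plus 1 multiply for g*g.
    v_ops = (3 if precompute_scalars else 4) + 1
    # m_hat = m / (1 - beta1**t): 1 divide, plus a power and a subtract
    # if the denominator is recomputed per parameter.
    m_hat_ops = 1 if precompute_scalars else 3
    v_hat_ops = 1 if precompute_scalars else 3
    # theta = theta - lr*m_hat/(sqrt(v_hat) + eps): sqrt, add, multiply,
    # divide, subtract.
    update_ops = 5
    return m_ops + v_ops + m_hat_ops + v_hat_ops + update_ops


print(adam_step_flops_per_param(True))   # 14
print(adam_step_flops_per_param(False))  # 20
```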

I am genuinely trying to make sense of the calculation here, but I can't. Could you please provide more information on this?
