In section one, where you define the action-value function, you use R for return where I believe you intended to use G.