Posts

Sorted by New

Wiki Contributions

Comments

KanHar1y50

Considering the fact that weight sharing between actor and critic networks is a common practice, and given the fact that the critic passes gradients (learned from the reward) to the actor, for most practical purposes the actor gets all of the information it needs about the reward.

This is the case for many common architectures.