LESSWRONG
LW

450
Kibidango
2Ω2010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
Iterated Distillation and Amplification
Kibidango7yΩ230
AGZ’s policy network p is the learned model.

I found this bit slightly confusing. As far as I understand from the AGZ Nature paper, AGZ does not have a separate policy network p, but uses a single network fθ which outputs both the learned policy p and the estimated probability v that the current player will win the game. Is this what the sentence is referring to?

Reply