I'm intrigued by the distinction between the policy function and the program implementing it, as their structures seem different.
For example, a policy function might be said to have the structure of mapping all inputs to a single output, and the program that implements it is a Python program that uses a dictionary. Does this distinction matter?
When we talk about agent structure, I'd imagine that we care both about the structure of the actions that the agent takes and the structure of the computation the agent does to decide on the actions.
It is hard to pinpoint motivation here. If you are a top researcher at a top lab working on alignment and you disagree with something within the company, I see two categories of options you can take to try to fix things
Jan and Ilya left but haven't said much about how they lost confidence in OpenAI. I expect we will see them making more damning statements about OpenAI in the future
Or is there a possible motivation I'm missing here?