Thank you for this. This is very close to what I was hoping to find!
It looks like Benjamin Hilton makes a rough guess of the proportion of workers dedicated to AI x-risk at each organization. This seems appropriate for estimating a rough percentage across all organizations, but if we want to nudge organizations to employ more people on alignment, then I think we want to highlight exact figures.
E.g., we could ask the organizations how many people they have working on alignment and then publish what they say - a sort of accountability feedback loop.
You mention the number of people at OpenAI doing alignment work. I think it would be helpful to compile a list of the different labs and the number of people that can be reasonably said to be doing alignment work. Then we could put together a chart of sorts, highlighting this gap.
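To make the idea concrete, here's a minimal sketch of what such a chart could look like once the figures are collected. All lab names and numbers below are placeholders, not real data - the whole point of the exercise is to get the labs to report these themselves:

```python
# Hypothetical lab headcounts: (total staff, staff doing alignment work).
# PLACEHOLDER values only -- to be replaced with figures reported by the labs.
labs = {
    "Lab A": (500, 20),
    "Lab B": (300, 30),
    "Lab C": (1000, 10),
}

# Print a simple table highlighting the alignment share at each lab.
print(f"{'Lab':<8}{'Total':>8}{'Alignment':>12}{'Share':>8}")
for name, (total, alignment) in labs.items():
    print(f"{name:<8}{total:>8}{alignment:>12}{alignment / total:>8.1%}")
```

Even a plain table like this would make the gap between total headcount and alignment headcount easy to see at a glance.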
Highlighting gaps like this is an established strategy for driving change when dealing with organizational-level inequities.
If people reading this comment have insight into the number of people at the various labs doing alignment work and/or the total number of people at said labs: please comment here!
Could you elaborate on "For NN Model 1, the belief is encoded in the learned parameters θ∈Θ. For NN Model 2, the belief is encoded in the architecture itself"?