The main overlap between Modeling the impact of AI safety field-building programs and the other two posts is the disclaimers, which we believe should be copied in all three posts, and the main QARY definition, which seemed significant enough to add. Beyond that, the intro post is distinct from the two analysis posts.
This post does have much in common with the Cost-effectiveness of student programs for AI safety research. The two posts are structured in a very similar manner. That being said, the sections apply the same analysis to different sets of programs. As such, the graphs, numbers, and conclusions drawn may be different.
It's plausible that we could've dramatically shortened the section "The model" in one of the posts. Ultimately, we decided not to, and instead let the reader decide whether to skip it. (This has the added benefit of making each post more self-contained.) However, we could see arguments for the opposing view.
We ask practitioners who have direct experience with these programs for their beliefs as to which research avenues participants pursue before and after the program. Research relevance (before/without, during, or after) is given by the sum-product of these probabilities with CAIS’s judgement of the relevance of different research avenues (in the sense defined here). You can find the explicit calculations for workshops at lines 28-81 of this script, and for socials at lines 28-38 of this script.
Using workshop contenders’ research relevance without the program (during and after the program period) as an example:
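The calculation has the shape of a sum-product, which can be sketched as below. This is only an illustration: the function name, the avenue names, the probabilities, and the relevance weights are all hypothetical placeholders, not the values or code used in the scripts linked above.

```python
def research_relevance(avenue_probs, avenue_relevance):
    """Sum-product of avenue probabilities with avenue relevance weights.

    avenue_probs: practitioner-elicited probability that a participant
        pursues each research avenue (hypothetical values below).
    avenue_relevance: judged relevance of each avenue (hypothetical values).
    """
    missing = set(avenue_probs) - set(avenue_relevance)
    if missing:
        raise KeyError(f"no relevance weight for avenues: {sorted(missing)}")
    return sum(p * avenue_relevance[a] for a, p in avenue_probs.items())


# Hypothetical relevance weights for three made-up avenues.
relevance = {"avenue_a": 1.0, "avenue_b": 0.5, "unrelated": 0.0}

# Hypothetical beliefs about which avenues participants pursue.
probs_without_program = {"avenue_a": 0.1, "avenue_b": 0.2, "unrelated": 0.7}
probs_after_program = {"avenue_a": 0.3, "avenue_b": 0.3, "unrelated": 0.4}

relevance_without = research_relevance(probs_without_program, relevance)
relevance_after = research_relevance(probs_after_program, relevance)

# The change in research relevance is the difference between the two.
change = relevance_after - relevance_without
print(relevance_without, relevance_after, change)
```

With these made-up inputs, relevance rises from roughly 0.2 without the program to roughly 0.45 after it. The real parameter values would come from the practitioner estimates and relevance judgements described above.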
Clearly, this is far from a perfect process. Hence the strong disclaimer regarding parameter values. In future we would want to survey participants before and after, rather than rely on practitioner intuitions. We are very open to the possibility that better future methods would produce parameter values inconsistent with the current parameter values! Our hope with these posts is to provide a helpful framework for thinking about these programs, not to give confident conclusions.
Finally, it’s worth mentioning that the cost-effectiveness of these programs relative to one another does not rely very heavily on conversions. You can see this by reading off cost-effectiveness from the change in research relevance here. Further, research avenue relevance treatment effects across programs (excluding engineers submitting to the TDC, where we can be more confident) differ by a factor of ~2, whereas differences in e.g. cost per participant are ~20x and differences in average scientist-equivalence are ~7x.
Yup! The bounty is still ongoing, now funded by a different source. We have been awarding prizes throughout the duration of the bounty and will post an update in January detailing the results.
Don't have a concrete definition off the top of my head, but I can try to give you a sense of what we're thinking about. "Alignment theory" for us refers to the class of work that is more "reason about alignment from first principles" than "run actual experiments". (Happy to discuss why this is our focus if that would be useful?)
Examples: Risks from learned optimization, inaccessible information, most posts in Evan's list of research artifacts.